Linear Regression

Let

\begin{displaymath}
y = mx +q + \varepsilon
\end{displaymath} (3.71)

be the equation of the line in explicit form, with the measurement error $\varepsilon$ attributed entirely to the $y$ axis. Under this assumption, the cost function to be minimized is
\begin{displaymath}
S = \frac{1}{2n} \sum_{i=1}^{n} { \left( m x_i + q - y_i \right)^2 }.
\end{displaymath} (3.72)

The solution is the point where the gradient of $S$ with respect to $m$ and $q$ vanishes, that is:

\begin{displaymath}
\begin{array}{l}
m = \dfrac{ \overline{xy}-\bar{x}\bar{y}}{\overline{x^2}-\bar{x}^2} = \dfrac{\text{cov}(x,y)}{\text{var}(x)} \\
q = - m \bar{x} + \bar{y} \\
\end{array}\end{displaymath} (3.73)
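
Equation (3.73) can be checked by writing out the two partial derivatives of the cost function (3.72) and setting them to zero. The steps below are a sketch added for clarity, with $\overline{xy}$ and $\overline{x^2}$ denoting the sample means of the products $x_i y_i$ and of the squares $x_i^2$:

\begin{displaymath}
\begin{array}{l}
\dfrac{\partial S}{\partial m} = \dfrac{1}{n} \sum_{i=1}^{n} \left( m x_i + q - y_i \right) x_i = m \overline{x^2} + q \bar{x} - \overline{xy} = 0 \\
\dfrac{\partial S}{\partial q} = \dfrac{1}{n} \sum_{i=1}^{n} \left( m x_i + q - y_i \right) = m \bar{x} + q - \bar{y} = 0 \\
\end{array}\end{displaymath}

The second condition gives $q = \bar{y} - m \bar{x}$. Substituting it into the first yields $m \left( \overline{x^2} - \bar{x}^2 \right) = \overline{xy} - \bar{x}\bar{y}$, which is the expression for $m$ whenever the samples $x_i$ are not all equal.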

Here $\bar{x}$ denotes the mean of the samples $x_i$, and the same bar notation denotes the means of the other quantities (for example, the mean of the products $x_i y_i$). Since $q = \bar{y} - m \bar{x}$, the line passes through the point $(\bar{x},\bar{y})$, the centroid of the distribution.
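
As an illustration, the closed-form solution can be evaluated directly from the sample means. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are hypothetical and chosen only for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + q with the error along the y axis.

    Hypothetical helper: evaluates the closed-form solution from sample means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Sample means of x, y, x*y and x^2
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_xx = sum(x * x for x in xs) / n

    var_x = mean_xx - mean_x ** 2          # var(x)
    cov_xy = mean_xy - mean_x * mean_y     # cov(x, y)
    if var_x == 0:
        raise ValueError("all x values coincide: the slope is undefined")

    m = cov_xy / var_x
    q = mean_y - m * mean_x                # the line passes through the centroid
    return m, q


# Illustrative data lying exactly on y = 2x + 1
m, q = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The guard on `var_x` covers the degenerate case of a vertical set of points, for which the explicit form $y = mx + q$ cannot represent the line.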

It is easy to modify this result to minimize the deviation along the $x$ axis instead of the $y$ axis, or to represent the line in implicit form.
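
One way to obtain the variant with the error along the $x$ axis is to swap the roles of the two variables: fit $x = a y + b$ with the same closed form, then convert back to explicit form. The sketch below assumes this approach; the function name `fit_line_x_error` is hypothetical. Swapping the variables gives $a = \text{cov}(x,y)/\text{var}(y)$, hence $m = 1/a = \text{var}(y)/\text{cov}(x,y)$, and the line still passes through the centroid.

```python
def fit_line_x_error(xs, ys):
    """Fit y = m*x + q while minimizing the squared deviations along x.

    Hypothetical helper: fits x = a*y + b by least squares and inverts it.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Sample means of x, y, x*y and y^2
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_yy = sum(y * y for y in ys) / n

    var_y = mean_yy - mean_y ** 2          # var(y)
    cov_xy = mean_xy - mean_x * mean_y     # cov(x, y)
    if cov_xy == 0:
        raise ValueError("cov(x, y) is zero: the line cannot be written as y = m*x + q")

    m = var_y / cov_xy                     # m = 1/a with a = cov(x, y) / var(y)
    q = mean_y - m * mean_x                # the centroid still lies on the line
    return m, q


# Illustrative data lying exactly on y = 2x + 1
m, q = fit_line_x_error([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On noise-free data both variants return the same line; with noisy data the two slopes generally differ, because each one minimizes the deviations along a different axis.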



Paolo Medici
2025-10-22