Logistic Regression

Figure 3.2: Logistic Function

Generalized linear models are a family of models that relate the dependent variable to a linear combination of the explanatory variables through a nonlinear function. Logistic regression falls within this class, specifically in the case where the variable $y$ is dichotomous, meaning it can take only the values $0$ or $1$. By its nature, this type of problem holds significant importance in classification tasks.

In the case of binary problems it is possible to define the probabilities of success and failure:

\begin{displaymath}
\begin{array}{l}
P[Y=1\vert\mathbf{x}]=p(\mathbf{x}) \\
P[Y=0\vert\mathbf{x}]=1-p(\mathbf{x}) \\
\end{array}\end{displaymath} (3.96)

The response of a linear predictor of the form

\begin{displaymath}
y' = \boldsymbol\beta \cdot \mathbf{x} + \varepsilon
\end{displaymath} (3.97)

is not constrained to lie between $0$ and $1$ and is therefore unsuitable for this purpose. It is necessary to relate the response of the linear predictor to the probability $p(\mathbf{x})$ through a suitable function $g$
\begin{displaymath}
g(p(\mathbf{x}) ) = \boldsymbol\beta \cdot \mathbf{x} + b
\end{displaymath} (3.98)

where $g(p)$ is a nonlinear, invertible function defined over $[0,1]$: it is the link function, and its inverse $g^{-1}(y')$, which maps the linear predictor back to a probability, is the mean function.

A widely used model for the function $g(p)$ is the logit function defined as:

\begin{displaymath}
logit(p) = \log \frac{p}{1-p} = \boldsymbol\beta \cdot \mathbf{x}
\end{displaymath} (3.99)

The quantity $\frac{p}{1-p}$, since it expresses how many times more likely success is than failure, is referred to as the odds. Consequently, the function (3.99) represents the logarithm of the ratio between the probability of an event occurring and the probability of the same event not occurring (the log-odds).
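For example, a probability of success $p=0.8$ corresponds to odds of $0.8/0.2 = 4$ (success is four times as likely as failure) and to log-odds of $\log 4 \approx 1.39$, while $p=0.5$ gives odds of $1$ and log-odds of $0$.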

Its inverse function exists and is given by

\begin{displaymath}
\mathrm{E}[Y\vert\mathbf{x}] = p( \mathbf{x} ) = \frac{ e^{\boldsymbol\beta \cdot \mathbf{x}} }{ 1 + e^{\boldsymbol\beta \cdot \mathbf{x}} }
\end{displaymath} (3.100)

and it is the logistic function shown in Figure 3.2.
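As a quick check, (3.100) follows from (3.99) by exponentiating both sides and solving for $p$:

\begin{displaymath}
\log \frac{p}{1-p} = \boldsymbol\beta \cdot \mathbf{x}
\;\Rightarrow\;
\frac{p}{1-p} = e^{\boldsymbol\beta \cdot \mathbf{x}}
\;\Rightarrow\;
p = \frac{ e^{\boldsymbol\beta \cdot \mathbf{x}} }{ 1 + e^{\boldsymbol\beta \cdot \mathbf{x}} }
\end{displaymath}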

In this case the maximum likelihood method does not coincide with the least squares method; the likelihood to be maximized is

\begin{displaymath}
\mathcal{L}(\boldsymbol\beta) = \prod_{i=1}^{n} f(y_i\vert \mathbf{x}_i; \boldsymbol\beta) = \prod_{i=1}^{n} p^{y_i} ( \mathbf{x}_i) \left(1 - p ( \mathbf{x}_i) \right)^{1-y_i}
\end{displaymath} (3.101)

from which the log-likelihood function
\begin{displaymath}
\log \mathcal{L}(\boldsymbol\beta) = \sum_{i=1}^{n} \left[ y_i (\boldsymbol\beta \cdot \mathbf{x}_i) - \log \left( 1 + e^{\boldsymbol\beta \cdot \mathbf{x}_i} \right) \right]
\end{displaymath} (3.102)

is derived. The maximization of this function, which has no closed-form solution, is carried out through iterative techniques and provides the estimate of the parameters $\boldsymbol\beta$.
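As an illustrative sketch (not part of the original text; function names and step size are arbitrary assumptions), the log-likelihood (3.102) can be maximized by plain gradient ascent, using the fact that its gradient is $\sum_i (y_i - p(\mathbf{x}_i))\,\mathbf{x}_i$:

\begin{verbatim}
import numpy as np

def sigmoid(z):
    # logistic function of Eq. (3.100)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    # X: (n, d) matrix of explanatory variables (append a column of ones
    #    if an intercept term is desired); y: (n,) vector of 0/1 labels.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)        # p(x_i) for every sample
        grad = X.T @ (y - p)         # gradient of the log-likelihood (3.102)
        beta += lr * grad / len(y)   # ascent step
    return beta
\end{verbatim}

In practice Newton-Raphson (iteratively reweighted least squares) converges in far fewer iterations, but the gradient-ascent version makes the connection with (3.102) explicit.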
