Logistic Regression

Figure 3.2: Logistic Function

Generalized linear models are a family of models that relate the dependent variable to a linear combination of the explanatory variables through a nonlinear function. Logistic regression falls within this class, specifically in the case where the variable $y$ is dichotomous, meaning it can take only the values $0$ or $1$. By its nature, this type of problem holds significant importance in classification tasks.

In the case of binary problems it is possible to define the probabilities of success and failure:

\begin{displaymath}
\begin{array}{l}
P[Y=1\vert\mathbf{x}]=p(\mathbf{x}) \\
P[Y=0\vert\mathbf{x}]=1-p(\mathbf{x}) \\
\end{array}\end{displaymath} (3.96)

The response of a linear predictor of the form

\begin{displaymath}
y' = \boldsymbol\beta \cdot \mathbf{x} + \varepsilon
\end{displaymath} (3.97)

is not constrained to lie between $0$ and $1$ and is therefore unsuitable for this purpose. It is necessary to relate the response of the linear predictor to the probability $p(\mathbf{x})$ through a suitable function $g$
\begin{displaymath}
g(p(\mathbf{x}) ) = \boldsymbol\beta \cdot \mathbf{x} + b
\end{displaymath} (3.98)

where $g(p)$ is a nonlinear, invertible function defined over $[0,1]$: it is the link function, and its inverse $g^{-1}(y')$, which maps the linear predictor back to a probability, is the mean function.

A widely used model for the function $g(p)$ is the logit function defined as:

\begin{displaymath}
logit(p) = \log \frac{p}{1-p} = \boldsymbol\beta \cdot \mathbf{x}
\end{displaymath} (3.99)

The quantity $\frac{p}{1-p}$, since it expresses how many times more likely success is than failure, is referred to as the odds. Consequently, the function (3.99) represents the logarithm of the ratio between the probability of an event occurring and the probability of the same event not occurring (the log-odds).
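For example, a probability of success $p=0.8$ corresponds to odds of $0.8/0.2 = 4$ (success is four times as likely as failure) and to log-odds of $\log 4 \approx 1.39$, while $p=0.5$ gives odds of $1$ and log-odds of $0$.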

Its inverse function exists and is given by

\begin{displaymath}
\mathrm{E}[Y\vert\mathbf{x}] = p( \mathbf{x} ) = \frac{ e^{\boldsymbol\beta \cdot \mathbf{x}} }{ 1 + e^{\boldsymbol\beta \cdot \mathbf{x}} }
\end{displaymath} (3.100)

and it is the logistic function shown in Figure 3.2.
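As a quick check, (3.100) follows from (3.99) by exponentiating both sides and solving for $p$:

\begin{displaymath}
\log \frac{p}{1-p} = \boldsymbol\beta \cdot \mathbf{x}
\;\Rightarrow\;
\frac{p}{1-p} = e^{\boldsymbol\beta \cdot \mathbf{x}}
\;\Rightarrow\;
p = \frac{ e^{\boldsymbol\beta \cdot \mathbf{x}} }{ 1 + e^{\boldsymbol\beta \cdot \mathbf{x}} }
\end{displaymath}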

In this case the maximum likelihood method does not coincide with the least squares method; the likelihood to be maximized is

\begin{displaymath}
\mathcal{L}(\boldsymbol\beta) = \prod_{i=1}^{n} f(y_i\vert \mathbf{x}_i; \boldsymbol\beta) = \prod_{i=1}^{n} p^{y_i} ( \mathbf{x}_i) \left(1 - p ( \mathbf{x}_i) \right)^{1-y_i}
\end{displaymath} (3.101)

from which the log-likelihood function
\begin{displaymath}
\log \mathcal{L}(\boldsymbol\beta) = \sum_{i=1}^{n} \left[ y_i (\boldsymbol\beta \cdot \mathbf{x}_i) - \log \left( 1 + e^{\boldsymbol\beta \cdot \mathbf{x}_i} \right) \right]
\end{displaymath} (3.102)

is derived. The maximization of this function, which has no closed-form solution, is carried out through iterative techniques and provides the estimate of the parameters $\boldsymbol\beta$.
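As an illustrative sketch (not part of the original text; function names and step size are arbitrary assumptions), the log-likelihood (3.102) can be maximized by plain gradient ascent, using the fact that its gradient is $\sum_i (y_i - p(\mathbf{x}_i))\,\mathbf{x}_i$:

\begin{verbatim}
import numpy as np

def sigmoid(z):
    # logistic function of Eq. (3.100)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    # X: (n, d) matrix of explanatory variables (append a column of ones
    #    if an intercept term is desired); y: (n,) vector of 0/1 labels.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)        # p(x_i) for every sample
        grad = X.T @ (y - p)         # gradient of the log-likelihood (3.102)
        beta += lr * grad / len(y)   # ascent step
    return beta
\end{verbatim}

In practice Newton-Raphson (iteratively reweighted least squares) converges in far fewer iterations, but the gradient-ascent version makes the connection with (3.102) explicit.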
