Despite the Soft Margin, some problems are intrinsically non-separable in the input space. However, from an understanding of the problem it is often possible to infer a nonlinear transformation $\varphi$ that maps the input space $\mathcal{X}$ into a feature space $\mathcal{H}$ where a separating hyperplane allows better discrimination of the categories. The discriminant function in the space $\mathcal{H}$ is given by
$$f(\mathbf{x}) = \mathbf{w}^{\top} \varphi(\mathbf{x}) + b \tag{4.30}$$
To enable separation, the space $\mathcal{H}$ typically has a higher dimension than the input space $\mathcal{X}$. This increase in dimensionality raises the computational complexity of the problem and its demand for resources. Kernel methods address this issue.
The vector $\mathbf{w}$ is a linear combination of the training samples (the support vectors, in the hard-margin case):
$$\mathbf{w} = \sum_{i} \alpha_i y_i \varphi(\mathbf{x}_i) \tag{4.31}$$
When evaluating the discriminant function it is therefore necessary to use the support vectors (at least those with a non-negligible associated coefficient $\alpha_i$). Substituting (4.31) into (4.30), the transformation $\varphi$ appears only through inner products, which can be replaced by a kernel function $k(\mathbf{x}_i, \mathbf{x}) = \varphi(\mathbf{x}_i)^{\top} \varphi(\mathbf{x})$ without ever computing $\varphi$ explicitly. In fact, an SVM with a kernel identifies certain samples of the training set as the useful information for determining how close the sample under evaluation is to them.
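As a minimal sketch of this evaluation (assuming NumPy; the names `discriminant`, `sv`, `alpha` and the callable `kernel` are illustrative, not from the text), the kernel expansion of (4.30)–(4.31) can be coded as:

```python
import numpy as np

def discriminant(x, sv, alpha, y, b, kernel):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b, summed over the
    support vectors sv (eq. 4.30 with w expanded as in eq. 4.31).
    alpha and y are NumPy arrays aligned with sv."""
    k = np.array([kernel(xi, x) for xi in sv])  # k(x_i, x) against every support vector
    return float(np.dot(alpha * y, k) + b)
```

Note that only the support vectors and their coefficients need to be stored: the transformation $\varphi$ never appears.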
The bias $b$ is computed, analogously to equation (4.32), by averaging over the set $\mathcal{S}$ of support vectors:

$$b = \frac{1}{|\mathcal{S}|} \sum_{j \in \mathcal{S}} \left( y_j - \sum_{i \in \mathcal{S}} \alpha_i y_i \, k(\mathbf{x}_i, \mathbf{x}_j) \right) \tag{4.33}$$
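Under the same illustrative naming conventions, the averaging of (4.33) can be sketched as:

```python
import numpy as np

def bias(sv, alpha, y, kernel):
    """Eq. (4.33): average y_j - sum_i alpha_i y_i k(x_i, x_j)
    over the support vectors sv; alpha and y are NumPy arrays."""
    # Gram matrix restricted to the support vectors
    K = np.array([[kernel(xi, xj) for xj in sv] for xi in sv])
    return float(np.mean(y - (alpha * y) @ K))
```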
The most commonly used kernels, owing to their ease of evaluation, are Gaussian kernels of the form

$$k(\mathbf{x}, \mathbf{x}') = e^{-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}} \tag{4.34}$$

or, equivalently, with $\gamma = 1/(2\sigma^2)$,

$$k(\mathbf{x}, \mathbf{x}') = e^{-\gamma \|\mathbf{x} - \mathbf{x}'\|^2} \tag{4.35}$$
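An illustrative sketch of the Gaussian kernel of (4.34) follows; the default value of `sigma` is an arbitrary choice, not from the text:

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Gaussian (RBF) kernel, eq. (4.34); using gamma = 1/(2*sigma**2)
    gives the equivalent form of eq. (4.35)."""
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return float(np.exp(-(d @ d) / (2.0 * sigma ** 2)))
```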
The use of kernel functions, combined with the ability to precompute all the combinations $k(\mathbf{x}_i, \mathbf{x}_j)$ (the Gram matrix), allows a common interface between linear and nonlinear training to be established, effectively maintaining the same level of performance.
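A sketch of this precomputation (the name `gram_matrix` is illustrative); the symmetry $k(\mathbf{x}_i, \mathbf{x}_j) = k(\mathbf{x}_j, \mathbf{x}_i)$ halves the evaluations:

```python
import numpy as np

def gram_matrix(X, kernel):
    """Precompute K[i, j] = k(x_i, x_j) for every pair of training
    samples in X, exploiting the symmetry of the kernel."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = kernel(X[i], X[j])
    return K
```

With `K` precomputed, a kernelized trainer can read `K[i, j]` exactly where a linear one would compute the dot product $\mathbf{x}_i^{\top} \mathbf{x}_j$, which is the common interface mentioned above.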
It is noteworthy that the prediction $\hat{y}(\mathbf{x})$ takes the form

$$\hat{y}(\mathbf{x}) = \operatorname{sign}\left( \sum_{i \in \mathcal{S}} \alpha_i y_i \, k(\mathbf{x}_i, \mathbf{x}) + b \right) \tag{4.36}$$
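Finally, an illustrative sketch of the prediction (4.36), self-contained and using the same naming conventions as above:

```python
def predict(x, sv, alpha, y, b, kernel):
    """Eq. (4.36): the predicted class is the sign of the kernel
    expansion over the support vectors plus the bias."""
    f = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, sv)) + b
    return 1 if f >= 0 else -1
```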