SVM (Support Vector Machine, also called Maximum Margin Classifier) is an algorithm that takes labeled data as input and outputs a line/hyperplane that separates the classes, if possible.
Suppose that we need to separate two classes of a dataset. The task is to find a line that separates them. However, there are infinitely many lines that can do that. How can we choose the best one? SVM picks the line with the largest margin, i.e. the largest distance to the closest points of each class.
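For reference, the standard hard-margin formulation makes "the best line" precise. The notation here is an assumption introduced for illustration and is not taken from the text above: labels $$y_i \in \{-1, +1\}$$ and a separating hyperplane $$\mathbf{w}^T\mathbf{x} + b = 0$$. Under the usual scaling, the margin width is $$2 / \|\mathbf{w}\|$$, so maximizing the margin amounts to:

$$
\min_{\mathbf{w},\, b} \; \frac{1}{2}\|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^T\mathbf{x}_i + b\right) \ge 1, \quad i = 1, \dots, n.
$$

The training points for which the constraint holds with equality are the support vectors.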
Most of the time, the classes in a dataset cannot be separated by a line (the data is not linearly separable). In that case, we first apply the kernel trick (map the data from its current dimension to a higher one) and then use SVM there.
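A minimal sketch of that idea, assuming a synthetic two-ring dataset from `make_circles` and a hand-picked lift (adding the squared distance from the origin as a third feature). Both choices are illustrative only; a real kernel does this mapping implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_2d = SVC(kernel="linear").fit(X, y)

# Hand-made lift to 3-D: append the squared distance from the origin.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
linear_3d = SVC(kernel="linear").fit(X_lifted, y)

acc_2d = linear_2d.score(X, y)
acc_3d = linear_3d.score(X_lifted, y)
print(f"linear SVM in 2-D:              {acc_2d:.2f}")
print(f"linear SVM after lifting to 3-D: {acc_3d:.2f}")
```

In the lifted space the inner ring sits below the outer ring along the new axis, so a plane can separate them.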
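A kernel is a dot product in some feature space: $$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)^T \Phi(\mathbf{x}_j)$$.
It also measures the similarity between two points $$\mathbf{x}_i$$ and $$\mathbf{x}_j$$.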
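To make "dot product in some feature space" concrete, here is a small check for one specific case: the homogeneous quadratic kernel $$(\mathbf{u}^T\mathbf{v})^2$$ in 2-D. The helper names `phi` and `quadratic_kernel` are made up for this sketch.

```python
import numpy as np

def phi(v):
    """Explicit feature map of the homogeneous quadratic kernel in 2-D."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def quadratic_kernel(u, v):
    """K(u, v) = (u . v)^2, computed without leaving the input space."""
    return np.dot(u, v) ** 2

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, -1.0])

k_direct = quadratic_kernel(x_i, x_j)          # works in 2-D
k_feature_space = np.dot(phi(x_i), phi(x_j))   # works in 3-D
print(k_direct, k_feature_space)
```

Both numbers agree, which is the point of the kernel trick: the kernel gives the feature-space dot product without ever building `phi(x)`.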
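Some popular kernels:
- Linear kernel: $$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$$. We use `kernel='linear'` in `sklearn.svm.SVC`. Linear kernels are rarely used in practice.
- Gaussian kernel (or Radial Basis Function, RBF): $$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$$. It's the most widely used. We use `kernel='rbf'` (default) with keyword `gamma` for $$\gamma$$ (must be greater than 0) in `sklearn.svm.SVC`.
- Exponential kernel: $$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|)$$.
- Polynomial kernel: $$K(\mathbf{x}_i, \mathbf{x}_j) = (r + \gamma\, \mathbf{x}_i^T \mathbf{x}_j)^d$$. We use `kernel='poly'` with keyword `degree` for $$d$$ and `coef0` for $$r$$ in `sklearn.svm.SVC`. It's more popular than RBF in NLP. The most common degree is $$d = 2$$ (quadratic), since larger degrees tend to overfit on NLP problems. (ref)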
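- Hybrid kernel: $$K(\mathbf{x}_i, \mathbf{x}_j) = (p + \mathbf{x}_i^T \mathbf{x}_j)^q \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2)$$.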
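- Sigmoidal: $$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\gamma\, \mathbf{x}_i^T \mathbf{x}_j + r)$$. We use `kernel='sigmoid'` with keyword `coef0` for $$r$$ in `sklearn.svm.SVC`.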
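As a sanity check on the scikit-learn kernels, the sketch below recomputes the RBF, polynomial and sigmoid Gram matrices by hand, using the formulas scikit-learn documents ($$\exp(-\gamma\|\mathbf{a}-\mathbf{b}\|^2)$$, $$(\gamma\,\mathbf{a}^T\mathbf{b} + r)^d$$ and $$\tanh(\gamma\,\mathbf{a}^T\mathbf{b} + r)$$), and compares them with `sklearn.metrics.pairwise`. The random data and the values of `gamma`, `r`, `d` are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # 4 points in 3-D
B = rng.normal(size=(5, 3))   # 5 points in 3-D
gamma, r, d = 0.5, 1.0, 2

# Pairwise squared Euclidean distances, shape (4, 5).
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

rbf_manual = np.exp(-gamma * sq_dists)
poly_manual = (gamma * (A @ B.T) + r) ** d
sigmoid_manual = np.tanh(gamma * (A @ B.T) + r)

rbf_sklearn = rbf_kernel(A, B, gamma=gamma)
poly_sklearn = polynomial_kernel(A, B, degree=d, gamma=gamma, coef0=r)
sigmoid_sklearn = sigmoid_kernel(A, B, gamma=gamma, coef0=r)

print(np.allclose(rbf_manual, rbf_sklearn))
print(np.allclose(poly_manual, poly_sklearn))
print(np.allclose(sigmoid_manual, sigmoid_sklearn))
```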
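We can also define a custom kernel, either by passing a Python callable as `kernel` or by using `kernel='precomputed'` with a Gram matrix (see the scikit-learn documentation on custom kernels).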
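A minimal sketch of a custom kernel, assuming we want the exponential kernel $$\exp(-\gamma\|\mathbf{a}-\mathbf{b}\|)$$, which is not built into `SVC`. The function name `exponential_kernel`, the toy dataset and `gamma=1.0` are choices made for this example. When `kernel` is a callable, scikit-learn calls it with two arrays and expects a matrix of shape `(n_samples_A, n_samples_B)`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

def exponential_kernel(A, B, gamma=1.0):
    """Gram matrix of exp(-gamma * ||a - b||) between the rows of A and B."""
    dists = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    return np.exp(-gamma * dists)

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Pass the function itself; SVC calls it during fit and predict.
svc = SVC(kernel=exponential_kernel).fit(X, y)
train_acc = svc.score(X, y)
print(f"training accuracy with the custom kernel: {train_acc:.2f}")
```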
As Andrew Ng says in his ML course: choose whichever kernel performs best on cross-validation data.
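One way to follow that advice is sketched below, assuming a toy `make_moons` dataset and 5-fold cross-validation with default hyperparameters. In practice `C`, `gamma` and `degree` would be tuned together with the kernel, for example with `GridSearchCV`.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Mean 5-fold cross-validation accuracy for each built-in kernel.
cv_scores = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    cv_scores[kernel] = scores.mean()

best_kernel = max(cv_scores, key=cv_scores.get)
for kernel, score in cv_scores.items():
    print(f"{kernel:>8}: {score:.3f}")
print("best kernel on this data:", best_kernel)
```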
Advantages:
- Compared to both logistic regression and NN, an SVM sometimes gives a cleaner way of learning non-linear functions.
- SVM is better than a 1-layer NN (Perceptron Learning Algorithm) thanks to the largest margin between the 2 classes.
- Accurate in high-dimensional spaces and memory efficient.
- Good accuracy and faster prediction compared to the Naïve Bayes algorithm. (ref)

Disadvantages:
- Prone to overfitting if the number of features is larger than the number of samples.
- Doesn't provide probability estimates directly.
- Not efficient if your data is very big!
- Works poorly with overlapping classes.
- Sensitive to the type of kernel used.
Some applications: (ref)
- Classification, regression and outliers detection.
- Face detection.
- Text and hypertext categorization.
- Detecting spam.
- Classification of images.
```python
from sklearn.svm import SVC

# X: feature matrix of shape (n_samples, n_features), y: class labels
svc = SVC(kernel='linear')  # default = 'rbf' (Gaussian kernel)
# other kernels: poly, sigmoid, precomputed or a callable

svc = svc.fit(X, y)
svc.predict(X)

# gives the support vectors
svc.support_vectors_
```
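For a linear kernel, the fitted model also exposes the hyperplane itself. The sketch below uses a tiny hand-made, linearly separable dataset (an assumption for illustration) and a large `C` to approximate a hard margin. It reads the weights `w` and intercept `b` from `coef_` and `intercept_`, and computes the margin width as $$2 / \|\mathbf{w}\|$$.

```python
import numpy as np
from sklearn.svm import SVC

# Six points, two well-separated classes.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no room for margin violations.
svc = SVC(kernel="linear", C=1000).fit(X, y)

w = svc.coef_[0]          # normal vector of the hyperplane
b = svc.intercept_[0]     # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)

# Support vectors lie on the margin, where |w . x + b| is (close to) 1.
sv_scores = svc.decision_function(svc.support_vectors_)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", svc.support_vectors_)
```

`coef_` is only available with `kernel='linear'`; with other kernels the hyperplane lives in the implicit feature space.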
There are other parameters of `sklearn.svm.SVC`.
In the case of linear SVM, we can also use `sklearn.svm.LinearSVC`. It's similar to `sklearn.svm.SVC` with `kernel='linear'` but implemented in terms of `liblinear` rather than `libsvm`, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. (ref)
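A small sketch of that extra flexibility, assuming synthetic data from `make_blobs` and arbitrary hyperparameters: `LinearSVC` accepts an L1 penalty, which `SVC(kernel='linear')` does not offer, and the L1 penalty tends to push some weights to exactly zero.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=2000, centers=2, n_features=20, random_state=0)

# Kernelized implementation (libsvm) with a linear kernel.
svc_linear = SVC(kernel="linear", C=1.0).fit(X, y)

# liblinear implementation with an L1 penalty; this combination needs dual=False.
l1_svc = LinearSVC(
    penalty="l1", loss="squared_hinge", dual=False, C=0.1, max_iter=10000
).fit(X, y)

acc_svc = svc_linear.score(X, y)
acc_l1 = l1_svc.score(X, y)
n_zero_weights = int((l1_svc.coef_ == 0).sum())

print(f"SVC(kernel='linear') accuracy: {acc_svc:.3f}")
print(f"LinearSVC(penalty='l1') accuracy: {acc_l1:.3f}")
print("weights set exactly to zero by L1:", n_zero_weights)
```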
The regularization parameter (`C`, default `C=1.0`): if `C` is larger, the hyperplane has a smaller margin but does a better job of classifying the training points, and vice versa. This is how you control the trade-off between a wide margin and the misclassification term.
- Higher values of `C` → a higher possibility of overfitting; as `C` grows, the soft-margin SVM approaches the hard-margin SVM.
- Lower values of `C` → a higher possibility of underfitting. We admit misclassifications in the training data.
We use this when the data is not linearly separable; it's also called soft-margin linear SVM.
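The effect of `C` on the margin can be observed directly with a linear kernel. The sketch below assumes two overlapping blobs from `make_blobs` and three arbitrary values of `C`. For each one it records the margin width $$2 / \|\mathbf{w}\|$$ and the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes, so some training points must violate the margin.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in (0.01, 1.0, 100.0):
    svc = SVC(kernel="linear", C=C).fit(X, y)
    results[C] = {
        "margin_width": 2 / np.linalg.norm(svc.coef_[0]),
        "n_support": int(svc.n_support_.sum()),
        "train_acc": svc.score(X, y),
    }

for C, info in results.items():
    print(C, info)
```

A smaller `C` gives a wider margin, and typically more training points end up inside it as support vectors.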
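Gamma (`gamma`; `'auto'` means `1/n_features`, while the current default `'scale'` means `1/(n_features * X.var())`): determines how many points contribute to constructing the hyperplane. With a low `gamma`, far-away points also influence the boundary; with a high `gamma`, only nearby points do.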
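A rough illustration of `gamma` with the RBF kernel, assuming a noisy `make_moons` dataset, a random train/test split and three arbitrary values of `gamma`. A very large `gamma` lets each training point shape the boundary almost on its own, which usually shows up as a high training score that does not carry over to the test set.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Map gamma -> (training accuracy, test accuracy).
gamma_scores = {}
for gamma in (0.01, 1.0, 1000.0):
    svc = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    gamma_scores[gamma] = (svc.score(X_train, y_train), svc.score(X_test, y_test))

for gamma, (train_acc, test_acc) in gamma_scores.items():
    print(f"gamma={gamma:>7}: train={train_acc:.2f}, test={test_acc:.2f}")
```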
- Scikit-learn -- SVM official doc.
- Simplilearn -- How Support Vector Machine Works \| SVM In Machine Learning.
- Tiep Vu -- Bài 19: Support Vector Machine.
- Jeremy Kun -- Formulating the Support Vector Machine Optimization Problem.
- Tiep Vu -- Bài 20: Soft Margin Support Vector Machine.
- Tiep Vu -- Bài 21: Kernel Support Vector Machine.
- Alexander Statnikov, Douglas Hardin, Isabelle Guyon, Constantin F. Aliferis -- A Gentle Introduction to Support Vector Machines in Biomedicine.
- Jake VanderPlas -- In-Depth: Support Vector Machines. -- Example: How to code and illustrate hyperplane and support vectors in Python?
- Chris Albon -- Notes about Support Vector Machines.