# Support Vector Machine (SVM)

Anh-Thi Dinh

## What's the idea of SVM?

SVM (also called Maximum Margin Classifier) is an algorithm that takes labeled data as input and outputs a line/hyperplane that separates the classes, if possible.
Suppose that we need to separate two classes of a dataset. The task is to find a line that separates them. However, there are infinitely many lines that can do that. How can we choose the best one? SVM picks the line with the largest margin, i.e. the largest distance to the closest points of each class (the support vectors).
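The name Maximum Margin Classifier hints at the usual answer: prefer the separating line that is farthest from the closest points of both classes. The sketch below is only an illustration on a made-up toy dataset. It assumes a very large `C` is a fair stand-in for a hard-margin SVM and uses the standard fact that the margin width is $$2 / \|w\|$$.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating line
b = clf.intercept_[0]   # offset of the separating line
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)
```

Only the points closest to the boundary end up in `support_vectors_`; moving any other point slightly should leave the line unchanged.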

## Using SVM with kernel trick

Most of the time, the classes in a dataset are not linearly separable. In that case, we first apply the kernel trick (implicitly mapping the data from its current dimension to a higher one) and then apply SVM.
A kernel is a dot product in some feature space: $$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$.
It also measures the similarity between two points $$x_i$$ and $$x_j$$.
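To make the "dot product in some feature space" idea concrete, here is a small sketch for one specific case: the homogeneous quadratic kernel $$(x \cdot z)^2$$ in 2D, whose explicit feature map is $$\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$$. The function names `phi` and `quadratic_kernel` are made up for this illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous quadratic kernel in 2D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without leaving the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = quadratic_kernel(x, z)      # kernel evaluated in the original 2D space
rhs = np.dot(phi(x), phi(z))      # dot product in the 3D feature space
print(lhs, rhs)
```

Both values agree, which is the point of the kernel trick: the higher-dimensional dot product is obtained without ever building $$\Phi(x)$$.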
We have some popular kernels,
• Gaussian kernel (or Radial Basis Function -- RBF): $$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$$. It's used the most. We use kernel = 'rbf' (default) with keyword gamma for $$\gamma$$ (must be greater than 0) in sklearn.svm.SVC.
• Exponential kernel: $$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|)$$.
• Polynomial kernel: $$K(x_i, x_j) = (\gamma \, x_i \cdot x_j + r)^d$$. We use kernel = 'poly' with keyword degree for $$d$$ and coef0 for $$r$$ in sklearn.svm.SVC. It's more popular than RBF in NLP. The most common degree is $$d = 2$$ (quadratic), since larger degrees tend to overfit on NLP problems. (ref)
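• Hybrid kernel (a product of a polynomial and a Gaussian kernel): $$K(x_i, x_j) = (x_i \cdot x_j + r)^d \exp(-\gamma \|x_i - x_j\|^2)$$.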
• Sigmoidal: $$K(x_i, x_j) = \tanh(\gamma \, x_i \cdot x_j + r)$$. We use kernel = 'sigmoid' with keyword coef0 for $$r$$ in sklearn.svm.SVC.
We can also define a custom kernel: sklearn.svm.SVC accepts a callable (or kernel = 'precomputed' together with a Gram matrix) as its kernel.
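As a rough sketch of what a custom kernel can look like: `SVC` accepts a Python callable that takes two sample matrices and returns their Gram matrix. The function `my_linear_kernel` below is a hypothetical example (a plain dot product), chosen so the result can be compared against the built-in `'linear'` kernel. The dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def my_linear_kernel(A, B):
    """Hypothetical custom kernel: returns the Gram matrix of shape (len(A), len(B))."""
    return A @ B.T

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

custom = SVC(kernel=my_linear_kernel).fit(X, y)
builtin = SVC(kernel="linear").fit(X, y)

# Fraction of training points on which both models predict the same label.
agreement = np.mean(custom.predict(X) == builtin.predict(X))
print("agreement with built-in linear kernel:", agreement)
```

Since the callable computes the same quantity as the built-in linear kernel, the two models should agree on essentially all points.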
Choose whatever kernel performs best on cross-validation data, as Andrew Ng said in his ML course.
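One possible way to follow that advice, sketched on the Iris dataset (an arbitrary choice for illustration) with all other hyperparameters left at their defaults:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each built-in kernel.
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel)
    scores[kernel] = cross_val_score(clf, X, y, cv=5).mean()

best_kernel = max(scores, key=scores.get)
for kernel, score in scores.items():
    print(f"{kernel:8s} {score:.3f}")
print("best kernel:", best_kernel)
```

In practice the kernel is usually tuned together with `C` and `gamma` (for example with `GridSearchCV`), since the best kernel can change with those values.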

Advantages:
• Compared to both logistic regression and neural networks (NN), an SVM sometimes gives a cleaner way of learning non-linear functions.
• SVM is better than a 1-layer NN (Perceptron Learning Algorithm) because it finds the largest margin between the 2 classes.
• Accurate in high-dimensional spaces and memory efficient.
• Good accuracy and faster prediction compared to the Naïve Bayes algorithm. (ref)

Disadvantages:
• Prone to overfitting if the number of features is larger than the number of samples.
• Doesn't directly provide probability estimates.
• Not efficient if your data is very big!
• Works poorly with overlapping classes.
• Sensitive to the type of kernel used.
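On the point about probability estimates: what an SVM produces natively is a signed score (distance-like value) per sample, not a probability. In scikit-learn, as far as I know, `SVC(probability=True)` adds a calibration step on top of those scores, at extra training cost. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Native output: signed scores, whose sign gives the predicted class.
clf = SVC(kernel="rbf").fit(X, y)
scores = clf.decision_function(X[:3])

# Calibrated output: probabilities fitted on top of the scores.
clf_proba = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
proba = clf_proba.predict_proba(X[:3])

print("decision scores:", scores)
print("probabilities:\n", proba)
```

Because the probabilities come from a separate calibration fit, they may occasionally disagree with `predict` for samples close to the boundary.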

## What is SVM used for?

Some applications: (ref)
• Classification, regression and outlier detection.
• Face detection.
• Text and hypertext categorization.
• Detecting spam.
• Classification of images.
• Bioinformatics.

## Using SVM with Scikit-learn

```python
from sklearn.svm import SVC

# X: feature matrix of shape (n_samples, n_features), y: class labels
svc = SVC(kernel='linear')  # default = 'rbf' (Gaussian kernel)
# other kernels: poly, sigmoid, precomputed or a callable

svc = svc.fit(X, y)
svc.predict(X)

# gives the support vectors
svc.support_vectors_
```
There are other parameters of sklearn.svm.SVC.
⚠️ In the case of linear SVM, we can also use sklearn.svm.LinearSVC. It's similar to sklearn.svm.SVC with kernel='linear' but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples. (ref)
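A quick, non-authoritative comparison of the two linear variants on synthetic data. The sketch sets `loss="hinge"` on `LinearSVC` so that its objective is closer to the one used by `SVC` (the `LinearSVC` default is the squared hinge loss); the two solvers still differ in details such as how the intercept is regularized, so small disagreements are expected.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

svc = SVC(kernel="linear", C=1.0).fit(X, y)                      # libsvm
lin = LinearSVC(C=1.0, loss="hinge", max_iter=10000).fit(X, y)   # liblinear

# Fraction of training points on which both models predict the same label.
agreement = (svc.predict(X) == lin.predict(X)).mean()
print("prediction agreement:", agreement)
print("LinearSVC exposes support vectors:", hasattr(lin, "support_vectors_"))
```

One practical difference to keep in mind: `LinearSVC` does not expose `support_vectors_`, so use `SVC(kernel='linear')` when you need them.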

### Meaning of some parameters

The regularization parameter (C, default C=1.0): if C is larger, the hyperplane has a smaller margin but classifies the training data better, and vice versa. This is how you control the trade-off between a wide margin and the misclassification term.
• Higher values of C → a higher possibility of overfitting; as C grows very large, the soft-margin SVM approaches the hard-margin SVM.
• Lower values of C → a higher possibility of underfitting; we tolerate more misclassifications in the training data.
We use this in the case of data that is not linearly separable; it's also called soft-margin linear SVM.
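The effect of C can be observed directly. The sketch below fits a linear SVM on synthetic, overlapping data for three arbitrary values of C and reports the margin width $$2 / \|w\|$$ and the number of support vectors. The dataset parameters are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Two overlapping classes in 2D (synthetic, with some label noise).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=0.8, flip_y=0.05, random_state=0)

widths, n_sv = {}, {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    widths[C] = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width
    n_sv[C] = clf.support_vectors_.shape[0]          # points on or inside the margin
    print(f"C={C:<6} margin width={widths[C]:.3f} support vectors={n_sv[C]}")
```

A small C should give a wider margin that tolerates more points inside it, while a large C shrinks the margin to fit the training data more tightly.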
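Gamma (gamma, default gamma='scale', which uses 1/(n_features * X.var()); gamma='auto' uses 1/n_features): determines how far the influence of a single training point reaches. A low gamma means a far reach (smoother boundary), while a high gamma means only nearby points shape the boundary.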
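A small experiment can illustrate how gamma changes an RBF SVM. The sketch uses the synthetic two-moons dataset and three arbitrary gamma values; the exact numbers depend on the data and the random split.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy, non-linearly separable toy data.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for gamma in [0.01, 1.0, 1000.0]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    results[gamma] = (
        clf.score(X_train, y_train),   # training accuracy
        clf.score(X_test, y_test),     # held-out accuracy
        len(clf.support_),             # number of support vectors
    )
    print(f"gamma={gamma:<7} train={results[gamma][0]:.3f} "
          f"test={results[gamma][1]:.3f} n_sv={results[gamma][2]}")
```

A very large gamma typically pushes training accuracy towards 1 while held-out accuracy suffers (overfitting), and a very small gamma tends to behave almost like a linear model.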

## References

• Chris Albon -- Notes about Support Vector Machines.