Sometimes we need to "compress" our data to speed up algorithms or to visualize it. One way is dimensionality reduction: the process of reducing the number of random variables under consideration by obtaining a set of principal variables. There are two main approaches:
- Feature selection: find a subset of the input variables.
- Feature projection (also called feature extraction): transform the data from the high-dimensional space to a space of fewer dimensions. PCA is one of the methods following this approach (a small sketch contrasting the two approaches follows this list).
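To make the two approaches concrete, here is a minimal sketch (the toy data and the chosen column indices are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)  # toy data: 100 samples, 10 features

# Feature selection: keep a subset of the original columns (here columns 0, 3, 7)
X_selected = X[:, [0, 3, 7]]  # still the original features, just fewer of them

# Feature projection: build 3 new features as combinations of all 10 original ones
X_projected = PCA(n_components=3).fit_transform(X)  # new axes, not original columns

print(X_selected.shape, X_projected.shape)  # (100, 3) (100, 3)
```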
Question: how can we choose the green arrows in Figures 1 and 2 (their directions and their magnitudes)?
From a set of data points, there are many possible projections, for example:
Intuitively, the green line is better because the projected points are more spread out (better separated). But how can we choose it "mathematically" (precisely)? We need to know about:
- Mean: the most "balanced" point of the data.
- Variance: measures the spread of the data around the mean. However, variance alone is not enough: many different configurations of points can produce the same variance.
- Covariance: indicates the direction in which the data are spreading (see the short NumPy illustration after this list).
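As a quick illustration of these three quantities (the toy 2-D points below are made up, not from the figures above):

```python
import numpy as np

# Toy 2-D dataset: 5 points, 2 features each
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

mean = X.mean(axis=0)          # the "balance point" of the data
var = X.var(axis=0, ddof=1)    # spread of each feature around its mean
cov = np.cov(X, rowvar=False)  # 2x2 matrix; off-diagonal terms show how the features co-vary

print(mean)  # [2.04 2.24]
print(var)   # per-feature variances (the diagonal of cov)
print(cov)   # the covariance matrix used by PCA
```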
- Subtract the mean to move the data to the origin.
- From the original data (with a large number of features, $n$), construct the covariance matrix $\Sigma$.
- Find the eigenvalues and corresponding eigenvectors of that matrix (we call them "eigenstuffs"). Choose the $k$ couples $(\lambda_i, u_i)$ with the highest eigenvalues and stack their eigenvectors into a reduced matrix $U_k$.
- Project the original data points onto the $k$-dimensional plane spanned by these eigenvectors. This step creates new data points in a new $k$-dimensional space ($k < n$).
- Now, instead of solving the original problem with $n$ features, we only need to solve a new problem with $k$ features ($k < n$), as sketched in the NumPy example after this list.
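The steps above can be sketched with plain NumPy (a hand-rolled version for illustration only; in practice you would use `sklearn.decomposition.PCA` as below):

```python
import numpy as np

def pca_manual(X, k):
    """Reduce X of shape (n_samples, n) to (n_samples, k) via the covariance matrix."""
    # 1. Subtract the mean to move the data to the origin
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix (n x n) of the centered data
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigenvalues and eigenvectors ("eigenstuffs"); eigh handles symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Keep the k eigenvectors with the largest eigenvalues -> reduced matrix U_k (n x k)
    order = np.argsort(eigvals)[::-1][:k]
    U_k = eigvecs[:, order]
    # 5. Project the centered data onto the k-dimensional subspace
    return X_centered @ U_k

X = np.random.rand(100, 10)  # toy data with n = 10 features
Z = pca_manual(X, k=2)
print(Z.shape)  # (100, 2)
```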
```python
import numpy as np
from sklearn.decomposition import PCA

s = np.array([...])  # input data of shape (n_samples, n_features)
pca = PCA(n_components=150, whiten=True, random_state=42)
# pca.fit(s)               # fit only
s1 = pca.fit_transform(s)  # fit and reduce s to shape (n_samples, 150)

print(pca.components_)          # eigenvectors
print(pca.explained_variance_)  # eigenvalues
```
Some notable attributes and methods (see the official doc for the full list; a short usage sketch follows):
- `pca.fit(X)`: only fit `X` (and then we can use `pca` for other operations).
- `pca.fit_transform(X)`: fit the model with `X` and apply the dimensionality reduction to `X` (from `(n_samples, n_features)` to `(n_samples, n_components)`).
- `pca.inverse_transform(s1)`: transform `s1` back to the original data space; note that the result is not exactly `s`!
- `pca.mean_`: mean point of the data.
- `pca.components_`: eigenvectors (`n_components` vectors).
- `pca.explained_variance_`: eigenvalues; also the amount of variance retained by each component.
- `pca.explained_variance_ratio_`: the percentage of variance retained by each component.
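A short sketch tying these attributes together (toy data and shapes chosen here just for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)        # toy data: 200 samples, 50 features
pca = PCA(n_components=10).fit(X)

Z = pca.transform(X)               # (200, 50) -> (200, 10)
X_back = pca.inverse_transform(Z)  # (200, 10) -> (200, 50), a lossy reconstruction

print(pca.mean_.shape)                      # (50,)    mean point of the data
print(pca.components_.shape)                # (10, 50) one eigenvector per row
print(pca.explained_variance_.shape)        # (10,)    eigenvalues
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept by the 10 components
print(np.allclose(X, X_back))               # False: X_back is not exactly X
```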
Some notable parameters:
- `n_components=0.80`: keep the smallest number of components (eigenvectors) such that 80% of the variance in the dataset is retained (see the quick check below).
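For instance (toy data; the printed number of components depends on the data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)
pca = PCA(n_components=0.80).fit(X)         # keep enough components for >= 80% variance
print(pca.n_components_)                    # how many components were actually kept
print(pca.explained_variance_ratio_.sum())  # >= 0.80
```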
Remark!
When choosing the number of principal components $k$, we choose $k$ to be the smallest value such that, for example, 95% of the variance is retained. (ref)
In Scikit-learn, we can use `pca.explained_variance_ratio_.cumsum()`. For example, with `n_components = 5` we may get

```
[0.32047581 0.59549787 0.80178824 0.932976 1.]
```

so we know that with $k = 3$ components, we would retain about 80% of the variance.
```python
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1  # smallest number of components retaining at least 95% of the variance
```
Alternatively, let Scikit-learn pick the number of components directly:

```python
pca = PCA(n_components=0.95)  # the ratio of variance you wish to preserve
```
Whitening makes the features:
- less correlated with each other,
- all of the same (unit) variance, i.e. unit component-wise variances.
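A quick numeric check of this effect (toy data; the variances are computed on the transformed training data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 10)
Z = PCA(n_components=3, whiten=True).fit_transform(X)

print(np.round(np.cov(Z, rowvar=False), 2))  # ~ identity matrix: decorrelated, unit variances
print(Z.std(axis=0, ddof=1))                 # each component's std is close to 1
```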
- Luis Serrano -- [Video] Principal Component Analysis (PCA). It's very intuitive!
- Stats.StackExchange -- Making sense of principal component analysis, eigenvectors & eigenvalues.
- Scikit-learn -- PCA official doc.
- Jake VanderPlas -- In Depth: Principal Component Analysis.
- Tutorial 4 Yang -- Principal Components Analysis.
- Andrew Ng -- My raw note of the course "Machine Learning" on Coursera.
- Shankar Muthuswamy -- Facial Image Compression and Reconstruction with PCA.
- UFLDL - Stanford -- PCA Whitening.