The key idea is that for each point of a cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
DBSCAN: density-based spatial clustering of applications with noise.
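The key idea can be sketched in a few lines of plain Python. This is only an illustration of the neighborhood test, not the full clustering algorithm. The helper name `is_core_point` and the toy points are assumptions made for this example, and counting the point itself as one of its own neighbors follows the scikit-learn convention.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def is_core_point(point, points, eps, min_samples):
    """Hypothetical helper: True if at least `min_samples` points
    (the point itself included) lie within distance `eps` of `point`."""
    neighbors = [q for q in points if dist(point, q) <= eps]
    return len(neighbors) >= min_samples


# Toy data (assumed for illustration): three nearby points and one far away.
points = [(1.0, 2.0), (2.0, 2.0), (2.0, 3.0), (25.0, 80.0)]

# One flag per point: does its eps-neighborhood hold enough points?
core_flags = [is_core_point(p, points, eps=3.0, min_samples=2) for p in points]
```

The three nearby points pass the test, while the isolated point does not, which is why DBSCAN would treat it as noise.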
- We do not know the number of clusters in advance (unlike KMeans, which requires it).
- There are outliers or noise in the data.
- Clusters have arbitrary shapes.
DBSCAN with Scikit-learn
from sklearn.cluster import DBSCAN
clr = DBSCAN(eps=3, min_samples=2)
clr.fit(X)
# or clr.fit_predict(X)
min_samples: minimum number of samples (the point itself included) within a point's eps-neighborhood for that neighborhood to be called “dense”.
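eps: maximum distance between 2 samples for one to be considered in the neighborhood of the other. It is not a bound on the distance between any two points of the same cluster. Its value is expressed in the same units as the data.
Higher min_samples or lower eps indicates a higher density necessary to form a cluster.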
clr.labels_: cluster label of each sample; noisy samples are labeled -1.
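A small end-to-end run may make the parameters and the labels easier to read. The toy array `X` below is an assumption chosen for illustration (two tight groups plus one distant point), and the second model with a much smaller `eps` is there only to show how a stricter density requirement changes the result.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (assumed for illustration): two tight groups and one far-away point.
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

clr = DBSCAN(eps=3, min_samples=2).fit(X)
labels = clr.labels_  # one label per sample; -1 marks noise

# Count clusters (ignoring the noise label) and noisy samples.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Same data with a much smaller eps: no two points are within 0.5 of each
# other, so no neighborhood is dense enough and every sample becomes noise.
strict_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
```

With `eps=3` the two groups are recovered and the distant point is labeled -1. With `eps=0.5` nothing is dense enough to form a cluster, so it is usually worth checking the scale of the data before picking `eps`.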
from hdbscan import HDBSCAN