## What?

The key idea is that for each point of a cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
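The density test above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper name `is_core_point` and the toy points are assumptions, and the count includes the point itself (the convention scikit-learn uses for `min_samples`).

```python
from math import dist

def is_core_point(point, points, eps, min_samples):
    """Return True if at least `min_samples` points (the point itself
    included) lie within distance `eps` of `point`."""
    neighbors = [q for q in points if dist(point, q) <= eps]
    return len(neighbors) >= min_samples

# Toy data (assumed): three points close together and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]

dense = is_core_point(points[0], points, eps=1.5, min_samples=3)
isolated = is_core_point(points[3], points, eps=1.5, min_samples=3)
print(dense, isolated)  # True False
```

Points that pass this test seed clusters; a point that fails it and is not within `eps` of any point that passes ends up as noise.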

### DBSCAN

Density-based spatial clustering of applications with noise.

### HDBSCAN

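Hierarchical DBSCAN. It extends DBSCAN by building a hierarchy of clusterings over varying density levels, so a single `eps` does not have to be chosen.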

## When?

- We don't know the number of clusters in advance (KMeans requires it up front).
- The data contains outliers or noise.
- Clusters have arbitrary shapes.

## In Code

### DBSCAN with Scikit-learn

```
from sklearn.cluster import DBSCAN
clr = DBSCAN(eps=3, min_samples=2)
```

```
clr.fit(X)
clr.labels_  # DBSCAN has no predict(); the labels of X are stored here
```

```
# or
clr.fit_predict(X)
```

Parameters (others):

- `min_samples`: min number of samples in a point's neighborhood (the point itself included) for that point to be called “dense”, i.e. a core point.
- `eps`: max distance between 2 samples for them to count as neighbors. Its value depends on the unit and scale of the data.
- Higher `min_samples` + lower `eps` means a higher density is necessary to form a cluster.
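To see how the two parameters interact, here is a small sketch on a hand-made toy dataset. The array `X` and the helper `count_clusters` are illustrative assumptions, not part of any library. Raising `min_samples` or shrinking `eps` makes the density requirement stricter, so fewer groups qualify as clusters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (assumed for illustration): two small groups and one far-away point.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

def count_clusters(eps, min_samples):
    """Fit DBSCAN and return the number of clusters, ignoring noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return len(set(labels) - {-1})

loose = count_clusters(eps=3, min_samples=2)       # both groups are dense enough
strict = count_clusters(eps=3, min_samples=3)      # the 2-point group becomes noise
tiny_eps = count_clusters(eps=0.5, min_samples=2)  # no point has a neighbor
print(loose, strict, tiny_eps)  # 2 1 0
```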

Components:

- `clr.labels_`: cluster label of each sample; noisy samples get the label `-1`.
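A short sketch of reading the fitted attributes, using a toy array assumed here for illustration. `labels_` and `core_sample_indices_` are attributes of scikit-learn's `DBSCAN`; the other variable names are only examples.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (assumed): two small groups and one far-away point.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
clr = DBSCAN(eps=3, min_samples=2).fit(X)

labels = clr.labels_                  # one label per sample, -1 marks noise
n_clusters = len(set(labels) - {-1})  # number of clusters, noise excluded
noise_points = X[labels == -1]        # samples that belong to no cluster
core_idx = clr.core_sample_indices_   # indices of the core samples
print(labels, n_clusters, noise_points, core_idx)
```

Here the first three points form cluster `0`, the next two form cluster `1`, and `[25, 80]` is labeled `-1`.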

### HDBSCAN

```
from hdbscan import HDBSCAN
```