K-Means is the most popular clustering method any learner should know. In this note, we will understand the idea of KMeans and how to use it with Scikit-learn. Besides that, we also learn about its variants (K-medois, K-modes, K-medians).

- Randomly choose centroids ().

- Go through each example and assign them to the nearest centroid (assign class of that centroid).

- Move each centroid (of each class) to the average of data points having the same class with the centroid.

- Repeat 2 and 3 until convergence.

Using "Elbow" method to choose the number of clusters .

- A type of
**Partitioning clustering**.

- K-means is sensitive to outliers →
**K-medoids**clustering or**PAM**(Partitioning Around Medoids) is less sensitive to outliers (ref)

```
1from sklearn.cluster import KMeans
2kmeans = KMeans(n_clusters=10, random_state=0) # default k=8
```

```
1kmeans.fit(X)
2kmeans.predict(X)
```

```
1# or
2kmeans.fit_predict(X)
```

Some notable parameters (see full):

`max_iter`

: Maximum number of iterations of the k-means algorithm for a single run.

`kmeans.labels_`

: show labels of each point.

`kmeans.cluster_centers_`

: cluster centroids.

**Advantages**: (ref)- It is simple to understand and easy to implement.
- K-Medoid Algorithm is fast and converges in a fixed number of steps.
- PAM is less sensitive to outliers than other partitioning algorithms.

**Disavdvantages**: (ref)- The main disadvantage of K-Medoid algorithms is that it is not suitable for clustering non-spherical (arbitrary shaped) groups of objects. This is because it relies on minimizing the distances between the non-medoid objects and the medoid (the cluster centre) – briefly, it uses compactness as clustering criteria instead of connectivity.
- It may obtain different results for different runs on the same dataset because the first k medoids are chosen randomly.

**Different from K-Means**:*K-Means*:- Final centers no need to be points in data.
- Measure generally requires Euclidean distance.
- Sensitive to outliers.
*K-Medoids*:- Final Centers is actual points in data. They're called
*medoids*or*exemplars*. - Measures can be arbitrarily dissimilar.
- Robust to outliers.

**The idea**:

- (Like KMeans'): choose randomly points (no need to be a point in data)

- (Like KMeans'): assign labels to points based on chosen points in step 1.

- (Different from KMeans):

- (Like KMeans'): repeat the steps.

It's better than Elbow method to choose the number of clusters $k$.

👉 Check section “Silhouette analysis” in Metrics for clustering.

**Luis Serrano**-- [Video] Clustering: K-means and Hierarchical.

**Andrew NG.**-- My raw note of the course "Machine Learning" on Coursera.