Implementing KMeans Clustering in Python

Created By: Debasis Das (23-Feb-2021)

In this post we will see how KMeans clustering works, first by using the KMeans class from sklearn and then by writing the KMeans clustering logic by hand, and we will compare the results of the two approaches.
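As a starting point, here is a minimal sketch of the sklearn side of that comparison. The synthetic data from make_blobs, the choice of 3 clusters, and the n_init and random_state values are assumptions made for illustration only; they are not taken from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: a small synthetic 2-D dataset with 3 blobs stands in for real data.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Fit KMeans with 3 clusters; n_init=10 reruns the algorithm from 10 different
# starting centroids and keeps the run with the lowest inertia.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

print("Cluster centroids:")
print(model.cluster_centers_)
print("Points per cluster:", np.bincount(labels))
```

fit_predict returns one cluster label per row of X, and cluster_centers_ holds the final centroids, which are the two things a manual implementation would be compared against.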

The manual approach code is a sample that demonstrates the iterative process. We start with random centroids and then repeat two steps for a given number of iterations: first, update the cluster label of each point based on its proximity to the centroids; second, update each centroid to the mean of all the points currently assigned to its cluster.
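The loop described above could be sketched roughly as follows. This is an illustrative version using NumPy, not the author's original code; the function names kmeans_manual and assign_labels, the choice of initial centroids as random data points, and the two-blob test data are all assumptions.

```python
import numpy as np

def assign_labels(X, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Euclidean distance from every point to every centroid: shape (n_points, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_manual(X, k, n_iters=10, seed=0):
    """Hypothetical helper: plain KMeans with a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Assumption: the random initial centroids are k distinct points from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = assign_labels(X, centroids)       # step 1: update labels
        for j in range(k):                         # step 2: update centroids
            members = X[labels == j]
            if len(members) > 0:                   # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment so the returned labels match the returned centroids.
    return assign_labels(X, centroids), centroids

# Assumption: two well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

labels, centroids = kmeans_manual(X, k=2)
print("Centroids:")
print(centroids)
```

A fixed iteration count keeps the sketch close to the description; a fuller version would usually also stop early once the labels no longer change.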

KMeans is a very simple algorithm to implement, but it has the downside of being very sensitive to noise: because every point must belong to some cluster, KMeans assigns noise points to clusters as well. Another known challenge is choosing the initial cluster centroids, since the initial selection affects how the clusters are formed.
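One way to see the noise sensitivity is to look at what a single far-away point does to a centroid, since a centroid is just the mean of its cluster's points. The tiny dataset below is made up purely for illustration.

```python
import numpy as np

# Assumption: four tightly grouped points forming one "clean" cluster.
cluster = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]])
# Assumption: one noise point that ends up assigned to the same cluster.
outlier = np.array([[20.0, 20.0]])

centroid_clean = cluster.mean(axis=0)
centroid_noisy = np.vstack([cluster, outlier]).mean(axis=0)

print("Centroid without noise:", centroid_clean)   # [1.5 1.5]
print("Centroid with noise:   ", centroid_noisy)   # [5.2 5.2]
```

A single noise point moves the centroid from (1.5, 1.5) to (5.2, 5.2), well outside the original group, which in turn can change which points are assigned to that cluster in the next iteration.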

Posted in Data Mining, Data Science, Python
