By Debasis Das: (17-Feb-2021)
How to manually calculate the SSE for a Clustering.
Clustering estimators such as KMeans expose an inertia_ attribute that gives the total sum of squared errors (SSE) for the clustering. DBSCAN has no inertia_ attribute, so in this sample code we are going to see how to derive the SSE for a clustering by hand.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

A = [[10, 10], [12, 12], [10, 11], [15, 17], [13.5, 15], [14.5, 19], [13.5, 14.5]]
df = pd.DataFrame(A)
number_clusters = 2

# fit_transform fits the PCA and projects the data in one step,
# so a separate pca.fit(df) call is not needed
pca = PCA(n_components=2)
x_pca = pca.fit_transform(df)

km = KMeans(
    n_clusters=number_clusters, init='random',
    n_init=10, max_iter=300,
    tol=1e-04, random_state=0
).fit(x_pca)

kmeans_labels = km.labels_
print("The KMeans Labels are = ", kmeans_labels)
print("The Kmeans SSE using inertia_ function =", km.inertia_)

# Manual SSE: for each cluster, sum the squared distances
# from its points to the cluster mean
manual_SSE = 0
for i in range(number_clusters):
    cluster = x_pca[km.labels_ == i]
    if len(cluster) > 0:
        clusterMean = cluster.mean(axis=0)
        manual_SSE += ((cluster - clusterMean) ** 2).sum()

print("The KMeans SSE using manual calculation = ", manual_SSE)

# DBSCAN does not have an inertia_ attribute, so when the SSE
# of a DBSCAN clustering is needed, the manual method can be used
Output for the given dataset:
The KMeans Labels are = [0 0 0 1 1 1 1]
The Kmeans SSE using inertia_ function = 19.04166666666666
The KMeans SSE using manual calculation = 19.041666666666664
As you can see, the manually calculated SSE matches the value of inertia_ up to floating-point rounding.
You can apply the same manual approach to DBSCAN: use the DBSCAN labels to group the points into clusters, then sum the squared distances from each point to its cluster mean.
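A minimal sketch of the manual approach applied to DBSCAN might look like the following. The helper name cluster_sse and the eps and min_samples values are illustrative assumptions, not part of the original sample. DBSCAN labels noise points as -1; this sketch skips them, which is one possible convention since noise points belong to no cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_sse(points, labels, noise_label=-1):
    """Sum of squared distances from each point to its cluster mean.

    cluster_sse is a hypothetical helper written for this example.
    Points carrying noise_label are ignored (an assumption of this sketch).
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for label in np.unique(labels):
        if label == noise_label:
            continue  # noise points are not part of any cluster
        cluster = points[labels == label]
        sse += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return sse


A = [[10, 10], [12, 12], [10, 11], [15, 17], [13.5, 15], [14.5, 19], [13.5, 14.5]]
X = np.array(A)

# eps and min_samples are illustrative values chosen for this small dataset
db = DBSCAN(eps=2.6, min_samples=2).fit(X)
dbscan_sse = cluster_sse(X, db.labels_)

print("The DBSCAN labels are =", db.labels_)
print("The DBSCAN SSE using manual calculation =", dbscan_sse)
```

Because the helper only needs the points and their labels, it can be used with any clustering that produces a labels_ array. Keep in mind that skipping noise points means SSE values are only comparable between runs that leave out the same points.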