SSE Calculation for a Clustering

By Debasis Das: (17-Feb-2021)

How to manually calculate the SSE (sum of squared errors) for a clustering.

A KMeans model in scikit-learn exposes an inertia_ attribute that gives the total SSE for the clustering. DBSCAN, however, has no inertia_ attribute, so in this sample code we are going to see how to derive the SSE for a clustering manually from the data and the cluster labels.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

A = [[10,10],
[12,12],
[10,11],
[15,17],
[13.5,15],
[14.5,19],
[13.5,14.5]]

df = pd.DataFrame(A)
number_clusters = 2
pca = PCA(n_components=2)
# fit_transform fits the PCA and projects the data in one step,
# so a separate pca.fit(df) call is not needed
x_pca = pca.fit_transform(df)
km = KMeans(
    n_clusters=number_clusters, init='random',
    n_init=10, max_iter=300, 
    tol=1e-04, random_state=0
).fit(x_pca)


kmeans_labels = km.labels_
print("The KMeans Labels are = ",kmeans_labels)
print("The Kmeans SSE using inertia_ function =", km.inertia_)

# Manual SSE: for each cluster, sum the squared distances
# of its points to the cluster mean
manual_SSE = 0
for i in range(number_clusters):
    cluster = x_pca[km.labels_ == i]
    if len(cluster) > 0:
        clusterMean = cluster.mean(axis=0)
        manual_SSE += ((cluster - clusterMean) ** 2).sum()

print("The KMeans SSE using manual calculation = ",manual_SSE)


# A clustering such as DBSCAN does not have an inertia_ attribute,
# so if you need the SSE for a DBSCAN clustering,
# you can use the manual method of SSE calculation shown above

Output for the given dataset:

The KMeans Labels are =  [0 0 0 1 1 1 1]
The Kmeans SSE using inertia_ function = 19.04166666666666
The KMeans SSE using manual calculation =  19.041666666666664

As you can see, the manually calculated SSE matches the value given by inertia_, up to floating-point rounding.

You can use the same manual approach with the DBSCAN labels to come up with the SSE for a DBSCAN clustering.
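As a minimal sketch of that idea, the loop above can be wrapped in a helper and applied to DBSCAN labels. The helper name sse_from_labels and the eps and min_samples values are assumptions chosen for illustration, not part of the original code. DBSCAN marks noise points with the label -1; this sketch leaves them out of the sum, which is a design choice rather than a standard definition.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def sse_from_labels(X, labels, noise_label=-1):
    """Sum of squared distances of each point to the mean of its cluster.

    Hypothetical helper: works with any array of cluster labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for label in np.unique(labels):
        if label == noise_label:
            continue  # assumption: noise points are excluded from the SSE
        cluster = X[labels == label]
        total += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return total


# Same dataset as in the KMeans example
A = [[10, 10],
     [12, 12],
     [10, 11],
     [15, 17],
     [13.5, 15],
     [14.5, 19],
     [13.5, 14.5]]

# eps and min_samples are illustrative values picked for this small dataset
db = DBSCAN(eps=2.6, min_samples=2).fit(A)
dbscan_sse = sse_from_labels(A, db.labels_)

print("The DBSCAN Labels are = ", db.labels_)
print("The DBSCAN SSE using manual calculation = ", dbscan_sse)
```

If you would rather count noise points as errors, one option is to treat each of them separately or to report their count alongside the SSE, since they have no cluster mean to be measured against.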

Posted in Data Mining, Data Science
