How can I divide data into multiple clusters using agglomerative clustering in Python with NumPy and Matplotlib?
By Shahzad Anjum, 21-Dec-2024

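To divide data into multiple clusters using agglomerative clustering in Python, you can use NumPy to hold the data, SciPy's scipy.cluster.hierarchy module to build the cluster hierarchy, and Matplotlib to plot the results. Follow these steps: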

Step 1: Import the necessary libraries

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
```

Step 2: Prepare the data

Store the data points you want to cluster in a NumPy array, where each row represents a data point and each column represents a feature. The array therefore has shape (n_samples, n_features).
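If you don't have data at hand yet, here is a minimal sketch that generates a hypothetical two-cluster dataset with NumPy. The blob centres, spread, sample sizes and random seed are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical example data: two 2-D "blobs" of 20 points each.
# Centres, spread and seed are arbitrary illustration choices.
rng = np.random.default_rng(seed=0)
blob_a = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=[7.0, 7.0], scale=0.5, size=(20, 2))

# Stack the blobs vertically: one row per point, one column per feature.
data = np.vstack([blob_a, blob_b])
print(data.shape)  # (40, 2)
```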

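Step 3: Perform hierarchical clustering

Use the linkage function from scipy.cluster.hierarchy to perform agglomerative clustering on the data. linkage takes the data array and a linkage method, and returns a linkage matrix that records the full merge history. For n data points the matrix has n - 1 rows, one per merge; each row holds the indices of the two clusters that were merged, the distance between them, and the number of original data points in the newly formed cluster.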

```python
Z = linkage(data, 'ward')
```
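To see what the linkage matrix actually holds, you can print it row by row. This is a minimal sketch using the same six points as the complete example at the end of this answer.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Same six 2-D points as in the complete example.
data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])
Z = linkage(data, 'ward')

# n points need n - 1 merges; each row has 4 columns.
print(Z.shape)  # (5, 4)

for i, (a, b, dist, size) in enumerate(Z):
    # Indices below n are original points; index n + i is the cluster created in row i.
    print(f"merge {i}: {int(a)} + {int(b)} at distance {dist:.2f} -> {int(size)} points")
```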

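Passing 'ward' selects Ward's linkage criterion, which at each step merges the pair of clusters whose union gives the smallest increase in total within-cluster variance. Ward linkage is only well defined for Euclidean distances. You can also experiment with other criteria such as 'single' (distance between the closest pair of points), 'complete' (distance between the farthest pair) or 'average' (mean of all pairwise distances).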
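One quick way to experiment with the criteria is to run linkage with each of them on the same data and compare the results. The sketch below, again on the six-point example data, uses fcluster (also used in Step 5) with criterion='maxclust' to request two flat clusters per method, and prints the height of the final merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])

results = {}
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(data, method)
    # Ask for at most two flat clusters so the methods can be compared directly.
    labels = fcluster(Z, 2, criterion='maxclust')
    results[method] = labels
    # Z[-1, 2] is the height of the final merge, i.e. the top of the dendrogram.
    print(f"{method:>8}: final merge at {Z[-1, 2]:.2f}, labels {labels}")
```

On well-separated data like this, all four criteria agree; differences show up mainly with elongated, noisy or unevenly sized clusters.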

Step 4: Plot the dendrogram

Use the dendrogram function from scipy.cluster.hierarchy to visualize the result of the hierarchical clustering as a dendrogram. The height at which two branches join is the distance at which those clusters were merged.

```python
plt.figure(figsize=(10, 5))
dendrogram(Z)
plt.xlabel('Data Points')
plt.ylabel('Distance')
plt.title('Dendrogram')
plt.show()
```
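To help choose where to cut, you can draw a candidate cut height directly on the dendrogram. This sketch passes a Matplotlib Axes and color_threshold to dendrogram, then adds a dashed horizontal line with axhline. The value max_d = 4 is only an example for the six-point data.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])
Z = linkage(data, 'ward')

max_d = 4  # example cut height, chosen by looking at the dendrogram

fig, ax = plt.subplots(figsize=(10, 5))
# Subtrees that merge below color_threshold get their own colour;
# links above it are drawn in a single colour.
info = dendrogram(Z, ax=ax, color_threshold=max_d)
ax.axhline(y=max_d, color='grey', linestyle='--')  # where the tree would be cut
ax.set_xlabel('Data point index')
ax.set_ylabel('Distance')
ax.set_title('Dendrogram with cut height')
plt.show()

# Order in which the original points appear along the x-axis.
print(info['leaves'])
```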

Step 5: Cut the dendrogram to form clusters

Based on the dendrogram, decide how many clusters you want. Then use the fcluster function from scipy.cluster.hierarchy to cut the dendrogram at a specific distance threshold and form flat clusters. A good threshold usually lies inside a long vertical stretch of the dendrogram where no merges happen.

```python
from scipy.cluster.hierarchy import fcluster

max_d = 10  # adjust this threshold based on the dendrogram
clusters = fcluster(Z, max_d, criterion='distance')
```

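Here, max_d is the maximum distance threshold for forming clusters: with criterion='distance', points end up in the same flat cluster only if they were merged at a height of at most max_d. Lowering max_d produces more, smaller clusters; raising it produces fewer, larger ones.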
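The effect of the threshold is easy to check by sweeping a few values and counting the resulting clusters. If you would rather fix the number of clusters directly, fcluster also accepts criterion='maxclust', where the second argument is the maximum number of clusters. A minimal sketch on the six-point example data (its Ward merge heights are 1, 1, about 2.08, about 2.08 and about 12.25):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])
Z = linkage(data, 'ward')

# Option 1: cut at different heights and count the resulting flat clusters.
counts = {}
for max_d in [0.5, 1.5, 3, 15]:
    clusters = fcluster(Z, max_d, criterion='distance')
    counts[max_d] = len(np.unique(clusters))
print(counts)  # {0.5: 6, 1.5: 4, 3: 2, 15: 1}

# Option 2: ask for a fixed number of clusters instead of a distance.
labels = fcluster(Z, 2, criterion='maxclust')
print(len(np.unique(labels)))  # 2
```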

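Step 6: Visualize the clusters

Finally, use Matplotlib to plot the data points, with each color representing a different cluster. The scatter plot below uses only the first two features (columns 0 and 1), so it shows the full picture only for 2-D data.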

```python
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Agglomerative Clustering')
plt.show()
```
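It can also help to mark where each cluster sits. This minimal sketch computes the mean position of each flat cluster with NumPy and overlays it on the scatter plot; the red "x" markers and the legend label are illustrative choices, not part of the original recipe.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])
clusters = fcluster(linkage(data, 'ward'), 4, criterion='distance')

# Mean position of each flat cluster (fcluster labels start at 1).
labels = np.unique(clusters)
centroids = np.array([data[clusters == lab].mean(axis=0) for lab in labels])

fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1], c=clusters, cmap='viridis')
ax.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=100, c='red', label='Cluster mean')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('Agglomerative Clustering')
ax.legend()
plt.show()
```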

Putting it all together, here's a complete example:

```python
# Step 1: Import the necessary libraries
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

# Step 2: Prepare the data (one row per point, one column per feature)
data = np.array([[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]])

# Step 3: Perform hierarchical clustering
Z = linkage(data, 'ward')

# Step 4: Plot the dendrogram
plt.figure(figsize=(10, 5))
dendrogram(Z)
plt.xlabel('Data Points')
plt.ylabel('Distance')
plt.title('Dendrogram')
plt.show()

# Step 5: Cut the dendrogram to form clusters
# 4 lies between the within-group merges (~2.1) and the final merge (~12.2),
# so this cut yields 2 clusters for this data.
max_d = 4
clusters = fcluster(Z, max_d, criterion='distance')

# Step 6: Visualize the clusters
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Agglomerative Clustering')
plt.show()
```

You can adjust the data, the distance threshold, and the linkage method to suit your own dataset.

Copyright Future Minutes © 2015-2024 All Rights Reserved.
Updated on: Dec 20 2023 05:10 PM