Hierarchical clustering: structured vs unstructured ward
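This example builds a swiss roll dataset and runs hierarchical clustering on the positions of its points.
For more information, see Hierarchical clustering.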
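Both runs in this example use Ward linkage (`linkage='ward'` in the script). At each step, Ward merges the pair of clusters whose union increases the total within-cluster sum of squares the least. For clusters $a$ and $b$ with sizes $n_a, n_b$ and centroids $\mu_a, \mu_b$, that increase equals $\frac{n_a n_b}{n_a + n_b}\lVert \mu_a - \mu_b \rVert^2$. The following minimal NumPy sketch checks this identity on two small random point clouds. The arrays `A` and `B` and the helper `sse` are illustrative only and are not part of scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))           # hypothetical cluster a: 5 points in 3-D
B = rng.normal(loc=2.0, size=(8, 3))  # hypothetical cluster b: 8 points in 3-D

def sse(P):
    """Within-cluster sum of squared distances to the centroid."""
    return ((P - P.mean(axis=0)) ** 2).sum()

# Increase in total within-cluster sum of squares caused by merging a and b
delta = sse(np.vstack([A, B])) - sse(A) - sse(B)

# Closed form of the same quantity, which the Ward criterion minimizes
n_a, n_b = len(A), len(B)
closed_form = n_a * n_b / (n_a + n_b) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)

print(np.isclose(delta, closed_form))  # True
```

Because the merge cost depends only on cluster sizes and centroid distances, the unconstrained run can merge points that are close in 3-D space even when they lie on different folds of the roll.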
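In the first step, hierarchical clustering is performed without connectivity constraints on the structure and is based solely on distance. In the second step, the clustering is restricted to the k-nearest neighbors graph: this is a hierarchical clustering with a structure prior.
Some of the clusters learned without connectivity constraints do not respect the structure of the swiss roll and extend across different folds of the manifold. By contrast, with connectivity constraints, the clusters form a clean parcellation of the swiss roll.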
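The structure prior is a sparse adjacency matrix built with `sklearn.neighbors.kneighbors_graph`. The minimal sketch below inspects that matrix for the same data setup as the script. One assumption: it adds `random_state=0`, which the original script does not set, so that results are reproducible.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Same data setup as the example (random_state added here for reproducibility)
X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
X[:, 1] *= 0.5

# Sparse adjacency matrix: row i stores a 1 for each of the 10 nearest neighbors of point i
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False).tocsr()

# In CSR layout, consecutive indptr differences give the number of stored entries per row
neighbors_per_row = np.diff(connectivity.indptr)

print(connectivity.shape)                             # (1500, 1500)
print(neighbors_per_row.min(), neighbors_per_row.max())  # 10 10
print(connectivity.diagonal().sum())                  # 0.0 (no self-loops)
```

Passing this matrix as `connectivity=` to `AgglomerativeClustering` allows only clusters that are adjacent in the graph to merge. Points on different folds of the roll are rarely among each other's 10 nearest neighbors, so they are kept apart.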
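The plots show this difference visually. One possible numeric check, sketched below under the assumption that SciPy is available, counts how many connected pieces of the (symmetrized) 10-NN graph each cluster is split into. A cluster that jumps across folds tends to break into several pieces. The helper `pieces_per_cluster` is hypothetical and exists only for this illustration, and `random_state=0` is again an addition for reproducibility.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
X[:, 1] *= 0.5
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)

# Undirected version of the k-NN graph, used only for this check
graph = (connectivity + connectivity.T).tocsr()

def pieces_per_cluster(labels):
    """Hypothetical helper: connected components of the k-NN graph inside each cluster."""
    counts = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        sub = graph[idx][:, idx]  # subgraph induced by this cluster's points
        n_comp, _ = connected_components(sub, directed=False)
        counts.append(n_comp)
    return counts

unstructured = AgglomerativeClustering(n_clusters=6, linkage="ward").fit(X)
structured = AgglomerativeClustering(
    n_clusters=6, connectivity=connectivity, linkage="ward"
).fit(X)

print("without constraints:", pieces_per_cluster(unstructured.labels_))
print("with constraints:   ", pieces_per_cluster(structured.labels_))
```

When the k-NN graph itself is connected, the constrained run should report 1 for every cluster, because it only ever merges clusters that touch in the graph. The unconstrained run has no such guarantee, so values above 1 indicate clusters that span separate regions of the manifold.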


Compute unstructured hierarchical clustering...
Elapsed time: 0.07s
Number of points: 1500
Compute structured hierarchical clustering...
Elapsed time: 0.13s
Number of points: 1500
# Authors : Vincent Michel, 2010
#           Alexandre Gramfort, 2010
#           Gael Varoquaux, 2010
# License: BSD 3 clause
print(__doc__)
import time
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as p3
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
# #############################################################################
# Generate data (swiss roll dataset)
n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise)
# Make it thinner
X[:, 1] *= .5
# #############################################################################
# Compute clustering
print("Compute unstructured hierarchical clustering...")
st = time.time()
ward = AgglomerativeClustering(n_clusters=6, linkage='ward').fit(X)
elapsed_time = time.time() - st
label = ward.labels_
print("Elapsed time: %.2fs" % elapsed_time)
print("Number of points: %i" % label.size)
# #############################################################################
# Plot result
fig = plt.figure()
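# Create the 3-D axes through the figure (p3.Axes3D(fig) no longer attaches itself in recent Matplotlib)
ax = fig.add_subplot(111, projection='3d')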
ax.view_init(7, -80)
for l in np.unique(label):
    # np.float was removed in NumPy 1.24; the built-in float is equivalent here
    ax.scatter(X[label == l, 0], X[label == l, 1], X[label == l, 2],
               color=plt.cm.jet(float(l) / np.max(label + 1)),
               s=20, edgecolor='k')
plt.title('Without connectivity constraints (time %.2fs)' % elapsed_time)
# #############################################################################
# Define the structure A of the data: here, a 10-nearest-neighbors graph
from sklearn.neighbors import kneighbors_graph
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
# #############################################################################
# Compute clustering
print("Compute structured hierarchical clustering...")
st = time.time()
ward = AgglomerativeClustering(n_clusters=6, connectivity=connectivity,
                               linkage='ward').fit(X)
elapsed_time = time.time() - st
label = ward.labels_
print("Elapsed time: %.2fs" % elapsed_time)
print("Number of points: %i" % label.size)
# #############################################################################
# Plot result
fig = plt.figure()
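# Create the 3-D axes through the figure (p3.Axes3D(fig) no longer attaches itself in recent Matplotlib)
ax = fig.add_subplot(111, projection='3d')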
ax.view_init(7, -80)
for l in np.unique(label):
    ax.scatter(X[label == l, 0], X[label == l, 1], X[label == l, 2],
               color=plt.cm.jet(float(l) / np.max(label + 1)),
               s=20, edgecolor='k')
plt.title('With connectivity constraints (time %.2fs)' % elapsed_time)
plt.show()
Total running time of the script: (0 minutes 0.524 seconds)