sklearn.cluster.FeatureAgglomeration?

class sklearn.cluster.FeatureAgglomeration(n_clusters=2, *, affinity='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean>, distance_threshold=None)

[源碼]

聚集特征

類似于聚合聚類，但遞歸合并特征而不是樣本。

在用戶指南中閱讀更多內容。

參數	說明
n_clusters	int, default=2 要查找的聚類數目。如果`distance_threshold` 不是`None`, 它就必須是`None`。
affinity	str or callable, default=’euclidean’ 用于計算連接的度量。可以是“euclidean”, “l1”, “l2”, “manhattan”, “cosine”, 或者“precomputed”。如果linkage是“ward”，只有“euclidean”是被接受的。
memory	str or object with the joblib.Memory interface, default=None 用于儲存樹計算的輸出。默認情況下，不執行緩存。如果給出一個字符串，它就是緩存目錄的路徑。
connectivity	array-like or callable, default=None 連接矩陣。為每個樣本定義遵循給定數據結構的相鄰樣本。這可以是連接矩陣本身，也可以是可調用的，能將數據轉換為連接矩陣，例如從kneighbors_graph派生的連接矩陣。默認為None，分層聚類算法是一種非結構化的聚類算法。
compute_full_tree	‘auto’ or bool, default=’auto’ 提前停止n_clusters樹的構建。如果聚類的數量與樣本數相比并不少，這對于減少計算時間是非常有用的。此選項僅在指定連接矩陣時才有用。還請注意，當改變聚類數量并使用緩存時，計算所有樹可能更有利。如果`distance_threshold`是None, 它必須是True。通過默認的`compute_full_tree`設置是“auto”，當`distance_threshold`不是None的時候就等價于True,或者`n_clusters`的最大值在100~`0.02 * n_samples`之間。否則，“auto”等同False。
linkage	{“ward”, “complete”, “average”, “single”}, default=”ward” 使用哪種聯動標準。該算法將合并聚類，以最小化這一標準。 - Ward最小化會合并的聚類的差異。 - 平均使用兩組每次觀測的平均距離。 - 完全或最大連接使用兩個集合的所有觀測值之間的最大距離。 - 單次使用兩組所有觀測值之間的最小距離。
pooling_func	callable, default=np.mean 這將聚集特征的值組合成一個值，并且應該接受一個形狀為[M，N]的數組和關鍵字參數`axis=1`，并將其縮小為一個大小為[M]的數組。
distance_threshold	float, default=None 連接距離閾值高于該閾值，聚類將不會合并。如果不是None，則`n_clusters`必須為None，并且`compute_full_tree`必須為True。

屬性	說明
n_clusters_	int 通過算法找到的聚類數。如果`distance_threshold=None`=None，則它將等價于給定的`n_clusters`。
labels_	array-like of (n_features,) 每一特征的聚類標簽
n_leaves_	int 層次樹中的葉子節點數
n_connected_components_	int 圖中連通分量的估計數新版本0.21中：`n_connected_components_`被添加以代替`n_components_`
children_	array-like of shape (n_nodes-1, 2) 每個非葉節點的子節點。小于`n_features`的值對應于原始樣本樹的葉子。大于或等于`n_features`的節點`i`是一個非葉節點，具有子節點`children_[i - n_features]`。或者，在第i次迭代時，將children[i][0]和children[i][1]合并成節點`n_features + i`。

示例

>>> import numpy as np
>>> from sklearn import datasets, cluster
>>> digits = datasets.load_digits()
>>> images = digits.images
>>> X = np.reshape(images, (len(images), -1))
>>> agglo = cluster.FeatureAgglomeration(n_clusters=32)
>>> agglo.fit(X)
FeatureAgglomeration(n_clusters=32)
>>> X_reduced = agglo.transform(X)
>>> X_reduced.shape
(1797, 32)

方法	說明
`fit`(self, X[, y])	對數據進行分層聚類
`fit_transform`(self, X[, y])	擬合數據，然后轉換它。
`get_params`(self[, deep])	獲取此估計器的參數
`inverse_transform`(self, Xred)	逆變換
`set_params`(self, **params)	設置此估計器的參數
`transform`(self, X)	使用構建的聚類變換一個新的矩陣

__init__(self, n_clusters=2, *, affinity='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean at 0x7f962a37d280>, distance_threshold=None)

[源碼]

初始化self。請參閱help(type(self))以獲得準確的說明。

fit(self, X, y=None, **params)

[源碼]

對數據進行分層聚類

參數	說明
X	array-like of shape (n_samples, n_features) 數據
y	Ignored

返回值	說明
self	-

property fit_predict

根據特征或距離矩陣擬合分層聚類，并返回聚類標簽。

參數	說明
X	array-like, shape (n_samples, n_features) or (n_samples, n_samples) 要聚類的訓練實例，或實例之間的距離，如果`affinity='precomputed'`
y	Ignored 未使用，在此按約定呈現為API一致性。

返回值	說明
labels	ndarray, shape (n_samples,) 類標簽

fit_transform(self, X, y=None, **fit_params)

[源碼]

擬合數據，然后轉換它。

使用可選參數fit_params將轉換器擬合到X和y，并返回轉換版本的X。

參數	說明
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y	ndarray of shape (n_samples,), default=None 目標值

返回值	說明
**fit_params	dict 轉換后的數組

get_params(self, deep=True)

[源碼]

獲取此估計器的參數

參數	說明
deep	bool, default=True 如果為True，則將返回此估計器的參數和所包含的作為估計量的子對象。

返回值	說明
params	mapping of string to any 映射到其值的參數名稱

inverse_transform(self, Xred)

[源碼]

逆變換。返回大小為nb_features的向量，其值為Xred，分配給每一組特征。

參數	說明
Xred	array-like of shape (n_samples, n_clusters) or (n_clusters,) 要分配給每一組樣本的值。

返回值	說明
X	array, shape=[n_samples, n_features] or [n_features] 一個大小為n_samples的向量，其值為Xred，值分配給每一組樣本。

set_params(self, **params)

[源碼]

設置此估計器的參數

該方法適用于簡單估計器以及嵌套對象(例如pipelines)。后者具有表單的 <component>__<parameter>參數，這樣就可以更新嵌套對象的每個組件。

參數	說明
**params	dict 估計器參數

返回值	說明書
self	object 估計器實例

transform(self, X)

[源碼]

使用構建的聚類變換一個新的矩陣

參數	列表
X	array-like of shape (n_samples, n_features) or (n_samples,) 輸入數據。

參數	說明
X_trans	{array-like, sparse matrix} of shape (n_samples, n_clusters) N維觀測的M×N陣或M一維觀測的長度M陣

返回值	說明
Y	array, shape = [n_samples, n_clusters] or [n_clusters] 每個特性簇的存儲值

sklearn.cluster.FeatureAgglomeration使用示例?

特征集聚 ?

特征聚集與單變量選擇 ?