sklearn.metrics.adjusted_mutual_info_score

sklearn.metrics.adjusted_mutual_info_score(labels_true, labels_pred, *, average_method='arithmetic')


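Adjusted Mutual Information between two clusterings.

Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It corrects for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether more information is actually shared. For two clusterings U and V, the AMI is given as: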

AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]
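
The formula above can be checked numerically with a minimal sketch. It assumes that `contingency_matrix` and `expected_mutual_information` can be imported from `sklearn.metrics.cluster` (the latter is not part of the documented public API, so treat its use here as an assumption). The helper `label_entropy` and the labels are hypothetical and exist only for illustration. With the default `average_method='arithmetic'`, avg(H(U), H(V)) is the arithmetic mean of the two entropies.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, mutual_info_score
from sklearn.metrics.cluster import contingency_matrix, expected_mutual_information


def label_entropy(labels):
    """Shannon entropy (natural log) of a labeling; hypothetical helper."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


# Illustrative labels, not taken from any real dataset.
labels_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0])

# MI(U, V) and its expected value E(MI(U, V)) under random labelings
# with the same cluster sizes.
mi = mutual_info_score(labels_true, labels_pred)
contingency = contingency_matrix(labels_true, labels_pred, sparse=True)
emi = expected_mutual_information(contingency, len(labels_true))

# avg(H(U), H(V)) with the arithmetic mean.
avg_h = 0.5 * (label_entropy(labels_true) + label_entropy(labels_pred))

ami_manual = (mi - emi) / (avg_h - emi)
ami_sklearn = adjusted_mutual_info_score(labels_true, labels_pred)
print(ami_manual, ami_sklearn)  # the two values should agree up to rounding
```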

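This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values does not change the score in any way.

This metric is also symmetric: swapping labels_true with labels_pred returns the same score. This is useful for measuring the agreement of two independent label assignment strategies on the same dataset when the ground truth is unknown.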
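
Both properties can be illustrated with a short sketch. The two labelings and the relabeling map are made up for this example.

```python
from sklearn.metrics import adjusted_mutual_info_score

# Two hypothetical labelings of the same six samples.
labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [0, 0, 1, 2, 2, 2]

# Symmetry: the argument order should not matter.
score_ab = adjusted_mutual_info_score(labels_a, labels_b)
score_ba = adjusted_mutual_info_score(labels_b, labels_a)

# Permutation invariance: renaming cluster ids (0 -> 2, 1 -> 0, 2 -> 1)
# should not matter either.
rename = {0: 2, 1: 0, 2: 1}
relabeled_b = [rename[label] for label in labels_b]
score_relabeled = adjusted_mutual_info_score(labels_a, relabeled_b)

print(score_ab, score_ba, score_relabeled)  # expected to agree up to rounding
```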

Note that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.

Read more in the User Guide.

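Parameter	Description
labels_true	int array-like of shape (n_samples,). A clustering of the data into disjoint subsets.
labels_pred	int array-like of shape (n_samples,). A clustering of the data into disjoint subsets.
average_method	string, optional (default: 'arithmetic'). How to compute the normalizer in the denominator. Possible options are 'min', 'geometric', 'arithmetic', and 'max'. New in version 0.20. Changed in version 0.22: the default value of average_method changed from 'max' to 'arithmetic'.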
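
As a rough illustration of `average_method`, the sketch below scores the same pair of made-up labelings with each option. Because the normalizer satisfies min <= geometric mean <= arithmetic mean <= max, and it only appears in the denominator, the scores should come out in the reverse order whenever MI exceeds its expected value, as it does for these labels.

```python
from sklearn.metrics import adjusted_mutual_info_score

# Hypothetical labelings whose entropies differ, so the four
# normalizers are not all equal.
labels_true = [0, 0, 0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 0, 1, 1, 1, 2, 2]

scores = {
    method: adjusted_mutual_info_score(
        labels_true, labels_pred, average_method=method
    )
    for method in ("min", "geometric", "arithmetic", "max")
}

for method, score in scores.items():
    print(f"{method:>10}: {score:.4f}")
```

The choice mostly matters when the two clusterings have very different entropies, for example when one has far more clusters than the other.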

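Returns	Description
ami	float (upper-limited by 1.0). The AMI is 1 when the two partitions are identical (i.e. perfectly matched). Random partitions (independent labelings) have an expected AMI of around 0 on average, so individual scores can be negative.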
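
The behaviour on independent labelings can be sketched as follows. The seed, sample size, and cluster counts are arbitrary choices for illustration. Unadjusted MI is expected to grow with the number of clusters even though the labelings share no real information, while the AMI stays close to zero.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, mutual_info_score

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_samples = 1000

results = {}
for n_clusters in (2, 50):
    # Two labelings drawn independently of each other.
    labels_a = rng.integers(0, n_clusters, size=n_samples)
    labels_b = rng.integers(0, n_clusters, size=n_samples)
    results[n_clusters] = (
        mutual_info_score(labels_a, labels_b),
        adjusted_mutual_info_score(labels_a, labels_b),
    )
    mi, ami = results[n_clusters]
    print(f"{n_clusters:>2} clusters: MI={mi:.4f}  AMI={ami:.4f}")

# A small deterministic case: the contingency table is uniform, so MI is 0
# and the chance correction pushes the AMI below zero (about -0.5 when
# worked out by hand from the formula).
negative_ami = adjusted_mutual_info_score([0, 0, 1, 1], [0, 1, 0, 1])
print(negative_ami)
```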

See also:

adjusted_rand_score

Adjusted Rand Index

mutual_info_score

Mutual Information (not adjusted for chance)

References

1 Vinh, Epps, and Bailey, (2010). Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance, JMLR

2 Wikipedia entry for the Adjusted Mutual Information

Examples

Perfect labelings are both homogeneous and complete, and hence have a score of 1.0:

>>> from sklearn.metrics.cluster import adjusted_mutual_info_score
>>> adjusted_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

If class members are completely split across different clusters, the assignment is totally incomplete, and hence the AMI is zero:

>>> adjusted_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0

Examples using sklearn.metrics.adjusted_mutual_info_score