sklearn.metrics.fowlkes_mallows_score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False)


Measure the similarity of two clusterings of a set of points.

New in version 0.18.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

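where TP is the number of true positives (i.e., the number of pairs of points that belong to the same cluster in both labels_true and labels_pred), FP is the number of false positives (i.e., the number of pairs of points that belong to the same cluster in labels_true but not in labels_pred), and FN is the number of false negatives (i.e., the number of pairs of points that belong to the same cluster in labels_pred but not in labels_true).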
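To make the pair-counting definition concrete, here is a minimal brute-force sketch in pure Python (no scikit-learn required). The function name `fmi_pair_counting` is hypothetical and not part of sklearn. It enumerates all n(n-1)/2 pairs of points, so it is O(n²) and only suitable for small inputs. It follows the TP/FP/FN assignment given above; since the formula is symmetric in FP and FN, swapping those two roles would give the same score.

```python
from itertools import combinations
from math import sqrt


def fmi_pair_counting(labels_true, labels_pred):
    """Hypothetical helper: Fowlkes-Mallows index by enumerating every pair of points."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair shares a cluster in both labelings
        elif same_true:
            fp += 1  # pair shares a cluster in labels_true only
        elif same_pred:
            fn += 1  # pair shares a cluster in labels_pred only
    if tp == 0:
        # No shared pairs: the score is 0 (this also avoids a zero denominator).
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

For example, `fmi_pair_counting([0, 0, 1, 1], [0, 0, 1, 2])` counts TP = 1, FP = 1 and FN = 0, so the score is 1 / sqrt(2 * 1) ≈ 0.7071.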

The score ranges from 0 to 1. A higher value indicates greater agreement between the two clusterings.

Read more in the User Guide.

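Parameters	Description
labels_true	int array, shape = (`n_samples`,) A clustering of the data into disjoint subsets.
labels_pred	array, shape = (`n_samples`,) A clustering of the data into disjoint subsets.
sparse	bool Compute the contingency matrix internally using a sparse matrix.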
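Enumerating every pair is quadratic in `n_samples`. The same counts can be read off a contingency table, where n_ij is the number of points with label i in labels_true and label j in labels_pred: TP = Σ C(n_ij, 2), TP + FP = Σ_i C(a_i, 2) over the labels_true cluster sizes a_i, and TP + FN = Σ_j C(b_j, 2) over the labels_pred cluster sizes b_j. The sketch below (the name `fmi_from_contingency` is hypothetical) stores only the non-empty cells in a `Counter`, which is loosely analogous to the sparse storage the `sparse` parameter selects. It illustrates the identity under these assumptions and is not scikit-learn's actual implementation.

```python
from collections import Counter
from math import comb, sqrt


def fmi_from_contingency(labels_true, labels_pred):
    """Hypothetical helper: Fowlkes-Mallows index from a contingency table."""
    # Contingency table stored sparsely: only non-empty (true, pred) cells appear.
    cells = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)  # cluster sizes a_i in labels_true
    col_sums = Counter(labels_pred)  # cluster sizes b_j in labels_pred
    tp = sum(comb(n, 2) for n in cells.values())
    pairs_true = sum(comb(n, 2) for n in row_sums.values())  # TP + FP
    pairs_pred = sum(comb(n, 2) for n in col_sums.values())  # TP + FN
    if tp == 0:
        return 0.0
    return tp / sqrt(pairs_true * pairs_pred)
```

The work is linear in `n_samples` plus the number of non-empty cells, instead of the quadratic cost of enumerating pairs.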

Returns	Description
score	float The resulting Fowlkes-Mallows score.

References

[1] E. B. Fowlkes and C. L. Mallows, 1983. “A method for comparing two hierarchical clusterings”. Journal of the American Statistical Association.

[2] Wikipedia entry for the Fowlkes-Mallows Index.

Examples

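Perfect labelings are both homogeneous and complete, and hence have a score of 1.0 (renaming the cluster labels does not change the score):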

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

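If class members are completely split across different clusters, no pair of points shares a cluster in both labelings (TP = 0), hence the FMI is 0: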

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0