sklearn.inspection.permutation_importance

sklearn.inspection.permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5, n_jobs=None, random_state=None)

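Permutation importance for feature evaluation [BRE]. The estimator must already be fitted. X can be the data set used to train the estimator or a hold-out set. The permutation importance of a feature is calculated as follows. First, a baseline metric, defined by scoring, is evaluated on the (potentially different) data set defined by X. Next, one feature column of that data set is permuted and the metric is evaluated again. The permutation importance is defined as the difference between the baseline metric and the metric obtained after permuting the feature column. Read more in the User Guide.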
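The procedure described above can be sketched by hand. The following is a minimal illustration, not scikit-learn's actual implementation: the helper name `manual_permutation_importance` and its `seed` argument are hypothetical, it assumes the estimator exposes a `score(X, y)` method, and it uses that default scorer rather than a configurable `scoring`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def manual_permutation_importance(estimator, X, y, n_repeats=5, seed=0):
    """Illustrative sketch only; hypothetical helper, not part of scikit-learn."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: baseline metric on the unmodified data (estimator's default scorer).
    baseline = estimator.score(X, y)
    importances = np.zeros((X.shape[1], n_repeats))
    for col in range(X.shape[1]):
        for rep in range(n_repeats):
            X_perm = X.copy()
            # Step 2: shuffle a single column, breaking its relationship with y.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            # Step 3: importance = baseline metric minus metric after permutation.
            importances[col, rep] = baseline - estimator.score(X_perm, y)
    return importances


X = [[1, 9, 9], [1, 9, 9], [1, 9, 9],
     [0, 9, 9], [0, 9, 9], [0, 9, 9]]
y = [1, 1, 1, 0, 0, 0]
clf = LogisticRegression().fit(X, y)
manual = manual_permutation_importance(clf, X, y, n_repeats=10)
print(manual.mean(axis=1))
```

Because the second and third columns are constant, shuffling them leaves the data unchanged, so their importance is exactly zero in every repeat. The exact values for the first column depend on the random permutations drawn and will generally differ from those produced by `permutation_importance` itself.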

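Parameter	Description
estimator	object An estimator that has already been fitted and is compatible with the scorer.
X	ndarray or DataFrame, shape (n_samples, n_features) Data on which permutation importance will be computed.
y	array-like or None, shape (n_samples, ) or (n_samples, n_classes) Targets for supervised learning, or None for unsupervised learning.
scoring	string, callable or None, default=None Scorer to use. It can be a single string (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions). If None, the estimator's default scorer is used.
n_repeats	int, default=5 Number of times to permute each feature.
n_jobs	int or None, default=None Number of jobs to run in parallel for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the Glossary for more details.
random_state	int, RandomState instance, default=None Pseudo-random number generator used to control the permutation of each feature. Pass an int to get reproducible results across function calls. See the Glossary entry for random_state.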
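As a hedged illustration of how these parameters fit together, the sketch below evaluates importances on a hold-out set with an explicit scoring string. The synthetic data, the choice of `Ridge`, and the `"neg_mean_squared_error"` scorer are assumptions made for the example, not recommendations from this page.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed for illustration): only feature 0 drives y.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# Fit on the training split, then measure importances on the hold-out split.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_train, y_train)

result = permutation_importance(
    model, X_val, y_val,
    scoring="neg_mean_squared_error",  # a scorer name instead of the default
    n_repeats=10,                      # shuffle each feature 10 times
    n_jobs=1,                          # run sequentially
    random_state=0,                    # reproducible permutations
)
print(result.importances_mean)
```

With a "greater is better" scorer such as negated mean squared error, a positive importance means that permuting the feature made the score worse. Here the first feature should dominate, since it is the only one used to generate `y`.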

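Returns	Description
result	Bunch Dictionary-like object with the following attributes. - importances_mean: ndarray, shape (n_features, ) Mean of the feature importances over n_repeats. - importances_std: ndarray, shape (n_features, ) Standard deviation over n_repeats. - importances: ndarray, shape (n_features, n_repeats) Raw permutation importance scores.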
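A short sketch of how the returned attributes might be consumed: it ranks features by mean importance and prints each mean alongside its standard deviation. The toy data mirrors the example on this page; the ranking and formatting are illustrative choices only.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X = [[1, 9, 9], [1, 9, 9], [1, 9, 9],
     [0, 9, 9], [0, 9, 9], [0, 9, 9]]
y = [1, 1, 1, 0, 0, 0]
clf = LogisticRegression().fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

# Feature indices sorted from most to least important.
order = result.importances_mean.argsort()[::-1]
for i in order:
    mean = result.importances_mean[i]
    std = result.importances_std[i]
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")

# The summary attributes are per-feature statistics of the raw scores.
recomputed_mean = result.importances.mean(axis=1)
recomputed_std = result.importances.std(axis=1)
```

Each row of `importances` holds the `n_repeats` raw scores for one feature, so the mean and standard deviation can be recomputed from it along axis 1.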

References

[BRE] L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001. https://doi.org/10.1023/A:1010933404324

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.inspection import permutation_importance
>>> X = [[1, 9, 9],[1, 9, 9],[1, 9, 9],
...      [0, 9, 9],[0, 9, 9],[0, 9, 9]]
>>> y = [1, 1, 1, 0, 0, 0]
>>> clf = LogisticRegression().fit(X, y)
>>> result = permutation_importance(clf, X, y, n_repeats=10,
...                                 random_state=0)
>>> result.importances_mean
array([0.4666..., 0.       , 0.       ])
>>> result.importances_std
array([0.2211..., 0.       , 0.       ])

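Examples using sklearn.inspection.permutation_importance

Feature importances with forests of trees

Gradient Boosting regression

Permutation Importance with Multicollinear or Correlated Features

Permutation Importance vs Random Forest Feature Importance (MDI)

Release Highlights for scikit-learn 0.22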