sklearn.feature_selection.RFECV

class sklearn.feature_selection.RFECV(estimator, *, step=1, min_features_to_select=1, cv=None, scoring=None, verbose=0, n_jobs=None)

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.

See the glossary entry for cross-validation estimator.

Read more in the User Guide.

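Parameters	Description
estimator	object A supervised learning estimator with a `fit` method that provides information about feature importance, either through a `coef_` attribute or through a `feature_importances_` attribute.
step	int or float, optional (default=1) If greater than or equal to 1, then `step` corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then `step` corresponds to the fraction of features to remove at each iteration (the resulting count is rounded down). Note that the last iteration may remove fewer than `step` features in order to reach `min_features_to_select`.
min_features_to_select	int, (default=1) The minimum number of features to be selected. This number of features will always be scored, even if the difference between the original feature count and `min_features_to_select` isn't divisible by `step`. New in version 0.20.
cv	int, cross-validation generator or an iterable, optional Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross-validation; - an integer, to specify the number of folds; - a CV splitter; - an iterable yielding (train, test) splits as arrays of indices. For integer/None inputs, `sklearn.model_selection.StratifiedKFold` is used if the estimator is a classifier and `y` is either binary or multiclass; in all other cases, `sklearn.model_selection.KFold` is used. Refer to the User Guide for the various cross-validation strategies that can be used here. Changed in version 0.22: the default value of `cv` when None changed from 3-fold to 5-fold.
scoring	string, callable or None, optional, (default=None) A string (see the model evaluation documentation) or a scorer callable object/function with signature `scorer(estimator, X, y)`. If None, the `score` method of the estimator is used.
verbose	int, (default=0) Controls the verbosity of the output.
n_jobs	int or None, optional (default=None) Number of cores to run in parallel while fitting across folds. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See the Glossary for more details. New in version 0.18.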
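For an integer or `None` value of `cv`, the choice between `StratifiedKFold` and `KFold` can be inspected with `sklearn.model_selection.check_cv`. This is a minimal sketch, assuming RFECV resolves `cv` through this helper with `classifier` set according to whether the wrapped estimator is a classifier; no RFECV model is fit here.

```python
import numpy as np
from sklearn.model_selection import check_cv

y_binary = np.array([0, 1] * 10)          # binary target, 20 samples
y_continuous = np.linspace(0.0, 1.0, 20)  # regression-style target

# Classifier with a binary/multiclass target: stratified folds.
cv_clf = check_cv(5, y_binary, classifier=True)
# Not a classifier: plain K-fold, whatever the target looks like.
cv_reg = check_cv(5, y_binary, classifier=False)
# Classifier flag set, but the target is continuous: plain K-fold.
cv_cont = check_cv(5, y_continuous, classifier=True)
# cv=None falls back to 5 folds (the default since 0.22).
cv_default = check_cv(None, y_binary, classifier=True)

for name, cv in [("clf", cv_clf), ("reg", cv_reg),
                 ("cont", cv_cont), ("default", cv_default)]:
    print(name, type(cv).__name__, cv.get_n_splits())
```

Passing an explicit splitter object as `cv` bypasses this resolution entirely.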

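Attributes	Description
n_features_	int The number of features selected with cross-validation.
support_	array of shape [n_features] The mask of selected features.
ranking_	array of shape [n_features] The feature ranking, such that `ranking_[i]` corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
grid_scores_	array of shape [n_subsets_of_features] The cross-validation scores, such that `grid_scores_[i]` corresponds to the CV score of the i-th subset of features.
estimator_	object The external estimator fit on the reduced dataset.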
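The fitted attributes are tied together: `n_features_` counts the `True` entries of `support_`, the selected features are exactly those with `ranking_ == 1`, and `estimator_` only sees the reduced feature set. A minimal sketch; `LinearRegression` on `make_friedman1` is an arbitrary choice for illustration, not something the reference prescribes.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), step=1, cv=5).fit(X, y)

# n_features_ counts the True entries of the support_ mask.
print(selector.n_features_, int(selector.support_.sum()))
# Selected features are exactly the ones ranked 1.
print(np.array_equal(selector.support_, selector.ranking_ == 1))
# estimator_ was refit on the reduced data, so it has n_features_ coefficients.
print(selector.estimator_.coef_.shape)
```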

See also

RFE

Recursive feature elimination

Notes

The size of `grid_scores_` is equal to ceil((n_features - min_features_to_select) / step) + 1, where step is the number of features removed at each iteration.
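The size formula can be checked by listing the subset sizes the elimination visits, from all features down to `min_features_to_select`. This sketch is plain Python and does not call scikit-learn; the helpers `resolve_step`, `feature_counts` and `n_scores` are hypothetical, and it assumes a fractional `step` is converted once, from the original feature count, to `max(1, int(step * n_features))`.

```python
import math


def resolve_step(step, n_features):
    """Integer number of features removed per iteration.

    Assumption: a fractional step is a fraction of the ORIGINAL
    feature count, rounded down, and never less than 1.
    """
    if 0.0 < step < 1.0:
        return max(1, int(step * n_features))
    return int(step)


def feature_counts(n_features, min_features_to_select=1, step=1):
    """Feature-subset sizes visited by the elimination, largest first."""
    k = resolve_step(step, n_features)
    counts = [n_features]
    while counts[-1] > min_features_to_select:
        # The last iteration may remove fewer than k features.
        counts.append(max(counts[-1] - k, min_features_to_select))
    return counts


def n_scores(n_features, min_features_to_select=1, step=1):
    """Length of grid_scores_ according to the formula in the note."""
    k = resolve_step(step, n_features)
    return math.ceil((n_features - min_features_to_select) / k) + 1


print(feature_counts(10, 1, 3))  # [10, 7, 4, 1]
print(n_scores(10, 1, 3))        # 4
print(feature_counts(10, 1, 4))  # [10, 6, 2, 1]
print(n_scores(10, 1, 0.25))     # step 0.25 of 10 features -> 2 per iteration -> 6
```

The case `step=4` shows why the "+ 1" and the ceiling are both needed: the final step removes only one feature to land exactly on `min_features_to_select`.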

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

The following example shows how to retrieve the 5 informative features, not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

Methods

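Method	Description
`decision_function`(X)	Compute the decision function of `X`.
`fit`(X, y[, groups])	Fit the RFE model and automatically tune the number of selected features.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`get_support`([indices])	Get a mask, or integer index, of the features selected.
`inverse_transform`(X)	Reverse the transformation operation.
`predict`(X)	Reduce X to the selected features and then predict using the underlying estimator.
`predict_log_proba`(X)	Predict class log-probabilities for X.
`predict_proba`(X)	Predict class probabilities for X.
`score`(X, y)	Reduce X to the selected features and return the score of the underlying estimator.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Reduce X to the selected features.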

__init__(estimator, *, step=1, min_features_to_select=1, cv=None, scoring=None, verbose=0, n_jobs=None)

Initialize self. See help(type(self)) for the accurate signature.

decision_function(X)

Compute the decision function of `X`.

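Parameters	Description
X	{array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to `dtype=np.float32` and, if a sparse matrix is provided, to a sparse `csr_matrix`.

Returns	Description
score	array, shape = [n_samples, n_classes] or [n_samples] The decision function of the input samples. The order of the classes corresponds to that in the attribute `classes_`. Regression and binary classification produce an array of shape [n_samples].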
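For a binary classifier the decision function has one value per sample, and it matches calling the refit `estimator_` on the reduced features. A minimal sketch with `LogisticRegression` on synthetic data from `make_classification`; both are illustrative choices, not part of this reference.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
selector = RFECV(LogisticRegression(), cv=3).fit(X, y)

scores = selector.decision_function(X)
# Binary problem: one score per sample.
print(scores.shape)
# Same as reducing X and asking the refit estimator directly.
manual = selector.estimator_.decision_function(selector.transform(X))
print(np.allclose(scores, manual))
```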

fit(X, y, groups=None)

Fit the RFE model and automatically tune the number of selected features.

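Parameters	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where `n_samples` is the number of samples and `n_features` is the total number of features.
y	array-like of shape (n_samples,) Target values (integers for classification, real numbers for regression).
groups	array-like of shape (n_samples,) or None Group labels for the samples used while splitting the dataset into train/test sets. Only used in conjunction with a "Group" cv instance (e.g., `GroupKFold`).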
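The `groups` argument only matters with a group-aware splitter. A minimal sketch, assuming a hypothetical grouping of 60 samples into 6 blocks of 10 (for instance, 6 subjects); `groups` is forwarded to `GroupKFold`, so no group ends up on both sides of a fold.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold

X, y = make_friedman1(n_samples=60, n_features=8, random_state=0)
# Hypothetical grouping: 6 groups of 10 consecutive samples.
groups = np.repeat(np.arange(6), 10)

selector = RFECV(LinearRegression(), cv=GroupKFold(n_splits=3))
# Pass groups as a keyword; the splitter uses it to keep groups intact.
selector.fit(X, y, groups=groups)
print(selector.n_features_, selector.support_)
```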

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to `X` and `y` with optional parameters `fit_params` and returns a transformed version of `X`.

Parameters	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features) Input samples.
y	ndarray of shape (n_samples,), default=None Target values.
**fit_params	dict Additional fit parameters.

Returns	Description
X_new	ndarray of shape (n_samples, n_features_new) Transformed array.
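`fit_transform` combines fitting and reduction in one call; the output keeps only `n_features_` columns and matches a separate `transform` with the fitted selector. A minimal sketch using `LinearRegression` as an illustrative base estimator.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), cv=5)

# Fit and reduce in one call.
X_new = selector.fit_transform(X, y)
print(X.shape, "->", X_new.shape)
# Same result as transforming with the now-fitted selector.
print(np.array_equal(X_new, selector.transform(X)))
```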

get_params(deep=True)

Get parameters for this estimator.

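Parameters	Description
deep	bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.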
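With `deep=True`, the wrapped estimator's parameters appear under an `estimator__` prefix; with `deep=False` only RFECV's own constructor arguments are returned. A minimal sketch; nothing is fit.

```python
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

selector = RFECV(SVR(kernel="linear"), step=2, cv=3)

shallow = selector.get_params(deep=False)
deep = selector.get_params(deep=True)

# RFECV's own parameters are present either way.
print(shallow["step"], shallow["cv"])
# Only deep=True exposes the nested estimator's parameters.
print(deep["estimator__kernel"], "estimator__kernel" in shallow)
```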

get_support(indices=False)

Get a mask, or integer index, of the features selected.

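Parameters	Description
indices	boolean (default False) If True, the return value will be an array of integers rather than a boolean mask.

Returns	Description
support	array An index that selects the retained features from a feature vector. If `indices` is False, this is a boolean array of shape [# input features], in which an element is True if and only if its corresponding feature is selected for retention. If `indices` is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.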
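The two return forms of `get_support` carry the same information: the integer form is the positions of the `True` entries of the boolean mask, which is itself `support_`. A minimal sketch with an illustrative `LinearRegression` base estimator.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), cv=5).fit(X, y)

mask = selector.get_support()              # boolean, one entry per input feature
idx = selector.get_support(indices=True)   # integer positions of the True entries
print(mask)
print(idx)
```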

inverse_transform(X)

Reverse the transformation operation.

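Parameters	Description
X	array of shape [n_samples, n_selected_features] The input samples.

Returns	Description
X_r	array of shape [n_samples, n_original_features] `X` with columns of zeros inserted where features would have been removed by `transform`.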
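`inverse_transform` does not recover the discarded values: kept columns return to their original positions and removed columns are filled with zeros. A minimal sketch of a round trip, using `LinearRegression` as an illustrative base estimator.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), cv=5).fit(X, y)

X_r = selector.transform(X)              # only the selected columns
X_back = selector.inverse_transform(X_r) # back to the original width
mask = selector.support_

# Kept columns are restored in place; removed columns come back as zeros.
print(X_back.shape)
print(np.array_equal(X_back[:, mask], X[:, mask]))
print(np.all(X_back[:, ~mask] == 0))
```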

predict(X)

Reduce X to the selected features and then predict using the underlying estimator.

Parameters	Description
X	array of shape [n_samples, n_features] The input samples.

Returns	Description
y	array of shape [n_samples] The predicted target values.
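`predict` expects the full feature matrix and performs the reduction itself, so it agrees with calling `estimator_.predict` on `transform(X)`. A minimal sketch with an illustrative `LinearRegression` base estimator.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), cv=5).fit(X, y)

y_pred = selector.predict(X)  # X has all 10 columns
# Equivalent manual route: reduce first, then use the refit estimator.
manual = selector.estimator_.predict(selector.transform(X))
print(y_pred.shape, np.allclose(y_pred, manual))
```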

predict_log_proba(X)

Predict class log-probabilities for X.

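Parameters	Description
X	array of shape [n_samples, n_features] The input samples.

Returns	Description
p	array of shape [n_samples, n_classes] The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.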

predict_proba(X)

Predict class probabilities for X.

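Parameters	Description
X	{array-like or sparse matrix} of shape (n_samples, n_features) The input samples. Internally, it will be converted to `dtype=np.float32` and, if a sparse matrix is provided, to a sparse `csr_matrix`.

Returns	Description
p	array of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`.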
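For a probabilistic base estimator, `predict_proba` returns one column per class (in the order of `estimator_.classes_`), each row sums to 1, and `predict_log_proba` is its natural logarithm. A minimal sketch on a synthetic 3-class problem; `LogisticRegression` and the `make_classification` settings are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
selector = RFECV(LogisticRegression(max_iter=1000), cv=3).fit(X, y)

proba = selector.predict_proba(X)          # shape (n_samples, n_classes)
log_proba = selector.predict_log_proba(X)  # same shape, natural log
print(proba.shape, selector.estimator_.classes_)
print(np.allclose(proba.sum(axis=1), 1.0))
print(np.allclose(np.exp(log_proba), proba))
```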

score(X, y)

Reduce X to the selected features and return the score of the underlying estimator.

Parameters	Description
X	array of shape [n_samples, n_features] The input samples.
y	array of shape [n_samples] The target values.
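`score` delegates to the base estimator's own `score` on the reduced features; for a regressor such as `LinearRegression` (an illustrative choice here) that is the R² on the given data.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), cv=5).fit(X, y)

r2 = selector.score(X, y)
# Same number as scoring the refit estimator on the reduced matrix.
manual = selector.estimator_.score(selector.transform(X), y)
print(round(r2, 3), np.isclose(r2, manual))
```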

set_params(**params)

Set the parameters of this estimator.

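This method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form `<component>__<parameter>` so that it is possible to update each component of a nested object.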

Parameters	Description
**params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.
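The `<component>__<parameter>` form also applies to RFECV's wrapped estimator, so top-level and nested parameters can be changed in one call. A minimal sketch; `SVR` and the value `C=0.5` are arbitrary illustrative choices.

```python
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

selector = RFECV(SVR(kernel="linear"), cv=3)

# A top-level RFECV parameter and a nested parameter of the wrapped SVR.
returned = selector.set_params(step=2, estimator__C=0.5)
print(selector.step, selector.estimator.C)
# set_params returns the estimator itself, so calls can be chained.
print(returned is selector)
```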

transform(X)

Reduce X to the selected features.

Parameters	Description
X	array of shape [n_samples, n_features] The input samples.

Returns	Description
X_r	array of shape [n_samples, n_selected_features] The input samples with only the selected features.
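Once fitted, `transform` applies the same column selection to any data with the original number of features, which amounts to indexing with `support_`. A minimal sketch that fits on the first 100 rows and transforms the remaining 20; the split and `LinearRegression` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=120, n_features=10, random_state=0)
X_train, y_train = X[:100], y[:100]
X_test = X[100:]

selector = RFECV(LinearRegression(), cv=5).fit(X_train, y_train)

# transform keeps only the columns flagged in support_, for new data too.
X_test_r = selector.transform(X_test)
print(X_test_r.shape)
print(np.array_equal(X_test_r, X_test[:, selector.support_]))
```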

Examples using sklearn.feature_selection.RFECV

Recursive feature elimination with cross-validation