sklearn.ensemble.StackingRegressor

class sklearn.ensemble.StackingRegressor(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

[source]

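Stack of estimators with a final regressor.

Stacked generalization consists of stacking the outputs of individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that estimators_ are fitted on the full X, while final_estimator_ is trained on cross-validated predictions of the base estimators obtained with cross_val_predict.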
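
The training scheme described above can be sketched by hand. This is only an illustration of the idea, not the library's internal implementation; the choice of base estimators, `cv=5`, and names such as `meta_features` are assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Hypothetical pair of base estimators, chosen only for illustration.
base_estimators = [RidgeCV(), DecisionTreeRegressor(max_depth=3, random_state=0)]

# Step 1: out-of-fold predictions of each base estimator become the
# training features of the final estimator (one column per estimator).
meta_features = np.column_stack(
    [cross_val_predict(clone(est), X, y, cv=5) for est in base_estimators]
)

# Step 2: the final estimator is trained on those cross-validated predictions.
final_estimator = RidgeCV().fit(meta_features, y)

# Step 3: the base estimators are refitted on the full X for prediction time.
fitted_estimators = [clone(est).fit(X, y) for est in base_estimators]

# Prediction: stack the base predictions, then apply the final estimator.
stacked = np.column_stack([est.predict(X[:5]) for est in fitted_estimators])
y_pred = final_estimator.predict(stacked)
print(meta_features.shape, y_pred.shape)
```

Training the final estimator on out-of-fold predictions rather than in-sample predictions keeps it from simply trusting whichever base estimator overfits the training data most.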

New in version 0.22.

Read more in the User Guide.

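Parameter	Description
estimators	list of (str, estimator) Base estimators that will be stacked together. Each element of the list is defined as a tuple of a string (i.e. the name) and an `estimator` instance. An estimator can be set to 'drop' using `set_params`.
final_estimator	estimator, default=None A regressor that will be used to combine the base estimators. The default regressor is `RidgeCV`.
cv	int, cross-validation generator or an iterable, default=None Determines the cross-validation splitting strategy used in `cross_val_predict` to train `final_estimator`. Possible inputs for `cv` are: - None, to use the default 5-fold cross-validation; - an integer, to specify the number of folds in a (Stratified) KFold; - an object to be used as a cross-validation generator; - an iterable yielding train, test splits. For integer/None inputs, if the estimator is a classifier and `y` is either binary or multiclass, `StratifiedKFold` is used; in all other cases, including this regressor, `KFold` is used. Refer to the User Guide for the various cross-validation strategies that can be used here. Note: a larger number of splits provides no benefit if the number of training samples is large enough; it only increases the training time. `cv` is not used for model evaluation but for prediction.
n_jobs	int, default=None The number of jobs to run in parallel for the `fit` of all `estimators`. `None` means 1 unless in a `joblib.parallel_backend` context. -1 means using all processors. See the Glossary for more details.
passthrough	bool, default=False When `False`, only the predictions of the estimators are used as training data for `final_estimator`. When `True`, `final_estimator` is trained on the predictions as well as the original training data.
verbose	int, default=0 Verbosity level.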

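Attribute	Description
estimators_	list of estimators The elements of the `estimators` parameter, having been fitted on the training data. If an estimator has been set to 'drop', it will not appear in `estimators_`.
named_estimators_	`Bunch` Attribute to access any fitted sub-estimator by name.
final_estimator_	estimator The fitted regressor that combines the predictions of `estimators_`.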
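
As a brief illustration of these attributes, the sketch below fits a small stack and inspects it. The estimator names `"ridge"` and `"tree"` and the choice of base models are assumptions made for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
).fit(X, y)

# Fitted clones of the base estimators, in the order they were given.
fitted = reg.estimators_

# The same fitted objects, looked up by the name supplied in `estimators`.
ridge = reg.named_estimators_["ridge"]

# The fitted final estimator; RidgeCV is used because none was specified.
final = reg.final_estimator_
print(len(fitted), type(ridge).__name__, type(final).__name__)
```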

References

Wolpert, David H. “Stacked generalization.” Neural networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

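Methods

Method	Description
`fit`(X, y[, sample_weight])	Fit the estimators.
`fit_transform`(X[, y])	Fit the estimators and transform the dataset.
`get_params`([deep])	Get the parameters of an estimator from the ensemble.
`predict`(X, **predict_params)	Predict target values for `X`.
`score`(X, y[, sample_weight])	Return the coefficient of determination R^2 of the prediction.
`set_params`(**params)	Set the parameters of an estimator from the ensemble.
`transform`(X)	Return the predictions for `X` for each estimator.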

__init__(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

[source]

Initialize self. See help(type(self)) for the accurate signature.

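fit(X, y, sample_weight=None)

[source]

Fit the estimators.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) Target values.
sample_weight	array-like of shape (n_samples,), default=None Sample weights. If None, all samples are weighted equally. Note that this is supported only if all underlying estimators support sample weights.

Returns	Description
self	object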
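
A minimal sketch of fitting with sample weights follows. It assumes base estimators and a final estimator that all accept `sample_weight` in `fit` (`RidgeCV` and `DecisionTreeRegressor` do); the weights themselves are random and purely illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Illustrative weights: one positive weight per training sample.
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)

# fit returns the estimator itself, so calls can be chained.
fitted = reg.fit(X, y, sample_weight=sample_weight)
y_pred = fitted.predict(X[:3])
```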

fit_transform(X, y=None, **fit_params)

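[source]

Fit to data, then transform it.

Fits the transformer to X and y with the optional parameters fit_params and returns a transformed version of X.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	ndarray of shape (n_samples,), default=None Target values.
**fit_params	dict Additional fit parameters.

Returns	Description
X_new	ndarray of shape (n_samples, n_features_new) Transformed array.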

get_params(deep=True)

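[source]

Get the parameters of an estimator from the ensemble.

Parameter	Description
deep	bool, default=True Setting it to True also returns the individual estimators and their parameters.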

property n_features_in_

Number of features seen during fit.

predict(X, **predict_params)

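[source]

Predict target values for X.

Note on parameter names: set_params works on simple estimators as well as on nested objects (such as a Pipeline). The latter have parameters of the form <component>__<parameter> (with a double underscore), so that each component of a nested object can be updated.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
**predict_params	dict of str -> obj Parameters passed to the `predict` called by `final_estimator`. Note that this may be used to return uncertainties from some estimators with `return_std` or `return_cov`. Be aware that it only accounts for uncertainty in the final estimator.

Returns	Description
y_pred	ndarray of shape (n_samples,) or (n_samples, n_output) Predicted targets.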
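
The following sketch shows one way `**predict_params` might be used to obtain an uncertainty estimate. It assumes `BayesianRidge` as the final estimator, whose `predict` accepts `return_std`; the base estimators are illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import BayesianRidge, RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    # Assumption: a final estimator whose predict supports return_std.
    final_estimator=BayesianRidge(),
).fit(X, y)

# return_std is forwarded to the final estimator's predict, so the standard
# deviation reflects only the final estimator, not the base estimators.
y_mean, y_std = reg.predict(X[:5], return_std=True)
print(y_mean.shape, y_std.shape)
```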

score(X, y, sample_weight=None)

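[source]

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, gets an R^2 score of 0.0.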
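
The definition above can be checked numerically. The sketch below computes u and v by hand on a small made-up example and compares the result with `sklearn.metrics.r2_score`; the arrays are illustrative values, not results from any model.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions, used only to exercise the formula.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

# A constant model that always predicts the mean of y scores 0.0.
y_const = np.full_like(y_true, y_true.mean())
r2_const = r2_score(y_true, y_const)

print(r2_manual, r2_score(y_true, y_pred), r2_const)
```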

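Parameter	Description
X	array-like of shape (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape `(n_samples, n_samples_fitted)`, where `n_samples_fitted` is the number of samples used in the fitting for the estimator.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) True values for `X`.
sample_weight	array-like of shape (n_samples,), default=None Sample weights.

Returns	Description
score	float R^2 of self.predict(X) with respect to y.

Notes:

The R2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 onward, to stay consistent with the default value of r2_score. This influences the score method of all multioutput regressors (except for MultiOutputRegressor).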

set_params(**params)

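[source]

Set the parameters of an estimator from the ensemble.

Valid parameter keys can be listed with get_params().

Parameter	Description
**params	keyword arguments Specific parameters, set using e.g. `set_params(parameter_name=new_value)`. In addition to setting the parameters of the stacking estimator itself, the individual estimators of the stack can also be set, or removed by setting them to 'drop'.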
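
A short sketch of the three uses of `set_params` mentioned above: a parameter of the stacking estimator itself, a nested parameter of one base estimator (written with a double underscore), and dropping a base estimator. The estimator names `"ridge"` and `"tree"` are assumptions made for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(random_state=0)),
    ]
)

# Parameter of the stacking estimator itself.
reg.set_params(cv=3)

# Nested parameter of the base estimator named "tree".
reg.set_params(tree__max_depth=3)

# Remove the base estimator named "ridge" from the stack.
reg.set_params(ridge="drop")

reg.fit(X, y)
params = reg.get_params()
print(params["cv"], params["tree__max_depth"], len(reg.estimators_))
```

At least one base estimator must remain in the stack, so only one of the two is dropped here.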

transform(X)

[source]

Return the predictions for X for each estimator.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns	Description
y_preds	ndarray of shape (n_samples, n_estimators) Prediction outputs for each estimator.
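
To make the output shape concrete, the sketch below calls `transform` with and without `passthrough`. With `passthrough=True` the original features are appended to the predictions, so the array has more columns than the table above indicates. The base estimators are illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)  # 442 samples, 10 features

estimators = [
    ("ridge", RidgeCV()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]

# One column of predictions per base estimator.
reg = StackingRegressor(estimators=estimators).fit(X, y)
preds = reg.transform(X)
print(preds.shape)  # (442, 2)

# With passthrough=True, the 10 original features follow the 2 predictions.
reg_pt = StackingRegressor(estimators=estimators, passthrough=True).fit(X, y)
preds_pt = reg_pt.transform(X)
print(preds_pt.shape)  # (442, 12)
```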

Examples using sklearn.ensemble.StackingRegressor

Combine predictors using stacking