sklearn.ensemble.BaggingRegressor

class sklearn.ensemble.BaggingRegressor(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)

[source]

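A Bagging regressor.

A Bagging regressor is an ensemble meta-estimator that fits each base regressor on a random subset of the original dataset and then aggregates their individual predictions (by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.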

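This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, the algorithm is known as Pasting [1]. If samples are drawn with replacement, the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, the method is known as Random Patches [4].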
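
The four variants named above differ only in how samples and features are drawn. The following is a minimal sketch of one way to select each of them through the constructor options; the 0.5 fractions, the synthetic dataset, and the `strategies` and `models` names are illustrative assumptions, not recommended settings. The base estimator is passed positionally because newer scikit-learn releases rename the `base_estimator` keyword to `estimator`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=8, random_state=0)

# Map each method name to the sampling options that select it.
strategies = {
    # Pasting: subsets of samples drawn without replacement, all features.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Bagging: samples drawn with replacement, all features.
    "bagging": dict(max_samples=1.0, bootstrap=True),
    # Random Subspaces: all samples, random subsets of features.
    "random_subspaces": dict(max_samples=1.0, bootstrap=False, max_features=0.5),
    # Random Patches: subsets of both samples and features.
    "random_patches": dict(max_samples=0.5, bootstrap=False, max_features=0.5),
}

models = {}
for name, options in strategies.items():
    # First positional argument is the base estimator.
    models[name] = BaggingRegressor(
        DecisionTreeRegressor(), n_estimators=10, random_state=0, **options
    ).fit(X, y)
    print(name, models[name].score(X, y))
```

With 200 samples and 8 features, the 0.5 fractions give each base estimator 100 samples and/or 4 features.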

Read more in the User Guide.

New in version 0.15.

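Parameter	Description
base_estimator	object, default=None The base estimator to fit on random subsets of the dataset. If None, the base estimator is a decision tree.
n_estimators	int, default=10 The number of base estimators in the ensemble.
max_samples	int or float, default=1.0 The number of samples to draw from X to train each base estimator (with replacement by default; see `bootstrap` for more details). - If int, then draw `max_samples` samples. - If float, then draw `max_samples * X.shape[0]` samples.
max_features	int or float, default=1.0 The number of features to draw from X to train each base estimator (without replacement by default; see `bootstrap_features` for more details). - If int, then draw `max_features` features. - If float, then draw `max_features * X.shape[1]` features.
bootstrap	bool, default=True Whether samples are drawn with replacement. If False, sampling without replacement is performed.
bootstrap_features	bool, default=False Whether features are drawn with replacement.
oob_score	bool, default=False Whether to use out-of-bag samples to estimate the generalization error.
warm_start	bool, default=False When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble. See the Glossary. New in version 0.17: the `warm_start` constructor parameter.
n_jobs	int, default=None The number of jobs to run in parallel for both `fit` and `predict`. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See the Glossary for more details.
random_state	int or RandomState, default=None Controls the random resampling of the original dataset (sample-wise and feature-wise). If the base estimator accepts a `random_state` attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See the Glossary.
verbose	int, default=0 Controls the verbosity when fitting and predicting.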
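
As an illustration of the `warm_start` parameter, the sketch below grows an existing ensemble from 5 to 10 estimators instead of refitting from scratch. The dataset and the variable names are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Fit an initial ensemble of 5 estimators with warm_start enabled.
regr = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=5, warm_start=True, random_state=0
)
regr.fit(X, y)
first_five = list(regr.estimators_)  # keep references to the fitted estimators

# Raise n_estimators and fit again: only the 5 additional estimators are trained.
regr.set_params(n_estimators=10)
regr.fit(X, y)

print(len(regr.estimators_))
# The original estimators are reused rather than refitted.
print(all(a is b for a, b in zip(first_five, regr.estimators_)))
```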

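Attribute	Description
base_estimator_	estimator The base estimator from which the ensemble is grown.
n_features_	int The number of features when `fit` is performed.
estimators_	list of estimators The collection of fitted sub-estimators.
estimators_samples_	list of arrays The subset of drawn samples for each base estimator.
estimators_features_	list of arrays The subset of drawn features for each base estimator.
oob_score_	float Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when `oob_score` is True.
oob_prediction_	ndarray of shape (n_samples,) Prediction computed with the out-of-bag estimate on the training set. If `n_estimators` is small, a data point may never have been left out during the bootstrap. In this case, `oob_prediction_` might contain NaN. This attribute exists only when `oob_score` is True.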
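
A short sketch of the out-of-bag estimate: with `bootstrap=True` and `oob_score=True`, each training sample is scored only by the estimators that did not see it during fitting. The dataset and the choice of 50 estimators are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

# Enough estimators that every sample is very likely left out at least once.
regr = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=50,
    bootstrap=True,
    oob_score=True,
    random_state=0,
).fit(X, y)

train_score = regr.score(X, y)  # R^2 on data the estimators were fitted on
oob_score = regr.oob_score_     # R^2 from out-of-bag predictions only
print(train_score, oob_score)
```

The out-of-bag score is usually the more conservative of the two, since it only uses predictions from estimators that did not train on the sample in question.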

References:

[1] L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

[2] L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

[3] T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

[4] G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples:

>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=4,
...                        n_informative=2, n_targets=1,
...                        random_state=0, shuffle=False)
>>> regr = BaggingRegressor(base_estimator=SVR(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> regr.predict([[0, 0, 0, 0]])
array([-2.8720...])

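Methods

Method	Description
`fit`(X, y[, sample_weight])	Build a Bagging ensemble of estimators from the training set (X, y).
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict regression target for X.
`score`(X, y[, sample_weight])	Return the coefficient of determination R^2 of the prediction.
`set_params`(**params)	Set the parameters of this estimator.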

__init__(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)

[source]

Initialize self. See help(type(self)) for the accurate signature.

property estimators_samples_

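The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.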
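
The sketch below inspects the in-bag indices for a small ensemble. The dataset and variable names are assumptions made for the example. Because the list is rebuilt on every access, it is read once into a local variable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=3, random_state=0
).fit(X, y)

# Read the property once; each access regenerates the index arrays.
samples = regr.estimators_samples_

for i, idx in enumerate(samples):
    # With bootstrap=True (the default) some indices repeat,
    # so the number of distinct samples is below n_samples.
    print(i, len(idx), len(np.unique(idx)))
```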

fit(X, y, sample_weight=None)

[source]

Build a Bagging ensemble of estimators from the training set (X, y).

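Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
y	array-like of shape (n_samples,) The target values (class labels in classification, real numbers in regression).
sample_weight	array-like of shape (n_samples,), default=None Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns	Description
self	object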
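
A minimal sketch of calling `fit` with `sample_weight`, assuming a base estimator that supports sample weighting (a decision tree here). The weight values are arbitrary and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Give the first 20 samples five times the weight of the others.
weights = np.ones(X.shape[0])
weights[:20] = 5.0

regr = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10, random_state=0)

# fit returns the estimator itself, so calls can be chained.
fitted = regr.fit(X, y, sample_weight=weights)
print(fitted is regr, len(regr.estimators_))
```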

get_params(deep=True)

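[source]

Get parameters for this estimator.

Parameter	Description
deep	bool, default=True If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.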

predict(X)

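[source]

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression target of the estimators in the ensemble.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns	Description
y	ndarray of shape (n_samples,) The predicted values.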
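
The averaging described above can be checked by hand. The sketch below recomputes the ensemble prediction as the mean of the individual estimators' predictions, each restricted to the feature subset it was fitted on. The dataset, `max_features=0.5`, and the `manual` name are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

regr = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=10, max_features=0.5, random_state=0
).fit(X, y)

# Each estimator predicts from its own feature subset; the ensemble
# prediction is the mean over all estimators.
manual = np.mean(
    [
        est.predict(X[:, feats])
        for est, feats in zip(regr.estimators_, regr.estimators_features_)
    ],
    axis=0,
)

print(np.allclose(manual, regr.predict(X)))
```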

score(X, y, sample_weight=None)

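[source]

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameter	Description
X	array-like of shape (n_samples, n_features) Test samples. For some estimators this may instead be a precomputed kernel matrix or a list of generic objects, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) True values for X.
sample_weight	array-like of shape (n_samples,), default=None Sample weights.

Returns	Description
score	float R^2 of self.predict(X) with respect to y.

Notes:

The R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).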
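
The definition of R^2 given for `score` can be reproduced directly. The sketch below computes u and v by hand on a held-out split and compares the result with `score`; the dataset, the split, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_test_train, y_test = train_test_split(X, y, random_state=0)

regr = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=10, random_state=0
).fit(X_train, y_test_train)

y_pred = regr.predict(X_test)
u = ((y_test - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

# A constant prediction equal to the mean of y_test scores exactly 0.
constant = np.full_like(y_test, y_test.mean())
r2_constant = 1 - ((y_test - constant) ** 2).sum() / v

print(r2_manual, regr.score(X_test, y_test), r2_constant)
```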

set_params(**params)

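[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> (with a double underscore), so that each component of a nested object can be updated.

Parameter	Description
**params	dict Estimator parameters.

Returns	Description
self	object The estimator instance.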
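
A sketch of updating a nested parameter through `set_params`, using the `<component>__<parameter>` form. The component is called `base_estimator` in the version documented here, while newer scikit-learn releases name it `estimator`, so the sketch looks the prefix up from `get_params` rather than hard-coding it. The `prefix` variable and the value `C=10.0` are assumptions made for the example.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

regr = BaggingRegressor(SVR(), n_estimators=10, random_state=0)

# deep=True also lists the parameters of the wrapped estimator.
params = regr.get_params(deep=True)

# Pick whichever component name this scikit-learn version exposes.
prefix = "estimator" if "estimator" in params else "base_estimator"

# Update a top-level parameter and a nested one in a single call.
result = regr.set_params(**{"n_estimators": 20, prefix + "__C": 10.0})

print(result is regr)
print(regr.get_params()["n_estimators"], regr.get_params()[prefix + "__C"])
```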

Examples using sklearn.ensemble.BaggingRegressor

Single estimator versus bagging: bias-variance decomposition