sklearn.model_selection.learning_curve

sklearn.model_selection.learning_curve(estimator, X, y, *, groups=None, train_sizes=array([0.1, 0.33, 0.55, 0.78, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=None, pre_dispatch='all', verbose=0, shuffle=False, random_state=None, error_score=nan, return_times=False)

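Learning curve.

Determines cross-validated training and test scores for different training set sizes.

A cross-validation generator splits the whole dataset k times into training and test data. Subsets of the training set with varying sizes are used to train the estimator, and a score is computed for each training subset size and for the test set. The scores are then averaged over all k runs for each training subset size.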
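The procedure described above can be sketched as follows. This is a minimal illustration rather than a reference usage: the synthetic dataset, the `LogisticRegression` estimator, and the choice of 5 folds are assumptions made for the example. Since `learning_curve` returns one score per training size and per split, the averaging over the k runs is done here explicitly with `mean(axis=1)`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data, used only for illustration (an assumption of this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

train_sizes_abs, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # fractions of the largest training set
    cv=5,                                  # k = 5 splits
)

# Each score array has one row per training size and one column per CV split.
# Averaging over axis 1 gives one mean score per training size.
mean_train = train_scores.mean(axis=1)
mean_test = test_scores.mean(axis=1)

for n, tr, te in zip(train_sizes_abs, mean_train, mean_test):
    print(f"{int(n):4d} samples: train={tr:.3f}, test={te:.3f}")
```

With 200 samples and 5 folds, each split trains on at most 160 samples, so the largest entry of `train_sizes_abs` is 160.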

Read more in the User Guide.

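Parameter	Description
estimator	object type that implements the "fit" and "predict" methods. An object of that type, which is cloned for each validation.
X	array-like of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) or (n_samples, n_outputs). Target relative to X for classification or regression; None for unsupervised learning.
groups	array-like of shape (n_samples,), default=None. Group labels for the samples, used while splitting the dataset into train/test sets. Only used in conjunction with a "Group" cv instance (e.g. `GroupKFold`).
train_sizes	array-like of shape (n_ticks,), default=np.linspace(0.1, 1.0, 5). Relative or absolute numbers of training examples used to generate the learning curve. If the dtype is float, each value is regarded as a fraction of the maximum size of the training set (which is determined by the selected validation method), i.e. it has to be within (0, 1]. Otherwise the values are interpreted as absolute sizes of the training sets. Note that for classification the number of samples usually has to be big enough to contain at least one sample from each class.
cv	int, cross-validation generator or an iterable, default=None. Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 5-fold cross-validation; an int, to specify the number of folds in a `(Stratified)KFold`; a CV splitter; an iterable yielding (train, test) splits as arrays of indices. For int or None inputs, if the estimator is a classifier and `y` is either binary or multiclass, `StratifiedKFold` is used. In all other cases, `KFold` is used. Refer to the User Guide for the various cross-validation strategies that can be used here. Changed in version 0.22: the default value of `cv` if None changed from 3-fold to 5-fold.
scoring	str or callable, default=None. A str (see the model evaluation documentation) or a scorer callable object or function with signature `scorer(estimator, X, y)`, which should return only a single value.
exploit_incremental_learning	bool, default=False. If the estimator supports incremental learning, this will be used to speed up fitting for different training set sizes.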
n_jobs	int, default=None. Number of jobs to run in parallel. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See the Glossary for more details.
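pre_dispatch	int or str, default='all'. Number of predispatched jobs for parallel execution (default is all). This option can reduce the allocated memory. The str can be an expression like '2*n_jobs'.
verbose	int, default=0. Controls the verbosity: the higher, the more messages.
shuffle	bool, default=False. Whether to shuffle the training data before taking prefixes of it based on `train_sizes`.
random_state	int or RandomState instance, default=None. Used when `shuffle` is True. Pass an int for reproducible output across multiple function calls. See the Glossary.
error_score	'raise' or numeric, default=np.nan. Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error. New in version 0.20.
return_times	bool, default=False. Whether to return the fit and score times.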
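As a hedged illustration of how several of these parameters combine, the sketch below passes absolute `train_sizes`, a `groups` array together with `GroupKFold`, a scoring string, and `shuffle` with a fixed `random_state`. The regression data, the `Ridge` estimator, and the grouping into 12 blocks of 10 samples are all assumptions chosen for the example, not part of the documented API.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, learning_curve

# Synthetic regression data (illustrative assumption).
X, y = make_regression(n_samples=120, n_features=5, noise=0.5, random_state=0)

# Hypothetical grouping: 12 groups of 10 consecutive samples each.
groups = np.repeat(np.arange(12), 10)

sizes, train_scores, test_scores = learning_curve(
    Ridge(alpha=1.0),
    X,
    y,
    groups=groups,                     # only used because cv is a "Group" splitter
    cv=GroupKFold(n_splits=4),         # each split trains on 9 groups = 90 samples
    train_sizes=[20, 50, 90],          # ints are read as absolute training-set sizes
    scoring="neg_mean_squared_error",  # a scorer name from the model evaluation docs
    shuffle=True,                      # shuffle before taking prefixes of the training data
    random_state=0,                    # makes the shuffling reproducible
)

print(sizes)               # absolute sizes actually used
print(train_scores.shape)  # one row per size, one column per split
```

Because the sizes are integers, they must not exceed the largest training set produced by the splitter, which is 90 samples in this setup.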

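Returns	Description
train_sizes_abs	array of shape (n_unique_ticks,). Numbers of training examples that have been used to generate the learning curve. Note that the number of ticks might be less than n_ticks because duplicate entries are removed.
train_scores	array of shape (n_ticks, n_cv_folds). Scores on training sets.
test_scores	array of shape (n_ticks, n_cv_folds). Scores on test sets.
fit_times	array of shape (n_ticks, n_cv_folds). Times spent for fitting, in seconds. Only present if `return_times` is True.
score_times	array of shape (n_ticks, n_cv_folds). Times spent for scoring, in seconds. Only present if `return_times` is True.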
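The return values can be inspected with a short sketch like the one below, which sets `return_times=True` and deliberately repeats one entry in `train_sizes` to show the duplicate removal. The dataset and the `DecisionTreeClassifier` are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative assumption).
X, y = make_classification(n_samples=100, random_state=0)

# With return_times=True, five arrays are returned instead of three.
train_sizes_abs, train_scores, test_scores, fit_times, score_times = learning_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    train_sizes=[10, 10, 30, 60],  # the duplicated 10 is dropped, with a warning
    cv=4,
    return_times=True,
)

print(train_sizes_abs)    # unique sizes only
print(train_scores.shape) # (number of unique sizes, number of CV splits)
print(fit_times.shape)    # same layout as the score arrays, values in seconds
```

Four sizes are requested here but only three are unique, so every returned array has three rows.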

Notes

See examples/model_selection/plot_learning_curve.py

Examples using sklearn.model_selection.learning_curve

Comparison of kernel ridge regression and SVR

Plotting Learning Curves