sklearn.datasets.fetch_rcv1

sklearn.datasets.fetch_rcv1(*, data_home=None, subset='all', download_if_missing=True, random_state=None, shuffle=False, return_X_y=False)


Load the RCV1 multilabel dataset (classification).

Download it if necessary.

Version: RCV1-v2, vectors, full sets, topics multilabels.

Classes	103
Samples total	804414
Dimensionality	47236
Features	real, between 0 and 1

Read more in the User Guide.

New in version 0.17.

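Parameter	Description
data_home	str, optional. Specify another download and cache folder for the datasets. By default, all scikit-learn data is stored in "~/scikit_learn_data" subfolders.
subset	{'train', 'test', 'all'}, default='all'. Select the dataset to load: 'train' for the training set (23149 samples), 'test' for the test set (781265 samples), 'all' for both, with the training samples first if shuffle is False. This follows the official LYRL2004 chronological split.
download_if_missing	bool, default=True. If False, raise an IOError if the data is not locally available instead of trying to download it from the source site.
random_state	int, RandomState instance, default=None. Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls. See Glossary.
shuffle	bool, default=False. Whether to shuffle the dataset.
return_X_y	bool, default=False. If True, returns (dataset.data, dataset.target) instead of a Bunch object. See below for more information about the dataset.data and dataset.target objects. New in version 0.20.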
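
As a minimal sketch of how the parameters above combine, the helper below loads the official training and test splits as (data, target) pairs. The helper name `load_rcv1_splits` is hypothetical and not part of scikit-learn. The import sits inside the function so that nothing is downloaded until the helper is actually called; the first call may fetch the full dataset if it is not already cached.

```python
# Split sizes quoted in the `subset` description above.
N_TRAIN = 23149
N_TEST = 781265


def load_rcv1_splits(data_home=None, download_if_missing=True):
    """Load the LYRL2004 train and test splits as (X, y) pairs.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    # Imported lazily so that defining the helper has no side effects.
    from sklearn.datasets import fetch_rcv1

    common = dict(
        data_home=data_home,
        download_if_missing=download_if_missing,
        return_X_y=True,  # get (data, target) instead of a Bunch
    )
    train = fetch_rcv1(subset="train", **common)
    test = fetch_rcv1(subset="test", **common)
    return train, test
```

Calling `(X_train, y_train), (X_test, y_test) = load_rcv1_splits()` should give matrices with `N_TRAIN` and `N_TEST` rows respectively. To get a reproducibly shuffled copy instead, pass `shuffle=True` together with an integer `random_state` directly to `fetch_rcv1`.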

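Returns

dataset Bunch
Dictionary-like object with the following attributes.
- data: scipy CSR array, dtype np.float64, shape (804414, 47236)
The array has 0.16% nonzero values.
- target: scipy CSR array, dtype np.uint8, shape (804414, 103)
Each sample has a value of 1 in its categories and 0 in all others. The array has 3.15% nonzero values.
- sample_id: numpy array, dtype np.uint32, shape (804414,)
Identification number of each sample, in the same order as dataset.data.
- target_names: numpy array, dtype object, length 103
Name of each target (RCV1 topic), in the same order as the columns of dataset.target.
- DESCR: string
Description of the RCV1 dataset.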
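
The shapes and densities listed above explain why both matrices are returned in sparse form. The back-of-the-envelope calculation below is only a sketch: it works from the rounded percentages in this page, and the 4-byte index size used for the CSR estimate is an assumption rather than something the page states.

```python
# Shapes and densities quoted in the attribute list above.
N_SAMPLES, N_FEATURES, N_CLASSES = 804414, 47236, 103
DATA_DENSITY = 0.0016    # 0.16% nonzero
TARGET_DENSITY = 0.0315  # 3.15% nonzero

# Approximate number of stored entries in each matrix.
data_nnz = round(N_SAMPLES * N_FEATURES * DATA_DENSITY)
target_nnz = round(N_SAMPLES * N_CLASSES * TARGET_DENSITY)

# Average number of topics assigned to one sample.
labels_per_sample = N_CLASSES * TARGET_DENSITY

# Memory needed for a dense float64 copy of `data`.
dense_bytes = N_SAMPLES * N_FEATURES * 8

# Rough CSR footprint: 8-byte values plus (assumed) 4-byte column
# indices per stored entry, and 4-byte row pointers.
csr_bytes = data_nnz * (8 + 4) + (N_SAMPLES + 1) * 4

print(f"data nonzeros     ~ {data_nnz:,}")
print(f"target nonzeros   ~ {target_nnz:,}")
print(f"topics per sample ~ {labels_per_sample:.2f}")
print(f"dense data        ~ {dense_bytes / 1e9:.0f} GB")
print(f"CSR data          ~ {csr_bytes / 1e9:.2f} GB")
```

Under these assumptions a dense copy of `data` would need on the order of 300 GB, while the CSR form stays below 1 GB, and each sample carries a little over three topics on average.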

(data, target) tuple if return_X_y is True
New in version 0.20.
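
Because target is a CSR matrix whose columns follow the order of target_names, the topics of one sample can be read straight from the CSR structure arrays. The sketch below assumes every stored entry in target is a 1. The function name `topics_for_sample` and the toy arrays are made up for illustration and do not reflect the real column order.

```python
def topics_for_sample(indptr, indices, target_names, i):
    """Return the topic names assigned to sample i.

    `indptr` and `indices` are the CSR structure arrays of the target
    matrix; every stored entry is assumed to be a 1.
    """
    start, end = indptr[i], indptr[i + 1]
    return [target_names[j] for j in indices[start:end]]


# Toy stand-in for a 3-sample, 4-topic target matrix:
#   row 0 -> topics 0 and 2, row 1 -> topic 1, row 2 -> topics 2 and 3
toy_indptr = [0, 2, 3, 5]
toy_indices = [0, 2, 1, 2, 3]
toy_names = ["CCAT", "ECAT", "GCAT", "MCAT"]

for i in range(3):
    print(i, topics_for_sample(toy_indptr, toy_indices, toy_names, i))
```

With the real dataset loaded as `rcv1 = fetch_rcv1()`, the same function can be called as `topics_for_sample(rcv1.target.indptr, rcv1.target.indices, rcv1.target_names, 0)`.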
