Concatenating multiple feature extraction methods

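In many real-world examples, there are many ways to extract features from a dataset. Often it is beneficial to combine several methods to obtain good performance. This example shows how to use FeatureUnion to combine features obtained by PCA and univariate selection.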

Combining features with this transformer has the benefit that it allows cross-validation and grid search over the whole process.

The combination used in this example is not particularly helpful on this dataset and is only used to illustrate the usage of FeatureUnion.
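To make the idea of "combining" concrete, here is a minimal sketch, assuming the same iris data and the same two transformers as the script in this example. It suggests that the output of FeatureUnion is the column-wise concatenation of what each transformer produces on its own. The names `X_union`, `X_manual` and `same` are illustrative and do not appear in the original script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Fit both transformers through FeatureUnion in one call.
union = FeatureUnion([("pca", PCA(n_components=2)),
                      ("univ_select", SelectKBest(k=1))])
X_union = union.fit_transform(X, y)

# Fit the same transformers separately and stack their outputs side by side.
X_pca = PCA(n_components=2).fit_transform(X)
X_sel = SelectKBest(k=1).fit_transform(X, y)
X_manual = np.hstack([X_pca, X_sel])

# Compare the two results column by column.
same = np.allclose(X_union, X_manual)
print(X_union.shape, same)
```

The practical difference from stacking by hand is that the union is a single estimator, so it can be placed in a Pipeline and refit on each training fold during cross-validation.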

Input:

# Author: Andreas Mueller <amueller@ais.uni-bonn.de>
#
# License: BSD 3 clause

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris()

X, y = iris.data, iris.target

# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)

# Maybe some of the original features were good, too?
selection = SelectKBest(k=1)

# Build an estimator from PCA and univariate selection:

combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

# Use the combined features to transform the dataset:
X_features = combined_features.fit(X, y).transform(X)
print("Combined space has", X_features.shape[1], "features")

svm = SVC(kernel="linear")

# Do a grid search over k, n_components and C:

pipeline = Pipeline([("features", combined_features), ("svm", svm)])

param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)

Output:

Combined space has 3 features
Fitting 5 folds for each of 18 candidates, totalling 90 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.867, total=   0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=1.000, total=   0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.900, total=   0.0s
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.867, total=   0.0s
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.0s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.933, total=   0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.1s remaining:    0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.900, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.867, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.900, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.900, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.933, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.933, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.933, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.900, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.967, total=   0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV]  features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total=   0.0s
[Parallel(n_jobs=1)]: Done  90 out of  90 | elapsed:    0.5s finished
Pipeline(steps=[('features',
                 FeatureUnion(transformer_list=[('pca', PCA(n_components=3)),
                                                ('univ_select',
                                                 SelectKBest(k=1))])),
                ('svm', SVC(C=10, kernel='linear'))])
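The printed estimator above is the pipeline refit with the winning parameter combination. As a hedged sketch, the same information can be read more compactly from the fitted search object through its `best_params_` and `best_score_` attributes. The code below rebuilds the pipeline from this example without verbose logging; the variable names `best_params` and `best_score` are illustrative, and exact scores may vary across scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same structure as the example: feature union followed by a linear SVM.
pipeline = Pipeline([
    ("features", FeatureUnion([("pca", PCA(n_components=2)),
                               ("univ_select", SelectKBest(k=1))])),
    ("svm", SVC(kernel="linear")),
])

# Nested parameters are addressed as <step>__<transformer>__<parameter>.
param_grid = {
    "features__pca__n_components": [1, 2, 3],
    "features__univ_select__k": [1, 2],
    "svm__C": [0.1, 1, 10],
}

grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X, y)

best_params = grid_search.best_params_  # dict of the winning settings
best_score = grid_search.best_score_    # mean cross-validated accuracy
print(best_params)
print(round(best_score, 3))
```

The double-underscore keys are what let one search tune the feature extraction and the classifier together.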

Total running time of the script: (0 minutes 0.477 seconds)