Concatenating multiple feature extraction methods
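In many real-world problems, there are several ways to extract features from a dataset, and combining a few of them often gives better performance than any single one. This example shows how to use FeatureUnion to combine features obtained by PCA and by univariate selection.
Because the combination is done by a transformer, it can be placed inside a Pipeline, so cross-validation and grid search cover the whole process, including the feature extraction steps.
The combination used in this example is not particularly helpful on this dataset; it only illustrates how FeatureUnion is used.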
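As a rough illustration of what the union produces, the sketch below checks that the output of FeatureUnion matches a manual column-wise stack of the outputs of its two transformers, each fitted on the same data. The variable names (`union`, `X_manual`, and so on) are chosen here for illustration and do not appear in the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Union of 2 PCA components and the single best univariate feature.
union = FeatureUnion([("pca", PCA(n_components=2)),
                      ("univ_select", SelectKBest(k=1))])
X_union = union.fit_transform(X, y)

# The same two transformers, fitted separately and stacked by hand.
X_pca = PCA(n_components=2).fit_transform(X)
X_best = SelectKBest(k=1).fit_transform(X, y)
X_manual = np.hstack([X_pca, X_best])

print("Union output shape:", X_union.shape)
print("Matches manual hstack:", np.allclose(X_union, X_manual))
```

The first two columns come from PCA and the last one from SelectKBest, in the order the transformers are listed in the union.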
Input:
# Author: Andreas Mueller <amueller@ais.uni-bonn.de>
#
# License: BSD 3 clause
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
iris = load_iris()
X, y = iris.data, iris.target
# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)
# Maybe some of the original features were good, too?
selection = SelectKBest(k=1)
# Build a transformer that combines PCA and univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
# Use the combined features to transform the dataset:
X_features = combined_features.fit(X, y).transform(X)
print("Combined space has", X_features.shape[1], "features")
svm = SVC(kernel="linear")
# Do a grid search over k, n_components and C:
pipeline = Pipeline([("features", combined_features), ("svm", svm)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
Output:
Combined space has 3 features
Fitting 5 folds for each of 18 candidates, totalling 90 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.867, total= 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.900, total= 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 7 out of 7 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.867, total= 0.0s
[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 9 out of 9 | elapsed: 0.1s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.867, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 90 out of 90 | elapsed: 0.5s finished
Pipeline(steps=[('features',
                 FeatureUnion(transformer_list=[('pca', PCA(n_components=3)),
                                                ('univ_select',
                                                 SelectKBest(k=1))])),
                ('svm', SVC(C=10, kernel='linear'))])
Total running time of the script: (0 minutes 0.477 seconds)
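The "18 candidates, totalling 90 fits" line in the log follows from the size of the parameter grid and the number of folds. A minimal sketch of that arithmetic, assuming the 5-fold splitting reported in the log and using `ParameterGrid` (which the original script does not use) to enumerate the combinations:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the example, written as a plain dictionary.
param_grid = {
    "features__pca__n_components": [1, 2, 3],
    "features__univ_select__k": [1, 2],
    "svm__C": [0.1, 1, 10],
}

n_candidates = len(ParameterGrid(param_grid))  # 3 * 2 * 3 combinations
n_folds = 5  # assumption: the 5 folds reported in the log above

print(n_candidates, "candidates,", n_candidates * n_folds, "fits")
# prints: 18 candidates, 90 fits
```

Each parameter name is the path to the step it configures, joined by double underscores: `features__pca__n_components` reaches the `n_components` of the `pca` transformer inside the `features` step of the pipeline.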