グリッドサーチ
グリッドサーチは、パラメータをチューニングしてモデルの汎化性能を向上させる方法。
sklearnでグリッドサーチ
パラメータの辞書を作ります。
GridSearchCVに、
- モデル
- パラメータ辞書
- 交差検証の分割数
渡します。
後の、学習と推論(予測)の作業はいつも通りの、fitとpredictで実行します。
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
iris = load_iris()
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)
params_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
'gamma':[0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(SVC(), params_grid, cv=5)
grid_search = grid_search.fit(train_X, train_y)
params = grid_search.best_params_
print("best_paramerters is ",params)
print("best_score is ", grid_search.best_score_)
pred = grid_search.predict(val_X)
print("accuracy is ", accuracy_score(pred, val_y))
上のgrid_searchのparamsに最適なパラメータが格納されているので、シンプルにSVCを再実装して、予測を行うこともできます。
svc = SVC(**params) svc.fit(train_X, train_y) pred = svc.predict(val_X) print(accuracy_score(pred, val_y))
グリッドサーチ×1つ抜き交差検証法
言うまでもなく、1つ抜き交差検証法はデータ数が大きくなると膨大な時間がかかる。
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, LeaveOneOut
iris = load_iris()
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, test_size=0.2, random_state=0)
params_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
'gamma':[0.001, 0.01, 0.1, 1, 10, 100]}
loo = LeaveOneOut()
grid_search = GridSearchCV(SVC(), params_grid, cv=loo)
grid_search = grid_search.fit(train_X, train_y)
params = grid_search.best_params_
print("best_paramerters is ",params)
print("best_score is ", grid_search.best_score_)
pred = grid_search.predict(val_X)
print("accuracy is ", accuracy_score(pred, val_y))
参考
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html