Grid Search

Grid search is a standard technique for tuning hyper-parameters, the parameters that are not learnt directly within estimators. We will compare SVM models for different C and gamma values using grid search.
Refer to the blog on SVM if you want to learn more about Support Vector Machines.
We will dive straight into a practical example: a breast cancer classification problem, optimizing the classifier with grid search.
The task is to detect breast cancer using the scikit-learn breast cancer dataset.
Reference: Breast Cancer Dataset
Load Data
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
The dataset is returned as a dictionary-like object, so let's check the keys present in it.
cancer.keys()
Out:
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
The full description is available under the DESCR key, as also mentioned in the scikit-learn documentation. You can execute the command below to print it.
print(cancer['DESCR'])
The feature_names key lists the names of all the features present in the data.
cancer['feature_names']
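As a quick sanity check we can also look at the shape of the data and the class labels:
print(cancer['data'].shape)    # (569, 30): 569 samples, 30 features
print(cancer['target_names'])  # ['malignant' 'benign']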
Set up DataFrames for features and target values
df_feat = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df_target = pd.DataFrame(cancer['target'], columns=['Cancer'])
Train Test Split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target),
                                                    test_size=0.30, random_state=101)
Model Training
from sklearn.svm import SVC
model = SVC()
model.fit(X_train,y_train)
Out:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
Predictions and Evaluations
predictions = model.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))
Out:
[[ 0 66]
[ 0 105]]
Print classification report:
print(classification_report(y_test,predictions))
Out:
precision recall f1-score support
0 0.00 0.00 0.00 66
1 0.61 1.00 0.76 105
avg / total 0.38 0.61 0.47 171
The model is effectively useless: every sample has been classified as class 1. Either the model's hyper-parameters need to be adjusted or the data has to be normalized.
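For a single-number view of this baseline, the estimator's built-in score method reports accuracy:
print(model.score(X_test, y_test))  # about 0.61, i.e. 105/171, the majority-class rate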
Applying Gridsearch
First, let's apply grid search without any normalization. The SVC model has two main hyper-parameters, C and gamma, and we will use a standard set of starting values for them. Since grid search evaluates every combination of candidate values, it is an expensive operation, so we should pass only a limited set of values. If the best score is still improving at the highest or lowest value of a parameter's range, we should then run a second search over values beyond that initial range (demonstrated after the first search below).
param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']}

from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)
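To get a feel for the cost, we can count the fits ourselves (a rough sketch, assuming the 3-fold cross-validation that this scikit-learn version uses by default when cv is not given):
from itertools import product

# every (C, gamma, kernel) combination is fitted once per cross-validation fold
n_candidates = len(list(product(*param_grid.values())))
print(n_candidates * 3)  # 25 candidates x 3 folds = 75 fits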
Now we need to fit the model. What fit does here is a bit more involved than usual. First, it runs cross-validation for every parameter combination to find the best one. Once it has the best combination, it runs fit again on all the data passed to it (without cross-validation), to build a single new model using the best parameter setting.
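Conceptually, fit with refit=True does something roughly equivalent to the following sketch (a simplification, not the actual implementation):
from sklearn.model_selection import cross_val_score

# step 1: score every parameter combination with cross-validation
best_score, best_params = -1.0, None
for C in param_grid['C']:
    for gamma in param_grid['gamma']:
        scores = cross_val_score(SVC(C=C, gamma=gamma, kernel='rbf'), X_train, y_train)
        if scores.mean() > best_score:
            best_score, best_params = scores.mean(), {'C': C, 'gamma': gamma}

# step 2: refit one model on all the training data with the winning parameters
best_model = SVC(kernel='rbf', **best_params).fit(X_train, y_train)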
grid.fit(X_train,y_train)
Out:
GridSearchCV(cv=None, error_score='raise',
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False),
fit_params=None, iid=True, n_jobs=1,
param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=3)
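The per-combination scores are kept in the cv_results_ attribute; for example, we can list the mean cross-validated score of every candidate:
# mean cross-validated score for each parameter combination tried
for params, score in zip(grid.cv_results_['params'], grid.cv_results_['mean_test_score']):
    print(params, round(score, 3))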
Inspect the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best_estimator_ attribute:
grid.best_params_
Out:
{'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}
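Note that gamma = 0.0001 sits at the lower edge of the range we tested, so, following the strategy described earlier, it would be worth probing beyond it. A sketch of such a refined second search (param_grid_refined and grid_refined are hypothetical names; results not shown here):
param_grid_refined = {'C': [5, 10, 50], 'gamma': [0.001, 0.0001, 0.00001], 'kernel': ['rbf']}
grid_refined = GridSearchCV(SVC(), param_grid_refined, refit=True, verbose=3)
grid_refined.fit(X_train, y_train)
print(grid_refined.best_params_)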
Display the full settings of the best estimator.
grid.best_estimator_
Out:
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
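GridSearchCV also records the mean cross-validated score achieved by this best combination in the best_score_ attribute:
print(grid.best_score_)  # mean cross-validated accuracy of the best combination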
Re-run the predictions using this grid object and print the confusion matrix.
grid_predictions = grid.predict(X_test)
print(confusion_matrix(y_test,grid_predictions))
Out:
[[ 60 6]
[ 3 102]]
Print classification report.
print(classification_report(y_test,grid_predictions))
Out:
precision recall f1-score support
0 0.95 0.91 0.93 66
1 0.94 0.97 0.96 105
avg / total 0.95 0.95 0.95 171
Inference
We have got much better results with grid search: 162 of the 171 test samples have been correctly predicted across both classes.
Try Data Normalization
Now let's see whether data normalization brings any improvement. It can help when the feature values are on very different scales.
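We can confirm that the scales differ wildly by comparing the ranges of two features (the exact values may vary slightly, but mean area runs into the thousands while mean smoothness stays well below 1):
# two features on very different scales
print(df_feat['mean area'].min(), df_feat['mean area'].max())              # roughly 143 to 2501
print(df_feat['mean smoothness'].min(), df_feat['mean smoothness'].max())  # roughly 0.05 to 0.16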
from sklearn.preprocessing import MinMaxScaler

data = df_feat.copy()
scaler = MinMaxScaler()
print(scaler.fit(data))        # fit the scaler on the feature data
print(scaler.data_max_)        # per-feature maxima learnt by the scaler
print(scaler.transform(data))  # features rescaled to the [0, 1] range
data = scaler.transform(data)
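A quick check that every feature is now squeezed into the [0, 1] range:
print(data.min(), data.max())  # 0.0 1.0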
Train Test Split
X_train, X_test, y_train, y_test = train_test_split(data, np.ravel(df_target), test_size=0.30, random_state=101)
Model Training
model = SVC()
model.fit(X_train,y_train)
Out:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
Run the predictions and print the confusion matrix.
predictions = model.predict(X_test)
print(confusion_matrix(y_test,predictions))
Out:
[[ 55 11]
[ 0 105]]
Print classification report.
print(classification_report(y_test,predictions))
Out:
precision recall f1-score support
0 1.00 0.83 0.91 66
1 0.91 1.00 0.95 105
avg / total 0.94 0.94 0.93 171
Inference
A total of 160 correct predictions, just 2 short of the earlier grid search result, so marginally worse.
Grid Search Over Normalized Data
We can now apply grid search over the normalized data and see whether that yields any further improvement.
grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)
grid.fit(X_train,y_train)
Out:
GridSearchCV(cv=None, error_score='raise',
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False),
fit_params=None, iid=True, n_jobs=1,
param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf']},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=3)
Display the best parameters.
grid.best_params_
Out:
{'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}
Display the best estimator values.
grid.best_estimator_
Out:
SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
Make the new predictions and print the confusion matrix.
grid_predictions = grid.predict(X_test)
print(confusion_matrix(y_test,grid_predictions))
Out:
[[ 63 3]
[ 1 104]]
Print the classification report.
print(classification_report(y_test,grid_predictions))
Out:
precision recall f1-score support
0 0.98 0.95 0.97 66
1 0.97 0.99 0.98 105
avg / total 0.98 0.98 0.98 171
Inference
Grid search over the normalized data has given the best results: 167 out of 171 test samples have been correctly predicted.
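One caveat worth noting: above, the scaler was fitted on the full dataset before the train test split, which lets information from the test set leak into the scaling. A cleaner sketch wraps the scaler and the SVM in a scikit-learn Pipeline, so that scaling is refitted inside every cross-validation fold (pipe and grid_pipe are hypothetical names; pass the unscaled features):
from sklearn.pipeline import Pipeline

# scaling happens inside each CV fold, so the test folds never influence the scaler
pipe = Pipeline([('scaler', MinMaxScaler()), ('svc', SVC())])
pipe_param_grid = {'svc__C': [0.1, 1, 10, 100, 1000],
                   'svc__gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                   'svc__kernel': ['rbf']}
grid_pipe = GridSearchCV(pipe, pipe_param_grid, refit=True, verbose=3)
grid_pipe.fit(X_train, y_train)  # X_train should be the unscaled features here
print(grid_pipe.best_params_)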