How to Hypertune LightGBM model parameters to get the best accuracy?

In this tutorial, you are going to learn:

1. How to import the LightGBM libraries?

2. How to download the dataset?

3. How to split the dataset into training and testing sets?

4. How to set parameters for the LightGBM model?

5. How to hyper-tune the parameters?

6. How to train a LightGBM model?

7. How to use “RandomizedSearchCV” to find the best parameters?

8. How to find the best parameters?

9. How to create a LightGBM model with hyper-tuned parameters?

 

Hypertuning Parameters

 

Hyper-tuning means tweaking the parameters of the model to get better predictions and accuracy. To get good results from a LightGBM model, the following parameters should be tuned (a short sketch of how to set them follows the list).

1. “num_leaves” : Controls the model’s complexity. Too high a value results in over-fitting.

2. “min_data_in_leaf” : Controls over-fitting by requiring a minimum number of samples per leaf.

3. “max_depth” : Controls the depth of each tree.
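A minimal sketch of how these parameters are passed to the scikit-learn wrapper; the values are arbitrary placeholders, not recommendations. Note that “min_data_in_leaf” is exposed as “min_child_samples” in this API.

import lightgbm as lgb

# Illustrative values only -- not tuned recommendations.
model = lgb.LGBMClassifier(
    num_leaves=31,         # controls model complexity; too high over-fits
    min_child_samples=20,  # sklearn alias for min_data_in_leaf
    max_depth=7,           # caps the depth of each tree
)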

Parameters For Accuracy

1. “max_bin” : This parameter should be large.

2. “learning_rate” : This parameter should be small (with more boosting rounds).

3. “num_leaves” : This parameter should be large.

Parameters to Avoid Over-fitting

1. “max_bin” : This parameter should be small.

2. “lambda_l1” : Increase this parameter for stronger L1 regularization.

3. “lambda_l2” : Increase this parameter for stronger L2 regularization.

4. “extra_trees” : This parameter should be enabled, as shown in the sketch below.
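A minimal sketch contrasting the two directions; all values are illustrative placeholders, and “lambda_l1”/“lambda_l2” appear as “reg_alpha”/“reg_lambda” in the scikit-learn API.

# Leaning toward accuracy: finer bins, small learning rate, more leaves.
accurate_model = lgb.LGBMClassifier(
    max_bin=512,
    learning_rate=0.01,
    num_leaves=127,
    n_estimators=1000,
)

# Leaning toward regularization: coarser bins, L1/L2 penalties, extra trees.
regularized_model = lgb.LGBMClassifier(
    max_bin=63,
    reg_alpha=1.0,       # lambda_l1
    reg_lambda=1.0,      # lambda_l2
    extra_trees=True,
)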

1. Import the Libraries

 
The first step is to import all the necessary libraries.
import lightgbm as lgb
import numpy as np 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

2. Download Dataset

 

We are going to download the Breast Cancer dataset from the Scikit-Learn datasets module.

X, y = load_breast_cancer(return_X_y=True)

3. Split Data

 

Once the data is downloaded, we need to split it into training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

4. Set Parameters for LightGBM model

 

Parameters can be set for the LightGBM model. We are specifying the following parameters:

1. “early_stopping_rounds” : To stop training early and avoid over-fitting.

2. “eval_metric” : To specify the evaluation metric.

3. “eval_set” : To set the validation dataset.

4. “verbose” : To print the evaluation results while training the model.

5. “categorical_feature” : To specify the categorical columns in the dataset.

parameters={"early_stopping_rounds":20, 
            "eval_metric" : 'auc', 
            "eval_set" : [(X_test,y_test)],
            'eval_names': ['valid'],
            'verbose': 100,
            'categorical_feature': 'auto'}
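Note: these fit() keyword arguments match older LightGBM releases. In LightGBM 4.x the scikit-learn fit() no longer accepts “early_stopping_rounds” or “verbose”; if the dictionary above raises a TypeError on your version, an equivalent setup with callbacks should look like this:

parameters={"eval_metric" : 'auc',
            "eval_set" : [(X_test,y_test)],
            'eval_names': ['valid'],
            'categorical_feature': 'auto',
            'callbacks': [lgb.early_stopping(stopping_rounds=20),  # replaces early_stopping_rounds
                          lgb.log_evaluation(period=100)]}         # replaces verbose=100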

5. Create Parameters to Tune

 

We are specifying the parameters we want tuned. Giving each parameter a range or distribution lets the search find the best values.

parameter_tuning ={
             'max_depth': sp_randint(10,50),
             'num_leaves': sp_randint(6, 50), 
             'learning_rate': [0.1,0.01,0.001],
             'min_child_samples': sp_randint(100, 500), 
             'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
             'subsample': sp_uniform(loc=0.2, scale=0.8), 
             'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
             'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
             'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100]}

6. Model Training 

 

We are creating an instance of the LightGBM Classifier model. We are going to use the “RandomizedSearchCV” method to search over the tuning parameters; because refit=True, it will return the best model it found.

classifier = lgb.LGBMClassifier(random_state=300, silent=True, metric='None', n_jobs=4, n_estimators=5000)

find_parameters = RandomizedSearchCV(
    estimator=classifier, param_distributions=parameter_tuning, 
    n_iter=100,
    scoring='roc_auc',
    cv=5,
    refit=True,
    random_state=300,
    verbose=False)

7. Fit Parameters

find_parameters.fit(X_train, y_train, **parameters)
print('Best score : {} with parameters: {} '.format(find_parameters.best_score_, find_parameters.best_params_))

8. Best Parameters

best_parameters = find_parameters.best_params_
best_parameters

9. LightGBM Model with Tuned Parameters 

# Build a fresh classifier from the best parameters found by the search.
best_parameters_model = lgb.LGBMClassifier(**best_parameters)
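As a quick check (a small sketch using the train/test split from step 3 and the metrics module imported in step 1), fit the tuned model and score it on the held-out test set:

best_parameters_model.fit(X_train, y_train)
y_pred = best_parameters_model.predict(X_test)
y_proba = best_parameters_model.predict_proba(X_test)[:, 1]
print('Test accuracy:', metrics.accuracy_score(y_test, y_pred))
print('Test AUC:', metrics.roc_auc_score(y_test, y_proba))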

 Summary

 

1. “num_leaves” : Controls the model’s complexity and the number of leaves in each tree. Too high a value results in over-fitting.

2. “min_data_in_leaf” : Controls over-fitting in the tree.

3. “max_depth” : Controls the depth of the tree.

4. “learning_rate” : Specifies the learning rate.

5. “early_stopping_rounds” : Stops training early to avoid over-fitting.

6. “eval_metric” : Specifies the evaluation metric.

7. “eval_set” : Sets the validation dataset.

8. “verbose” : Prints the evaluation results while training the model.

9. “categorical_feature” : Specifies the categorical columns in the dataset.

10. RandomizedSearchCV() : Finds the best parameters.

