In this tutorial, you are going to learn:
1. How to import the LightGBM libraries
2. How to download the dataset
3. How to split the dataset into training and testing sets
4. How to set parameters for the LightGBM model
5. How to create the parameters to tune
6. How to train a LightGBM model
7. How to use “RandomizedSearchCV” to find the best parameters
8. How to retrieve the best parameters
9. How to create a LightGBM model with the tuned parameters
Hyperparameter Tuning
Hyperparameter tuning means tweaking the parameters of the model to get better predictions and higher accuracy. To get good results from a LightGBM model, the following parameters should be tuned; a configuration sketch follows these lists.
1. “num_leaves” : This parameter controls the model’s complexity. Too high a value results in overfitting.
2. “min_data_in_leaf” : To control overfitting; larger values make the tree more conservative.
3. “max_depth” : To control the depth of the tree.
Parameters for Accuracy
1. “max_bin” : This parameter should be large.
2. “learning_rate” : This parameter should be small, combined with more boosting rounds.
3. “num_leaves” : This parameter should be large.
Parameters to Avoid Over-fitting
1. “max_bin” : This parameter should be small.
2. “lambda_l1” : This L1 regularization term should be increased.
3. “lambda_l2” : This L2 regularization term should be increased.
4. “extra_trees” : This parameter should be enabled.
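As a rough sketch, these recommendations might translate into a scikit-learn style configuration like the one below. The values are illustrative starting points, not tuned results, and the comments note the core-parameter aliases used in the lists above.

import lightgbm as lgb

# Illustrative starting points only -- not tuned values.
conservative_model = lgb.LGBMClassifier(
    num_leaves=31,           # keep modest to limit model complexity
    max_depth=7,             # cap the depth of the tree
    min_child_samples=50,    # alias of min_data_in_leaf; raise to curb overfitting
    learning_rate=0.01,      # small learning rate for accuracy...
    n_estimators=2000,       # ...compensated by more boosting rounds
    max_bin=255,             # lower this if overfitting, raise it for accuracy
    reg_alpha=1.0,           # alias of lambda_l1 (L1 regularization)
    reg_lambda=1.0,          # alias of lambda_l2 (L2 regularization)
    extra_trees=True,        # extremely randomized splits
)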
1. Import the Libraries
import lightgbm as lgb
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, RandomizedSearchCV, GridSearchCV
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
2. Download Dataset
We are going to download the Breast Cancer dataset from the scikit-learn datasets module.
X, y = load_breast_cancer(return_X_y=True)
3. Split Data
Once the data is loaded, we need to split it into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
4. Set Parameters for LightGBM model
These parameters are passed to the LightGBM model’s fit method. We are specifying the following:
1. “early_stopping_rounds” : Stops training when the validation score does not improve for the given number of rounds, which helps avoid overfitting.
2. “eval_metric” : To specify the evaluation metric.
3. “eval_set” : To set the validation dataset.
4. “verbose” : To print progress while training the model.
5. “categorical_feature” : To specify the categorical columns in the dataset.
parameters={"early_stopping_rounds":20, "eval_metric" : 'auc', "eval_set" : [(X_test,y_test)], 'eval_names': ['valid'], 'verbose': 100, 'categorical_feature': 'auto'}
5. Create Parameters to Tune
We specify the parameters we want to tune. Giving each parameter a range or distribution lets the search sample candidate values and find the best combination.
parameter_tuning = {
    'max_depth': sp_randint(10, 50),
    'num_leaves': sp_randint(6, 50),
    'learning_rate': [0.1, 0.01, 0.001],
    'min_child_samples': sp_randint(100, 500),
    'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
    'subsample': sp_uniform(loc=0.2, scale=0.8),
    'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
    'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
    'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100],
}
6. Model Training
We create an instance of the LightGBM classifier and use the “RandomizedSearchCV” method to search over the tuning distributions. With refit=True, the search keeps the best model it finds.
classifier = lgb.LGBMClassifier(random_state=300, silent=True, metric='None',
                                n_jobs=4, n_estimators=5000)
find_parameters = RandomizedSearchCV(
    estimator=classifier,
    param_distributions=parameter_tuning,
    n_iter=100,
    scoring='roc_auc',
    cv=5,
    refit=True,
    random_state=300,
    verbose=False,
)
7. Fit Parameters
find_parameters.fit(X_train, y_train, **parameters)
print('Best score: {} with parameters: {}'.format(
    find_parameters.best_score_, find_parameters.best_params_))

8. Best Parameters
best_parameters = find_parameters.best_params_
best_parameters

9. LightGBM Model with Tuned Parameters
best_parameters_model = lgb.LGBMClassifier(**best_parameters)
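The tuned model still needs to be fit. As a quick sketch (a follow-up step, not part of the original code), it can be trained with the same fit parameters and scored on the held-out test set using the metrics module imported earlier:

# Train the final model and check the test AUC (illustrative follow-up step).
best_parameters_model.fit(X_train, y_train, **parameters)
y_pred = best_parameters_model.predict_proba(X_test)[:, 1]
print('Test AUC:', metrics.roc_auc_score(y_test, y_pred))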

Summary
1. “num_leaves” : This parameter controls the model’s complexity. Too high a value results in overfitting.
2. “min_data_in_leaf” : To control overfitting in the tree.
3. “max_depth” : To control the depth of the tree.
4. “learning_rate” : To specify the learning rate.
5. “num_leaves” : To specify the number of leaves in the tree.
6. “early_stopping_rounds” : To avoid overfitting.
7. “eval_metric” : To specify the evaluation metric.
8. “eval_set” : To set the validation dataset.
9. “verbose” : To print progress while training the model.
10. “categorical_feature” : To specify the categorical columns in the dataset.
11. RandomizedSearchCV() : To find the best parameters.