
How to retrain a saved LightGBM model?


In this tutorial, you are going to learn:

 

1. How to import the LightGBM libraries?

2. How to download the dataset?

3. How to explore the dataset?

4. How to split the dataset into training and testing?

5. How to set parameters for the LightGBM model?

6. How to create a dataset for the LightGBM model?

7. How to implement a LightGBM model for multi-class classification?

8. How to calculate the classification report for the trained model?

9. How to save a trained LightGBM model?

10. How to find the current iteration of the model?

11. How to retrain a LightGBM model?

12. How to find feature importance in the LightGBM model?

 

1. Import the Libraries

 
The first step is to import all the necessary libraries.
import lightgbm as lgb
import numpy as np 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report

2. Download Dataset

 

We are going to load the Digits dataset that ships with scikit-learn.

digits = load_digits()
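
Before plotting, it can help to check what load_digits( ) returns. The snippet below is a minimal sketch that prints the array shapes and the set of labels; the variable name classes is only for illustration.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image flattened into 64 features.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# The labels are the digits 0 through 9, so there are 10 classes.
classes = sorted(set(digits.target.tolist()))
print(classes)
```

The 10 distinct labels are the reason num_class is set to 10 later in the tutorial.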

3. Explore Dataset

 

We are going to plot the first five images in the dataset together with their labels.

plt.figure(figsize=(4,4))
for value, (images, labels) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, value + 1)
    plt.imshow(np.reshape(images, (8,8)))
    plt.title('%i\n' % labels)

4. Splitting Data

 

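Next we split the data. We hold out a small test set, then divide the remaining samples into two halves: one for the initial training and one for retraining later.

# Hold out 1% of the samples as a test set.
X_rest, X_test, y_rest, y_test = train_test_split(digits.data, digits.target, test_size=0.01, random_state=0)
# Split the rest in half. X_test_2 holds the labels that belong to X_train_2.
X_train, X_train_2, y_train, X_test_2 = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)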
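
When one dataset is split into a test set and two training parts, it is worth checking that no sample ends up in more than one part. The following is a small sketch of such a check on made-up data; all names in it (X_first, X_second and so on) are hypothetical and not part of the tutorial's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples whose single feature is the sample's own index.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100) % 10

# Step 1: hold out a test set.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Step 2: split what remains into a first-round part and a retraining part.
X_first, X_second, y_first, y_second = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

first_ids = set(X_first.ravel().tolist())
second_ids = set(X_second.ravel().tolist())
test_ids = set(X_test.ravel().tolist())

print(len(first_ids), len(second_ids), len(test_ids))  # 40 40 20

# True when no sample appears in more than one subset.
no_overlap = (first_ids.isdisjoint(second_ids)
              and first_ids.isdisjoint(test_ids)
              and second_ids.isdisjoint(test_ids))
print(no_overlap)
```

Splitting the second time from what is left after the first split, rather than from the full dataset again, is what keeps the test samples out of both training parts.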

5. Set Parameters for LightGBM model

 

Parameters for the LightGBM model are passed as a dictionary. We are specifying the following parameters:

1. “learning_rate” : To specify the learning rate in the LightGBM model.

2. “boosting_type” : To specify the boosting algorithm. 'gbdt' is gradient boosted decision trees.

3. “objective” : To set the learning task. We use 'multiclass' for multi-class classification.

4. “metric” : To specify the loss metric. 'multi_logloss' is the multi-class log loss.

5. “max_depth” : To specify the maximum depth of a tree in the LightGBM model.

6. “num_class” : To specify the number of classes in the dataset.

params={}
params['learning_rate']=0.04
params['boosting_type']='gbdt' 
params['objective']='multiclass'
params['metric']='multi_logloss'
params['max_depth']=10
params['num_class']=10 
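
The same parameters can also be written as one dictionary, with num_class derived from the labels instead of typed by hand. This is a sketch on made-up labels; it assumes the labels are integers starting at 0, which is what LightGBM's multiclass objective expects.

```python
import numpy as np

# Made-up labels standing in for digits.target.
labels = np.array([0, 3, 9, 3, 1, 0, 7])

params = {
    "learning_rate": 0.04,
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "metric": "multi_logloss",
    "max_depth": 10,
}

# Multiclass labels must lie in [0, num_class), so the largest label plus one
# is the smallest valid value for num_class.
params["num_class"] = int(labels.max()) + 1
print(params["num_class"])  # 10
```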

6. Create LightGBM Dataset

 

We can create a LightGBM dataset by using the Dataset( ) class. The arguments we pass to it are

1. The features columns.

2. The label column.

training_dataset=lgb.Dataset(X_train, label=y_train)
testing_dataset=lgb.Dataset(X_test, label=y_test)
# X_test_2 holds the labels for X_train_2 (see the split in step 4).
retrain_dataset=lgb.Dataset(X_train_2, label=X_test_2)

7. Model Training and Prediction 

 

We are going to train our LightGBM model on the training dataset. The third argument to train( ) is the number of boosting rounds.

classifier=lgb.train(params,training_dataset,100)
y_predictions=classifier.predict(X_test)
y_predictions[:5]

8. Rounding Predictions

 

The model's predictions are class probabilities, one value per class for each sample. To get the predicted label we use np.argmax( ), which returns the index of the largest probability.

y_predictions_2 = [np.argmax(value) for value in y_predictions]
y_predictions_2[:10]
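
The same result can be obtained without a Python loop by passing axis=1 to np.argmax( ). Here is a minimal sketch on made-up probabilities for 3 samples and 4 classes.

```python
import numpy as np

# Made-up probabilities: 3 samples (rows), 4 classes (columns).
probabilities = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# axis=1 picks the column index of the largest value in every row.
labels = np.argmax(probabilities, axis=1)
print(labels)  # [1 0 3]
```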

9. Classification Report

# classification_report( ) expects the true labels first, then the predictions.
print(classification_report(y_test, y_predictions_2))
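
classification_report( ) takes the true labels as its first argument and the predicted labels as its second. The sketch below, on made-up labels, shows that swapping the two also swaps precision and recall in the report.

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]  # made-up ground truth
y_pred = [0, 1, 1, 1, 1]  # made-up predictions

correct = classification_report(y_true, y_pred, output_dict=True)
swapped = classification_report(y_pred, y_true, output_dict=True)

# Precision and recall for class 0 trade places when the arguments are swapped.
print(correct["0"]["precision"], correct["0"]["recall"])  # 1.0 0.5
print(swapped["0"]["precision"], swapped["0"]["recall"])  # 0.5 1.0
```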

10. Save LightGBM Model  

 

The trained model can be saved using the save_model( ) method. 

classifier.save_model('lightgbclassifier.txt')

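11. Model Current Iteration

To find out the number of iterations for which our model has been trained, we can use the current_iteration( ) method. Here we first load the saved model back into a Booster.

model = lgb.Booster(model_file='lightgbclassifier.txt')  # load the saved model
print("Saved model iteration# %d" % model.current_iteration())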

12. Retrain Model

To retrain a model, we need the saved model, either loaded as a Booster or given as the path to the saved file. We then call train( ) again with the new data, which creates a new model instance.

The “init_model” parameter tells train( ) to continue training from the existing model. In this way the new model instance keeps the iteration history of the old one, so its iteration count is the old count plus the new rounds.

classifier_retrain = lgb.train(params, retrain_dataset, num_boost_round=100,
                               init_model='lightgbclassifier.txt')  # continue from the saved model file
print("Retrained Model Iteration# %d" % classifier_retrain.current_iteration())

13. Feature Importance

We can plot the feature importance of the model using the plot_importance( ) function.

fig, ax = plt.subplots(figsize=(20, 10))
lgb.plot_importance(classifier_retrain, ax=ax)

Summary

 

1. load_digits( ) : To load the Digits dataset.

2. train_test_split( ) : To split the dataset into training and testing.

3. imshow( ) : To display sample images from the dataset.

4. “learning_rate” : Parameter to specify the learning rate in the LightGBM model.

5. “objective” : Parameter to set binary or multi-class classification in the LightGBM model.

6. “metric” : Parameter to specify the loss metric.

7. “max_depth” : Parameter to specify the maximum depth of a tree in the LightGBM model.

8. “num_class” : Parameter to specify the number of classes in the dataset.

9. Dataset( ) : To create a LightGBM dataset.

10. train( ) : To train a LightGBM model.

11. predict( ) : To predict values from a trained model.

12. classification_report( ) : To find the classification report of a trained model.

13. save_model( ) : To save a trained model.

14. current_iteration( ) : To find the current iteration of the model.

15. “init_model” : Parameter of train( ) to continue training from an existing model.

16. plot_importance( ) : To plot feature importance in the LightGBM model.

You can find the GitHub link here.
