How to Maintain Model Effectiveness After Deployment
When we are ready to deploy a predictive model that gives a good accuracy score on the training and testing datasets, there is one more problem to solve: how long will this model keep solving the problem with the same high accuracy, and what is the strategy to maintain that accuracy? We also need to know what action to take when the model's effectiveness is on a declining trend. In this article, I am sharing my strategy for validating and maintaining predictive model effectiveness. There are two things we can do before deploying a model in production.
First, if possible, add a low percentage of negative test data as part of the model input. Negative testing is a method of testing an application that ensures the model handles unwanted input the way it should. For example, consider a recommendation model that selects potential customers from a large customer dataset to call for marketing purposes, based on customer attributes. In this model, including customers expected to receive a low recommendation score alongside the high-probability customers helps us validate the model's effectiveness. A good model will have high accuracy on the positive data and, at the same time, give a low score to the ingested negative data. A high success rate on the negative dataset is a good indication that the current model's training data should be re-evaluated. Getting good accuracy on the negative data, or low accuracy on the rest of the data (excluding the negative sample), is an indication that the model has a problem.
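As a rough illustration, here is a minimal sketch of such a check for a scikit-learn-style classifier; `model`, `X_negative`, and the 0.5 threshold are placeholders, not something defined in this article.

```python
import numpy as np

# Minimal sketch: `model` is any fitted binary classifier with a
# scikit-learn-style predict_proba, and X_negative is a small sample of
# records the model should NOT recommend.
def negative_test_rate(model, X_negative, threshold=0.5):
    """Fraction of negative-test records that still get a high score.

    A value near 0 is what we want; a high value suggests the training
    data should be re-evaluated.
    """
    scores = model.predict_proba(X_negative)[:, 1]  # probability of "recommend"
    return float(np.mean(scores >= threshold))

# Usage (placeholder names):
# if negative_test_rate(model, X_negative) > 0.2:
#     print("Too many negative-test customers scored highly; re-check the training data.")
```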
Second, I recommend developing an autoencoder (AE) model on the training data of our deployed model. I highly recommend developing this AE model before we deploy the model into production. Using anomaly detection techniques, pass the recent model input data through the AE to get the reconstruction error value. A high reconstruction error indicates that the input deviates from the original training data.
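One possible way to build such an AE model, sketched here with Keras, assuming the deployed model's input features are already numeric and scaled; `X_train` and `X_recent` are placeholder names, not from the article.

```python
import numpy as np
import tensorflow as tf

# Sketch of a small dense autoencoder trained on the same (scaled, numeric)
# feature matrix the deployed model was trained on.
n_features = X_train.shape[1]          # X_train: original training features (placeholder)

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=20, batch_size=64, verbose=0)

def reconstruction_error(ae, X):
    """Mean squared reconstruction error per record."""
    reconstructed = ae.predict(X, verbose=0)
    return np.mean((X - reconstructed) ** 2, axis=1)

# Compare the baseline error on training data with the error on recent input.
baseline = reconstruction_error(autoencoder, X_train).mean()
recent = reconstruction_error(autoencoder, X_recent).mean()   # X_recent: placeholder
print(f"baseline={baseline:.4f}, recent={recent:.4f}")
```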
Based on the business objective, from the recent model results, get the accuracy score on the positive data and on the negative data, and get the reconstruction error value from the AE model. With these three values, we can evaluate the effectiveness of the model. Let us see the possible actions we can take related to model effectiveness.
|                     | High Reconstruction Error, High Accuracy Negative Test | Low Reconstruction Error, High Accuracy Negative Test | High Reconstruction Error, Low Accuracy Negative Test | Low Reconstruction Error, Low Accuracy Negative Test |
|---------------------|---|---|---|---|
| High Model Accuracy | Retrain with new data | Retrain with new data | No Action Needed | No Action Needed |
| Low Model Accuracy  | Redevelop the model from scratch | Redevelop the model from scratch | Tune the model with new features | Tune the model with new features |
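The table above can also be written as a small lookup function. This is only a sketch: the thresholds are illustrative placeholders that would come from the business objective and the AE model's baseline error, not values given in this article.

```python
ACTION_TABLE = {
    # (high_model_accuracy, high_negative_test_accuracy, high_reconstruction_error): action
    (True,  True,  True):  "Retrain with new data",
    (True,  True,  False): "Retrain with new data",
    (True,  False, True):  "No Action Needed",
    (True,  False, False): "No Action Needed",
    (False, True,  True):  "Redevelop the model from scratch",
    (False, True,  False): "Redevelop the model from scratch",
    (False, False, True):  "Tune the model with new features",
    (False, False, False): "Tune the model with new features",
}

def next_action(model_accuracy, negative_test_accuracy, recon_error,
                acc_threshold=0.8, neg_threshold=0.5, recon_threshold=0.05):
    """Map the three monitoring signals to an action from the table above."""
    key = (model_accuracy >= acc_threshold,
           negative_test_accuracy >= neg_threshold,
           recon_error >= recon_threshold)
    return ACTION_TABLE[key]

# Example: low positive accuracy, negative test behaving, data unchanged.
# print(next_action(0.62, 0.10, 0.01))  # -> "Tune the model with new features"
```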
Action 1: Retrain the model with new data
When the model produces a high accuracy score but the negative test accuracy is also significantly high, get the reconstruction error value from the AE model with recent data. If the reconstruction error value is low, the data has not changed much. The next best action is to retrain the model with recent data.
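A minimal sketch of this retraining step, assuming a scikit-learn-style model; `X_train`, `y_train`, `X_recent`, and `y_recent` are placeholder arrays for the original and the recently collected, labelled data.

```python
import numpy as np
from sklearn.base import clone

# Sketch: retrain the same algorithm on the original training data plus
# recently collected, labelled production data (all names are placeholders).
retrained = clone(model)                         # same estimator type and hyper-parameters
X_combined = np.vstack([X_train, X_recent])
y_combined = np.concatenate([y_train, y_recent])
retrained.fit(X_combined, y_combined)
```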
Action 2: Retrain the model with additional features
When the model produces a low accuracy score on both the positive and negative datasets, and the reconstruction error value is low, this indicates that the data has not changed much but the model needs to be tuned with new features. In this case, you already have the data you need, but retraining the same model on the same data is not going to help. The action is to develop new features through a feature-engineering step and retrain the model with the additional features. Remember to preserve the features from the original model.
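A minimal sketch of this step, assuming a pandas DataFrame of recent data; the engineered columns and frame names here are purely hypothetical examples, not features from the article.

```python
import pandas as pd
from sklearn.base import clone

# Sketch: keep the original features, add new engineered ones, and retrain.
# All column names and frames here are hypothetical.
df = df_recent.copy()
df["calls_per_month"] = df["total_calls"] / df["tenure_months"].clip(lower=1)
df["spend_trend"] = df["spend_last_3m"] - df["spend_prev_3m"]

original_features = list(feature_columns)        # features the deployed model already uses
all_features = original_features + ["calls_per_month", "spend_trend"]

retuned_model = clone(model)
retuned_model.fit(df[all_features], df["label"])
```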
Action 3: Develop a new model from scratch
When the model produces a low accuracy score on recent positive data, the negative test accuracy is significantly high, and the reconstruction error value is also high for the recent input data, it is a clear indication that the recent input data is much different from, and carries new features compared to, the data the model was originally trained on. The next best action is to repeat the feature-extraction process, then build and train a new model from scratch.
Final thoughts
· How often we need to validate the model depends on how frequently the model is consumed and on the rate at which the base data changes over time. Understanding the business problem will help determine the frequency of validating the model. When deploying the model, having a plan to validate it is good practice.
· Even though getting feedback directly from users and incorporating it into the model is great, in practice timely feedback is hard to get, and auto-tuning the model on it is challenging to implement. Try to derive the model's results from the business results, for instance the call success rate (call-turned-to-customer) of a recommendation model computed from the call history data, rather than depending solely on users' feedback (see the sketch after this list).
· In practice, ingesting negative test data into the model is not an option for many business problems. In those scenarios, the AE model and the model's average accuracy score over a specific period can be used to validate the effectiveness of the model.
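For the business-result check mentioned above, here is a rough sketch of deriving the success rate from a hypothetical `call_history.csv`; the file and its columns are assumptions for illustration only.

```python
import pandas as pd

# Sketch: judge the recommendation model from business outcomes in the call
# history rather than direct user feedback. File and column names are assumptions.
calls = pd.read_csv("call_history.csv")  # columns: customer_id, recommended, turned_to_customer

success_rate = calls.loc[calls["recommended"] == 1, "turned_to_customer"].mean()
baseline_rate = calls.loc[calls["recommended"] == 0, "turned_to_customer"].mean()
print(f"Recommended customers converted: {success_rate:.1%} vs. others: {baseline_rate:.1%}")
```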