How to Maintain Model Effectiveness After Deployment



When we are ready to deploy a predictive model that gives a good accuracy score on the training and testing datasets, there is still one more problem to solve: how long will this model keep solving the problem with the same high accuracy, and what is the strategy for maintaining that accuracy? We also need to know what action to take when the model's effectiveness is on a declining trend. In this article, I am sharing my strategy for validating and maintaining a predictive model's effectiveness. There are two things we can do before deploying a model to production.

First, if possible, include a low percentage of negative test data in the data the model scores. Negative testing is a method of testing an application that ensures it handles unwanted input the way it should. For example, consider a recommendation model that selects, from a large customer dataset, the potential customers to call for a marketing campaign. Ingesting a few customers expected to get a low recommendation score along with the high-probability customers will help us validate the model's effectiveness. A good model will have high accuracy on the positive data and, at the same time, give low scores to the ingested negative data. A high success rate on the negative dataset is a good indication that the current model's training data should be re-evaluated. In general, getting good accuracy on the negative data, or low accuracy on the rest of the data (excluding the negative sample), is an indication that the model has problems.
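To make this concrete, here is a minimal sketch of such a check, assuming a scikit-learn-style classifier; the model object, the threshold, and the X_negative batch are hypothetical placeholders:

```python
import numpy as np

def negative_test_rate(model, X_negative, threshold=0.5):
    """Fraction of known-negative records the model still scores above the threshold.

    A high rate suggests the current training data should be re-evaluated.
    """
    scores = model.predict_proba(X_negative)[:, 1]  # probability of the positive class
    return float(np.mean(scores >= threshold))
```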

Second, I recommend developing an autoencoder (AE) model on the same training data used for our deployed model, and I highly recommend building this AE model before the model goes into production. Using anomaly detection techniques, pass the recent model input data through the AE to get a reconstruction error value. A high reconstruction error indicates that the input deviates from the original training data.
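A minimal sketch of such an AE model, assuming the features are numeric and already scaled; the Keras stack and the layer sizes are illustrative choices, not a prescription:

```python
import numpy as np
from tensorflow import keras

def fit_autoencoder(X_train, bottleneck=8, epochs=50):
    """Train a small dense autoencoder on the same features the deployed model was trained on."""
    n_features = X_train.shape[1]
    ae = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(bottleneck, activation="relu"),    # compressed representation
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(n_features, activation="linear"),  # reconstruct the input
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X_train, X_train, epochs=epochs, batch_size=64, verbose=0)
    return ae

def reconstruction_error(ae, X_recent):
    """Mean squared reconstruction error; a high value means the recent data deviates from the training data."""
    reconstructed = ae.predict(X_recent, verbose=0)
    return float(np.mean(np.square(X_recent - reconstructed)))
```

In practice, record the typical reconstruction error on held-out training data and treat recent batches whose error is well above that baseline as a drift signal.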

Based on the business objective, take the recent model results and compute the accuracy score on the positive data and on the negative data, and get the reconstruction error value from the AE model. With these three values, we can evaluate the effectiveness of the model. Let us see the possible actions we can take related to model effectiveness:

 

| Model Accuracy (positive data) | Negative Test Accuracy | Reconstruction Error | Action |
|---|---|---|---|
| High | Low | Low | No action needed |
| High | High | Low | Retrain with new data |
| Low | Low | Low | Tune the model with new features |
| Low | High | High | Redevelop the model from scratch |

Action 1: Retrain the model with new data

When the model produces a high accuracy score but the negative-test accuracy is also significantly high, get the reconstruction error value from the AE model for the recent data. If the reconstruction error value is low, the data has not changed much, and the next best action is to retrain the model with the recent data.
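A minimal sketch of this step, assuming a scikit-learn-style estimator; the old and recent arrays are hypothetical placeholders:

```python
import numpy as np

def retrain_with_recent_data(model, X_old, y_old, X_recent, y_recent):
    """Refit the existing model on the original data plus the recently labeled data."""
    X = np.concatenate([X_old, X_recent])
    y = np.concatenate([y_old, y_recent])
    model.fit(X, y)  # same features and algorithm, fresher data
    return model
```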

Action 2: Retrain the model with additional features

When the model produces a low accuracy score on both the positive and the negative datasets, and the reconstruction error value is low, the data has not changed much, but the model needs to be tuned with new features. In this case, you already have the data you need, and retraining the same model on the same data is not going to help. The action is to develop new features through a feature-engineering step and retrain the model with those additional features. Remember to preserve the features from the original model.
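For illustration, a minimal sketch of that feature-engineering step; the column names are hypothetical:

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with new derived columns; every original feature is preserved."""
    enriched = df.copy()
    enriched["calls_per_month"] = enriched["total_calls"] / enriched["tenure_months"].clip(lower=1)
    enriched["is_new_customer"] = (enriched["tenure_months"] < 6).astype(int)
    return enriched
```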

Action 3: Develop a new model from scratch

When the model produces a low accuracy score on recent positive data, the negative-test accuracy is significantly high, and the reconstruction error value for the recent input data is also high, it is a clear indication that the recent input data is very different from, and carries new features compared with, the data the model was originally trained on. The next best action is to repeat the feature-extraction process, then build and train a new model from scratch.
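Putting the table and the three actions together, here is a minimal sketch of the decision logic; the thresholds are placeholders that should come from the business objective:

```python
def next_action(model_accuracy, negative_test_accuracy, reconstruction_err,
                acc_threshold=0.8, neg_threshold=0.2, recon_threshold=0.05):
    """Map the three monitoring signals to the maintenance actions described above."""
    high_acc = model_accuracy >= acc_threshold
    high_neg = negative_test_accuracy >= neg_threshold  # negative data is being scored high (bad)
    high_recon = reconstruction_err >= recon_threshold  # recent data deviates from training data

    if high_acc and not high_neg and not high_recon:
        return "No action needed"
    if high_acc and high_neg and not high_recon:
        return "Retrain with new data"                 # Action 1
    if not high_acc and not high_neg and not high_recon:
        return "Tune the model with new features"      # Action 2
    if not high_acc and high_neg and high_recon:
        return "Redevelop the model from scratch"      # Action 3
    return "Review manually"  # combinations not covered above
```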

Final thoughts

·        How often we need to validate the model depends on how frequently the model is consumed and on the rate at which the underlying data changes over time. Understanding the business problem will help determine the frequency of validating the model. Having a plan to validate the model in place when you deploy it is good practice.

·        Getting feedback directly from users and incorporating it into the model is great, but in practice timely feedback is hard to obtain, and auto-tuning the model on it is challenging to implement. Try to derive the model's results from business results instead; for instance, for a recommendation model, use the call success rate (call-turned-to-customer) from the call history data rather than depending solely on users' feedback (see the sketch after this list).

·        In practice, ingesting negative test data into the model is not an option for many business problems. In those scenarios, the AE model and the model's average accuracy score over a specific period can be used to validate its effectiveness.
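For the second point above, a minimal sketch of deriving that success rate from call history; the call_history DataFrame and its column names are hypothetical:

```python
import pandas as pd

def call_success_rate(call_history: pd.DataFrame, recommended_ids) -> float:
    """Share of recommended customers whose call turned into a customer, taken from
    call history rather than from direct user feedback."""
    calls = call_history[call_history["customer_id"].isin(recommended_ids)]
    if calls.empty:
        return float("nan")
    return float(calls["became_customer"].mean())
```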

Thanks
