Adhang Muntaha


Welcome to my GitHub Page!

This is my data science and machine learning portfolio. It contains some of the projects I used to hone my knowledge and skills.

Telco Customer Churn Prediction

In this project, I designed a predictive model that estimates the probability that a customer of a telco company will leave the service (churn) or keep using it (retain). The model achieves a sensitivity (recall) of 80%.

I followed a workflow based on the CRISP-DM model: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Project Notebook & Presentation

Dataset & Business Understanding

Dataset Information

Attribute Information

Note: The dataset uses CamelCase column names. For this project, I convert them to snake_case.
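The renaming step can be sketched with a small helper. This is a minimal illustration, not the notebook's actual code: the function name `to_snake_case` and the example column names are assumptions chosen for the demo.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase (or camelCase) identifier to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit is
    # immediately followed by an uppercase letter, then lowercase the result.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Hypothetical column names, used only for illustration.
columns = ["SeniorCitizen", "MonthlyCharges", "customerID", "gender"]
renamed = {col: to_snake_case(col) for col in columns}
print(renamed)
```

With pandas, the same helper could be passed directly to `DataFrame.rename(columns=to_snake_case)`, since `rename` accepts a callable for the `columns` argument.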

Company Goals
Increasing profit! But how can we achieve it? Some of the ways to increase profit are:

Problems

Objectives

Exploratory Data Analysis

What Happened?

27% of customers leave us!

Target Distribution

Top 5 Churn Probability


Top 5 Retain Probability


Attribute Associations to Churn Status


Data Preprocessing

I applied several data preprocessing steps, such as:

Model Development & Evaluation

I tried several machine learning algorithms, such as:

Overall, the boosting methods performed well. I then compared several feature selection methods and hyperparameter tuning settings to see whether their performance could be improved further.

My tuning strategy focuses on optimizing recall for the positive class (not the average across classes) to minimize false negatives, that is, customers who actually churn but are predicted as non-churn. The reason is that acquiring new customers costs more than retaining existing ones. However, I still pay attention to the accuracy score as well.
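To make the two metrics concrete, here is a minimal sketch of how positive-class recall and accuracy are computed from predictions. The labels below are toy values made up for illustration (1 = churn, 0 = retain), and the function names are hypothetical.

```python
def positive_recall(y_true, y_pred, positive=1):
    """Recall for the positive class: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels, invented for this example: 1 = churn, 0 = retain.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print("positive recall:", positive_recall(y_true, y_pred))
print("accuracy:", accuracy(y_true, y_pred))
```

In this toy case one churner is missed (a false negative), which lowers recall, while the two false alarms only affect accuracy.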

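For model selection, I use the harmonic mean of accuracy and recall, reported below as F-beta. Note that this differs from the usual F-beta score, which combines precision and recall.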

Model                         Accuracy  Recall  F-beta
Gradient Boosting Classifier  0.775     0.766   0.771
AdaBoost Classifier           0.759     0.783   0.770
CatBoost Classifier           0.761     0.765   0.763
Hist Gradient Boosting        0.756     0.781   0.768
XGBoost                       0.761     0.779   0.770
LightGBM                      0.762     0.791   0.777
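The selection score can be sketched as follows, assuming it is the plain harmonic mean of accuracy and recall described above. The helper name `harmonic_mean` is an assumption. The inputs are the rounded values from the table, so a recomputed score may differ from the reported one in the third decimal.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


# (accuracy, recall) pairs copied from the table above.
scores = {
    "Gradient Boosting Classifier": (0.775, 0.766),
    "AdaBoost Classifier": (0.759, 0.783),
    "CatBoost Classifier": (0.761, 0.765),
    "Hist Gradient Boosting": (0.756, 0.781),
    "XGBoost": (0.761, 0.779),
    "LightGBM": (0.762, 0.791),
}

# Rank models from best to worst by the combined score.
ranked = sorted(scores, key=lambda m: harmonic_mean(*scores[m]), reverse=True)
best = ranked[0]

for model in ranked:
    print(f"{model:30s} {harmonic_mean(*scores[model]):.3f}")
```

The harmonic mean penalizes a large gap between the two metrics, so a model cannot rank first by trading away accuracy for recall or the reverse.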

Conclusion

Final Model
The final model is LightGBM with feature selection using a filter method. Its results are:

Recommendation and Request

Explainable AI

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model.

See papers for details and citations.
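The game-theoretic idea behind SHAP can be illustrated with a brute-force Shapley value computation. This is a sketch for intuition only, not how the SHAP library works internally (it relies on far more efficient estimators). The toy model, the single all-zeros baseline, and all names below are assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially, so this only suits a handful of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return 0.5 * x[0] - 0.2 * x[1] + 0.1 * x[0] * x[2]


instance = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the additive property that decision plots and waterfall plots visualize.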

Decision Plot

SHAP Multiple Decision Plot

Waterfall Plot

SHAP Waterfall Plot

Model Deployment

I deployed my model as a web app using Flask and Heroku. You can try it here