Hotel Booking Cancellation Predictor

Binary classification pipeline for predicting hotel booking cancellations using XGBoost, evaluated with ROC-AUC.

Overview

Built for a private Kaggle competition, this project predicts whether a hotel booking will be canceled, using the features provided with the competition data. A full machine learning pipeline was developed and tuned, reaching a cross-validated ROC-AUC score of 0.95. The pipeline covers feature preprocessing, categorical encoding, normalization, hyperparameter tuning, and cross-validation, so that the reported score reflects performance on held-out data rather than on the training set.

Technologies

Python, Jupyter Notebook, XGBoost, scikit-learn, scikit-optimize, Pandas, NumPy.

Model training used stratified K-fold cross-validation, BayesSearchCV (from scikit-optimize) for hyperparameter optimization, one-hot encoding for categorical features, MinMax scaling, and early stopping, with the aim of maximizing ROC-AUC while limiting overfitting.