Movie Revenue Prediction using Machine Learning Models
Vikranth Udandarao, Pratyush Gupta
TL;DR
The paper tackles predicting movie box office revenue from a rich set of features by systematically comparing linear and ensemble regression models. It adopts a refined single-origin dataset and applies preprocessing, feature encoding, scaling, and log-based feature engineering, followed by hyperparameter tuning with GridSearchCV. Gradient Boosting consistently yields the strongest performance, achieving a final training $R^2 \approx 0.916$ and test $R^2 \approx 0.824$, with favorable MAPE across models; XGBoost and other ensembles are also evaluated, and a CLI enables practical deployment. The work demonstrates a concrete, deployable approach for producers to forecast profitability and make data-driven production decisions.
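For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how $R^2$ and MAPE are conventionally computed. The function names and the revenue figures are hypothetical and chosen only for illustration; they are not taken from the paper's code or data.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical revenues in millions of dollars (illustrative values only).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

r2 = r2_score(actual, predicted)
err = mape(actual, predicted)
print(f"R^2 = {r2:.3f}, MAPE = {err:.3%}")
```

An $R^2$ closer to 1 and a MAPE closer to 0 both indicate a better fit. MAPE is undefined when any true value is zero, so it presumes strictly positive revenues.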
Abstract
In the contemporary film industry, accurately predicting a movie's earnings is central to maximizing profitability. This project develops a machine learning model that predicts movie earnings from input features including the movie's name, MPAA rating, genre, release year, IMDb rating, number of viewer votes, director, writer, leading cast, country of production, budget, production company, and runtime. The predictive model is built through a structured methodology covering data collection, preprocessing, analysis, model selection, evaluation, and improvement. Linear Regression, Decision Tree, Random Forest, Bagging, XGBoost, and Gradient Boosting regressors are trained and tested. Model improvement strategies include hyperparameter tuning and cross-validation. The resulting model shows promising accuracy and generalization, supporting informed decision-making in the film industry aimed at maximizing profits.
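The workflow summarized above (encoding, scaling, log-based feature engineering, and cross-validated hyperparameter tuning of a Gradient Boosting regressor) might be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the column names, the synthetic data, the choice to log-transform the target, and the parameter grid are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are assumptions, not the paper's schema.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "votes": rng.integers(1_000, 1_000_000, n),
    "score": rng.uniform(3.0, 9.0, n),
    "runtime": rng.uniform(80, 180, n),
    "genre": rng.choice(["Action", "Comedy", "Drama"], n),
    "rating": rng.choice(["PG", "PG-13", "R"], n),
})
gross = X["budget"] * rng.lognormal(0.5, 0.5, n)  # fabricated target for the demo
y = np.log1p(gross)  # assumption: model revenue on a log scale

# Log-transform heavy-tailed numeric columns, then standardize them.
log_then_scale = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("log_num", log_then_scale, ["budget", "votes"]),
    ("num", StandardScaler(), ["score", "runtime"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["genre", "rating"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid; a real search would cover more values.
param_grid = {"gbr__n_estimators": [50, 100], "gbr__max_depth": [2, 3]}
search = GridSearchCV(model, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("test R^2 (log scale):", round(test_r2, 3))
```

Wrapping preprocessing and the regressor in a single `Pipeline` means each cross-validation fold fits the scaler and encoder on its own training split only, which avoids leaking test-fold statistics into tuning.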
