Movie Revenue Prediction using Machine Learning Models

Vikranth Udandarao; Pratyush Gupta

TL;DR

The paper tackles predicting movie box office revenue from a rich set of features by systematically comparing linear and ensemble regression models. It adopts a refined single-origin dataset and applies preprocessing, feature encoding, scaling, and log-based feature engineering, followed by hyperparameter tuning with GridSearchCV. Gradient Boosting consistently yields the strongest performance, achieving a final training $R^2 \approx 0.916$ and test $R^2 \approx 0.824$, with favorable MAPE across models; XGBoost and other ensembles are also evaluated, and a CLI enables practical deployment. The work demonstrates a concrete, deployable approach for producers to forecast profitability and make data-driven production decisions.
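
The two metrics quoted above, $R^2$ and MAPE, can be illustrated with a minimal sketch in plain Python. The revenue figures below are hypothetical values chosen only for illustration and are not taken from the paper's dataset. The helper names `r2_score` and `mape` are likewise assumptions, not the authors' code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.10 means 10%).

    Assumes no true value is zero, which holds for positive revenues.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gross revenues in dollars (illustrative only).
actual = [100e6, 50e6, 200e6, 10e6]
predicted = [90e6, 55e6, 210e6, 12e6]

print(f"R^2  = {r2_score(actual, predicted):.4f}")
print(f"MAPE = {mape(actual, predicted):.4f}")
```

$R^2$ rewards explaining the variance in revenue and is dominated by high-grossing titles, while MAPE weights every movie's relative error equally. That difference is one reason to report both when revenues span several orders of magnitude.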

Abstract

In the contemporary film industry, accurately predicting a movie's earnings is paramount for maximizing profitability. This project aims to develop a machine learning model that predicts movie earnings from the following input features: the movie's name, MPAA rating, genre, year of release, IMDb rating, number of viewer votes, director, writer, leading cast, country of production, budget, production company, and runtime. A robust predictive model is constructed through a structured methodology of data collection, preprocessing, analysis, model selection, evaluation, and improvement. Linear Regression, Decision Trees, Random Forest Regression, Bagging, XGBoost, and Gradient Boosting are trained and tested. Model improvement strategies include hyperparameter tuning and cross-validation. The resulting model offers promising accuracy and generalization, facilitating informed decision-making in the film industry to maximize profits.

Paper Structure (36 sections, 6 figures, 2 tables)

Introduction
Motivation
Rationale
Overview
Literature Review
https://en.wikipedia.org/wiki/Linear_regression
https://en.wikipedia.org/wiki/Decision_tree
https://en.wikipedia.org/wiki/Gradient_boosting
https://en.wikipedia.org/wiki/Bootstrap_aggregating
https://en.wikipedia.org/wiki/Random_forest
https://en.wikipedia.org/wiki/XGBoost
Dataset Selection
Modification of Dataset Strategy
Rationale Behind the Dataset Transition
Benefits of the Optimized Dataset
...and 21 more sections

Figures (6)

Figure 1: Distribution of Movies by Country
Figure 2: Histogram of Gross Categories
Figure 3: Null Values
Figure 4: K Best Features
Figure 5: Training R² Score Curve
...and 1 more figure