Experimenting with Multi-modal Information to Predict Success of Indian IPOs
Sohom Ghosh, Arnab Maji, N Harsha Vardhan, Sudip Kumar Naskar
TL;DR
This work addresses predicting short-term IPO success in the Indian market. It assembles two datasets, one for Main Board (MB) IPOs and one for Small and Medium Enterprise (SME) IPOs, and fuses structured features with unstructured text from the Draft Red Herring Prospectus (DRHP) and Red Herring Prospectus (RHP) through a Retrieval-Augmented Generation pipeline. It uses DeBERTa-based classifiers and Nomic embeddings, combined with AutoML ensembles, to predict direction and underpricing for listing-day Open, High, and Close prices. It also investigates Grey Market Premium (GMP) as a predictor and compares the models against zero-shot LLM baselines. The study contributes two curated datasets, a multi-modal methodology, and an analysis of when text signals and GMP are most informative, with MB IPOs generally benefiting more from GMP than SME IPOs. The results indicate the value of integrating textual financial content with traditional market features for IPO pricing tasks, highlight practical implications for investors and regulators in emerging markets, and outline avenues for future work in multi-modal embeddings and broader market contexts.
Abstract
With consistent growth in the Indian economy, Initial Public Offerings (IPOs) have become a popular avenue for investment. With modern technology simplifying investing, more investors are interested in making data-driven decisions when subscribing to IPOs. In this paper, we describe an approach based on machine learning and natural language processing for estimating whether an IPO will be successful. We have extensively studied the impact on IPO success of various facts mentioned in the IPO filing prospectus, macroeconomic factors, market conditions, Grey Market Price, and other variables. We created two new datasets relating to the IPOs of Indian companies. Finally, we investigated how information from multiple modalities (texts, images, numbers, and categorical features) can be used for estimating the direction and underpricing with respect to the opening, high, and closing prices of stocks on the IPO listing day.
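To make the two prediction targets concrete, the following is a minimal sketch of how direction and underpricing could be derived from listing-day prices. It assumes the conventional definition of underpricing as the return of a listing-day price over the offer price, and a binary direction label that is 1 when the listing-day price exceeds the offer price; the abstract does not state the exact formulas, so both definitions are assumptions. The names `ListingDay` and `listing_targets` and the prices used are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class ListingDay:
    """Hypothetical container for one IPO's offer price and listing-day prices."""
    offer_price: float
    open: float
    high: float
    close: float


def listing_targets(day: ListingDay) -> dict:
    """Compute assumed direction and underpricing targets for open, high, and close."""
    targets = {}
    for name in ("open", "high", "close"):
        price = getattr(day, name)
        # Assumed definition: underpricing is the return relative to the offer price.
        underpricing = (price - day.offer_price) / day.offer_price
        # Assumed definition: direction is 1 if the price is above the offer price, else 0.
        direction = int(price > day.offer_price)
        targets[name] = {"underpricing": underpricing, "direction": direction}
    return targets


# Illustrative numbers only, not taken from the datasets described in the paper.
example = ListingDay(offer_price=100.0, open=120.0, high=130.0, close=90.0)
targets = listing_targets(example)
```

Under these assumptions, each IPO yields six targets: a classification label and a regression value for each of the three listing-day prices.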
