A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

Zhiyu Zoey Chen; Jing Ma; Xinlu Zhang; Nan Hao; An Yan; Armineh Nourbakhsh; Xianjun Yang; Julian McAuley; Linda Petzold; William Yang Wang

TL;DR

This survey maps the deployment of large language models across finance, healthcare, and law (FHL), detailing tasks, datasets, and evaluation methodologies while scrutinizing domain-specific ethics. It highlights how LLMs advance tasks such as financial reasoning, medical information processing, and legal reasoning, yet reveals persistent gaps in numerical reasoning, information extraction, and robust, trustworthy behavior. The authors synthesize state-of-the-art models (e.g., BloombergGPT, MedPaLM, LawGPT) and discuss strategies like retrieval-augmented generation, multimodal fusion, and instruction tuning to address data scarcity and privacy concerns. They offer a forward-looking agenda emphasizing realistic, cross-domain benchmarks, data diversity, human-in-the-loop governance, and responsible deployment to maximize benefits while mitigating risks in these precision-dependent domains.

Abstract

In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: https://github.com/czyssrs/LLM_X_papers.

Paper Structure (31 sections, 6 figures, 9 tables)

Introduction
Related Surveys
Finance
Tasks and Datasets in Financial NLP
Financial LLMs
Evaluation and Analysis
LLM-based Methodologies for Financial Tasks and Challenges
Future Prospects
Medicine and Healthcare
Tasks and Benchmarks for Medical NLP
LLMs for Medicine and Healthcare
Abnormality and Ambiguity Detection
Medical Report Generation
Medical Free-form Instruction Evaluation
Medical-Imaging Classification Via Natural Language
...and 16 more sections

Figures (6)

Figure 1: A summary of existing financial NLP tasks and representative datasets. The yellow field marks tasks that are relatively under-explored for LLMs.
Figure 2: Performance comparison on the FinQA dataset [DBLP:conf/emnlp/ChenCSSBLMBHRW21]. We compare execution accuracy following the evaluation standard in the original paper. The fine-tuning method FinQANet is the RoBERTa-based model in [DBLP:conf/emnlp/ChenCSSBLMBHRW21]; the instruction fine-tuning methods include FinMA [DBLP:journals/corr/abs-2306-05443] and InvestLM [DBLP:journals/corr/abs-2309-13064]; the general-purpose LLMs include LLaMA-65B, GPT-3.5, and GPT-4, with zero-shot (0), few-shot (3 shots), and CoT prompting. We also list the human expert and general crowd performances. Results are sourced from [li2023chatgpt], [DBLP:journals/corr/abs-2306-05443], and [DBLP:journals/corr/abs-2309-13064].
Figure 3: A summary of medical NLP tasks and representative datasets. The yellow field marks tasks that are relatively under-explored for LLMs.
Figure 4: High-level illustration of concept bottleneck models [yan2023robust], which use concepts for medical image classification to achieve interpretability and robustness while maintaining accuracy. Left: classification with a classical neural encoder. Right: classification with natural language concepts. A chest X-ray from a healthy elderly individual may be classified as COVID-19 because of the patient's age; introducing language can mitigate the effect of such confounding factors.
Figure 5: A summary of existing legal NLP tasks and datasets. The yellow field shows other legal tasks.
...and 1 more figure
