Can ChatGPT pass the CFA exam? AI researchers tried to find out

Relax, Wall Street. ChatGPT is still a ways off from passing the chartered financial analyst exam and threatening the jobs of financial professionals the world over.
A team of JPMorgan Chase & Co. researchers and university academics tested whether OpenAI’s ChatGPT and GPT-4 chatbots would have a chance at passing the first two levels of the exam. It typically takes humans four years to complete all three levels of the test, which can lead to higher salaries and better job opportunities.
“Based on estimated pass rates and average self-reported scores, we concluded that ChatGPT would likely not be able to pass the CFA Level I and Level II under all tested settings,” the researchers wrote in an 11-page report. “GPT-4 would have a decent chance of passing the CFA Level I and Level II if prompted.”
The team is made up of academics and six staffers from JPMorgan’s AI Research organization, including Sameena Shah and Antony Papadimitriou.
The CFA Institute, which offers the credentials, has spent years revamping its tests to ensure professionals seeking an edge in their careers are familiar with forces driving automation. The institute announced it would add questions on artificial intelligence and methods for analyzing big data to its exams in 2017.
Chris Wiese, managing director for education at the CFA Institute, conceded that large language models will have the ability to answer some exam questions correctly.
“While multiple choice exams and essay questions remain excellent ways to assess learning and understanding in a secure proctored environment, the day-to-day in finance does not present itself only as a series of short, standalone questions,” Wiese said. “This is why to become a CFA charterholder, we also require 4,000 hours of qualifying work experience, a minimum of two references, a strong moral compass, and, coming soon, the completion of hands-on practical skills modules.”
The institute is also considering using a form of large language model technology to assist CFA candidates’ learning, he said.
Every few months, thousands of candidates sit for the three different levels of the test. Recipients of the charter typically spend more than 300 hours studying for each level of the exam.
Pass rates for the exam have drifted down in recent years, with the average pass rate for the first level of the exam hitting 37% in August, compared with the 43% average in 2018.
Common Errors
Level I of the CFA features 180 multiple choice questions, while Level II includes case studies and 88 multiple choice questions. Both large language models struggled more on Level II no matter the type of prompting used, the researchers found.
In Level I, though, both ChatGPT and GPT-4 performed best in the sections of the exam focused on derivatives, alternative investments, corporate issuers, equity investments and ethics. However, both chatbots performed relatively poorly on those focused on financial reporting and portfolio management.
In Level II, ChatGPT struggled more than GPT-4 on the sections focused on alternative investments and fixed income instruments, but outperformed it in the areas tied to portfolio management and economics.
Most of ChatGPT’s errors were knowledge based, while GPT-4 most commonly made calculation errors.
“The one error type that GPT-4 makes more frequently than ChatGPT was reasoning errors,” the researchers found. “It would seem that, along with GPT-4’s greater ability to reason, it has a greater chance of ‘talking itself’ into incorrect lines of reasoning.”

