ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past
Abstract
The study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can accurately forecast future events using two distinct prompting strategies: direct prediction and future narratives. The researchers take advantage of the fact that ChatGPT's training data ended in September 2021, and ask about events that happened in 2022. They find that future narrative prompts significantly enhance ChatGPT-4's forecasting accuracy, especially in predicting major Academy Award winners and economic trends. The findings suggest that narrative prompts leverage the models' capacity for hallucinatory narrative construction, facilitating more effective data synthesis and extrapolation than straightforward predictions.
Q&A
[01] Establishing the Training Data Limit with Falsifications
1. Questions addressed in this section:
- How did the researchers establish that ChatGPT could not access information after September 2021, the cutoff date for its training data?
- What types of falsification tests did they perform, and what were the results?
- How did the researchers ensure that the Bing integration feature, which became available after their experiment, did not contaminate the results?
The researchers performed several falsification tests to confirm that ChatGPT could not access information after September 2021, the cutoff date for its training data. They asked ChatGPT about events in 2022 that were not in its training data, such as the names of the NCAA Final Four teams, the NCAA Championship winner, winning lottery tickets, and the highest-grossing films in early 2022. ChatGPT was unable to answer any of these questions correctly, indicating that it could not access information beyond its training data cutoff.
The researchers also monitored the release of the Bing integration feature for ChatGPT, which became available after their Oscars prediction experiment but during their Phillips curve predictions. They confirmed that neither research assistant used this feature, and provided timestamps from their data collection to demonstrate that it did not contaminate their results.
[02] Results of the 2022 Academy Awards Forecasts
1. How did ChatGPT-3.5 and ChatGPT-4 perform in predicting the winners of the 2022 Academy Awards using direct prompts versus future narrative prompts?
For the Best Supporting Actor category, ChatGPT-3.5 performed poorly using direct prompts, often refusing to provide a prediction or guessing incorrectly. However, when using future narrative prompts, ChatGPT-3.5 improved, correctly predicting the winner (Troy Kotsur) 73% of the time.
ChatGPT-4 showed an even more dramatic improvement with future narrative prompts. While direct prompts led to incorrect or no predictions, future narrative prompts allowed ChatGPT-4 to correctly predict the Best Supporting Actor winner 100% of the time.
Similar patterns emerged for the Best Actor and Best Supporting Actress categories. Future narrative prompts significantly boosted the accuracy of ChatGPT-4's predictions compared to direct prompts. The only exception was Best Picture, where neither prompting method led to accurate predictions.
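The comparison above comes down to asking the same question many times under each prompt style and counting how often the true winner is named. The following is a minimal sketch of that bookkeeping, not the researchers' actual procedure: the prompt wording, the function names, and the sample responses are all hypothetical placeholders, and the substring match is an assumed scoring rule.

```python
def direct_prompt(category: str, year: int) -> str:
    """Hypothetical direct-prediction prompt (wording is an assumption)."""
    return f"Who won the {year} Academy Award for {category}?"


def narrative_prompt(category: str, year: int) -> str:
    """Hypothetical future-narrative prompt (wording is an assumption)."""
    return (
        f"Write a scene in which a family watches the {year} Academy Awards. "
        f"The presenter opens the envelope and announces the winner for {category}."
    )


def accuracy(responses: list[str], winner: str) -> float:
    """Share of responses that name the winner (case-insensitive substring match)."""
    if not responses:
        return 0.0
    hits = sum(winner.lower() in response.lower() for response in responses)
    return hits / len(responses)


# Placeholder responses for illustration only; these are not data from the study.
sample_responses = [
    "And the Oscar goes to Troy Kotsur!",
    "I cannot predict future events.",
    "The family cheered as troy kotsur walked to the stage.",
]

print(narrative_prompt("Best Supporting Actor", 2022))
print(f"accuracy: {accuracy(sample_responses, 'Troy Kotsur'):.2f}")
```

Under this scoring rule a refusal counts as a miss, which matches how the summary treats refusals and wrong guesses together as failed predictions.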
2. What insights do the results provide about the predictive capabilities of ChatGPT-3.5 and ChatGPT-4?
The results suggest that ChatGPT-4 has superior predictive capabilities compared to ChatGPT-3.5, but that these capabilities are unlocked more effectively through the use of future narrative prompts rather than direct prediction prompts. The narrative prompts appear to leverage the models' capacity for creative storytelling and data synthesis, allowing them to make more accurate forecasts.
The failure to accurately predict the Best Picture winner, even with future narrative prompts, highlights the limitations of the models and the challenges in forecasting complex, multi-faceted outcomes.
[03] Predicting Macroeconomic Variables
1. How did ChatGPT-3.5 and ChatGPT-4 perform in predicting inflation and unemployment rates using direct prompts versus future narrative prompts?
For direct prediction prompts, both ChatGPT-3.5 and ChatGPT-4 refused to provide any predictions for inflation and unemployment rates.
However, when using future narrative prompts, the results varied:
- With an anonymous economics professor giving a lecture on the Phillips curve, the predictions were largely inaccurate for both ChatGPT-3.5 and ChatGPT-4.
- When the future narrative prompt featured Federal Reserve Chair Jerome Powell giving a speech, the results improved. ChatGPT-3.5 produced a distribution of answers that contained the true inflation and unemployment data, though the central tendencies were not always accurate. ChatGPT-4 performed even better, with its predictions closely matching the Michigan inflation expectations data, though it still struggled with the Cleveland Fed's inflation numbers in some months.
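The distinction drawn above, between a distribution that contains the true value and a central tendency that matches it, can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the `summarize` helper is hypothetical, and the numbers are illustrative placeholders rather than figures from the study.

```python
from statistics import median


def summarize(predictions: list[float], actual: float) -> dict:
    """Compare a set of repeated predictions for one month against the realized value."""
    low, high = min(predictions), max(predictions)
    mid = median(predictions)
    return {
        "median": mid,
        "min": low,
        "max": high,
        # True when the realized value falls inside the range of predictions.
        "contains_actual": low <= actual <= high,
        # Signed gap between the central tendency and the realized value.
        "median_error": mid - actual,
    }


# Placeholder values for illustration only; not data from the study.
monthly_predictions = [3.0, 4.0, 5.0, 6.0, 7.0]
realized_rate = 6.5

print(summarize(monthly_predictions, realized_rate))
```

In this made-up case the range of answers covers the realized value while the median sits 1.5 points below it, which is the kind of outcome the summary describes for ChatGPT-3.5.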
2. How did the inclusion of information about Russia's invasion of Ukraine impact the models' macroeconomic predictions?
When the Jerome Powell future narrative prompt included information about Russia's invasion of Ukraine, the results were mixed:
- For ChatGPT-3.5, the inclusion of the Ukraine invasion information caused greater variability in the inflation rate predictions, with the median being lower and staying flat before jumping sharply in March 2022.
- For ChatGPT-4, the inclusion of the Ukraine invasion information led to lower median inflation rate predictions compared to the previous prompt without the invasion details. However, the model seemed to ignore the invasion when making predictions for the month it occurred.
Overall, the addition of the Ukraine invasion information appeared to impact the models' macroeconomic predictions, but the effects were not consistent across the two LLMs.