How close is AI to replacing product managers?
Abstract
The article explores the capabilities of AI models, particularly ChatGPT, in performing various product management tasks. It describes a collaboration between the author, Lenny, and a prompt engineer, Mike Taylor, to test how well AI can handle common PM responsibilities compared to humans. The goal is to establish a benchmark for measuring AI's progress in potentially replacing product managers.
Q&A
[01] Developing a Product Strategy
1. How did the AI-generated product strategy for YouTube Music perform compared to the human-written one? The AI-generated strategy beat the human-written one, winning 55% of the votes. However, 77% of respondents correctly identified solution B as the AI version. The main criticism of the AI version was that it read more like a list of features than a true strategic plan. The author notes that AI reasoning ability is an active research area and will likely improve substantially.
2. What strategies can product managers use to make their work less replaceable by AI? Incorporating personal interests and niche references can make the work read as noticeably human rather than generic. The author also suggests that small imperfections, such as the occasional grammatical error, can help humanize the output.
[02] Defining KPIs
1. How did the AI-generated KPIs for DoorDash perform compared to the human-written ones? The AI-generated KPIs were preferred, winning 68% of the votes, and 70% of people correctly guessed that solution A was the AI version. The author notes that the comprehensiveness of the AI's answer was a key factor in its success, though some found it too verbose.
2. What strategies can product managers use to make their AI-generated responses appear more human-like? The author suggests finding ways to make the AI response less verbose or wordy, as this was a common tell that the answer was AI-generated. Striking the right balance between comprehensiveness and conciseness seems important.
[03] Estimating ROI of a Feature
1. How did the AI-generated ROI estimates for a new Meta feature perform compared to the human-written ones? The human-written ROI estimates were preferred, winning 58% of the votes, and only 65% of people correctly guessed which version was AI-generated. The author notes that the human answer lacked specific numbers, an opening the AI could have exploited had it supplied more quantitative estimates.
2. What challenges did the author face in evaluating human vs. AI performance on this task? The author found that there was some disagreement on whether the human-written answer was good, as it did not focus on monetization metrics. This highlights the subjectivity in evaluating these types of strategic tasks, as different product managers may have different priorities.
[04] Future Benchmarking Approach
1. What framework does the author propose for expanding the benchmarking of AI vs. human performance on PM tasks? The author suggests aligning the benchmarks with Lenny's framework for categorizing PM skills, which covers areas like shaping the product, shipping the product, and syncing the people. This would provide a more comprehensive view of which parts of the PM role are most automatable.
2. What methodological improvements does the author suggest for future testing? The author proposes using a more automated voting mechanism, avoiding data contamination, and testing across a wider range of models and prompting techniques. They also suggest the need to collect more real-world examples of human performance to use as benchmarks.
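The automated, blind voting mechanism the author proposes could be prototyped with a short script. The sketch below is purely illustrative and not the article's actual tooling: `run_blind_vote`, its voter interface, and all labels are hypothetical names chosen here. It randomizes which answer appears as "A" versus "B" for each voter (so positional cues cannot leak which is the AI version) and tallies both preference votes and AI-detection guesses, the two percentages reported throughout the article.

```python
import random

def run_blind_vote(human_answer, ai_answer, voters, seed=None):
    """Present two answers in random order and tally results.

    `voters` is a list of callables; each takes (answer_a, answer_b)
    and returns (preferred_label, guessed_ai_label), e.g. ("A", "B").
    This interface is a hypothetical sketch, not the article's mechanism.
    """
    rng = random.Random(seed)
    prefer_ai = 0  # votes where the preferred answer was the AI one
    detect_ai = 0  # votes that correctly identified the AI answer
    for vote in voters:
        # Randomize position so order cues can't reveal which is which.
        ai_is_a = rng.random() < 0.5
        a, b = (ai_answer, human_answer) if ai_is_a else (human_answer, ai_answer)
        preferred, guessed = vote(a, b)
        ai_label = "A" if ai_is_a else "B"
        prefer_ai += preferred == ai_label
        detect_ai += guessed == ai_label
    n = len(voters)
    return {
        "ai_preference_pct": 100 * prefer_ai / n,
        "ai_detection_pct": 100 * detect_ai / n,
    }
```

A real version would also need the contamination and multi-model controls the author mentions (e.g., holding out prompts from training data and repeating the tally per model), which this sketch does not attempt.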