March 10, 2025

Product management is writing good evals

In a recent talk, OpenAI Chief Product Officer Kevin Weil said, "Writing evals, I think, is going to become a core skill for PMs."

In a follow-up conversation with Lenny, Weil defines the term (18:50):

The easiest way to think about [evals] is almost like a quiz for a model, a test to gauge how well it knows a certain set of subject material, or how good it is at responding to a certain set of questions. So in the same way you take a calculus class and then you have a calculus test that sees if you've learned what you're supposed to learn, you have evals that test how good is the model at creative writing, how good is the model at graduate-level science, how good is the model at competitive coding. So you have these set of evals that basically perform as benchmarks for how smart or capable the model is.
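To make the "quiz" analogy concrete, here is a minimal sketch of an eval harness in Python. It is illustrative only and not how OpenAI or any particular eval framework does it. The names `EvalCase`, `exact_match`, `run_eval`, and `toy_model` are hypothetical. `toy_model` stands in for a real model call and returns canned answers so the example runs offline. The eval itself is just a list of questions with reference answers, a grader, and a pass rate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str    # the "quiz question" sent to the model
    expected: str  # reference answer the grader checks against


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible grader: case- and whitespace-insensitive string equality.
    return output.strip().lower() == expected.strip().lower()


def run_eval(model: Callable[[str], str], cases: list[EvalCase],
             grader: Callable[[str, str], bool] = exact_match) -> float:
    # Grade every case and return the fraction passed, like a score on a test.
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(grader(model(c.prompt), c.expected) for c in cases)
    return passed / len(cases)


# Hypothetical stand-in for a real model call; returns canned answers.
def toy_model(prompt: str) -> str:
    answers = {"What is 2 + 2?": "4", "Derivative of x^2?": "2x"}
    return answers.get(prompt, "I don't know")


cases = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Derivative of x^2?", "2x"),
    EvalCase("Capital of France?", "Paris"),
]
score = run_eval(toy_model, cases)
print(f"score: {score:.2f}")  # score: 0.67
```

Real evals usually swap exact matching for richer graders, such as rubrics or another model acting as judge. The shape stays the same: a fixed set of cases and a score you can track across model or prompt changes.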

I first encountered the idea of "eval-driven development" via Eugene Yan's excellent 2023 post, Patterns for Building LLM-based Systems & Products. He goes into detail on eval methodologies, but my takeaway was about the importance of incorporating evals into the product development process. He excerpts a Hacker News comment that drives the point home:

How important evals are to the team is a major differentiator between folks rushing out hot garbage and those seriously building products in the space.

Weil's argument for the importance of evals to PMs isn't just a reflection of models becoming more integral to software experiences; it's also a reflection of the changing nature of the work of product management.

Increasingly, the implementation of software ideas is being outsourced to LLMs (via products like Cursor and Lovable). The main role of software creators is to envision the product and guide the model to build and iterate on it.

If you squint, this is what a PM has always done. A good PM writes a product requirements document (PRD), which explains the motivation and goals behind the product, the metrics of success, and an outline of how the product will work. This is, as of now, the best way to initialize a project using code generators (see prd.md for more).

So writing evals is really an extension of a skill PMs should already have: defining the product and choosing the right indicators of its success.
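As a rough illustration of that overlap, the sketch below turns a few PRD-style success criteria into automated pass/fail checks on a model's output. The feature (a customer-support reply), the criteria, and the `grade` helper are all hypothetical examples chosen for this sketch, not taken from any real PRD or tool.

```python
import re
from typing import Callable

# Hypothetical success criteria for a customer-support reply feature,
# each written as a named pass/fail check on the model's output.
Criteria = dict[str, Callable[[str], bool]]

criteria: Criteria = {
    "under 80 words": lambda reply: len(reply.split()) <= 80,
    "mentions refund window": lambda reply: re.search(r"\b30[- ]day", reply) is not None,
    "no placeholder text": lambda reply: "[" not in reply and "TODO" not in reply,
}


def grade(reply: str, criteria: Criteria) -> dict[str, bool]:
    # Per-criterion pass/fail report, mirroring the success metrics in a PRD.
    return {name: check(reply) for name, check in criteria.items()}


report = grade("You can return it within our 30-day refund window.", criteria)
print(report)
```

The hard part is the same as in a PRD: deciding which criteria actually capture "good" for this product. The code only makes those decisions repeatable.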

More on evals and how PMs should approach them in Lenny's Newsletter (guest post by Aman Khan).