Notes on using o3
OpenAI launched their o3 model last week, on April 16. Tyler Cowen called it AGI:
I think it is AGI, seriously. Try asking it lots of questions, and then ask yourself: just how much smarter was I expecting AGI to be?
As I've argued in the past, AGI, however you define it, is not much of a social event per se. It still will take us a long time to use it properly. I do not expect securities prices to move significantly (that AI is progressing rapidly already is priced in, and I doubt if the market cares about "April 16th" per se).
Benchmarks, benchmarks, blah blah blah. Maybe AGI is like porn — I know it when I see it.
And I've seen it.
Whether or not a given model is AGI is a semantic question. It's clear we're in the age of AGI, in that computers are rapidly surpassing humans at many tasks.
I wanted to share a few observations about using o3 on a more practical level.
I asked it to brainstorm approaches to the project idea that eventually became DistillJS. It began with:
0. double check the ROI. if you only call the api a few hundred times a month, shipping a local model may not pencil out. bandwidth + user download time + your build effort can outrun token fees.
Listing this as suggestion "0" is hilarious. Suggestions 1-5 reflected deep technical knowledge and creative problem-solving, but it began the list by basically suggesting this was a dumb idea in the first place.
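o3's ROI point is essentially back-of-envelope arithmetic, and it's easy to see why it can come out against shipping a local model. Here's a minimal sketch; every number is a hypothetical placeholder I've made up for illustration, not a figure from o3's answer:

```python
# Hypothetical cost comparison: paying API token fees vs. shipping a
# local model to every user. All numbers are made-up placeholders.

API_CALLS_PER_MONTH = 300          # "a few hundred times a month"
TOKENS_PER_CALL = 1_000            # hypothetical average
PRICE_PER_MILLION_TOKENS = 2.00    # dollars, hypothetical

MODEL_SIZE_MB = 50                 # hypothetical quantized local model
BANDWIDTH_COST_PER_GB = 0.08       # dollars per GB, hypothetical CDN rate
DOWNLOADS_PER_MONTH = 10_000       # hypothetical number of user installs

# Monthly spend if you just keep calling the API.
api_cost = (API_CALLS_PER_MONTH * TOKENS_PER_CALL / 1_000_000
            * PRICE_PER_MILLION_TOKENS)

# Monthly bandwidth spend to serve the model file to users.
bandwidth_cost = MODEL_SIZE_MB / 1_000 * BANDWIDTH_COST_PER_GB * DOWNLOADS_PER_MONTH

print(f"monthly API fees:        ${api_cost:.2f}")
print(f"monthly model bandwidth: ${bandwidth_cost:.2f}")
```

With these toy numbers, bandwidth alone dwarfs the token fees, before even counting user download time or build effort, which is exactly the trap suggestion "0" warns about.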
In its suggested approaches, o3 referenced a few technologies and concepts – e.g. ONNX, LoRA – that I wasn't familiar with. In certain moments I felt like Theodore Twombly during this scene in Her:
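For readers who, like me, hadn't encountered LoRA before: the core idea is to leave a pretrained weight matrix frozen and learn only a small low-rank update, W' = W + BA. A minimal numpy sketch of the shapes involved (the dimensions here are arbitrary examples, not from any real model):

```python
import numpy as np

d, k, r = 1024, 1024, 8            # layer dimensions and a small adapter rank

W = np.random.randn(d, k)          # frozen pretrained weight (not trained)
A = np.random.randn(r, k) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # B starts at zero, so W' == W at init

W_adapted = W + B @ A              # LoRA update: W' = W + BA

# The payoff: far fewer trainable parameters than full fine-tuning.
full_params = d * k                # 1,048,576
lora_params = r * (d + k)          # 16,384
print(f"full fine-tune params: {full_params:,}")
print(f"LoRA adapter params:   {lora_params:,}")
```

The relevance to a project like DistillJS, as I understand it, is that adapters like this make it cheap to fine-tune a small local model on outputs from a large API model.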
In general, o3 is pretty verbose. It's like it's proud of its knowledge. It's kind of a showboat.
And in fact, even though it generated impressive demonstrations of knowledge and creativity, it didn't really engage my curiosity the way chatting with models like GPT-4 did.
With GPT-4 and models oriented more toward back-and-forth, I've gone down many rabbit holes and learned about all kinds of things, like a more productive version of a Wikipedia binge.
With o3, I felt like I was getting A+ essays and research reports in response to my questions rather than being engaged in conversation.
It's like GPT-4 is geared toward dialogue and o3 is geared toward monologue.