Today Stripe added the ability to copy the text of their docs or view the page as markdown.

This is an example of a company proactively making the contents of their webpages more readily available for LLMs.

However, Stripe's docs are a handful of pages out of over 50 billion on the web. There's a tragedy-of-the-commons problem – most webpages won't offer /llms.txt or an equivalent anytime soon.

In the meantime, that's what stringme.dev is for. Just prepend stringme.dev to any URL and get a relatively clean, LLM-optimized plain text response. At least that's the dream.
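
Concretely, usage looks something like this – a minimal sketch, where the exact URL shape (taking "prepend" literally) is illustrative:

```ts
// Prepend stringme.dev to a URL to get a plain-text, LLM-friendly version.
// The exact path shape here is illustrative.
const target = "https://vercel.com/docs";

const res = await fetch(`https://stringme.dev/${target}`);
const text = await res.text();

console.log(text.slice(0, 500)); // paste the full text into your prompt
```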

At one point during development, I used stringme.dev to generate a quick plain-text version of the Vercel docs and paste it into the Cursor agent. It worked pretty well!

Here are a few updates I've made – though I'm currently battling Vercel functions to get them working properly in production:

  • Incorporated Firecrawl as a better way to scrape webpages and get the HTML.
  • Passed that scraped HTML to Mozilla's Readability.js to isolate the main content.
  • Passed the main content to OpenAI (GPT-4o-mini, for now) to get a good summary and a list of key facts. I wanted to do this without an LLM, but I caved. (The full pipeline is sketched below.)
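
Roughly, here's the pipeline in code – a simplified sketch rather than the production source, assuming the stock Firecrawl (@mendable/firecrawl-js), Readability (@mozilla/readability + jsdom), and OpenAI Node clients:

```ts
import FirecrawlApp from "@mendable/firecrawl-js";
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
import OpenAI from "openai";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function stringify(url: string): Promise<string> {
  // 1. Scrape the page with Firecrawl, asking for raw HTML.
  const scraped = await firecrawl.scrapeUrl(url, { formats: ["html"] });
  if (!scraped.success || !scraped.html) {
    throw new Error(`scrape failed for ${url}`);
  }

  // 2. Isolate the main content with Mozilla's Readability.
  const dom = new JSDOM(scraped.html, { url });
  const article = new Readability(dom.window.document).parse();
  const mainText = article?.textContent ?? scraped.html;

  // 3. Summarize with GPT-4o-mini (for now).
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: "Summarize this page and list its key facts as plain text.",
      },
      { role: "user", content: mainText },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```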

The next steps are to:

  • Fix Vercel functions (they're harder to deal with than I thought) or deploy the back-end to something like Fly.io.
  • Implement caching – ideally through Vercel's edge network, though my experience with functions so far limits my optimism about that. (See the sketch after this list.)
  • Improve scraping on sites with aggressive anti-scraping measures.
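
For what it's worth, the edge-caching idea mostly comes down to response headers. Here's a minimal sketch of a Vercel Node function – the handler path, query parameter, and cache lifetimes are all illustrative:

```ts
import type { VercelRequest, VercelResponse } from "@vercel/node";
import { stringify } from "../lib/stringify"; // the pipeline sketched above (hypothetical path)

// Hypothetical handler at api/string.ts, called as /api/string?url=...
export default async function handler(req: VercelRequest, res: VercelResponse) {
  const target = String(req.query.url ?? "");
  if (!target) {
    return res.status(400).send("missing ?url= parameter");
  }

  const text = await stringify(target);

  // Vercel's edge network caches responses based on s-maxage;
  // stale-while-revalidate serves the cached copy while refreshing in the background.
  res.setHeader("Cache-Control", "s-maxage=86400, stale-while-revalidate=604800");
  res.status(200).send(text);
}
```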

And as a behind-the-scenes note: I write these posts in Cursor, all lowercase and kind of a ramble. Then, when I'm done, I use cmd+K to apply the following edit:

can you punctuate this properly, wrap each line in a <p> tag, and make it slightly more readable