Today Stripe added the ability to copy the text of their docs or view the page as markdown:
We've added /llms.txt and Markdown to @stripe docs: https://t.co/mG9FIaXLTS
Use the .md pages to quickly move Stripe knowledge into your LLM of choice. 📄 pic.twitter.com/qkxhlLtHcL
– Stripe Developers (@StripeDev), March 17, 2025
This is an example of a company proactively making the contents of their webpages more readily available for LLMs.
However, this is one of over 50 billion webpages. There's a tragedy of the commons problem – most webpages won't make /llms.txt or equivalent available anytime soon.
In the meantime, that's what stringme.dev is for. Just prepend stringme.dev to any URL and get a relatively clean, LLM-optimized plain text response. At least that's the dream.
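To make the "prepend" idea concrete, here's a minimal sketch of building such a URL. The exact prefix format (`https://stringme.dev/` followed by the full target URL) and the helper name `stringme_url` are assumptions for illustration, not something the service documents here.

```python
from urllib.parse import urlsplit

# Assumed prefix format; the real service may expect something different.
STRINGME_BASE = "https://stringme.dev/"


def stringme_url(target: str) -> str:
    """Return the target URL with the stringme.dev prefix prepended."""
    target = target.strip()
    # Default to https when the caller passes a bare host/path.
    if "://" not in target:
        target = "https://" + target
    # Reject input that has no host at all.
    if not urlsplit(target).netloc:
        raise ValueError(f"not a usable URL: {target!r}")
    return STRINGME_BASE + target


print(stringme_url("https://vercel.com/docs"))
```

The resulting string is what you'd open in a browser or fetch with any HTTP client.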
At one point during development, I used stringme.dev to generate a quick plain text version of the Vercel docs and pasted it into the Cursor agent. It worked pretty well!
Here are a few updates I've made – though I'm currently battling Vercel functions to get them working properly in production:
- Incorporated Firecrawl as a better way to scrape webpages and get the HTML.
- Passed that scraped HTML to Mozilla's Readability.js to isolate the main content.
- Passed the main content to OpenAI (GPT-4o-mini, for now) to get a good summary and list of key facts. I wanted to do this without using an LLM, but I caved.
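The three updates above form a scrape, extract, summarize pipeline. Here's a rough sketch of that shape with each stage passed in as a plain function. The names `build_pipeline` and `PageDigest` are hypothetical, and the `fake_*` stages are stand-ins: the real Firecrawl, Readability.js, and OpenAI calls are deliberately not shown.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class PageDigest:
    url: str
    main_text: str
    summary: str
    key_facts: list[str]


def build_pipeline(
    scrape: Callable[[str], str],                        # URL -> raw HTML
    extract_main: Callable[[str], str],                  # raw HTML -> main content
    summarize: Callable[[str], tuple[str, list[str]]],   # content -> (summary, facts)
) -> Callable[[str], PageDigest]:
    """Chain the three stages into a single URL -> PageDigest function."""

    def run(url: str) -> PageDigest:
        html = scrape(url)
        main_text = extract_main(html)
        if not main_text.strip():
            # Skip the (paid) summarizer call when extraction found nothing.
            raise ValueError(f"no main content extracted from {url}")
        summary, key_facts = summarize(main_text)
        return PageDigest(url, main_text, summary, key_facts)

    return run


# Stand-in stages so the sketch runs without network access or API keys.
def fake_scrape(url: str) -> str:
    return "<html><nav>menu</nav><article>Hello docs. Second fact.</article></html>"


def fake_extract(html: str) -> str:
    match = re.search(r"<article>(.*?)</article>", html, re.S)
    return match.group(1) if match else ""


def fake_summarize(text: str) -> tuple[str, list[str]]:
    facts = [s.strip() for s in text.split(".") if s.strip()]
    return facts[0], facts


run = build_pipeline(fake_scrape, fake_extract, fake_summarize)
print(run("https://example.com"))
```

Keeping the stages injectable means any one of them can be swapped (say, a different model for the summary) without touching the other two.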
The next steps are to:
- Fix Vercel functions (they're harder to deal with than I thought) or deploy the back-end to something like Fly.io.
- Implement caching – ideally through Vercel's edge network, but my experience with functions so far limits my optimism about that.
- Improve scraping against certain websites that have intense anti-scraping measures.
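For the caching step in that list, here's a minimal sketch of the general idea: an in-memory cache keyed by URL, with a time-to-live. This is a stand-in for illustration only, not Vercel's edge cache, and the `TTLCache` class and its parameters are hypothetical.

```python
import time
from typing import Callable


class TTLCache:
    """Cache fetched text per URL and refetch once an entry is too old."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the scrape and the LLM call
        text = self._fetch(url)
        self._entries[url] = (now, text)
        return text


# Stand-in for the expensive scrape + summarize work.
cache = TTLCache(lambda url: f"plain text for {url}", ttl_seconds=60)
print(cache.get("https://example.com"))
print(cache.get("https://example.com"))  # second call is served from the cache
```

An in-process cache like this doesn't survive across serverless invocations, which is part of why a shared layer (an edge cache or an external store) is the more natural fit.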
And as a behind-the-scenes note: I write these posts in Cursor, all lowercase and kind of a ramble. Then when I'm done I use cmd+K to apply the following edit:
can you punctuate this properly, wrap each line in a <p> tag, and make it slightly more readble