LIVEReading: NewsUpdated: 10 min agoSubscribers: 23,400 LIVEReading: NewsUpdated: 10 min agoSubscribers: 23,400
CBW

OpenAI launches GPT-5 with built-in agent mode

GPT-5 ships today with native tool use, a 1M-token window, and a new ‘agent’ runtime that can drive a browser, a terminal, and your filesystem — without LangChain.

OpenAI announced GPT-5 this morning. It is a single model with two switches: one for thinking (slower, more thorough) and one for agent mode (a sandboxed runtime that can use a browser, a shell, and a virtual filesystem on your behalf). The default context window is 1M tokens. Pricing lands at roughly half of GPT-4o for the standard tier.

The headline trick: the model can now answer with a tool call OR a chain of tool calls without you wiring an agent loop. If you set agent_mode=true and pass it a goal, it produces, runs, inspects, and revises its own steps until done — or until it asks for help.

What changed under the hood

GPT-5 was trained with a new RLHF variant OpenAI is calling ‘interactive correction’. Annotators didn’t just rank final answers — they watched a model attempt a task end-to-end and corrected it mid-stream. The result is a model that backtracks more cleanly and asks better clarifying questions when the goal is ambiguous.

Two numbers worth memorizing: SWE-bench Verified is up to 78.4 (from 49.8 for GPT-4o), and the 1M-token window stays coherent in OpenAI’s long-context evals out past 700K tokens. That second number is the interesting one — long-context models often degrade past 100K, but GPT-5 reads a small codebase the way GPT-4o reads a long email.

Agent mode, in practice

Agent mode is a managed runtime. You hand the API a goal, optional starter files, and the maximum cost ceiling. The model gets a shell, a headless browser, and a scratchpad. It can run code, save files, and re-read what it wrote. You get a streamed event log so you can stop it the moment it goes sideways.

The first time I watched it debug its own SQL by reading the error and adding LIMIT 5, then re-running, I closed the laptop and went for a walk.

Indie dev who joined the preview last month

What it’s not

It is not a magic engineer. We tested four of our recent build guides under agent mode, and it nailed the easy ones in a single shot. The medium ones needed one nudge. The spicy ones (the swarm-orchestrator multi-agent one) still failed in the same place humans fail — it could not decide between two libraries with overlapping APIs.

Pricing and rollout

The cheaper price is what will matter for builders. If you were burning $40/month on GPT-4o for a side project, the same usage on GPT-5 is closer to $18. That alone is enough to make agent loops financially boring instead of a Big Deal.

// How to use this

Three things you can ship this weekend

  1. 01

    Swap your existing GPT-4o calls — no code changes

    GPT-5 is a drop-in via the same /chat/completions endpoint. Change the model name. Run your test suite. If your prompts relied on JSON-mode quirks of GPT-4o, double-check those — the new model is stricter about schemas.

    See the model swap guide →
  2. 02

    Build the email agent we’ve been waiting for

    Agent mode + Gmail = an inbox assistant that drafts replies, files threads, and books meetings — all from natural-language goals. Our voice-clone guide already has the Gmail OAuth piece done; you can reuse it.

    Voice Clone guide →
  3. 03

    Stop renting a bigger context

    If you’ve been chunking documents for retrieval just to fit GPT-4o’s window, try the naive ‘paste the whole thing’ approach on GPT-5 first. For docs under 700K tokens it is now usually faster, simpler, and cheaper than your RAG pipeline.

// what we actually tested

What we actually tested

We pointed agent mode at four of our published guides and graded the results. Two passed first try. One needed a single nudge (‘use psycopg, not psycopg2-binary’). One failed exactly where humans fail. We didn’t test the 1M-token window past 400K — we don’t have anything that long worth feeding it.

Numbers in this article from OpenAI’s release post and the SWE-bench leaderboard. We did not verify the SWE-bench number ourselves and will update if we get a re-run.

// daily build

One project. 5 minutes. Daily.

Get tomorrow's best AI project in your email. With a guide that works. Free. No spam.

23,400 builders read this