OpenAI launches GPT-5 with built-in agent mode
GPT-5 ships today with native tool use, a 1M-token window, and a new ‘agent’ runtime that can drive a browser, a terminal, and your filesystem — without LangChain.
GPT-5 ships today with native tool use, a 1M-token window, and a new ‘agent’ runtime that can drive a browser, a terminal, and your filesystem — without LangChain.
OpenAI announced GPT-5 this morning. It is a single model with two switches: one for thinking (slower, more thorough) and one for agent mode (a sandboxed runtime that can use a browser, a shell, and a virtual filesystem on your behalf). The default context window is 1M tokens. Pricing lands at roughly half of GPT-4o for the standard tier.
The headline trick: the model can now answer with a tool call OR a chain of tool calls without you wiring an agent loop. If you set agent_mode=true and pass it a goal, it produces, runs, inspects, and revises its own steps until done — or until it asks for help.
GPT-5 was trained with a new RLHF variant OpenAI is calling ‘interactive correction’. Annotators didn’t just rank final answers — they watched a model attempt a task end-to-end and corrected it mid-stream. The result is a model that backtracks more cleanly and asks better clarifying questions when the goal is ambiguous.
Two numbers worth memorizing: SWE-bench Verified is up to 78.4 (from 49.8 for GPT-4o), and the 1M-token window stays coherent in OpenAI’s long-context evals out past 700K tokens. That second number is the interesting one — long-context models often degrade past 100K, but GPT-5 reads a small codebase the way GPT-4o reads a long email.
Agent mode is a managed runtime. You hand the API a goal, optional starter files, and the maximum cost ceiling. The model gets a shell, a headless browser, and a scratchpad. It can run code, save files, and re-read what it wrote. You get a streamed event log so you can stop it the moment it goes sideways.
The first time I watched it debug its own SQL by reading the error and adding LIMIT 5, then re-running, I closed the laptop and went for a walk.
— Indie dev who joined the preview last month
It is not a magic engineer. We tested four of our recent build guides under agent mode, and it nailed the easy ones in a single shot. The medium ones needed one nudge. The spicy ones (the swarm-orchestrator multi-agent one) still failed in the same place humans fail — it could not decide between two libraries with overlapping APIs.
The cheaper price is what will matter for builders. If you were burning $40/month on GPT-4o for a side project, the same usage on GPT-5 is closer to $18. That alone is enough to make agent loops financially boring instead of a Big Deal.
GPT-5 is a drop-in via the same /chat/completions endpoint. Change the model name. Run your test suite. If your prompts relied on JSON-mode quirks of GPT-4o, double-check those — the new model is stricter about schemas.
See the model swap guide →Agent mode + Gmail = an inbox assistant that drafts replies, files threads, and books meetings — all from natural-language goals. Our voice-clone guide already has the Gmail OAuth piece done; you can reuse it.
Voice Clone guide →If you’ve been chunking documents for retrieval just to fit GPT-4o’s window, try the naive ‘paste the whole thing’ approach on GPT-5 first. For docs under 700K tokens it is now usually faster, simpler, and cheaper than your RAG pipeline.
We pointed agent mode at four of our published guides and graded the results. Two passed first try. One needed a single nudge (‘use psycopg, not psycopg2-binary’). One failed exactly where humans fail. We didn’t test the 1M-token window past 400K — we don’t have anything that long worth feeding it.
Numbers in this article from OpenAI’s release post and the SWE-bench leaderboard. We did not verify the SWE-bench number ourselves and will update if we get a re-run.
Get tomorrow's best AI project in your email. With a guide that works. Free. No spam.