In partnership with

What Just Happened

Ten days after the US government forced Anthropic to pull Fable 5 and Mythos offline, a Tokyo lab shipped the answer. Sakana AI, the Japanese research outfit known for evolutionary model-merging and the "AI Scientist," launched Sakana Fugu and Fugu Ultra today. The pitch is deliberate and pointed: frontier capability on par with the banned models, without the risk of export controls. Where Fable and Mythos got switched off worldwide by a single government directive on June 12, Fugu is built so that no single provider's disappearance can take it down. And the benchmarks back the claim up to a real degree: Fugu Ultra goes shoulder-to-shoulder with Fable 5 and Mythos Preview, the exact models most of the world can no longer access. The most interesting part is how Sakana pulled it off, because Fugu is not a bigger model. It is a fundamentally different idea about what a model even is. Here is what they actually built, what it can do, and the honest catch.

ARTIFICIAL INTELLIGENCE
🌷 What Fugu Actually Is

Fugu is not a monolithic model like Claude or GPT. It is an orchestration model. From your side, you call one OpenAI-compatible endpoint and it behaves like a single model. Internally, Fugu is itself a language model trained to call a pool of other frontier models, including copies of itself, splitting your task across specialists that act as thinker, worker, and verifier, then combining their work into one answer.

The key word is "trained." Fugu is not a hardcoded router with if-then rules about which model handles what. Sakana trained it to learn when to delegate, how the sub-agents should talk to each other, and how to synthesize their outputs into something reliable. That work builds on two of Sakana's peer-reviewed ICLR 2026 papers, Trinity and Conductor, on learned model orchestration. That academic lineage is why this is more than prompt engineering dressed up as a product.

It ships in two tiers, both behind one API. Regular Fugu balances quality and speed for everyday coding, review, and chat, and lets teams opt specific models out of the pool for privacy or compliance. Fugu Ultra coordinates a deeper pool of experts for maximum quality on hard, multi-step problems. The flagship ID is fugu-ultra-20260615.

What It Can Actually Do

Sakana showed off some genuinely striking demos, the kind that test whether an AI can hold itself together across long, complex tasks rather than just answer a prompt.

In one, they had Fugu Ultra play four back-to-back games of blindfold chess, no board ever shown, forcing the system to hold the entire game state in memory across an extended session. It is a clean test of whether the model drifts or loses the thread over time, the exact failure mode that kills most agents in the boring middle of a long task. In another, they tasked Fugu Ultra with designing a mechanical iris in CAD, the multi-blade aperture mechanism in a camera lens, where many parts have to move together precisely to open and close a center hole. That is a test of whether an AI can produce functional, physically coherent engineering, not just plausible-looking output.

Benchmarks Via Sakana AI - They Actually Imply That It Is Better Than Mythos (Unverified)

Early users echo the pattern. A software engineer reported Fugu Ultra surfacing over twenty real issues in a code review where other tools found about three. A cybersecurity engineer said it kept a scoped security assessment inside its boundaries while producing evidence and retest steps, instead of wandering off task. The through-line is the thing agents usually fail at: staying coherent, in scope, and useful across a long, multi-step job.

Build A Company With Only AI Employees!

Six people doing the work. Your headcount is one.

Your finance close runs in #finance. Stripe and QuickBooks reconciled, runway updated, posted Sunday night without you asking.

Engineering review lands in #eng. Viktor pulled the open PRs, left comments on auth-refactor, flagged a dependency blocking api-pagination.

Campaign brief lands in #growth: Meta CPA up 18%, recommendation to pause broad match, a draft landing page already deployed for the variant test.

You hired him on day zero. He lives in Slack and Microsoft Teams alongside your contractors and investors, connects to 3,000+ tools, pushes back when you ship something dumb.

"Viktor is now an integral team member, and after weeks of use we still feel we haven't uncovered the full potential." Patrick, Director, Yarra Web.

Industry Impact
The Honest Benchmark Picture

Here is where you should hold the skepticism your inbox deserves, because "on par with Mythos" is accurate, but the full picture is more interesting than a clean win.

Fugu Ultra genuinely leads on a lot of the suite. It posts the top score on 10 of 11 rows against the individual models it orchestrates, and tops several hard coding and reasoning tests like LiveCodeBench, TerminalBench, and CharXiv reasoning, scoring 73.7 on SWE-Bench Pro. But the wins are not universal, and this is the part most coverage skips. Fable 5 still beats Fugu on both SWE-Bench Pro and Humanity's Last Exam. GPT-5.5 wins the long-context recall test. Opus 4.8 edges it on a cybersecurity benchmark. And in a genuinely funny quirk, on a few tasks the regular Fugu actually scores higher than Fugu Ultra, so more orchestration is not always better.

So the honest claim is exactly the one Sakana makes: Fugu Ultra stands shoulder-to-shoulder with Fable 5 and Mythos. It matches the banned frontier, on a level playing field, while remaining fully usable. That is a real achievement. It is not "Fugu crushes everything," and anyone telling you it beats Mythos outright is selling you the headline instead of the table.

Why The Timing Is The Whole Point

This launch is a direct play on the biggest AI story of the month, and Sakana is not subtle about it.

On June 12, a US export-control directive forced Anthropic to pull Fable 5 and Mythos offline for everyone, worldwide, because the models could not be restricted to approved users in real time. Ten days later they are still down. That event proved a model you depend on can vanish overnight by government order, with no warning, which is the exact risk Fugu is engineered to neutralize. Because Fugu routes across a swappable pool, if any single provider gets restricted, banned, or simply goes down, it reroutes around the gap and keeps working. Sakana cites the Fable and Mythos controls directly as the motivation.

That reframes the whole competition. For two years the race was who builds the biggest model. Fugu is a bet that the next race is who orchestrates models best, and who is most resilient to a world where access to any single model is now a geopolitical variable. It is also, notably, a non-US lab making that bet, which matters when the disruption everyone is routing around is US policy.

The Honest Catch

Two real caveats before you get excited, because they are the substance behind the developer skepticism already circulating.

First, Fugu is a closed-source orchestrator that relies partly on closed-source model APIs, and Sakana has not disclosed what percentage of its pool is open versus closed, or which model handled your specific query. The routing is proprietary and hidden. So the "is this just a fancy wrapper around other people's models?" question is fair, and Sakana has not fully answered it. There is also a genuine cost concern: for hard tasks, Fugu Ultra can run up to around $10 per message, which adds up fast on heavy pipelines.

Second, the un-bannable pitch has a limit. If Fugu's pool leans heavily on US closed models under the hood, then US export controls could still reach it indirectly. The resilience is real in design, but how ban-proof it actually is depends on pool composition Sakana is not disclosing. And it is not available in the EU at launch. So treat "can't be banned" as the design goal and the marketing hook, not a proven guarantee. The idea is exactly right for the moment. Whether it fully delivers is something to verify with your own testing, not take on faith.

What's The Recap?

Sakana AI, the Tokyo lab behind the AI Scientist, launched Sakana Fugu and Fugu Ultra today, ten days after the US forced Anthropic's Fable 5 and Mythos offline with export controls. Fugu is not a monolithic model but a trained orchestration system: you call one API, and it internally routes your task across a pool of frontier models acting as thinker, worker, and verifier, built on Sakana's peer-reviewed ICLR 2026 research. Fugu Ultra benchmarks shoulder-to-shoulder with the banned Fable 5 and Mythos, leading many coding and reasoning tests, though Fable 5 still beats it on a couple and the wins are not universal. Demos like four games of blindfold chess and a CAD mechanical iris show it holding coherence across long, hard tasks. The whole pitch is resilience: because it routes across swappable models, no single ban or outage can take it down, a direct answer to the Fable shutdown. The honest catch: it is a closed orchestrator over partly closed models, the pool composition is undisclosed, it can cost up to $10 a message, and the un-bannable claim is a design goal, not a proven guarantee. The big idea is the right one for this moment. The frontier race is shifting from who builds the biggest model to who coordinates them best, and who can route around a world where any single model might disappear overnight.

Stay building. πŸ€–

Check Out Our Latest YouTube Video

Recommended for you