At WWDC 2026 this week, someone asked Siri to go through a folder of contractor quotes saved as PDFs, compare them, pick the best option and draft a reply email. Siri did it. Live, on stage, in front of an audience.
That’s not a kitchen timer. That’s not “Hey Siri, what’s the weather.” That’s the kind of task you’d currently hand to Claude or ChatGPT with careful prompting and a bit of luck. Apple just demonstrated it happening through a voice assistant most of us had written off.
It is worth understanding how they got there - because the architecture behind it is genuinely interesting, and a lot of it comes down to a clever solution to a very unglamorous problem: memory.
Two years in the making
Apple first promised this version of Siri at WWDC 2024. On-screen awareness - meaning Siri could see what was on your screen and act on it - was supposed to be the headline feature of Apple Intelligence. It was announced. It was demoed. It was not shipped.
iOS 18 came out without it. iOS 18.4 missed it too. WWDC 2025 was largely Apple saying “still not ready, coming next year.” Two full years between the promise and the product.
That context matters when you see the live demos this week. This isn’t a concept. People are using it.
The architecture: an orchestrator with eyes
The way to think about the new Siri is as a conductor rather than a performer. At the centre is you - your devices, your data, your context. Around that sits Siri, alongside other input modes like image and text. Then comes a layer of capabilities: world knowledge, on-screen awareness, personal context (your emails, files, calendar), actions it can take in apps.
The key piece is what Apple calls the System Orchestrator. This is the layer that decides - given what you asked, what’s on your screen, what’s in your files and what you’ve been doing - which models to engage, in what order, to produce a useful answer and an actual action.
Think of it like a very capable executive assistant who doesn’t just answer questions but knows when to pull in a specialist, when to look something up, when to read the document on your desk, and when to send the email on your behalf. The orchestrator is that assistant. The models underneath it are the specialists.
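To make the conductor idea concrete, here is a minimal Python sketch of an orchestrator that turns a request into an ordered plan. Everything in it is hypothetical: the Request fields, the plan function and the step names are invented for illustration and are not Apple's API. Only the model names come from this article, and the rule that multi-step work goes to the server is my own simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    needs_screen: bool = False    # refers to what is currently on screen
    needs_personal: bool = False  # needs emails, files or calendar
    multi_step: bool = False      # chains several actions together


def plan(request: Request) -> list[str]:
    """Return an ordered list of hypothetical steps for a request."""
    steps = []
    # Gather context first, so the model sees it before answering.
    if request.needs_screen:
        steps.append("read_screen")
    if request.needs_personal:
        steps.append("fetch_personal_context")
    # Assumption: multi-step (agentic) work goes to the most capable
    # server model; everything else stays on the device.
    steps.append("AFM 3 Cloud Pro" if request.multi_step else "AFM 3 Core")
    # An agentic request ends with an action in an app, not just an answer.
    if request.multi_step:
        steps.append("perform_app_action")
    return steps


quotes = Request("compare the quotes and draft a reply",
                 needs_personal=True, multi_step=True)
print(plan(quotes))
# ['fetch_personal_context', 'AFM 3 Cloud Pro', 'perform_app_action']
```

The point of the sketch is the ordering: context is collected before any model is called, and the choice of model is a decision the orchestrator makes, not something the user specifies.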
The model family
Apple now has five foundation models - the third generation of Apple Foundation Models (AFM 3) - split between what runs on your device and what runs on their servers.
On-device, there are two. The first is AFM 3 Core, a refined version of their existing 3-billion-parameter model. This is what runs on any Apple Intelligence-capable device - iPhone 15 Pro and later, iPhone 16 and 17 models, iPads with M1 or later, and Macs with M1 or later.
The second is more interesting and more restricted: AFM 3 Core Advanced, a 20-billion-parameter model that runs with only 1 to 4 billion parameters active at any moment. Apple says it is “unlocked by and optimised for” their most capable silicon - meaning A19 Pro (iPhone 17 Pro) or M3 and M4 chips in Macs and iPads. Devices with only 8GB of RAM are excluded entirely. If you’re on an iPhone 16 Pro or an M2 MacBook Air, you get AFM 3 Core. The Advanced model is for the top end of the lineup. More on how that actually works below.
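The eligibility rule for the two on-device models can be written down in a few lines. This is a sketch of the rule as described here, not Apple's actual check: the function name is made up, it assumes the device is already Apple Intelligence-capable, it matches chip names exactly (so variants such as an "M4 Pro" would need adding), and the RAM figures in the examples are illustrative inputs.

```python
# Chips this article names as able to run the Advanced model.
ADVANCED_CHIPS = {"A19 Pro", "M3", "M4"}


def on_device_model(chip: str, ram_gb: int) -> str:
    """Pick the on-device model tier for an Apple Intelligence-capable device."""
    # Devices with only 8GB of RAM are excluded from the Advanced model,
    # even when the chip itself qualifies.
    if chip in ADVANCED_CHIPS and ram_gb > 8:
        return "AFM 3 Core Advanced"
    return "AFM 3 Core"


print(on_device_model("A19 Pro", 12))  # AFM 3 Core Advanced
print(on_device_model("A18 Pro", 8))   # AFM 3 Core
print(on_device_model("M3", 8))        # AFM 3 Core (chip qualifies, RAM does not)
```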
On the server side, AFM 3 Cloud is the workhorse - fast, efficient, and able to handle most complex queries. ADM 3 Cloud handles image generation and editing. AFM 3 Cloud Pro is the most capable, designed specifically for agentic work - the kind of multi-step reasoning behind the PDF-comparison demo. AFM 3 Cloud Pro runs on NVIDIA GPUs inside Google Cloud infrastructure, a detail Apple disclosed without making much noise about it. The privacy guarantees still apply, but Apple is not fully self-sufficient at the frontier tier.
The memory trick that makes it work
Here’s where the unglamorous problem gets interesting.
Your phone has two kinds of memory. DRAM - sometimes just called RAM - is the fast, active working memory. It’s what the phone uses right now, while things are running. Think of it as your desk: fast to reach, but limited in size. NAND - often called flash storage - is the slower long-term memory. It’s where your photos, apps and files live when they’re not being actively used. Think of it as your filing cabinet: much more space, but slower to retrieve.
The problem with running a powerful AI model on a phone is that traditional models need to live entirely on the desk - in DRAM - to work at useful speeds. A 20-billion-parameter model doesn’t fit on any phone’s desk. Not even close.
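A bit of back-of-the-envelope arithmetic shows the scale of the problem. The bit widths below are my assumptions, since the numbers Apple actually uses for this model are not given here: 16 bits per weight is a common uncompressed format and 4 bits is a common aggressive compression level. The estimate also covers the weights only and ignores the extra working memory a model needs while running.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of a model's weights in gigabytes (10^9 bytes)."""
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


for bits in (16, 4):
    print(f"20B parameters at {bits}-bit: {weights_gb(20, bits):.0f} GB")
# 20B parameters at 16-bit: 40 GB
# 20B parameters at 4-bit: 10 GB

print(f"3B parameters at 4-bit: {weights_gb(3, 4):.1f} GB")
# 3B parameters at 4-bit: 1.5 GB
```

Even under the generous 4-bit assumption, the full model would want about 10 GB of a desk that also has to hold the operating system and every open app. The 3-billion-parameter model, by contrast, needs roughly 1.5 GB.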
Apple’s solution with AFM 3 Core Advanced is to keep the full model in the filing cabinet (NAND) and work out in advance which parts of it you’ll actually need for a given task. Before generating a response, a lightweight routing system decides which set of “expert” blocks to pull from storage and load onto the desk (DRAM). It loads them once, uses them for the whole response, and periodically refreshes the selection if the task shifts.
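The filing-cabinet-to-desk idea can be sketched as a small toy model. This is not Apple's implementation: the class, the expert names, their sizes and the keyword-based router are all invented for illustration, and a real router would be a learned component rather than a string match. What the sketch does show is the pattern described above: choose experts once per response, load only the ones not already on the desk, and swap the selection when the task shifts.

```python
class ExpertStore:
    """Toy model: experts live in slow storage and are loaded into fast memory."""

    def __init__(self, experts: dict[str, float]):
        self.storage = experts           # name -> size in billions of parameters ("NAND")
        self.resident: set[str] = set()  # experts currently loaded ("DRAM")
        self.loads = 0                   # number of storage-to-memory transfers so far

    def select(self, wanted: set[str]) -> None:
        """Make exactly the wanted experts resident, loading only missing ones."""
        unknown = wanted - set(self.storage)
        if unknown:
            raise KeyError(f"unknown experts: {sorted(unknown)}")
        self.loads += len(wanted - self.resident)  # already-resident experts are reused
        self.resident = set(wanted)                # anything no longer wanted is dropped

    def resident_params(self) -> float:
        """Billions of parameters currently held in fast memory."""
        return sum(self.storage[name] for name in self.resident)


def route(task: str) -> set[str]:
    """Hypothetical router: pick experts once, before generation starts."""
    chosen = {"general"}
    if "code" in task:
        chosen.add("code")
    if "email" in task:
        chosen.add("writing")
    return chosen


# Made-up expert sizes, in billions of parameters.
store = ExpertStore({"general": 1.0, "code": 1.5, "writing": 1.5, "vision": 2.0})

store.select(route("draft an email reply"))
print(store.resident_params(), store.loads)  # 2.5 2

# The task shifts: one more expert is loaded, the other two are reused.
store.select(route("draft an email about this code"))
print(store.resident_params(), store.loads)  # 4.0 3
```

The number to watch is resident_params: it stays far below the total held in storage, which is the whole trick.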
The result: a 20-billion-parameter model that uses roughly the DRAM footprint of a 1 to 4 billion parameter model, depending on how demanding the task is. You get meaningfully more intelligence without the phone needing more physical memory. It’s efficient in a way that’s specific to Apple’s hardware - they control the chip, the storage, the operating system and the memory scheduler, so they can tune the filing-cabinet-to-desk transfer in a way that anyone building for someone else’s hardware cannot match.
Google with their Pixel phones could do something similar in principle - they design the Tensor chip and write the Android OS. But Android has to run on thousands of device configurations from dozens of manufacturers. Any assumption Apple bakes into this optimisation would be a compatibility liability for Google. Apple’s advantage here isn’t unique capability; it’s the freedom to commit fully to a single known hardware target.
What the numbers say
Apple’s own human preference evaluations show a genuine generational leap. Their on-device AFM 3 Core model was preferred over the 2025 baseline 45.6% to 23.3%. The server model, AFM 3 Cloud, was preferred over its 2025 predecessor 64.7% to 8.7%. Those are large gaps - not incremental improvements.
The voice quality numbers are equally meaningful. Using a standard speech quality scale, AFM 3 Core Advanced scored 4.15 compared to the current production system’s 3.87, with the gap widening on conversational language (4.24 vs 3.82). A 0.1 improvement on that scale is considered highly noticeable to listeners.
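The preference percentages above do not add up to 100 in either comparison. I am assuming the remainder is evaluators who had no preference or rated the two as equal; that breakdown is not given here. Under that assumption, a quick calculation shows how often the new model won when evaluators did pick a side. The function names are my own.

```python
def decisive_win_rate(preferred_new: float, preferred_old: float) -> float:
    """Share of decided comparisons (ties excluded) won by the new model."""
    return preferred_new / (preferred_new + preferred_old)


def undecided_share(preferred_new: float, preferred_old: float) -> float:
    """Percentage of comparisons not accounted for by either preference."""
    return 100 - preferred_new - preferred_old


for name, new, old in [("AFM 3 Core", 45.6, 23.3), ("AFM 3 Cloud", 64.7, 8.7)]:
    print(f"{name}: {decisive_win_rate(new, old):.1%} of decided comparisons, "
          f"{undecided_share(new, old):.1f}% undecided")
# AFM 3 Core: 66.2% of decided comparisons, 31.1% undecided
# AFM 3 Cloud: 88.1% of decided comparisons, 26.6% undecided
```

Read that way, the on-device model wins about two out of every three decided comparisons, and the server model wins nearly nine out of ten.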
The caveat is that all of these are measured against Apple’s own previous models, not against GPT-4o, Gemini or Claude. A full technical report with external benchmarks is promised later this summer.
The real edge: your phone already knows everything
Before any of the model architecture matters, there’s a more fundamental advantage Apple holds that nobody else comes close to matching.
Your iPhone knows more about you than almost anyone in your life. It has your messages, your emails, your calendar, your photos, your location history, your health data, your contacts and the notes you wrote to yourself at 11pm when you couldn’t sleep. It has years of this. It has all of it. Your closest friends have fragments. Your phone has the complete picture.
What Apple has done with the new Siri is take that picture and put it at the centre of every answer. Not generic AI answers drawn from the internet. Answers grounded in your life, your commitments, your history. When Siri is asked “what did I promise to send the builder?”, it isn’t guessing. It’s reading your messages and emails and drafting accordingly. You are the context. The model orbits you.
This is why the architecture diagram Apple showed - with a person at the absolute centre, surrounded by layers of capabilities - isn’t just a marketing choice. It’s the design principle. Most AI tools start with a general-purpose model and ask you to bring your context to it. Apple starts with your context and brings the model to it.
Privacy: the stakes are obvious, and Apple knows it
The personal context advantage is also the obvious concern. If Siri has access to everything your phone knows about you, and your phone knows everything, then the question of where that information goes is not a small one.
Apple’s answer is layered and, to their credit, unusually auditable. As much as possible, processing happens on your device. What can’t run locally goes to Private Cloud Compute - Apple’s secure server infrastructure, purpose-built with the same privacy guarantees as the on-device layer. No user data is stored. No one at Apple can access it. The contents of your queries do not feed future model training.
They have backed this with something rare in the industry: an open invitation for external security researchers to scrutinise the architecture any time, any way. And a $1 million bounty for anyone who can demonstrate a breach of the PCC security model. That is not the behaviour of a company hoping nobody looks too closely.
The complication, as noted earlier, is that AFM 3 Cloud Pro - the most capable server model, the one handling complex agentic tasks - now runs on NVIDIA GPUs inside Google Cloud infrastructure. Apple is explicit that the same PCC rules apply there. The attestation model, the no-retention guarantees, the independent verifiability - all of it extends to those NVIDIA chips sitting in Google’s data centres. Whether you take Apple at their word on that is a judgement call. The architecture is published. The bounty is real. The scrutiny is invited.
For most people using Siri AI day to day, the relevant work is happening on-device anyway. The Google Cloud layer is the ceiling, not the floor.
What it looks like in practice
I’ve watched videos of a number of people using the new Siri AI this week - on their own devices, live, not in controlled demos. The results are real. It reads screens, finds things in files, takes actions across apps. It’s noticeably more capable than what Siri has been for years.
It isn’t a wholesale replacement for the dedicated AI tools people are already using. For complex research, long document analysis or nuanced work, purpose-built tools with deeper context windows and more powerful models will still be the right choice. But for the kind of everyday agentic tasks that happen dozens of times a day on a phone - read this, compare that, send this reply - Siri AI is looking closer than it ever has to actually delivering.
For people building their own AI toolkit, this belongs in the inventory. Not at the top, but on the list. The integration layer Apple has built - the personal context, the on-screen awareness, the OS-level permissions - is genuinely difficult to replicate through any third-party app. That’s the moat. And two years late or not, it now appears to be real.