Here’s what we know so far… A leaked, 170-page Apple document reveals how the company evaluates AI digital assistant responses, focusing on factors like truthfulness, harmfulness, and user satisfaction. Human reviewers assess responses through a structured workflow: evaluating user requests, scoring individual replies, and ranking multiple responses. The goal is to ensure AI-generated answers are accurate, safe, and natural.
Everything you need to know about the leaked AI responses document
Central to this AI ranking is Apple’s “preference ranking” system. This system isn’t just driven by algorithms; it involves real human reviewers assessing how well AI-generated responses meet the needs of real users. Like some of its big tech peers, Apple aims to make sure answers from AI assistants like Siri aren’t just technically correct but also clear, helpful, and safe.
Reviewers follow a three-part framework. First, they interpret the user’s request, asking what the user really meant or wanted. Next, they assess multiple AI-generated responses to see how well each one addresses that need. Finally, they rank those responses from most to least effective.
What makes a good AI response in Apple’s eyes?
The leaked document outlines six key areas:
1. Following instructions
Did the AI stick to what the user asked? This includes both direct and implied intentions.
For example:
Explicit: “List three tips in bullet points,” “Write 100 words,” “No commentary.”
Implicit: A request phrased as a question implies the assistant should provide an answer. A follow-up like “Another article please” carries forward context from a previous instruction (e.g. to write for a 5-year-old).
2. Language quality
Is the response written in a natural, localised way that feels appropriate for the user?
For example, an American user asking for a reading list shouldn’t just be given American authors unless explicitly requested. Likewise, using the word “soccer” for a British audience instead of “football” counts as a localisation miss.
3. Concision
Apple values conciseness. Responses should be as short as possible without losing meaning or clarity. Two main concerns, distractions and length appropriateness, are discussed in the document.
4. Truthfulness
Factual accuracy is non-negotiable. Apple wants AI that won’t spread misinformation.
5. Harmfulness
Responses are reviewed for content that could be offensive, biased, or dangerous, as user safety is a top priority.
6. Satisfaction
This is the most important measure for Apple. Was the user likely satisfied with the answer, considering all the above factors?
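To make the workflow concrete, the six criteria and the preference ranking described above can be sketched in code. This is purely illustrative: the leaked guidelines describe human reviewers, not software, and the criterion names, 1–5 scoring scale, weighting, and aggregation logic below are all assumptions for the sake of the sketch (the document does say satisfaction is the most important measure, which is modelled here as a double weight).

```python
from dataclasses import dataclass, field

# The six areas named in the leaked document.
CRITERIA = [
    "following_instructions",
    "language_quality",
    "concision",
    "truthfulness",
    "harmlessness",
    "satisfaction",
]

@dataclass
class Candidate:
    """One AI-generated response plus hypothetical reviewer scores (1-5)."""
    text: str
    scores: dict = field(default_factory=dict)

    def overall(self) -> float:
        # Hypothetical aggregation: satisfaction counts double, since the
        # document calls it the most important measure.
        weights = {c: (2.0 if c == "satisfaction" else 1.0) for c in CRITERIA}
        total = sum(self.scores[c] * weights[c] for c in CRITERIA)
        return total / sum(weights.values())

def preference_rank(candidates):
    """Rank candidate responses from most to least effective."""
    return sorted(candidates, key=lambda c: c.overall(), reverse=True)

# Example: a concise, accurate answer should outrank a middling one.
strong = Candidate("Short, accurate, on-topic answer.", {c: 5 for c in CRITERIA})
weak = Candidate("Rambling, partly off-topic answer.", {c: 3 for c in CRITERIA})
ranked = preference_rank([strong, weak])
```

The point of the sketch is the shape of the process, not the numbers: several responses, scored per dimension, then ordered by an overall judgment in which satisfaction dominates.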
It’s interesting to see how Apple prioritises trust and user experience over anything else. This document suggests that Apple seems more focused on making AI useful, safe, and respectful of user intent.
Disclaimer: Search Engine Land received the Apple Preference Ranking Guidelines v3.3 via a vetted source who wishes to remain anonymous. Its team has contacted Apple for comment but has not received a response.
How the AI responses could shape your SEO strategy
While this might sound like a tech story, it could have serious implications for anyone working in SEO, content, or AI-driven search.
AI responses are scored across multiple dimensions
Apple uses a detailed scoring system to evaluate AI output, including harmfulness, truthfulness, helpfulness, and satisfaction. This isn’t just about safety; it’s about quality and relevance, too.
Why it matters: Search engines are already blending AI-generated summaries and overviews into results. Understanding how Apple rates AI content gives you a glimpse into the future of content evaluation, especially as search becomes more conversational.
Truthfulness and accuracy are critical
The document emphasises that AI must avoid hallucinations and stick closely to verifiable facts.
Why it matters: If you’re optimising for AI-driven overviews (like Google’s SGE), your content needs to be clear, reliable, and well-sourced. Misinformation could mean getting skipped over.
Satisfaction is measured by intent alignment
Apple scores how well the AI response matches what the user actually wanted, not just what they typed.
Why it matters: This reinforces the importance of user intent in SEO. You can’t just chase keywords; you need to deeply understand what the searcher is really asking for.
Harmfulness isn’t just about extreme content
The harmfulness metric includes misleading advice, biased language, and unsafe recommendations.
Why it matters: If your content includes recommendations (e.g. travel, health, finance), you’ll need to review it through a safety and ethical lens. This could impact how AI surfaces your content.
It’s not just about tech – it’s about trust
Apple’s document suggests that the future of AI is trust-first. The more trustworthy your content appears, the more likely it is to be amplified by AI systems.
Why it matters: From E-E-A-T (Experience, Expertise, Authoritativeness, and Trust) to AI scoring, the message is clear: trust and quality are your SEO superpowers in the age of AI.
Bottom line? This leak gives us a rare peek behind the curtain. If you’re serious about future-proofing your SEO, it’s time to start thinking like an AI evaluator. Focus on factual, helpful, intent-driven content that both users and machines can trust.
If you’d like more information on this story or for support with your SEO strategy, get in touch with our team of experts. Send us an email to team@modo25.com and we’d be more than happy to discuss your requirements.