Most growth conversations in 2026 mention “AI search” or “answer engine optimization” or “GEO” without a measurement framework underneath. The terminology gestures at something real: AI agents are increasingly the surface where buyers first encounter your business. Without rigorous scoring, though, the work that follows tends to be the same content production under a new label.
We built MAGNET, Machine-Actionable Generative Entity Test, to bring measurement to the question. It scores agent-readiness across seven dimensions and 46 items, with band thresholds calibrated for MENA market realities and weightings that adjust by business type.
This piece is the public architecture: dimension structure, weighting rationale, band thresholds, and the structural veto checks that override everything else. The 46-item rubric itself stays inside engagements; the reasoning for that boundary is in the section on why the rubric stays private.
Why the framework exists
Agent-era discoverability isn't a tactical update to existing search optimization. The substrate shifted. AI agents read your site through a different layer than human searchers: they want structured data they can extract, semantic markup they can parse, content they can quote, freshness signals they can trust, and entities they can resolve to a real organization with verifiable presence elsewhere on the web.
Most sites optimize for human visual reading. That optimization rarely overlaps with agent extractability. The image-as-headline pattern that works for human users is invisible to agents. The CSS-positioned content that renders correctly in browsers fails accessibility-tree extraction. The dynamic schema injected client-side gets ignored by the crawlers that build the citation pools agents draw from.
The site that scores 88 in PageSpeed Insights and ranks first on Google can be entirely absent from ChatGPT search results for the same queries.
The gap matters because agent-mediated discovery is becoming the buyer's first surface. By the time someone runs a Google search, they often have a shortlist already shaped by an AI assistant's earlier answer. Sites that aren't in those answers, or are described unfavorably in them, lose the buyer before the search even happens.
The existing measurement tools don't address this layer rigorously. SEO tools measure search rankings. Content tools measure readability. Schema validators measure correctness without measuring extractability. None of them produce a band-level signal of where a site sits on agent-era readiness as a structural attribute. We built MAGNET because the work needed that measurement, and because the measurement needed MENA-specific calibration that imported frameworks didn't capture.
The seven dimensions
MAGNET decomposes agent-readiness into seven dimensions. Each measures a distinct surface of how agents discover, parse, and trust your site.
Structured Data
The schema layer: Schema.org type coverage, JSON-LD validity, property completeness, entity linking via sameAs. This is where most sites have the highest theoretical room to improve and the lowest actual deployment quality. Bad structured data is worse than no structured data: it teaches agents wrong information about you.
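To make entity linking via sameAs concrete, here is a minimal Python sketch that emits a Schema.org Organization block as JSON-LD inside the script tag that belongs in the page head. The organization name, URLs, and Wikidata ID are hypothetical placeholders, and `organization_jsonld` is an illustrative helper, not part of any library. Passing `ensure_ascii=False` keeps Arabic names readable in the output instead of turning them into Unicode escape sequences.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Return a <script> tag carrying a Schema.org Organization as JSON-LD.

    All values passed in the example below are hypothetical placeholders.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        # sameAs ties the on-site entity to profiles agents can resolve elsewhere.
        "sameAs": same_as,
    }
    # ensure_ascii=False keeps Arabic (or any non-ASCII) names as readable text.
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = organization_jsonld(
    name="Example Trading Co.",
    url="https://www.example.com/",
    logo="https://www.example.com/logo.png",
    same_as=[
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder Wikidata item
        "https://www.linkedin.com/company/example",  # placeholder profile
    ],
)
print(tag)
```

A block like this in the head is the kind of single Organization schema the zero-schema veto check treats as sufficient. Each sameAs URL should point at a profile that actually exists, since a wrong link teaches agents wrong information.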
Semantic Markup
The accessibility tree: heading hierarchy, landmark roles, ARIA correctness, semantic HTML over div soup. Agents extract content through the same accessibility tree that screen readers use. The site that fails WCAG also fails agent extraction; the relationship runs through the same underlying graph.
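Heading hierarchy is one of the few semantic-markup signals that can be checked with nothing but the standard library. This minimal sketch uses Python's `html.parser` to flag a missing or repeated h1 and any downward jump of more than one level (h1 straight to h3). Both rules are common authoring heuristics assumed here, not items quoted from the MAGNET rubric, and the class and function names are illustrative. A real audit would inspect the rendered accessibility tree rather than raw HTML.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record h1-h6 heading levels in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    """Flag a missing or repeated h1 and any jump down of more than one level."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one h1, found {levels.count(1)}")
    # Compare each heading with the one before it; moving back up (h3 -> h2) is fine.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} -> h{cur}")
    return issues


print(heading_issues("<h1>Services</h1><h3>Pricing</h3><h2>FAQ</h2>"))
```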
Content Extractability
The readable layer: claim-evidence formatting, FAQ structure, comparison content, citation-friendly phrasing. In MENA this is where most sites fail, because Arabic content has structural characteristics that English-first writing patterns don't translate into cleanly.
Crawler Connectivity
The infrastructure layer: robots.txt rules for AI agents, llms.txt adoption, WAF and CDN configuration, response headers, and agent-specific user-agent permissions. This is the layer where items most often score 0, because none of it was on anyone's checklist a year ago.
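A quick way to see what a robots.txt file tells the major AI crawlers is Python's standard-library `urllib.robotparser`. This sketch checks the three crawler tokens this article names in its veto checks against a sample file; the file contents and URL are hypothetical. Treat it as a screen, not a verdict: the standard-library parser applies the first matching rule in file order rather than the longest-match precedence of RFC 9309, it doesn't support path wildcards, and it says nothing about WAF or CDN rules that can block a crawler robots.txt admits.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler user-agent tokens named in this article; vendors add and
# rename tokens over time, so treat this list as illustrative, not complete.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI crawler token, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse lines directly; no network fetch
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical file: blocks GPTBot entirely, allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
access = ai_crawler_access(robots_txt, "https://www.example.com/services")
print(access)
# A wholesale block would show every token as False.
print("all AI crawlers blocked:", not any(access.values()))
```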
Freshness Signals
The temporal layer: dateModified accuracy, content coherence, expired-content cleanup, version drift. Agents weight freshness heavily; sites that report stale dateModified values get filtered out of fresh-citation pools regardless of content quality.
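Some dateModified problems can be caught mechanically. This sketch flags values that don't parse, sit in the future, predate datePublished, or fall outside a staleness window. The 365-day default is a hypothetical threshold and `freshness_flags` is an illustrative helper, not one of MAGNET's items. Whether dateModified matches a real content change needs a content diff and isn't covered here.

```python
from datetime import datetime, timedelta, timezone


def _parse(value: str) -> datetime:
    """Parse an ISO 8601 string, treating timezone-less values as UTC."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Comparing naive and aware datetimes raises TypeError, so make all aware.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def freshness_flags(date_published: str, date_modified: str,
                    now: datetime, max_age_days: int = 365) -> list[str]:
    """Return problems with a page's dateModified value.

    `now` must be timezone-aware; max_age_days is a hypothetical staleness window.
    """
    try:
        published = _parse(date_published)
        modified = _parse(date_modified)
    except ValueError:
        return ["unparseable date"]
    flags = []
    if modified > now:
        flags.append("dateModified is in the future")
    if modified < published:
        flags.append("dateModified precedes datePublished")
    if now - modified > timedelta(days=max_age_days):
        flags.append(f"not modified in over {max_age_days} days")
    return flags


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(freshness_flags("2024-05-10", "2024-04-01T09:00:00Z", now))
```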
Entity Authority
The off-site layer: Wikipedia entries, Wikidata properties, presence in recognized industry directories, citation density across the open web. This is the largest single ongoing investment for most clients, because work cycles run 6 to 12 months per item.
Agentic Conversion / Transactability
The conversion layer: whether an autonomous agent can complete the business's primary conversion action, such as a purchase for e-commerce; a demo booking, trial start, or lead capture for SaaS; or a quote request or booking for services. It covers selector stability, agent-traversable flows, regional payment rails (Tabby, Tamara, Apple Pay in MENA) where commerce applies, and the absence of human-only or anti-bot walls on the conversion path. The dimension was reframed in v2 so that non-e-commerce businesses are scored on their own conversion goal rather than penalized for lacking a cart they shouldn't have.
The seven were chosen to be exhaustive of the agent-readiness surface and mutually independent enough that scoring one doesn't force scoring another. Earlier framework iterations had nine dimensions; we collapsed to seven because items were cross-pollinating between scoring axes in ways that distorted total scores.
Weightings, and why they vary
The seven dimensions don't carry equal weight. Numbering them D1 through D7 in the order listed above, the default weighting set works as follows.
Three dimensions tie for the highest weight: D1 Structured Data, because it is the most leveraged single intervention and touches everything downstream; and D3 Content Extractability and D6 Entity Authority, because the evidence shows extractable content and resolvable entity identity are the strongest drivers of whether agents actually cite you. D5 Freshness was raised in v2, because agents treat recency as a first-class retrieval signal. D7 carries the lowest weight in this default profile because most businesses aren't transactional; it rises sharply for e-commerce.
Profile-specific weightings adjust this. Service businesses use the balanced default above. E-commerce sites weight D1 and D7 higher, because product schema and agent-traversable checkout determine whether agents can transact with them. SaaS and lead-gen businesses weight D3 and D6 highest, because buyers ask assistants “best tool for X,” so extractable content and third-party entity authority drive the shortlist. Regulated services lean on D6 authority and D5 freshness, since verifiable, current, well-sourced information carries the most weight there.
The four profile weightings are fixed; assigning a business to a profile is a judgment call calibrated during the audit. Most engagements use one of the four; multi-vertical businesses occasionally use a custom blend signed off at scoping.
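A minimal sketch of how a profile weighting turns seven dimension scores into one total. It assumes each dimension is scored from 0.0 to 1.0 (for instance, the share of its items passed) and that weights are percentage points summing to 100. The weight values are placeholders shaped only to match the ordering described above (D1, D3, and D6 tied highest by default; D1 and D7 raised for e-commerce); they are not MAGNET's actual numbers, and only two of the four profiles are shown.

```python
# Placeholder weights in percentage points, NOT MAGNET's actual values.
# D1-D7 follow the order the dimensions are introduced in this article.
PROFILE_WEIGHTS = {
    "service":   {"D1": 18, "D2": 12, "D3": 18, "D4": 12, "D5": 14, "D6": 18, "D7": 8},
    "ecommerce": {"D1": 22, "D2": 10, "D3": 14, "D4": 10, "D5": 10, "D6": 14, "D7": 20},
}


def weighted_total(profile: str, dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0.0 to 1.0) into a 0-100 total."""
    weights = PROFILE_WEIGHTS[profile]
    if sum(weights.values()) != 100:
        raise ValueError(f"{profile} weights must sum to 100")
    if set(dimension_scores) != set(weights):
        raise ValueError("scores must cover exactly D1-D7")
    if any(not 0.0 <= s <= 1.0 for s in dimension_scores.values()):
        raise ValueError("dimension scores must be between 0.0 and 1.0")
    return sum(weights[d] * dimension_scores[d] for d in weights)


scores = {"D1": 0.5, "D2": 0.75, "D3": 0.5, "D4": 0.25, "D5": 0.5, "D6": 0.25, "D7": 1.0}
print(weighted_total("service", scores), weighted_total("ecommerce", scores))
```

With these placeholder weights, the same dimension scores total 49.5 under the service profile and 56.5 under e-commerce, which is why profile assignment is settled at scoping rather than treated as a detail.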
The five bands
Total scores fall into five bands. Band assignment determines the recommended next engagement.
Agent-Native
Site operates fluently in agent-mediated discovery. Maintenance work, not transformation work.
Agent-Friendly
Substantial agent-readiness with specific gaps. An Implementation engagement closes the gaps; a Retainer maintains the result.
Legacy-Optimized
Optimized for traditional search but missing the agent-era layer. Implementation transforms; this is the most common audit outcome.
Frictional
Structural gaps across multiple dimensions. Implementation is foundational; Retainer alone won't close the gap.
Dark Site
Agents cannot meaningfully extract or cite. Often a CMS or infrastructure problem upstream of content. Diagnostic engagement first to determine what's structurally feasible.
The bands are calibrated against MENA test sites. The thresholds differ slightly from what US-only or EU-only calibrations would produce, because Arabic content and regional infrastructure yield distinct score distributions.
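Band assignment is a threshold lookup on the total. The sketch below assumes a 0-100 total and uses placeholder cut-offs (20, 40, 60, 80) that are not MAGNET's calibrated MENA thresholds, which aren't listed in this article. `band_for` and `BAND_FLOORS` are hypothetical names.

```python
from bisect import bisect_right

# Placeholder lower bounds for every band above Dark Site; MAGNET's
# calibrated MENA thresholds are not reproduced here.
BAND_FLOORS = [20, 40, 60, 80]
BANDS = ["Dark Site", "Frictional", "Legacy-Optimized", "Agent-Friendly", "Agent-Native"]


def band_for(total: float) -> str:
    """Map a 0-100 total to a band; a score equal to a floor lands in the higher band."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    # bisect_right counts how many floors the total has reached.
    return BANDS[bisect_right(BAND_FLOORS, total)]


for total in (12, 30, 49.5, 85):
    print(total, band_for(total))
```

Keeping the floors as data separate from the lookup means a recalibration for a different market changes only `BAND_FLOORS`, not the band logic.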
Five veto checks
Five structural checks override the dimension scoring. The vetoes exist because dimension scoring assumes baseline accessibility: a site that fails any veto check needs the veto resolved before dimension scoring becomes meaningful.
- 01
Not-crawlable check
If your content isn't reachable at all (an auth wall, infinite scroll with no real URLs, or other access blockers), maximum total score is capped at 25: agents can't reach the content, so dimension scoring is theoretical.
- 02
Robots-blocks-AI-crawlers check
If your robots.txt blocks the major AI crawlers wholesale (GPTBot, ClaudeBot, PerplexityBot and peers), maximum total score is capped at 30: agents are told not to read you.
- 03
JS-rendered-only-content check
If the majority of your meaningful content only renders client-side (no server-rendered HTML for the agent crawlers that don't execute JavaScript), maximum score caps at 40. At-scale fetch studies show AI crawlers don't run JavaScript the way Googlebot does.
- 04
Human-in-the-loop conversion check
If your primary conversion can only be completed through synchronous human contact (a phone call, a WhatsApp message, an in-person visit) with no digital alternative an agent can traverse, maximum score caps at 40.
- 05
Zero-schema check
If there's no structured data of any kind across your audited pages (no JSON-LD, microdata, or RDFa), maximum score caps at 50. A single Organization schema in the head is enough to clear it.
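The five caps combine with the weighted total in a simple way. This is a minimal sketch under two assumptions: when several vetoes fail, the lowest cap wins (each cap is stated individually above, but not how they combine), and a cap never raises a score that is already below it. The cap values come from the list above; the check identifiers are hypothetical labels.

```python
# Caps from the five veto checks above; the dictionary keys are hypothetical labels.
VETO_CAPS = {
    "not_crawlable": 25,
    "robots_blocks_ai_crawlers": 30,
    "js_rendered_only_content": 40,
    "human_in_the_loop_conversion": 40,
    "zero_schema": 50,
}


def apply_vetoes(total: float, failed_checks: set[str]) -> float:
    """Cap a dimension-based total by every failed veto check.

    Assumes the lowest cap wins when several checks fail; using min() also
    means a cap never raises a total that is already below it.
    """
    unknown = failed_checks - set(VETO_CAPS)
    if unknown:
        raise ValueError(f"unknown veto checks: {sorted(unknown)}")
    return min([total, *(VETO_CAPS[c] for c in failed_checks)])


print(apply_vetoes(56.5, {"zero_schema"}))
print(apply_vetoes(56.5, {"zero_schema", "robots_blocks_ai_crawlers"}))
```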
What MAGNET doesn't score
MAGNET deliberately doesn't score:
- Brand sentiment. What agents say about you when they cite you. Sentiment is a content/PR question, not a readiness question.
- Traditional SEO. Rankings, backlink profile, domain authority, keyword density. These are measured well by existing tools.
- Traffic or conversion. MAGNET measures readiness as a structural attribute, not the traffic outcomes that follow. The relationship between readiness and traffic is real but indirect, and traffic measurement has its own established methodology.
- Content quality in human terms. Whether your articles are interesting, persuasive, or original. MAGNET measures extractability, not editorial merit.
The boundary matters because confusing readiness with downstream outcomes is how most agency engagements drift. A site can be Agent-Native and have weak content; a site can have brilliant content and be Frictional. These are different problems requiring different work.
Why the rubric stays private
The 46-item rubric stays inside client engagements rather than being published. Three reasons.
Competitive integrity
A published rubric becomes a checklist for sites optimizing toward the test. Optimization toward a published checklist replaces optimization toward agent-readiness: the test gets gamed, the actual readiness drifts. Keeping the items reserved keeps scoring honest.
Client value
Audit-and-Implementation engagements are valuable partly because the scoring includes specific items the client wouldn't otherwise know to check. Publishing the rubric reduces what the engagement adds beyond consulting hours. The architecture is published because architecture frames understanding; the items are reserved because the items frame the work.
Calibration drift
The 46 items are calibrated quarterly against MENA test sites and updated as Schema.org versions roll forward and new agent surfaces launch. A published rubric would become outdated within months and lock the framework into whatever version was published. Keeping items inside engagements keeps the framework alive.
What gets published, then, is exactly what this article describes: the dimension structure, weightings, bands, vetoes, and profile assignments. That's enough for a reader to understand whether MAGNET is the right framework for their situation, what its outputs mean, and how it differs from competitor frameworks. It's not enough to game it.
Closing thoughts
If you're trying to decide whether agent-readiness is a category worth investing in, the architecture above should clarify the question. If you're trying to decide whether MAGNET is the framework that fits your situation, the seven dimensions and four profile weightings should clarify that too.
What it doesn't tell you is where your site actually scores. That requires running the rubric. We do that as the entry-point engagement: a four-week diagnostic that produces a band assignment, a prioritized backlog, and a recommended next engagement.
The framework will keep evolving. New agent surfaces are launching every quarter. Schema.org keeps shipping versions. MENA-specific calibrations need ongoing recalibration as Arabic content patterns and regional infrastructure mature. We'll publish framework updates here as they happen.
For now, this is the public architecture, and it's enough to evaluate whether the work is the right fit for your situation. If it is, the next step is a scoping conversation. If it isn't, this article should at least make explicit what the agent-era layer actually contains, which most agency conversations on the topic right now don't.
