Nmow

The MAGNET framework: a 46-item rubric for agent-readiness

How Nmow scores agent-readiness across seven dimensions and 46 items, with band thresholds calibrated for MENA. Published as architecture, with profile-specific weightings and the full rubric reserved for client engagements. The methodology piece that anchors the firm’s agent-era practice.

Author: Ahmed Saad · Published: April 22, 2026 · Read: 12 min

Most growth conversations in 2026 mention “AI search,” “answer engine optimization,” or “GEO” without a measurement framework underneath. The terminology gestures at something real: AI agents are increasingly the surface where buyers first encounter your business. Without rigorous scoring, though, the work that follows tends to be the same content production under a new label.

We built MAGNET (Machine-Actionable Generative Entity Test) to bring measurement to that question. It scores agent-readiness across seven dimensions and 46 items, with band thresholds calibrated for MENA market realities and weightings that adjust by business type.

This piece is the public architecture: dimension structure, weighting rationale, band thresholds, and the structural veto checks that override everything else. The 46-item rubric itself stays inside engagements. The reasoning for that boundary is in the closing section.

Why the framework exists

Agent-era discoverability isn't a tactical update to existing search optimization. The substrate shifted. AI agents read your site through a different layer than human searchers: they want structured data they can extract, semantic markup they can parse, content they can quote, freshness signals they can trust, and entities they can resolve to a real organization with verifiable presence elsewhere on the web.

Most sites optimize for human visual reading. That optimization rarely overlaps with agent extractability. The image-as-headline pattern that works for human users is invisible to agents. The CSS-positioned content that renders correctly in browsers fails accessibility-tree extraction. The dynamic schema injected client-side gets ignored by the crawlers that build the citation pools agents draw from.
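To make the client-side schema point concrete, here is a minimal Python sketch that inspects raw server HTML (what a non-rendering crawler receives) for the three structured-data syntaxes Schema.org supports: JSON-LD, microdata, and RDFa. The class and function names are hypothetical, and the attribute checks are simplified heuristics rather than full parsers for each syntax. It is an illustration, not MAGNET's own tooling.

```python
from html.parser import HTMLParser


class StructuredDataDetector(HTMLParser):
    """Counts structured-data markers present in raw (un-rendered) HTML.

    Markup injected later by client-side JavaScript never appears in this
    input, which approximates what a crawler that doesn't run JS sees.
    """

    def __init__(self) -> None:
        super().__init__()
        self.json_ld = 0    # <script type="application/ld+json"> blocks
        self.microdata = 0  # elements carrying an itemscope attribute
        self.rdfa = 0       # elements carrying a typeof or vocab attribute

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute names arrive lowercased; bare attributes map to None
        if tag == "script" and (attr.get("type") or "").strip().lower() == "application/ld+json":
            self.json_ld += 1
        if "itemscope" in attr:
            self.microdata += 1
        if "typeof" in attr or "vocab" in attr:
            self.rdfa += 1


def detect_structured_data(raw_html: str) -> dict:
    """Return counts of JSON-LD, microdata, and RDFa markers in raw HTML."""
    detector = StructuredDataDetector()
    detector.feed(raw_html)
    detector.close()
    return {"json_ld": detector.json_ld, "microdata": detector.microdata, "rdfa": detector.rdfa}
```

Run against a server-rendered page, a JSON-LD block in the head shows up in the counts; run against a single-page-app shell whose schema is added by a bundle, every count is zero even though the rendered page carries schema.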

A site that scores 88 in PageSpeed Insights and ranks first on Google can still be entirely absent from ChatGPT search results for the same queries.

The gap matters because agent-mediated discovery is becoming the buyer's first surface. By the time someone runs a Google search, they often have a shortlist already shaped by an AI assistant's earlier answer. Sites that aren't in those answers, or are described unfavorably in them, lose the buyer before the search even happens.

The existing measurement tools don't address this layer rigorously. SEO tools measure search rankings. Content tools measure readability. Schema validators check correctness without measuring extractability. None of them produce a band-level signal of where a site sits on agent-era readiness as a structural attribute. We built MAGNET because our own work needed that measurement, grounded in MENA-specific calibration that imported frameworks didn't capture.

The seven dimensions

MAGNET decomposes agent-readiness into seven dimensions. Each measures a distinct surface of how agents discover, parse, and trust your site.

D1

Structured Data

The schema layer: Schema.org type coverage, JSON-LD validity, property completeness, entity linking via sameAs. This is where most sites have the highest theoretical room to improve and the lowest actual deployment quality. Bad structured data is worse than no structured data: it teaches agents wrong information about you.
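As a minimal sketch of the kind of markup this dimension covers, the Python below builds a Schema.org Organization node with sameAs links and wraps it in a JSON-LD script tag. The function name and every value passed to it are hypothetical placeholders; a real node would carry more properties and point sameAs at the organization's actual profiles.

```python
import json


def organization_json_ld(name: str, url: str, same_as: list[str]) -> str:
    """Serialize a minimal Schema.org Organization node as a JSON-LD script tag."""
    node = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points agents at independent profiles of the same entity,
        # which is how the entity gets resolved beyond this one site.
        "sameAs": same_as,
    }
    # ensure_ascii=False keeps Arabic names readable instead of \u escapes.
    body = json.dumps(node, ensure_ascii=False, indent=2)
    # Escape "</" so a value can never close the script element early;
    # "<\/" is still valid JSON for "</".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(organization_json_ld(
    "Example Co",                                   # hypothetical name
    "https://example.com",                          # hypothetical URL
    ["https://www.linkedin.com/company/example"],   # hypothetical profile
))
```

The output is meant to sit in the server-rendered head, so it is visible to crawlers that don't execute JavaScript.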

D2

Semantic Markup

The accessibility tree: heading hierarchy, landmark roles, ARIA correctness, semantic HTML over div soup. Agents extract content through the same accessibility tree that screen readers use. The site that fails WCAG also fails agent extraction; the relationship runs through the same underlying graph.
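A rough illustration of one heading-hierarchy check, in Python with only the standard library: collect h1 to h6 in document order, then flag a missing or duplicated h1 and any jump that skips a level (h2 straight to h4). These two rules are common accessibility heuristics chosen for the example, not the rubric's actual items, and the names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records heading levels (h1..h6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # Match h1..h6 only; tags such as "hr" or "header" are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    """Return human-readable problems with the page's heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    issues = []
    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    # A heading may go deeper by one level at a time, or back up freely.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: h{prev} -> h{cur}")
    return issues
```

A page styled to look structured with large bold divs produces an empty outline here, which is the same gap the accessibility tree exposes to agents.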

D3

Content Extractability

The readable layer: claim-evidence formatting, FAQ structure, comparison content, citation-friendly phrasing. This is where most sites in MENA fail, because Arabic content has structural characteristics that English-first writing patterns don't translate into cleanly.

D4

Crawler Connectivity

The infrastructure layer: robots.txt rules for AI agents, llms.txt adoption, WAF and CDN configuration, response headers, agent-specific user-agent permissions. This layer frequently scores 0 because it wasn't on anyone's checklist a year ago.
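For the robots.txt part of this layer, Python's standard `urllib.robotparser` can answer whether a given crawler token may fetch a URL. The sketch below checks the AI crawlers this article names (GPTBot, ClaudeBot, PerplexityBot); the function name and default URL are hypothetical. `robotparser` applies rules in file order rather than longest-match precedence, so its answer can differ from how an individual crawler interprets the same file; treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this article; extend the list as new agents appear.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawler user-agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_crawlers(robots))  # GPTBot is singled out; the wildcard group allows the rest
```

A wildcard `Disallow: /` blocks every listed crawler at once, which is the wholesale pattern the robots veto below targets.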

D5

Freshness Signals

The temporal layer: dateModified accuracy, content coherence, expired-content cleanup, version drift. Agents weight freshness heavily; sites that report stale dateModified values get filtered out of fresh-citation pools regardless of content quality.
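One way to approximate a dateModified accuracy check, sketched in Python under stated assumptions: compare the dateModified a page declares in its schema with the HTTP Last-Modified header the server sends, and flag a declared date that lies in the future or trails the header by more than a tolerance. The one-day tolerance and the function names are hypothetical, and Last-Modified is itself a noisy signal (many stacks reset it on every deploy), so this is a heuristic, not MAGNET's scoring logic.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def _as_utc(moment: datetime) -> datetime:
    # Assumption: timestamps without an offset are treated as UTC.
    return moment if moment.tzinfo is not None else moment.replace(tzinfo=timezone.utc)


def freshness_flags(date_modified: str, last_modified_header: str, now: datetime,
                    tolerance: timedelta = timedelta(days=1)) -> list[str]:
    """Compare schema dateModified with the HTTP Last-Modified header.

    date_modified is ISO 8601 with an explicit offset (Python versions
    before 3.11 reject a trailing "Z"); the header uses HTTP date format.
    """
    declared = _as_utc(datetime.fromisoformat(date_modified))
    served = _as_utc(parsedate_to_datetime(last_modified_header))
    flags = []
    if declared > _as_utc(now):
        flags.append("dateModified is in the future")
    if served - declared > tolerance:
        flags.append("dateModified lags the server's Last-Modified header")
    return flags
```

Passing `now` explicitly keeps the check deterministic and makes it easy to re-run an audit against a fixed date.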

D6

Entity Authority

The off-site layer: Wikipedia entries, Wikidata properties, presence in recognized industry directories, citation density across the open web. This is the largest single ongoing investment for most clients, because work cycles run 6 to 12 months per item.

D7

Agentic Conversion / Transactability

The conversion layer: whether an autonomous agent can complete the business's primary conversion action, such as a purchase for e-commerce; a demo booking, trial start, or lead capture for SaaS; or a quote request or booking for services. It covers selector stability, agent-traversable flows, regional payment rails (Tabby, Tamara, Apple Pay in MENA) where commerce applies, and the absence of any human-only or anti-bot wall on the conversion path. The dimension was reframed in v2 so non-e-commerce businesses are scored on their own conversion goal instead of being penalized for missing a cart they shouldn't have.

The seven were chosen to be exhaustive of the agent-readiness surface and mutually independent enough that scoring one doesn't force scoring another. Earlier framework iterations had nine dimensions; we collapsed to seven because items were cross-pollinating between scoring axes in ways that distorted total scores.

Weightings, and why they vary

The seven dimensions don't carry equal weight. The default weighting set:

Default weighting (100% total)
D1 · Structured Data · 18%
D2 · Semantic Markup · 16%
D3 · Content Extractability · 18%
D4 · Crawler Connectivity · 10%
D5 · Freshness Signals · 12%
D6 · Entity Authority · 18%
D7 · Agentic Conversion / Transactability · 8%

Three dimensions tie for the highest weight. D1 Structured Data ranks there because it is the most leveraged single intervention and touches everything downstream. D3 Content Extractability and D6 Entity Authority rank there because the evidence shows extractable content and resolvable entity identity are the strongest drivers of whether agents actually cite you. D5 Freshness was raised in v2: agents treat recency as a first-class retrieval signal. D7 carries the lowest weight in this default profile because most businesses aren't transactional; it rises sharply for e-commerce.
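The article doesn't publish how dimension scores are aggregated. Assuming each dimension is scored 0 to 100 and the total is their weighted sum, the default weighting works out as in this Python sketch; the function name and the aggregation rule are assumptions.

```python
# Default weights from the table above, in percent (they sum to 100).
DEFAULT_WEIGHTS = {"D1": 18, "D2": 16, "D3": 18, "D4": 10, "D5": 12, "D6": 18, "D7": 8}


def weighted_total(dimension_scores: dict[str, float],
                   weights: dict[str, int] = DEFAULT_WEIGHTS) -> float:
    """Combine per-dimension scores (each assumed 0-100) into a 0-100 total."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    # Integer percentages keep the arithmetic exact for whole-number scores.
    return sum(weights[d] * dimension_scores[d] for d in weights) / 100
```

For example, under this assumption a site scoring 80 on D1 and 40 on every other dimension totals 18 × 80 / 100 + 82 × 40 / 100 = 14.4 + 32.8 = 47.2. Keeping weights as a parameter is what lets a profile swap in its own set.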

Profile-specific weightings adjust this default. Service businesses use the balanced default above. E-commerce sites weight D1 and D7 higher, because product schema and agent-traversable checkout determine whether agents can transact with them. SaaS and lead-gen sites weight D3 and D6 highest, because buyers ask assistants “best tool for X,” so extractable content and third-party entity authority shape the shortlist. Regulated services lean on D6 authority and D5 freshness, where verifiable, current, well-sourced information carries the most weight.

The four profile weightings are fixed; assigning a site to a profile is a judgment call made during the audit. Most engagements use one of the four; multi-vertical businesses occasionally use a custom blend signed off at scoping.

Practical use

Curious where your site sits against the rubric? The audit is the entry-point engagement: a four-week diagnostic that produces a band assignment, prioritized backlog, and recommended next engagement.


The five bands

Total scores fall into five bands. Band assignment determines the recommended next engagement.

80–100

Agent-Native

Site operates fluently in agent-mediated discovery. Maintenance work, not transformation work.

60–79

Agent-Friendly

Substantial agent-readiness with specific gaps. Implementation engagement closes the gaps; Retainer maintains.

40–59

Legacy-Optimized

Optimized for traditional search but missing the agent-era layer. Implementation transforms; this is the most common audit outcome.

20–39

Frictional

Structural gaps across multiple dimensions. Implementation is foundational; Retainer alone won't close the gap.

0–19

Dark Site

Agents cannot meaningfully extract or cite. Often a CMS or infrastructure problem upstream of content. Diagnostic engagement first to determine what's structurally feasible.

The bands are calibrated against MENA test sites. The thresholds differ slightly from US-only or EU-only calibrations because Arabic content and regional infrastructure produce distinct score distributions.
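The band thresholds translate directly into a lookup. This Python sketch assumes a total already on a 0 to 100 scale and, since the bands are published as integer ranges, assigns fractional totals by their lower bound (79.5 lands in Agent-Friendly); the function name is hypothetical.

```python
# (lower bound, band name), checked from the top band down.
BANDS = [
    (80, "Agent-Native"),
    (60, "Agent-Friendly"),
    (40, "Legacy-Optimized"),
    (20, "Frictional"),
    (0, "Dark Site"),
]


def band_for(total: float) -> str:
    """Map a 0-100 total onto the five published bands."""
    if not 0 <= total <= 100:
        raise ValueError("total must be within 0-100")
    # The first band whose lower bound the total reaches is the answer.
    return next(name for lower, name in BANDS if total >= lower)
```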

Five veto checks

Five structural checks override the dimension scoring. The vetoes exist because dimension scoring assumes baseline accessibility: a site that fails any veto check needs the veto resolved before dimension scoring becomes meaningful.

  1. Not-crawlable check

    If your content isn't reachable at all (an auth wall, infinite scroll with no real URLs, or other access blockers), the maximum total score is capped at 25: agents can't reach the content, so dimension scoring is theoretical.

  2. Robots-blocks-AI-crawlers check

    If your robots.txt blocks the major AI crawlers wholesale (GPTBot, ClaudeBot, PerplexityBot and peers), the maximum total score is capped at 30: agents are told not to read you.

  3. JS-rendered-only-content check

    If the majority of your meaningful content renders only client-side (no server-rendered HTML for the agent crawlers that don't execute JavaScript), the maximum total score is capped at 40. Large-scale fetch studies show AI crawlers don't run JavaScript the way Googlebot does.

  4. Human-in-the-loop conversion check

    If your primary conversion can only be completed through synchronous human contact (a phone call, a WhatsApp message, an in-person visit) with no digital alternative an agent can traverse, the maximum total score is capped at 40.

  5. Zero-schema check

    If there's no structured data of any kind across your audited pages (no JSON-LD, microdata, or RDFa), the maximum total score is capped at 50. A single Organization schema block in the page head is enough to clear it.
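The caps above can be applied after dimension scoring. The article doesn't say how multiple failed vetoes combine; this Python sketch assumes the lowest applicable cap wins, which is the natural reading of each veto as a maximum. The veto identifiers and the function name are hypothetical labels for the five checks.

```python
# Score caps for each veto check, as listed above (identifiers are illustrative).
VETO_CAPS = {
    "not_crawlable": 25,
    "robots_blocks_ai_crawlers": 30,
    "js_rendered_only_content": 40,
    "human_in_the_loop_conversion": 40,
    "zero_schema": 50,
}


def apply_vetoes(total: float, failed: set[str]) -> float:
    """Cap a dimension-scored total by every failed veto; the lowest cap wins."""
    unknown = failed - set(VETO_CAPS)
    if unknown:
        raise ValueError(f"unknown veto checks: {sorted(unknown)}")
    caps = [VETO_CAPS[veto] for veto in failed]
    # A cap only lowers a score; it never raises one that is already below it.
    return min([total, *caps])
```

Under this reading, a site with strong dimension scores (say 72) that blocks AI crawlers and ships no schema ends at 30, in the Frictional band, until both vetoes are resolved.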

What MAGNET doesn't score

MAGNET deliberately doesn't score:

  • Brand sentiment. What agents say about you when they cite you. Sentiment is a content/PR question, not a readiness question.
  • Traditional SEO. Rankings, backlink profile, domain authority, keyword density. These are measured well by existing tools.
  • Traffic or conversion. MAGNET measures readiness as a structural attribute, not the traffic outcomes that follow. The relationship between readiness and traffic is real but indirect, and traffic measurement has its own established methodology.
  • Content quality in human terms. Whether your articles are interesting, persuasive, or original. MAGNET measures extractability, not editorial merit.

The boundary matters because confusing readiness with downstream outcomes is how most agency engagements drift. A site can be Agent-Native and have weak content; a site can have brilliant content and be Frictional. These are different problems requiring different work.

Why the rubric stays private

The 46-item rubric stays inside client engagements rather than being published. Three reasons.

Competitive integrity

A published rubric becomes a checklist for sites optimizing toward the test. Optimization toward a published checklist replaces optimization toward agent-readiness: the test gets gamed while actual readiness drifts. Keeping the items reserved keeps scoring honest.

Client value

Audit-and-Implementation engagements are valuable partly because the scoring includes specific items the client wouldn't otherwise know to check. Publishing the rubric reduces what the engagement adds beyond consulting hours. The architecture is published because architecture frames understanding; the items are reserved because the items frame the work.

Calibration drift

The 46 items are calibrated quarterly against MENA test sites and updated as Schema.org versions roll forward and new agent surfaces launch. A published rubric would become outdated within months and lock the framework into whatever version was published. Keeping items inside engagements keeps the framework alive.

What gets published, then, is exactly what this article describes: the dimension structure, weightings, bands, vetoes, and profile assignments. That's enough for a reader to understand whether MAGNET is the right framework for their situation, what its outputs mean, and how it differs from competitor frameworks. It's not enough to game it.

Closing thoughts

If you're trying to decide whether agent-readiness is a category worth investing in, the architecture above should clarify the question. If you're trying to decide whether MAGNET is the framework that fits your situation, the seven dimensions and four profile weightings should clarify that too.

What it doesn't tell you is where your site actually scores. That requires running the rubric. We do that as the entry-point engagement: a four-week diagnostic that produces a band assignment, a prioritized backlog, and a recommended next engagement.

The framework will keep evolving. New agent surfaces are launching every quarter. Schema.org keeps shipping versions. MENA-specific calibrations need ongoing revision as Arabic content patterns and regional infrastructure mature. We'll publish framework updates here as they happen.

For now, this is the public architecture, and it's enough to evaluate whether the work is the right fit for your situation. If it is, the next step is a scoping conversation. If it isn't, this article should at least make explicit what the agent-era layer actually contains: most of the agency conversations on the topic right now don't.

Ahmed Saad
Author


Founder & Chief Architect of Nmow. Twelve years operating growth at MENA-native companies including Nana, Gathern, Dryve, FlyAkeed, and Fashion.sa. The MAGNET framework distills patterns observed across those engagements into a measurement system Nmow uses for every audit.

Run the rubric

Want to know where your site sits?

The audit is the entry-point engagement. Four-week diagnostic against the full 46-item rubric, band assignment, prioritized remediation backlog, and a recommended next engagement.