Nmow
Methodology

How we measure agent-readiness.

MAGNET (Machine-Actionable Generative Entity Test) is the framework Nmow built to score whether AI agents can discover, understand, trust, and transact with a website. It covers forty-six items across seven dimensions, calibrated for the MENA market.

Framework: MAGNET v2.0 · Items: 46 across 7 dimensions · Calibration: MENA market
Overview

What MAGNET actually measures.

Most GEO checklists you’ll find online are about ten items long. They look at schema markup, mention llms.txt, suggest a few content tweaks, and call it done. That’s a starting point, not a methodology.

01

Forty-six items, seven dimensions

MAGNET is forty-six items across seven dimensions because the question "can an AI agent discover, parse, trust, and transact with this site?" doesn’t reduce to ten items. Each dimension addresses a different failure mode, and we’ve seen all of them in production audits.
02

Architecture in the open, rubric in the audit

The seven dimensions are Structured Data, Semantic & Accessible Markup, Content Extractability, Crawler & Agent Connectivity, Freshness Signals, Entity Authority, and Agentic Conversion / Transactability. Each item is scored on a 0-to-3 scale, and each dimension carries a percentage weight that shifts by business profile. Five scoring bands describe what each total score means in practice. The full 46-item rubric is proprietary to the audit.
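A minimal sketch of how a 0-to-3 item scale can roll up into weighted dimension points. The real rubric and aggregation are proprietary, so the averaging rule here is an assumption for illustration, and dimension_points is a hypothetical helper rather than Nmow's scoring code.

```python
from statistics import mean

MAX_ITEM_SCORE = 3  # every MAGNET item is scored from 0 to 3


def dimension_points(item_scores: list[int], weight: float) -> float:
    """Hypothetical aggregation: mean item score, normalized to 0-1, times the dimension weight."""
    if not item_scores:
        raise ValueError("a dimension needs at least one scored item")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("item scores must fall between 0 and 3")
    return mean(item_scores) / MAX_ITEM_SCORE * weight


# Four illustrative D1 items under the default 18% weight:
# mean 2.25 -> 0.75 of the available points -> 13.5 of 18.
print(dimension_points([3, 2, 3, 1], 18))  # 13.5
```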
The Framework

Seven dimensions, at a glance.

Each dimension addresses a discrete failure mode in agent-readiness. Weights shift by business profile; the diagram shows the default service-business weighting.

Diagram: the seven dimensions (D1 to D7) arranged around an "Agent Ready" core.
D1 Structured Data, 18% default, 20% e-commerce

Can agents retrieve deterministic facts about your business and your offerings without inferring from prose?

Weight: 18%
Why it matters

Heaviest weight

It’s among the most heavily weighted dimensions because LLMs prefer citing structured data over facts extracted from prose. When ChatGPT answers "what’s the cancellation policy at this hotel?", it’s vastly more likely to cite a site that exposes that policy in structured data than one with the same policy in a paragraph buried under a "Terms" link.
What we score

Scored items

We assess the coverage, validity, and entity-linkage of your structured data across the schema types that matter for your category, including MENA-specific coverage. The exact items and how each is scored are part of the audit deliverable.
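To make "deterministic facts" concrete, here is a minimal sketch that emits a product's price and return window as JSON-LD, a common way to embed schema.org data in a page. The product, SKU, and values are hypothetical; Product, Offer, MerchantReturnPolicy, and merchantReturnDays are real schema.org vocabulary. It illustrates the kind of markup D1 looks at, not the items it scores.

```python
import json

# Illustrative product. The schema.org types and properties are real; every value is made up.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Abaya",
    "alternateName": "عباية نموذجية",
    "sku": "EX-001",
    "offers": {
        "@type": "Offer",
        "price": "349.00",
        "priceCurrency": "SAR",
        "availability": "https://schema.org/InStock",
        "hasMerchantReturnPolicy": {
            "@type": "MerchantReturnPolicy",
            "applicableCountry": "SA",
            "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
            "merchantReturnDays": 14,
        },
    },
}


def to_jsonld_script(data: dict) -> str:
    """Serialize a schema.org object as a JSON-LD <script> element."""
    # ensure_ascii=False keeps Arabic text readable instead of escaping it as \u sequences.
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(product))
```

With the policy in markup, an agent reads the 14-day window as a value instead of inferring it from a returns paragraph.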
D2 Semantic & Accessible Markup, 16% default, 14% e-commerce

Can vision-based agents click the right elements and complete workflows without hallucinating element function?

Weight: 16%
Why it matters

Vision-agent ready

With Claude Computer Use, ChatGPT Operator, and similar agents now executing actions on real websites, accessibility tree quality has stopped being just an a11y concern. It’s a transactability concern. An agent that can’t tell which button is "Add to Cart" versus "Save for Later" will either fail or misfire.
What we score

Scored items

We assess how reliably a vision-based agent can identify and operate the interactive elements on your pages, including RTL semantic correctness for Arabic content. The specific scored items are detailed in the audit.
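One quick proxy for this: buttons that expose no accessible name leave a vision or accessibility-tree agent guessing at their function. The sketch below flags them with Python's html.parser. It is a hypothetical heuristic, far simpler than the W3C accessible-name computation (it ignores CSS, does not resolve aria-labelledby targets, and skips most nesting rules), and it is not how the audit scores D2.

```python
from html.parser import HTMLParser

NAME_ATTRS = ("aria-label", "aria-labelledby", "title")


class UnnamedButtonFinder(HTMLParser):
    """Flag <button> elements with no text, img alt, aria-label, aria-labelledby, or title."""

    def __init__(self):
        super().__init__()
        self.unnamed = []  # (line, column) of each button with no detectable name
        self._open = []    # [has_name, position] for each <button> still open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "button":
            named = any((a.get(k) or "").strip() for k in NAME_ATTRS)
            self._open.append([named, self.getpos()])
        elif tag == "img" and self._open and (a.get("alt") or "").strip():
            self._open[-1][0] = True  # alt text on an inner image names the button

    def handle_data(self, data):
        if self._open and data.strip():
            self._open[-1][0] = True  # visible text names the button

    def handle_endtag(self, tag):
        if tag == "button" and self._open:
            named, pos = self._open.pop()
            if not named:
                self.unnamed.append(pos)


def unnamed_buttons(html: str) -> list:
    finder = UnnamedButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.unnamed


page = (
    '<button aria-label="Add to cart"><svg></svg></button>'
    "<button><svg></svg></button>"  # icon-only with no name: an agent has to guess
    "<button>Save for later</button>"
)
print(unnamed_buttons(page))  # one position, for the icon-only button
```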
D3 Content Extractability, 18% default, 14% e-commerce

When an LLM cites your content, will the citation hold up to scrutiny?

Weight: 18%
Why it matters

Citation defensibility

Citation isn’t enough: the cited claim has to be defensible. Sites that bury key facts inside images, render text only via JavaScript, or scatter contradictory information across pages get extracted incoherently. Buyers who follow citations to those pages bounce, and LLMs learn to deprioritize that source.
What we score

Scored items

We assess whether your key facts are rendered as extractable text, kept coherent across Arabic and English, and positioned so a cited claim holds up to scrutiny. The item-level rubric is part of the audit.
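Bilingual coherence can be spot-checked mechanically. A minimal sketch, assuming you have the English and Arabic versions of the same page as plain text: normalize Arabic-Indic digits and separators, then compare the numbers each version states. A mismatch such as 48 hours versus 24 hours is exactly the kind of contradiction that makes a citation indefensible. The helper names are hypothetical, and comparing numbers says nothing about whether the surrounding prose agrees.

```python
import re

# Normalize Arabic-Indic (U+0660-0669) and Extended Arabic-Indic (U+06F0-06F9) digits,
# plus the Arabic decimal and thousands separators, so numbers compare across languages.
_NORMALIZE = str.maketrans({
    **{chr(0x0660 + i): str(i) for i in range(10)},
    **{chr(0x06F0 + i): str(i) for i in range(10)},
    "\u066b": ".",  # Arabic decimal separator
    "\u066c": "",   # Arabic thousands separator
    ",": "",        # assumption: commas in the English copy are thousands separators
})
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def numbers_in(text: str) -> set[str]:
    """Return every number in the text, after normalizing digits and separators."""
    return set(_NUMBER.findall(text.translate(_NORMALIZE)))


def numeric_mismatches(english: str, arabic: str) -> tuple[set[str], set[str]]:
    """Numbers that appear only in the English copy, and only in the Arabic copy."""
    en, ar = numbers_in(english), numbers_in(arabic)
    return en - ar, ar - en


english = "Free cancellation up to 48 hours before arrival. Rooms from SAR 1,250."
arabic = "الإلغاء مجاني حتى ٢٤ ساعة قبل الوصول. الغرف من ١٬٢٥٠ ريال."
print(numeric_mismatches(english, arabic))  # ({'48'}, {'24'})
```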
D4 Crawler & Agent Connectivity, 10%

Can the agents that want to read your site actually reach it?

Weight: 10%
Why it matters

Access prerequisite

The most common reason a site scores zero on agent-readiness isn’t poor markup; it’s that the site blocks GPTBot, PerplexityBot, ClaudeBot, or other agent crawlers entirely, often unintentionally through overzealous WAF rules or copy-pasted robots.txt files. If the agent can’t get in, nothing else matters.
What we score

Scored items

We assess whether the major agent crawlers can actually reach and render your site, including the access controls and regional-hosting issues that most often block them unintentionally. The exact checks are part of the audit.
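The robots.txt half of this can be checked with Python's standard urllib.robotparser. A minimal sketch, assuming you already have the robots.txt text. It covers only declared rules: WAF or regional IP blocking never appears in robots.txt and still needs real requests sent with each crawler's user agent from the relevant networks.

```python
from urllib import robotparser

# User-agent tokens published by OpenAI, Anthropic, and Perplexity; extend to your own list.
AGENT_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt: str, url: str, agents=AGENT_CRAWLERS) -> list[str]:
    """Return the agents that robots.txt forbids from fetching `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# A copy-pasted rule that shuts out one crawler while everyone else is allowed.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_agents(robots_txt, "https://www.example.com/services"))  # ['GPTBot']
```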
D5 Freshness Signals, 12%

Does your timestamp evidence match reality?

Weight: 12%
Why it matters

Timestamp truth

LLMs heavily weight recency for time-sensitive queries. A site that claims "updated 2026" with content that contradicts known 2025 events gets discounted as unreliable. A site with a real dateModified aligned to actual content changes gets cited.
What we score

Scored items

We assess whether your freshness signals are truthful and consistent with the content they describe, rather than aspirational timestamps that get a site discounted. The scored items are detailed in the audit.
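A minimal sketch of a timestamp-truth check. It assumes you track when a page's rendered text last actually changed (for example by diffing crawl snapshots); that input and the freshness_flags helper are hypothetical, and the audit's actual checks are not shown here.

```python
from datetime import date


def freshness_flags(date_modified: date, last_content_change: date, today: date) -> list[str]:
    """Compare a page's declared dateModified with the last observed change to its content.

    `last_content_change` is a hypothetical input, e.g. the date your own crawler last saw
    the page's rendered text change between snapshots; schema.org does not supply it.
    """
    flags = []
    if date_modified > today:
        flags.append("dateModified is in the future")
    if date_modified > last_content_change:
        flags.append("dateModified is newer than any observed content change")
    elif date_modified < last_content_change:
        flags.append("content changed after the declared dateModified")
    return flags


# A timestamp bumped in January 2026 on text that last changed in March 2025.
print(freshness_flags(date(2026, 1, 10), date(2025, 3, 2), today=date(2026, 1, 15)))
# ['dateModified is newer than any observed content change']
```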
D6 Entity Authority, 18% default, 15% e-commerce

What evidence of credibility does an LLM find when weighing whether to cite you?

Weight: 18%
Why it matters

Citation worthiness

LLMs don’t have direct access to your business’s reputation; they piece it together from third-party signals: Wikipedia entries, Wikidata entity records, recognized industry directories, citation density across the open web, and social proof on platforms LLMs index. This is the dimension where most MENA-specific work happens, because the Arabic citation pool is materially less saturated than the English one.
What we score

Scored items

We assess the third-party credibility signals an LLM can find when weighing whether to cite you, with particular attention to the thinner Arabic citation pool. The specific items are part of the audit.
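On-site markup can at least point LLMs at the third-party records that establish identity. A minimal sketch, assuming a hypothetical organization: schema.org's sameAs property links the entity to its Wikidata item, Arabic Wikipedia article, and profiles. The links only help if those records exist and describe the same entity; every name, URL, and ID below is a placeholder.

```python
import json

WIKIDATA_PREFIX = "https://www.wikidata.org/wiki/Q"

# Placeholder organization: the name, URLs, and Wikidata ID are hypothetical.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "alternateName": "شركة المثال",
    "url": "https://www.example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",
        "https://ar.wikipedia.org/wiki/Example_Co",
        "https://www.linkedin.com/company/example-co",
    ],
}


def links_to_wikidata(entity: dict) -> bool:
    """True if the entity's sameAs list points at a Wikidata item page."""
    return any(link.startswith(WIKIDATA_PREFIX) for link in entity.get("sameAs", []))


print(json.dumps(organization, ensure_ascii=False, indent=2))
print(links_to_wikidata(organization))  # True
```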
D7 Agentic Conversion / Transactability, 8% default, 15% e-commerce

Can an agent complete your primary conversion action on a buyer’s behalf, end-to-end?

Weight: 8%
Why it matters

Transactability

Reframed in v2 around each business’s declared primary conversion, not just checkout. Beyond being discoverable, can an agent actually complete a purchase, book a demo, start a trial, or submit a lead on a buyer’s behalf? Most MENA sites fail this. Multi-step flows requiring manual address selection in Arabic, payment or booking forms with no programmatic exposure, CAPTCHAs that block agentic execution: all common, all blocking. Non-commerce businesses are scored on their own conversion goal, not penalized for missing a cart they shouldn’t have.
What we score

Scored items

We assess whether an agent can complete your declared primary conversion end-to-end on a buyer’s behalf, including regional payment-rail compatibility (Mada, STC Pay, Tabby, Tamara) where commerce applies. The exact checks are part of the audit.
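CAPTCHA presence on a conversion form is easy to detect from markup. A rough sketch, assuming the widgets are embedded with their standard container classes (g-recaptcha for reCAPTCHA v2, h-captcha for hCaptcha, cf-turnstile for Cloudflare Turnstile). Script-only and invisible variants such as reCAPTCHA v3 will not be caught, and a detected widget does not by itself prove an agent is blocked.

```python
from html.parser import HTMLParser

# Container classes from the default embed snippets of three common CAPTCHA widgets:
# Google reCAPTCHA v2, hCaptcha, and Cloudflare Turnstile.
CAPTCHA_CLASSES = {"g-recaptcha", "h-captcha", "cf-turnstile"}


class CaptchaFinder(HTMLParser):
    """Record which known CAPTCHA widget classes appear in a page's markup."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.found |= classes & CAPTCHA_CLASSES


def captcha_widgets(html: str) -> set[str]:
    finder = CaptchaFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical booking form; the site key is a placeholder.
booking_form = """
<form action="/book" method="post">
  <label for="email">Email</label> <input id="email" name="email" type="email" required>
  <div class="g-recaptcha" data-sitekey="YOUR_SITE_KEY"></div>
  <button type="submit">Book a consultation</button>
</form>
"""
print(captcha_widgets(booking_form))  # {'g-recaptcha'}
```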
Scoring Bands

Five bands. What your score means in practice.

Each total score maps to a band, and each band maps to a recommended next step.

80–100 · Tier 1

Agent-Native

Sites that score here are cited by AI agents at materially higher rates than competitors. Recommended action: Agentic Growth Retainer to maintain advantage as the framework evolves and competitors catch up.
60–79 · Tier 2

Agent-Friendly

The site is being cited, but inconsistently. Specific dimensions need targeted work to move into Agent-Native territory. Recommended action: an Implementation engagement focused on the weakest dimensions, then the Retainer.
40–59 · Tier 3

Legacy-Optimized

The site was built for the search-engine era and works fine there, but agentic discovery is leaking. Most first-time MENA audits land here. Recommended action: full Implementation engagement.
20–39 · Tier 4

Frictional

Multiple dimensions are failing. The site can be discovered but with significant friction; agents that try to engage it will often fail mid-task. Recommended action: foundational Implementation work before competing on higher-order dimensions.
0–19 · Tier 5

Dark Site

The site is effectively invisible to AI agents, usually because of fundamental access blocking, missing structured data, or both. Recommended action: address access blockers immediately, then Implementation.
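The band table translates directly into a lookup. A minimal sketch; because the bands are published as whole-number ranges, it assumes a fractional total falls into the band whose floor it reaches (so 79.5 maps to Agent-Friendly).

```python
# Band floors and names as published in the scoring bands; totals run from 0 to 100.
BANDS = [
    (80, "Tier 1", "Agent-Native"),
    (60, "Tier 2", "Agent-Friendly"),
    (40, "Tier 3", "Legacy-Optimized"),
    (20, "Tier 4", "Frictional"),
    (0, "Tier 5", "Dark Site"),
]


def band_for(score: float) -> tuple[str, str]:
    """Map a total MAGNET score to its (tier, band name)."""
    if not 0 <= score <= 100:
        raise ValueError("total scores run from 0 to 100")
    for floor, tier, name in BANDS:
        if score >= floor:
            return tier, name
    raise AssertionError("unreachable: the lowest floor is 0")


print(band_for(85))  # ('Tier 1', 'Agent-Native')
print(band_for(52))  # ('Tier 3', 'Legacy-Optimized')
```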
Calibration

Weights shift by business profile.

Generic checklists score every site the same way. MAGNET adjusts dimension weights based on the business profile being audited, because the failure modes that matter most differ by category.

Service Business

Default weighting

The default profile, used for agencies, consultancies, B2B services, and most professional services. Weighting is balanced across discovery, trust, and conversion.
E-commerce

Commerce-weighted

Heavier weight on Structured Data and Agentic Conversion: product-level schema and transactability dominate buyer outcomes.
SaaS / Lead-gen

Authority-weighted

Heaviest weight on Content Extractability and Entity Authority: SaaS buyers ask assistants “best tool for X,” so extractable content and third-party authority (G2, Capterra, citations) drive the shortlist. Conversion is scored against demo-booking and trial-start, not checkout.
Regulated Services

Trust-weighted

Elevated weight on Entity Authority and Freshness, where verifiable, current, well-sourced information and recognized regulatory and government anchors carry the most weight (healthcare, finance, legal in KSA and the UAE).

The exact per-profile weighting is part of the audit. The right profile is determined during the audit’s Discovery phase. Hybrid businesses (e.g., e-commerce with a strong content arm) are scored under both profiles, with the lower of the two scores presented as the headline.
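The hybrid rule can be sketched with the two profiles whose weights appear in the dimension headings on this page (the service-business default and e-commerce). A minimal sketch, assuming each dimension's result is expressed as the share of its points earned (0.0 to 1.0); the SaaS and regulated-services weights are not published and are left out, and the helper names are hypothetical.

```python
# Default (service business) and e-commerce weights from the dimension headings on this page.
# SaaS / lead-gen and regulated-services weights are not published, so they are omitted.
PROFILE_WEIGHTS = {
    "service": {"D1": 18, "D2": 16, "D3": 18, "D4": 10, "D5": 12, "D6": 18, "D7": 8},
    "ecommerce": {"D1": 20, "D2": 14, "D3": 14, "D4": 10, "D5": 12, "D6": 15, "D7": 15},
}
for profile_name, profile_weights in PROFILE_WEIGHTS.items():
    assert sum(profile_weights.values()) == 100, f"{profile_name} weights must sum to 100"


def weighted_total(earned: dict[str, float], profile: str) -> float:
    """Total score, where `earned` is the hypothetical share (0.0-1.0) of each dimension's points."""
    weights = PROFILE_WEIGHTS[profile]
    return sum(weight * earned[dim] for dim, weight in weights.items())


def hybrid_headline(earned: dict[str, float], profiles=("service", "ecommerce")) -> float:
    """Score a hybrid business under each profile and report the lower total."""
    return min(weighted_total(earned, p) for p in profiles)


earned = {"D1": 1.0, "D2": 0.9, "D3": 0.9, "D4": 1.0, "D5": 1.0, "D6": 0.6, "D7": 0.5}
print(round(weighted_total(earned, "service"), 1))    # 85.4
print(round(weighted_total(earned, "ecommerce"), 1))  # 83.7
print(round(hybrid_headline(earned), 1))              # 83.7
```

The e-commerce profile comes out lower here because it shifts weight toward D7, where this hypothetical site is weakest.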

Veto Checks

Five failures cap your maximum score.

Some failures are categorical. A site can score perfectly on six dimensions and still be effectively invisible to agents because of one fundamental break. The veto checks identify those breaks and cap the maximum achievable score until they’re fixed.
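Veto logic reduces to a ceiling on the total. A minimal sketch using the five caps listed on this page; the page doesn't say how several triggered vetoes combine, so this assumes the lowest cap applies. The check names are hypothetical identifiers.

```python
# Caps from the five veto checks on this page.
VETO_CAPS = {
    "not_crawlable": 25,
    "crawler_blocking": 30,
    "js_only_rendering": 40,
    "human_in_the_loop_conversion": 40,
    "zero_schema": 50,
}


def capped_score(raw_score: float, triggered: set[str]) -> float:
    """Apply every triggered veto; the lowest cap wins (assumed when several fire)."""
    unknown = triggered - set(VETO_CAPS)
    if unknown:
        raise ValueError(f"unknown veto checks: {sorted(unknown)}")
    return min([raw_score, *(VETO_CAPS[v] for v in triggered)])


print(capped_score(86, {"zero_schema"}))                      # 50
print(capped_score(86, {"zero_schema", "crawler_blocking"}))  # 30
print(capped_score(22, {"zero_schema"}))                      # 22
```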

Veto 01

Not Crawlable

The content isn't reachable at all: an auth wall, infinite scroll with no real URLs, or other access blockers. Agents can't reach it, so dimension scoring is theoretical.

Cap: maximum 25

Veto 02

Crawler Blocking

Blocking the major AI crawlers wholesale (GPTBot, ClaudeBot, PerplexityBot and peers), whether intentional or via a misconfigured WAF.

Cap: maximum 30

Veto 03

JS-Only Rendering

The majority of meaningful content renders only client-side, with no server-rendered HTML for agent crawlers that don't execute JavaScript. Large-scale fetch studies show that AI crawlers don't run JavaScript the way Googlebot does.

Cap: maximum 40
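A rough way to see what a non-JavaScript crawler gets: parse the HTML exactly as the server delivers it and check whether your key facts appear as text. A minimal sketch, assuming you already fetched the raw HTML with a plain HTTP request and chose the key facts yourself; substring matching is crude, and this is not the audit's rendering check.

```python
from html.parser import HTMLParser

SKIPPED = {"script", "style", "template"}  # markup that never renders as page text


class RawTextExtractor(HTMLParser):
    """Collect the text present in server-delivered HTML, before any JavaScript runs."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def facts_missing_from_raw_html(html: str, key_facts: list[str]) -> list[str]:
    """Return the key facts a non-JavaScript crawler would not find in this HTML."""
    extractor = RawTextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(" ".join(extractor.parts).split())  # collapse whitespace
    return [fact for fact in key_facts if fact not in text]


spa_shell = '<div id="root"></div><script>render({"price": "SAR 349"})</script>'
server_rendered = "<main><h1>Example Abaya</h1><p>Price: SAR 349</p></main>"
print(facts_missing_from_raw_html(spa_shell, ["SAR 349"]))        # ['SAR 349']
print(facts_missing_from_raw_html(server_rendered, ["SAR 349"]))  # []
```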

Veto 04

Human-in-the-Loop Conversion

The primary conversion can only be completed through synchronous human contact (a phone call, WhatsApp, an in-person visit) with no digital alternative an agent can traverse.

Cap: maximum 40

Veto 05

Zero Schema

No structured data of any kind across the audited pages (no JSON-LD, microdata, or RDFa). A single Organization schema in the head is enough to clear it.

Cap: maximum 50
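Detecting whether any of the three syntaxes is present is straightforward with Python's html.parser. A minimal sketch across the HTML of the audited pages; counting RDFa only when typeof or vocab appears, so that Open Graph meta tags alone don't clear the veto, is an assumption rather than a documented rule.

```python
from html.parser import HTMLParser


class StructuredDataDetector(HTMLParser):
    """Note which structured-data syntaxes (JSON-LD, microdata, RDFa) a page uses."""

    def __init__(self):
        super().__init__()
        self.syntaxes = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and (a.get("type") or "").strip().lower() == "application/ld+json":
            self.syntaxes.add("json-ld")
        if "itemscope" in a or "itemtype" in a:
            self.syntaxes.add("microdata")
        # Assumption: count RDFa only when typeof/vocab is present, so Open Graph
        # <meta property="..."> tags alone do not clear the veto.
        if "typeof" in a or "vocab" in a:
            self.syntaxes.add("rdfa")


def zero_schema_veto(pages: list[str]) -> bool:
    """True when none of the audited pages carries any structured data."""
    found = set()
    for html in pages:
        detector = StructuredDataDetector()
        detector.feed(html)
        detector.close()
        found |= detector.syntaxes
    return not found


home = '<head><script type="application/ld+json">{"@type": "Organization"}</script></head>'
about = "<main><h1>About us</h1></main>"
print(zero_schema_veto([about]))        # True
print(zero_schema_veto([home, about]))  # False
```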

Boundaries

What MAGNET doesn’t measure.

MAGNET measures agent-readiness specifically. It does not measure overall business health, marketing performance, or brand strength. A site can score 100 on MAGNET and still have a failing business: the score is necessary but not sufficient.

Paid media performance

Separate Nmow service.

Organic search ranking

Overlaps in places, but not the optimization target.

Conversion rate optimization

Different methodology, different team.

Social media presence

Directly impacts D6 Entity Authority but not scored as its own dimension.

Operational fundamentals

Fulfillment quality, customer service, product-market fit. Out of scope.

Brand sentiment

What users feel when they encounter your brand. Different research tradition.
Our Score

We score ourselves on the framework we sell.

nmow.ai’s current MAGNET score, with the per-dimension breakdown. Updated quarterly. The score is generated by the same audit infrastructure used in client engagements: no special treatment, no shortcuts, no aspirational rounding.

Total Score: 94 / 100
Band: Agent-Native
Profile: Service Business
Re-audit: Quarterly
Per-Dimension Breakdown
D1 Structured Data: 22 / 22
D2 Semantic Markup: 17 / 18
D3 Extractability: 17 / 18
D4 Connectivity: 10 / 10
D5 Freshness: 8 / 8
D6 Entity Authority: 12 / 14
D7 Agentic Conversion: 8 / 10
Where we’re working

D6 Entity Authority is our weakest dimension: Wikipedia and Wikidata entries are pending. D7 Agentic Conversion is partial because Nmow’s primary conversion is consultation booking rather than direct transaction; we score against the booking flow’s agent-readiness, not against e-commerce checkout. (Scores shown are being refreshed under MAGNET v2.)

FAQ

Common questions about the framework.

Why these seven dimensions and not others?

Each dimension addresses a discrete failure mode we’ve seen in production audits across MENA businesses. The framework was iterated against five test sites before launch: these seven were the categories that recurred. Earlier drafts tried four dimensions (too coarse) and eleven (overlapping). Seven is the resolution at which dimensions stay distinct without leaving meaningful gaps.

Why this specific weighting?

Weights reflect the relative impact each dimension has on whether an agent will cite or transact with a site. D1 Structured Data, D3 Content Extractability, and D6 Entity Authority tie for the most weight: structured data, extractable content, and resolvable entity identity are the strongest drivers of whether an agent cites you. D7 carries the least in the default profile because most businesses aren’t transactional, though it rises sharply for e-commerce. Profile-based weight shifts further calibrate to category-specific dynamics. The exact item-level rubric inside each dimension is proprietary to the audit deliverable.

Is MAGNET open-source?

The architecture (the seven dimensions, weighting, banding, and veto logic) is documented publicly on this page. The full 46-item rubric, the per-item scoring criteria, and the automated detection infrastructure are proprietary to the Nmow audit. The reason: the value isn’t in the framework’s existence but in the rigor with which each item is scored. Publishing the rubric without the scoring discipline would invite the kind of "I read the checklist" surface-level work that motivated us to build a real methodology in the first place.

How often is the framework updated?

Minor versions ship quarterly and adjust item weights or add new items as the agentic landscape evolves. Major versions, which can change the dimension structure if needed, are considered once a year, and we expect to ship at most one every 18 months. The current version is v2.0.

Can I score myself against MAGNET without buying an audit?

You can use the public architecture on this page to estimate where you sit. Reading through the seven dimensions and asking "are we strong here? weak here?" usually places teams within a band. But a real score requires the rubric and the detection infrastructure, both proprietary. The closest you can get without us is: identify which dimensions you’re confident on and which you’re not, and use that as your priority list.

Why is this MENA-calibrated specifically?

Generic GEO frameworks were built against English-language, US/EU-default sites. MAGNET was built against MENA-specific failure modes: bilingual coherence between Arabic and English, regional payment rail support (Mada, STC Pay, Tabby, Tamara) at the markup level, regulated-vertical content disclosure norms in Saudi Arabia and the UAE, geographic IP-blocking that catches LLM crawler infrastructure, Arabic Wikipedia entity scarcity. None of those score in a generic framework. All of them score here.

Score Your Site

See where your site sits on MAGNET.

The Agent-Readiness Audit is a four-week diagnostic that scores your site against the full framework, with a prioritized remediation plan.