TLDR
Automated accessibility tools detect between 30% and 57% of real WCAG issues, depending on how you measure. The UK Government Digital Service found that the best tool it tested detected only 40% of known barriers. AccessiBe's 95% claim was explicitly banned by the FTC in 2025. If your compliance strategy relies on a single automated tool, you are operating with a false sense of coverage.
- axe-core
- An open-source accessibility testing engine developed by Deque Systems and released under the Mozilla Public License 2.0. It powers Google Lighthouse's accessibility checks, Microsoft Accessibility Insights, and hundreds of other tools. Downloaded over 3 billion times, it is the de facto standard engine for automated accessibility scanning. axe-core reliably detects 57% of real-world accessibility issues by volume — a figure derived from Deque's 2021 analysis of 2,000+ audits and 300,000+ issues.
DEFINITION
- Intelligent Guided Tests (IGTs)
- Semi-automated tests that combine automated scanning with developer-guided steps to evaluate issues that cannot be determined by a machine alone. Deque's IGTs extend axe-core's coverage from 57% to approximately 80% of issues by volume. They require human input but produce deterministic results — unlike fully manual auditing, which depends entirely on the evaluator's judgment and expertise.
DEFINITION
- False Positive
- An error reported by an automated tool that does not represent a real accessibility barrier. False positives waste developer time and can erode trust in scanning tools. Accessible.org's analysis found that roughly 45% of WCAG success criteria produce likely false positives when tested by automation — a separate problem from the larger issue of criteria that cannot be tested by machines at all.
DEFINITION
- WCAG Success Criteria
- The individual testable requirements that make up WCAG (Web Content Accessibility Guidelines). WCAG 2.1 Level AA includes 50 success criteria: 30 at Level A and 20 at Level AA. Only roughly 30% of these criteria — about 15 to 16 of the 50 — can be meaningfully tested by automated tools. The remaining 70% require human judgment to evaluate correctly.
DEFINITION
- ARIA
- Accessible Rich Internet Applications. A W3C technical specification that adds semantic meaning to HTML elements for assistive technologies. ARIA is intended to fill gaps where native HTML semantics are insufficient. However, the WebAIM Million study consistently shows that pages using ARIA have more than twice as many detected errors as pages without it — 57 errors per page versus 27 — because developers frequently misapply ARIA attributes, creating accessibility barriers rather than removing them.
DEFINITION
The Three Ways to Measure Automated Testing Coverage
Ask three accessibility professionals what percentage of issues automated tools catch and you will get three different numbers: 30%, 57%, and 40%. All three are correct. They are measuring different things.
By WCAG success criteria: WCAG 2.1 Level AA has 50 success criteria. Roughly 30% of them — about 15 to 16 criteria — can be meaningfully tested by automated tools. Accessible.org’s detailed analysis goes further, finding only about 13% can be flagged with high accuracy, with another 45% producing likely false positives, and 42% completely undetectable by any machine.
By real-world issue volume: Deque analyzed more than 2,000 audits, 13,000+ pages, and roughly 300,000 issues. They found that axe-core detected 57.38% of actual issues found in those audits. This number is higher than the criteria-based figure because certain automatically detectable issues — particularly color contrast, which accounts for about 30% of all errors — occur at very high frequency. Color contrast alone drags up the “percentage of issues caught” even though it represents just one criterion.
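The volume effect is simple arithmetic; a toy illustration (the counts below are invented for the example, not Deque's data):

```javascript
// Toy model: two criteria, one machine-detectable and very frequent
// (like color contrast), one undetectable and rarer. Detection rate
// *by criteria* is 50%, but *by issue volume* it is much higher.
const issues = [
  { criterion: "color-contrast", count: 300, detectable: true },
  { criterion: "focus-order", count: 50, detectable: false },
];

const total = issues.reduce((sum, i) => sum + i.count, 0);
const detected = issues
  .filter((i) => i.detectable)
  .reduce((sum, i) => sum + i.count, 0);

// 300 of 350 issues fall under the one detectable criterion
console.log(`${((detected / total) * 100).toFixed(1)}% by volume`);
// "85.7% by volume" -- even though only 1 of 2 criteria is testable
```

This is why a criteria-based figure (30%) and a volume-based figure (57%) can both be accurate descriptions of the same tool.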
By the UK GDS benchmark: The UK Government Digital Service evaluated 13 different accessibility tools against a page with 142 deliberately introduced barriers. The best tool found 40% of them. The worst found 13%. GDS recommends using 2-3 tools in combination to reach approximately 50% coverage.
What all three measurements agree on: no automated tool in production use today catches more than about half of real accessibility issues, and most catch significantly less.
What axe-core Actually Detects
axe-core is the most widely deployed accessibility testing engine in the world, with over 3 billion downloads. It powers Google Lighthouse, Microsoft Accessibility Insights, and dozens of other tools. When people talk about “automated accessibility testing,” they are usually talking about axe-core or one of the tools built on it.
Deque’s 57% detection rate figure is the most credible available because it is based on actual audit data rather than theoretical calculations. Here is what that means in practice:
- axe-core reliably catches color contrast failures, which are the single most common accessibility error (79.1% of home pages in the 2025 WebAIM Million study)
- It catches missing alt text on images, missing form labels, empty links, empty buttons, and missing document language declarations
- Combined, these six failure types account for 96% of all detected errors in the WebAIM Million data
What axe-core cannot catch includes:
- Whether alt text is meaningful (it can detect missing alt text, but not whether “image123.jpg” adequately describes a product photo)
- Keyboard traps — situations where users cannot navigate away from a focused element
- Logical reading order and content sequence
- Whether complex ARIA widgets actually work correctly with real screen readers
- Video and audio content accessibility (captions, transcripts, audio descriptions)
- Cognitive accessibility issues (consistent navigation, error prevention, reading level)
- Whether interactive elements are reachable and operable via all input methods
These are not edge cases. They represent many of the barriers that block real users.
Why Vendor Claims Diverge From Reality
The gap between vendor marketing numbers and third-party benchmarks has been documented extensively, and the FTC has now formalized it as a legal matter.
AccessiBe marketed its overlay product as delivering “95% compliance within 48 hours” — 30% immediately, the remaining 70% processed by AI over 48 hours. The FTC’s January 2025 enforcement action explicitly banned this claim, finding that the widget “fails or has failed to make basic and essential website components like menus, headings, tables, images, recordings, and more, compliant with WCAG.” The FTC imposed a $1 million fine and a 20-year compliance order.
UserWay has been cited in reviews as claiming to fix “more than 90% of accessibility issues automatically.” AudioEye, to its credit, is more measured, stating on its pricing page that automation addresses approximately 50% of issues, excluding expert testing. EqualWeb acknowledges the 30-40% industry reality in its own documentation.
The pattern: vendors claiming the highest coverage numbers offer the weakest methodology for validating those claims. Vendors offering more measured estimates (Deque, AudioEye at least partially, EqualWeb) tend to have better underlying data.
One useful signal: any claim above 80% that does not include “with semi-automated guided tests” or “with manual expert review” should be treated with skepticism. The ceiling for pure automation is around 57% by volume, and only with IGTs does coverage approach 80%.
The ARIA Problem
The WebAIM Million study has now documented this for seven consecutive years: pages that use ARIA attributes have more than twice as many detected errors as pages without ARIA. In the 2025 data, ARIA pages averaged 57 errors per page versus 27 for non-ARIA pages.
This is counterintuitive. ARIA is the W3C’s specification for making complex web components accessible to assistive technologies. It should reduce accessibility errors, not increase them.
What the data reflects is developer behavior, not a flaw in ARIA itself. Developers add ARIA attributes to their HTML because they believe it improves accessibility — and sometimes it does. But ARIA is strictly an additive layer on top of existing HTML semantics, and when misapplied, it overrides correct native semantics with incorrect custom ones. A role="button" on a <div> element does not automatically make that element behave like a button — it still lacks keyboard activation, focus management, and all the other behaviors that native <button> elements provide by default.
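The div-as-button failure described above looks like this in markup (a minimal illustrative sketch; the save() handler is hypothetical):

```html
<!-- Looks like a button to a screen reader, but is not one: -->
<!-- no keyboard focus, no Enter/Space activation, no disabled state -->
<div role="button" onclick="save()">Save</div>

<!-- Approximating native behavior means adding it all back by hand -->
<div role="button" tabindex="0" onclick="save()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') save()">
  Save
</div>

<!-- Or use the native element, which provides all of it by default -->
<button onclick="save()">Save</button>
```

An automated scanner can flag some of this (for example, a role="button" element missing tabindex), but only a human at a keyboard or screen reader can confirm the widget actually operates.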
The implications for automated testing: axe-core can detect some ARIA misuse, but much of it requires human testing with real assistive technologies. Running a screen reader through your site’s modal dialogs, dropdown menus, and custom widgets is the only reliable way to know if ARIA implementations actually work.
Overlays make this problem worse. They inject ARIA attributes into sites at runtime to attempt automated fixes, which can conflict with existing HTML semantics and create new barriers for screen reader users. One in four web accessibility lawsuits in 2024 targeted sites running overlay widgets.
What a Realistic Scanning Strategy Looks Like
Given these limitations, here is what a practical automated testing approach achieves:
Development-stage scanning with axe-core in CI/CD catches regressions before they reach production. Every component merge, content deployment, and design change should pass automated checks. This eliminates the automatically detectable issues so they do not accumulate.
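Wired into CI, the gate is a check over the violations array in axe-core's results object. A minimal sketch (the result shape follows axe-core's documented output; the impact threshold and integration details are assumptions to adapt to your test runner):

```javascript
// Sketch: fail a CI step when axe-core reports serious or critical
// violations. `results` uses axe-core's documented output shape:
// { violations: [{ id, impact, nodes: [...] }, ...] }
function gateOnViolations(results, blockingImpacts = ["serious", "critical"]) {
  const blocking = results.violations.filter((v) =>
    blockingImpacts.includes(v.impact)
  );
  for (const v of blocking) {
    console.error(`BLOCKED: ${v.id} (${v.impact}) on ${v.nodes.length} node(s)`);
  }
  return blocking.length === 0; // false -> fail the build
}

// Example with illustrative data:
const passed = gateOnViolations({
  violations: [
    { id: "color-contrast", impact: "serious", nodes: [{}, {}] },
    { id: "region", impact: "moderate", nodes: [{}] },
  ],
});
// passed === false: the serious color-contrast failure blocks the merge
```

Exiting nonzero when the gate returns false keeps the automatically detectable issue count at zero on every merge, which is the point of this layer.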
Scheduled monitoring catches new issues introduced by content management, third-party script updates, and dynamic content changes. Running weekly scans and alerting on new failures prevents technical debt from building up.
Combining 2-3 tools for initial audits reaches approximately 50% coverage. axe-core and WAVE use different engines and catch partially overlapping issue sets. Running both on the same page surfaces more issues than either finds alone.
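Combining tools only helps if you deduplicate overlapping findings. A sketch of merging two result sets keyed on rule plus CSS selector (the axe-core shape follows its documented output; the simplified WAVE-style shape and its rule names are hypothetical, since the two tools do not share a rule vocabulary):

```javascript
// Merge findings from two scanners, dropping issues both tools
// report against the same element under the same rule key.
function mergeFindings(axeResults, waveResults) {
  const seen = new Map();
  // axe-core: violations[].nodes[].target is an array of CSS selectors
  for (const v of axeResults.violations) {
    for (const node of v.nodes) {
      const selector = node.target.join(" ");
      seen.set(`${v.id}|${selector}`, { rule: v.id, selector, source: "axe" });
    }
  }
  // Hypothetical simplified WAVE-style shape: [{ rule, selector }]
  for (const issue of waveResults) {
    const key = `${issue.rule}|${issue.selector}`;
    if (!seen.has(key)) {
      seen.set(key, { ...issue, source: "wave" });
    }
  }
  return [...seen.values()];
}

const merged = mergeFindings(
  { violations: [{ id: "image-alt", nodes: [{ target: ["img.hero"] }] }] },
  [
    { rule: "image-alt", selector: "img.hero" },   // duplicate, dropped
    { rule: "label-missing", selector: "#email" }, // unique, kept
  ]
);
console.log(merged.length); // 2
```

In practice the harder part is mapping one tool's rule ids onto the other's; without that mapping, a naive merge double-counts.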
Manual testing closes the remaining gap. Keyboard-only navigation testing, screen reader testing with NVDA or JAWS on Windows and VoiceOver on macOS, and testing with real users who rely on assistive technologies are the only ways to evaluate the issues that automation cannot reach.
The goal of automated testing is not to achieve compliance — no automated tool can do that. The goal is to eliminate the detectable problems systematically, so that human testing time is spent on the genuinely difficult issues that require judgment.
A11yProof uses axe-core as its scanning engine, runs scheduled scans to catch regressions, and surfaces each issue with the specific WCAG criterion violated and a code-level fix recommendation. Plans start at $29/month for a single site. Automated scanning is the first layer — not the last.
Q&A
What percentage of WCAG issues can automated tools detect?
It depends on how you measure. By WCAG success criteria, roughly 30% of the 50 WCAG 2.1 AA criteria can be meaningfully tested by automation. By real-world issue volume, Deque's axe-core detects 57% of actual issues found in audits, because automatically detectable failures like color contrast occur at high frequency. The UK GDS benchmark found the best available tool detected only 40% of known barriers on a test page.
Q&A
How accurate is axe-core for accessibility testing?
axe-core catches 57% of accessibility issues by volume, based on Deque's 2021 study of 2,000+ audits and 300,000+ issues. With Intelligent Guided Tests added, coverage rises to approximately 80%. These are real-world figures from actual audits, not theoretical calculations. axe-core has a strong reputation for low false positives — when it flags an issue, it is almost always correct. What it cannot do is evaluate issues requiring human judgment: meaningful alt text, keyboard traps in complex widgets, cognitive accessibility, and captioning quality.
Q&A
Do accessibility overlays detect more issues than scanners?
No. Overlay tools like accessiBe, UserWay, and EqualWeb use the same underlying detection engines as standalone scanners, achieving 30-40% real-world coverage. They then attempt to auto-fix detected issues via runtime JavaScript injection. The FTC explicitly banned accessiBe's 95% compliance claim in January 2025 as deceptive. In 2024, 1,023 businesses were sued for ADA violations while running overlay widgets, confirming that overlay coverage claims do not translate to legal protection or actual WCAG conformance.
Q&A
Why does ARIA misuse increase accessibility errors?
ARIA attributes are intended to add semantic meaning for assistive technologies, but they override native HTML semantics and require precise implementation. A single misapplied ARIA role can make a correctly-structured component inaccessible. The WebAIM Million study found pages with ARIA averaged 57 detected errors per page versus 27 for pages without it. Overlays heavily inject ARIA attributes into sites at runtime, which partially explains why overlay-equipped sites continue to generate accessibility lawsuits.