TLDR
Automated accessibility tools detect between 30% and 57% of real WCAG issues, depending on how you measure. The UK Government Digital Service found that the best tool it tested detected only 40% of known barriers. AccessiBe's 95% claim was explicitly banned by the FTC in 2025. If your compliance strategy relies on a single automated tool, you are operating with a false sense of coverage.
- axe-core
- An open-source accessibility testing engine developed by Deque Systems and released under the Mozilla Public License 2.0. It powers Google Lighthouse's accessibility checks, Microsoft Accessibility Insights, and hundreds of other tools. Downloaded over 3 billion times, it is the de facto standard engine for automated accessibility scanning. axe-core reliably detects 57% of real-world accessibility issues by volume — a figure derived from Deque's 2021 analysis of 2,000+ audits and 300,000+ issues.
DEFINITION
- Intelligent Guided Tests (IGTs)
- Semi-automated tests that combine automated scanning with developer-guided steps to evaluate issues that cannot be determined by a machine alone. Deque's IGTs extend axe-core's coverage from 57% to approximately 80% of issues by volume. They require human input but produce deterministic results — unlike fully manual auditing, which depends entirely on the evaluator's judgment and expertise.
DEFINITION
- False Positive
- An error reported by an automated tool that does not represent a real accessibility barrier. False positives waste developer time and can erode trust in scanning tools. Accessible.org's analysis found that roughly 45% of WCAG success criteria produce likely false positives when tested by automation — a separate problem from the larger issue of criteria that cannot be tested by machines at all.
DEFINITION
- WCAG Success Criteria
- The individual testable requirements that make up WCAG (Web Content Accessibility Guidelines). WCAG 2.1 Level AA includes 50 success criteria: 30 at Level A and 20 at Level AA. Only roughly 30% of these criteria — about 15 to 16 of the 50 — can be meaningfully tested by automated tools. The remaining 70% require human judgment to evaluate correctly.
DEFINITION
- ARIA
- Accessible Rich Internet Applications. A W3C technical specification that adds semantic meaning to HTML elements for assistive technologies. ARIA is intended to fill gaps where native HTML semantics are insufficient. However, the WebAIM Million study consistently shows that pages using ARIA have more than twice as many detected errors as pages without it — 57 errors per page versus 27 — because developers frequently misapply ARIA attributes, creating accessibility barriers rather than removing them.
DEFINITION
The Three Ways to Measure Automated Testing Coverage
Ask three accessibility professionals what percentage of issues automated tools catch and you will get three different numbers: 30%, 57%, and 40%. All three are correct. They are measuring different things.
By WCAG success criteria: WCAG 2.1 Level AA has 50 success criteria. Roughly 30% of them — about 15 to 16 criteria — can be meaningfully tested by automated tools. Accessible.org’s detailed analysis goes further, finding only about 13% can be flagged with high accuracy, with another 45% producing likely false positives, and 42% completely undetectable by any machine.
By real-world issue volume: Deque analyzed more than 2,000 audits, 13,000+ pages, and roughly 300,000 issues. They found that axe-core detected 57.38% of actual issues found in those audits. This number is higher than the criteria-based figure because certain automatically detectable issues — particularly color contrast, which accounts for about 30% of all errors — occur at very high frequency. Color contrast alone drags up the “percentage of issues caught” even though it represents just one criterion.
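The volume effect is simple arithmetic; a toy illustration (the counts below are invented for the example, not Deque's data):

```javascript
// Toy model: two criteria, one machine-detectable and very frequent
// (like color contrast), one undetectable and rarer. Detection rate
// *by criteria* is 50%, but *by issue volume* it is much higher.
const issues = [
  { criterion: "color-contrast", count: 300, detectable: true },
  { criterion: "focus-order", count: 50, detectable: false },
];

const total = issues.reduce((sum, i) => sum + i.count, 0);
const detected = issues
  .filter((i) => i.detectable)
  .reduce((sum, i) => sum + i.count, 0);

// 300 of 350 issues fall under the one detectable criterion
console.log(`${((detected / total) * 100).toFixed(1)}% by volume`);
// "85.7% by volume" -- even though only 1 of 2 criteria is testable
```

This is why a criteria-based figure (30%) and a volume-based figure (57%) can both be accurate descriptions of the same tool.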
By the UK GDS benchmark: The UK Government Digital Service evaluated 13 different accessibility tools against a page with 142 deliberately introduced barriers. The best tool found 40% of them. The worst found 13%. GDS recommends using 2-3 tools in combination to reach approximately 50% coverage.
What all three measurements agree on: no automated tool in production use today catches more than about half of real accessibility issues, and most catch significantly less.
What axe-core Actually Detects
axe-core is the most widely deployed accessibility testing engine in the world, with over 3 billion downloads. It powers Google Lighthouse, Microsoft Accessibility Insights, and dozens of other tools. When people talk about “automated accessibility testing,” they are usually talking about axe-core or one of the tools built on it.
Deque’s 57% detection rate figure is the most credible available because it is based on actual audit data rather than theoretical calculations. Here is what that means in practice:
- axe-core reliably catches color contrast failures, which are the single most common accessibility error (79.1% of home pages in the 2025 WebAIM Million study)
- It catches missing alt text on images, missing form labels, empty links, empty buttons, and missing document language declarations
- Combined, these six failure types account for 96% of all detected errors in the WebAIM Million data
What axe-core cannot catch includes:
- Whether alt text is meaningful (it can detect missing alt text, but not whether “image123.jpg” adequately describes a product photo)
- Keyboard traps — situations where users cannot navigate away from a focused element
- Logical reading order and content sequence
- Whether complex ARIA widgets actually work correctly with real screen readers
- Video and audio content accessibility (captions, transcripts, audio descriptions)
- Cognitive accessibility issues (consistent navigation, error prevention, reading level)
- Whether interactive elements are reachable and operable via all input methods
These are not edge cases. They represent many of the barriers that block real users.
Why Vendor Claims Diverge From Reality
The gap between vendor marketing numbers and third-party benchmarks has been documented extensively, and the FTC has now formalized it as a legal matter.
AccessiBe marketed its overlay product as delivering “95% compliance within 48 hours” — 30% immediately, the remaining 70% processed by AI over 48 hours. The FTC’s January 2025 enforcement action explicitly banned this claim, finding that the widget “fails or has failed to make basic and essential website components like menus, headings, tables, images, recordings, and more, compliant with WCAG.” The FTC imposed a $1 million fine and a 20-year compliance order.
UserWay has been cited in reviews as claiming to fix “more than 90% of accessibility issues automatically.” AudioEye, to its credit, is more measured, stating on its pricing page that automation addresses approximately 50% of issues, excluding expert testing. EqualWeb acknowledges the 30-40% industry reality in its own documentation.
The pattern: vendors claiming the highest coverage numbers offer the weakest methodology for validating those claims. Vendors offering more measured estimates (Deque, AudioEye at least partially, EqualWeb) tend to have better underlying data.
One useful signal: any claim above 80% that does not include “with semi-automated guided tests” or “with manual expert review” should be treated with skepticism. The ceiling for pure automation is around 57% by volume, and only with IGTs does coverage approach 80%.
The ARIA Problem
The WebAIM Million study has now documented this for seven consecutive years: pages that use ARIA attributes have more than twice as many detected errors as pages without ARIA. In the 2025 data, ARIA pages averaged 57 errors per page versus 27 for non-ARIA pages.
This is counterintuitive. ARIA is the W3C’s specification for making complex web components accessible to assistive technologies. It should reduce accessibility errors, not increase them.
What the data reflects is developer behavior, not a flaw in ARIA itself. Developers add ARIA attributes to their HTML because they believe it improves accessibility — and sometimes it does. But ARIA is strictly an additive layer on top of existing HTML semantics, and when misapplied, it overrides correct native semantics with incorrect custom ones. A role="button" on a <div> element does not automatically make that element behave like a button — it still lacks keyboard activation, focus management, and all the other behaviors that native <button> elements provide by default.
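The div-as-button failure described above looks like this in markup (a minimal illustrative sketch; the save() handler is hypothetical):

```html
<!-- Looks like a button to a screen reader, but is not one: -->
<!-- no keyboard focus, no Enter/Space activation, no disabled state -->
<div role="button" onclick="save()">Save</div>

<!-- Approximating native behavior means adding it all back by hand -->
<div role="button" tabindex="0" onclick="save()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') save()">
  Save
</div>

<!-- Or use the native element, which provides all of it by default -->
<button onclick="save()">Save</button>
```

An automated scanner can flag some of this (for example, a role="button" element missing tabindex), but only a human at a keyboard or screen reader can confirm the widget actually operates.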
The implications for automated testing: axe-core can detect some ARIA misuse, but much of it requires human testing with real assistive technologies. Running a screen reader through your site’s modal dialogs, dropdown menus, and custom widgets is the only reliable way to know if ARIA implementations actually work.
Overlays make this problem worse. They inject ARIA attributes into sites at runtime to attempt automated fixes, which can conflict with existing HTML semantics and create new barriers for screen reader users. One in four web accessibility lawsuits in 2024 targeted sites running overlay widgets.
What a Realistic Scanning Strategy Looks Like
Given these limitations, here is what a practical automated testing approach achieves:
Development-stage scanning with axe-core in CI/CD catches regressions before they reach production. Every component merge, content deployment, and design change should pass automated checks. This eliminates the automatically detectable issues so they do not accumulate.
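Wired into CI, the gate is a check over the violations array in axe-core's results object. A minimal sketch (the result shape follows axe-core's documented output; the impact threshold and integration details are assumptions to adapt to your test runner):

```javascript
// Sketch: fail a CI step when axe-core reports serious or critical
// violations. `results` uses axe-core's documented output shape:
// { violations: [{ id, impact, nodes: [...] }, ...] }
function gateOnViolations(results, blockingImpacts = ["serious", "critical"]) {
  const blocking = results.violations.filter((v) =>
    blockingImpacts.includes(v.impact)
  );
  for (const v of blocking) {
    console.error(`BLOCKED: ${v.id} (${v.impact}) on ${v.nodes.length} node(s)`);
  }
  return blocking.length === 0; // false -> fail the build
}

// Example with illustrative data:
const passed = gateOnViolations({
  violations: [
    { id: "color-contrast", impact: "serious", nodes: [{}, {}] },
    { id: "region", impact: "moderate", nodes: [{}] },
  ],
});
// passed === false: the serious color-contrast failure blocks the merge
```

Exiting nonzero when the gate returns false keeps the automatically detectable issue count at zero on every merge, which is the point of this layer.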
Scheduled monitoring catches new issues introduced by content management, third-party script updates, and dynamic content changes. Running weekly scans and alerting on new failures prevents technical debt from building up.
Combining 2-3 tools for initial audits reaches approximately 50% coverage. axe-core and WAVE use different engines and catch partially overlapping issue sets. Running both on the same page surfaces more issues than either finds alone.
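Combining tools only helps if you deduplicate overlapping findings. A sketch of merging two result sets keyed on rule plus CSS selector (the axe-core shape follows its documented output; the simplified WAVE-style shape and its rule names are hypothetical, since the two tools do not share a rule vocabulary):

```javascript
// Merge findings from two scanners, dropping issues both tools
// report against the same element under the same rule key.
function mergeFindings(axeResults, waveResults) {
  const seen = new Map();
  // axe-core: violations[].nodes[].target is an array of CSS selectors
  for (const v of axeResults.violations) {
    for (const node of v.nodes) {
      const selector = node.target.join(" ");
      seen.set(`${v.id}|${selector}`, { rule: v.id, selector, source: "axe" });
    }
  }
  // Hypothetical simplified WAVE-style shape: [{ rule, selector }]
  for (const issue of waveResults) {
    const key = `${issue.rule}|${issue.selector}`;
    if (!seen.has(key)) {
      seen.set(key, { ...issue, source: "wave" });
    }
  }
  return [...seen.values()];
}

const merged = mergeFindings(
  { violations: [{ id: "image-alt", nodes: [{ target: ["img.hero"] }] }] },
  [
    { rule: "image-alt", selector: "img.hero" },   // duplicate, dropped
    { rule: "label-missing", selector: "#email" }, // unique, kept
  ]
);
console.log(merged.length); // 2
```

In practice the harder part is mapping one tool's rule ids onto the other's; without that mapping, a naive merge double-counts.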
Manual testing closes the remaining gap. Keyboard-only navigation testing, screen reader testing with NVDA or JAWS on Windows and VoiceOver on macOS, and testing with real users who rely on assistive technologies are the only ways to evaluate the issues that automation cannot reach.
The goal of automated testing is not to achieve compliance — no automated tool can do that. The goal is to eliminate the detectable problems systematically, so that human testing time is spent on the genuinely difficult issues that require judgment.
A11yProof uses axe-core as its scanning engine, runs scheduled scans to catch regressions, and surfaces each issue with the specific WCAG criterion violated and a code-level fix recommendation. Plans start at $29/month for a single site. Automated scanning is the first layer — not the last.
Q&A
What percentage of WCAG issues can automated tools detect?
It depends on how you measure. By WCAG success criteria, roughly 30% of the 50 WCAG 2.1 AA criteria can be meaningfully tested by automation. By real-world issue volume, Deque's axe-core detects 57% of actual issues found in audits, because automatically detectable failures like color contrast occur at high frequency. The UK GDS benchmark found the best available tool detected only 40% of known barriers on a test page.
Q&A
How accurate is axe-core for accessibility testing?
axe-core catches 57% of accessibility issues by volume, based on Deque's 2021 study of 2,000+ audits and 300,000+ issues. With Intelligent Guided Tests added, coverage rises to approximately 80%. These are real-world figures from actual audits, not theoretical calculations. axe-core has a strong reputation for low false positives — when it flags an issue, it is almost always correct. What it cannot do is evaluate issues requiring human judgment: meaningful alt text, keyboard traps in complex widgets, cognitive accessibility, and captioning quality.
Q&A
Do accessibility overlays detect more issues than scanners?
No. Overlay tools like accessiBe, UserWay, and EqualWeb use the same underlying detection engines as standalone scanners, achieving 30-40% real-world coverage. They then attempt to auto-fix detected issues via runtime JavaScript injection. The FTC explicitly banned accessiBe's 95% compliance claim in January 2025 as deceptive. In 2024, 1,023 businesses were sued for ADA violations while running overlay widgets, confirming that overlay coverage claims do not translate to legal protection or actual WCAG conformance.
Q&A
Why does ARIA misuse increase accessibility errors?
ARIA attributes are intended to add semantic meaning for assistive technologies, but they override native HTML semantics and require precise implementation. A single misapplied ARIA role can make a correctly-structured component inaccessible. The WebAIM Million study found pages with ARIA averaged 57 detected errors per page versus 27 for pages without it. Overlays heavily inject ARIA attributes into sites at runtime, which partially explains why overlay-equipped sites continue to generate accessibility lawsuits.