DORA and AI: How Financial Services Companies Should Test AI System Resilience
DORA requires ICT resilience testing for financial entities using AI. Learn how Articles 24-27 apply to AI systems and what testing evidence regulators expect.
If you run AI models inside a European financial institution — or sell AI software to one — the Digital Operational Resilience Act is about to make your life more specific.
DORA has been enforceable since January 17, 2025. Unlike some EU regulations that phase in gradually, the ICT resilience testing requirements in Articles 24 through 27 are already active. Financial entities are already expected to demonstrate operational resilience across their entire ICT estate. And increasingly, that estate includes AI systems making credit decisions, detecting fraud, pricing risk, and automating customer interactions.
The problem: DORA's resilience testing framework was written for traditional ICT systems — servers, networks, databases, applications. It wasn't specifically designed for AI. But the obligations are technology-neutral. If an AI model is part of your ICT infrastructure — and if it processes customer data, influences financial decisions, or supports critical business functions — it falls within DORA's scope.
Traditional penetration testing won't catch what can go wrong with AI. A pentest will tell you whether your API is vulnerable to SQL injection. It won't tell you whether your fraud detection model can be manipulated by adversarial transaction patterns, or whether your customer-facing chatbot leaks account information through carefully sequenced prompts.
That gap is where AI-specific resilience testing becomes a regulatory necessity, not just a best practice.
What DORA Requires: The Resilience Testing Framework
DORA establishes a layered approach to ICT resilience testing. Understanding the layers matters because they determine what level of testing rigor applies to your AI systems.
Article 24: General Requirements for ICT Testing
Article 24 requires financial entities to establish, maintain, and review a sound and comprehensive digital operational resilience testing programme. The programme must include a range of assessments, tests, methodologies, practices, and tools — and it must cover all ICT systems and applications supporting critical or important business functions.
The key phrase is "all ICT systems." If your AI model supports a critical business function — and in financial services, fraud detection, credit scoring, AML screening, and algorithmic trading unambiguously qualify — it's within scope.
Article 24 requires that the testing programme be proportionate to the size and risk profile of the entity. A neobank with three AI models has different obligations than a G-SIB with hundreds. But both have obligations. There's no carve-out for "we just use AI for a few things."
Article 25: Testing of ICT Tools and Systems
Article 25 gets specific about what testing must cover. Financial entities shall identify and test all ICT systems and applications using appropriate tests, which may include:
- Vulnerability assessments and scans
- Open source analyses
- Network security assessments
- Gap analyses
- Physical security reviews
- Source code reviews (where feasible)
- Scenario-based testing
- Compatibility testing
- Performance testing
- End-to-end testing
- Penetration testing
This is where traditional testing programmes stop. Vulnerability scans and penetration tests cover the infrastructure hosting your AI models. They don't cover the models themselves.
Consider what "scenario-based testing" means for an AI system. For a traditional application, a scenario might be: "What happens when the database goes down?" For an AI credit scoring model, the scenario should be: "What happens when a loan applicant crafts their application text to manipulate the model's risk assessment?" For an AI chatbot in customer service, it should be: "What happens when a customer probes for other customers' account details through conversational manipulation?"
These are AI-specific resilience scenarios. And DORA's testing requirements, read broadly, already encompass them.
Articles 26-27: Threat-Led Penetration Testing (TLPT)
For the largest and most systemically important financial entities, DORA goes further. Articles 26 and 27 require advanced threat-led penetration testing — modeled on the TIBER-EU framework — at least every three years.
TLPT simulates real-world attack scenarios by credible threat actors. For AI systems, this means testing adversarial attacks that a sophisticated attacker would actually attempt: prompt injection chains, model extraction through API queries, training data inference attacks, and automated bias exploitation.
Most financial entities won't be subject to TLPT requirements for their AI systems specifically. But the principle trickles down: if the most advanced testing framework recognizes AI-specific threats, your baseline testing programme should too.
Why Traditional Penetration Testing Misses AI-Specific Vulnerabilities
A standard penetration test of a financial application will examine network security, API authentication, input validation, session management, and data encryption. These are necessary. They are not sufficient for AI systems.
Here's what traditional testing misses and why it matters for DORA compliance:
Prompt injection as operational disruption risk. If an attacker can inject instructions into your AI system through user inputs, they can potentially alter its behavior — causing it to approve transactions it should deny, deny transactions it should approve, or produce outputs that mislead human operators. Under DORA, this is an operational disruption that affects the integrity of a critical business function. A traditional pentest doesn't test for this. An AI resilience test does.
PII leakage as data integrity risk. AI models can inadvertently memorize and reproduce training data — including customer PII. If your fraud detection model was trained on transaction data that includes customer names and account numbers, and an adversarial query can extract fragments of that data, you have a data integrity and confidentiality incident under DORA's ICT risk management framework. Traditional data loss prevention tools don't monitor this vector. AI-specific testing does.
Bias as reputational and regulatory risk. If your AI credit scoring model systematically disadvantages certain demographic groups, the reputational impact is severe — and under EU consumer protection and anti-discrimination law, the regulatory consequences compound. DORA doesn't explicitly address bias, but the operational risk implications of deploying a biased AI model in a regulated financial context are squarely within DORA's concern about ICT risk affecting financial stability and consumer protection.
Model manipulation as systemic risk. If a coordinated adversarial campaign can degrade your AI model's performance — causing widespread incorrect risk assessments, fraudulent transaction approvals, or market-moving algorithmic trading errors — the systemic risk implications are exactly what DORA was designed to prevent. Testing your AI model's resilience against adversarial degradation is testing your operational resilience.
Mapping AI Endpoint Testing to DORA Resilience Categories
To satisfy DORA's testing requirements for AI systems, you need to map specific AI vulnerabilities to the resilience categories the regulation recognizes.
| AI Test Category | DORA Resilience Category | Regulatory Article | Evidence Produced |
|---|---|---|---|
| Prompt injection | Operational disruption, system integrity | Art. 25 scenario-based testing | Attack variants tested, bypass rates, guardrail effectiveness |
| PII / data leakage | Data confidentiality and integrity | Art. 25 vulnerability assessment | Leakage vectors tested, extraction success rates, data categories exposed |
| Bias detection | Operational risk, consumer harm | Art. 25 end-to-end testing | Demographic parity metrics, score distribution analysis |
| Toxicity / harmful output | Reputational risk, consumer protection | Art. 25 scenario-based testing | Output safety rates, category-level pass/fail, severity classification |
| Model robustness | System availability and reliability | Art. 25 performance testing | Consistency under perturbation, degradation under load, failover behavior |
| System prompt extraction | Intellectual property, security controls | Art. 25 penetration testing | Extraction attempt success rate, control effectiveness |
This mapping turns abstract DORA requirements into concrete, auditable testing activities. When a regulator or auditor asks how you've tested your AI system's operational resilience, you can point to specific test results mapped to specific articles — not a generic policy document.
Case Study: A European Neobank Facing DORA AI Scrutiny
A European neobank uses AI for three critical functions: real-time fraud detection on card transactions, automated credit decisioning for personal loans, and a customer-facing chatbot for account inquiries and basic servicing.
When their national competent authority conducted a supervisory review under DORA, the conversation about ICT resilience testing was straightforward for traditional systems. The neobank had recent penetration test results, vulnerability scan reports, and incident response test documentation.
Then the regulator asked about the AI models.
"Your fraud detection model processes every card transaction in real-time. It's clearly a critical ICT function. Where is the resilience testing evidence for this model?"
The neobank had trained the model. They had accuracy metrics from validation data. They had monitoring dashboards showing model performance over time. What they didn't have was adversarial resilience testing — evidence that the model behaves correctly when someone is actively trying to manipulate it.
The specific gaps the regulator identified:
Fraud detection model. No evidence that the model had been tested against adversarial transaction patterns — sequences of transactions designed to train the model to misclassify future fraudulent transactions as legitimate. No evidence of testing against model evasion attacks — transactions structured to fall just outside the model's learned fraud patterns.
Credit decisioning model. No evidence of bias testing across protected characteristics. No evidence of robustness testing — whether small changes to application inputs produced disproportionate changes to credit decisions. No evidence that the model couldn't be gamed by applicants who understood its decision boundaries.
Customer chatbot. No evidence of prompt injection testing. No evidence that the chatbot couldn't be manipulated into revealing other customers' account information. No evidence of system prompt extraction testing — whether an attacker could reverse-engineer the chatbot's instructions.
The regulator didn't issue a formal finding — yet. They gave the neobank 90 days to demonstrate AI-specific resilience testing and incorporate it into their ongoing testing programme.
Three months. That's the timeline when a regulator decides your AI testing programme is insufficient. The neobank scrambled, running comprehensive adversarial tests across all three models, documenting results, implementing remediations, and re-testing. Total effort: approximately 200 hours of work for their engineering and compliance teams, plus external testing costs.
Had they incorporated AI resilience testing into their DORA programme from the beginning, the incremental effort would have been a fraction of that — a few hours per model, per testing cycle.
The Intersection: DORA + SOC 2 for Fintechs Serving US and EU Markets
Many fintechs operate across jurisdictions. A company headquartered in the US with European customers needs SOC 2 for American enterprise buyers and DORA compliance for European regulators. The good news: the testing methodologies are nearly identical. The frameworks just use different language.
| Test | DORA Requirement | SOC 2 Requirement | Same Evidence? |
|---|---|---|---|
| Prompt injection | Art. 25 scenario-based testing | CC9.2 risk mitigation | Yes |
| PII leakage | Art. 25 vulnerability assessment | CC6.5 data protection | Yes |
| Bias detection | Art. 25 end-to-end testing | — (not required) | Partial |
| Toxicity | Art. 25 scenario-based testing | CC9.2 risk mitigation | Yes |
| Model monitoring | Art. 24 ongoing testing programme | CC4.1 monitoring | Yes |
| Adversarial robustness | Art. 25 penetration testing | CC7.1 threat detection | Yes |
For four of the six AI testing categories, the same test results can be mapped to both DORA and SOC 2 controls. Bias detection is a DORA concern that SOC 2 doesn't explicitly address, and some SOC 2 controls around access management (CC6.1) don't have direct DORA analogues in the testing articles. But the core testing evidence — adversarial manipulation, data leakage, harmful outputs — serves both.
The practical approach: run one comprehensive AI testing programme and produce framework-specific evidence packs from the results. Test once, map to many. This is particularly relevant for fintechs that already generate SOC 2 evidence and need to extend their compliance posture to cover DORA without doubling their testing workload.
For a detailed walkthrough of how AI testing maps to SOC 2 controls, see our SOC 2 AI Security Testing Guide. For how the same testing evidence supports ISO 42001 requirements — increasingly relevant for fintechs selling into the Microsoft ecosystem — see our ISO 42001 Compliance Guide.
Practical Next Steps for Financial Services Companies
DORA is already enforceable. If your AI systems support critical or important business functions and you haven't started AI-specific resilience testing, you're behind. Here's the minimal viable approach:
1. Inventory your AI systems. List every AI model, ML pipeline, and LLM-powered feature that supports a critical or important business function. For each, document: what it does, what data it processes, who relies on its outputs, and what happens if it fails or produces incorrect results.
2. Classify by criticality. Not every AI model needs the same testing rigor. A model that auto-categorizes internal support tickets has different risk characteristics than one that makes credit decisions. Map each model to DORA's critical function taxonomy and allocate testing resources accordingly.
3. Run baseline adversarial tests. For each critical AI system, test against the core vulnerability categories: prompt injection, data leakage, bias, toxicity, and robustness under perturbation. Document test methodologies, attack variants used, and results.
4. Map results to DORA articles. Take your test results and explicitly map them to Articles 24-25. This transforms raw test data into regulatory evidence. Your compliance team and your regulator both need this mapping to exist.
5. Integrate into your testing programme. DORA requires ongoing testing, not one-time assessments. Build AI resilience testing into your existing ICT testing programme with a defined cadence — quarterly at minimum for critical models.
6. Prepare for regulatory dialogue. When your supervisor asks about AI resilience — and they will — you want to show them a documented testing programme with historical results, not a plan to start testing. The companies that demonstrate proactive AI resilience testing will have fundamentally different supervisory experiences than those caught without evidence.
The Regulatory Trajectory
DORA's testing requirements will only become more AI-specific over time. The European Supervisory Authorities are developing Regulatory Technical Standards (RTS) and Implementing Technical Standards (ITS) that provide more granular guidance on testing methodologies. As AI becomes more prevalent in financial services, expect these standards to explicitly reference AI-specific testing requirements.
The trajectory mirrors what happened with cybersecurity. A decade ago, penetration testing was considered advanced. Now it's table stakes — regulators expect it as a baseline. AI resilience testing is on the same trajectory, accelerated by the speed of AI adoption in financial services.
Companies that build AI testing into their DORA compliance programmes now won't just satisfy current requirements. They'll be ahead of the curve when more prescriptive AI-specific standards arrive.
For how the EU AI Act adds additional obligations for financial services AI — particularly around high-risk classification and robustness requirements — see our EU AI Act Testing Evidence Guide. For Swiss financial institutions navigating FINMA requirements alongside DORA and EU AI Act, see our FINMA AI Compliance Guide.