
By Elizabeth McCaul
Former Member of the Supervisory Board of the European Central Bank (ECB)
- Elizabeth McCaul delivered the following speech at the European Anti-Financial Crime Summit 2026 (#EAFCS2026). Focused on 'The use of Sandboxes and Digital Twin Environments for FIs in the Context of AMLA and the AML Regulation', it was extremely well received by attendees. AML Intelligence is now publishing her speech in full.
We are moving from a world where AML supervision evaluates frameworks to one where it evaluates performance.
Supervisors do not experience banks through strategy decks or policy documents. They experience them through data, decisions, and outcomes — and, increasingly, through what goes wrong.
But to understand where we are heading, it helps to understand where we have been.
A Personal Starting Point: September 2001
During the September 11 attacks, I served as New York's Superintendent of Banks.
In the days that followed, in addition to dealing with shock, grief and horror at the loss of life, and the need to stabilize payment systems and get markets up and running, we urgently convened cooperation forums between law enforcement, bankers, and supervisors. Our goal was urgent and clear: build the capability to recognize nefarious patterns of behavior, and strengthen controls to identify them across the financial services landscape, to prevent terrorists, as well as human and drug traffickers and money launderers, from accessing or exploiting the financial system.
It was a defining moment — not only for national security, but for the architecture of financial crime compliance as we know it today.
The systems built in those early days were, by any honest assessment, rudimentary, and yet they remain the backbone of AML infrastructure today. They rely on broad, rule-based scenarios designed to act as a coarse filter across transaction activity: essentially a blunt instrument applied to an extraordinarily complex problem. The rate of false positives is staggering. The volume of activity requiring significant human review is immense. Analysts become overwhelmed. And the typologies those systems targeted were relatively simple compared to what criminal networks would eventually devise to evade them.
But they were a beginning. And they mattered.
Twenty-Five Years of Evolution
What has happened since then is nothing short of extraordinary.
In the early 2000s, transaction monitoring was almost entirely rules-based: fixed thresholds, static watchlists, and human review of each alert. Systems were largely siloed — payments here, accounts there, customer data somewhere else. Cross-border visibility was minimal. The concept of a risk score that updated in real time as customer behaviour changed was, at that point, science fiction.
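To make the early rules-based approach concrete, here is a minimal sketch of that era's monitoring logic: a fixed threshold plus a static watchlist, applied transaction by transaction. The thresholds, names, and scenario labels are purely illustrative assumptions, not any institution's actual configuration.

```python
# A toy sketch of early-2000s rules-based transaction monitoring:
# fixed thresholds and static watchlists, with every hit queued for
# human review. All names and values here are illustrative.

THRESHOLD = 10_000                          # fixed large-value threshold
WATCHLIST = {"ACME SHELL LTD", "J. DOE"}    # static watchlist

def screen(tx: dict) -> list[str]:
    """Return alert reasons for one transaction (empty list = no alert)."""
    alerts = []
    if tx["amount"] >= THRESHOLD:
        alerts.append("LARGE_VALUE")
    if tx["counterparty"].upper() in WATCHLIST:
        alerts.append("WATCHLIST_HIT")
    return alerts

txs = [
    {"amount": 12_500, "counterparty": "Acme Shell Ltd"},
    {"amount": 9_900,  "counterparty": "Local Grocer"},  # just under threshold: missed
]
results = [screen(t) for t in txs]
```

Note how crude the filter is: anything structured just below the threshold passes silently, while large legitimate payments generate alerts regardless of context, which is exactly the false-positive burden described above.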
By the 2010s, the industry began layering in analytics. Peer grouping, network analysis, and early machine learning models started to emerge. Regulators began asking not just whether controls existed, but whether they were calibrated. The concept of a risk-based approach took hold — though implementation remained uneven across jurisdictions.
Those same legacy systems, albeit now layered with analytics and stronger risk-control frameworks, still form the backbone of AML compliance. Today, the possibilities that machine learning and artificial intelligence tools can bring are genuinely breathtaking in their scope and sophistication.
Such tools can enable institutions to detect patterns across billions of transactions in near real time — identifying behavioural anomalies, network linkages, and typology shifts that no human analyst and no static scenario could have surfaced. Graph analytics can map relationships showing visual networks across complex corporate structures and jurisdictions. Natural language processing can scan unstructured data for emerging risk signals. Dynamic risk scoring can update a customer profile continuously across the full lifecycle of a relationship, not just at onboarding.
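To illustrate the graph-analytics idea in miniature (the accounts and payment links here are purely hypothetical): given payment relationships as edges, a simple traversal surfaces the network connected to a seed account, the kind of linkage no single-transaction rule can see.

```python
# Illustrative sketch of graph-style linkage analysis: payment links
# become edges, and a breadth-first traversal maps the network around
# a seed account. Accounts and edges here are hypothetical.
from collections import defaultdict, deque

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]

graph = defaultdict(set)
for src, dst in edges:
    graph[src].add(dst)
    graph[dst].add(src)  # treat payment links as undirected relationships

def network_of(seed: str) -> set[str]:
    """All accounts reachable from `seed` through payment links."""
    seen, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

Here `network_of("A")` links accounts A, B, C and D through intermediaries, even though no individual transaction connects A and D directly.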
And yet — despite all of this — a fundamental problem persists.
The Gap That Remains
Over time, one lesson from the supervisory side becomes very clear: there is often a meaningful gap between how institutions describe their control frameworks… and how those frameworks actually perform.
This is not a criticism. It is a structural reality.
Because historically, supervision has relied on periodic reviews, sample-based testing, and expert judgment applied to systems that are anything but static. The scenario-based models worked in a simpler world. But today's system is faster, more complex, and constantly evolving, and financial crime risks evolve with it.
Which leads to the fundamental question:
How do we move from assessing whether controls exist… to proving whether they actually work?
What AMLA Is Now Demanding
AMLA Chair Bruna Szego has been unambiguous on this point. At the European Banking Summit in January, she described the challenge in terms that should concentrate minds across every boardroom in Europe: criminal networks now operate seamlessly across borders, exploiting gaps between national systems and using technologies that make transactions harder to trace. Fragmentation across 27 legal frameworks has allowed criminals to exploit inconsistencies for too long. That era is ending.
AMLA has made clear it is interested not just in whether reports are filed, but in whether those reports constitute decision-grade intelligence that law enforcement can actually act upon.
Derville Rowland, AMLA Executive Board member, has articulated the technology dimension with particular clarity. How firms comply with the rules is up to them — traditional programmes or regtech tools — but what is essential is that the means used are effective, and that such effectiveness can be demonstrated to supervisors. The framework is technology-neutral. The obligation to prove it works is not.
Companies across all sectors will have to raise their AML standards materially to comply with the EU AML Regulation from 2027. The supervisory dialogue is changing — from ‘show us your scenarios’ to ‘show us your coverage, your gaps, and your evidence of continuous improvement.’
Change
For decades, AML compliance has been built on policies, procedures, governance, and sample testing. Under AMLA, the question is changing:
Not – ‘Do you have the right framework?’
But: ‘Do your systems actually work — and can you prove it?’
Here is the uncomfortable truth: most institutions today do not have a complete view of their own performance. They cannot determine, with confidence, what proportion of relevant activity is actually detected, where their systems fail to capture risk, and how their controls respond to new or evolving typologies across the full population of activity.
This is not simply a data challenge. It is a limitation of the environments in which systems are tested.
A Lesson from Credit: When History Offers No Guide
The AML world is not the first to face this challenge. Credit risk modellers confronted a version of it during COVID-19 — and the response offers a useful parallel.
When the pandemic struck, virtually every credit model in existence became unreliable overnight. The historical data on which those models were built had no analogue for an economy in sudden, government-mandated suspension. The models were not wrong — they were simply operating in conditions for which they had never been designed.
The industry’s response was instructive: rather than discarding the models or waiting for new training data that would take years to accumulate, institutions deployed overlays — expert-judgment adjustments applied on top of model outputs, anchored in evidence and documented with rigour. These overlays were not guesswork. They were structured responses to a structured problem: how do you adjust a well-built system when the environment it was built to reflect no longer exists?
I would argue we are navigating the same waters in AML today.
The threat landscape has changed substantially since the days when scenario-based detection tools were first designed. The mechanisms of financial crime — the channels, the instruments, the actors, the typologies — have evolved faster than most legacy systems can track. Static scenarios calibrated to yesterday’s threats will increasingly miss tomorrow’s risks. The question is not whether to adjust, but how.
And here the credit world offers another signal worth watching. The most forward-looking credit practitioners are now moving beyond traditional financial data entirely — incorporating what some call psychographic data: behavioural signals that reflect how individuals actually make decisions, manage obligations, and navigate risk. Some of this is in early stages. But the direction of travel is clear: the data universe available to assess creditworthiness is expanding dramatically, and the institutions that learn to use it responsibly will have a material edge.
Leapfrogging the Legacy Constraint
There is a powerful analogy here from a very different domain.
In Kenya and across much of sub-Saharan Africa, impact investors discovered that mobile phone usage patterns could predict loan repayment behaviour with remarkable accuracy. A borrower who uses multiple cell towers, makes frequent shorter calls, and exhibits varied network behaviour turns out to be a better credit risk than one whose patterns are narrow and static. The insight was counterintuitive — and completely invisible to any traditional credit scoring system.
But the broader lesson is even more striking. Africa, having never built out the dense infrastructure of telephone poles and landlines that defined telecommunications in the developed world, was able to leapfrog an entire generation of investment and construction. Mobile technology arrived, and a continent moved directly to it — skipping the intermediate stage entirely.
I find myself thinking about this in the context of AML infrastructure.
Many institutions today are sitting on legacy transaction monitoring systems that represent years of investment, configuration, and institutional knowledge. Replacing them wholesale is costly, disruptive, and slow. But the leapfrog model suggests a different question: what if we do not need to replace them at all?
What if we can overlay them — intelligently, using the tools now available — and achieve the performance uplift without the construction cost?
The digital twin concept is precisely this kind of overlay architecture. By replicating a legacy system in a controlled simulation environment, institutions can apply AI-driven optimisation to scenarios, thresholds, and detection logic — testing, tuning, and improving performance without touching the production system. The legacy investment is preserved. The intelligence applied on top of it is new.
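As a toy illustration of that overlay idea (the configuration structure, thresholds, and transaction history here are all hypothetical): a "twin" copies the production scenario configuration, and candidate recalibrations are evaluated against replayed transactions while the production system remains untouched.

```python
# Sketch of the digital-twin overlay: copy the production detection
# configuration, tune the copy offline against replayed history, and
# compare outcomes before anything reaches production. All values
# here are illustrative assumptions.
import copy

# Hypothetical production configuration: one large-value scenario.
production_config = {"large_value_threshold": 10_000}

# Replayed historical transaction amounts.
historical_txs = [12_500, 9_900, 15_000, 4_000, 10_500]

def alert_count(config: dict, txs: list[int]) -> int:
    """How many transactions this configuration would alert on."""
    return sum(1 for amount in txs if amount >= config["large_value_threshold"])

# The twin: a deep copy of production, recalibrated offline.
twin_config = copy.deepcopy(production_config)
twin_config["large_value_threshold"] = 12_000   # candidate recalibration

baseline  = alert_count(production_config, historical_txs)  # current alert volume
candidate = alert_count(twin_config, historical_txs)        # volume under the change
# production_config is unchanged throughout: the impact of the change
# is measured entirely in the twin.
```

The point of the pattern is the separation: the legacy configuration is preserved exactly, and every proposed change is quantified against replayed reality before deployment.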
Agentic AI
Agentic AI takes this further still. Rather than simply optimising existing scenarios, agentic systems can autonomously generate and test new detection hypotheses — exploring the space of possible typologies, identifying coverage gaps, and proposing calibration changes with evidence to support them. This is not incremental improvement. It is a fundamentally different relationship between the system and its environment.
The analogy to mobile banking in Kenya is not merely rhetorical. It is a genuine strategic option: skip the expensive replacement cycle, and deploy the intelligence layer instead.
In financial crime, the ways data, behaviour, and detection logic interact across an entire system are precisely the interactions that determine whether risk is detected or missed.
So the question becomes: how do we test systems in conditions that reflect reality… without introducing risk into production?
This is where simulation in digital twin environments becomes essential — not as an incremental improvement, but as a foundational capability for understanding system behaviour.
A digital twin allows institutions to replicate their transaction monitoring systems — including underlying data, detection logic, scenarios, and thresholds — in a controlled and scalable environment that reflects real-world complexity.
Within such an environment, institutions can move beyond isolated testing toward system-wide understanding. They can replay historical activity not only to review alerts, but to understand precisely where risk was not detected. They can assess how effectively their detection logic covers different behaviours and typologies across the full population. They can test both existing and emerging risks against their current controls and identify where coverage gaps exist. And they can simulate changes to scenarios, machine learning models, thresholds, or detection logic, measuring the impact on detection outcomes before any change reaches production.
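The replay-and-measure step above can be sketched as follows. The labelled history, the detection rule, and the metrics are illustrative assumptions; the point is the shape of the evidence: what was detected, what was missed, and what the false-positive load looks like.

```python
# Sketch: replay labelled historical activity through current detection
# logic to measure coverage and false-positive share. The rule and the
# labels below are illustrative assumptions.

def detect(tx: dict) -> bool:
    """Stand-in for the current production detection logic."""
    return tx["amount"] >= 10_000

history = [
    {"amount": 15_000, "suspicious": True},   # detected: true positive
    {"amount": 8_000,  "suspicious": True},   # missed: a coverage gap
    {"amount": 11_000, "suspicious": False},  # alerted anyway: false positive
    {"amount": 3_000,  "suspicious": False},  # correctly ignored
]

flagged   = [tx for tx in history if detect(tx)]
true_risk = [tx for tx in history if tx["suspicious"]]

# Share of known-suspicious activity actually detected.
coverage = sum(1 for tx in true_risk if detect(tx)) / len(true_risk)

# Share of alerts that were not suspicious at all.
false_positive_share = sum(1 for tx in flagged if not tx["suspicious"]) / len(flagged)
```

Numbers like these, computed across the full population rather than a sample, are exactly the "here is our coverage, and here is what we miss" evidence the new supervisory dialogue asks for.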
Controlled environment
This creates something fundamentally new: a controlled environment for continuous improvement, where systems are no longer tuned primarily through periodic reviews and expert judgment, but through measurable impact, iterative testing, and evidence-based optimisation.
And this fundamentally changes the supervisory dialogue.
Not: Here are our scenarios.
But: Here is our coverage.
Here is what we detect — and what we do not.
Here is how our system behaves under different conditions.
And here is how we continuously improve it, with evidence.
Here is how we produce descriptions of suspicious activity that enable law enforcement outcomes.
In that sense, AML is becoming an engineering discipline.
While sandbox environments provide the testing infrastructure, using digital twins for simulation provides the ability to observe, measure, and improve system performance at scale. Leading institutions will not only be able to quantify effectiveness. They will be able to understand it, challenge it, and continuously enhance it.
They will not fear supervisory scrutiny; they will be prepared for it.
From Compliance to Capability
When I look back at those early days after September 11 — the urgency, the improvisation, the determination to build something that worked — I see the same qualities that will define the leaders of this next chapter.
We built the foundations with the tools we had. The environment those tools were designed for has since evolved beyond recognition. The obligation to use them well has not changed.
Our goal should be clear: reduce the enormous waste of time and investment consumed by false positives, and redeploy that capacity toward better-calibrated, more adaptive detection methods: methods capable of keeping the financial system genuinely safe using the technological advances available today.
In a world fraught with geopolitical risk and new threat actors, that means stepping up through coordination, a common and robust framework, and the elimination of cross-border regulatory arbitrage. AMLA, to my eye, is on track to deliver exactly that foundation.
But frameworks alone are not enough. We must also move with urgency on the technology side.
The lesson of mobile banking in Africa is available to us here: we do not need to tear down and rebuild. We can leapfrog. Digital twins and agentic AI optimisation allow us to apply new intelligence to existing infrastructure, preserving the investment already made while dramatically improving what it can do. That is not a compromise. It is a strategic choice.
Opportunity
This is not just a compliance burden. It is a strategic opportunity — the chance to move from asking ‘Are we compliant?’ to asking ‘Can we demonstrate, continuously, that our systems are effective?’
What matters is not how well a system is described, but how it behaves under pressure — and whether that behaviour can be demonstrated.
Compliance is no longer defined by frameworks. It is defined by measurable outcomes.
Simulation — and the digital twin environments that enable it — is becoming core infrastructure.
The institutions that move early will not just meet expectations.
They will define them.
Thank you.
