The Research Behind Our Compliance Scenario Generator
When we started building a synthetic compliance scenario generator into our stablecoin platform, the hard problem wasn’t the rule packs. MiCA, OFAC, and reserve-composition rules have been stable for a year. The hard problem was producing test data that could walk into a regulator’s office without embarrassing anyone — reproducible, auditable, and diverse enough to actually stress a rule pack rather than happily pass it.
We didn’t invent the answer. We adapted it from Davidson, Seguin, Bacis, Ilharco, and Harkous — Reasoning-Driven Synthetic Data Generation and Evaluation, published in TMLR in March. The paper introduces Simula: a framework that generates synthetic data by first mapping a target domain into explicit taxonomies, then running an agentic generator-and-critic loop against those taxonomies to produce diverse, complex, reproducible examples.
The shape transfers cleanly to compliance. Financial-crime typologies (layered remittance, sanctioned-counterparty proximity, reserve-drift patterns) sit where Simula’s taxonomy nodes sit. The generator produces synthetic counterparty graphs and transaction sequences. The critic rejects scenarios that don’t exhibit the typology they were meant to test.
The pipeline
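The shape of the loop, taxonomy node in, critic-gated scenarios out, can be sketched in a few lines. This is our illustration, not Simula's code: the names (`TypologyNode`, `generate_scenario`, `critic_accepts`) are ours, the generator is a stub where the paper uses an agentic LLM step, and the fixed seed stands in for the reproducibility machinery.

```python
from dataclasses import dataclass
import random

@dataclass
class TypologyNode:
    name: str                  # e.g. "layered-remittance"
    required_features: set     # features a scenario must exhibit to count

def generate_scenario(node: TypologyNode, rng: random.Random) -> dict:
    # Stand-in for the LLM generator: emit a candidate feature set,
    # sometimes dropping a required feature so the critic has work to do.
    features = set(node.required_features)
    if rng.random() < 0.5:
        features.pop()
    return {"typology": node.name, "features": features}

def critic_accepts(node: TypologyNode, scenario: dict) -> bool:
    # Critic gate: reject scenarios that don't exhibit the typology
    # they were generated to test.
    return node.required_features <= scenario["features"]

def run_loop(node: TypologyNode, budget: int = 10, seed: int = 0) -> list:
    rng = random.Random(seed)  # fixed seed keeps a run reproducible
    accepted = []
    for _ in range(budget):
        candidate = generate_scenario(node, rng)
        if critic_accepts(node, candidate):
            accepted.append(candidate)
    return accepted
```

The point of the sketch is the control flow: nothing leaves the loop without passing the critic, so a rule pack downstream only ever sees scenarios that actually exhibit the typology they claim.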
What we added
Simula’s contribution is the methodological spine — reasoning-first taxonomy expansion, agentic refinement, critic-gated quality. What we layered on for a regulator-facing deployment:
- Provenance fingerprint. Every scenario carries a hash over its manifest, rule-pack version, and assumption bundle. A reviewer can prove, with a single string comparison, that today's scenario is the same one an officer ran last quarter.
- Append-only audit ledger. Dossier exports write a row to an org-scoped Supabase table where UPDATE and DELETE are hard-blocked at the database layer. Reviewers see the full run history.
- Synthetic-only enforcement. The schema rejects any attempt to mix synthetic scenarios with live customer records. This is a compliance constraint, not a data-science one.
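The provenance fingerprint reduces to hashing a canonical serialization of the bundle. A minimal sketch, with illustrative field names (`manifest`, `rule_pack_version`, `assumptions`) rather than our production schema: sorted keys and fixed separators keep the JSON byte-stable, so the hash is deterministic across runs.

```python
import hashlib
import json

def fingerprint(manifest: dict, rule_pack_version: str, assumptions: dict) -> str:
    bundle = {
        "manifest": manifest,
        "rule_pack_version": rule_pack_version,
        "assumptions": assumptions,
    }
    # Canonical JSON: sorted keys, no whitespace drift between runs.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Verifying that today's scenario matches last quarter's run is then an equality check on two hex strings; any change to the rule-pack version or assumption bundle produces a different fingerprint.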
The honest caveat
Reasoning-first generation only produces defensible data if the underlying reasoning is actually reasonable. A well-structured taxonomy applied to a weak model produces well-structured nonsense. Davidson et al. benchmark this empirically across several datasets. Any platform deploying the approach in a regulated domain has to do the same work in that domain. That is why our engine-disposition scoring sits in a separate eval harness, and why the first live corridor will run synthetic and real pilot data side-by-side rather than swapping one for the other.
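The side-by-side check can be sketched as a small harness: run the same rule pack over both corpora and compare disposition frequencies. `rule_pack` and the scenario dicts here are placeholders, not our production interfaces, and a real harness would add statistical tests rather than eyeballing ratios.

```python
from collections import Counter

def disposition_mix(rule_pack, scenarios):
    # Apply the rule pack to each scenario and tally the dispositions.
    return Counter(rule_pack(s) for s in scenarios)

def side_by_side(rule_pack, synthetic, pilot):
    syn = disposition_mix(rule_pack, synthetic)
    real = disposition_mix(rule_pack, pilot)
    labels = set(syn) | set(real)
    # For each disposition, report its frequency in each corpus so
    # divergences between synthetic and pilot data stand out.
    return {
        label: (syn[label] / max(len(synthetic), 1),
                real[label] / max(len(pilot), 1))
        for label in labels
    }
```

If the synthetic corpus flags at twice the rate of the pilot corpus for some disposition, that is evidence the taxonomy is over- or under-representing a typology, which is exactly the failure mode the caveat above warns about.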