An Ontology for Accountability: Defining What Data Quality Means in Blockchain Analytics

The field of blockchain analytics, once a niche discipline reserved for specialized researchers, has transformed into a cornerstone of global financial integrity and law enforcement. As digital assets become increasingly integrated into the traditional financial system, the methods used to track, categorize, and attribute on-chain activity have come under unprecedented scrutiny. Jan Moller, the Chief Scientist at Chainalysis, recently underscored the critical necessity for academic-grade rigor in this sector, warning that the "erosion of standards" among newer industry entrants poses a significant risk to legal due process and the livelihoods of individuals worldwide.

At the heart of this advocacy is the release of a formal "ontology"—a comprehensive framework designed to standardize the vocabulary and evidentiary requirements of blockchain forensics. This move aims to transition the industry from a "black box" approach, where machine learning outputs are often accepted without question, to a transparent, scientific discipline where every claim is backed by reproducible proof. The initiative follows years of internal development at Chainalysis, where the methodology was forged through high-stakes litigation and collaboration with international academic institutions.

The High Stakes of Misidentification: A Case Study in Methodology

The urgency for standardized rigor is best illustrated by a critical discrepancy encountered by Chainalysis investigators several years ago. A client presented a case where two different blockchain analytics tools provided conflicting labels for a single deposit address. While Chainalysis identified the address as belonging to a gambling service, a competing tool flagged the same address as being associated with Child Sexual Abuse Material (CSAM).

The implications of such a discrepancy are profound. A "gambling" label might suggest a breach of terms of service or a minor regulatory infraction, whereas a "CSAM" label triggers immediate law enforcement intervention, potential criminal prosecution, and permanent social ostracization. Upon technical review, it was determined that the competing tool had relied on statistical pattern matching—observing the size and frequency of small, regular payments—without accounting for the underlying structural evidence of the transactions.

"Small-time gamblers and something far darker can produce similar transaction footprints when viewed through a narrow enough lens," Moller noted regarding the incident. He characterized the reliance on mere appearance over deterministic evidence as a "reckless foundation" for any analytical tool. This event served as a catalyst for Chainalysis to formalize its internal standards into a public-facing ontology, ensuring that investigators and compliance officers understand the difference between statistical probability and forensic certainty.
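The ambiguity Moller describes can be illustrated with a small sketch. It assumes a hypothetical classifier that sees only payment size and payment frequency; the function name, the feature names, and the sample payments are invented for illustration and do not come from any real tool or dataset.

```python
from statistics import mean


def payment_features(payments):
    """Summarize (timestamp_hours, amount) pairs by size and frequency only.

    Hypothetical feature set: this is the "narrow lens" of pattern matching,
    not the method used by any real analytics product.
    """
    times = sorted(t for t, _ in payments)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_amount": round(mean(amount for _, amount in payments), 2),
        "mean_gap_hours": round(mean(gaps), 2),
    }


# Invented deposits to two unrelated services: small, regular payments.
service_a = [(0, 20.0), (24, 25.0), (48, 15.0), (72, 20.0)]
service_b = [(5, 15.0), (29, 20.0), (53, 25.0), (77, 20.0)]

features_a = payment_features(service_a)
features_b = payment_features(service_b)

# Both services reduce to the same feature vector, so any label derived
# from these features alone cannot tell them apart.
indistinguishable = features_a == features_b
```

Because the two feature vectors are identical, whatever label the classifier assigns to one service it must also assign to the other. Telling them apart requires evidence beyond the payment pattern itself.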

Chronology of Blockchain Forensics and the Path to Admissibility

Blockchain analytics has moved through several distinct phases, from the early days of Bitcoin to the current era of complex decentralized finance (DeFi) and smart contracts.

  1. The Foundational Era (2009–2014): In the years following Bitcoin’s inception, the blockchain was largely viewed as anonymous. Analytics was restricted to academic papers exploring the theoretical traceability of the UTXO (Unspent Transaction Output) model.
  2. The Emergence of Professional Tools (2014–2018): Companies like Chainalysis were founded to provide law enforcement with the tools necessary to investigate early darknet markets, such as the Silk Road. During this period, heuristics—rules of thumb for identifying clusters of addresses—began to be formalized.
  3. The Regulatory Shift (2018–2021): Global bodies like the Financial Action Task Force (FATF) introduced the "Travel Rule" and other compliance requirements, forcing exchanges to adopt analytics tools for Anti-Money Laundering (AML) purposes.
  4. The Era of Legal Scrutiny (2021–Present): As blockchain evidence became central to high-profile criminal cases, the underlying science faced challenges in court. This culminated in the landmark case of United States v. Sterlingov.

In United States v. Sterlingov, the defendant, Roman Sterlingov, was accused of operating Bitcoin Fog, one of the longest-running bitcoin mixers. The defense challenged the admissibility of Chainalysis’s Reactor software, arguing that the heuristics used to link Sterlingov to the service were not scientifically sound. However, after an extensive Daubert hearing (the proceeding in which a U.S. federal court decides whether expert testimony is reliable and relevant enough to be admitted), the court found the methodology admissible across all criteria. This ruling provided a legal "gold standard" for blockchain forensics, confirming that when applied with rigor, on-chain analysis meets the requirements for federal evidence.

Supporting Data and the Importance of Empirical Validation

The credibility of blockchain analytics rests not only on internal corporate claims but also on external, peer-reviewed validation. One of the most significant milestones in this regard was an independent study conducted by researchers at Delft University of Technology in the Netherlands, in collaboration with law enforcement agencies.

The study represented the only empirical validation of attribution accuracy against "ground truth" data—information obtained directly from seized server infrastructure. While the study confirmed the high accuracy of the Chainalysis methodology, it also highlighted the risks of less rigorous approaches. Notably, Moller revealed that another service provider in the industry attempted to suppress the publication of the Delft study through legal threats, a move he described as antithetical to the scientific method.

Data from the Chainalysis 2024 Crypto Crime Report suggests that while the total volume of illicit transactions has fluctuated, the complexity of obfuscation techniques is rising. In 2023, illicit addresses received over $24 billion in cryptocurrency, with an increasing shift toward "chain-hopping" and the use of cross-chain bridges. This rising complexity necessitates a move away from simple pattern matching toward a more robust, tiered analytical framework.

The New Ontology: A Two-Tiered Approach to Forensics

To address the growing "vocabulary gap" between technical experts and the investigators who use their data, the newly published ontology defines two distinct layers of analysis. Each layer carries a different evidentiary weight and requires a different type of rigor.

Tier 1: The Structural Layer (Address Clustering)

This layer focuses on determining whether multiple blockchain addresses are controlled by the same entity. According to the ontology, this must meet the standard of "structural soundness."

  • Deterministic: The analysis must be based on the hard-coded logic of the blockchain protocol.
  • Reproducible: Any independent analyst with the same data should arrive at the same conclusion.
  • Auditable: The logic must be transparent, with known and documented failure modes.
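The three properties above can be sketched in code. The article does not say which clustering rules the ontology covers, so this example substitutes the widely documented common-input-ownership heuristic (addresses spent together as inputs of one transaction are assumed to share a controller). The class, function, and field names are hypothetical, and the input is a simplified mapping of transaction ID to input addresses rather than real chain data.

```python
from collections import defaultdict


class AddressClusters:
    """Union-find over addresses that records why each merge happened."""

    def __init__(self):
        self.parent = {}
        self.audit_log = []  # one (txid, address_a, address_b) per merge

    def find(self, addr):
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[addr] != root:
            self.parent[addr], addr = root, self.parent[addr]
        return root

    def union(self, a, b, txid):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Deterministic tie-break: the lexicographically smaller root wins.
        keep, absorb = sorted((root_a, root_b))
        self.parent[absorb] = keep
        # Auditable: every merge is traceable to the transaction behind it.
        self.audit_log.append((txid, a, b))

    def clusters(self):
        groups = defaultdict(list)
        for addr in self.parent:
            groups[self.find(addr)].append(addr)
        return sorted(sorted(members) for members in groups.values())


def cluster_by_common_inputs(transactions):
    """Cluster addresses that appear together as inputs of a transaction.

    `transactions` maps txid -> list of input addresses (hypothetical format).
    Known failure mode: collaborative transactions such as CoinJoin combine
    inputs from different owners and would be merged incorrectly.
    """
    clusters = AddressClusters()
    # Sorting makes the result independent of the order the data arrives in,
    # so two analysts with the same data get the same clusters and audit log.
    for txid, input_addresses in sorted(transactions.items()):
        inputs = sorted(set(input_addresses))
        for addr in inputs:
            clusters.find(addr)  # register single-input addresses too
        for other in inputs[1:]:
            clusters.union(inputs[0], other, txid)
    return clusters
```

Calling `cluster_by_common_inputs({"tx1": ["A", "B"], "tx2": ["B", "C"], "tx3": ["D"]})` groups A, B, and C into one cluster and leaves D alone, and the audit log names the transaction responsible for each merge. The documented failure mode in the docstring is the kind of caveat the "auditable" requirement asks for.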

Tier 2: The Attribution Layer (Entity Identification)

This layer involves linking a cluster of addresses to a real-world entity, such as an exchange, a darknet market, or a specific individual. This follows a "structured confidence framework."

  • Source Characterization: Documenting the reliability of the intelligence used to identify the entity.
  • Reasoning Requirements: Providing a clear narrative of how the on-chain data connects to the off-chain identity.

By separating these two tiers, the ontology prevents the conflation of "what the blockchain says" with "who we think did it." This distinction is vital for prosecutors who must present evidence in court that can withstand adversarial cross-examination.
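One way to picture the separation is as two record types, where an attribution can only be built on top of a structural finding and cannot exist without a characterized source and written reasoning. This is a minimal sketch under stated assumptions: the type names, fields, and the three-level reliability scale are hypothetical, since the article does not give the ontology's actual schema or categories.

```python
from dataclasses import dataclass
from enum import Enum


class SourceReliability(Enum):
    # Hypothetical scale; the ontology's own categories are not given here.
    DIRECT_OBSERVATION = 3
    CORROBORATED_REPORT = 2
    UNVERIFIED_CLAIM = 1


@dataclass(frozen=True)
class StructuralFinding:
    """Tier 1: what the blockchain says (a cluster of addresses)."""
    cluster_id: str
    addresses: tuple
    heuristic: str
    known_failure_modes: tuple


@dataclass(frozen=True)
class Attribution:
    """Tier 2: who we think controls the cluster, and why."""
    finding: StructuralFinding
    entity_name: str
    source_description: str
    source_reliability: SourceReliability
    reasoning: str

    def __post_init__(self):
        # Refuse attributions that lack a source or an explanation.
        if not self.source_description.strip():
            raise ValueError("attribution requires a characterized source")
        if not self.reasoning.strip():
            raise ValueError("attribution requires written reasoning")


finding = StructuralFinding(
    cluster_id="cluster-001",
    addresses=("addr1", "addr2"),
    heuristic="common-input-ownership",
    known_failure_modes=("collaborative transactions such as CoinJoin",),
)

attribution = Attribution(
    finding=finding,
    entity_name="Example Exchange",  # invented entity for illustration
    source_description="deposit address observed in a test transaction",
    source_reliability=SourceReliability.DIRECT_OBSERVATION,
    reasoning="A deposit made to the service landed on addr1 in this cluster.",
)
```

Keeping the structural finding as a separate, immutable object means the on-chain claim can be examined and challenged on its own terms, independent of the off-chain identity attached to it.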

Industry Reactions and Broader Implications

The call for higher standards has resonated across the fintech and legal sectors. Legal experts suggest that as more defendants challenge blockchain evidence, the "black box" era of analytics is likely coming to an end.

"Courts are becoming more sophisticated," said one former federal prosecutor. "They are no longer satisfied with a software tool simply saying ‘this is a high-risk address.’ They want to know the ‘why’ and the ‘how.’ The industry must move toward the kind of standards we see in DNA forensics or ballistics."

For financial institutions, the implications are equally significant. Compliance teams at major banks and crypto exchanges rely on these tools to freeze assets and file Suspicious Activity Reports (SARs). If the underlying data is flawed, false positives can push these institutions into unnecessary "de-risking," cutting legitimate users off from the financial system.

Conclusion: Toward a Mature Scientific Discipline

The publication of the Chainalysis ontology is intended as an invitation to the broader industry—including competitors, regulators, and academics—to build a collective standard. As the blockchain ecosystem continues to evolve with the rise of Layer 2 solutions and privacy-enhancing technologies, the margin for error in analytics will only narrow.

Jan Moller’s transition from academia to Chief Scientist reflects a broader trend in the technology sector: the recognition that "good enough" is an insufficient standard when the output of a system directly impacts human rights and legal outcomes. By applying the rigor of distributed systems research to the messy, real-world data of the blockchain, the goal is to create a field that is not only technologically advanced but also ethically and legally robust.

"We built this to be questioned," Moller stated, emphasizing that the strength of a scientific discipline lies in its ability to withstand skepticism. As the industry moves forward, the adoption of transparent, peer-reviewed standards will be the determining factor in whether blockchain analytics remains a trusted tool for justice or becomes a liability for the very systems it seeks to protect.
