The concept behind data mining, especially when identifying potentially fraudulent transactions, is to identify transactions that exhibit certain characteristics and to provide the ability to drill down into those that fall within high-risk areas.

Data mining is not limited to the analysis of a single data file; it should draw information together from a number of sources. For example, in an analysis or investigation of procurement, data from the accounts payable invoice history file, cash payment systems, standing data (such as the supplier master file) and the purchase order system may all be analysed. (Standing data, such as details of suppliers or customers, is often referred to as static data or information, and histories of transactions as dynamic data. These terms will be used throughout this chapter.) Analysis will be conducted not only within these data files on a stand-alone basis; there will also be extensive cross-matching of details between them. We therefore include "data matching" techniques under the broader term data mining.
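The cross-matching of static and dynamic data described above can be sketched in a few lines of Python. This is a minimal illustration only: the record layouts, field names and sample values are hypothetical, and the employee file is an assumed additional source that is not part of the procurement example in the text. The first test looks for invoices whose supplier does not appear in the supplier master file; the second looks for suppliers that share a bank account with an employee.

```python
from collections import defaultdict

# Hypothetical static data: supplier master file.
suppliers = [
    {"supplier_id": "S001", "name": "Acme Ltd", "bank_account": "11112222"},
    {"supplier_id": "S002", "name": "Bolt Supplies", "bank_account": "33334444"},
]

# Hypothetical dynamic data: accounts payable invoice history.
invoices = [
    {"invoice_no": "INV-1", "supplier_id": "S001", "amount": 1200.00},
    {"invoice_no": "INV-2", "supplier_id": "S002", "amount": 450.00},
    {"invoice_no": "INV-3", "supplier_id": "S999", "amount": 9800.00},
]

# Hypothetical static data from a separate source (assumed for illustration).
employees = [
    {"employee_id": "E10", "name": "J. Smith", "bank_account": "33334444"},
]


def invoices_without_supplier(invoices, suppliers):
    """Return invoices whose supplier_id is absent from the master file."""
    known = {s["supplier_id"] for s in suppliers}
    return [inv for inv in invoices if inv["supplier_id"] not in known]


def shared_bank_accounts(suppliers, employees):
    """Return (supplier_id, employee_id) pairs that share a bank account."""
    by_account = defaultdict(list)
    for emp in employees:
        by_account[emp["bank_account"]].append(emp["employee_id"])
    return [
        (sup["supplier_id"], emp_id)
        for sup in suppliers
        for emp_id in by_account.get(sup["bank_account"], [])
    ]


orphans = invoices_without_supplier(invoices, suppliers)
matches = shared_bank_accounts(suppliers, employees)
```

Neither result proves fraud; each is a lead that narrows the population of transactions to those worth investigating further.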

An effective automated detection routine is not restricted to simply obtaining a download of data and then conducting computerised analysis. To be effective, the time (and cost) dedicated to the "number crunch" should, in the majority of cases, form only a part of the project. Equal resource should be devoted to understanding the business process or unit, profiling the control weaknesses and likely frauds, developing appropriate search criteria, selecting the appropriate data mining tool, assessing internally and externally held data sources, running the automated testing and properly investigating the findings.

Ultimately, there are two approaches to using data mining techniques to prevent, detect and investigate fraud: proactive and reactive.

The objective of proactive data mining is to identify who or what can cause a particular set of transactions or events that typify fraud, before they happen. Depending on the review, this may relate to a customer, supplier, employee, trader, financial counter-party, till operator, etc. In order to develop an appropriate detection system, one should consider the different types of fraud that could exist and formulate an appropriate "Fraud Theory", i.e. how a fraud might occur. This may include an assessment of the possible opponents of the organisation, their methods of operation, the assets at risk, the presence or absence of controls and the indicators that a fraud has taken place.

As in most empirical sciences, the objective is to prove or disprove this theory by conducting a number of experiments. In this context, the experiments (data mining tests) are designed to identify patterns of loss, manipulation or deliberate falsification of data held within the closed environment of corporate information systems.
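As an illustration of what one such data mining test might look like, the sketch below tests a simple, assumed fraud theory: that the same supplier has been paid the same amount more than once under slightly different invoice numbers. The theory, the field names and the sample payments are hypothetical and chosen only to show the shape of a test; they are not drawn from any particular system.

```python
from collections import defaultdict

# Hypothetical payment history extracted from a cash payment system.
payments = [
    {"payment_id": "P1", "supplier_id": "S001", "invoice_no": "1001", "amount": 500.00},
    {"payment_id": "P2", "supplier_id": "S001", "invoice_no": "1001A", "amount": 500.00},
    {"payment_id": "P3", "supplier_id": "S002", "invoice_no": "2001", "amount": 75.50},
]


def possible_duplicate_payments(payments):
    """Group payments by (supplier, amount) and keep groups with more than one payment."""
    groups = defaultdict(list)
    for pay in payments:
        key = (pay["supplier_id"], round(pay["amount"], 2))
        groups[key].append(pay["payment_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


flagged = possible_duplicate_payments(payments)
```

A flagged group supports the theory but does not prove it: legitimate recurring payments of a fixed amount would also be caught, which is why the findings still need proper investigation.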

In most organisations, one area where fraud is almost guaranteed to be discovered is the purchasing and procurement function. The controls over suppliers and the awarding of contracts are frequently abused, and purchasing agents are notoriously susceptible to a case of scotch at Christmas or to having their house repainted.

The collation, analysis and accurate retrieval of intelligence are fundamental requirements in the successful detection and investigation of fraud. Reactive data mining is a responsive, systematic approach to fulfilling these requirements: it gives an investigations team the means to identify the mechanics behind a suspected fraud already in progress, and the perpetrators of that fraud.

In other words, fraud examiners can use data mining to focus on fraudulent activities that have already been carried out against a victim, and to deal with those activities in a way that limits further damage to the victim whilst assisting the victim in potentially recovering any losses through the courts.

If an organisation suspects it is the victim of fraud, how would it go about not only detecting the modus operandi ("MO") but also preventing further fraudulent activity, with the ultimate goal of bringing any perpetrator(s) to court?

Typically, once fraud is suspected or has been discovered by a victim and reported, fraud examiners may begin by seizing documentary or electronic evidence from a wide range of known sources and locations associated with the fraudster(s), as a first step in identifying any evidence of fraudulent activity. However, if thousands or even hundreds of thousands of documents, and/or several gigabytes of data, have been seized as part of an investigation, the investigation team can easily find itself with the time-consuming task of sifting through the exhibits to find the "smoking gun" evidence necessary for a successful prosecution on behalf of the victim.

As such, data mining is invaluable in effectively building a centralised intelligence database of all seized evidence (both documentary and electronic), so that the database can be used to interrogate the different sources of evidence for identifying potential patterns, or indicators, of fraudulent activities. From this, evidence packs can be produced based on reported findings from the central intelligence database, and suspected perpetrators can be interviewed, which may lead to court litigation and prosecution.
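The idea of a central intelligence database that can be interrogated across all seized exhibits can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the table layout, exhibit references and contents are hypothetical, and a real investigation would need a far richer schema, an audit trail and access controls.

```python
import sqlite3

# In-memory database for illustration; a real system would use persistent storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exhibits ("
    "exhibit_id TEXT PRIMARY KEY, source TEXT, kind TEXT, content TEXT)"
)

# Hypothetical exhibits, both documentary and electronic.
conn.executemany(
    "INSERT INTO exhibits VALUES (?, ?, ?, ?)",
    [
        ("EX-001", "office laptop", "electronic", "Payment to Bolt Supplies, account 33334444"),
        ("EX-002", "filing cabinet", "documentary", "Handwritten note: account 33334444"),
        ("EX-003", "email archive", "electronic", "Quarterly sales report"),
    ],
)


def exhibits_mentioning(conn, term):
    """Return (exhibit_id, source) for every exhibit whose content contains term.

    Note: '%' and '_' inside term act as SQL LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT exhibit_id, source FROM exhibits "
        "WHERE content LIKE ? ORDER BY exhibit_id",
        (f"%{term}%",),
    )
    return rows.fetchall()


hits = exhibits_mentioning(conn, "33334444")
```

Here a single query links an electronic exhibit and a documentary exhibit through a shared account number, which is the kind of pattern that could feed into an evidence pack.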

However, there are several key principles that need to be considered when using data mining techniques and tools to assist in reactive investigations. It is vitally important that:

Data mining techniques are equally effective when applied in a reactive situation, for example during an ongoing criminal or civil investigation. In the initial stages of an investigation, it is vital to secure all the available sources of evidence. Evidence may include, but is not limited to:


One of the advantages of using data mining techniques in a reactive investigation is that most electronic information may be gathered covertly. As with all investigations, it is vital to preserve a clear chain of evidence for subsequent production in court. Since electronic data may easily be copied and altered, procedures must exist that can clearly demonstrate that the original data has not been altered in any way, and that all analysis took place on copies of the original data files. Whatever software tools are used, they must contain accurate and reliable audit trails that can be independently verified. Data mining software can import data from different sources and therefore create a composite database of financial transactions, suspects and witnesses, or physical events. Once the covert stage of the investigation has been completed, additional evidence, including vast quantities of documents, is added to the investigation.
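One common way to demonstrate that a working copy is identical to the original data file is to record a cryptographic hash of the original at the time of seizure and recompute it on the copy before analysis. The sketch below uses SHA-256 from Python's standard hashlib module. The function names are hypothetical, and hashing is offered here as one illustrative procedure, not as a complete chain-of-evidence process.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original_digest, copy_path):
    """Return True if the working copy hashes to the digest recorded for the original."""
    return sha256_of_file(copy_path) == original_digest
```

In use, the digest returned by `sha256_of_file` for the original would be recorded in the evidence log, and `copy_matches_original` would be run against each working copy before any analysis begins. A mismatch indicates that the copy differs from the original and should not be relied upon.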


Copyright © Investigative Data Mining Limited 2006