In Brief

Data analytic software has become a ubiquitous tool for both auditors and forensic accountants. Most practices, however, still confine themselves to traditional structured data analysis tools with limited visual capabilities, which leaves out swathes of relevant information and limits the options for presenting it. The authors detail the benefits of using visual and text analytics to explore and visualize structured data in more creative ways, as well as how unstructured data can reveal previously hidden red flags for material misstatements and fraud.

* * *

Data analytic software designed to assist audit and forensic professionals in detecting material misstatements and fraud has dramatically improved its capabilities in recent years. Growing numbers of practices have invested substantial resources in training professionals on the use of this software, such as ActiveData for Excel, IDEA, and ACL. The primary benefit of using these tools is that 100% of the “structured” data, such as revenue cycle customer sales orders, bills of lading, sales invoices, and subsequent cash receipt records, can be more efficiently tested for the presence of rules-based red flags, such as duplicate sales invoices and accounts receivable aging anomalies. Manual audit processes are rarely performed to conduct these tests because of the high expense, and testing small samples of the relevant data is less likely to detect rules-based red flags that may be indicative of material fraud schemes. The Sidebar provides a summary of the pros and cons of competing audit analytic software packages.
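To make the idea of a rules-based red flag concrete, the following is a minimal Python sketch of two such tests run over every record rather than a sample: a duplicate sales invoice check and a simple receivables aging check. It does not reflect how ActiveData, IDEA, or ACL implement these tests. The invoice records, the field layout, the 90-day threshold, and the assumption that every listed invoice is still unpaid are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice records: (invoice_no, customer, amount, invoice_date).
# All invoices are assumed to be unpaid for the purposes of the aging test.
invoices = [
    ("INV-1001", "Acme Co", 1250.00, date(2015, 1, 5)),
    ("INV-1002", "Birch LLC", 840.50, date(2015, 3, 20)),
    ("INV-1003", "Acme Co", 1250.00, date(2015, 1, 5)),  # same customer, amount, date
    ("INV-1004", "Cedar Inc", 310.00, date(2015, 5, 28)),
]


def find_duplicates(records):
    """Group invoice numbers that share the same customer, amount, and date."""
    groups = defaultdict(list)
    for invoice_no, customer, amount, invoice_date in records:
        groups[(customer, amount, invoice_date)].append(invoice_no)
    return [numbers for numbers in groups.values() if len(numbers) > 1]


def aged_over(records, as_of, days):
    """Return invoice numbers more than `days` old as of the given date."""
    return [r[0] for r in records if (as_of - r[3]).days > days]


duplicates = find_duplicates(invoices)
overdue = aged_over(invoices, as_of=date(2015, 6, 30), days=90)
print(duplicates)
print(overdue)
```

Both rules run against the full population, which is the advantage over sampling that the paragraph above describes. A flagged record is only a lead for follow-up, not evidence of fraud in itself.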

On the other hand, a picture is worth a thousand words. This is especially true for the estimated 65% of the population who are visual learners (Steven Pesklo, “Using Data Visualization to Uncover Fraud,” speech given at 26th Annual ACFE Global Fraud Conference, June 15, 2015). Visual analytics is an exploratory and iterative process involving the creative and dynamic discovery of potential fraud schemes. It builds on humans’ natural ability to absorb and comprehend greater amounts of information through the use of distinctive patterns, shapes, and shadings than through analysis of numerous columns of numeric data. Instead of creating static, simplistic bar charts and scatter plots and relying on a finite number of embedded rules-based red flags for potential fraud, visual analytics can create customized, multidimensional, or layered graphics, resulting in more granular analyses of all of the structured data. As a result, visual analytic software may more easily uncover otherwise hidden relationships between data elements, enabling the discovery of new fraud red flags. Thus, visual analytic software may be an attractive alternative to IDEA and ACL for auditors of larger, publicly held entities to reduce the risk of undetected material misstatements, including fraud.

Senior management of publicly held entities and their auditors have an increased incentive to employ this software since the SEC announced its increased use of cutting-edge visual analytic software to improve the speed with which it can identify financial statement fraud and audit failures (Mary Jo White, Testimony before the U.S. Senate Appropriations Subcommittee, May 14, 2013). Unfortunately, many audit and forensic practices have not embraced the opportunity to adopt this software due to perceived cost-effectiveness issues. Visual analytics’ intuitive functionality and ability to efficiently conceptualize complex data may nonetheless result in lower training costs than IDEA and ACL, as well as comparable licensing fees.

In addition, unstructured text data—which is not created and stored in a predefined, standardized format and thus to a large extent cannot be analyzed by IDEA, ACL and visual analytic software—has exploded in growth and can provide additional clues as to the existence of material fraud schemes. For example, many individuals appear willing to reveal sensitive and incriminating narrative and pictorial data in supposedly private communications and social network postings that they would not consider disclosing in a financial report or a business meeting. Other types of unstructured data include corporate memos and e-mails, PDF files, social media postings, and audio and video files.

Forensic practices should seriously consider implementing text analytic capabilities. Audit practices also may benefit from an awareness of text analytic capabilities and consider applying them to high-risk engagements. Text analytic licensing costs, however, may be substantially higher than those for IDEA, ACL, and visual analytic software.

IDEA and ACL Software

It is worth noting that IDEA and ACL have recently developed basic unstructured text analytic capabilities, such as the ability to import and analyze PDF and plain text files for rules-based keywords that have been found in the past to be highly correlated with fraud schemes. IDEA’s “looping search” add-on and ACL’s core software package offer this capability. Searching massive unstructured databases for such keywords may be a fruitful initial application of text analytics.
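A rules-based keyword search of this kind can be sketched in a few lines of Python. This is only an illustration of the general technique, not of IDEA's or ACL's actual utilities. The keyword list and the sample text are hypothetical, and a real engagement would use a vetted list.

```python
import re

# Hypothetical keyword list, supplied by the user in advance (rules-based).
KEYWORDS = ["cover up", "write off", "off the books", "special fees"]


def keyword_hits(text, keywords):
    """Count case-insensitive, whole-phrase occurrences of each keyword."""
    hits = {}
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits[keyword] = count
    return hits


# Hypothetical plain text extracted from a memo or e-mail.
sample = (
    "Please keep this off the books. "
    "We can write off the rest and cover up the gap."
)
result = keyword_hits(sample, KEYWORDS)
print(result)
```

Only phrases on the list can ever be found, which is the main limitation of a rules-based search.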


Sidebar: Pros and Cons of Competing Audit Analytic Software Packages

ActiveData for Excel

Pros:
  • Substantially lower licensing cost

Cons:
  • Imports only Excel files; cannot import ERP relational database files
  • Imports only a limited amount of data
  • No “read-only” capability, only audit logs

ActiveData for Office

Pros:
  • Lower licensing cost

Cons:
  • Cannot import ERP relational database files
  • Imports more data than ActiveData for Excel and MS Access, but not as much as IDEA/ACL
  • No “read-only” capability, only audit logs

Microsoft Access

Pros:
  • Substantially lower licensing cost

Cons:
  • Cannot import ERP relational database files
  • Imports only a limited amount of data
  • Fewer core analytic functions
  • Not all macros embedded

Arbutus Audit Analytics

  • Spinoff of ACL with fairly comparable costs, data import capabilities, and functionality

IDEA and ACL also have the ability to employ a more advanced approach to detecting incriminating keywords within PDF and plain text files, called concept extraction, through their respective “word list maker” and “scripthub” utilities. Unlike in rules-based searches, users do not supply keywords that the software must search for. Instead, concept extraction asks the software to rank the most frequently occurring words; auditors and forensic accountants then apply professional judgment to the list in order to detect previously unknown incriminating words (Vincent Walden, “Counter-fraud Analytics Using Statistical and Predictive Modeling Techniques,” 26th Annual ACFE Global Fraud Conference, June 14, 2015).
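The ranking step described above can be approximated with a simple word frequency count. The Python sketch below is a simplified stand-in for the technique, not the logic of the “word list maker” or “scripthub” utilities. The stop-word list and the sample text are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real tools use larger ones).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "for", "we", "this"}


def rank_words(text, top_n=5):
    """Rank the most frequent words in text, ignoring common stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical text extracted from a PDF or plain text file.
sample = (
    "Move the rebate to the reserve account. The reserve is for the rebate. "
    "Adjust the reserve before the audit."
)
ranking = rank_words(sample, top_n=2)
print(ranking)
```

No keywords are supplied in advance here. The reviewer reads the ranked list and decides which terms deserve follow-up, which is where professional judgment enters.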

Visual Analytic Software

Recent advances have made visual analytic software from leading providers such as Tableau and Qlik more accessible to auditors and forensic accountants, not only embedding complex macros, but also creating an easy-to-use format. Tableau and Qlik’s visual displays allow for multiple “measures” and “dimensions” to be easily clicked and dragged to a column or row. Measures involve continuous metrics that are normally the focus of the analysis, such as initial sales and purchasing data and related per-unit prices and returns. Dimensions define the granularity of the analysis, such as time period (e.g., yearly, quarterly, monthly, or weekly data), type of product or service, region of the country, subsidiary within the parent organization, or employees previously flagged as potential suspects.
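The relationship between measures and dimensions can be illustrated outside of any visual tool as a grouped aggregation. The Python sketch below is only a conceptual analogy and does not represent how Tableau or Qlik work internally. The sales rows and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: two dimensions (region, quarter) and one measure (sales).
rows = [
    {"region": "East", "quarter": "Q1", "sales": 120.0},
    {"region": "East", "quarter": "Q2", "sales": 150.0},
    {"region": "West", "quarter": "Q1", "sales": 90.0},
    {"region": "West", "quarter": "Q1", "sales": 30.0},
]


def summarize(rows, dimensions, measure):
    """Total a measure for each distinct combination of dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Adding a dimension makes the analysis more granular.
by_region = summarize(rows, ["region"], "sales")
by_region_quarter = summarize(rows, ["region", "quarter"], "sales")
print(by_region)
print(by_region_quarter)
```

Dragging an extra dimension onto a row or column in a visual tool is roughly equivalent to adding a field to the `dimensions` list here: the same measure is recalculated at a finer level of detail.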

Geospatial analysis.

Leading visual analytic software also has the ability to convert street, city, county, state, and country locations into latitude and longitude coordinates that can enhance an auditor or forensic accountant’s ability to pinpoint the location of likely fraud schemes. Tableau and Qlik automatically assign geographic roles and coordinates to fields with common geographical names; they can also be manually assigned to fields that are not automatically recognized. Financial transactions, asset information, customer data, and contracts are among the records that may contain such references to locations.

Structured data also may be converted into circles, lines, and colors to make them easier to distinguish. For example, differently sized circles may denote cities with different population sizes, while lines may suggest different streets connecting two or more cities. Differently colored circles may represent disparate regions of the country or subsidiaries within a parent company.

Heat maps.

A heat map is a graphical representation of data wherein distinct data elements, such as the perceived level of fraud risk within client business processes based on the likelihood of occurrence and dollar impact, are represented by different colors. A low fraud likelihood and dollar impact might be colored green, a moderate fraud likelihood and dollar impact might be yellow, and a high fraud likelihood and dollar impact might be red.

An additional layer of data may be added to a heat map to reflect the extent of internal control resources devoted to mitigating fraud risks. Therefore, if processes with high fraud risk are under-controlled and processes with low fraud risk are over-controlled, auditors may offer a value-added recommendation to reallocate scarce control resources by eliminating unnecessary controls within low-risk processes and redirecting these resources to high-risk processes. These important differences may be less distinguishable if the data is presented in numeric columns and rows.
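The logic behind such a two-layer heat map can be sketched in Python before any graphic is drawn. The processes, the 1-to-3 scoring scale, the color thresholds, and the definitions of under- and over-controlled used below are all assumptions made for illustration.

```python
# Hypothetical processes scored from 1 (low) to 3 (high) on each attribute.
processes = {
    "Revenue recognition": {"likelihood": 3, "impact": 3, "control_effort": 1},
    "Payroll": {"likelihood": 2, "impact": 2, "control_effort": 2},
    "Petty cash": {"likelihood": 1, "impact": 1, "control_effort": 3},
}


def risk_color(likelihood, impact):
    """Map likelihood x impact (1 to 9) to a color band; thresholds are assumed."""
    score = likelihood * impact
    if score >= 6:
        return "red"
    if score >= 3:
        return "yellow"
    return "green"


def control_gap(color, control_effort):
    """Compare the risk band with the control effort devoted to the process."""
    if color == "red" and control_effort == 1:
        return "under-controlled"
    if color == "green" and control_effort == 3:
        return "over-controlled"
    return "balanced"


heat_map = {}
for name, scores in processes.items():
    color = risk_color(scores["likelihood"], scores["impact"])
    heat_map[name] = (color, control_gap(color, scores["control_effort"]))

for name, cell in heat_map.items():
    print(name, cell)
```

In this hypothetical data, control effort would be a candidate for reallocation from petty cash to revenue recognition, which is the kind of recommendation described above.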

Multiple-source analysis.

Leading visual analytic software also can import and analyze structured data from multiple, complex enterprise resource planning (ERP) relational databases and legacy mainframes, as well as the data cloud.

Text Analytic Software

Leading software from SAS, SAP, and IBM uses natural language processing (NLP) rules and advanced statistics to reveal hidden meanings in virtually any type of unstructured data—including PDF and plain text files, memo explanations of general journal entries, e-mails, social network postings, and publicly available websites—with the objective of identifying corrupt intent that can enhance high-risk audit and forensic engagements (Walden 2015). Through advanced concept extraction and link analysis, text analytics complement visual analytics.

Text analytic software also can analyze structured data in ways that are not possible with IDEA, ACL, or visual analytic software, such as cluster analysis, market segmentation, and nearest neighbor capabilities. Core text analytic results can then be input into visual analytic software and integrated with preexisting visuals to provide a deeper view into potential fraud schemes.

Concept extraction.

Advanced concept extraction involves not only identifying potentially incriminating keywords within unstructured plain text and PDF files, but also potentially incriminating key phrases within unstructured e-mails and social network postings, which may provide more robust insights into the nature of fraud schemes.

Link analysis.

Link analysis involves determining who is talking to whom, about what, and when. It is used to evaluate relationships between individuals and organizations. For example, link analysis can be used to pinpoint the most common recipients of and responders to a primary suspect’s unstructured e-mail and social network communications, thus identifying likely coconspirators. These communications may also include keywords associated with similar past fraud schemes. Although beyond the scope of IDEA’s capabilities, Ernst and Young’s advisory practice, in collaboration with the FBI, recently revealed some of the most common keywords used in e-mail conversations by employees engaging in fraud (Warwick Ashford, “Ernst & Young E-mail Keyword Analysis Identifies Fraudsters,” Computer Weekly, Jan. 7, 2013). These links can be more easily identified through visualizing the results. Fortunately, leading text analytic software also has some embedded visual capabilities.
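The "who is talking to whom" part of link analysis can be illustrated by counting messages exchanged with a person of interest. The Python sketch below is a simplified illustration that does not represent any vendor's implementation. The message log and the names in it are hypothetical, and real link analysis would also consider message content and timing.

```python
from collections import Counter

# Hypothetical e-mail metadata: (sender, recipient).
messages = [
    ("suspect", "alice"),
    ("suspect", "bob"),
    ("alice", "suspect"),
    ("suspect", "alice"),
    ("carol", "bob"),
    ("alice", "suspect"),
]


def top_contacts(messages, person, top_n=3):
    """Rank contacts by messages exchanged with `person` in either direction."""
    counts = Counter()
    for sender, recipient in messages:
        if sender == person:
            counts[recipient] += 1
        elif recipient == person:
            counts[sender] += 1
    return counts.most_common(top_n)


contacts = top_contacts(messages, "suspect")
print(contacts)
```

The resulting counts are the kind of output that can be drawn as a network diagram, with heavier lines for more frequent contact.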

Cluster analysis.

Cluster analysis uses various statistical algorithms to identify groups of similar records and label them according to the group to which they belong. Instead of distinguishing between dependent and independent variables, cluster analysis examines interdependent relationships across all records (Walden 2015). Two practical applications of cluster analysis are market segmentation and nearest neighbor analyses.
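As one example of the statistical algorithms involved, the Python sketch below implements a basic k-means procedure. The choice of k-means is an assumption, since the text does not name a specific algorithm, and commercial packages offer several. The records are hypothetical, the first k points are used as starting centroids for simplicity, and the features are left unscaled, which a real analysis would normally correct.

```python
import math


def kmeans(points, k, iterations=10):
    """Basic k-means; the first k points seed the centroids (a simplification)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its closest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical records: (income in $000s, years of education).
records = [(30, 12), (120, 18), (35, 11), (110, 17), (32, 12), (125, 19)]
labels, centroids = kmeans(records, k=2)
print(labels)
print(centroids)
```

No variable is treated as dependent here. Every record is grouped purely by its similarity to other records, which matches the interdependent view described above.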

Market segmentation analysis can apply clustering techniques to structured socio-demographic data such as income, education, and type of housing, to identify distinct clusters of potential customers who are more likely to purchase certain products and services. From a marketing perspective, more disadvantaged clusters, such as those with lower income and education levels and more multifamily housing, may not receive as many expensive product and service advertisements as more advantaged clusters within a given city, county, or state. Nearest neighbor analysis uses an advanced computer algorithm to measure the distance between dissimilar groups or clusters. The core results of market segmentation and nearest neighbor analyses can then be input into visual analytic software and integrated with pre-existing visuals to provide a deeper view into potential fraud schemes.
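The distance measurement at the heart of nearest neighbor analysis can be sketched as follows. This Python example uses plain Euclidean distance over unscaled features, which is an assumption, and the labeled records and the new record are hypothetical.

```python
import math

# Hypothetical labeled records: ((income in $000s, years of education), cluster).
labeled = [
    ((30, 12), "A"),
    ((35, 11), "A"),
    ((120, 18), "B"),
    ((110, 17), "B"),
]


def nearest_neighbor(point, labeled):
    """Return (distance, label) for the closest labeled record."""
    return min((math.dist(point, coords), label) for coords, label in labeled)


# Find which existing cluster a new record most closely resembles.
distance, label = nearest_neighbor((100, 16), labeled)
print(label, round(distance, 2))
```

A record whose nearest neighbor is unusually far away does not fit any existing cluster well, which may make it worth a closer look.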

Multiple-source analysis.

Leading text analytic software can import and analyze unstructured and structured data from multiple, complex ERP relational databases and legacy mainframes, as well as the data cloud.

The practical implications for fraud detection of visual and text analytic capabilities are depicted in a hypothetical healthcare fraud case involving fraudulent financial reporting in Miami–Dade County, Fla., a hot spot for U.S. healthcare fraud. This case study, which is inspired by a Deloitte white paper (Visual Analytics: Revealing Corruption, Fraud, Waste and Abuse, 2011), will accompany this article on the CPA Journal website.

George R. Aldhizer, PhD, CIA, CFE, CITP is PricewaterhouseCoopers Associate Professor of Accountancy at the School of Business at Wake Forest University, Winston-Salem, N.C.