An Introduction to Power Query
CPAs need to acquire the skills needed to leverage current technologies to perform these repetitive data transformations faster and with little to no human effort (“Meeting the Challenge of Artificial Intelligence,” Paul Lin and Tom Hazelbaker, The CPA Journal, June 2019, https://bit.ly/2XBlhSk). In acquiring these skills, CPAs should consider using Microsoft Excel and its capabilities to connect to data and to automate many of the common data manipulations being done manually today by using the Get and Transform feature from the Data tab on the ribbon, which is also known as Power Query.
The Get and Transform feature is a tool in a variety of Microsoft products, including Excel. It was first introduced in Excel as an add-in in 2013 called Power Query; once enabled, it was available for the 2010 and 2013 versions through a Power Query tab. The release of Excel 2016 included Power Query as an integrated component rather than an add-in and is available from the Get and Transform Data ribbon group on the Data tab under the Get Data dropdown. This functionality has continued in both name and location in recent versions of Excel; however, you will see references to the name Power Query when working with this functionality. The name Power Query will be used for the remainder of this article.
Power Query can provide CPAs with several advantages, summarized in Exhibit 1. The first advantage is that it enables users to connect to a data source and then use this one tool to extract, transform, and load the data (ETL) into a worksheet for further analysis. This greatly reduces the need to open an application, export the data to a format that can be opened in Excel, import the data, and then perform a variety of data manipulations or transformations in order to make the data more suitable for the specific analysis being conducted. With this tool, the ETL process can be replicated in order to obtain data from different sources while placing the results in different worksheets in the same workbook with the goal of using data from all sources to perform an analysis.
Advantages of Power Query
There are numerous available data connections; Excel categorizes them as those from files, databases, Azure, online services, and other sources. The specific connections available are determined by the version of Excel being used. Exhibit 2 lists the connections that are available in the various versions of Excel 2019. The Advanced column contains a list of connections available in addition to the Basic connections, and this may change as new versions are released. Although this list may not contain one’s specific accounting application, many applications provide an Open Database Connectivity (ODBC) driver that will allow access to accounting data from other applications, such as Power Query. For example, QuickBooks provides an ODBC driver in its Enterprise version, which can also be purchased from a third party for other versions. With this driver, CPAs can use Power Query to establish an automated ETL process to bring transactional or other data into Excel and then use other functionality in Excel to perform an analysis or visualization.
Power Query Data Source Connections Based on Excel 2019
Another advantage of Power Query is that it can connect to data sources with rows or records of data that exceed the maximum allowable in a worksheet (i.e., 1,048,576 rows). Although they can be accessed and manipulated in Power Query, if results from Power Query are loaded back into a worksheet, the worksheet maximum is still a limitation; however, Power Query offers a few strategies to overcome this limitation. One of them is to establish only a connection to the Power Query results and not load any data directly into a worksheet. This would allow the results to be available when working with other queries (e.g., merging or appending data), or directly in an analysis using an Excel feature capable of accessing a data connection (such as a PivotTable or PivotChart). Another option is to load the results directly into a PivotTable or PivotChart where they can be filtered and further summarized as needed, which can reduce the number of rows to meet the worksheet limitation. The results can also be added to Excel’s Data Model, which enables connections between tables to be made like those in relational databases so that data from multiple tables can be accessible in an analysis.
At the heart of this Excel feature is the Power Query Editor, which is used to manipulate or transform data into a form more useful for subsequent analysis (e.g., add a column with a new calculation). These transformations are the types of tasks that are normally performed manually, but can be automated using the Editor; the available transformations can be accessed via a ribbon in the Editor. Each transformation is converted into a query language entitled with the capital letter M, simply referred to as M code. This is similar to what is done in Microsoft Access, which generates an SQL expression or code based on the user’s interaction with the user interface, except that the Editor can create multiple expressions or applied steps whose results are linked together (i.e., the results of one expression become the input for the next expression). You can view the M code in the Editor, where it can be manually modified if needed. CPAs can find an advantage in understanding how to make simple modifications to the underlying M code, which may be needed in order to accomplish the transformation needed or to provide a more robust solution that is more resistant to potential errors.
Exhibit 3 lists some of the common transformations that users can apply to their data. Splitting data in a column into one or more other columns based on a delimiter or a specific number of characters is available as a transformation. In addition, the Editor offers additional ways to split data by a change in case (e.g., lowercase to uppercase) or a change from a digit to a non-digit. Grouping data based on the contents in one or more columns will allow the calculation of a variety of common aggregations (e.g., sum, average, count) for each group. The Editor will automatically assign a data type for each column of data, but this can be modified if the data type is incorrect (e.g., change text data type to number data type). Once queries are created, an additional query can be used to merge the results from two or more queries based on common data that exists between them (i.e., similar to using a VLOOKUP to combine related data from different worksheets). In addition, data can be appended from one query to another, which is usually done on query results that have an identical structure but different data (e.g., taking monthly worksheets and combining them into one yearly worksheet).
Removing columns is rather straightforward, but the removal of rows allows for a variety of options, including the removal of a user defined number of top or bottom rows, duplicates, blank rows, alternating rows, and rows with errors. Text manipulations mimic many of Excel’s text functions that trim, change case, and extract the first set or last set of characters or characters inside text. Adding a column allows the creation of a formula to calculate the results to populate the new column. Finally, the Editor can determine several aspects of a given date (e.g., year, month, week, day) as well as use the current date to determine the number of days since a date given in the data (e.g., an age).
A further advantage allows the query results, and potentially any analysis built on those results, to be updated with a click of the refresh button or with a modification to the properties of the query so that it can automatically update itself at specific time intervals. With the connection to the source established, a refresh can extract the data from the source again and replicate all of the transformations. Any additional analyses built upon the query results can also be refreshed, but may require another click of the refresh button. This will give users a greater capability to make decisions based on more current information.
The following simplified example will provide a visual representation of how Power Query can be used. The data file used as a source for this example is a Microsoft Access database that contains one table with 37,414 publicly available records of checkbook data for a city government. The overall task will be to use Power Query to access the data in the database (i.e., a simple representation of any source of data) and do the following: filter the data to only transactions in the Transportation and Drainage department, correct the format of the check date, remove selected columns of data, and load the results into a worksheet. Follow the steps below to accomplish these tasks.
Download Database—The Access Database can be downloaded from the following URL: https://nysscpa.org/docs/default-source/cpaj-download/brcheckbook.accdb .
Open Excel and Connect to Database—Open a new workbook. In a new worksheet, click on Data → Get Data → From Database →From Microsoft Access Database, as shown in Exhibit 4.
Select Data—Select the table in the left-hand navigation pane to see a preview of the data, then select Transform Data, as shown in Exhibit 5.
Filter Rows to include only the Transportation and Drainage department. From the dropdown arrow to the right of the Department column, uncheck the Select All option, then scroll down to select the Transportation and Drainage department. See Exhibit 6.
Correct the Check Date. The Check Date column has a normally formatted date, followed by the letter T and a sequence of zeros and decimal points. The transformation that is needed is to remove the letter T and the digits to the right. Click on the heading Check Date to select the column and then click on Home → Split Column → By Number of Characters. See Exhibit 7.
Enter the number 10 in the dialog window that appears and select the option “Once, as far left as possible.” Click OK to perform the split. See Exhibit 8.
This will result in one column with the correct date and a second column with the other characters. Double-click on the Check Date.1 heading and rename it Check Date. See Exhibit 9.
Remove Selected Columns. Assume that only a select few data columns will be used for analysis; the remaining columns should be omitted. Select Home → Choose Columns →Choose Columns. In the window that appears, uncheck Select All Columns at the top of the window, then select the following columns: Department, Check Number, Check Date, Amount, Description, and Vendor. Click the OK button to perform this step. See Exhibit 10.
Return Results—the final step is to return the results to the workbook. Select Home → Close & Load →Close & Load To… In the Import Data window that appears, select Table and click OK. See Exhibit 11.
Power Query will now generate the results for all of the data (not just the sample visible in the editor) and will return it to the workbook. See Exhibit 12.
Other tools in Excel (e.g., PivotTable, Chart) can be used to perform an analysis on the data. As the database is updated with new or modified data, this new data can be reflected in the worksheet as well by clicking on Data → Refresh All → Refresh All. If a PivotTable is connected to the data when the refresh occurs, the PivotTable will refresh before the connection to the database has completed its update to the worksheet, resulting in updated data in the worksheet but not in the Pivot-Table. As a result, the PivotTable will need to be refreshed again.
Although the example above shows a simple way of using this tool, several resources can be used to develop additional skills. An Internet search reveals many available massive open online courses (MOOC) as well as courses from Udemy, LinkedIn Learning, and other online providers. Exhibit 13 lists two books on Power Query that can be used, as well as a website that has links to other books, forums, blogs, and other online resources about Power Query. If you want to know more about M code, the definitive reference comes from Microsoft.
Resources to Learn More about Power Query