Excel is unquestionably the standard data analysis application used in business today. Although analytical tools such as Tableau, Power BI, and others are gaining market share, Excel remains the ubiquitous tool for analyzing financial and other numerical data. The capabilities of Excel have been expanding rapidly, particularly with Excel 2016 and more recent versions.
In this article, the author demonstrates a method for easily keeping analyses up-to-date with the most current information. Power Query, data models, and PivotTables are used in this demonstration.
Advantages of Power Query
Power Query (called Get & Transform Data in earlier versions of Excel) can be used to connect to external data, then load a query into Excel to create charts and reports. The data can be updated periodically to ensure the analyses are complete and up-to-date.
A data model provides a way to organize tables and formulas that can be used in PivotTables and PivotCharts. The data model capability comes with Excel 2016 and higher (for Windows); it was available as the Power Pivot add-in for earlier versions of Excel.
Using Power Query, building PivotTables from a data model rather than a single Excel table offers numerous advantages, including the following:
- ▪ Direct connections to data sources (instead having to copy/paste data into a worksheet) that are not limited to 1,048,576 rows of data
- ▪ Connections to a variety of data sources in a single model
- ▪ Get & Transform queries to clean the data before it is used in Excel
- ▪ Creation of PivotTables and PivotCharts that use fields from multiple tables
- ▪ More sophisticated formulas can be written in the Data Analysis Expressions (DAX) language than those available in traditional PivotTables.
- ▪ Once built, reports can be easily refreshed in subsequent periods without repeating all the model-building steps.
In this article, the author demonstrates how a Power Query connection to a folder, rather than an individual file, can automate the process of keeping analyses accurate and up-to-date. (This article uses Excel 365, but the process is largely the same for Excel 2016 and newer.) The sample dataset consists of two comma-separated values (csv) files (11_2021_data.csv (link) and 12_2021_data. csv (link)) that can be downloaded to start this example.
To get started, download the two data files, 11_2021_data. csv and 12_2021_data.csv, and move them directly to your PC’s desktop environment. These files represent the shipping information of a hypothetical company for November and December 2021, respectively. Create a new folder on the desktop, name it datafiles, and drag the 11_2021_data. csv file into the folder. December’s data will be used later in this example.
It is important to note three critical items for this process to work: First, the data files, in this case .csv files, must all be of the same structure. In other words, each .csv file must contain header information in row 1 and the columns must be in the same order in each file. Second, the name or location of the folder containing the .csv files cannot be changed. Third, the periodic updates must not contain overlapping data; in other words, if data files are received on a weekly basis, those files cannot be accumulations of multiple weeks.
Get the First Dataset with Power Query
The first step is to open a blank Excel workbook and save it as shipping_analysis.xlsx on the desktop. Two tasks need to be completed for this exercise. First, use Power Query to connect to data in the datafiles folder; second, insert a simple Pivot-Table to serve as a demonstration of keeping the analysis up-to-date.
Power Query in Excel can create connections to a wide variety of sources of data. Creating a connection to a folder, rather than to an individual file, allows Power Query to evaluate the contents of the folder and automatically include new data into the model.
With shipping_analysis.xlsx open, navigate to the Data ribbon. In the Get & Transform Data group, select Get Data > From File > From Folder as illustrated in Exhibit 1.
Navigate to the datafiles folder and select Open. See Exhibit 2 and note that no files are displayed because the application is not targeting a specific file, but rather the entire data-files folder.
In the Browse dialog box, select Open to see the 11_2021_data.csv file, as shown in Exhibit 3. Select Combine > Combine & Load To. If some standard transformations need to be completed, this is where Combine > Combine & Transform Data would be selected instead. All transformations made here will be replicated across any new files added to the folder. No transformations are needed for this example at this time.
Power Query will now evaluate the query and head to the next step to visually verify that the data is being processed correctly, as shown in Exhibit 4.
If satisfied with the result, select OK to move onto the final step of connecting data to the workbook. In the Import Data dialog box (Exhibit 5), the author prefers to select Only Create Connection and add this data to the Data Model; select those options and click OK. Creating a connection only means the data is available for analysis but is not duplicated on a worksheet within the workbook. This will result in a smaller workbook file and faster processing of new data; this also allows users to go beyond the normal 1,048,576-row limitation in a worksheet. Adding the data to the data model, while not strictly necessary for this demonstration, is required if you wish to use more than one table of data in Power Pivot.
The resulting workbook will look something like the one shown in Exhibit 6, with several Queries & Connections in the right-hand panel. The important query here is in the Other Queries folder showing the datafiles folder and 459 rows loaded. Save the file before moving on to the next steps.
Create a Simple PivotTable
The second task in this exercise is to create a simple PivotTable summarizing the count of items by ship mode. Quantity and Ship Mode are fields in the .csv data files. Locate the cursor on the blank worksheet—the author generally starts in cell B2. Go to the Insert ribbon; in the Tables group, select PivotTable > From Data Model. In the resulting Create PivotTable dialog box, leave the default configuration as is and select OK. With the PivotTable Fields open and datafiles expanded, drag and drop Quantity to the Values area and Ship Mode to the Rows area. Exhibit 7 shows the resulting PivotTable, with Sum of Quantity by Ship Mode and a Grand Total of 1,840 items shipped. This is a good point to save the file again.
The Magic of Power Query
At this point in the example, only the November data is in the analysis. The benefit of having created a query to the folder, instead of to a specific file, now becomes apparent. Temporarily move the shipping_analysis.xlsx file to the side and drag and drop the 12_2021_data.csv file into the datafiles folder.
To update the Pivot Table, place the cursor in any cell in the PivotTable, select the PivotTable Analyze ribbon, then select Data Group > Refresh. Excel and Power Query will take a few moments to run the query and add the new data to the analysis. Exhibit 8 shows how the PivotTable is now updated to show a Grand Total of 3,563 items shipped. Each time the shipping_analysis.xlsx file is opened, all that is needed is to select the Refresh button to ensure the PivotTable has all the data in the datafiles folder.
The process of updating the PivotTable can be automated as well. With the cursor located anywhere in the Pivot-Table, select PivotTable Analyze > PivotTable group > Options > Data > Refresh data when opening the file > OK. Exhibit 9 shows the PivotTable options dialog box. By now, the manual refresh process is unnecessary, as both the query and the PivotTable will be updated when the shipping_analysis.xlsx file is opened.
Microsoft Excel is the ubiquitous tool for analyzing business data today. Its capabilities have been expanding rapidly, particularly with Excel 2016 and subsequent versions. Although many CPAs are familiar with PivotTables, Power Query and data models represent a significant step up in Excel’s capabilities. The example above demonstrates how to create a query to a folder, instead of to an individual file, and add the data to a data model. Doing so eliminates the need for copying and pasting data or completing other manipulations to include periodic updates to a dataset. Although creating a Power Query connection to a folder takes a bit of time up front, this one-time process will save many hours in the future, as well as drastically reduce the potential for error.