Combining Data from Multiple Sources
- 12/27/2024
- Appending a few tables
- Appending tables on a larger scale
- Summary
With Power Query, you have the power to combine multiple sources to gain lasting results. In this chapter, you will learn how to combine multiple tables in various levels of complexity to gain insights from the combined datasets.
Chapter 2, “Basic Data Preparation Challenges,” discusses common scenarios of badly formatted and unprepared data. Often the data you need to deal with will be well formatted but dispersed over multiple data sources. In such scenarios, the main challenge is to combine the data together to bring the separate silos into a single source of wisdom. Appending multiple tables is the focus of this chapter.
This chapter starts with basic append scenarios for a small number of tables with a common format and moves on to tackle real-life challenges such as ever-changing data sources.
In this chapter, you will
Append tables from a few specified data sources
Append multiple Excel workbooks from a folder
Append multiple worksheets from an Excel workbook
Appending a few tables
One of the most common features in Power Query is Append queries, which is available on the Home tab of the Power Query editor. It enables you to append together a predefined set of tables. For simplicity, this chapter starts with appending two tables that share the same format (that is, they both have the same column names).
Appending two tables
Before we jump to how to append two tables together, let’s review the rationale for appending the two tables together to perform the analysis rather than keeping them separated. Why shouldn’t you just copy and paste the two tables into a new Excel workbook, each one pasted into a separate worksheet, and use a third worksheet for the required computations? Although this approach might be simple for tables with few rows and columns, performing such calculations on larger and ever-changing tables is a completely different ballgame.
Here is how you would approach this challenge in Excel without Power Query: You would need to copy and paste the tables and append them manually into a single table. Whenever the data changed, you would need to copy and paste the table again. Alternatively, you would need to create complex calculations to summarize the ranges in the different worksheets. Whenever the dataset size changed, you would need to update the ranges in your formulas. Fortunately, you can use Power Query to automate this rigorous process. Next time you have new data in your tables, a mere refresh will do the work for you.
Exercise 3-1: Bikes and Accessories example
This exercise involves two tables from the fictional AdventureWorks Cycles company, introduced in Chapter 2. To help place the problem in a business context, imagine that AdventureWorks, whose main product line is bikes, acquires a new company that enriches the product portfolio with accessories products. As the chief analyst for AdventureWorks, you are asked to create a new report based on the unified product lines (see Figure 3-1).
To append the Bikes and Accessories tables, you can start from a new Excel workbook or a Power BI Desktop report and import the two tables as separate queries. You can follow the steps of this exercise after you download the files C03E01 - Bikes.xlsx and C03E01 - Accessories.xlsx from www.powerquerybook.com.
Open a blank new Excel workbook or a Power BI Desktop.
In Power BI Desktop, select Excel workbook on the Home tab.
In Excel, on the Data tab, select Get data > From file > From Excel workbook.
Select the file C03E01 - Bikes.xlsx.
In the Navigator, select Bikes and select Transform data or Create.
In the Power Query editor, select Close & load to. If you use Power BI Desktop, select Close & apply and skip step 5.
In the Import Data dialog box, select Table and click OK. You now have all the Bikes data imported to your new workbook or Power BI Desktop. Whenever the data in C03E01 - Bikes.xlsx changes, and new bikes are added to the product line, you can obtain the latest data by clicking Refresh All on the Data tab of Excel or by pressing Ctrl+Alt+F5. If you use Power BI Desktop, you can simply select Refresh on the Home tab.
It’s time to import Accessories into the workbook or Power BI report.
In the Power Query editor, right-click inside the Queries pane and select New query > Excel workbook.
Select the file C03E01 - Accessories.xlsx.
In the Navigator, select Accessories and select Transform data or Create.
To append Bikes to Accessories in the Power Query editor, ensure that the Accessories query is selected and select Append queries on the Home tab. Note that the Append queries dropdown menu also includes the option Append queries as new, which will be described in Exercise 3-2.
When the Append dialog box opens, select Bikes as the table to append, and select OK.
Ensure that the Queries pane is expanded (the pane on the left side of the Power Query editor).
Notice that you have two queries in the Queries pane: Bikes and Accessories. As you have just appended the query Bikes into Accessories, the query name doesn’t represent the appended data. You should therefore rename the Accessories query and ensure that this query, and not Bikes, will be loaded to the Power BI report or Excel workbook. To rename it, right-click the query Accessories, select Rename, and rename it Products. You can also rename the query in the Properties section of the Query Settings pane.
Follow these steps to unload Bikes:
In Power BI Desktop: Right-click Bikes in the Queries pane, deselect Enable load, and select Close & apply.
In Excel:
Select Close & load to.
In the Import Data dialog box, select Table and select OK.
To unload Bikes from the workbook, in the Queries & Connections pane, right-click the query Bikes and select Load to.
In the Import Data dialog box, select Only create connection and select OK.
When the Possible Data Loss dialog box opens, select OK to unload the query and delete the table.
Finally, delete the Bikes sheet.
You now have an appended Products table that contains the combined data from Bikes and Accessories. But if you take a closer look at the Queries pane, you will notice the asymmetrical nature of the append step. You now have the Bikes query and the combined Products query. But where is Accessories? Recall that right after importing Accessories, you appended Bikes to it and renamed it Products. In the future, if you want to apply some changes to the Accessories table, it will be difficult to find the right step to modify, as you now have a query with the combined transformation steps of Accessories and the appended Products table. A cleaner approach would be to have Accessories as a separate query—and this is why the Append queries transformation has a sibling transformation called Append queries as new.
You can review the solution files C03E01 - Solution.pbix and C03E01 - Solution.xlsx, which are available at www.powerquerybook.com.
Exercise 3-2, part 1: Using Append queries as new
The Home tab of the Power Query editor enables you to append multiple queries in the Combine section in two ways: by using Append queries or Append queries as new (see Figure 3-2).
In the preceding exercise, you appended two tables by using the Append queries transformation, which appends the second query to the current one. Append queries as new creates a third query that combines the two queries, and thus the two queries remain unaffected.
Instead of repurposing Accessories to include all the products, in this exercise you will keep Accessories as a separate query and create a new one for the products.
Follow steps 1–8 in Exercise 3-1 to load the queries Bikes and Accessories into the Power Query editor.
In the Power Query editor, ensure that the Accessories query is selected, and select Append queries > Append queries as new on the Home tab.
When the Append dialog box opens, select Bikes as the table to append to the primary table and select OK. You now have a new query with the combined Bikes and Accessories data.
Rename the new query Products, and follow Exercise 3-1 step 12 to unload Bikes. To unload Accessories, you can again follow step 12, but this time apply the instructions on Accessories instead of Bikes.
Save your workbook or Power BI report. Next, you will learn another method to append Bikes and Accessories into a new query—by using the Reference option. You will also compare the underlying M expression of the two methods.
Exercise 3-2, part 2: Query dependencies and references
In Exercise 3-2 Part 1, you applied Append queries as new, which resulted in a new query that combined Bikes and Accessories. The new query referenced the other two queries and used their output as its input. When a query is referencing another query, a dependency between the queries is established. When a query has dependencies on other queries, you cannot delete it. (If you try to delete Bikes, for example, you will get a prompt that prevents you from deleting it because Products is dependent on it.)
To view dependencies between queries, open the Power BI report or Excel workbook that you created in Exercise 3-2, part 1 and launch the Power Query editor. On the View tab, select Query Dependencies. The Query Dependencies dialog box opens, as shown in Figure 3-3.
The Query Dependencies dialog box shows a flow diagram, which starts from the bikes and accessories files as the data sources, flows through the intermediate queries Bikes and Accessories, and finally ends at Products, where you appended the two tables. This view will come in handy when you work on complex data preparation scenarios.
Now that you know how to view the dependencies between queries, you’re ready to learn about references, which enable you to create rich and reusable data transformation flows and tackle real-life data challenges. A reference can be used to create a new query whose source, or input data, is the output of another query.
Earlier, in Exercise 3-2, part 1, you loaded Bikes and Accessories as queries and applied Append queries as new to create a new query with the appended results. Alternatively, you can achieve the same outcome by using a combination of Append queries and Reference.
Follow steps 1–8 in Exercise 3-1 to load the queries Bikes and Accessories into the Power Query editor.
Open the Queries pane and right-click Accessories. Select Reference, and notice that you now have a new query, Accessories (2), which starts with the output of Accessories.
Rename Accessories (2) to Products.
With the Products query selected, select Append queries on the Home tab to append Bikes to Products: When the Append dialog box opens, select Bikes as the table to append and select OK.
Now you can examine how Power Query handled the Reference and Append queries steps in the underlying M formulas and compare the output of the Reference and Append queries combination with the output of Exercise 3-2, part 1. Let’s examine the underlying M code.
See Also Recall that Chapter 1, “Introduction to Power Query,” introduces M code, which is generated by Power Query to execute the transformation steps. Each transformation step that you create in the Power Query editor is translated into a line of code, which you can see in the formula bar.
On the View tab, select Formula Bar and then select Source in the Applied steps list in the Query Settings pane. Here is the M formula you see:
Accessories
This code represents the output of the Accessories query, which is used as the input for the new query. As you might guess, this code was generated by applying Reference on Accessories.
Click Advanced editor on the Home tab to review the M expression:
let Source = Accessories, #"Appended Query" = Table.Combine({Source, Bikes}) in #"Appended Query"
Chapter 9, “Introduction to the Power Query M Formula Language,” covers this syntax in more detail and explains the meaning of let and in, but for now, you can just focus on the two lines between let and in.
The first line after let places the output of the query Accessories in Source. (You can see that Source is also the first step in the Applied steps list.) The second line uses Table.Combine to append between Source and Bikes—and this is an append operation between Accessories and Bikes. (You may have noticed that the queries are wrapped in braces, which are explained in the next section. Hint: Braces are used to create a list.)
If you now look at the M code that was generated when you followed Exercise 3-2, Part 1 using Append queries as new, you find the following code in the Advanced editor:
let Source = Table.Combine({Accessories, Bikes}) in Source
If you look closely at these two sections of code, you will see that the two expressions return the same output: the combined tables of Accessories and Bikes.
To conclude this exercise, you can use a combination of Reference and Append queries to get the same results as when you use Append queries as new. While the latter technique leads to fewer lines of codes and takes only a single user interface step instead of two, there is no real difference between the two methods. So why did we introduce the Reference option? As you will see in later chapters, Reference can be very helpful in creating a pipeline of transformation queries or multiple transformation branches of the same base query.
Appending three or more tables
So far in this chapter, you have learned how to append two tables. It’s time to move to a more generic challenge: how to append three tables or more. One approach to this challenge is to start by appending two tables, and then to append the output of the two tables with a third table. Next, you can append the output of these steps with a fourth table, and so forth. While this approach can work well, it is difficult to maintain, as it requires the creation of intermediate queries. When you need to append three or four tables, it is not a significant price to pay, but with a larger number of queries, a better approach is required.
Exercise 3-2, part 3: Bikes + Accessories + Components
Let’s return to our fictional story. After the successful integration of the accessories product line with bikes, AdventureWorks decides to acquire a new company and extend its portfolio to components. Your goal as the chief analyst is to append the product lines into a single table.
For this exercise, you can download the following three workbooks from www.powerquerybook.com:
C03E01 - Bikes.xlsx
C03E01 - Accessories.xlsx
C03E01 - Components.xlsx
Open a new blank Excel workbook or a new Power BI report, and import the three Excel workbooks into the Power Query editor. Each workbook contains a single worksheet with a corresponding name (that is, a Bike worksheet in C03E01 - Bikes.xlsx, an Accessories worksheet in C03E01 - Accessories.xlsx, and a Components worksheet in C03E01 - Components.xlsx). If you’re not sure how to import these workbooks, see Exercise 3-1 for the steps to load Bikes and Accessories. Then apply similar steps for Components.
In the Queries pane of the Power Query editor, select Components.
On the Home tab, select Append queries as new.
When the Append dialog box opens, select Three or more tables. The Available table(s) and Tables to append lists appear, as shown in Figure 3-4. You can find Bikes and Accessories under Available table(s).
Add Bikes and Accessories to the Tables to append list, and click OK.
You now have a new query with the combined product line. From here, you can rename the new query Products and unload the intermediate queries as you learned in Exercise 3-1.
Now that you have appended Bikes, Accessories, and Components, you can see how easy it is to apply the same transformation in the formula bar of the Power Query editor. When the Products query is selected, you can see the following formula:
Table.Combine({Components, Bikes, Accessories})
Do you find this familiar? Earlier, when you appended two tables, the formula bar included the following code:
Table.Combine({Accessories, Bikes})
From the syntax of this formula, you can see that Table.Combine is the function used to execute the append transformation of multiple tables. The function can hold multiple tables inside braces.
Now that you know how to append three tables, you can add a fourth table, but this time you will do it by using code only. (Keep your Power BI report or Excel workbook open for the next exercise.)
Exercise 3-2, part 4: Bikes + Accessories + Components + Clothing
Let’s return to our fictional story. With the expansion in product lines, the chief operating officer of AdventureWorks decides to consolidate the Clothing division with the core business units. As the chief analyst, you receive a fourth table for the clothing product line. In part 4 of this exercise, you will learn how to append the four tables together—this time using M rather than the Append dialog box.
For this exercise, you can download the workbook C03E01 - Clothing.xlsx from www.powerquerybook.com. Open the Excel workbook or Power BI Desktop report from Exercise 3-2, part 3 (if it isn’t still open), and import Clothing as a new query. (See Exercise 3-1 for the steps to load Bikes and Accessories. Then apply similar steps for Clothing.) Rename the new query Clothing.
In the Power Query editor, select the Products query. Look at the formula bar. You should see the following formula:
Table.Combine({Components, Bikes, Accessories})
You need to modify it by adding Clothing anywhere inside the braces. Make sure that all four tables are comma separated inside the braces. When you are done, press Enter. Your formula can look like any of the following:
Table.Combine({Clothing, Components, Bikes, Accessories})
or
Table.Combine({Components, Clothing, Bikes, Accessories})
or
Table.Combine({Components, Bikes, Clothing, Accessories})
or
Table.Combine({Components, Bikes, Accessories, Clothing})
To verify that the Table.Combine worked, click the Filter control in the header of the ParentProductCategoryName column. When the Filter pane opens, you should see four values: Accessories, Bikes, Clothing, and Components. Checking the values in the Filter pane is an easy and practical method to audit your transformation steps.
Now that you have appended Clothing to the products, you’re ready to move to new scenarios and learn how to append multiple tables on a larger scale. You can review the solution files C03E02 - Solution.pbix and C03E02 - Solution.xlsx, which are available at www.powerquerybook.com.