
How to Join, Merge, Union, & Concatenate Tables in SQL and Python
I walked into a restaurant the other day and saw 2 tables sitting together; so I went over and joined them…and, after watching this training, you can too!
In the training video above, we walk through how to perform left, right, inner, outer, and cross joins (which are essentially ways of “horizontally” merging) as well as unions or concatenation (which is essentially “vertically” merging). We’ll show you how to apply these concepts in both SQL and Python (using the Pandas package); but first we illustrate these concepts (which apply to joining tables in nearly any analytics platform) with some Value Driven Analytics t-shirts!
You can download the Jupyter Notebook used in the video here.
If you want to run the code alongside the video (not critical) and don’t already have Anaconda or Jupyter installed, check out the very beginning of the intro to Python training here to see how to Download Anaconda (free packaging of Python executor, the Jupyter Notebook IDE, and common Python packages) and launch a Jupyter Notebook. If you want to learn more about Pandas data manipulation, which we use briefly in the video, you can keep watching this intro to Python training.
If you want to learn more about SQL data manipulation, you can check out our intro to SQL training here, which covers selecting data from a data table, creating new columns, joining tables together, and aggregating and grouping data.
In a join or merge you have a left table and right table (sometimes multiple); the left table is essentially the starting table that is listed first in your code or platform.
A left join will lead to a resulting table that will include all rows in the left table (or starting table), regardless of whether they have a match in the right table being joined to it; rows from the right table will only be included in the resulting table if they have a match in the left table.
As you can imagine, a right join does the opposite; it will include all the rows in the right table (the table mentioned second or later) even if they’re not included in left table (or first table). Note that any columns from the left (or first table) would be null for rows where there weren’t any matching rows in the left table.
With an inner join, only rows that are contained in both tables are included in the outputted resulting table.
With an outer or full outer join, all rows in the left and right tables are included! Keep in mind that rows will have missing values for any columns that come a table that row is not contained in.
Finally, with a cross join, every row from the left or starting table gets joined with every row in the right table, even if they don’t have any matching characteristics. For instance, the resulting table of cross joining a 3 row table with a 2 row table would have 3 x 2 = 6 rows in total; this is also called a cartesian production. Since it creates so many rows, it can be very computationally intensive. It’s often not desired, but there is occasionally a good use case for it.
Lastly, let’s discuss the concept of what Pandas would call a concatenation and what SQL would call a union. What we’ve been discussing above is more of a horizontal joining of tables in the sense that usually it’s done to bring in additional columns (which makes the resulting table wider, horizontally), although it can create more rows as well. With a concatenation or a union, it’s more of a vertical joining of tables, usually for the purpose of bringing in more rows (making the table taller).
Concatenating or unioning tables together simply appends the rows from one table to another table; if concatenating or unioning a table with 2 rows to a table with 3 rows, the result would be a table with 5 rows.
Watch the video above to learn how to code out the concepts we just discussed and more using SQL and the Pandas library within Python.
I hope you enjoyed this article on joining, merging, concatenating, and unioning. Come back for more coding articles! Who knows…there might even be a sequel…


