Introduction

The merge feature enables you to merge two or more tables into one.

tut_merge00

A typical use case for this feature is to blend information from similar-but-different source systems. For example, an insurance company might have different legacy systems for each type of cover that it offers. While the concept of “policy” may be similar, the structure and content of the data in each system may be different. The merge operation helps you to combine that data into one consolidated policy table.

In this tutorial, we will walk through the creation of a merge table from a data source.

Create Test Data

Using a text editor, create two CSV files with the following data:

merge1.csv:

PolicyNumber,PolicyName,SumInsured
12345,Ron Dunn,10000
23456,Ian Nicholson,20000
34567,Rob Dumont,30000
45678,Ngo Phi Nga,15000

merge2.csv:

Policy,Insured,PolicyHolder
ABCDE,12000,Dan Drum
FGHIJ,8000,Au How Leng
KLMNO,23000,Russell Park-Miller
PQRST,12345,Erik Baardse

Save the files in one of the following locations:

Windows:          
c:\ajilius2\samples\merge1.csv
c:\ajilius2\samples\merge2.csv

Linux / OSX:      
/opt/ajilius2/samples/merge1.csv 
/opt/ajilius2/samples/merge2.csv

Load merge1.csv

In this step we will load the merge1.csv file into the data warehouse.

tut_merge00a

Using the CHINOOK sample data warehouse, drill into the CSV Files data source:

tut_merge01

Load the metadata for the merge1 table:

tut_merge02

Run the scripts to create and load the load_merge1 table:

tut_merge03

View the load_merge1 table to confirm that the data has been loaded:

tut_merge04

Load merge2.csv

In this step we will load file merge2.csv into the data warehouse.

tut_merge00b

Repeat the steps that you took to load merge1.csv in the previous section, but this time load and verify merge2.csv instead of merge1.

When you view load_merge2 it should look like this:

tut_merge05

Transform load_merge1 to stage_merge1

We now create a staging table from the load_merge1 load table. This will form the target for the later merge operation.

tut_merge00c

From the LOAD TABLES list, select CREATE STAGE on the context menu for the load_merge1 table that you created in the previous section:

tut_merge06

Create the stage table:

tut_merge07

Run the scripts to create and populate the stage table:

tut_merge08

View the table data to confirm that it has been loaded:

tut_merge09

Transform load_merge2 to stage_merge1

Now we are going to define the merge relationship between our second load table (load_merge2) and the original staging table (stage_merge1). This is different from the standard transform, in that we won’t be creating a new table as a result, but using an existing table instead.

tut_merge00d

From the LOAD TABLES list, select CREATE MERGE from the context menu for the load_merge2 table:

tut_merge10

Create the merge definition by updating the table name (stage_merge2) and description, and selecting  stage_merge1 as is the merge target:

tut_merge11

Note the difference in the heading for the right-hand list of columns. For merge tables, we show the columns of the target table (stage_merge1):

tut_merge12

Now, select the COLUMNS option from the context menu for the stage_merge2 table that you just created:

tut_merge13

The COLUMN MAP screen shows all of the columns from the target table (stage_merge1), with a drop down list of the columns from the source table (load_merge2). Update the column mapping:

tut_merge14

Run the scripts to populate the merge. Note that there is no CREATE script to run, because we are targeting a table that has already been created for stage_merge1.

tut_merge15

Finally, view the data for stage_merge1, to verify that the table now contains the combined contents of both load tables:

tut_merge16

Summary

Merge transforms help you to bring together different tables into a consolidated view of data. Their most common use is to combine similar data from different computer systems.

In this tutorial we demonstrated the creation of a merge from LOAD tables. Merges may also be created from STAGE tables, enabling you to clean and transform data before the merge.

The steps demonstrated in this tutorial are:

  • Load two different sets of data
  • Create a target table from one set of data
  • Merge a second set of data into the target table

If you have any questions, please contact us for support.