Description:-
As explain earlier merge is like a union operation in which it just collect all the records from provided sources without any condition. Suppose if we have Publisher_Mst
Below is Store_Mst table with two columns Store_Id & Store_Name
Step 1:- Add the different sources which you want. For this example I am taking two sources which have AdventureWorks database and one source have Store_Mst table & another one have Publisher_Mst as shown above in the example.
We added data flow task as shown in below image and renamed it to Merge join data flow task.
In this, you have to right click on DataSource and choose option Advance Editor by which you will get the following screen. You have to select “OLEDB Source output” of “Input and output Properties” tab.
Here, we need to make the IsSorted property to True as highlighted in below snap.
Here if you see, we can configure join type like Inner join, Left outer join, Full outer join.
For current example, we are choosing the option “Inner join” and joining key is BusinessEntityId which is primary key in Person table and foreign key in PersonPhone table.
In this post we are going to discuss Merge Join.
Although, it might be confusing sometimes merge & Merge Join. So,
let me explain it here.
As explain earlier merge is like a union operation in which it just collect all the records from provided sources without any condition. Suppose if we have Publisher_Mst
& Store_Mst
two sources as shown below.
Below is Store_Mst table with two columns Store_Id & Store_Name
Store_Id
|
Store_Name
|
1
|
Book store
|
2
|
Medical Store
|
3
|
Kiran Store
|
Below is Publisher_Mst table with two columns Publisher_Name
& Store_Id
Publisher_Name
|
Store_Id
|
Umesh
|
1
|
Harpal
|
2
|
Kirit
|
3
|
Now if we talk about merge transformation then the result would be as
shown below
Merge Transformation output.
Merge Transformation output.
Store_Id
|
Publisher_Name
|
Store_Name
|
1
|
Book store
|
|
2
|
Medical Store
|
|
3
|
Kiran Store
|
|
1
|
Umesh
|
|
2
|
Harpal
|
|
3
|
Kirit
|
Now, if
we talk about Merge Join transformation for above same inputs (Publisher_Mst, Store_Mst) then we will get following
output.
Store_Id
|
Publisher_Name
|
Store_Name
|
1
|
Umesh
|
Book store
|
2
|
Harpal
|
Medical Store
|
3
|
Kirit
|
Kiran Store
|
So, Merge Join is basically a join like Inner
join, Left outer join, full outer join etc.
I hope you got the context. Here we go step by step to implement the Merge join transformation.
I hope you got the context. Here we go step by step to implement the Merge join transformation.
Step 1:- Add the different sources which you want. For this example I am taking two sources which have AdventureWorks database and one source have Store_Mst table & another one have Publisher_Mst as shown above in the example.
We added data flow task as shown in below image and renamed it to Merge join data flow task.
Now, in this
data flow task, we will add data sources, in the first source we use Store_Mst
table and in the second source we use Publisher_Mst.
from Store_Mst table, we are choosing Store_Id and Store_Name as shown in below figure
from Store_Mst table, we are choosing Store_Id and Store_Name as shown in below figure
In the other source, we are using Store_Id & Publisher_Name
columns.
Step 2:- Now
once the source is configured, we are going to drag drop merge control as shown
in below figure.
Step 3:- Now to
configure Merge join, we are going to drag drop inputs from the sources.
When you try this at your end then you will get the following screen. Here you
have to set left input and right input for the merge join.
Now, when you
tried it you might found following warning. Which means the provided
inputs of datasource1 & datasource2 are not sorted.
Step 4:- Here you have the option to add a Sort transformation and then provide input to merge join
transformation control which is fairly simple and we have discussed in
previous tips. Here, I am sharing one more option.
In this, you have to right click on DataSource and choose option Advance Editor by which you will get the following screen. You have to select “OLEDB Source output” of “Input and output Properties” tab.
Here, we need to make the IsSorted property to True as highlighted in below snap.
Step 5:- Once the above Step is done next step is click
on output column’s Store_Id value.
Here you have to set the SortKeyPosition
to 1. As highlighted in below screen.
Step 6:- Once you configured sorted input sources then next step is to
configure the merge join. So when you click on Edit option of merge
join. You will get the following screen.
Here if you see, we can configure join type like Inner join, Left outer join, Full outer join.
For current example, we are choosing the option “Inner join” and joining key is BusinessEntityId which is primary key in Person table and foreign key in PersonPhone table.
Once the
configuration is done. We can select what are the columns which we need as an
output.
Step 7:- Now, Next step is to configure the output in excel. For this, we drag
& drop a destination source. In the current example, we are using Excel as
a destination. We are mapping the columns of output to excel as shown in below
figures.
Step 8:- Once,
everything is configured. We run the package and if everything is working fine
then we will get the following screen.
To cross check the result we open excel and we get the output which we
expected as shown in below figure.
Thanks for comments.....