Open topic with navigation
Join tasks in DMExpress are binary, meaning they can only take two sides - a left side and a right side - even if there are multiple sources on either side. A multi-way join takes more than two sides, and can be implemented in DMExpress by linking multiple binary join tasks together in a job.
The following graphical representation and corresponding ANSI SQL query, which selects employee information from three database tables, is an example of a multi-way join:
Emp_Dept_Input as t1
Emp_Info_Input as t2 ON t2.EmpNo = t1.EmpNo
Addr_Input as t3 ON t2.AddrId = t3.AddrId
In DMExpress, a join task allows multiple sources on each side of the join, but that does not make it a multi-way join, as it still has only two sides. However, a multi-way join can be implemented in DMExpress by linking multiple binary join tasks together, as shown in the following example.
The attached example implements a multi-way join in DMExpress using two tasks.
Step1 joins data from two source files, Emp_Dept_Input and Emp_Info_Input, on the EmpNo key to create an intermediate result called Step1_Join_Out.
Step2 joins data from two source files, Step1_Join_Out (the output from Step1) and Addr_Input. The final output, MultiwayJoin_Output, is the result of an effective multi-way join of all three original sources.
For the sake of transparency, the example uses a sequential data flow between the join tasks, allowing the reader to see the intermediate output. However, direct data flow is recommended whenever possible so that data is transferred in memory, resulting in potentially significant performance gains.
Performance of a DMExpress Job implementing a multi-way join can be improved by early data reduction. When ordering the join tasks in a job, it is best to put the join task that will most reduce the amount of data being transformed early in the job.
For example, suppose we have the following situation:
In this case, running the Emp_Info_Input and Addr_Input join task first will eliminate a large number of records in the intermediate output, allowing the second join task (on intermediate output and Emp_Dept_Input) to run much faster.
Conversely, it is best to put the join task that will add the most data to the end of the job. Although reordering join tasks in a job can help improve performance, it should be done only if it does not modify the job logically.
While the example uses flat files as sources to the DMExpress Join tasks, the design remains unchanged when using database sources.
57_MultiWayJoin.zip, compatible with DMExpress version 8.0.5 or higher
For more information on Join in DMExpress, see Join and Join Dialog in the DMExpress Help.
Multiple lookup functions can sometimes be used in place of a multi-way join. For more information, see Lookup Function and Lookup and performance in the DMExpress Help.
For more information on direct data flows, see Direct data flows and performance in the DMExpress Help and Best Practices When Using Direct Data Flows.
Copyright © 2016 Syncsort All rights reserved.