Syncsort DMExpress is known for its extreme performance. Customers can integrate DMExpress with Informatica PowerCenter to accelerate the performance of their existing Informatica solutions in the following ways:
Here we focus on the first two methods to invoke a DMExpress job/task from Informatica PowerCenter.
An Informatica command task can be used to run one or more OS shell commands (any valid UNIX command or shell script on UNIX, or any valid DOS command or batch file on Windows) during an Informatica PowerCenter session execution in the following ways:
A DMExpress job/task can be invoked in any of these scenarios by wrapping the dmxjob or dmexpress command in a shell script (on UNIX) or a DOS batch file (on Windows) and executing that script as a command in the command task. The same wrapper script can also set any required environment variables, such as DMXDataDirectory, PATH, and LD_LIBRARY_PATH.
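What such a wrapper does can be sketched in Python: copy the current environment, set or prepend the variables, run the command, and hand its exit status back to the caller. This is a minimal illustration only. The function name run_with_dmx_env and the directories /data/dmx, /opt/dmx/bin and /opt/dmx/lib are hypothetical placeholders, and a harmless Python child process stands in for the real dmexpress or dmxjob call, whose arguments are not shown here.

```python
import os
import subprocess
import sys


def run_with_dmx_env(cmd, data_dir, bin_dir, lib_dir):
    """Run cmd with the environment variables a DMExpress wrapper might set.

    data_dir, bin_dir and lib_dir are hypothetical placeholders; use the
    locations of your own DMExpress installation and data.
    """
    env = os.environ.copy()
    env["DMXDataDirectory"] = data_dir
    # Prepend so the DMExpress executables are found before anything else on PATH.
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    # Prepend to LD_LIBRARY_PATH without leaving a trailing separator when unset.
    current = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    # Return the child's exit status so the caller can tell whether it succeeded.
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__":
    # Stand-in for the real dmexpress/dmxjob call: a child process that only
    # checks that it received DMXDataDirectory.
    probe = [sys.executable, "-c",
             "import os, sys; sys.exit(0 if os.environ.get('DMXDataDirectory') else 1)"]
    print(run_with_dmx_env(probe, "/data/dmx", "/opt/dmx/bin", "/opt/dmx/lib"))
```

Prepending rather than replacing PATH and LD_LIBRARY_PATH keeps whatever the calling session already provides, while making sure the DMExpress entries win.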
Commands invoked from a command task cannot exchange data with the session directly through the command task. Instead, they can read their input from a designated source and write their output to a designated target. These sources and targets can be flat files, database tables, or, in some situations, named pipes, depending on the requirements and capabilities of the upstream and downstream processes they interact with.
A Source Qualifier in an Informatica session defines the properties of the source from which data is read. When the source type is flat file, the Source Qualifier can be configured to read the data from the output of an OS command or a shell or batch script.
Because the Informatica flat file reader can read the standard output of a command generating data, the DMExpress job/task can write its target data to standard output, avoiding the overhead of writing and then reading an intermediate file. The Informatica source file properties (fixed-length vs. delimited format, the delimiter, and the record layout) must match those of the DMExpress standard output.
A DMExpress job/task can be invoked by wrapping the dmxjob or dmexpress command in a shell script (UNIX) or a DOS batch file (Windows) and defining that script as a "command generating data" in an Informatica Source Qualifier. As in the case of a command task, any necessary environment variables can be set in the wrapper script.
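The following Python sketch illustrates the data flow: one process writes delimited records to standard output, and a reader consumes that stream directly, with no intermediate file, checking each record against the layout it expects. It assumes a "|" delimiter and three fields; the PRODUCER script is a hypothetical stand-in for a DMExpress sort task, and read_command_output is a simplified reader, not Informatica's flat file reader. This sketch chooses to raise ValueError on a layout mismatch, which makes the "properties must match" requirement concrete.

```python
import subprocess
import sys

# Hypothetical stand-in for a DMExpress sort task: sorts three records and
# writes them to standard output as "|"-delimited lines with three fields.
PRODUCER = r"""
import sys
rows = [("3", "carol", "30"), ("1", "alice", "10"), ("2", "bob", "20")]
for row in sorted(rows):
    sys.stdout.write("|".join(row) + "\n")
"""


def read_command_output(cmd, delimiter="|", field_count=3):
    """Read delimited records from a command's standard output.

    A record that does not match the expected layout raises ValueError
    instead of being silently misread.
    """
    records = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for lineno, line in enumerate(proc.stdout, start=1):
            fields = line.rstrip("\n").split(delimiter)
            if len(fields) != field_count:
                proc.kill()
                raise ValueError(
                    f"record {lineno}: expected {field_count} fields, got {len(fields)}"
                )
            records.append(fields)
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command exited with status {proc.returncode}")
    return records


if __name__ == "__main__":
    for record in read_command_output([sys.executable, "-c", PRODUCER]):
        print(record)
```

Calling read_command_output with field_count=4 against the same producer fails on the first record, which is the kind of mismatch the Informatica source file properties have to rule out.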
The attached example shows how DMExpress tasks can be invoked both in a standalone command task and as a "command generating data" for flat file sources in the same Informatica workflow.
In the following Informatica PowerCenter mapping, source data is read from a very large relational database table using a relational reader. It is then passed on to two downstream processes in which it is sorted on different keys and passed through some other transformations before it is finally written to two target database tables.
It was found that the sort performance degraded as the size of the data increased, and the customer wanted to accelerate the performance of this Informatica mapping by offloading the sort work to DMExpress.
In the Informatica solution that integrates with DMExpress, the data is read from Oracle using a DMExpress copy task, then passed to two DMExpress sort tasks that feed into the Informatica PowerCenter mapping for downstream processing.
If there were only one sort, neither the DMExpress copy task nor the Informatica command task that invokes it would be needed. A single DMExpress sort task, invoked as a "command generating data" in the Source Qualifier, could read from the Oracle source, sort the data, and write it to standard output. With two sorts, the copy task ensures that the database is read only once, instead of once by each sort task.
The DMExpress copy task is invoked in a standalone Informatica command task at the beginning of the Informatica workflow. It reads the data from the database table and passes it to the two sort tasks, SORT-S1 and SORT-S2, through two named pipes. Named pipes allow the copy and sort tasks to run in parallel and eliminate the additional I/O that passing the data through files would incur. Because named pipes are used, the copy and sort tasks must run simultaneously, so the copy task must be run in the background.
The command to run the DMExpress copy task, CopyTask.dxt, is wrapped in a shell script and the shell script is run as a command in the command task, whose properties are specified as follows:
Command: ksh /opt/dmx/run_dmx_copy.sh
Note that the dmexpress command in the run_dmx_copy.sh script is run in the background using an ampersand (&). On Windows, use start /b dmexpress ... to run in the background.
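The copy-to-two-pipes pattern, and the reason the copy has to run in the background, can be sketched in Python on a POSIX system. Opening a named pipe for writing blocks until a reader opens it, so if the copy ran in the foreground before the sorts were started it would never get past opening its first pipe. In this sketch, threads stand in for the separate processes of the real setup; copy_task and sort_task are hypothetical stand-ins for CopyTask.dxt and the two sort tasks, and the comma-delimited two-field records are made up for illustration.

```python
import os
import tempfile
import threading


def copy_task(records, pipe_paths):
    """Stand-in for the copy task: read the source once, write to every pipe."""
    # open() on a FIFO for writing blocks until a reader opens the other end,
    # which is why the copy has to run concurrently with the sorts.
    outs = [open(path, "w") for path in pipe_paths]
    try:
        for rec in records:
            for out in outs:
                out.write(rec + "\n")
    finally:
        for out in outs:
            out.close()  # closing the write end delivers EOF to the reader


def sort_task(pipe_path, key_index, results, name):
    """Stand-in for a sort task: read all records from a pipe, sort on one key."""
    with open(pipe_path) as src:
        rows = [line.rstrip("\n").split(",") for line in src]
    results[name] = sorted(rows, key=lambda row: row[key_index])


def run_pipeline(records):
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        pipes = [os.path.join(tmp, "s1.fifo"), os.path.join(tmp, "s2.fifo")]
        for path in pipes:
            os.mkfifo(path)  # POSIX only
        # Start the copy "in the background", like "dmexpress ... &" in the wrapper.
        copier = threading.Thread(target=copy_task, args=(records, pipes))
        copier.start()
        sorters = [
            threading.Thread(target=sort_task, args=(pipes[0], 0, results, "S1")),
            threading.Thread(target=sort_task, args=(pipes[1], 1, results, "S2")),
        ]
        for t in sorters:
            t.start()
        for t in sorters:
            t.join()
        copier.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["3,b", "1,c", "2,a"]))
```

The source records are read once by the copy, yet each sort receives the full stream and sorts it on its own key (S1 on the first field, S2 on the second).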
The DMExpress sort tasks are invoked as commands of type "command generating data" in the Source Qualifier definitions in the Informatica session. The Informatica mapping reads the standard output of the sort tasks. In the mapping, the relational reader is replaced with two Source Qualifiers of type "file reader".
The commands to run the DMExpress sort tasks, SortS1Task.dxt and SortS2Task.dxt, are wrapped in shell scripts and the shell scripts are run as commands of type "Command Generating Data" in the Informatica session. The source qualifier properties for both file readers are the same except for the Command property:
Input Type: Command
Command Type: Command Generating Data
Source File Type: Direct
Command (for S1): ksh /opt/dmx/run_dmx_sort_s1.sh
Command (for S2): ksh /opt/dmx/run_dmx_sort_s2.sh
The Source Qualifier properties under the Mapping tab of the session properties window define which command is run to generate the data for an Informatica flat file source. When calling a DMExpress job/task to generate the data, the following properties must be set to the indicated values, with all remaining properties set to default values:
|Property|Value|Description|
|---|---|---|
|Input Type|Command|Type of the input source. "Command" indicates that the command replaces the source file.|
|Command Type|Command Generating Data|Indicates that the command generates the input data.|
|Command|<call to wrapper script>|The call to the wrapper script that runs the dmxjob/dmexpress command that writes the target data to standard output, e.g. ksh /opt/dmx/dmx_sort.sh|
|Source File Type|Direct|Indicates that the source file (in this case, the command's standard output) contains the actual data.|
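The required settings in the table can also be kept as a checklist, for example when reviewing session definitions. The dictionary form below is a hypothetical representation: Informatica does not expose Source Qualifier properties this way, and check_source_qualifier is a name made up for this sketch. The property names and values are the ones listed in the table.

```python
# Required Source Qualifier settings when a DMExpress job/task generates the data,
# taken from the table above. The dict representation itself is hypothetical.
REQUIRED_SQ_PROPERTIES = {
    "Input Type": "Command",
    "Command Type": "Command Generating Data",
    "Source File Type": "Direct",
}


def check_source_qualifier(props):
    """Return a list of problems found in a Source Qualifier property set."""
    problems = []
    for name, expected in REQUIRED_SQ_PROPERTIES.items():
        actual = props.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    # The Command property must hold the call to the wrapper script.
    if not props.get("Command", "").strip():
        problems.append("Command: missing call to wrapper script")
    return problems


if __name__ == "__main__":
    s1 = {
        "Input Type": "Command",
        "Command Type": "Command Generating Data",
        "Source File Type": "Direct",
        "Command": "ksh /opt/dmx/run_dmx_sort_s1.sh",
    }
    print(check_source_qualifier(s1))
```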
112_ShellScripts.txt contains the sample shell scripts.
Note that no commands can follow the dmexpress or dmxjob command in the scripts.
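The article does not give the reason for this rule. As an illustration only, and as an assumption rather than a statement from the source, two general shell behaviours would make a trailing command harmful: a script's exit status is that of its last command, which can hide a failed DMExpress run, and anything a trailing command prints to standard output lands in the data stream read by a "command generating data" source. In the sketch below, the snippet "printf ...; false" is a hypothetical stand-in for a DMExpress task that writes one record and then fails.

```python
import subprocess


def run_wrapper(script):
    """Run a shell snippet as a wrapper script would run; return (status, stdout)."""
    proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Task command last: its failure status and its output are passed through unchanged.
    print(run_wrapper("printf 'a|1\\n'; false"))
    # Trailing command: the failure is masked and "done" is appended to the data.
    print(run_wrapper("printf 'a|1\\n'; false; echo done"))
```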
Copyright © 2016 Syncsort All rights reserved.