The Mainframe Extract Local to HDFS and Mainframe Extract to HDFS use case accelerators demonstrate how to load mainframe files into HDFS, converting fixed-length EBCDIC data to displayable ASCII data in the process. The examples vary only in their source file location:

- Mainframe Extract Local to HDFS – the source file has already been transferred to the local file system.
- Mainframe Extract to HDFS – the source file is read directly from the mainframe through a remote file connection.
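DMExpress performs the EBCDIC-to-ASCII conversion for you; purely as an illustration of what that conversion involves at the byte level, here is a minimal Python sketch. The record length, code page (cp037), and sample bytes are assumptions, not values taken from the accelerator.

```python
# A minimal sketch, assuming a 120-byte fixed record length and the cp037
# EBCDIC code page; both are illustrative, not taken from the accelerator.
RECORD_LENGTH = 120

def read_fixed_records(path, reclen=RECORD_LENGTH):
    """Yield fixed-length binary records from a mainframe extract file."""
    with open(path, "rb") as f:
        while True:
            rec = f.read(reclen)
            if len(rec) < reclen:
                break
            yield rec

# Text (PIC X) fields decode with an EBCDIC codec; these bytes are "HELLO":
print(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6]).decode("cp037"))  # HELLO
```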
Each example job has two tasks, each using a different source file, demonstrating how multiple mainframe-formatted files can be loaded into HDFS in parallel.
In Mainframe Extract Local to HDFS, a locally stored mainframe file is loaded into HDFS with a single copy task, which reads a mainframe-formatted file from the local file system and writes it to HDFS.
In the Source File dialog, the source is specified as a fixed-record-length file with EBCDIC encoding, and the Remote file connection is set to "File is local at run-time".
In Mainframe Extract to HDFS, a file is loaded directly from the mainframe into HDFS with a single copy task, which reads the source file over a remote file connection to the mainframe and writes it to HDFS.
In the Source File dialog, the source is again specified as a fixed-record-length file with EBCDIC encoding, and the Remote file connection is set to point to the mainframe server.
The COBOL copybook lineitem.cpy defines the record layout of the source files. This allows DMExpress to interpret the mixed text and binary data in the mainframe source files directly, with no need to redevelop the source metadata that the copybook already describes.
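DMExpress reads the copybook itself; the following hypothetical sketch only illustrates the kind of field layout a copybook describes. The field names, offsets, lengths, and pictures below are illustrative and are not the actual lineitem.cpy layout.

```python
# Hypothetical sketch of the kind of layout a copybook such as lineitem.cpy
# describes; names, offsets, lengths, and pictures are illustrative only.
LAYOUT = [
    # (field,       offset, length, type)     corresponding COBOL picture
    ("L_ORDERKEY",  0,      4,      "comp"),   # PIC S9(9) COMP   (binary)
    ("L_QUANTITY",  4,      3,      "comp3"),  # PIC S9(5) COMP-3 (packed)
    ("L_SHIPMODE",  7,      10,     "text"),   # PIC X(10)        (EBCDIC)
]

def slice_fields(record):
    """Split one fixed-length record into raw byte fields per the layout."""
    return {name: record[off:off + ln] for name, off, ln, _ in LAYOUT}
```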
A reformat identifies the fields to be output to the target, converts them to delimited format, and compresses any leading and trailing spaces. Because the reformat is delimited, binary numeric fields are automatically converted to displayable decimal numbers.
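As an aside on what that numeric conversion entails, one common mainframe binary format is packed decimal (COMP-3): two digits per byte with a sign nibble at the end. A minimal sketch, with assumed sample bytes:

```python
def unpack_comp3(b):
    """Decode packed decimal (COMP-3): two digits per byte, sign nibble last."""
    digits = "".join(f"{byte >> 4}{byte & 0x0F}" for byte in b[:-1])
    digits += str(b[-1] >> 4)                      # last byte: digit + sign
    sign = -1 if (b[-1] & 0x0F) in (0x0B, 0x0D) else 1
    return sign * int(digits)

# Bytes 0x12 0x34 0x5C encode +12345 (C/F nibbles positive, B/D negative).
quantity = unpack_comp3(bytes([0x12, 0x34, 0x5C]))
fields = [str(quantity), "AIR       ".strip()]     # trim spaces, then delimit
print(",".join(fields))                            # 12345,AIR
```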
Finally, the target is defined as a UNIX text file with ASCII encoding and a comma delimiter. To load the file directly into HDFS, an HDFS connection is attached to the target definition in the Target dialog.
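The HDFS connection handles the load from within DMExpress; for context only, the equivalent manual step with a configured HDFS client would look like the following sketch. The local and HDFS paths are hypothetical.

```python
import subprocess

# Hypothetical paths; assumes an HDFS client on the PATH that is
# configured against the target Hadoop cluster.
local_csv = "/tmp/lineitem.csv"
hdfs_path = "/user/dmxh/lineitem.csv"

# hdfs dfs -put copies a local file into HDFS; -f overwrites if it exists.
subprocess.run(["hdfs", "dfs", "-put", "-f", local_csv, hdfs_path], check=True)
```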
Since the source is a local file, you can run Mainframe Extract Local to HDFS on any Linux system that has an HDFS client configured to connect to a Hadoop cluster. The example assumes that the fixed length mainframe files were transferred from the mainframe to the local server, typically via binary FTP transfer mode. If you are using the example files provided, this has already been taken care of for you.
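If you need to perform that transfer yourself, a minimal Python sketch using binary FTP follows; the host, credentials, and dataset name are placeholders. Binary (image) mode matters because ASCII-mode FTP would translate the EBCDIC bytes in transit and corrupt the fixed-length records.

```python
from ftplib import FTP

# Hypothetical host, credentials, and dataset name. retrbinary transfers
# in binary mode, preserving the EBCDIC bytes exactly as stored.
with FTP("mainframe.example.com") as ftp:
    ftp.login("myuser", "mypassword")
    with open("lineitem.dat", "wb") as out:
        ftp.retrbinary("RETR 'PROD.LINEITEM.DATA'", out.write)
```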
To run Mainframe Extract to HDFS, you additionally need network connectivity to the mainframe: update the remote file connection specified in the Source File dialog to point to your mainframe server.
The following attachments are available for running the Mainframe Extract use case accelerators:
See the Guide to DMX-h ETL Use Case Accelerators for an overview of how the set of use case accelerators is organized and how to run them.
For general guidance on developing and running DMX-h ETL solutions, see Developing DMX-h ETL Jobs and Running DMX-h ETL Jobs in the DMExpress Help.
Copyright © 2016 Syncsort All rights reserved.