One of the challenges of offloading mainframe data to an open system is determining how to interpret the data. A COBOL copybook typically provides the record structure of a mainframe dataset, but interpreting the contents of a COBOL copybook, with various binary formats and REDEFINES clauses, is itself a challenge that few open source mechanisms can handle.
DMExpress easily connects to the mainframe, pulls the data, and both interprets and transforms it, all set up through a convenient graphical interface. It eliminates the file transfer and coding complexities of other tools such as JRecord.
Mainframe data sets generally contain numeric data stored in various binary formats, including packed decimals, zoned decimals, native binaries, and others, represented as COMP types in COBOL copybooks. Typically, these copybooks also contain a number of REDEFINES clauses, each of which imposes its own interpretation on the same set of bytes.
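To make the packed-decimal (COMP-3) format concrete, here is a minimal stand-alone Python sketch (not part of DMExpress) that decodes such a field by hand. The layout is standard: each byte holds two decimal digits, one per nibble, and the final nibble carries the sign.

```python
# Minimal sketch of decoding a COBOL packed-decimal (COMP-3) field.
# Each byte holds two decimal digits (one per nibble); the final
# nibble is the sign (0xD = negative, 0xC or 0xF = positive/unsigned).
def unpack_comp3(data: bytes, scale: int = 0):
    digits = []
    for b in data:
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    sign_nibble = digits.pop()  # last nibble is the sign
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0xD:
        value = -value
    return value / (10 ** scale) if scale else value

# A PIC S9(5)V99 COMP-3 value of -12345.67 occupies 4 bytes: 12 34 56 7D
print(unpack_comp3(b"\x12\x34\x56\x7D", scale=2))  # -12345.67
```

Zoned decimals and native binaries follow analogous byte-level conventions, which is why a plain text transfer of such a file produces unreadable data.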
For example, if two REDEFINES clauses are applied to an 8-byte text field (PIC X(8)), those 8 bytes can be interpreted in three different ways, such as a transaction date in US format, UK format, or a summary format.
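The following stand-alone Python sketch shows the effect: the same 8 bytes yield three different values depending on which layout is applied. The layouts and names here are illustrative, not taken from a real copybook.

```python
# Hypothetical sketch: one PIC X(8) field read under three different
# REDEFINES-style layouts (the layouts here are illustrative only).
raw = b"03112016"  # the same 8 bytes for every interpretation
text = raw.decode("ascii")

us_date = (text[4:8], text[0:2], text[2:4])  # MMDDYYYY -> (YYYY, MM, DD)
uk_date = (text[4:8], text[2:4], text[0:2])  # DDMMYYYY -> (YYYY, MM, DD)
summary = text[4:8] + text[0:2]              # e.g. a YYYYMM summary key

print(us_date)  # ('2016', '03', '11')
print(uk_date)  # ('2016', '11', '03')
print(summary)  # '201603'
```

Nothing in the bytes themselves says which reading is correct; that decision lives in the application logic, as described next.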
Unfortunately, the logic for choosing the correct interpretation during record processing is embedded in the COBOL program on the mainframe, not in the copybook. This means the interpreting system must provide a way to handle this ambiguity when migrating mainframe datasets to open systems.
Here we compare DMExpress against a tool such as JRecord for processing mainframe files.
DMExpress provides a graphical user interface to open a COBOL copybook, browse to the (remote) mainframe data file, sample the data, and expand it to show all the different interpretations. Accessing mainframe files via DMExpress rather than manual FTP ensures that record descriptor words (RDWs) are retained and EBCDIC encoding is converted to ASCII.
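As a stand-alone illustration of the EBCDIC-to-ASCII step (not the DMExpress mechanism itself), Python's built-in cp037 codec can decode EBCDIC text:

```python
# Decoding EBCDIC (code page 037) text with Python's built-in codec.
# A raw binary transfer would leave these bytes unreadable on an
# ASCII-based open system.
ebcdic_bytes = b"\xc8\x85\x93\x93\x96"  # "Hello" in EBCDIC cp037
print(ebcdic_bytes.decode("cp037"))     # Hello
print("Hello".encode("cp037").hex())    # c885939396
```

Note that a blanket character conversion is only safe for text fields; packed and binary fields must be left untouched, which is why the conversion has to be copybook-aware.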
You can then create conditions to select the desired interpretation based on your knowledge of the data, or you can work with multiple interpretations simultaneously. You can define any other required data transformation, and finally write the resulting data set to HDFS, databases, or any other target system. All of this functionality is stored in job and task files that can then be run via the GUI or at the command line.
In the following screen shot, we see the DMExpress view of the copybook, which contains three different interpretations of the transaction date:
Correspondingly, DMExpress creates three sets of fields for the transaction date interpretations, as shown in the data sampling view:
JRecord provides jar files to interpret mainframe data. However, you would need to write a Java program that uses the JRecord routines to access the copybook interpretations. Further, you would need to extend the code to pick the correct interpretations.
Moreover, JRecord is only a mainframe file interpreter, not a file mover. It neither pulls files from the mainframe nor pushes the final data to HDFS or any other target location. You would need to FTP the file manually from the mainframe, which can be problematic if it's a variable block file or it contains binary data. Similarly, any other data processing is outside the realm of JRecord and would need to be coded in Java separately; all the pieces would then need to be bundled together into a script for an end-to-end process.
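To see why a raw transfer of a variable-block file is fragile, consider the record framing: each record begins with a 4-byte Record Descriptor Word, a 2-byte big-endian length (which includes the RDW itself) followed by 2 reserved bytes. A minimal Python sketch of splitting such a stream, assuming the RDWs survived the transfer:

```python
import struct

def split_rdw_records(data: bytes):
    """Split a variable-length record stream on its RDWs.

    Each record starts with a 4-byte Record Descriptor Word: a 2-byte
    big-endian length (including the 4 RDW bytes) plus 2 reserved
    bytes. A text-mode FTP transfer typically strips or corrupts these,
    making the record boundaries unrecoverable.
    """
    records, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        records.append(data[pos + 4:pos + length])  # payload after the RDW
        pos += length
    return records

# Two records: a 5-byte payload (RDW length 9) and a 3-byte payload (length 7)
stream = b"\x00\x09\x00\x00AAAAA" + b"\x00\x07\x00\x00BBB"
print(split_rdw_records(stream))  # [b'AAAAA', b'BBB']
```

This framing step, the copybook-driven decoding, and the delivery to the target would all have to be scripted by hand in a JRecord-based pipeline.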
For an example of using DMExpress to process mainframe datasets, see DMX-h Use Case Accelerator: Extract Mainframe Files with REDEFINES To HDFS.
Copyright © 2016 Syncsort All rights reserved.