Finding DMExpress Hadoop Logs on MRv1

Summary

When DMExpress runs within Hadoop, its execution metadata (status messages and statistics) is written to the Hadoop stderr logs. This log output is useful for confirming that DMExpress was invoked, reviewing any warnings or errors it issued, and checking statistics for the executed job.

The logs can be viewed individually using the JobTracker web interface, or gathered using the attached script, which also requires the JobTracker web interface.

The instructions provided here assume that DMExpress is being used with Hadoop MapReduce version 1 (MRv1), and apply to all methods of invocation of DMExpress within Hadoop, including streaming.

Resolution

When running a Hadoop job that invokes DMExpress, the DMExpress execution metadata does not appear on the terminal. Instead, it is captured in the Hadoop stderr logs of the individual task attempts.

Hadoop job log files are stored in a standard location and made available over HTTP by the JobTracker node. You can access them in two ways, both of which require the Hadoop JobTracker web interface: view the logs individually in the web interface, or gather them with the attached script, getlogs.sh.

Attachments

The attached script, getlogs.sh, can be used to gather the logs.

Additional Information

The raw logs for each task attempt are stored as individual files on the local filesystem of the node where the task execution was attempted. It can be difficult to locate these files due to the distributed nature of Hadoop. However, these log files are also made available through the JobTracker HTTP interface, so they can be accessed from any system. This provides a centralized location to query for task attempt logs.
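As an illustration of reading a task attempt's stderr log over HTTP, the following Python sketch builds a log URL, fetches it, and keeps only the lines that mention DMExpress. It is not the attached getlogs.sh script. The `tasklog` path, the port 50060, the query parameters `attemptid`, `filter`, and `plaintext`, and the "DMExpress" search marker are all assumptions that may not match your Hadoop version or distribution; copy the real log link for an attempt from the JobTracker web interface and adjust accordingly. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def task_log_url(tracker_host, attempt_id, port=50060):
    """Build a stderr log URL for one task attempt.

    ASSUMPTION: the URL layout (path, port, parameter names) is a guess
    at a typical MRv1 setup. Verify it against the log links shown in
    your JobTracker web interface.
    """
    query = urlencode(
        {"attemptid": attempt_id, "filter": "stderr", "plaintext": "true"}
    )
    return f"http://{tracker_host}:{port}/tasklog?{query}"


def fetch_stderr(url, timeout=30):
    """Download the log text from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def dmexpress_lines(log_text, marker="DMExpress"):
    """Return the log lines that contain the marker, ignoring case.

    ASSUMPTION: DMExpress output can be recognized by the product name
    appearing in the line.
    """
    needle = marker.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]
```

A typical use would be `dmexpress_lines(fetch_stderr(task_log_url(host, attempt_id)))`, where `host` and `attempt_id` are taken from the job's task list in the JobTracker web interface. An empty result suggests either that DMExpress was not invoked for that attempt or that the marker does not match the actual log text.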

The full stderr logs for a Hadoop task attempt are generally erased when the completed Hadoop job is "retired" or archived by the JobTracker. With a typical configuration, jobs are retired about one day after they finish running. If the job is retired and the stderr logs are deleted, it will probably not be possible to determine whether DMExpress was invoked for that job.
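To judge whether the stderr logs for a finished job are likely still available, a rough check can compare the job's finish time against the retirement interval. The sketch below assumes a 24-hour interval, taken from the "about one day" figure above; the actual value depends on your cluster configuration, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: jobs are retired roughly one day after finishing.
# Replace this with the interval configured on your cluster.
RETIRE_INTERVAL = timedelta(hours=24)


def likely_retired(finish_time, now, retire_interval=RETIRE_INTERVAL):
    """Return True if the job finished at least retire_interval ago."""
    return now - finish_time >= retire_interval


finished = datetime(2013, 1, 1, 12, 0)
print(likely_retired(finished, datetime(2013, 1, 1, 18, 0)))   # 6 hours later
print(likely_retired(finished, datetime(2013, 1, 2, 13, 0)))   # 25 hours later
```

This is only an estimate. If the check suggests the job is close to retirement, gather the logs before they are deleted.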

Some Hadoop distributions include alternative management interfaces in addition to the standard JobTracker web interface. It may also be possible to check stderr logs using these interfaces.

For instructions on finding the Hadoop logs for MRv2, see Finding DMExpress Hadoop Logs on YARN (MRv2).
