When using DMExpress within Hadoop, the DMExpress execution metadata (status messages and statistics) is output to the Hadoop stderr logs. This log output can be useful for ensuring that DMExpress was invoked, reviewing any issued warnings or errors, and checking statistics for the executed job.
The logs can be viewed individually using either the JobHistoryServer (JHS) web interface or the ResourceManager (RM) web interface. They can also be gathered using the attached script, which requires JHS to be running to gather the logs.
The instructions provided here assume that DMExpress is being used with Hadoop MapReduce version 2 (YARN), and apply to all methods of invocation of DMExpress within Hadoop, including streaming.
When running a Hadoop job that invokes DMExpress, the DMExpress execution metadata does not appear on the terminal, but is captured in the Hadoop logs as follows:
Hadoop job log files are stored in a standard location and made available over HTTP. You can access them in the following ways, as described in detail in the next sections: through the JobHistoryServer (JHS) web interface, through the ResourceManager (RM) web interface, or by gathering them with the attached script.
The JHS web interface lists job IDs for completed jobs only (both successful and non-successful). To view logs for running jobs, use the RM web interface.
The default port for the JHS web interface is 19888. If the default has been changed, you can find the port number in the configuration parameter mapreduce.jobhistory.webapp.address.
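As a sketch, the host and port can be split out of the configured host:port value with plain shell parameter expansion; the address below is an example value, not one read from your cluster, and /jobhistory is the standard JHS landing page:

```shell
# Example value of mapreduce.jobhistory.webapp.address (host:port form).
addr="jobHistoryServer:19888"
host="${addr%%:*}"   # strip everything from the first ':' onward
port="${addr##*:}"   # strip everything up to the last ':'
echo "http://${host}:${port}/jobhistory"
```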
For example, if the hostname of the node where JHS is running is jobHistoryServer, and the default port is being used, the JHS web interface can be accessed by entering the following URL in your browser:

http://jobHistoryServer:19888
Then, follow these steps to view the logs for an individual task attempt:
The RM web interface lists job IDs for both completed and running jobs:
The default port for the RM web interface is 8088. If the default has been changed, you can find the port number in the configuration parameter yarn.resourcemanager.webapp.address.
For example, if the hostname of the RM node is resourceManager, and the default port is being used, the RM web interface can be accessed by entering the following URL in your browser:

http://resourceManager:8088
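The RM URL can be assembled the same way from the configured address. This sketch uses the example hostname and default port from the text; the /cluster paths are the standard YARN RM web UI pages:

```shell
# Example value of yarn.resourcemanager.webapp.address (host:port form).
addr="resourceManager:8088"
echo "http://${addr}/cluster"               # all applications
echo "http://${addr}/cluster/apps/RUNNING"  # currently running jobs only
```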
For completed jobs, follow these steps to view the logs for an individual task attempt:
For currently executing jobs, follow these steps to view the logs for an individual task attempt:
The attached script can be used to automatically gather all logs, including DMExpress logs, from a particular Hadoop job run, subject to the following requirements:
After downloading the script (save it as getlogs.sh), mark it as executable before invoking it:
chmod +x getlogs.sh
The usage of the script is:
./getlogs.sh -j JOB_ID -v MR_VERSION
For example, if the job ID is job_1378758047280_0020, and the MR version is MRv2, run the script as follows:
./getlogs.sh -j job_1378758047280_0020 -v 2
This will gather the log files for all task attempts associated with the given Hadoop job. These files will be placed in a new directory, named with the job ID, within the current directory.
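Once the logs are gathered, the DMExpress output can be located by searching the job directory for DMExpress messages. The sketch below mocks up the directory and a placeholder stderr file so it is self-contained; the file name and message text are hypothetical, not actual DMExpress output:

```shell
jobid="job_1378758047280_0020"     # example job ID from above
mkdir -p "$jobid"
# Hypothetical placeholder log line; real DMExpress messages will differ.
printf 'DMExpress status message\n' > "$jobid/stderr.log"
# List the gathered files that contain DMExpress output.
grep -rl "DMExpress" "$jobid"
```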
By default, the JHS will retain the logs for one week, after which they are deleted. This default value can be configured by setting the JHS configuration parameter mapreduce.jobhistory.max-age-ms.
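The retention period is specified in milliseconds. For example, the one-week default works out as follows:

```shell
# 7 days expressed in milliseconds, suitable as a value for
# mapreduce.jobhistory.max-age-ms
echo $(( 7 * 24 * 60 * 60 * 1000 ))   # 604800000
```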
If the job logs are deleted, it will not be possible to determine whether DMExpress was invoked for that job.
For instructions on finding the Hadoop logs for MRv1, see Finding DMExpress Hadoop Logs on MRv1.
Copyright © 2016 Syncsort All rights reserved.