An important step in analyzing the performance of a DMExpress task is determining whether the task is CPU bound or I/O bound.
Comparing the elapsed time with the CPU time in the DMExpress statistics can show where the task is bound and how you might improve its performance.
Tasks are taking longer than expected to run.
The most straightforward way to determine whether a task is I/O bound or CPU bound is to compare the elapsed time of the task with the CPU time of the task. These values can be found in the DMExpress statistics. For example:
Here we can see that this task took roughly 1 minute of elapsed time and a little over 3 minutes of CPU time. The CPU time shown is the aggregate across all CPUs used, so a CPU time about 3 times the elapsed time implies that the task kept about 3 CPUs busy on average, parallelizing about 3 minutes of processing into 1 minute of elapsed time.
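The arithmetic behind this reading can be sketched in a few lines of Python. This is only an illustration: the function name effective_cpus is hypothetical and not part of DMExpress, and the times are assumed to be copied by hand from the statistics and converted to seconds.

```python
def effective_cpus(elapsed_seconds: float, cpu_seconds: float) -> float:
    """Return CPU time divided by elapsed time: the average number of CPUs kept busy."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


# Illustrative values only (not taken from a real statistics report):
# about 1 minute elapsed and a little over 3 minutes of aggregate CPU time.
print(round(effective_cpus(elapsed_seconds=60.0, cpu_seconds=185.0), 2))  # prints 3.08
```

A result near 3 matches the interpretation above; a result well below 1 means the task spent more time waiting than processing.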
Having an elapsed time much greater than CPU time typically indicates that the task was I/O bound: for periods of the run, the CPU was waiting for an operation to finish, such as reading data from or writing data to a disk. Such a task would likely benefit from faster disks, a faster network, and/or data compression.
Note that the CPU time being less than the elapsed time doesn’t necessarily indicate that the task used no parallel processing, only that it spent more time waiting than processing.
Having a CPU time equal to or greater than elapsed time typically indicates that the task was not I/O bound, implying that there may be opportunities to optimize the design of the task to improve processing time.
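The rules of thumb above can be written down as a small Python helper. This is a rough sketch rather than a DMExpress feature: the name classify_task is hypothetical, and the 0.5 cutoff standing in for "elapsed time much greater than CPU time" is an assumption chosen for illustration, not a documented threshold.

```python
def classify_task(elapsed_seconds: float, cpu_seconds: float,
                  io_bound_ratio: float = 0.5) -> str:
    """Classify a task from its elapsed and aggregate CPU time.

    io_bound_ratio is an assumed cutoff: a CPU/elapsed ratio at or below it
    is treated as "elapsed time much greater than CPU time".
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    ratio = cpu_seconds / elapsed_seconds
    if ratio >= 1.0:
        # CPU time equal to or greater than elapsed time.
        return "likely not I/O bound"
    if ratio <= io_bound_ratio:
        # Elapsed time much greater than CPU time.
        return "likely I/O bound"
    # In between, the two times alone do not settle the question.
    return "inconclusive"


# Illustrative values only.
print(classify_task(elapsed_seconds=600.0, cpu_seconds=60.0))
```

The middle "inconclusive" band is deliberate: a CPU time somewhat below the elapsed time does not by itself show where the task is bound.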
One way to separate I/O performance from CPU performance is to create a plain copy task that uses the same source as your real task. Because the copy eliminates nearly all of the processing, its run gives you a baseline I/O throughput. Comparing the real task against that baseline helps determine how much of its elapsed time is attributable to I/O alone and how much to CPU processing.
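As a rough sketch of how the baseline might be used, the Python below computes the copy task's throughput and the fraction of the real task's elapsed time that the copy alone accounts for. The function names and all figures are hypothetical; the source size and both elapsed times are assumed to be measured separately and entered by hand.

```python
def throughput_mb_per_s(bytes_processed: int, elapsed_seconds: float) -> float:
    """Baseline throughput in megabytes (10**6 bytes) per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_processed / 1_000_000 / elapsed_seconds


def io_share(copy_elapsed_seconds: float, task_elapsed_seconds: float) -> float:
    """Fraction of the real task's elapsed time that the plain copy accounts for."""
    if task_elapsed_seconds <= 0:
        raise ValueError("task_elapsed_seconds must be positive")
    # Capped at 1.0 in case the copy run happened to be slower than the real task.
    return min(copy_elapsed_seconds / task_elapsed_seconds, 1.0)


# Illustrative values only: a 6 GB source copied in 60 s, real task taking 240 s.
print(throughput_mb_per_s(6_000_000_000, 60.0))
print(io_share(copy_elapsed_seconds=60.0, task_elapsed_seconds=240.0))
```

A share close to 1 suggests the real task is limited mostly by I/O, while a small share suggests that most of the elapsed time goes to processing.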
When attempting to resolve a performance issue, it is recommended to follow the DMExpress and DMX-h Implementation Best Practices as a starting point. If that does not resolve the performance issue, contact Syncsort Technical Support at (201) 930-8270 or DMXSupport@syncsort.com, or contact your local agent.
To find the DMExpress statistics for MapReduce jobs, see Finding DMExpress Hadoop Logs on MRv1 or Finding DMExpress Hadoop Logs on YARN (MRv2).
Copyright © 2016 Syncsort. All rights reserved.