How to Deduplicate Data Using Sort
One method for eliminating duplicate records in DMExpress is to sort the data and choose to retain only one record from an equal-keyed set. This method assumes that the output can contain any record from an equal-keyed set.
Sort can also be used to retain the first record of an equal-keyed set, and it is the easier method to implement when the records contain numerous fields. Otherwise, aggregation is the recommended method when you need to retain the first (or last) record of an equal-keyed set.
The attached example demonstrates deduplicating data using a DMExpress sort task.
The task sorts the data on Name in ascending order. To retain only one record from each equal-keyed set, select Retain only one record. By default, the original order of equal-keyed records is not maintained, so any one record from each set may be retained.
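The sort-then-retain-one behavior described above can be sketched outside DMExpress as a plain Python function. This is a minimal illustration, not DMExpress code: the function name `dedupe_by_sort` and the sample fields `Name` and `City` are hypothetical. Records are sorted on the key, and one whole record is taken from each run of equal keys, so every field in an output record comes from the same input record.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_by_sort(records, key="Name"):
    """Sort records on `key` (ascending) and keep one record per key value.

    Hypothetical helper for illustration; not a DMExpress API.
    """
    get_key = itemgetter(key)
    # sorted() is stable, so records with equal keys keep their input order
    ordered = sorted(records, key=get_key)
    # groupby yields runs of adjacent equal-keyed records; next(group)
    # takes the first whole record from each run
    return [next(group) for _, group in groupby(ordered, key=get_key)]


if __name__ == "__main__":
    rows = [
        {"Name": "Smith", "City": "Boston"},
        {"Name": "Jones", "City": "Austin"},
        {"Name": "Smith", "City": "Denver"},
    ]
    for row in dedupe_by_sort(rows):
        print(row)
```

Because Python's `sorted()` is stable, this sketch always keeps the first record of each set, which corresponds to clearing the Original order of equal-keyed records need not be maintained checkbox. With the DMExpress default, no particular record in the set is guaranteed.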
Deduplication can also be achieved using aggregation. Consider the following when choosing between the sort and aggregation methods:
If you want to keep any record from a set of duplicate records, with all fields guaranteed to be from the same record, use the sort method.
If you want to keep the first record from a set of duplicate records, you can use the sort method by clearing the Original order of equal-keyed records need not be maintained checkbox in the Performance Tuning dialog. This method is easier to implement than the aggregation method if the records contain numerous fields, but it may not perform as well as aggregation.
If you want to keep the first, last, and/or any value of individual fields from a set of duplicate records, the aggregation method is more flexible.
When there is enough memory relative to the size of the output, DMExpress invokes a high-performance aggregation, in which case aggregation performs better than sort.
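To contrast aggregation with the sort method, here is a minimal sketch of per-field deduplication using an in-memory hash table (a Python `dict`) keyed on the grouping field. This is an assumption-based illustration of the general technique, not a description of how DMExpress implements its aggregation; the function `dedupe_by_aggregation` and its `first_fields`/`last_fields` parameters are hypothetical. The table holds one entry per distinct key, so its memory use grows with the size of the output rather than the input, which is why the available memory relative to the output matters.

```python
def dedupe_by_aggregation(records, key, first_fields=(), last_fields=()):
    """Group records on `key`, keeping first-seen values for some fields
    and last-seen values for others.

    Hypothetical helper for illustration; not a DMExpress API.
    """
    groups = {}  # one entry per distinct key value
    for rec in records:
        k = rec[key]
        if k not in groups:
            # First record for this key: seed the "first" fields
            groups[k] = {key: k, **{f: rec[f] for f in first_fields}}
        # "Last" fields are overwritten by every record with the same key
        for f in last_fields:
            groups[k][f] = rec[f]
    # Output follows the order in which keys first appeared (no sort needed)
    return list(groups.values())


if __name__ == "__main__":
    rows = [
        {"Name": "Smith", "City": "Boston", "Updated": "2020-01-01"},
        {"Name": "Jones", "City": "Austin", "Updated": "2020-02-01"},
        {"Name": "Smith", "City": "Denver", "Updated": "2021-03-01"},
    ]
    for row in dedupe_by_aggregation(
        rows, "Name", first_fields=("City",), last_fields=("Updated",)
    ):
        print(row)
```

Here the output record for Smith takes City from the first Smith record and Updated from the last one, so its fields come from different input records. This is the flexibility aggregation offers, and also why the sort method is the one to use when all fields must come from the same record. A first-seen value also satisfies "any" for a field.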