Open topic with navigation
Many businesses are prioritizing efforts to cleanse or standardize data to improve data quality. One way to cleanse data is to identify invalid data and write it to an invalid data file, and write valid data to a separate, valid data file. With this approach, you can analyze and correct the invalid data in the invalid data file while continuing to process data from the valid data file.
You can facilitate data cleansing by using the DMExpress RegExReplace function, conditional expressions, and filters to create two targets, one each for valid and invalid data.
One example of data cleansing can involve identifying non-alphabetic characters in fields where only alphabetic characters are expected, writing the associated records to an invalid data file, and writing records with valid data to a valid data file as shown in the following Input Data file, Valid Data file, and Invalid Data file:
To find invalid data in the Input Data file, the FirstName field value is evaluated for non-alphabetic characters:
In the attached example, this processing is done in DMExpress using the RegExReplace function, conditional text, and filters to create the two targets.
The DMExpress function RegExReplace identifies any non-alphabetic characters in the source field FirstName and stores these non-alphabetic characters in the variable V_NameWithNonAlpha as follows:
V_NameWithNonAlpha = RegExReplace(RL_ForSource.FirstName,U'([^[:alpha:]]*)[[:alpha:]]*',U'\1')
||is the source record layout|
||is the source field against which we match the pattern of any combination of non-alphabetic characters, alphabetic characters, or both.|
The named condition IsItValidData is defined to evaluate whether the variable V_NameWithNonAlpha contains non-alphabetic characters as follows:
IsItValidData => V_NameWithNonAlpha = ''
Filters based on the value of the conditional expression are applied to the two targets to write the valid data to the Valid Data file and the invalid data to the Invalid Data file as follows:
For the Invalid Data file, the value of the V_NameWithNonAlpha variable is written to the Non-Alpha Character column.
215_SeparatingInvalidDataNames.zip, compatible with DMExpress version 7.0.0 or higher
For additional information on DMExpress functions, see DMExpress functions reference in the DMExpress Help.
Copyright © 2016 Syncsort All rights reserved.