Preparing Data With a Data Flow


Before you load data into your target data source, you can adapt it to your needs. Transforming the data in this way before loading is called data preparation. For example, you can use the data preparation options to convert numeric codes to meaningful attributes, discard erroneous data, smooth out ragged data by grouping it into manageable bins, and blend in descriptive data from additional data sources.
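To make these operations concrete, the following minimal Python sketch applies all four of them to a few rows before they would be loaded. It illustrates the concepts only and is not the product's own syntax. The code lookup (`USER_TYPE`), the descriptive station table (`STATIONS`), the field names, and the bin boundaries are all hypothetical.

```python
# Hypothetical lookup that turns numeric codes into meaningful attributes.
USER_TYPE = {1: "Subscriber", 2: "Customer"}

# Hypothetical descriptive data from a second source, keyed by station ID.
STATIONS = {101: "Station A", 102: "Station B"}


def duration_bin(seconds: int) -> str:
    """Smooth a continuous duration into a small number of bins."""
    if seconds < 600:
        return "under 10 min"
    if seconds < 1800:
        return "10-30 min"
    return "30 min or more"


def prepare(rows):
    """Return prepared copies of the rows that pass validation."""
    prepared = []
    for row in rows:
        # Discard erroneous data: non-positive durations or unknown codes.
        if row["duration"] <= 0 or row["user_type"] not in USER_TYPE:
            continue
        prepared.append({
            "user_type": USER_TYPE[row["user_type"]],                    # decode
            "duration_bin": duration_bin(row["duration"]),               # bin
            "station_name": STATIONS.get(row["station_id"], "Unknown"),  # blend
        })
    return prepared


raw = [
    {"user_type": 1, "duration": 420, "station_id": 101},
    {"user_type": 2, "duration": 2400, "station_id": 102},
    {"user_type": 9, "duration": 300, "station_id": 101},  # unknown code
    {"user_type": 1, "duration": -5, "station_id": 103},   # invalid duration
]

for row in prepare(raw):
    print(row)  # prints only the two valid, prepared rows
```

Using `STATIONS.get(..., "Unknown")` keeps rows whose station has no descriptive match, which behaves like a left join; an inner join would drop those rows instead.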

The data preparation calculations that create new fields use some of the same tools that are available in the Synonym Editor. The difference lies in where the results are kept. Calculations added in the Synonym Editor are stored in the Master File: the underlying data is unchanged, and each calculation is performed every time the field is referenced in a request against the synonym. When you create new fields in a data flow and then run the flow, the calculated values are loaded into the target data source as stored field values, not as calculations.
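The difference between the two approaches can be modeled in a few lines of Python. This is a conceptual sketch under stated assumptions, not how the Synonym Editor or a data flow is actually implemented: the `to_minutes` calculation, the `CountingCalc` wrapper, and the sample rows are hypothetical. The wrapper counts how often the calculation runs in each approach.

```python
def to_minutes(row):
    """Hypothetical calculation: convert a duration in seconds to minutes."""
    return row["duration_sec"] / 60


class CountingCalc:
    """Wraps a calculation and counts how often it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.evaluations = 0

    def __call__(self, row):
        self.evaluations += 1
        return self.func(row)


source = [{"duration_sec": 420}, {"duration_sec": 2400}]

# Synonym-style field: only the expression is stored, so every request
# that references the field evaluates it again for every row.
virtual_calc = CountingCalc(to_minutes)


def run_request_virtual():
    return [virtual_calc(row) for row in source]


# Data-flow-style field: evaluated once per row while loading; the target
# holds the result as an ordinary field value and the source is untouched.
load_calc = CountingCalc(to_minutes)
target = [dict(row, duration_min=load_calc(row)) for row in source]


def run_request_loaded():
    return [row["duration_min"] for row in target]


for _ in range(3):
    assert run_request_virtual() == run_request_loaded()

print(virtual_calc.evaluations)  # 6 (2 rows x 3 requests)
print(load_calc.evaluations)     # 2 (once per row, at load time)
```

Both approaches return the same answers. The trade-off is that a stored calculation always reflects the current source data but does its work on every reference, while a loaded value is computed once and must be reloaded if the source data changes.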

Many of the data preparation examples in this topic use data from a station-based bike share system, in which bikes are unlocked at one station and can be returned to any other station in the system. The data we will use was extracted from the daily ridership and membership information publicly available from Citi Bike, a public-private partnership between New York City and Lyft Bikes.