Data preparation and basic statistics

    • Select data
    • Clean data
    • Construct data
    • Integrate data
    • Format data

First, the required records and the appropriate aggregation level have to be chosen. The data then have to be cleaned, which means removing or correcting erroneous values. Any necessary pre-processing, conversions and transformations are carried out as well. The most important data manipulation methods are the following:

    • Record selection
    • Creating new records
    • Cleansing
    • Variable selection
    • New variable creation
    • Merging
    • Aggregation
    • Sorting
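
As a rough illustration of several of these methods (cleansing, new variable creation, merging, variable selection, aggregation and sorting), the sketch below uses plain Python on a small invented data set. The `orders` and `customers` tables, their field names, and the rule that a valid amount must be present and positive are all assumptions made for this example; a real project would take these from its own data and business rules.

```python
from collections import defaultdict

# Hypothetical raw data: all field names and values are invented for illustration.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0, "qty": 2},
    {"order_id": 2, "customer_id": "c2", "amount": -5.0, "qty": 1},  # erroneous: negative amount
    {"order_id": 3, "customer_id": "c1", "amount": 80.0, "qty": 4},
    {"order_id": 4, "customer_id": "c3", "amount": None, "qty": 1},  # erroneous: missing amount
    {"order_id": 5, "customer_id": "c2", "amount": 40.0, "qty": 2},
]
customers = {"c1": "north", "c2": "south", "c3": "north"}  # customer_id -> region

# Cleansing / record selection: keep only records with a present, positive amount
# (the validity rule itself is an assumption of this sketch).
clean = [o for o in orders if o["amount"] is not None and o["amount"] > 0]

# New variable creation: derive a unit price from two existing variables.
enriched = [{**o, "unit_price": o["amount"] / o["qty"]} for o in clean]

# Merging: attach the region from the customer lookup table.
merged = [{**o, "region": customers[o["customer_id"]]} for o in enriched]

# Variable selection: keep only the variables needed downstream.
keep = ("customer_id", "region", "amount", "unit_price")
selected = [{k: o[k] for k in keep} for o in merged]

# Aggregation: move from order level to customer level by summing amounts.
totals = defaultdict(float)
for o in selected:
    totals[(o["customer_id"], o["region"])] += o["amount"]

# Sorting: order the aggregated table by total amount, largest first.
table = sorted(
    ({"customer_id": c, "region": r, "total_amount": t} for (c, r), t in totals.items()),
    key=lambda row: row["total_amount"],
    reverse=True,
)

for row in table:
    print(row)
```

Each step builds a new list instead of modifying the previous one, so intermediate results stay available for checking later on.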

After the data manipulation step, we compute descriptive statistics on the data. Based on the results, we can decide whether further modifications are needed or whether the data are suitable for model construction.
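
A minimal sketch of such a statistics step, using only Python's standard `statistics` module, is shown here. The `amounts` values are invented, `summarize` is a hypothetical helper, and the "mean more than twice the median" rule is just one assumed heuristic for flagging a variable that may need further cleaning; it is not a standard threshold.

```python
import statistics

# Hypothetical prepared variable; the values are invented for illustration.
amounts = [200.0, 40.0, 75.0, 60.0, 1250.0, 55.0]


def summarize(values):
    """Return basic descriptive statistics for one numeric variable."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(amounts)
for name, value in summary.items():
    print(f"{name}: {value}")

# Assumed heuristic: a mean far above the median suggests outliers or strong
# skew, so the variable may need another round of data modification.
needs_review = summary["mean"] > 2 * summary["median"]
print("needs review:", needs_review)
```

In this invented sample the single value 1250.0 pulls the mean well above the median, which is the kind of result that would send us back to the cleaning step before modelling.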

Data verification
To make sure that the data are suitable, it is worth checking that the modifications described above were carried out correctly before handing the data table over for model construction.
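
What such a check looks like depends on the project. As one possible sketch, the hypothetical `verify` function below looks for two common problems in a prepared table: missing values in required variables, and duplicate keys left over from merging or aggregation. The table contents and column names are invented for the example.

```python
def verify(table, required, key):
    """Return a list of problems found in a list-of-dicts table."""
    problems = []
    seen = set()
    for i, row in enumerate(table):
        # Required variables must be present and not None.
        missing = [c for c in required if row.get(c) is None]
        if missing:
            problems.append(f"row {i}: missing {missing}")
        # After aggregation, each key value should occur only once.
        if row.get(key) in seen:
            problems.append(f"row {i}: duplicate {key}={row.get(key)!r}")
        seen.add(row.get(key))
    return problems


# Hypothetical prepared table with two deliberate defects in the last row.
prepared = [
    {"customer_id": "c1", "total_amount": 200.0},
    {"customer_id": "c2", "total_amount": 40.0},
    {"customer_id": "c2", "total_amount": None},
]

problems = verify(prepared, required=("customer_id", "total_amount"), key="customer_id")
for p in problems:
    print(p)
```

An empty result means the table passed these particular checks; it does not prove the data are correct in every other respect.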