Data Profiling


Data profiling is an important activity, usually performed during requirements gathering and the system study. It is a process that familiarizes you with the data you will be loading into the warehouse or data mart. In its most basic form, a data profiling process reads the source data and reports on the following:
          Type of data – whether the data is numeric, text, date, etc.
          Data statistics – minimum and maximum values, and the most frequently occurring value.
          Data anomalies – junk values, incorrect business codes, invalid characters, and values outside a given range.
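As a rough illustration of the first two items in the list above, here is a minimal Python sketch that profiles a single column of raw string values. The function names `infer_type` and `profile_column` are made up for this example, and the date format (YYYY-MM-DD) is an assumption; a real source would need its own formats and rules.

```python
import math
from collections import Counter
from datetime import datetime


def infer_type(value):
    """Classify one raw string as 'numeric', 'date', or 'text'."""
    try:
        # Treat only finite numbers as numeric ("nan" and "inf" count as text).
        if math.isfinite(float(value)):
            return "numeric"
        return "text"
    except ValueError:
        pass
    try:
        # Assumed date format for this sketch: YYYY-MM-DD.
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def profile_column(values):
    """Report type counts, numeric min/max, and the most frequent value."""
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    type_counts = Counter(infer_type(v) for v in non_empty)
    numbers = [float(v) for v in non_empty if infer_type(v) == "numeric"]
    most_common = Counter(non_empty).most_common(1)
    return {
        "rows": len(values),
        "empty": len(values) - len(non_empty),
        "type_counts": dict(type_counts),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
        "most_frequent": most_common[0][0] if most_common else None,
    }
```

Calling `profile_column(["10", "25", "25", "abc", "", "2020-01-31"])` reports three numeric values, one text value, one date, and one empty row, which is often enough to spot a column that does not match its documented type.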
Using data profiling, we can validate business requirements against the business codes and attributes actually present in the sources, and define how to handle exceptional cases where the data is incorrect or inconsistent.
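To make the validation idea concrete, the sketch below checks records against a list of agreed business codes and an allowed value range, and collects the exceptions. The function name `find_anomalies`, the (code, amount) record layout, and the rules themselves are assumptions for illustration only; the real rules come from the business requirements.

```python
def find_anomalies(records, valid_codes, low, high):
    """Return (row_number, reason) pairs for records that break a rule.

    records is a list of (code, amount) tuples (a hypothetical layout).
    """
    anomalies = []
    for row_number, (code, amount) in enumerate(records, start=1):
        if not code.isalnum():
            # Empty codes or codes with punctuation or spaces count as junk.
            anomalies.append((row_number, "invalid characters in code"))
        elif code not in valid_codes:
            anomalies.append((row_number, "unknown business code"))
        if not low <= amount <= high:
            anomalies.append((row_number, "amount out of range"))
    return anomalies


if __name__ == "__main__":
    sample = [("A1", 50), ("ZZ", 10), ("A#", 5), ("B2", 500)]
    for row, reason in find_anomalies(sample, {"A1", "B2"}, 0, 100):
        print(row, reason)
```

A report like this gives the business users something specific to react to: each flagged row is either a data error to be cleaned or a requirement that needs to be updated.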
Many data profiling tools are available on the market. Some examples are Trillium Software and IBM WebSphere Data Integrator's ProfileStage.
In the simplest case, we can even hand-code a data profiling application specific to the project. Languages such as Perl, Java, PL/SQL, and shell scripting can be very handy for building these applications.
