Data Profiling
Released in Jet Analytics Data Integration 6024.1
This article describes what data profiling is, why to use data profiling, and how to check the data profile of a table in Jet Analytics Data Integration.
What is data profiling
The data profile tool can be used for data profiling, which is the process of examining, analyzing, reviewing, and summarizing data sets to gain insight into the quality of data.
What can data profiling be used for?
The data profile tool can be used to show the number of null values and outliers (based on minimum, maximum, and average), which can be used to gauge data quality and guide decisions around data cleansing.
It can also be used to decide which data types make sense for a field. For example, in the case of a text field, the maximum number of characters can be used to guide the choice of data type.
Checking the Data Profile of a Table
Right-click on a table and select Data Profile.
The data profile shows information regarding each field in the table.
The data profile for the first field department_id indicates that it is an integer field with no null values, an average of 6, a minimum of 1, and a maximum of 11.
If you select the next field on the left, department_name, the data profile indicates that it is a varchar(30) field with zero null values, 11 distinct values, and a maximum length of 16 characters for a single record.
Pattern Search
The Pattern Search button can be found at the bottom of the data profile window, and can be used to explore the data and find patterns within it.
Pattern Search finds a specific number of rows matching a LIKE or NOT LIKE search phrase. A percentage sign can be used as a wildcard to search for several characters. For example, the search %ing finds all records in the department_name field ending in "ing". Further information regarding wildcard searches can be found by hovering over the information icon.
When Select top is set to unlimited, all matching records are returned. In the example above, all 4 records in the department_name field are returned.
If this property is changed from unlimited to 2, only 2 records are returned.
This functionality also works for integer fields. For example, the location_id field is an integer field, but you can still search for all location_id records that start with 2 using a LIKE 2% search.