Data Quality in the Data Warehouse

Poor-quality data creates problems for both sides of the house: IT and business. According to a study published by The Data Warehousing Institute (TDWI), entitled "Taking Data Quality to the Enterprise through Data Governance", some issues are primarily technical in nature, such as the extra time required to reconcile data or delays in deploying new systems. Other problems are closer to business issues, such as customer dissatisfaction, compliance problems and revenue loss. Poor-quality data can also drive up costs and damage credibility.

Data quality affects all data-related projects. It refers to the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. This means that in any data-related project, one has to ensure the best possible quality by checking that column values follow the expected syntax, detecting missing values, optimizing relationships, and correcting any other inconsistencies.
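As a rough illustration of two of these checks (detecting missing values and checking column syntax), here is a minimal Python sketch. The sample rows, the `profile_column` helper, and the rule that a zip code is exactly five digits are all assumptions made for this example; they are not taken from any particular tool.

```python
import re

# Hypothetical sample rows; None and "" stand in for null and blank values.
rows = [
    {"customer_id": 1, "zip_code": "30301"},
    {"customer_id": 2, "zip_code": ""},
    {"customer_id": 3, "zip_code": None},
    {"customer_id": 4, "zip_code": "3030A"},
]

# Assumed syntax rule for this sketch: a zip code is exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")


def profile_column(rows, column, pattern):
    """Count null, blank, invalid, and valid values in one column."""
    counts = {"null": 0, "blank": 0, "invalid": 0, "valid": 0}
    for row in rows:
        value = row.get(column)
        if value is None:
            counts["null"] += 1          # completeness: value is missing
        elif value.strip() == "":
            counts["blank"] += 1         # completeness: value is empty
        elif pattern.fullmatch(value):
            counts["valid"] += 1         # validity: matches expected syntax
        else:
            counts["invalid"] += 1       # validity: wrong syntax
    return counts


zip_profile = profile_column(rows, "zip_code", ZIP_PATTERN)
print(zip_profile)
```

Counting nulls and blanks separately is a deliberate choice here, since the two often have different causes (a missing source field versus an empty form input).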

The features expected of any data quality tool are listed below.

  • Table Analysis
    • Business Rule analysis
    • Functional Dependency
    • Column set Analysis
  • Data Consistency validation
    • Columns from different tables
    • Tables from the same database
    • Tables from different databases
    • A file data source can be compared with the current database
  • Results in tabular/graph format
  • Powerful pattern searching capability – Regex functions
  • Data Profiling capabilities
  • Option to store functions as a library/reusable component
  • Metadata Repository
  • Can be used as a testing tool for DB/ETL projects
  • Quickly Browsing Data Structures
  • Getting an Overview of Database Content
  • Do Columns Contain Null or Blank Values?
  • About Redundant Values in a Column
  • Is a Max/Min Value for a Column Expected?
  • What is the best selling product?
  • Using Statistics
  • Analyzing a Date Column
  • Analyzing Intervals in Numeric Data
  • Targeting Your Advertising
  • Identify and Correct Bad Data—Date, Zip Code
  • Getting a Column's Pattern
  • Detecting Keys in Tables
  • Using the Business Rule (Data Quality Rule)
  • Are There Duplicate Records in my Data?
  • Column Comparison Analysis
  • Discover Duplicate Tables
  • Recursive Relationships: Does Supervisor ID also Exist as Employee ID?
  • Deleting Redundant Columns
  • Executing Text Analysis
  • Creating a Correlation Analysis
  • Storing and Running Your Own Queries
  • Creating a Report (PDF/HTML/XML)
  • Can data be Corrected Using Soundex?
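
To make a few of the items above concrete (getting a column's pattern, finding duplicate records, and checking whether every Supervisor ID also exists as an Employee ID), here is a minimal Python sketch. It only illustrates the ideas and does not show how any specific tool implements them; the `employees` sample data and the `column_pattern` helper are hypothetical.

```python
from collections import Counter

# Hypothetical employee table used only for this illustration.
employees = [
    {"employee_id": 1, "name": "Ann", "supervisor_id": None, "zip_code": "30301"},
    {"employee_id": 2, "name": "Bob", "supervisor_id": 1, "zip_code": "30301-1234"},
    {"employee_id": 3, "name": "Cy", "supervisor_id": 9, "zip_code": "3O301"},
    {"employee_id": 2, "name": "Bob", "supervisor_id": 1, "zip_code": "30301-1234"},
]


def column_pattern(value):
    """Replace digits with '9' and letters with 'A'/'a'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


# Column pattern: rare patterns often point to bad data (here, a letter O in a zip code).
pattern_counts = Counter(column_pattern(e["zip_code"]) for e in employees)

# Duplicate records: rows whose values match in every column.
record_counts = Counter(tuple(e.values()) for e in employees)
duplicate_records = [rec for rec, n in record_counts.items() if n > 1]

# Recursive relationship: supervisor IDs that do not exist as employee IDs.
employee_ids = {e["employee_id"] for e in employees}
orphan_supervisors = sorted(
    {
        e["supervisor_id"]
        for e in employees
        if e["supervisor_id"] is not None and e["supervisor_id"] not in employee_ids
    }
)

print(pattern_counts)
print(duplicate_records)
print(orphan_supervisors)
```

In a real warehouse these checks would usually run as queries against the database rather than over in-memory rows, but the logic is the same.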

Talend Open Studio for Data Quality helps you discover and understand the quality of data in the data warehouse, and it addresses all of the features listed above. It makes accurate data profiling easy to carry out, which reduces the time and resources needed to find data anomalies. Its comprehensive data profiling features help enhance and accelerate data analysis tasks.

Posted by Mallikharjuna Pagadala
October 25th, 2012
