Vexdata.io Documentation
  • Vexdata Overview
  • Installation
    • System Requirements
    • Windows Installation
    • Linux Installation
    • Cloud Installation
      • GCP
      • Azure
      • AWS Cloud Deployment Architecture Diagram
        • Pre-requisites
        • Manual Install on ec2
        • Cloud Formation Scripts (command line)
    • Admin Post Installation Setup
    • User Post Installation Setup
    • Server Improvement (Optional)
  • Getting Started
    • Roles and Permissions
    • Managing Groups and Projects
      • Folders
      • Projects
    • Settings
      • Slack Integration
      • Integrations
      • Rules
      • Properties
      • Integrations
      • Email Template
      • Report Template
      • Create Data Connections
        • Databases/Warehouses
        • Run On
  • Creating and Executing Test Cases
    • Test Case
      • Create Test Cases
        • Define Data Source and Target
          • Configure Table/s from Database
          • Input SQL Data
          • Input Files
            • Excel/CSV
            • XML
            • JSON
            • Parquet/AVRO
          • API Input
          • Table Filter
          • Advance - Input
        • Data Compare
          • ETL Testing/Cell By Cell Compare
            • Data Compare Mapping
            • Data Compare Options
          • Schema Compare
          • Counts Compare
        • Schema Monitoring
        • Data Quality
          • SQL Rules
          • Column Level Rules
          • Duplicates
      • Executing Test Cases
      • Defining Test Case Settings
    • Test Runs
    • Variables
    • Note
    • Label
  • Reports
    • User Reports
    • Admin Report or Manager Reports
  • Current Limitations
Powered by GitBook
On this page
  • Steps to create a Data Quality Test Case
  • Step 1: Initiate a New Test Case
  • Step 2: Define the Source Data
  • Step 3: Write Data Quality Rules
  • Step 4: Configuration and Execution
  1. Creating and Executing Test Cases
  2. Test Case
  3. Create Test Cases

Data Quality

PreviousSchema MonitoringNextSQL Rules

Last updated 4 months ago

Vexdata.io enables users to create and manage data quality rules applicable to various data sources, including traditional databases, NoSQL databases like MongoDB, file-based data, and streaming data. These rules help ensure that data meets business quality standards before it is processed or analyzed.

Steps to create a Data Quality Test Case

Step 1: Initiate a New Test Case

Navigate to the "New Test Case" button in the top right corner of the Vexdata.io interface and select "Data Quality" Test Case Type to start defining a new data quality test case.

Step 2: Define the Source Data

In the screenshot above, the user is validating a CSV file (radio button selected). To validate it, the CSV file needs to cross-referenced with lookup tables in Snowflake.

Step 3: Write Data Quality Rules

There are two modes of data quality execution. Depending on the type of input source, the mode is selected automatically.

Option 1: Data Quality Rules on Database (like Oracle, Postgres, Snowflake, etc)

For data sourced from a database or Data Warehouse (DWH), apply one of the following rule types:

  1. SQL Queries:

    • Write SQL queries to express business rules.

    • Any record returned by the SQL query is considered as bad data.

    • Example: Validate that no users are registered with an invalid age.

      SELECT user_id, age FROM users WHERE age <= 0 OR age >= 150;

    If this query returns any records, these records are considered as bad records and the test case fails.

  2. Column Level Conditions:

    • Directly specify conditions for individual columns.

    • Example rule : Age in the users table should be between 1 and 149.

      • Write SQL expression age > 0 and age < 150

      If there are any records where age is below 1 and above 140, the rule will fail and those repords are reported.

In below screenshot, TOTALPRICE > 0

The records for which TOTALPRICE <= 0 will be reported as bad data.

Option 2: Data Quality Rules on Files(CSV, Excel, Json, XML) and No-SQL (Mongo)

For data from files, MongoDB, or streaming sources, rules are executed in-memory. The following four rule types are available.

  1. SQL Queries:

  • Similar to database input, write SQL-like queries for in-memory data checks.

  1. Column Level Conditions:

  • Directly specify conditions for individual columns.

  • Example rule : Age in the users table should be between 1 and 149.

    • Write SQL expression age > 0 and age < 150

    If there are any records where age is below 1 and above 140, the rule will fail and those repords are reported.

In below screenshot, TOTALPRICE > 0

The records for which TOTALPRICE <= 0 will be reported as bad data.

  1. Foreign Key Validation:

  • Validate that foreign key values exist in the referenced table.

  • Example: Ensure every listed state in the 'users' table exists in the 'states' table.

  1. Table Metadata Rules:

Set rules regarding table structure changes.

  • Select "Column Names" to ensure the column names remain unchanged.

  • Select "Data Types" to verify that the column types are not altered.

  • Select "New Columns" to trigger a rule failure when new columns are added.

  • Select "Removed Columns" to trigger a rule failure when columns are removed.

Step 4: Configuration and Execution

After defining all data quality rules:

  • Click "Next" to proceed to configuration.

  • Assign a name to the test case.

  • Optionally, schedule and run the test case to enforce data quality checks.

Specify the source data for the test case. This data can originate from single or multiple datasets and may be drawn from various databases or files. Refer to steps to .

Example: Ensure that transaction amounts are positive. Refer to for writing rules in English.

create a new source
section
Two source datasets