
Data Quality Validation

Accurids extends its existing Data Integrity with SHACL Constraints capability with a comprehensive Data Quality feature. The feature provides structured validation and clear visibility into data quality issues, helping ensure that entity data consistently adheres to predefined standards and constraints.

The Data Quality feature validates data against the SHACL rules defined in the dedicated "constraints" dataset. It gives immediate feedback during entity creation, modification, and approval workflows, and it provides dataset-level data quality reports for monitoring and addressing issues across many entities.
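To illustrate what "validating against SHACL rules" means, the sketch below re-implements two SHACL Core constraint types, sh:minCount and sh:pattern, in plain Python. It is not Accurids' validation engine and does not use a real SHACL library. The `PropertyConstraint` class, the `validate` function, and the example constraints (a required "Height", an email format check, and a recommended "Gender") are hypothetical and only mirror the corresponding SHACL terms.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyConstraint:
    """Simplified, hypothetical stand-in for a SHACL property shape."""

    path: str                      # plays the role of sh:path
    min_count: int = 0             # plays the role of sh:minCount
    pattern: Optional[str] = None  # plays the role of sh:pattern
    severity: str = "error"        # sh:severity; SHACL defaults to sh:Violation
    message: str = ""              # plays the role of sh:message


def validate(entity: Dict[str, List[str]],
             constraints: List[PropertyConstraint]) -> List[dict]:
    """Check one entity (property -> list of values) against the constraints."""
    results = []
    for c in constraints:
        values = entity.get(c.path, [])
        if len(values) < c.min_count:
            results.append({"path": c.path, "severity": c.severity,
                            "message": c.message
                            or f"{c.path} needs at least {c.min_count} value(s)"})
        if c.pattern is not None:
            for value in values:
                # Like SHACL's sh:pattern, the regex is not implicitly anchored,
                # so re.search (not re.fullmatch) is the closer analogue.
                if re.search(c.pattern, value) is None:
                    results.append({"path": c.path, "value": value,
                                    "severity": c.severity,
                                    "message": c.message
                                    or f"{c.path} has an invalid format"})
    return results


# Hypothetical example constraints, for illustration only.
EXAMPLE_CONSTRAINTS = [
    PropertyConstraint("height", min_count=1, message="Height is required."),
    PropertyConstraint("email", pattern=r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
                       message="Email address is not valid."),
    PropertyConstraint("gender", min_count=1, severity="warning",
                       message="Gender is recommended."),
]
```

For example, `validate({"email": ["jane.doe(at)example.com"]}, EXAMPLE_CONSTRAINTS)` reports a missing height (error), an invalid email (error), and a missing gender (warning). In Accurids itself, the rules live as SHACL shapes in the "constraints" dataset rather than in application code.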

Data Quality Validation Process

The Data Quality validation process occurs at multiple stages and locations:

1. Global Entity View (GEV)

When creating or editing entities via the Global Entity View (GEV), Accurids performs immediate inline validation checks. Validation messages clearly identify data quality issues at three severity levels:

  • Errors (Red): Critical issues that must be resolved before submitting new entities or changes. Examples include a missing mandatory field such as "Height" or an invalid format such as an improperly formatted email address.
  • Warnings (Yellow): Recommended improvements or corrections that do not block submission but indicate deviations from preferred standards (e.g., missing "Gender" information).
  • Informational Messages (Grey): Suggestions or minor issues that serve to improve data completeness or accuracy but have no impact on submission.
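SHACL validation reports carry a severity on every result (sh:Violation, sh:Warning, or sh:Info; a shape without sh:severity defaults to sh:Violation). The sketch below assumes that these map one-to-one onto the three levels above. The mapping, the `Severity` enum, and the `severity_from_shacl` helper are hypothetical illustrations, not a documented Accurids API.

```python
from enum import Enum

# Namespace of the SHACL vocabulary (W3C).
SH = "http://www.w3.org/ns/shacl#"


class Severity(Enum):
    """The three display levels described above (names are assumptions)."""

    ERROR = "error"
    WARNING = "warning"
    INFO = "info"

    @property
    def colour(self) -> str:
        # Display colours as described for the GEV.
        return {"error": "red", "warning": "yellow", "info": "grey"}[self.value]

    @property
    def blocks_submission(self) -> bool:
        # Only errors block submission; warnings and infos do not.
        return self is Severity.ERROR


# Assumed mapping from sh:resultSeverity values to the display levels.
_SHACL_TO_SEVERITY = {
    SH + "Violation": Severity.ERROR,
    SH + "Warning": Severity.WARNING,
    SH + "Info": Severity.INFO,
}


def severity_from_shacl(iri: str) -> Severity:
    """Map a sh:resultSeverity IRI to a display severity."""
    try:
        return _SHACL_TO_SEVERITY[iri]
    except KeyError:
        # SHACL allows custom severities; this sketch rejects them.
        raise ValueError(f"unknown SHACL severity: {iri}") from None
```

Keeping colour and blocking behaviour on the enum means every view (GEV, Pending Changes, dataset page) can derive them from the same place.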

In the GEV, validation feedback is displayed directly next to the affected properties. Users can hover over a property to view the detailed error, warning, or informational messages. Entity-level issues are shown prominently, indicate the required corrections, and offer a quick action ("Fix") to resolve the identified problem.
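Showing feedback next to each property amounts to grouping validation results by their property path, with results that have no path treated as entity-level issues. A minimal sketch, assuming a hypothetical `ValidationResult` record whose fields loosely mirror SHACL's sh:resultPath, sh:resultSeverity, sh:resultMessage, and sh:sourceShape; the rule identifiers in any usage are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ValidationResult:
    # Hypothetical fields, loosely mirroring a SHACL validation result.
    path: Optional[str]  # None means an entity-level issue
    severity: str        # "error", "warning" or "info"
    message: str
    rule: str            # identifier of the shape/rule that produced it


def group_for_display(
    results: List[ValidationResult],
) -> Tuple[Dict[str, List[ValidationResult]], List[ValidationResult]]:
    """Split results into per-property lists and entity-level issues."""
    by_property: Dict[str, List[ValidationResult]] = defaultdict(list)
    entity_level: List[ValidationResult] = []
    for result in results:
        if result.path is None:
            entity_level.append(result)
        else:
            by_property[result.path].append(result)
    return dict(by_property), entity_level


def hover_text(results: List[ValidationResult]) -> str:
    """Text shown when hovering over a property: one line per message."""
    return "\n".join(f"{r.severity.capitalize()}: {r.message}" for r in results)
```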

Figure: Data Quality Validation in GEV

2. Pending Changes Workflow

All new entities or modifications to existing entities proceed through Accurids’ structured Pending Changes workflow, which incorporates comprehensive Data Quality checks.

Hovering over or expanding an individual change reveals its detailed validation results: the exact properties affected, the corresponding messages, and the rules that triggered each result.

When users submit changes to the next approval stage, either individually or several at once, a validation summary popup appears. It includes a dedicated validation column that summarizes the issues found during validation:

  • Number of errors shown in red (errors must be resolved before submission)
  • Number of warnings shown in yellow
  • Number of informational messages shown in grey
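The validation column can be thought of as a per-change count of results by severity. The sketch below assumes a hypothetical `PendingChange` record holding a change identifier and the severities of its validation results; it only illustrates the counting, not Accurids' data model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PendingChange:
    # Hypothetical model: a change id plus the severities of its results.
    change_id: str
    severities: List[str] = field(default_factory=list)


def summary_row(change: PendingChange) -> Dict[str, object]:
    """One row of the validation column: counts per severity."""
    counts = Counter(change.severities)  # missing keys count as 0
    return {
        "change": change.change_id,
        "errors": counts["error"],      # displayed in red
        "warnings": counts["warning"],  # displayed in yellow
        "infos": counts["info"],        # displayed in grey
    }


def summarize(changes: List[PendingChange]) -> List[Dict[str, object]]:
    """Rows for every change selected for submission."""
    return [summary_row(c) for c in changes]
```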

Figure: Data Quality Validation in Pending Changes

3. Dataset-Level Validation Status

Accurids’ dataset overview page incorporates a new validation status indicator titled "Data Quality":

  • If data quality issues are detected in a dataset (even when pipelines execute successfully), a small red informational icon appears adjacent to the dataset's general status indicator.
  • Hovering over this icon displays a quick summary of data quality issues identified in that dataset, specifically listing the count of errors, warnings, and informational messages.
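The key point of the indicator is that it depends only on the data quality counts, not on whether the pipeline succeeded. A minimal sketch of that rule, assuming a hypothetical `DatasetStatus` record; the hover text format is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetStatus:
    # Hypothetical fields: pipeline outcome plus data quality counts.
    pipeline_succeeded: bool
    errors: int = 0
    warnings: int = 0
    infos: int = 0


def _count(n: int, word: str) -> str:
    """'1 error', '2 errors', ..."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def data_quality_hover(status: DatasetStatus) -> Optional[str]:
    """Hover text for the red icon, or None when no icon should be shown.

    pipeline_succeeded is deliberately ignored: a successful pipeline run
    can still produce data that violates the constraints.
    """
    if not (status.errors or status.warnings or status.infos):
        return None
    return ", ".join([
        _count(status.errors, "error"),
        _count(status.warnings, "warning"),
        _count(status.infos, "info"),
    ])
```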

Within the individual dataset details page, under the "Status" section, the system provides a concise Data Quality summary (e.g., "Found 4 errors, 2 warnings, 2 infos"). Clicking this summary opens a dedicated detailed view:

  • This dedicated Data Quality view lists all affected entities within the dataset.
  • Users can filter and search entities based on severity or specific properties.
  • Each entity entry can be expanded to show its detailed validation results, identifying the problematic values, the validation rules that were triggered, and actionable messages for resolution.
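The dataset-level view combines three operations: a count summary, a list of affected entities, and filtering by severity, property, or free text. The sketch below shows one way to express them over a flat list of issues. The `Issue` record, its fields, and the function names are hypothetical, and the search semantics (case-insensitive substring over entity and message) are an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Issue:
    # Hypothetical record for one entry in the Data Quality view.
    entity: str
    path: str
    value: str
    severity: str  # "error", "warning" or "info"
    rule: str
    message: str


def _count(n: int, word: str) -> str:
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def summary(issues: Iterable[Issue]) -> str:
    """Status-section style summary, e.g. 'Found 4 errors, 2 warnings, 2 infos'."""
    counts = Counter(i.severity for i in issues)
    return "Found " + ", ".join([
        _count(counts["error"], "error"),
        _count(counts["warning"], "warning"),
        _count(counts["info"], "info"),
    ])


def affected_entities(issues: Iterable[Issue]) -> List[str]:
    """Distinct affected entities, sorted for a stable listing."""
    return sorted({i.entity for i in issues})


def filter_issues(issues: Iterable[Issue], severity: Optional[str] = None,
                  path: Optional[str] = None,
                  text: Optional[str] = None) -> List[Issue]:
    """Keep issues matching every given filter (None means 'any')."""
    out = []
    for i in issues:
        if severity is not None and i.severity != severity:
            continue
        if path is not None and i.path != path:
            continue
        if text is not None and text.lower() not in f"{i.entity} {i.message}".lower():
            continue
        out.append(i)
    return out
```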

Figure: Data Quality Validation in Dataset

Resolution and Submission Requirements

  • New Entities: Submission is blocked until all error-level data quality issues are resolved. Users must correct these issues in the GEV before submitting the entity.
  • Pending Changes: As in the GEV, errors must be resolved before changes can be submitted. Warnings and informational messages do not prevent progression through the Pending Changes workflow, but administrators and contributors should address them proactively to maintain data integrity.
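The resolution rules reduce to a simple gate: submission is allowed only when no error-level result remains, while warnings and informational messages are surfaced for review. A minimal sketch, assuming a hypothetical `Result` record and `check_submission` helper; it does not model how Accurids handles a batch in which only some changes contain errors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Result:
    # Hypothetical validation result: severity plus message.
    severity: str  # "error", "warning" or "info"
    message: str


def check_submission(results: List[Result]) -> Tuple[bool, List[Result], List[Result]]:
    """Return (allowed, blocking, advisory).

    blocking: error-level results that must be fixed before submission.
    advisory: warnings and infos that should be reviewed but never block.
    """
    blocking = [r for r in results if r.severity == "error"]
    advisory = [r for r in results if r.severity in ("warning", "info")]
    return (not blocking, blocking, advisory)
```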

Conclusion

The introduction of the Data Quality feature significantly enhances Accurids’ capacity to maintain data accuracy and consistency. By clearly identifying and communicating validation issues at multiple stages of data management, Accurids empowers data stewards, administrators, and contributors to proactively manage and improve overall data quality.