Data Quality Validation
Accurids expands its existing "Data Integrity with SHACL Constraints" capability by introducing a comprehensive Data Quality feature. This enhancement provides structured validation and clear visibility into data quality issues, ensuring that all entity data consistently adheres to predefined standards and constraints.
The Data Quality feature validates data against SHACL rules defined in the dedicated "constraints" dataset. It provides immediate feedback during entity creation, modification, and approval workflows, and offers comprehensive data quality reports at the dataset level for monitoring and addressing issues across multiple entities.
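As an illustration, the sketch below shows the kind of SHACL shape that could live in the "constraints" dataset. The ex: namespace, the ex:Person class, and the ex:height property are hypothetical placeholders for a customer-specific model; actual shapes in the constraints dataset will differ.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/constraints#> .

# Hypothetical constraint: every ex:Person entity must carry exactly one
# decimal "Height" value.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path     ex:height ;
        sh:datatype xsd:decimal ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:message  "Height is mandatory and must be a single decimal value."
    ] .
```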
Data Quality Validation Process
The Data Quality validation process occurs at multiple stages and locations:
1. Global Entity View (GEV)
When creating or editing entities via the Global Entity View (GEV), Accurids performs immediate inline validation checks. Validation messages clearly identify data quality issues at three severity levels:
- Errors (Red): Critical issues that must be resolved before new entities or changes can be submitted, for example a missing mandatory field such as "Height" or an improperly formatted email address.
- Warnings (Yellow): Recommended improvements or corrections that do not block submission but indicate deviations from preferred standards (e.g., missing "Gender" information).
- Informational Messages (Grey): Suggestions or minor issues that serve to improve data completeness or accuracy but have no impact on submission.
In the GEV, validation feedback is displayed directly next to affected properties. Users can hover over properties to view detailed error, warning, or informational messages. Entity-level issues prominently indicate required corrections and offer quick actions ("Fix") to resolve identified problems.
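These three levels presumably map onto the standard SHACL severities sh:Violation, sh:Warning, and sh:Info. Continuing the hypothetical ex:PersonShape sketch from above (ex:gender and ex:nickname are likewise made-up properties), the severity of each rule can be declared directly on the property shape:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/constraints#> .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    # Error (red): blocks submission until resolved.
    sh:property [
        sh:path     ex:height ;
        sh:minCount 1 ;
        sh:severity sh:Violation ;
        sh:message  "Height is missing."
    ] ;
    # Warning (yellow): highlighted, but does not block submission.
    sh:property [
        sh:path     ex:gender ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message  "Gender information is recommended."
    ] ;
    # Informational (grey): completeness hint only.
    sh:property [
        sh:path     ex:nickname ;
        sh:minCount 1 ;
        sh:severity sh:Info ;
        sh:message  "Adding a nickname improves data completeness."
    ] .
```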
2. Duplicate Check Before Entity Submission
Before a new entity is submitted or saved, or when changes to an existing entity make it highly similar to another entity, Accurids performs a duplicate check. The user is informed about very similar entities that already exist in the same dataset.
- Behavior: If very similar entities are found while creating or changing an entity, Accurids presents these existing entities before submission.
- User Actions:
- Discard: The user can choose to discard the new entity creation or the proposed changes.
- Submit Anyway: The user can choose to ignore the information on similar entities and proceed with submitting the new entity or changes.
3. Pending Changes Workflow
All new entities or modifications to existing entities proceed through Accurids’ structured Pending Changes workflow, which incorporates comprehensive Data Quality checks.
Hovering over or expanding individual changes reveals detailed validation results, specifying the affected properties, the corresponding messages, and the rules that triggered them (illustrated by the sketch after the list below).
When changes are submitted to the next approval stage (individually or several at once), a validation summary popup is presented. It includes a dedicated validation column that summarizes the issues identified during validation:
- Number of errors shown in red (errors must be resolved before submission)
- Number of warnings shown in yellow
- Number of informational messages shown in grey
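Each issue in this summary corresponds to an individual SHACL validation result, which carries exactly the pieces of information surfaced in the workflow: the affected entity, the affected property, the message, the severity, and the rule that triggered it. A minimal sketch with hypothetical ex: identifiers:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/data#> .

# One illustrative validation result (hypothetical identifiers).
[] a sh:ValidationResult ;
    sh:focusNode                 ex:person-42 ;    # affected entity
    sh:resultPath                ex:height ;       # affected property
    sh:resultSeverity            sh:Violation ;    # counted as an error
    sh:resultMessage             "Height is missing." ;
    sh:sourceShape               ex:HeightShape ;  # rule that fired
    sh:sourceConstraintComponent sh:MinCountConstraintComponent .
```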
4. Dataset-Level Validation Status
Accurids provides comprehensive data quality reporting at the dataset level, helping users monitor and resolve issues across multiple entities with ease.
At-a-Glance Status in Dataset Overview
Each dataset includes a Data Quality status indicator in the "Status" section of the dataset detail page. If validation issues are detected, a clickable summary (e.g., 16 errors, 2 warnings, 10 notes) appears, allowing users to open the full Data Quality report.
Detailed Data Quality Report
Clicking on the status link opens a dedicated Data Quality report view, allowing for both high-level and granular exploration of issues. This interface includes:
Dual View Options
- By Rule tab:
  - Lists the validation rules that triggered issues, grouped by SHACL shape.
  - Users can expand each rule to see the affected entities and the exact validation messages.
  - Rules can be sorted by severity, number of issues, or other columns.
- By Entity tab:
  - Groups issues by individual entity.
  - Users can expand each entity to see all rules and messages that apply.
  - Sorting and filtering are also available per entity.
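Both tabs are views over the same underlying validation results. Assuming the report follows the standard SHACL results vocabulary, the "By Rule" tab corresponds to grouping results on sh:sourceShape, while the "By Entity" tab corresponds to grouping on sh:focusNode, as the hypothetical fragment below illustrates:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/data#> .

# Two results for the same entity, triggered by two different rules.
# "By Rule" groups on sh:sourceShape; "By Entity" groups on sh:focusNode.
[] a sh:ValidationResult ;
    sh:focusNode      ex:person-42 ;
    sh:sourceShape    ex:HeightShape ;
    sh:resultSeverity sh:Violation ;
    sh:resultMessage  "Height is missing." .

[] a sh:ValidationResult ;
    sh:focusNode      ex:person-42 ;
    sh:sourceShape    ex:GenderShape ;
    sh:resultSeverity sh:Warning ;
    sh:resultMessage  "Gender information is recommended." .
```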
Filtering and Search
- Filters are available for:
  - Entity types
  - Rules
  - Severity (Error, Warning, Note)
  - Issue type
  - Specific entities
- A search field supports keyword highlighting in rule names, messages, and URIs.
- Filter “bubbles” appear above the table to keep selections visible and manageable.
Interactive Features
- Copy & Visit: Hovering over an entity URI or rule shows options to copy the URI or navigate directly to the entity.
- Sorting & Pagination: Both tables support sorting by any column and adjusting the number of rows per page.
- Issue Count Badges: Compact badges next to entities show counts of errors, warnings, and notes. Hovering shows detailed breakdowns.
- Consistency Checks: Counts shown in summaries, badges, and expanded details are fully synchronized.
Export
- Users can export the complete Data Quality report as a Turtle (.ttl) file.
- Export includes all issues, regardless of currently applied filters.
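Assuming the export follows the standard SHACL results vocabulary, the Turtle file would contain a single sh:ValidationReport node that links to every individual result via sh:result. A minimal sketch with hypothetical entities and messages:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/data#> .

# Hypothetical skeleton of an exported Data Quality report.
[] a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        sh:focusNode      ex:person-42 ;
        sh:resultSeverity sh:Violation ;
        sh:resultMessage  "Height is missing."
    ] , [
        a sh:ValidationResult ;
        sh:focusNode      ex:person-7 ;
        sh:resultSeverity sh:Info ;
        sh:resultMessage  "Adding a nickname improves data completeness."
    ] .
```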
Data Quality Validation Scope
Validation results shown here are based on SHACL rules defined in the dedicated “constraints” dataset and applied during ingestion, editing, or approval.
Resolution and Submission Requirements
- New Entities: Submission is blocked until all error-level data quality issues are resolved. Users must correct these issues in the GEV before the entity can be submitted.
- Pending Changes: Although warnings and informational messages do not prevent progression through the Pending Changes workflow, administrators and contributors should address these messages proactively to ensure optimal data integrity.
Conclusion
The introduction of the Data Quality feature significantly enhances Accurids’ capacity to maintain data accuracy and consistency. By clearly identifying and communicating validation issues at multiple stages of data management, Accurids empowers data stewards, administrators, and contributors to proactively manage and improve overall data quality.