Dataset Management
Datasets Overview
To view and browse all loaded datasets, go to the Datasets view by clicking the list icon in the left sidebar. The Datasets view lists all currently loaded datasets, including metadata such as the status, tag, creation time, and content update date, as well as the number of entities and statements. The Search field in the upper right corner of the Datasets overview allows you to search for a particular dataset within all loaded datasets.
For each dataset, you can continue with several actions, such as viewing the Dataset details or directly browsing the dataset by clicking the Table view or Tree view icon. More information on browsing datasets can be found in the Search & Browse section.
Dataset Properties
The Dataset details view gives you an overview of the following dataset properties.
Property | Description |
---|---|
Name | A descriptive title that is displayed in the Datasets view. |
Tag | A short and unique mnemonic abbreviation or code for the dataset. The tag is used as a shortcut throughout Accurids, e.g., in the search result display or search filters. |
Description | An informative description of the dataset. |
Color | The color of the badge that indicates the dataset throughout different Accurids screens. |
Load status | A dataset must be loaded before you can manage it. Possible values: loading completed successfully, loading in progress, loading not yet started, loading failed. |
Index status | A dataset must be successfully indexed before you can work with its content. Possible values: indexing completed successfully, indexing in progress, indexing not yet started, indexing failed. |
Analysis status | A dataset is analyzed regarding structure and quality. Possible values: analysis completed successfully, analysis in progress, analysis not yet started, analysis failed. |
Created | The date and time when the dataset was initialized. |
Created by | The user who created the dataset. |
Updated | The date and time when the metadata of the dataset was last changed. |
Last updated by | The user who last changed the dataset metadata. |
Content updated | The date and time when the content of the dataset was last changed. |
Content updated by | The user who last changed the dataset content. |
Storage Size | The storage size of the dataset. |
Entities | The number of indexed entities. |
Statements | The number of triples. |
Unique predicates | The number of unique predicates. |
Mappings | The total number of mapping triples contained in the dataset. |
Hierarchy properties | The list of detected hierarchical properties, such as rdfs:subClassOf or skos:broader. The value is empty if no hierarchical property has been found in the dataset. |
Cycles | The number of cycles formed by some hierarchical property. |
ID Generator | The ID Generator associated with the dataset. Only available with the PID Generator Module. |
Working with Datasets
Users in Accurids can be given a specific role that restricts or allows particular actions. Regarding dataset management, standard users cannot upload or edit any dataset. Contributors can upload datasets and edit or delete the datasets they own. Admins can upload, edit, or delete any dataset that has been loaded, regardless of who originally uploaded it. More information on user roles can be found in the Platform Administration section.
Creating a New Dataset
In the Datasets view, click in the upper right corner to create a new dataset. In the dialog that opens, you must specify the name and tag, and can optionally add a description.
Multiple data sources can be used, and they can also be combined with one another.
Finally, click SAVE DATASET to start the upload. You can monitor the loading and indexing progress in the Datasets view. After successful ingestion and indexing, you are notified and can start searching in the dataset.
Using RDF File
Click ADD FILES to select all files that should be loaded into the dataset. You can continue to add data from files by repeatedly clicking ADD FILES.
Using URI
Click in the Import via URIs section and paste the URL of the dataset that should be loaded.
Using SPARQL
Select a configured SPARQL Endpoint in the Import via SPARQL Endpoint section and choose all graphs you want to include in the dataset.
Configuration of a SPARQL Endpoint
To enter the configuration for a SPARQL Endpoint, click and fill in the required parameters. The endpoint configuration is remembered per user and is visible only to you.
Using Relational Database
Select one or more database endpoints in the Import via Database Endpoint section, and upload the transformation file (.rqg) by clicking ADD FILES. See the Transformation file section for more details.
Configuration of a Database Endpoint
To enter the configuration for a Database Endpoint, click and fill in the required parameters. The endpoint configuration is remembered per user and is visible only to you.
Using CSV/JSON
Click ADD FILES to select the CSV/JSON data sources. A transformation file (.rqg) is required; upload it by clicking ADD FILES. See the Transformation file section for more details.
Updating the Content of a Dataset
Click in the Dataset details view to update the content of a dataset.
In the Data handling section, you can choose whether the update will append to the existing content or replace the current content entirely.
If you want to update the name, tag, description, or color of the dataset, you can do that directly in the Dataset details view by simply clicking the pencil icon.
Controlling Visibility of Data
(This feature is available since Accurids 2.7.0)
Accurids allows you to configure which users can view the content of a dataset. A dataset can either be public (accessible to all users) or private. If private, only explicitly configured users can access it.
To facilitate access management, Accurids provides user groups that allow you to assign access permissions to a group of users instead of managing permissions for each user individually. An administrator can manage user groups. See the corresponding section in the admin guide for further details.
The owner of a dataset and an administrator can configure who is able to view a dataset. This can be done in the Dataset details view in the "Visibility" box. The switch in the upper part of the box allows you to toggle whether the dataset is public or private.
- If a dataset is set to public, every user of the platform can see it and search its content.
- If a dataset is set to private, only the owner, administrators and explicitly configured users can see it. The visibility box shows who can access the dataset. By pressing the plus button in the upper right corner, you can add a user or group of users to this list. The trash can icon next to a user or group allows you to revoke access.
Regardless of being private or public, only the owner and administrators can modify a dataset.
If a dataset was created with a previous version of Accurids (before Accurids 2.7.0), it is set to public by default. However, the administrator or owner can edit the settings just like any other dataset.
Updating Dataset Ownership
Accurids allows the ownership of a dataset to be transferred from one user to another. This is particularly useful when a dataset needs to be managed by a different user, such as when responsibilities change or when a project transitions to a new team member.
To update the ownership of a dataset, follow these steps:
- Open the Dataset Details View: Navigate to the Datasets overview and select the dataset whose ownership you wish to change. This will open the Dataset details view.
- Change the Owner: In the upper right corner of the Dataset details view, you will see an icon labeled Change owner. Click on this icon to initiate the ownership change process.
- Select a New Owner: A dropdown menu will appear, displaying a list of users who can be selected as the new owner. You can scroll through the list or start typing a username to refine the search results. Once you have found the appropriate user, click on their name to select them and confirm the change by clicking the Save button. The ownership of the dataset will be updated immediately.
- Verification: The new owner's name will now be displayed under the Owner field in the Dataset details view, indicating that the transfer was successful.
Note: Only the current owner or an administrator can transfer ownership of a dataset.
Download a Dataset
In the Dataset details view, click to download the dataset. Depending on the file size, this may take a few minutes.
Remove a Dataset
To remove a dataset, go to the Dataset details view and click the trash can icon, then confirm the deletion. This action cannot be undone.
Search for a Dataset
The Search field in the upper right corner of the Datasets overview allows you to search for a particular dataset within all loaded datasets.
Dataset Requirements
To be successfully loaded, indexed, and displayed, a dataset must fulfill the following requirements:
- RDF Syntax: The dataset must use valid RDF syntax in one of the supported serializations, such as Turtle, N3, or RDF/XML.
- RDF Type: All entities that should be indexed must have a specified rdf:type property.
Transformation file
When uploading relational databases or CSV/JSON data sources, a transformation file is required to map the source format into RDF triples. The transformation file has the extension .rqg. The library used for the transformation is sparql-generate. More advanced examples can be found on its website.
Example with Relational Database
Assume we have already configured a database endpoint called dbConn, and that this database contains a table called user that we want to map into triples and upload.
The table looks like below:
id | email | dob | first_name | last_name |
---|---|---|---|---|
1 | first.user@example.com | 1990-01-01 | First | User |
2 | second.user@example.com | 1991-02-03 | Second | User |
3 | third.user@example.com | 1991-05-08 | Third | User |
The transformation file (.rqg):
PREFIX accuridsIterator: <https://accurids.com/iterator/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
BASE <http://example.org/>
GENERATE {
<id/{ xsd:string(?id) }> a <User>;
<email> ?email .
}
ITERATOR accuridsIterator:SQL(<https://accurids.com/databaseEndpoint/dbConn>, "select id, email from user") AS ?id ?email
The generated triples:
http://example.org/id/1 @rdf:type http://example.org/User
http://example.org/id/1 @http://example.org/email "first.user@example.com"
http://example.org/id/2 @rdf:type http://example.org/User
http://example.org/id/2 @http://example.org/email "second.user@example.com"
http://example.org/id/3 @rdf:type http://example.org/User
http://example.org/id/3 @http://example.org/email "third.user@example.com"
The iterator for loading a relational database is https://accurids.com/iterator/SQL. The previously created database endpoint connection is also referenced by a URI, formed from the prefix https://accurids.com/databaseEndpoint/ followed by the endpoint name (in this example, dbConn).
CAVEAT: A URI must be a string. Hence, the xsd:string conversion is needed because the original data type of id is an integer.
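The mapping the .rqg file expresses can be sketched in plain Python (illustration only; the actual transformation is performed by sparql-generate): each row returned by the SQL iterator becomes one User entity with an email triple.

```python
# Hypothetical sketch of the SQL-to-triples mapping above (not Accurids
# internals). Each (id, email) row becomes two triples.
rows = [(1, "first.user@example.com"), (2, "second.user@example.com")]

triples = []
for user_id, email in rows:
    # id is an integer in the database, so it must be converted to a
    # string before being embedded in a URI -- the same reason the
    # .rqg template uses xsd:string(?id).
    subject = f"<http://example.org/id/{str(user_id)}>"
    triples.append(f"{subject} a <http://example.org/User> .")
    triples.append(f'{subject} <http://example.org/email> "{email}" .')
```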
Example with CSV
Assume we have a CSV file named persons.csv like this:
PersonId,Name,Phone,Email,Birthdate,Height,Weight
1,Jin Lott,374-5365,nonummy@nonsollicitudina.net,1990-10-23T09:39:36+01:00,166.58961852476,72.523064012179
2,Ulric Obrien,1-772-516-9633,non.arcu@velit.co.uk,1961-11-18T02:18:23+01:00,164.38438947455,68.907470544061
3,Travis Wilkerson,240-1629,felis@Duisac.co.uk,1956-03-05T15:57:29+01:00,163.47434097479,64.217840002146
The transformation file (.rqg):
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
BASE <http://example.org/>
GENERATE {
?personIRI a foaf:Person ;
foaf:name ?name;
foaf:mbox ?email ;
foaf:phone ?phone ;
schema:birthDate ?birthdate ;
schema:height ?height ;
schema:weight ?weight .
}
SOURCE <persons.csv> AS ?persons
ITERATOR iter:CSV(?persons) AS ?personId ?name ?phoneStr ?emailStr ?birthdateStr ?heightStr ?weightStr
WHERE {
BIND( URI( CONCAT( "http://example.com/person/", ?personId ) ) AS ?personIRI )
BIND( URI( CONCAT( "tel:", ?phoneStr ) ) AS ?phone )
BIND( URI( CONCAT( "mailto:", ?emailStr ) ) AS ?email )
BIND( xsd:dateTime( ?birthdateStr ) AS ?birthdate )
BIND( xsd:decimal( ?heightStr ) AS ?height )
BIND( xsd:decimal( ?weightStr ) AS ?weight )
}
The generated triples:
http://example.com/person/1 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/1 @http://xmlns.com/foaf/0.1/name "Jin Lott"
http://example.com/person/1 @http://xmlns.com/foaf/0.1/mbox mailto:nonummy@nonsollicitudina.net
http://example.com/person/1 @http://xmlns.com/foaf/0.1/phone tel:374-5365
http://example.com/person/1 @http://schema.org/birthDate "1990-10-23T09:39:36+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/1 @http://schema.org/height "166.58961852476"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/1 @http://schema.org/weight "72.523064012179"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/2 @http://xmlns.com/foaf/0.1/name "Ulric Obrien"
http://example.com/person/2 @http://xmlns.com/foaf/0.1/mbox mailto:non.arcu@velit.co.uk
http://example.com/person/2 @http://xmlns.com/foaf/0.1/phone tel:1-772-516-9633
http://example.com/person/2 @http://schema.org/birthDate "1961-11-18T02:18:23+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/2 @http://schema.org/height "164.38438947455"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @http://schema.org/weight "68.907470544061"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/3 @http://xmlns.com/foaf/0.1/name "Travis Wilkerson"
http://example.com/person/3 @http://xmlns.com/foaf/0.1/mbox mailto:felis@Duisac.co.uk
http://example.com/person/3 @http://xmlns.com/foaf/0.1/phone tel:240-1629
http://example.com/person/3 @http://schema.org/birthDate "1956-03-05T15:57:29+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/3 @http://schema.org/height "163.47434097479"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @http://schema.org/weight "64.217840002146"^^http://www.w3.org/2001/XMLSchema#decimal
The iterator for loading the CSV is http://w3id.org/sparql-generate/iter/CSV.
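For comparison, the CSV mapping above can be sketched in plain Python (illustration only; Accurids applies the .rqg file via sparql-generate). The tel: and mailto: IRIs correspond to the URI(CONCAT(...)) BINDs in the transformation file.

```python
import csv
import io

# Hypothetical plain-Python equivalent of the CSV-to-triples mapping
# (not Accurids internals); a trimmed-down persons.csv is inlined here.
data = """PersonId,Name,Phone,Email
1,Jin Lott,374-5365,nonummy@nonsollicitudina.net
2,Ulric Obrien,1-772-516-9633,non.arcu@velit.co.uk"""

triples = []
for row in csv.DictReader(io.StringIO(data)):
    person = f"<http://example.com/person/{row['PersonId']}>"
    triples.append(f"{person} a <http://xmlns.com/foaf/0.1/Person> .")
    triples.append(f'{person} <http://xmlns.com/foaf/0.1/name> "{row["Name"]}" .')
    # Phone and email become IRIs, mirroring the tel:/mailto: BINDs.
    triples.append(f"{person} <http://xmlns.com/foaf/0.1/phone> <tel:{row['Phone']}> .")
    triples.append(f"{person} <http://xmlns.com/foaf/0.1/mbox> <mailto:{row['Email']}> .")
```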
Example with JSON
Assume we have a JSON file named persons.json like this:
[
{
"PersonId": 1,
"Name": "Jin Lott",
"Phone": "374-5365",
"Email": "nonummy@nonsollicitudina.net",
"Birthdate": "1990-10-23T09:39:36+01:00",
"Height": 166.58961852476,
"Weight": 72.523064012179
},
{
"PersonId": 2,
"Name": "Ulric Obrien",
"Phone": "1-772-516-9633",
"Email": "non.arcu@velit.co.uk",
"Birthdate": "1961-11-18T02:18:23+01:00",
"Height": 164.38438947455,
"Weight": 68.907470544061
},
{
"PersonId": 3,
"Name": "Travis Wilkerson",
"Phone": "240-1629",
"Email": "felis@Duisac.co.uk",
"Birthdate": "1956-03-05T15:57:29+01:00",
"Height": 163.47434097479,
"Weight": 64.217840002146
}
]
The transformation file (.rqg):
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
BASE <http://example.org/>
GENERATE {
?personIRI a foaf:Person ;
foaf:name ?name;
foaf:mbox ?email ;
foaf:phone ?phone ;
schema:birthDate ?birthdate ;
schema:height ?height ;
schema:weight ?weight .
}
SOURCE <persons.json> AS ?persons
ITERATOR iter:JSONSurfer(?persons, "$[*]",
"$.PersonId",
"$.Name",
"$.Phone",
"$.Email",
"$.Birthdate",
"$.Height",
"$.Weight"
) AS ?I1 ?personId ?name ?phoneStr ?emailStr ?birthdateStr ?heightStr ?weightStr
WHERE {
BIND( URI( CONCAT( "http://example.com/person/", xsd:string(?personId) ) ) AS ?personIRI )
BIND( URI( CONCAT( "tel:", ?phoneStr ) ) AS ?phone )
BIND( URI( CONCAT( "mailto:", ?emailStr ) ) AS ?email )
BIND( xsd:dateTime( ?birthdateStr ) AS ?birthdate )
BIND( xsd:decimal( ?heightStr ) AS ?height )
BIND( xsd:decimal( ?weightStr ) AS ?weight )
}
The generated triples:
http://example.com/person/1 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/1 @http://xmlns.com/foaf/0.1/name "Jin Lott"
http://example.com/person/1 @http://xmlns.com/foaf/0.1/mbox mailto:nonummy@nonsollicitudina.net
http://example.com/person/1 @http://xmlns.com/foaf/0.1/phone tel:374-5365
http://example.com/person/1 @http://schema.org/birthDate "1990-10-23T09:39:36+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/1 @http://schema.org/height "166.58961852476"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/1 @http://schema.org/weight "72.523064012179"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/2 @http://xmlns.com/foaf/0.1/name "Ulric Obrien"
http://example.com/person/2 @http://xmlns.com/foaf/0.1/mbox mailto:non.arcu@velit.co.uk
http://example.com/person/2 @http://xmlns.com/foaf/0.1/phone tel:1-772-516-9633
http://example.com/person/2 @http://schema.org/birthDate "1961-11-18T02:18:23+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/2 @http://schema.org/height "164.38438947455"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @http://schema.org/weight "68.907470544061"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/3 @http://xmlns.com/foaf/0.1/name "Travis Wilkerson"
http://example.com/person/3 @http://xmlns.com/foaf/0.1/mbox mailto:felis@Duisac.co.uk
http://example.com/person/3 @http://xmlns.com/foaf/0.1/phone tel:240-1629
http://example.com/person/3 @http://schema.org/birthDate "1956-03-05T15:57:29+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/3 @http://schema.org/height "163.47434097479"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @http://schema.org/weight "64.217840002146"^^http://www.w3.org/2001/XMLSchema#decimal
The iterator for loading the JSON is http://w3id.org/sparql-generate/iter/JSONSurfer.
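The JSONSurfer iteration can likewise be sketched in plain Python (illustration only): "$[*]" walks the top-level array, and each "$.Field" path is effectively a key lookup on the current object.

```python
import json

# Hypothetical plain-Python equivalent of the JSONSurfer iteration above
# (not Accurids internals); a trimmed-down persons.json is inlined here.
persons = json.loads("""[
  {"PersonId": 1, "Name": "Jin Lott", "Email": "nonummy@nonsollicitudina.net"},
  {"PersonId": 2, "Name": "Ulric Obrien", "Email": "non.arcu@velit.co.uk"}
]""")

triples = []
for person in persons:  # corresponds to ITERATOR ... "$[*]"
    # PersonId is a JSON number, hence the xsd:string(?personId)
    # conversion in the .rqg file before building the IRI.
    iri = f"<http://example.com/person/{str(person['PersonId'])}>"
    triples.append(f"{iri} a <http://xmlns.com/foaf/0.1/Person> .")
    triples.append(f'{iri} <http://xmlns.com/foaf/0.1/mbox> <mailto:{person["Email"]}> .')
```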