
Dataset Management

Datasets Overview

To view and browse all loaded datasets, go to the Datasets view by clicking the list icon in the left sidebar. The Datasets view provides a list of all currently loaded datasets, including metadata such as the status, the tag, the creation time, and the content update date, as well as the number of entities and statements. The Search field in the upper right corner of the Datasets overview allows you to search for a particular dataset among all loaded datasets.

For each dataset, you can continue with several actions such as viewing the Dataset details by clicking , or directly browsing the dataset by clicking the Table view icon or Tree view icon . More information on browsing datasets can be found in the Search & Browse section.

Dataset Properties

The Dataset details view gives you an overview of the following dataset properties.

Name: A descriptive title that is displayed in the Datasets view.
Tag: A short and unique mnemonic abbreviation or code for the dataset. The tag is used as a shortcut throughout Accurids, e.g., in the search result display or in search filters.
Description: An informative description of the dataset.
Color: The color of the badge that identifies the dataset throughout the different Accurids screens.
Load status: A dataset must be loaded before you can manage it. Possible values:
  • loading completed successfully
  • loading in progress
  • loading not yet started
  • loading failed
Index status: A dataset must be successfully indexed before you can work with its content. Possible values:
  • indexing completed successfully
  • indexing in progress
  • indexing not yet started
  • indexing failed
Analysis status: A dataset is analyzed regarding structure and quality. Possible values:
  • analysis completed successfully
  • analysis in progress
  • analysis not yet started
  • analysis failed
Created: The date and time when the dataset was initialized.
Created by: The user who created the dataset.
Updated: The date and time when the metadata of the dataset was last changed.
Last updated by: The user who last changed the dataset metadata.
Content updated: The date and time when the content of the dataset was last changed.
Content updated by: The user who last changed the dataset content.
Storage Size: The storage size of the dataset.
Entities: The number of indexed entities.
Statements: The number of triples.
Unique predicates: The number of unique predicates.
Mappings: The total number of mapping triples contained in the dataset.
Hierarchy properties: The list of detected hierarchical properties, such as rdfs:subClassOf or skos:broader. The value is empty if no hierarchical property has been found in the dataset.
Cycles: The number of cycles formed by a hierarchical property.
ID Generator: The ID generator associated with the dataset. Only available with the PID Generator Module.

Working with Datasets

Users in Accurids can be given a specific role that restricts or allows particular actions. Regarding dataset management, standard users cannot upload or edit any dataset. Contributors can upload datasets and edit or delete them. Admins can upload, edit, or delete any loaded dataset, regardless of who originally uploaded it. More information on user roles can be found in the Platform Administration section.

Creating a New Dataset

When on the Datasets view, click in the upper right corner to create a new dataset. In the upcoming dialog, you must specify the name, tag, and, optionally, a description.

Multiple data sources can be used and combined with each other.

Finally, click SAVE DATASET to start the upload. You can monitor the loading and indexing progress in the Datasets view. After the dataset has been successfully ingested and indexed, you are notified and can start searching in it.

Using RDF File

Click ADD FILES to select all files that should be loaded into the dataset. You can continue to add data from files by repeatedly clicking ADD FILES.

Using URI

Click in the Import via URIs section and paste the URL of the dataset that should be loaded.

Using SPARQL

Select a configured SPARQL endpoint in the Import via SPARQL Endpoint section and choose all graphs you want to include in the dataset.

Configuration of a SPARQL Endpoint

To enter the configuration for a SPARQL endpoint, click and fill in the required parameters. The endpoint configuration is remembered per user and only visible to you.

Using Relational Database

Select one or more database endpoints in the Import via Database Endpoint section, and upload the transformation file (.rqg) by clicking ADD FILES. See the Transformation file section for more details.

Configuration of a Database Endpoint

To enter the configuration for a database endpoint, click and fill in the required parameters. The endpoint configuration is remembered per user and only visible to you.

Using CSV/JSON

Click ADD FILES to select the CSV/JSON data sources. A transformation file (.rqg) is also required; upload it by clicking ADD FILES. See the Transformation file section for more details.

Updating the Content of a Dataset

Click in the Dataset details view to update the content of a dataset.

In the Data handling section, you can choose whether the update appends to the existing content or replaces it, clearing the current content.

If you want to update the name, tag, description, or color of the dataset, you can do so directly in the Dataset details view by clicking the pencil icon.

Download a Dataset

In the Dataset details view, click to download the dataset. Depending on the file size, this may take a few minutes.

Remove a Dataset

To remove a dataset, go to the Dataset details view and click the trashcan icon , then confirm the deletion. This process cannot be undone.

Search for a Dataset

The Search field in the upper right corner of the Datasets overview allows you to search for a particular dataset within all loaded datasets.

Dataset Requirements

To be successfully loaded, indexed, and displayed, a dataset has to fulfill the following requirements:

  • RDF Syntax: The dataset must use valid RDF syntax in one of the supported serializations, such as Turtle, N3, or RDF/XML.
  • RDF Type: All entities that should be indexed must have a specified rdf:type property.
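For example, the following Turtle snippet satisfies both requirements (the entity and property names are purely illustrative):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# The entity has an explicit rdf:type, so it will be indexed.
ex:Aspirin rdf:type ex:Drug ;
           rdfs:label "Aspirin" .
```

An entity that only carries literal properties but no rdf:type statement would be skipped during indexing.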

Transformation file

A transformation file is required to map relational database, CSV, or JSON data sources into RDF triples for uploading. The transformation file has the extension .rqg. The library used for the transformation is sparql-generate. More advanced examples can be found on its website.
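Structurally, every transformation file follows the same pattern: a GENERATE block describing the output triples, a data source or iterator that binds variables, and an optional WHERE block for conversions. The following schematic sketch uses placeholder names and is not a runnable query:

```sparql
PREFIX iter: <http://w3id.org/sparql-generate/iter/>

BASE <http://example.org/>

GENERATE {
  # Triple patterns producing the output, using the
  # variables bound by the iterator and WHERE block below.
  ?entityIRI a <SomeType> .
}
SOURCE <input-file> AS ?source
ITERATOR iter:CSV(?source) AS ?col1 ?col2
WHERE {
  # Optional bindings, e.g. building IRIs or casting datatypes.
  BIND( URI( CONCAT( "http://example.org/id/", ?col1 ) ) AS ?entityIRI )
}
```

The concrete examples below fill in this pattern for a relational database, a CSV file, and a JSON file.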

Example with Relational Database

Assume we have already configured a database endpoint called dbConn, and that this database contains a table called user that we want to map into triples and upload.

The table looks like this:

id   email                     dob          first_name   last_name
1    first.user@example.com    1990-01-01   First        User
2    second.user@example.com   1991-02-03   Second       User
3    third.user@example.com    1991-05-08   Third        User

The transformation file (.rqg):

PREFIX accuridsIterator: <https://accurids.com/iterator/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

BASE <http://example.org/>

GENERATE {
<id/{ xsd:string(?id) }> a <User>;
    <email> ?email .
}

ITERATOR accuridsIterator:SQL(<https://accurids.com/databaseEndpoint/dbConn>, "select id, email from user") AS ?id ?email

The generated triples:

http://example.org/id/1 @rdf:type http://example.org/User
http://example.org/id/1 @http://example.org/email "first.user@example.com"
http://example.org/id/2 @rdf:type http://example.org/User
http://example.org/id/2 @http://example.org/email "second.user@example.com"
http://example.org/id/3 @rdf:type http://example.org/User
http://example.org/id/3 @http://example.org/email "third.user@example.com"

The iterator for loading from a relational database is https://accurids.com/iterator/SQL. The previously created database endpoint connection is referenced by a URI consisting of the prefix https://accurids.com/databaseEndpoint/ followed by the endpoint name (in this example, dbConn).

CAVEAT: A URI must be a string. Hence, the xsd:string conversion is needed, as the original data type of id is an integer.

Example with CSV

Assume we have a CSV file named persons.csv like this:

PersonId,Name,Phone,Email,Birthdate,Height,Weight
1,Jin Lott,374-5365,nonummy@nonsollicitudina.net,1990-10-23T09:39:36+01:00,166.58961852476,72.523064012179
2,Ulric Obrien,1-772-516-9633,non.arcu@velit.co.uk,1961-11-18T02:18:23+01:00,164.38438947455,68.907470544061
3,Travis Wilkerson,240-1629,felis@Duisac.co.uk,1956-03-05T15:57:29+01:00,163.47434097479,64.217840002146

The transformation file (.rqg):

PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>

BASE <http://example.org/>

GENERATE {
 ?personIRI a foaf:Person ;
            foaf:name ?name;
            foaf:mbox ?email ;
            foaf:phone ?phone ;
            schema:birthDate ?birthdate ;
            schema:height ?height ;
            schema:weight ?weight .
}
SOURCE <persons.csv> AS ?persons
ITERATOR iter:CSV(?persons) AS ?personId ?name ?phoneStr ?emailStr ?birthdateStr ?heightStr ?weightStr
WHERE {
    BIND( URI( CONCAT( "http://example.com/person/", ?personId ) ) AS ?personIRI )
    BIND( URI( CONCAT( "tel:", ?phoneStr ) ) AS ?phone )
    BIND( URI( CONCAT( "mailto:", ?emailStr ) ) AS ?email )
    BIND( xsd:dateTime( ?birthdateStr ) AS ?birthdate )
    BIND( xsd:decimal( ?heightStr ) AS ?height )
    BIND( xsd:decimal( ?weightStr ) AS ?weight )
}

The generated triples:

http://example.com/person/1 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/1 @http://xmlns.com/foaf/0.1/name "Jin Lott"
http://example.com/person/1 @http://xmlns.com/foaf/0.1/mbox mailto:nonummy@nonsollicitudina.net
http://example.com/person/1 @http://xmlns.com/foaf/0.1/phone tel:374-5365
http://example.com/person/1 @http://schema.org/birthDate "1990-10-23T09:39:36+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/1 @http://schema.org/height "166.58961852476"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/1 @http://schema.org/weight "72.523064012179"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/2 @http://xmlns.com/foaf/0.1/name "Ulric Obrien"
http://example.com/person/2 @http://xmlns.com/foaf/0.1/mbox mailto:non.arcu@velit.co.uk
http://example.com/person/2 @http://xmlns.com/foaf/0.1/phone tel:1-772-516-9633
http://example.com/person/2 @http://schema.org/birthDate "1961-11-18T02:18:23+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/2 @http://schema.org/height "164.38438947455"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @http://schema.org/weight "68.907470544061"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/3 @http://xmlns.com/foaf/0.1/name "Travis Wilkerson"
http://example.com/person/3 @http://xmlns.com/foaf/0.1/mbox mailto:felis@Duisac.co.uk
http://example.com/person/3 @http://xmlns.com/foaf/0.1/phone tel:240-1629
http://example.com/person/3 @http://schema.org/birthDate "1956-03-05T15:57:29+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/3 @http://schema.org/height "163.47434097479"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @http://schema.org/weight "64.217840002146"^^http://www.w3.org/2001/XMLSchema#decimal

The iterator for loading the CSV is http://w3id.org/sparql-generate/iter/CSV.

Example with JSON

Assume we have a JSON file named persons.json like this:

[
  {
    "PersonId": 1,
    "Name": "Jin Lott",
    "Phone": "374-5365",
    "Email": "nonummy@nonsollicitudina.net",
    "Birthdate": "1990-10-23T09:39:36+01:00",
    "Height": 166.58961852476,
    "Weight": 72.523064012179
  },
  {
    "PersonId": 2,
    "Name": "Ulric Obrien",
    "Phone": "1-772-516-9633",
    "Email": "non.arcu@velit.co.uk",
    "Birthdate": "1961-11-18T02:18:23+01:00",
    "Height": 164.38438947455,
    "Weight": 68.907470544061
  },
  {
    "PersonId": 3,
    "Name": "Travis Wilkerson",
    "Phone": "240-1629",
    "Email": "felis@Duisac.co.uk",
    "Birthdate": "1956-03-05T15:57:29+01:00",
    "Height": 163.47434097479,
    "Weight": 64.217840002146
  }
]

The transformation file (.rqg):

PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>

BASE <http://example.org/>

GENERATE {
 ?personIRI a foaf:Person ;
            foaf:name ?name;
            foaf:mbox ?email ;
            foaf:phone ?phone ;
            schema:birthDate ?birthdate ;
            schema:height ?height ;
            schema:weight ?weight .
}
SOURCE <persons.json> AS ?persons
ITERATOR iter:JSONSurfer(?persons, "$[*]",
    "$.PersonId",
    "$.Name",
    "$.Phone",
    "$.Email",
    "$.Birthdate",
    "$.Height",
    "$.Weight"
) AS ?I1 ?personId ?name ?phoneStr ?emailStr ?birthdateStr ?heightStr ?weightStr
WHERE {
    BIND( URI( CONCAT( "http://example.com/person/", xsd:string(?personId) ) ) AS ?personIRI )
    BIND( URI( CONCAT( "tel:", ?phoneStr ) ) AS ?phone )
    BIND( URI( CONCAT( "mailto:", ?emailStr ) ) AS ?email )
    BIND( xsd:dateTime( ?birthdateStr ) AS ?birthdate )
    BIND( xsd:decimal( ?heightStr ) AS ?height )
    BIND( xsd:decimal( ?weightStr ) AS ?weight )
}

The generated triples:

http://example.com/person/1 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/1 @http://xmlns.com/foaf/0.1/name "Jin Lott"
http://example.com/person/1 @http://xmlns.com/foaf/0.1/mbox mailto:nonummy@nonsollicitudina.net
http://example.com/person/1 @http://xmlns.com/foaf/0.1/phone tel:374-5365
http://example.com/person/1 @http://schema.org/birthDate "1990-10-23T09:39:36+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/1 @http://schema.org/height "166.58961852476"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/1 @http://schema.org/weight "72.523064012179"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/2 @http://xmlns.com/foaf/0.1/name "Ulric Obrien"
http://example.com/person/2 @http://xmlns.com/foaf/0.1/mbox mailto:non.arcu@velit.co.uk
http://example.com/person/2 @http://xmlns.com/foaf/0.1/phone tel:1-772-516-9633
http://example.com/person/2 @http://schema.org/birthDate "1961-11-18T02:18:23+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/2 @http://schema.org/height "164.38438947455"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/2 @http://schema.org/weight "68.907470544061"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @rdf:type http://xmlns.com/foaf/0.1/Person
http://example.com/person/3 @http://xmlns.com/foaf/0.1/name "Travis Wilkerson"
http://example.com/person/3 @http://xmlns.com/foaf/0.1/mbox mailto:felis@Duisac.co.uk
http://example.com/person/3 @http://xmlns.com/foaf/0.1/phone tel:240-1629
http://example.com/person/3 @http://schema.org/birthDate "1956-03-05T15:57:29+01:00"^^http://www.w3.org/2001/XMLSchema#dateTime
http://example.com/person/3 @http://schema.org/height "163.47434097479"^^http://www.w3.org/2001/XMLSchema#decimal
http://example.com/person/3 @http://schema.org/weight "64.217840002146"^^http://www.w3.org/2001/XMLSchema#decimal

The iterator for loading the JSON is http://w3id.org/sparql-generate/iter/JSONSurfer.