Backup configuration

Accurids can create a backup of the data stored in it. A backup allows you to restore the system's content to its state at the time of the backup after a catastrophic failure.

(Please note: This feature is available since Accurids 1.4.0)

To perform a regular full backup, a few configuration parameters have to be set (for a detailed description, see the sections below):

The backup repository defines where the data is written to. Currently, Accurids supports Amazon S3 and a local file system.
A backup schedule defines when the backup is performed.
A recovery trigger defines under which circumstances a recovery is performed.

Backup Repository Settings

The backup repository defines where the data is written to. Currently, Accurids supports:

S3: This is the recommended setting for production systems. Besides Amazon S3 buckets, compatible services such as MinIO are also supported.
File system: The backup data is written to a directory in the file system. Currently, this setting is intended for evaluation purposes only and should not be used in production systems, especially not in cluster setups with multiple Accurids nodes.

For details, see the respective section about the backup repository below. The configuration parameter accurids.backup.type must be used to choose the repository type (s3 or fs).

Please ensure that at most one Accurids installation writes backups to a given repository location at any time; concurrent writers can corrupt the backup data. Several installations can use the same repository at the same time for recoveries.

Amazon S3

Preliminaries: To store the backup data, an S3 bucket must be created. Additionally, the access to the bucket must be configured. The following AWS policy shows the recommended permissions where arn:aws:s3:::my-accurids-backup is the ARN of the bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": "arn:aws:s3:::my-accurids-backup"
    },
    {
      "Sid": "Statement2",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:GetObjectVersion",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-accurids-backup",
        "arn:aws:s3:::my-accurids-backup/*"
      ]
    }
  ]
}

Configuration settings:

Key	Description
accurids.backup.type	Must be set to `s3` to use this repository type
accurids.backup.s3.bucket	Mandatory, name of the S3 bucket (e.g. `my-accurids-backup`)
accurids.backup.s3.basePath	Optional, the backup is placed under this path in the bucket (e.g. `department1/backup`)
accurids.backup.s3.region	Optional, the region of the S3 bucket (e.g. `us-east-1`)
accurids.backup.s3.endpoint	Optional, an alternative S3 endpoint, necessary when using an S3-compatible service (e.g. https://s3-backup-server.example.com)
accurids.backup.s3.accessKeyId	Optional, the access key ID of the AWS user to use (see below)
accurids.backup.s3.secretAccessKey	Optional, the secret access key of the AWS credentials
accurids.backup.s3.readonly	Optional, the backup repository can only be used for recoveries; no backups are written
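
As an illustration, the settings above might be combined as follows for an S3-compatible service. This is a minimal sketch, not a reference configuration: it assumes the nested YAML layout used in the example configuration at the end of this page, the endpoint URL and base path are the placeholder values from the table, and the credentials are dummy values.

```yaml
accurids:
  backup:
    type: s3
    s3:
      bucket: my-accurids-backup
      basePath: department1/backup
      # Only needed for S3-compatible services; placeholder URL from the table above
      endpoint: https://s3-backup-server.example.com
      # Dummy credentials for illustration only
      accessKeyId: EXAMPLEACCESSKEYID
      secretAccessKey: exampleSecretAccessKey
```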

Authentication:

If present, AWS credentials are taken from the configuration settings above (accurids.backup.s3.accessKeyId and accurids.backup.s3.secretAccessKey).

If not present, the AWS default mechanism takes place. Among others, the following attempts to find credentials are performed:

The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are evaluated
Web Identity Token credentials are retrieved from the environment or container. For example, this is used when a container in a Kubernetes cluster has been assigned a service account that is mapped to an AWS role.
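
For example, when the credentials are supplied by the environment, the two credential keys can simply be left out of the Accurids configuration. The following is a minimal sketch, assuming the nested YAML layout used in the example configuration at the end of this page; the bucket name and region are the placeholder values from the table above.

```yaml
accurids:
  backup:
    type: s3
    s3:
      bucket: my-accurids-backup
      region: us-east-1
      # accessKeyId and secretAccessKey are intentionally omitted, so the
      # AWS default mechanism is used (for example AWS_ACCESS_KEY_ID and
      # AWS_SECRET_ACCESS_KEY from the environment, or a web identity token).
```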

File System

Not for production systems

Currently, this repository type (file system) is for evaluation purposes only and should not be used in production systems, especially not in cluster setups with multiple Accurids nodes.

Configuration settings:

Key	Description
accurids.backup.type	Must be set to `fs` to use this repository type
accurids.backup.fs.path	Mandatory, file system path to the backup directory (e.g. `/opt/backup/accurids`)
accurids.backup.fs.readonly	Optional, the backup repository can only be used for recoveries; no backups are written
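
A file system repository for an evaluation setup might then be configured as sketched below. The nested YAML layout is assumed by analogy with the example configuration at the end of this page, and the path is the example value from the table.

```yaml
accurids:
  backup:
    type: fs
    fs:
      # Example path from the table above; for evaluation setups only
      path: /opt/backup/accurids
```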

Backup Schedule

A backup schedule defines when the backup is performed. The parameter accurids.backup.schedule takes a "cron" expression which consists of six fields, separated by spaces. A field contains either a number or an asterisk (*) for an arbitrary value. Multiple values can be given by separating them by commas.

Second (0-59)
Minute (0-59)
Hour (0-23)
Day of the month (1-31)
Month (1-12)
Day of the week (0-6 for Sunday(0)-Saturday(6))

Example Schedules:

Every day at 03:17:35 in the night: 35 17 3 * * *
Every Sunday at 18:00:00: 0 0 18 * * 0
Every first day of the month at 02:30:00: 0 30 2 1 * *
Every first, tenth and twentieth day of the month at 02:30:00: 0 30 2 1,10,20 * *
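
In a YAML configuration file, one of the schedules above could be set as sketched here, assuming the nested layout used in the example configuration at the end of this page. The quotes are optional for this value; they are used so that YAML always reads the expression as a plain string, since an unquoted value that begins with an asterisk would be parsed as a YAML alias.

```yaml
accurids:
  backup:
    # Field order: second minute hour day-of-month month day-of-week
    # Every Sunday at 18:00:00
    schedule: "0 0 18 * * 0"
```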

Index backup

Feature in BETA status

Please note: This feature (the index backup) is in BETA status. Do not use it for production systems yet.

By default, Accurids does not back up the search index. In case of a disaster recovery, the search index needs to be re-created from the recovered data. Depending on the number and size of the datasets hosted by Accurids, this process can take several hours.

Accurids uses Elasticsearch for the search index. To enable a backup of the index, Elasticsearch's snapshot feature has to be configured first. Currently, Accurids supports index backups only when using S3 repositories.

Elasticsearch configuration: For general information about Elasticsearch snapshots and for details about S3 repositories, see the corresponding sections of the Elasticsearch documentation.

In particular, you need to perform the following steps on the command line in the Elasticsearch directory:

Install the S3 plugin

    bin/elasticsearch-plugin install repository-s3

Set the AWS access key (the default client is used here; you can choose any other identifier, see below)
```
bin/elasticsearch-keystore add s3.client.default.access_key
```
Set the AWS secret key (the default client is used here; you can choose any other identifier, see below)
```
bin/elasticsearch-keystore add s3.client.default.secret_key
```
Elasticsearch repository (optional): You can set up and configure your own Elasticsearch snapshot repository, which is then used by Accurids (see the Elasticsearch documentation). To use this feature, set the parameter accurids.backup.es.reponame to the name of your snapshot repository. Alternatively, you can let Accurids set up the configuration.

Accurids backup configuration:

Key	Description
accurids.backup.es.reponame	Optional, specify to use your own Elasticsearch snapshot repository configuration
accurids.backup.es.autoconfig	Optional, set to `true` to let Accurids create an Elasticsearch snapshot configuration. Ignored if `reponame` is specified.
accurids.backup.es.settings.*	Optional, additional settings for the Elasticsearch snapshot repository configuration. These settings take precedence over any automatically generated ones. Ignored if `autoconfig` is not set.

If you have installed the S3 Elasticsearch plugin and added the AWS access and secret key to Elasticsearch as described above, it is sufficient to set accurids.backup.es.autoconfig to true. If you have chosen a different client (other than default), you can specify it by overriding the snapshot setting accurids.backup.es.settings.client.
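
The automatic configuration with a non-default client might look like the following sketch. It assumes the nested YAML layout used in the example configuration at the end of this page; the client identifier backup is hypothetical and only stands in for whatever identifier was used when adding the keys to the Elasticsearch keystore.

```yaml
accurids:
  backup:
    es:
      autoconfig: true
      settings:
        # "backup" is a hypothetical client identifier; it has to match the
        # one used for the keystore entries, e.g. s3.client.backup.access_key
        # and s3.client.backup.secret_key instead of s3.client.default.*
        client: backup
```

If the default client is used, the settings block can be left out entirely.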

Recovery

To use a backup, the recovery procedure must be triggered.

Currently, Accurids supports "Automatic recovery":

Automatic recovery: By setting the parameter accurids.backup.autoRecovery to true, Accurids will start the recovery of the last performed backup when:
Accurids is started the first time (i.e. when the underlying database is empty)
A backup repository is configured which contains an existing backup.

An automatic recovery procedure makes it possible to set up a working system in a short time after a failure.
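
One possible use, sketched here under assumptions, is a second installation that only restores from a repository another installation writes to. The value true for readonly and the nested YAML layout are assumed, and the bucket name and base path are placeholders.

```yaml
accurids:
  backup:
    type: s3
    s3:
      bucket: my-accurids-backup
      basePath: acc-backup
      # Assumed value: this installation uses the repository for recoveries only
      readonly: true
    # Recover the last backup when started with an empty database
    autoRecovery: true
```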

Example configuration

The following configuration performs a daily backup at 04:30 in the morning to an AWS S3 bucket (my-accurids-backup, base path acc-backup). The access key ID and secret access key shown are made-up placeholders. If the system is started from scratch, the last performed backup is recovered on startup (autoRecovery: true).

accurids:
  backup:
    type: s3
    s3:
      bucket: my-accurids-backup
      basePath: acc-backup
      region: eu-central-1
      accessKeyId: AKIDFOO9RM3HPRJLBARO
      secretAccessKey: OaBCm+123456789LUabcdefghijk/BExyzrg
    schedule: 0 30 4 * * *
    autoRecovery: true