Scheduled Backups on Ubuntu using Duplicacy and Azure
This page is a guide on how to back up a folder on an Ubuntu machine to a storage account on Azure, using Duplicacy. The data backed up to Azure will be encrypted with an RSA key, and the config encrypted with a password. The backups will be scheduled via a cron job, and will send an email to you on every run (this requires an SMTP server; SendGrid is a decent option). This assumes you already have a machine set up.
1. Create a Storage Account
- Go to the Azure Portal and make a new storage account; the name/region/resource group etc. don’t matter. Feel free to use Azure’s redundancy features for extra backup security (assuming you trust Azure to not go down, which is a fairly safe assumption).
- Create a container within the storage account, form the URL `azure://STORAGEACCOUNT/CONTAINER`, and save this for later.
- Get one of the access keys and save it for later.
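If you prefer doing this from a terminal, a rough equivalent with the Azure CLI is sketched below; the account name, resource group, location, and SKU are all placeholders, so adjust them to taste.

```
# Create the storage account and a container for the backups
# (names, resource group, location and SKU are just examples).
az storage account create \
    --name mybackupaccount \
    --resource-group my-resource-group \
    --location westeurope \
    --sku Standard_LRS

az storage container create \
    --name backups \
    --account-name mybackupaccount

# Grab one of the access keys to use with duplicacy later.
az storage account keys list \
    --account-name mybackupaccount \
    --resource-group my-resource-group \
    --query "[0].value" -o tsv
```

With these names the storage URL would be `azure://mybackupaccount/backups`.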
2. Set up duplicacy
- Grab the latest release of duplicacy, make it executable with `chmod +x`, and then move it into `/usr/local/bin`.
- Go into the folder you want to back up and run `duplicacy init -e -key public.pem repository_id storage_url`. `public.pem` can be any public RSA key (see the sketch after this list for one way to generate a key pair). `repository_id` is just the name of the backup repository; choose whatever you like but remember it, as you’ll need it if you ever want to pull down the backup onto another machine. `storage_url` is the URL formed in step 1.2.
- When you run this command, it will ask for a password because of the `-e` flag; this encrypts the config, so don’t lose this password. It will also ask for the access key to the storage account.
- Now you can run `duplicacy backup -stats` to do a backup. It will ask for the storage account access key and config password again. To prevent having to enter these every time, edit `.duplicacy/preferences` in the current folder and change the `"keys"` property to look like this:

```
"keys": {
    "azure_key": "STORAGE ACCOUNT KEY HERE",
    "password": "CONFIG PASSWORD HERE"
},
```
- Now you can run `duplicacy backup` or `duplicacy restore` from this machine without needing to enter the passwords all the time (note that the passwords are being stored here in plain text, so only do this if you trust no one will compromise the machine).
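For the RSA key, here is a minimal sketch of one way to generate a key pair with openssl and initialize the repository. The key size and file names are just examples; keep `private.pem` somewhere safe (ideally offline), since it’s needed to decrypt the backup.

```
# Generate a key pair for duplicacy's RSA encryption (2048-bit as an example).
openssl genrsa -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

# Initialize the repository in the folder you want to back up.
cd /path/to/folder
duplicacy init -e -key public.pem repository_id azure://STORAGEACCOUNT/CONTAINER
```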
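As a usage example, a manual backup/restore cycle from this folder might look like the following. The revision number is made up, and passing the private key to `restore` with `-key` is my reading of duplicacy’s RSA workflow, so double-check `duplicacy restore -h` before relying on it.

```
duplicacy backup -stats                    # back up and print statistics
duplicacy list                             # list the revisions stored in Azure
duplicacy restore -r 3 -key private.pem    # restore revision 3 (RSA-encrypted storage needs the private key)
```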
3. Set up duplicacy-util
duplicacy-util is another executable that helps schedule backups.
- Similarly to before, grab the latest release of duplicacy-util, run `chmod +x` on it, and move it to `/usr/local/bin`.
- Make a `.duplicacy-util` folder somewhere, e.g. `~/.duplicacy-util`. Go in there and make 2 files: `duplicacy-util.yaml` and `repository_id.yaml`, where `repository_id` is the name you used in step 2.2. This name doesn’t really matter, but it helps to keep things consistent.
- In `duplicacy-util.yaml`, enter the following:
```
notifications:
  onStart: []
  onSkip: ['email']
  onSuccess: ['email']
  onFailure: ['email']

email:
  fromAddress: "Duplicacy Backup <backups@example.com>"
  toAddress: "Firstname Lastname <you@example.com>"
  serverHostname: smtp.sendgrid.net
  serverPort: 587
  authUsername: apikey
  authPassword: XXX
```
This assumes you’re using SendGrid (`authUsername` stays as the literal string `apikey` and `authPassword` is your SendGrid API key); change the email config options as necessary for other providers.
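Since `duplicacy-util.yaml` now holds your SMTP credentials in plain text (much like the preferences file from earlier), it’s worth restricting who can read it. A minimal sketch, assuming the folder lives at `~/.duplicacy-util`:

```
# Lock the config folder down to the owning user only.
chmod 700 ~/.duplicacy-util
chmod 600 ~/.duplicacy-util/*.yaml
```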
4. In `repository_id.yaml`, enter the following:
```
repository: /path/to/folder

storage:
  - name: default
    threads: 1

prune:
  - storage: default
    keep: "0:365 30:180 7:30 1:7"
    threads: 1

check:
  - storage: default
```
`repository` is the folder you initialized duplicacy in, from step 2.2. Read the duplicacy-util docs for details on the different settings. Here’s what the current `keep` settings mean:
```
1:7    # Keep a revision per (1) day for revisions older than 7 days
7:30   # Keep a revision every 7 days for revisions older than 30 days
30:180 # Keep a revision every 30 days for revisions older than 180 days
0:365  # Keep no revisions older than 365 days
```
You can increase the thread counts to improve performance if your machine and connection can handle it.
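duplicacy-util drives the duplicacy CLI for you, so the `keep` string above corresponds roughly to a prune invocation like the one below (shown only to illustrate the mapping; you don’t run it yourself):

```
# What the prune settings boil down to when passed to the duplicacy CLI:
duplicacy prune -keep 0:365 -keep 30:180 -keep 7:30 -keep 1:7
```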
5. Finally, you need to edit your crontab to run duplicacy-util on some schedule. Run `sudo crontab -e` to edit the root crontab and add `0 5 * * * /usr/local/bin/duplicacy-util -sd /path/to/.duplicacy-util -f repository_id -a -m -q`. This will run the job at 5 AM every day, running backup, pruning, and validation of data, with no output to logs, and will send an email.
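Before trusting the schedule, it can be worth running the same command once by hand (dropping `-q` so you can see the output) and checking that the email arrives. The paths and repository name below are the same placeholders as above.

```
# One-off manual run: backup, prune and check, then send the notification email.
sudo /usr/local/bin/duplicacy-util -sd /path/to/.duplicacy-util -f repository_id -a -m
```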