Data Storage Solution: Backup & Cloud

Data Availability
Data is usually made available by two means: reliability and redundancy (i.e. multiple points of access).

Effort has been made to buy reliable hardware. Top-of-the-line SSDs, motherboard, memory, CPU and power supplies are used in the server.

Data is made redundant by providing multiple on-line copies of data. This is achieved primarily through a RAID1 mirror and protects against bad sectors and disk failures.

At some point I should add a small UPS to protect against power outages and brownouts.

There are other options available, like having a pool of servers so that individual servers can be brought down for maintenance or repair. You can also add redundancy to the network and the site with co-location. These options are cost-prohibitive and not necessary for a home server.

Once proper security is in place, availability will be increased by making certain information available on different hardware. Ideally the local server, streaming hardware, laptop and smartphone will all be able to access information as long as certain requirements are met (e.g. authentication).

Data Integrity
If your data cannot be kept online, then integrity can be maintained with backups. This helps protect against hardware failures, software bugs, catastrophe, etc. There are also cases where on-line data can lose integrity, for example through accidental or unauthorized changes.

This is a summary of the data presented in a previous article, with the data integrity plan appended:

Data Type         Mirror  File Integrity Monitor  Reverse Backup  Cloud Backup
Mission Critical  Y       Y                       75 days         Y
Critical          Y       Y                       75 days         Y
Important         Y       Y                       N               N
Unessential       N       N                       N               N

Mirroring refers to the SSD RAID array. In my case I am mirroring two 1TB drives.

File Integrity Monitor
I use AIDE to monitor all file systems for integrity failures, whether they are caused by hardware faults or accidental user errors. This is configured in /etc/aide.*.
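
As a rough illustration of what a file integrity monitor does, the sketch below records a checksum for every file under a directory and later reports what was added, removed or changed. It is not how AIDE is implemented (AIDE tracks more file attributes and keeps its own database); the function names `snapshot` and `compare` are made up for this example.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()  # fine for a sketch; large files should be streamed
            digests[str(path.relative_to(root))] = hashlib.sha256(data).hexdigest()
    return digests


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed or changed since the baseline snapshot."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in common if baseline[p] != current[p]),
    }
```

The idea is to take a baseline snapshot while the system is in a known-good state, store it somewhere the monitored system cannot silently alter, and compare against it on a schedule.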

Reverse Backup
The reverse backup (rdiff package) provides a secondary copy of data as well as n days of reverse differential backups of that data. This is handy for restoring accidentally modified or deleted files, and is particularly useful for user data like documents or config files.

Data is reverse backed up daily and retention is currently 75 days. This is configured in ninjahelper.

The reverse backup should be located on physical media that is not the same as the source.
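
To make the retention rule concrete, here is a minimal sketch that splits a list of daily increment dates into those still inside a 75-day window and those due to expire. It only illustrates the arithmetic and is not the backup tool's own logic; the function name and the choice to keep an increment that is exactly 75 days old are assumptions.

```python
from datetime import date, timedelta

RETENTION_DAYS = 75  # retention period stated in the article


def split_by_retention(increment_dates, today, keep_days=RETENTION_DAYS):
    """Return (keep, expire) lists of increment dates for the given window."""
    cutoff = today - timedelta(days=keep_days)
    # Assumption: an increment exactly keep_days old is still kept.
    keep = [d for d in increment_dates if d >= cutoff]
    expire = [d for d in increment_dates if d < cutoff]
    return keep, expire
```

With one increment per day, this means roughly 75 restore points are available at any time, in addition to the current copy.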

Cloud Backup
This is the primary tool for protecting data. A cloud backup provides a number of advantages: it is off-site, off-network and off-system. The data is not online, but it is redundant and private, and its integrity is preserved in an environment completely separate from the home network.

Data from the system is aggregated, compressed, encrypted and uploaded to the cloud service once a month. Data sources are dumped and aggregated by ninjahelper scripts and then uploaded with a custom ninjahelper script that I wrote. I do not back up any user (/home/user/blah) or system (/etc) config data. I also do not back up some docs located in the docs folder.
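
The aggregate-and-compress step can be sketched as follows. This is not the custom script mentioned above, only a minimal illustration using Python's standard `tarfile` module; the function name and the `backup-YYYY-MM.tar.bz2` naming scheme are hypothetical. Encryption and upload are deliberately left out, since the tools used for them are not described here.

```python
import tarfile
from datetime import date
from pathlib import Path


def make_monthly_archive(sources, dest_dir, today):
    """Bundle the given files or directories into one bzip2-compressed tarball."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one archive per calendar month.
    archive = dest_dir / f"backup-{today:%Y-%m}.tar.bz2"
    with tarfile.open(archive, "w:bz2") as tar:
        for src in map(Path, sources):
            # Store each source under its own name rather than its full path.
            tar.add(src, arcname=src.name)
    return archive
```

The destination directory should sit outside the source directories, otherwise the archive would try to include itself. The resulting file would then be encrypted before it leaves the machine.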

The biggest drawback of a cloud backup solution is that the storage capacity of the cloud services currently available to me is too small to back up all of my data, so I must be selective. I could pay for a write-only backup service, and there are a tremendous number available. I will consider this for the future.

WAN uplink speed cannot be overlooked either. At some point, uploading a huge amount of data over a household internet line becomes time-prohibitive.
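
A quick back-of-the-envelope calculation shows how fast this becomes a problem. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbit/s = 10^6 bits per second) and ignores protocol overhead, so real uploads will be somewhat slower. The 5 Mbit/s uplink is an assumed figure for illustration, not a measurement of my line.

```python
def upload_hours(size_gb: float, uplink_mbps: float) -> float:
    """Hours needed to upload size_gb gigabytes at uplink_mbps megabits per second."""
    bits = size_gb * 1e9 * 8                 # gigabytes -> bits
    seconds = bits / (uplink_mbps * 1e6)     # megabits per second -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    for size_gb in (30, 1000):
        print(f"{size_gb} GB at 5 Mbit/s: {upload_hours(size_gb, 5):.1f} hours")
```

At the assumed 5 Mbit/s, 30 GB takes a little over 13 hours, while a full 1 TB drive would take about 444 hours, or roughly 18.5 days of continuous uploading.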

There are also security concerns, which I attempt to mitigate with compression and encryption.

Remote Cloud Services

  • Offsite Backup - I needed a service that is already available and supported by Debian Linux 8.0.
    • I have free storage on Amazon (5GB) and Google (30GB)
    • Native support for Google (grive2)
    • I wrote a custom plugin for backupninja that will bzip2, encrypt and upload files to gdrive on a monthly basis

Local Cloud Host
I would also like to run a cloud server at home to make it easy to access and share my data. It must be secure, fast and easy to use.

I considered a number of servers, but only a few fit the bill for performance, functionality and trust. They can be reviewed here.

A number of these services are promising, but require quite a bit of setup and maintenance. I settled on Syncthing. It is an ad-hoc peer-to-peer network that is secure, robust and easy to use. Its primary function is to synchronize files in folders.
