Data Storage Solution: Rationale (Part 1 of 3)

Introduction
It is time to upgrade my ailing 350GB spinning-disk mirror, which is 99% full.

There were several end user requirements:

  1. Capacity: The final solution must provide enough storage to cover my current requirements: my entire music collection in FLAC plus 100 Bluray rips. Because data will grow over time (probably exponentially), the storage will be at most 50% full at the time of deployment.
  2. Reliability: The final solution must provide total protection against failure. In other words data loss is not acceptable.
  3. Security: Data access (sharing) must be from authorized sources only. The final solution must provide an adequate deterrent against prying eyes and criminal elements.
  4. Quality of Life: Must ensure high quality of life. In this case I'll need a solution with no VOCs, noise or anything obnoxious. So this means SSD.

There are also a number of optional features that should be met, within reason:

  • Availability
  • Performance
  • Usability

A review of each is below starting with data classification.

Data Classification
Explicit data classification is necessary to provide correct and adequate treatment of data whether this be security requirements, availability, backup requirements, performance requirements, etc.

Data is organized by class and priority, and all data of a given priority receives the same treatment. This ensures consistent and predictable handling of data, particularly with respect to security.

Class    | Priority         | Definition                                                                  | Example
personal | Mission Critical | maximum availability, must be protected at all cost                        | financial and health records, project and design docs
config   | Critical         | availability not critical, must be protected at all cost                   | /etc, parts of /var
media    | Important        | maximum availability, too large to backup as desired, streaming performance | FLAC and Bluray rips, podcast archives
other    | Unessential      | this data is not important, easily replaceable                             | everything not captured above

Data Management
Mission Critical - maximum availability, must be protected at all cost

Task       | Description
Mirror     | protect against single hard failure
filesystem | proven code; access control
AIDE       | protect against bit rot
AIDE       | protect against file system bugs / site power loss / unauthorized modification
rdiff      | protect against unintentional/unauthorized modification/deletion
cloud      | available locally and remotely; protected against site compromise

Critical - availability not critical, must be protected at all cost

Task       | Description
Mirror     | protect against single hard failure
filesystem | proven code; access control
AIDE       | protect against bit rot
AIDE       | protect against file system bugs / site power loss / unauthorized modification
rdiff      | protect against unintentional/unauthorized modification/deletion

Important - maximum availability, too large to backup as desired, streaming performance

Task       | Description
Mirror     | protect against single hard failure
filesystem | proven code; access control
AIDE       | protect against bit rot
AIDE       | protect against file system bugs / site power loss / unauthorized modification
cloud      | available locally and remotely; protected against site compromise

Unessential - this data is not important, easily replaceable

Task Description
Do nothing
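
The AIDE rows above rely on one idea: record a known-good fingerprint of every file, then compare against it later to catch bit rot or unauthorized changes. The following is a minimal sketch of that idea only, not AIDE itself and not my actual configuration. The function names `snapshot` and `compare` are made up for illustration, and unlike AIDE the sketch looks at file contents only, not metadata.

```python
import hashlib
import os


def snapshot(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB blocks so large media files fit in memory.
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare(baseline, current):
    """Return (added, removed, changed) relative paths, each sorted."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        p for p in baseline.keys() & current.keys() if baseline[p] != current[p]
    )
    return added, removed, changed
```

In use, the baseline would be taken once with `snapshot()` and stored somewhere the monitored system cannot quietly rewrite; a later `compare(baseline, snapshot(root))` then lists whatever differs.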

Usage
These locations have specific functions:

  • /home/will/Documents: all projects and personal data
  • /mnt/scratch: all experimental data and svn checkouts
  • /mnt/scratch/Download: all web browser downloads
  • /mnt/files: all media goes here (e.g. photos, videos, music, books)
  • /mnt/files/backup: backup information and temporary backup data goes here
  • /opt: all packages that are compiled or do not come from a debian archive go here

Capacity Review
The requirement is to store my entire FLAC collection and 100 Bluray discs. I also have a small amount of miscellaneous data (small compared to 100 Blurays, that is). The storage is to be at most 50% full at the time of deployment.

Item         | Capacity
FLAC         | 300GB
Bluray       | 100 x 35GB = 3500GB
Provisioning | 50%
Total        | (300GB + 3500GB) / 0.5 = 7600GB (7.6TB)

A quick look at pricing indicates that a 100% SSD solution of that size is prohibitively expensive. At the time of writing (2016), FLAC plus miscellaneous data requires 460GB. Doubling that to 920GB and deferring the Bluray collection (a project I plan to start in approximately 3 years), 1TB will do. 1TB SSDs are fairly affordable, so that is what I will purchase.
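
The two sizing figures follow the same rule: add up the data, then divide by the maximum fill level. Here is a small sketch of that arithmetic. The function name `provisioned_capacity_gb` is only illustrative; the inputs are the numbers quoted in this section.

```python
def provisioned_capacity_gb(items_gb, max_fill=0.5):
    """Return the raw capacity needed so the data fills at most max_fill."""
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    return sum(items_gb.values()) / max_fill


# Full requirement: FLAC plus 100 Bluray rips at 35GB each.
full = provisioned_capacity_gb({"flac": 300, "bluray": 100 * 35})

# Deferred plan: the 460GB of FLAC and misc data held in 2016.
deferred = provisioned_capacity_gb({"flac_and_misc": 460})

print(f"full: {full:.0f}GB, deferred: {deferred:.0f}GB")
# prints "full: 7600GB, deferred: 920GB"
```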

Performance Review
Maximize performance without breaking the budget. In most cases, the measures that improve performance also reduce wear.

  • Disk: These are high performance SSDs. They do require maintenance (see Maintenance).
  • Bus: My server has 6 IO ports. These are organized as follows:
    • 1x 6G port for the 500GB SSD
      • / is mounted here
    • 2x 3G ports for the 1TB SSD mirror
      • /home and /var are mounted on the mirror
    The state of the mirror can be checked with:

    mdadm --detail <md_dev>

  • Partition: all partitions are aligned:
    Disk | phys_sector | partitions aligned? | raid aligned? | fs aligned?
    sda  | 512 bytes   | yes                 | NA            | 4096 bytes
    sdb  | 512 bytes   | yes                 | yes           | 4096 bytes
    sdc  | 512 bytes   | yes                 | yes           | 4096 bytes / 64k stripe
  • Filesystem
    • filesystem type based on performance demands, file size, access type, etc.
    • filesystem and mdadm are aligned to disk.
    • Chunk size for media partition is 64k (instead of 4k).
  • Operating System
    • Parameters are mostly default; tuned where needed
    • IO scheduler is Deadline
  • Network
    • Local gigabit fabric
    • High speed local wifi for mobile devices (currently only laptop and smartphone)
    • High speed WAN connection with fast uplink and downlink
  • User: none
  • Site: none
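
A partition counts as aligned when its starting byte offset is a whole multiple of the block size that matters (4096 bytes for the filesystem, 64k for the media stripe). The sketch below shows the check, assuming 512-byte sectors as in the table above. The helper `is_aligned` is hypothetical; on Linux the start sector should be readable from /sys/block/<disk>/<partition>/start. The two sample values are the old and new start sectors of /dev/sda mentioned in the appendix.

```python
def is_aligned(start_sector, sector_bytes=512, boundary_bytes=4096):
    """True if the partition's byte offset is a multiple of boundary_bytes."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


for start in (63, 2048):
    print(
        start,
        is_aligned(start),                             # 4096-byte boundary
        is_aligned(start, boundary_bytes=64 * 1024),   # 64k stripe boundary
    )
# prints "63 False False" then "2048 True True"
```

A start at sector 2048 lands exactly on 1MiB, which is a multiple of every boundary used here.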

Longevity Review
I'd like to ensure that SSD read/write is done sparingly and intelligently. This can be managed at a number of levels:

  • Disk: none, this is handled internally by the drive and a bit of a black box really
  • Partition:
    • partitions are write aligned to disk
  • File System:
    • filesystem & SSD are aligned
    • mdadm aligned to fs
    • partitions are mounted with relatime to significantly reduce incidental metadata writes caused by the filesystem
    • large chunk size used on media partitions
  • OS:
    • ramdisk: Debian does an excellent job of offloading many file system services to RAM disks. Most system resources are mounted as tmpfs or other types of RAM disk. Swappiness is also tuned (see Performance). Use mount | column -t to see the list of mounted filesystems.
    • IO scheduler: the deadline scheduler optimizes SSD cache usage and allows the disk to schedule its writes best. See Performance.
  • Services:
    • The meat in the pie. These are allowed to do what they need to get the job done, albeit with some tuning for egregious services
    • Logging: Logging on this system is somewhat heavy. I have mitigated this by fixing system bugs and in some cases reducing log levels where it wasn't providing much useful information.
    • cron: most of these are benign or already discussed elsewhere
    • backup: this is a heavy writing function. Files are prepared once a month for offsite backup and both services and user data have local copies made daily.
    • monitoring: These services can be quite taxing. I have reduced file system integrity checking to once a month. That's an entire disk read. The rest of the monitors aren't terribly taxing except when they log.
  • Network: none
  • User: none
  • Site: none
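
Two of the settings above are easy to verify from what the kernel reports: the active IO scheduler (shown in square brackets in /sys/block/<dev>/queue/scheduler) and the mount options (the fourth field of each line in /proc/mounts). The sketch below parses both formats. The helper names and the sample strings are illustrative only; in particular /dev/md0 and ext4 are placeholders, not values captured from my server.

```python
def active_scheduler(sysfs_text):
    """Return the bracketed entry from a queue/scheduler line, or None."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def mount_options(proc_mounts_text, mount_point):
    """Return the set of options for mount_point from /proc/mounts-style text."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    return set()


# Sample inputs (hypothetical, for illustration).
sched = "noop [deadline] cfq"
mounts = "/dev/md0 /home ext4 rw,relatime,data=ordered 0 0\n"

print(active_scheduler(sched))
print("relatime" in mount_options(mounts, "/home"))
```

On a live system the two strings would be read from the files named above instead of being hard-coded.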

Security Review
Ideally, create a deterrent strong enough to discourage drive-by and opportunistic crime. Anything more creates an exponentially larger maintenance overhead with little added benefit: those with the resources will gain access to your data regardless.

Access Control

  • Disk: none
  • Partition: none
  • File System: standard Linux permissions and access control; other users have read-only access or none at all. A special group has full access.
  • Server: none
  • OS
    • protected kernel memory space
    • protected process memory space
    • services are run in containers or are run by non-root limited access users/groups
  • Network:
    • 2x Firewalls
    • MAC, IP and port restrictions/whitelisting
    • Blacklisting bad actors
  • User: only trusted users are given access to the network or physical location
  • Site: Standard lock and key; largest risk is theft.

Confidentiality/Privacy

  • Disk: none
  • Partition: encrypted
  • File System: access control and permissions
  • Server: none
  • OS: none
  • Network:
    • data on the wire is encrypted, locally and over WAN
  • User: policy against sharing passwords, access details, etc
  • Site: location is not public

Integrity

  • Disk: SSD contains ECC and wear leveling to ensure integrity is maintained at a bit for bit level
  • Partition: none
  • File System:
    • XFS and EXT4 have a proven track record of robustness and include many checks to ensure data integrity
    • file scanners detect unauthorized changes to files and metadata
  • Server: hardware/BIOS updates only from trusted sources (i.e. vendor)
  • OS: OS and apps receive regular security updates, and only from trusted sources; see Confidentiality and Data Management
  • Network:
    • almost all connections are ethernet (less prone to tampering)
    • most transactions over the wire have CRC to ensure integrity
    • encryption is a natural deterrent against tampering
  • User: none
  • Site: none

Non-repudiation

  • Disk: none
  • Partition: none
  • File System: metadata logs modification times
  • OS: none
  • Services: extensive logging of system activity
  • Network: 2x Firewalls log firewall rule violations
  • User: access/logins and login failures are logged
  • Site: none

Availability Review

  • Disk: RAID mirror
  • Partition: none
  • File System: mirrored
  • Server: reputable, high reliability hardware
  • OS: Linux is one of the most robust OS available (only Solaris and *BSD better?); hardware is monitored for failures and down time
  • Network: reputable, high reliability hardware
  • User: none
  • Site: off-site backups

Maintenance

  • Queued trim can be performed on drives that support it. It is not recommended on my Samsung SSD (and is blacklisted in the kernel).
  • Manual Trim: run a weekly trim (via systemd). Do not use the discard mount option.
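  • Defragmenting an SSD is not recommended. Should it ever be needed, both XFS and EXT4 support online defragmentation:
  • EXT4: e4defrag -c -v /dev/sdxX (with -c this only reports the fragmentation score; drop -c to actually defragment)
  • XFS: xfs_fsr -v /mount/point -OR- xfs_fsr -v /path/to/specific/file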

Future Plans
There are a few items I could improve upon in the future.

  1. Encryption: Encrypt /home and /var. Risky and requires lots of time.
  2. Offsite backup: requires a faster internet connection and a cloud budget. There is currently no way to back up the large AV files and design folder to the cloud; there is simply too much data.
  3. XFS and EXT4 do not have a good provision for periodic online health checking or repair (e.g. online fsck), so a reboot is required.
  4. Mobile and cloud solution: requires a new OS and software. I would like local cloud hosting or more folder syncing (syncthing) so that data is available independent of device, and I need to more fully incorporate mobile devices and networks outside my own. At the moment I manually copy some folders that I want to share via syncthing; they inevitably drift and I have to re-sync them. If I had a large backup repository I could use syncthing (which I do not trust) to share these folders directly. For some folders I do not want a full sync: I just want to cache a specific folder and stream the rest on demand.
  5. New server with KVM to improve security, capacity and performance, and to enhance availability.

Appendix: Wear Level Monitoring
smartctl -a <dev>:
START Dec 11 2016:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 5854 418 11 5968708308
/dev/sdb 129 6 0 864487023
/dev/sdc 129 6 1 2817683918

INITIAL CHECKPOINT Jan 11 2017:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 6602 420 12 7126658407
/dev/sdb 877 8 0 1484208418
/dev/sdc 877 8 1 3437404118

2ND CHECKPOINT Dec 13 2017:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 14655 423 19 12032358308
/dev/sdb 8930 11 2 8058372986
/dev/sdc 8930 11 3 10011333856

On Dec 14, 2017 the partition table of /dev/sda was corrected so that partitions start at sector 2048 (instead of 63).

3RD CHECKPOINT Dec 2 2018:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 22815 443 23 14925486911
/dev/sdb 17090 34 5 15374381385
/dev/sdc 17090 34 6 17327381417

4TH CHECKPOINT Dec 4 2019:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 31621 447 26 16753269226
/dev/sdb 25896 38 7 22056318529
/dev/sdc 25896 38 8 24112734909

5TH CHECKPOINT Dec 6 2020:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 40422 459 29 18204062081
/dev/sdb 34696 50 13 31256954903
/dev/sdc 34696 50 13 33352406665

6TH CHECKPOINT Dec 27 2021:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 49564 468 40 22534818155
/dev/sdb 43838 59 24 43882936851
/dev/sdc 43839 59 22 45978346490

7TH CHECKPOINT Dec 31 2024:

SSD Power_On_Hours Power_Cycle_Count Wear_Leveling_Count Total_LBAs_Written
/dev/sda 75831 576 69 40229227743
/dev/sdb 70104 168 60 85948354903
/dev/sdc 70104 168 65 90535063110
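
The raw Total_LBAs_Written counters are hard to read at a glance, so here is a rough sketch of turning them into terabytes written and an average yearly rate. It assumes each LBA unit is 512 bytes, which is a common convention but not something stated in the tables; check the drive's documentation before relying on the result. The function names are illustrative, and the two readings are the first and last /dev/sdc rows above.

```python
from datetime import date

LBA_BYTES = 512  # assumption: Total_LBAs_Written counts 512-byte units


def terabytes_written(total_lbas, lba_bytes=LBA_BYTES):
    """Convert a Total_LBAs_Written reading to decimal terabytes."""
    return total_lbas * lba_bytes / 1e12


def tb_per_year(first, last):
    """Average write rate between two (date, total_lbas) checkpoints."""
    (d0, lba0), (d1, lba1) = first, last
    years = (d1 - d0).days / 365.25
    return (terabytes_written(lba1) - terabytes_written(lba0)) / years


# First and last /dev/sdc readings from the checkpoints above.
start = (date(2016, 12, 11), 2817683918)
end = (date(2024, 12, 31), 90535063110)

print(f"{terabytes_written(end[1]):.1f} TB written in total")
print(f"{tb_per_year(start, end):.1f} TB/year on average")
```

Comparing the yearly rate against the drive's rated endurance gives a rough idea of remaining life, alongside the Wear_Leveling_Count trend.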