Introduction
Time to upgrade my ailing, 99% filled 350GB spinning disk mirror.
There were several end user requirements:
- Capacity: The final solution must provide enough storage to cover my current requirements: my entire music collection in FLAC plus 100 Blu-rays. Because data will grow over time (and probably exponentially), I will deploy at a maximum of 50% provisioned.
- Reliability: The final solution must provide total protection against failure. In other words data loss is not acceptable.
- Security: Data access (sharing) must be from authorized sources only. The final solution must provide an adequate deterrent against prying eyes and criminal elements.
- Quality of Life: Must ensure high quality of life. In this case I'll need a solution with no VOCs, noise or anything obnoxious. So this means SSD.
There are also a number of optional features that should be met, within reason:
- Availability
- Performance
- Usability
A review of each is below starting with data classification.
Data Classification
Explicit data classification is necessary to provide correct and adequate treatment of data whether this be security requirements, availability, backup requirements, performance requirements, etc.
Data is organized by class and priority and each priority receives the same treatment. This ensures consistent and predictable handling of data, particularly security.
Class | Priority | Definition | Example
personal | Mission Critical | maximum availability, must be protected at all cost | financial and health records, project and design docs
config | Critical | availability not critical, must be protected at all cost | /etc, parts of /var
media | Important | maximum availability, too large to backup as desired, streaming performance | FLAC and Bluray rips, podcast archives
other | Unessential | this data is not important, easily replaceable | everything not captured above
Data Management
Mission Critical - maximum availability, must be protected at all cost
Task | Description
Mirror | protect against single hard failure
filesystem | proven code; access control
AIDE | protect against bit rot
AIDE | protect against file system bugs / site power loss / unauthorized modification
rdiff | protect against unintentional/unauthorized modification/deletion
cloud | available locally and remotely; protected against site compromise
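AIDE detects bit rot and unauthorized modification by comparing files against a stored database of checksums and attributes. The idea, reduced to content hashes only, can be sketched as below. This is an illustration of the principle and not AIDE's implementation; the function names `snapshot`, `compare` and `demo` are hypothetical, and the demo runs against a throwaway temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[str(path.relative_to(root))] = digest
    return digests


def compare(baseline: dict, current: dict) -> dict:
    """Report files added, removed or changed since the baseline."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(p for p in common if baseline[p] != current[p]),
    }


def demo() -> dict:
    """Take a baseline, tamper with the files, and report the differences."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("one")
        (root / "b.txt").write_text("two")
        baseline = snapshot(root)

        (root / "a.txt").write_text("ONE")     # modified content
        (root / "b.txt").unlink()              # deleted file
        (root / "c.txt").write_text("three")   # new file
        return compare(baseline, snapshot(root))


print(demo())
```

A hash comparison like this only detects a change. It cannot restore the file, which is why the table pairs it with the mirror and rdiff.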
Critical - availability not critical, must be protected at all cost
Task | Description
Mirror | protect against single hard failure
filesystem | proven code; access control
AIDE | protect against bit rot
AIDE | protect against file system bugs / site power loss / unauthorized modification
rdiff | protect against unintentional/unauthorized modification/deletion
Important - maximum availability, too large to backup as desired, streaming performance
Task | Description
Mirror | protect against single hard failure
filesystem | proven code; access control
AIDE | protect against bit rot
AIDE | protect against file system bugs / site power loss / unauthorized modification
cloud | available locally and remotely; protected against site compromise
Unessential - this data is not important, easily replaceable
Task | Description
Do nothing |
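Because every priority receives the same treatment, the classification and the per-priority task lists can be written down as two small lookup tables. The sketch below transcribes the tables above into Python; the names `CLASS_PRIORITY`, `POLICY` and `tasks_for` are hypothetical and only serve the illustration.

```python
# Data class -> priority, from the Data Classification table.
CLASS_PRIORITY = {
    "personal": "Mission Critical",
    "config": "Critical",
    "media": "Important",
    "other": "Unessential",
}

# Priority -> protection tasks, from the Data Management tables.
POLICY = {
    "Mission Critical": ["mirror", "filesystem", "AIDE", "rdiff", "cloud"],
    "Critical": ["mirror", "filesystem", "AIDE", "rdiff"],
    "Important": ["mirror", "filesystem", "AIDE", "cloud"],
    "Unessential": [],
}


def tasks_for(data_class: str) -> list:
    """Return the protection tasks that apply to a data class."""
    return POLICY[CLASS_PRIORITY[data_class]]


for data_class in CLASS_PRIORITY:
    print(f"{data_class}: {', '.join(tasks_for(data_class)) or 'do nothing'}")
```

Keeping the two mappings separate mirrors the text: data is assigned a class once, and changing how a priority is handled then applies to everything in that priority.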
Usage
These locations have specific functions:
- /home/will/Documents: all projects and personal data
- /mnt/scratch: all experimental data and svn checkouts
- /mnt/scratch/Download: all web browser downloads
- /mnt/files: all media goes here (e.g. photos, videos, music, books)
- /mnt/files/backup: backup information and temporary backup data goes here
- /opt: all packages that are compiled or do not come from a Debian archive go here
Capacity Review
The requirement is for storing my entire FLAC collection and 100 Bluray discs. I also have a small amount of other data (small compared to 100 Bluray discs). Provisioning is 50% max at time of deployment.
Item | Capacity
FLAC | 300GB
Bluray | 100 x 35GB = 3500GB
Provisioning | 50%
Total | 7600GB (7.6TB)
A quick look at pricing indicates that a 100% SSD solution at this capacity is cost-prohibitive. At time of writing (2016), FLAC plus miscellaneous data requires 460GB. Doubling that to 920GB to stay at 50% provisioning, and deferring the Bluray collection (a project I plan to start in approximately 3 years), 1TB will do. 1TB SSDs are fairly affordable, so that is what I will purchase.
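The sizing arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted in this section; the function name `provisioned_capacity_gb` is hypothetical.

```python
def provisioned_capacity_gb(used_gb: float, max_fill: float = 0.5) -> float:
    """Raw capacity needed so that used_gb fills at most max_fill of the disk."""
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    return used_gb / max_fill


FLAC_GB = 300
BLURAY_GB = 100 * 35          # 100 discs at 35GB each
CURRENT_GB = 460              # FLAC plus misc at time of writing

full_plan = provisioned_capacity_gb(FLAC_GB + BLURAY_GB)
interim_plan = provisioned_capacity_gb(CURRENT_GB)

print(f"full plan (FLAC + Bluray): {full_plan:.0f} GB")
print(f"interim plan (no Bluray):  {interim_plan:.0f} GB")
```

The full plan comes to 7600GB and the interim plan to 920GB, which is why a 1TB mirror is enough until the Bluray project starts.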
Performance Review
Maximize performance without breaking the budget. In most cases performance corresponds with wear level mitigation.
- Disk: These are high performance SSDs. They do require maintenance (see Maintenance).
- Bus: My server has 6 IO ports. These are organized as follows:
  - 1x 6G port for the 500GB SSD; / is mounted here
  - 2x 3G ports for the 1TB SSD mirror; /home and /var are mounted on the mirror
  - to inspect the mirror: mdadm --detail <md_dev>
- Partition: all partitions are aligned:
  HDD | phys_sector | partitions aligned? | raid aligned? | fs aligned?
  sda | 512byte | yes | NA | 4096bytes
  sdb | 512byte | yes | yes | 4096bytes
  sdc | 512byte | yes | yes | 4096bytes/64k stripe
- Filesystem
  - filesystem type is chosen based on performance demands, file size, access type, etc.
  - filesystem and mdadm are aligned to disk.
  - Chunk size for media partition is 64k (instead of 4k).
- Operating System
  - Parameters are mostly default; tuned where needed
  - IO scheduler is Deadline
- Network
  - Local gigabit fabric
  - High speed local wifi for mobile devices (currently only laptop and smartphone)
  - High speed WAN connection with fast uplink and downlink
- User: none
- Site: none
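Partition alignment comes down to one modulo check: the partition's starting byte offset must be a multiple of the block or chunk size it should line up with. The sketch below assumes the 512-byte sector, 4096-byte filesystem block and 64k chunk listed above, and uses start sectors 63 and 2048, the old and new offsets mentioned in the appendix. The function name `is_aligned` is hypothetical.

```python
SECTOR_BYTES = 512           # sector size reported for all three disks
FS_BLOCK_BYTES = 4096        # filesystem block size
CHUNK_BYTES = 64 * 1024      # chunk size on the media partition


def is_aligned(start_sector: int, boundary_bytes: int,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if a partition starting at start_sector begins on the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


for start in (63, 2048):
    print(
        f"start sector {start}: "
        f"4k aligned={is_aligned(start, FS_BLOCK_BYTES)}, "
        f"64k aligned={is_aligned(start, CHUNK_BYTES)}"
    )
```

On a live system the start sector of a partition can be read from /sys/block/<disk>/<partition>/start and passed to the same check.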
Longevity Review
I'd like to ensure that SSD read/write is done sparingly and intelligently. This can be managed at a number of levels:
- Disk: none, this is handled internally by the drive and a bit of a black box really
- Partition:
  - partitions are write aligned to disk
- File System:
  - filesystem & SSD are aligned
  - mdadm aligned to fs
  - partitions are mounted relatime to significantly reduce incidental metadata writes caused by the filesystem
  - large chunk size used on media partitions
- OS:
  - ramdisk: Debian does an excellent job of offloading many file system services to RAM disks. Most system resources are mounted as tmpfs or other types of RAM disks. Swappiness is also tuned (see Performance). Use mount | column -t to see the list of mounted partitions.
  - IO scheduler: the deadline scheduler optimizes SSD cache usage and allows the disk to schedule its writes best. See Performance.
- Services:
  - The meat in the pie. These are allowed to do what they need to get the job done, albeit with some tuning for egregious services.
  - Logging: Logging on this system is somewhat heavy. I have mitigated this by fixing system bugs and, in some cases, reducing log levels where the logs weren't providing much useful information.
  - cron: most of these are benign or already discussed elsewhere
  - backup: this is a write-heavy function. Files are prepared once a month for offsite backup, and both services and user data have local copies made daily.
  - monitoring: These services can be quite taxing. I have reduced file system integrity checking to once a month, since each check is an entire disk read. The rest of the monitors aren't terribly taxing except when they log.
- Network: none
- User: none
- Site: none
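Two of the settings above, relatime mounts and the deadline scheduler, can be verified by parsing what the kernel already exposes: /proc/mounts lists the options of each mount, and /sys/block/<dev>/queue/scheduler marks the active scheduler in square brackets. The sketch below parses text in those two formats. The sample strings are made up for illustration (device names, filesystem types and options are assumptions, not a dump from this server), and the function names are hypothetical.

```python
from typing import Dict, Optional, Set


def mount_options(proc_mounts_text: str) -> Dict[str, Set[str]]:
    """Map mount point -> set of mount options from /proc/mounts-style text."""
    result = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        result[fields[1]] = set(fields[3].split(","))
    return result


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) entry of a queue/scheduler line."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Illustrative samples only; real input would be read from the files above.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/md0 /home xfs rw,relatime,attr2,inode64,noquota 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,mode=755 0 0
"""
SAMPLE_SCHEDULER = "noop [deadline] cfq"

for mount_point, options in mount_options(SAMPLE_MOUNTS).items():
    print(f"{mount_point}: relatime={'relatime' in options}")
print(f"active scheduler: {active_scheduler(SAMPLE_SCHEDULER)}")
```

Parsing is kept separate from file access so the same functions work on saved output from another machine.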
Security Review
The goal is a deterrent strong enough to stop drive-by or opportunistic crime. Anything more creates an exponentially larger maintenance overhead with little added benefit: those with the resources will gain access to your data regardless.
Access Control
- Disk: none
- Partition: none
- File System: standard Linux permissions and access control; other users are read-only or not at all. Special group for full access.
- Server: none
- OS
  - protected kernel memory space
  - protected process memory space
  - services are run in containers or are run by non-root limited access users/groups
- Network:
  - 2x Firewalls
  - MAC, IP and port restrictions/whitelisting
  - Blacklisting bad actors
- User: only trusted users are given access to the network or physical location
- Site: Standard lock and key; largest risk is theft.
Confidentiality/Privacy
- Disk: none
- Partition: encrypted
- File System: access control and permissions
- Server: none
- OS: none
- Network:
  - data on the wire is encrypted, locally and over WAN
- User: policy against sharing passwords, access details, etc
- Site: location is not public
Integrity
- Disk: SSD contains ECC and wear leveling to ensure integrity is maintained at a bit for bit level
- Partition: none
- File System:
  - XFS and EXT4 have a proven track record of robustness and include many checks to ensure data integrity
  - file scanners detect unauthorized changes to files and metadata
- Server: hardware/BIOS updates only from trusted sources (i.e. vendor)
- OS: OS and apps receive regular security updates, and only from trusted sources; see Confidentiality and Data Management
- Network:
  - almost all connections are ethernet (less prone to tampering)
  - most transactions over the wire have CRC to ensure integrity
  - encryption is a natural deterrent against tampering
- User: none
- Site: none
Non-repudiation
- Disk: none
- Partition: none
- File System: metadata logs modification times
- OS: none
- Services: extensive logging of system activity
- Network: 2x Firewalls log firewall rule violations
- User: access/logins and login failures are logged
- Site: none
Availability Review
- Disk: RAID mirror
- Partition: none
- File System: mirrored
- Server: reputable, high reliability hardware
- OS: Linux is one of the most robust operating systems available (only Solaris and *BSD better?); hardware is monitored for failures and down time
- Network: reputable, high reliability hardware
- User: none
- Site: off-site backups
Maintenance
- Queued TRIM can be performed on drives that support it. It is not recommended on my Samsung SSD (and is blacklisted in the kernel).
- Manual TRIM: run a weekly trim (via systemd). Do not use DISCARD (the discard mount option).
- Defragmenting an SSD is not recommended. If it is ever needed, both XFS and EXT4 support online defrag:
  - EXT4: e4defrag -c -v /dev/sdxX (-c only reports the fragmentation score; omit it to actually defragment)
  - XFS: xfs_fsr -v /mount/point -OR- xfs_fsr -v /path/to/specific/file
Future Plans
There are a few items I could improve upon in the future.
- Encryption: Encrypt /home and /var. Risky and requires lots of time.
- Offsite backup: requires faster internet and a cloud budget. There is currently no way to back up the large AV files and the design folder to the cloud; there is simply too much data.
- XFS and EXT4 do not have a good provision for periodic online health checking or repair (e.g. online fsck), so a check requires a reboot.
- Mobile and cloud solution: requires a new OS and new software. There is a desire for local cloud hosting or more folder syncing (syncthing) so that data is independent of the device. I need to more fully incorporate mobile devices and networks outside my own. At the moment I manually copy some folders I want to share via syncthing. They inevitably drift and I have to sync them. If I had a large backup repository I could use syncthing (which I do not trust) to share these folders directly. Some folders I do not want to sync; I just want to cache a specific folder and stream the rest on demand.
- New server with KVM to improve security, capacity and performance, and to enhance availability.
Appendix: Wear Level Monitoring
smartctl -a <dev>:
START Dec 11 2016:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 5854 | 418 | 11 | 5968708308
/dev/sdb | 129 | 6 | 0 | 864487023
/dev/sdc | 129 | 6 | 1 | 2817683918
INITIAL CHECKPOINT Jan 11 2017:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 6602 | 420 | 12 | 7126658407
/dev/sdb | 877 | 8 | 0 | 1484208418
/dev/sdc | 877 | 8 | 1 | 3437404118
2ND CHECKPOINT Dec 13 2017:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 14655 | 423 | 19 | 12032358308
/dev/sdb | 8930 | 11 | 2 | 8058372986
/dev/sdc | 8930 | 11 | 3 | 10011333856
On Dec 14, 2017 the partition table of /dev/sda was corrected so that partitions start aligned at sector 2048 (instead of 63).
3RD CHECKPOINT Dec 2 2018:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 22815 | 443 | 23 | 14925486911
/dev/sdb | 17090 | 34 | 5 | 15374381385
/dev/sdc | 17090 | 34 | 6 | 17327381417
4TH CHECKPOINT Dec 4 2019:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 31621 | 447 | 26 | 16753269226
/dev/sdb | 25896 | 38 | 7 | 22056318529
/dev/sdc | 25896 | 38 | 8 | 24112734909
5TH CHECKPOINT Dec 6 2020:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 40422 | 459 | 29 | 18204062081
/dev/sdb | 34696 | 50 | 13 | 31256954903
/dev/sdc | 34696 | 50 | 13 | 33352406665
6TH CHECKPOINT Dec 27 2021:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 49564 | 468 | 40 | 22534818155
/dev/sdb | 43838 | 59 | 24 | 43882936851
/dev/sdc | 43839 | 59 | 22 | 45978346490
7TH CHECKPOINT Dec 31 2024:
SSD | Power_On_Hours | Power_Cycle_Count | Wear_Leveling_Count | Total_LBAs_Written
/dev/sda | 75831 | 576 | 69 | 40229227743
/dev/sdb | 70104 | 168 | 60 | 85948354903
/dev/sdc | 70104 | 168 | 65 | 90535063110
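The raw Total_LBAs_Written counters are easier to judge once converted to bytes. The sketch below does that for the first and last checkpoints, assuming each LBA is one 512-byte sector (this matches the 512-byte sector size listed for these disks, but the unit is an assumption about how the drives report the attribute). The function names are hypothetical; the readings are copied from the tables above.

```python
LBA_BYTES = 512  # assumption: Total_LBAs_Written counts 512-byte sectors

# (Power_On_Hours, Total_LBAs_Written) at START (Dec 11 2016) and at the
# 7TH CHECKPOINT (Dec 31 2024).
READINGS = {
    "/dev/sda": ((5854, 5968708308), (75831, 40229227743)),
    "/dev/sdb": ((129, 864487023), (70104, 85948354903)),
    "/dev/sdc": ((129, 2817683918), (70104, 90535063110)),
}


def lbas_to_tb(lbas: int, lba_bytes: int = LBA_BYTES) -> float:
    """Convert an LBA count to decimal terabytes."""
    return lbas * lba_bytes / 1e12


def write_rate_gb_per_day(first, last, lba_bytes: int = LBA_BYTES) -> float:
    """Average GB written per powered-on day between two readings."""
    (hours0, lbas0), (hours1, lbas1) = first, last
    days = (hours1 - hours0) / 24
    return (lbas1 - lbas0) * lba_bytes / 1e9 / days


for dev, (first, last) in READINGS.items():
    print(
        f"{dev}: {lbas_to_tb(last[1]):.1f} TB written in total, "
        f"{write_rate_gb_per_day(first, last):.1f} GB per powered-on day"
    )
```

Comparing the total against the endurance rating the manufacturer publishes for the drive model gives a rough idea of remaining write life.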