Overview
This entry covers the technical details of the implementation.
I approached this by breaking the stack down into individual layers and then conducting performance, security and data management reviews.
The Physical Stack
Storage Devices
smartctl -i <dev>
A 100% SSD solution. I provide a separate SSD for boot (1x500GB Samsung 850 EVO) and an array (2x1TB Samsung 850 EVO) for storage. This division is mostly historical, but it saves me the trouble of setting up a bootable partition on the RAID array.
I could not afford more than 2 disks or more than 1TB each, so I will have to make do with a 1TB RAID1 mirror. This is about 3x larger than the existing array, so I expect it to give me a 3-year buffer until larger-capacity SSDs are available.
Storage Controller
lspci
This Intel bridge provides 5x3Gbps and 1x6Gbps SATA ports.
The boot drive is attached to the 6G port, and the storage array and a DVD drive are attached to the 3G ports. 3G is more than sufficient for this server's applications, including streaming video, since it already exceeds what the gigabit network can deliver. The 6G port should provide the best boot times, desktop latency, app loading and server response.
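As a rough check on the port speeds, the usable payload rate of a SATA link can be worked out from its line rate. This is a minimal sketch, assuming the 8b/10b line encoding that SATA uses (8 payload bits for every 10 bits on the wire); the function name sata_payload_mb_s is made up for illustration.

```shell
# Hypothetical helper: usable payload in MB/s for a SATA line rate in Gbit/s.
# Assumes 8b/10b encoding, i.e. 8 payload bits per 10 bits on the wire.
sata_payload_mb_s() {
    line_mbit=$(( $1 * 1000 ))            # Gbit/s -> Mbit/s
    data_mbit=$(( line_mbit * 8 / 10 ))   # remove the encoding overhead
    echo $(( data_mbit / 8 ))             # bits -> bytes
}

sata_payload_mb_s 3   # 3G ports (storage array)
sata_payload_mb_s 6   # 6G port (boot drive)
```

For comparison, gigabit Ethernet carries at most 125 MB/s before protocol overhead, so the network is the tighter limit for anything served to other machines.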
Server
lscpu
This is an Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, which provides 4x 64-bit cores with 1 thread per core. I've installed 4GB of DDR3. I find this more than sufficient for any task except processing audio and video, and even there it does well enough for all but the impatient.
Network
The network is a gigabit switch and a DD-WRT firewall which also acts as an access point and a gateway. All connections on the network are ethernet except for laptops and smartphones. All servers are firewalled and the gateway is also firewalled.
Site
This is my physical residence, so it is subject to power outages, break-ins, burst water pipes and things of that nature.
User
This is primarily for myself and those I choose to share my data with.
The Logical Stack
The Design
Partitioning
The boot drive contains the root partition which stores the operating system, a swap partition and a scratch partition.
The scratch partition is a working space for project experiments and in no way contains data that should be preserved.
A typical root partition holds user data in /home and /var, so these need to be preserved. As a result the array contains a 20GB partition each for /var and /home, with the remainder dedicated to data storage. For details see Data Management.
fdisk -l <dev>:
Boot Drive Partitioning (/dev/sda):
Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000ee05f

Device     Boot    Start       End   Sectors   Size Id Type
/dev/sda1  *        2048  41965567  41963520    20G 83 Linux
/dev/sda2       41965568  50354175   8388608     4G 82 Linux swap / Solaris
/dev/sda3       50354176 976773167 926418992 441.8G 83 Linux
Storage Array Partitioning (/dev/sd{b,c}):
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 23ABB100-66E6-4A0A-B0A4-E7560756A29A

Device        Start        End    Sectors   Size Type
/dev/sdb1      2048   39063551   39061504  18.6G Linux filesystem
/dev/sdb2  39063552   78125055   39061504  18.6G Linux filesystem
/dev/sdb3  78125056 1953525134 1875400079 894.3G Linux filesystem
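The partitions described as 20GB show up as 18.6G in the listing because fdisk reports binary units (GiB). The sketch below converts a sector count both ways; it assumes 512-byte sectors, as reported by fdisk here, and the helper names are made up for illustration.

```shell
# Hypothetical helpers: convert a count of 512-byte sectors to decimal GB
# and to binary GiB. LC_ALL=C keeps the decimal separator a "." everywhere.
sectors_to_gb() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / 1e9 }'
}
sectors_to_gib() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / (1024 * 1024 * 1024) }'
}

sectors_to_gb 39061504    # /dev/sdb1 in decimal gigabytes
sectors_to_gib 39061504   # the same partition in the units fdisk prints
```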
EXT4 File System
tune2fs -l <dev> to view filesystem options and configuration
stat -f mount_point to check block size and block and inode summary
dumpe2fs -h <dev> to check fs configuration (superblock summary)
https://man7.org/linux/man-pages/man5/ext4.5.html
EXT4 is one of the most mature, yet still state-of-the-art, file systems available. It provides consistent, strong performance across many workloads, platforms and configurations. It also reserves a portion of the drive for the root user, which helps when the drive is full and you are trying to recover. It supports TRIM and works well with SSDs. For these reasons I chose EXT4 for the root partition.
There are a few specific EXT4 optimizations:
EXT4 by default chooses a block size of 4096 bytes. 4096 is also an even multiple of 512, the drive's sector size, so alignment is achieved.
mkfs.ext4 -m 1 -b 4096 -i 32768 /dev/sda1
tune2fs -c 100 /dev/sda1
tune2fs -i 1m /dev/sda1
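The alignment point can be checked for each partition: a partition is aligned when its starting byte offset is a multiple of the file system block size. This is a minimal sketch assuming 512-byte sectors; is_block_aligned is a made-up helper name, and the start sectors are copied from the fdisk listings in this entry.

```shell
# Hypothetical helper: succeed if a partition's start sector (512-byte
# sectors assumed) falls on a multiple of the block size (default 4096).
is_block_aligned() {
    start_sector=$1
    block_bytes=${2:-4096}
    [ $(( start_sector * 512 % block_bytes )) -eq 0 ]
}

# Start sectors of sda1, sda2, sda3, sd{b,c}2 and sd{b,c}3.
for s in 2048 41965568 50354176 39063552 78125056; do
    if is_block_aligned "$s"; then
        echo "$s aligned"
    else
        echo "$s misaligned"
    fi
done
```

A start sector of 63, the old DOS default, would fail this check, which is why modern tools start the first partition at sector 2048.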
XFS File System
xfs_info mount_point for filesystem configuration
https://man.archlinux.org/man/mkfs.xfs.8
For the remaining partitions, data tends to be read and written either in large chunks or as many small files, so I looked for a file system known for its performance and scalability that also plays nicely with SSDs. Any optimization for large transfers that adversely affects smaller bits of data is made up for by the fact that the hardware is so fast. I looked at JFS, XFS and F2FS. F2FS is available in Debian but uncommon on servers. They are all quite interesting, and JFS is great, but XFS is well known for its performance and scalability, so I chose XFS for the remaining partitions.
There are a few specific XFS optimizations:
For the /var and /home partitions XFS chooses a default block size of 4096 bytes with no striping.
For the media storage partition I wanted a larger block size to reflect the files that will be stored there (i.e. almost always >1MB, usually >10MB and sometimes >1GB in size). Therefore I configured the partition to use 64kB blocks and told XFS to optimize for a 2-disk RAID1 mirror.
mkfs.xfs -f -d agcount=8,sw=1,su=512k -i maxpct=1 /dev/md0
mkfs.xfs -f -d agcount=8,sw=1,su=512k -i maxpct=1 /dev/md1
mkfs.xfs -f -d agcount=8,sw=1,su=512k -i maxpct=1 /dev/md2
mkfs.xfs -f -d agcount=8 -i maxpct=1 /dev/sda3
RAID Controller
cat /proc/mdstat
mdadm --detail <dev>
mdadm --detail --scan (for creating a new /etc/mdadm/mdadm.conf)
https://wiki.archlinux.org/title/RAID
https://raid.wiki.kernel.org/index.php/RAID_setup
This implementation uses software RAID. I have always avoided motherboard-based / Intel RAID solutions for fear of portability issues and a lack of restoration options, so I will continue to use mdadm.
Device | Options | Metadata |
/dev/md0 | default | v1.2 |
/dev/md1 | default | v1.2 |
/dev/md2 | default,64k chunk,bitmap | v1.2 |
There are a few specific mdadm optimizations:
Command Summary:
mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create --verbose /dev/md1 --level=mirror --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create --verbose --chunk=512k /dev/md2 --level=mirror --raid-devices=2 /dev/sdb3 /dev/sdc3
mdadm --detail --scan /dev/md{0,1,2} >> /etc/mdadm/mdadm.conf
cat /proc/mdstat
mdadm --detail /dev/md{0,1,2}
/proc/mdstat:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdc2[1] sdb2[0]
      19514368 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdc3[1] sdb3[0]
      937568960 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md0 : active raid1 sdc1[1] sdb1[0]
      19514368 blocks super 1.2 [2/2] [UU]

unused devices: <none>
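In this output a healthy two-disk mirror shows [UU]; a missing or failed member appears as an underscore, for example [U_]. The sketch below scans mdstat-style text for that marker. The function name mdstat_degraded is made up for illustration, and this is only one simple way to read the file, not a replacement for mdadm's own monitoring.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every array whose member status (e.g. [UU]) contains a "_".
mdstat_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        match($0, /\[[U_]+\]/) {             # status field such as [UU] or [U_]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
        }
    '
}

# Only read the live file when it exists (it needs the md driver loaded).
if [ -r /proc/mdstat ]; then
    mdstat_degraded < /proc/mdstat
fi
```

An empty result means every array reports all members up.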
Mounting Options
Because both filesystems store most configuration options in their superblocks, there is not much to fstab beyond telling the operating system that they exist. Perhaps more useful is the list of mount points below.
Mount Options Summary:
Device | Mount Point | Mount Options |
UUID | / | relatime,errors=remount-ro |
UUID | /mnt/scratch | defaults,logbsize=256k,relatime |
/dev/md0 | /var | defaults,logbsize=256k,relatime |
/dev/md1 | /home | defaults,logbsize=256k,relatime |
/dev/md2 | /mnt/files | defaults,logbsize=256k,relatime |
# /etc/fstab: static file system information.
#
#
proc /proc proc defaults 0 0
UUID=3f227b45-acd3-478a-bccd-d700143e9951 / ext4 relatime,errors=remount-ro 0 1
UUID=d0ad06f1-7996-4f77-97a7-ad6d4d772547 /mnt/scratch xfs defaults,logbsize=256k,relatime 0 2
UUID=bc65b7d4-69e3-4ec2-8593-22fd219b133b none swap sw 0 0
UUID=b07525a3-b1c4-4b6c-84b9-92f95b41aeef /var xfs defaults,logbsize=256k,relatime 0 2
UUID=4a11ac93-87a3-45ed-815a-79a0d2369e21 /home xfs defaults,logbsize=256k,relatime 0 2
UUID=83f8576f-e261-4e0c-8645-82c2e5437bfc /mnt/files xfs defaults,logbsize=256k,relatime 0 2
/dev/sr0 /media/cdrom0 udf,iso9660 user,noauto 0 0
The relatime option is a write optimization. It limits access-time (atime) updates, which are rarely (if ever) used by the OS or apps, to cases where the previous access time is older than the last modification or change, or more than a day old. This can significantly reduce the number of small writes to a drive, thereby increasing the performance and lifespan of the drive.
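To confirm that an option from fstab is actually in effect, the live mount table in /proc/mounts can be checked. This is a minimal sketch; mount_has_option is a made-up helper that reads /proc/mounts-style lines (device, mount point, type, comma-separated options) on stdin.

```shell
# Hypothetical helper: succeed if the given mount point lists the given
# option. Expects /proc/mounts-style lines on stdin.
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

if [ -r /proc/mounts ]; then
    if mount_has_option / relatime < /proc/mounts; then
        echo "/ is mounted with relatime"
    else
        echo "/ is not mounted with relatime"
    fi
fi
```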
Disk Organization Summary
Device | Start | End | Type | Mount Point | dev | Size | FS |
sda1 | 2048 | 41965567 | Linux | / | /dev/sda1 | 20.0GB | ext4 |
sda3 | 50354176 | 976773167 | Linux | /mnt/scratch | /dev/sda3 | 441.8GB | xfs |
sd{b,c}1 | 2048 | 39063551 | Linux | /var | /dev/md0 | 18.6GB | xfs |
sd{b,c}2 | 39063552 | 78125055 | Linux | /home | /dev/md1 | 18.6GB | xfs |
sd{b,c}3 | 78125056 | 1953525134 | Linux | /mnt/files | /dev/md2 | 894.3GB | xfs |
IO scheduler
The deadline scheduler is known to provide the best performance on XFS, and from the various tests I've seen online it provides a small benefit, or at least does no harm, on EXT4. It bounds request latency by giving each request an expiry time, so no request waits indefinitely.
Add block/sda/queue/scheduler = deadline to sysfs
Add block/sdb/queue/scheduler = deadline to sysfs
Add block/sdc/queue/scheduler = deadline to sysfs
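Whether the setting took effect can be read back from sysfs, where the kernel lists the available schedulers and marks the active one with square brackets, for example "noop [deadline] cfq". The sketch below extracts the bracketed name; active_scheduler is a made-up helper name.

```shell
# Hypothetical helper: print the active scheduler from the contents of a
# /sys/block/<dev>/queue/scheduler file (the active entry is in brackets).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

for dev in sda sdb sdc; do
    f=/sys/block/$dev/queue/scheduler
    if [ -r "$f" ]; then
        echo "$dev: $(active_scheduler < "$f")"
    fi
done
```

On kernels that use the multi-queue block layer the equivalent scheduler is listed as mq-deadline.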
Kernel Tuning
Persistent Access
/etc/sysctl.d/* /etc/sysfs.d/*
Realtime
/proc/sys/vm/* /sys/*
Tuning Parameters
Add vm.swappiness=1 to sysctl
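The persistent and realtime locations are two views of the same setting: a sysctl key corresponds to a file under /proc/sys with the dots turned into slashes. The sketch below shows that mapping for simple keys such as vm.swappiness; sysctl_proc_path is a made-up helper name.

```shell
# Hypothetical helper: map a sysctl key to its file under /proc/sys by
# turning dots into slashes (holds for simple keys like vm.swappiness).
sysctl_proc_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

p=$(sysctl_proc_path vm.swappiness)
echo "$p"
# Show the value currently in effect, if the file is readable here.
if [ -r "$p" ]; then
    cat "$p"
fi
```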
Operating System
The operating system is Debian "Buster" 10.x.
File system Permissions:
Apps
There are several types of apps that make use of this data:
Dolphin (and desktop apps)
Not much needs to be changed; apps will work out of the box. But tweaking Dolphin can bring you to your data quicker:
Baloo (desktop search)
This is KDE's semantic search. The default settings will attempt to index your entire disk, which is incredibly wasteful in both disk space and activity and also tends to fill your search results with noise.
Unfortunately Baloo uses a blacklist and I want a whitelist. The only folders I want Baloo to index are:
This results in approximately 75k indexed files available for search.
Media apps
Amarok is my primary Linux desktop media app. The primary media server is Plex.
System Monitoring
The goal is to enable monitoring and reporting at all levels of the stack.
Item | Monitor | Notes |
Storage Devices | smartmontools, Munin, lm-sensors | The drives are S.M.A.R.T.-enabled and I use Munin to track usage, temperature and other S.M.A.R.T. metrics. I set alarms for critical items. |
Partitions | backupninja | Tool to backup partition information nightly |
File System | Munin | Capacity and integrity tracking. |
Operating System/Server | sysstat, Munin, logcheck, lm-sensors, top, iotop | Environmental, load, performance (e.g. fragmentation) and system metrics. Logcheck monitors logs for abnormal events, RAID failures, alarms, etc. Does systemd automatically check integrity and fragmentation? |
IO scheduler | NA | Not needed |
Network | firewall | Firewall logs abnormal events and an IDS looks for attack signatures |
Site | NA | Not needed/possible |
Apps | Munin | Most services are tracked with Munin. However baloo and service pinging or network pinging are not enabled. |
User | Munin | Munin tracks load status |
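As an illustration of the kind of check behind a storage alarm, the overall health verdict printed by smartctl -H can be tested from a script. This is a minimal sketch, assuming the ATA-style result line "SMART overall-health self-assessment test result: PASSED"; smart_passed is a made-up helper name and is not part of smartmontools or Munin.

```shell
# Hypothetical helper: succeed if smartctl -H output on stdin reports an
# overall PASSED verdict (ATA-style wording assumed).
smart_passed() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Example use (needs root and smartmontools, so it is left as a comment):
#   smartctl -H /dev/sdb | smart_passed || echo "SMART alarm on /dev/sdb"
```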
Notes on upgrading the OS
Notes on partition copying
More googling revealed that rsync is a simple-to-use and extremely effective disk copy utility in situations where the partition size or file system differs.
The command I used for copying partitions was as follows. The first performs the actual copy and the second double-checks.
rsync -avxHAWX --info=progress2 <src> <dest> > ~/rsync.out
rsync -avxHAWX --info=progress2 <src> <dest>
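A stricter double-check than repeating the copy is a checksum-based dry run, which reports differences without writing anything. The sketch below is an assumption on my part rather than the command used above: verify_copy is a made-up helper, and it uses rsync's -n (dry run), -c (checksum) and -i (itemize changes) options with --delete so that extra files on the destination are also reported.

```shell
# Hypothetical helper: succeed only if a checksum-based dry run finds no
# differences between the source and destination directory trees.
# Returns 1 if differences are listed, 2 if rsync itself fails.
verify_copy() {
    changes=$(rsync -anci --delete "$1"/ "$2"/) || return 2
    [ -z "$changes" ]
}

# Example use, with the mount points as placeholders:
#   verify_copy /mnt/old /mnt/new && echo "copy verified"
```

Checksumming reads every file on both sides, so on a large partition this takes far longer than the default size and timestamp comparison.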
Notes on replacing a partition
Partition Identification:
/sbin/blkid to show block device UUIDs, labels and types
/sbin/lsblk to show disks, partitions, maj:min, ro vs rw, removable vs fixed, size, type and mountpoint
Preserving a Partition:
cp -aR --preserve=all /home.old/* /home
xfsdump / xfsrestore
Notes on the boot loader
update initramfs: update-initramfs -u -k all