Improving SSD Lifespan on Linux: What I Learned Not to Do

Posted February 5, 2018 (updated February 2, 2021) by cubethethird. Categories: Guide, Linux, Opinion. Tags: boot, hardware, linux, rant.

While Solid State Drives (SSDs) can provide a significant performance boost over traditional Hard Disk Drives (HDDs), one downside often concerns users: longevity. With general day-to-day usage, the average user is unlikely to wear out an SSD any more quickly than an HDD, particularly since the latter is mechanical. Advanced users are often more concerned about this, however, and will put in extra effort to increase the lifespan of their SSDs. Typically, this is accomplished through various means of reducing the number of reads and writes to the drive. One method I explored turned out not to be as effective as I’d hoped, but became an interesting learning experience nonetheless, and may allow for improvements given the right setup.

One of the larger sources of reads and writes to any disk is due to a file system’s journal. Certain file systems, such as NTFS (Windows), Apple File System, and EXT4 (Linux) reserve a portion of a disk to store metadata for any given file. This information can include a file’s creation time, recently accessed time, etc. Such data can prove useful in situations where data recovery and integrity are important, and is thus a common feature. Other file systems, such as FAT can be found on devices where there are limitations (typically in disk space) and do not require a journal, e.g. a USB flash drive. When a journal is present however, this not only (marginally) reduces the available disk space, but may equally increase reads and writes, since with every file access, creation, and deletion, the action is logged.

There is much debate about the best way to handle journaling on SSDs. Some argue that, so long as TRIM support is enabled correctly, journaling is fine to have. Others recommend (on Linux systems) setting the noatime flag when mounting the file system, which disables the recording of file access times. Though this does reduce the number of writes to disk, there are reports that it may break certain applications. Still others recommend disabling journaling entirely, though under most circumstances this is not advisable. The approach that I’ve investigated has the potential to work around all of these issues: external journaling.
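
As a minimal sketch of how to check which access-time policy a mounted file system is currently using, the snippet below inspects the mount options of the root file system. The helper name atime_mode is made up for illustration, and the lookup assumes a Linux system that exposes mount options in the fourth field of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: classify a comma-separated mount-option string
# (the 4th field of /proc/mounts) by its access-time policy.
atime_mode() {
    case ",$1," in
        *,noatime,*)  echo "noatime" ;;
        *,relatime,*) echo "relatime" ;;
        *)            echo "other" ;;
    esac
}

# Look up the options of the root file system, if /proc/mounts is readable.
if [ -r /proc/mounts ]; then
    root_opts=$(awk '$2 == "/" { opts = $4 } END { print opts }' /proc/mounts)
    echo "root file system: $(atime_mode "$root_opts")"
fi
```

To switch a file system to noatime, the option is normally added to the options column of its /etc/fstab entry, and takes effect on the next mount or remount.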

Unlike most file systems, EXT4 has quite a few options for how the journal is handled. Apart from enabling and disabling this feature, it also allows the journal to live on a separate partition, including one on a different drive. The PC I’ve most recently built has both an SSD and an HDD, so I theorized that I could keep the root directory on the SSD with an EXT4 file system, while its journal resides on the HDD. The process to set this up is fairly simple, as seen in this guide:

# Create the journal on its own partition
mkfs.ext4 -O journal_dev /dev/journal-device

# Create the EXT4 file system, assigning it the external journal
mkfs.ext4 -J device=/dev/journal-device /dev/ext4-device

This will create a journal on one existing device/partition, and the actual file system on another while using this journal.
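
To confirm afterwards that the file system really points at the external journal, one option is to read the journal fields from its superblock. This is only a sketch: the helper name show_journal_info is hypothetical, and it assumes dumpe2fs (part of e2fsprogs) is installed and that you have read access to the device, which usually means root.

```shell
#!/bin/sh
# Hypothetical helper: print the journal-related superblock fields of an
# ext4 file system. Needs read access to the device (usually root).
show_journal_info() {
    dumpe2fs -h "$1" 2>/dev/null | grep -i 'journal'
}

# Example call, using the placeholder device name from the commands above:
#   show_journal_info /dev/ext4-device
```

With an external journal I would expect the output to include "Journal UUID" and "Journal device" lines, rather than the "Journal inode" line seen with an internal journal.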

The Pitfall

While investigating how to accomplish this, I began to notice a pattern. Every post I found regarding external journals did the opposite of my goal: a journal on an SSD with the actual file system on the HDD. To me, this seemed counterintuitive; however, there is a fair bit of sense to it. A post related to the previous guide performs some benchmarks on the file system. Their goal, unlike mine, was to improve the performance of an HDD, rather than the longevity of an SSD. By using the SSD to hold the journal, the overall throughput of the HDD can be increased significantly. While this does not make much sense with modern hardware, if one possessed an SSD with a very small capacity (e.g. < 10GB), this could be beneficial, since it can be difficult to fit the entire root directory on a small device. With a larger SSD though, performance is still best when root is directly on it.

The other concern I had was this remark:

..only industrial quality SLC Solid-Stade drives are suitable for external journal…

This led me to suspect that having an external journal on an HDD might cause performance problems, so I decided to run some benchmarks. Thankfully, the authors provide the exact tool and command used for their own tests, so it was a simple matter to recreate them.

With the basic setup of a journaled EXT4 file system directly on the SSD, the benchmark ran for a total of 19m 19.662s. It should be noted that this is much shorter than the ~38 minutes they presented as their best result with an HDD.

With an external journal on the HDD and the main file system on the SSD, the total time was 51m 17.13s, which is significantly slower (roughly 2.65 times as long). Needless to say, while this method does reduce reads and writes to the SSD, it arguably does not merit the significant performance hit.
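
As a quick check of the slowdown, the two wall-clock times reported above can be converted to seconds and divided. This is just arithmetic on the published figures; the variable names are illustrative.

```shell
#!/bin/sh
# Convert both reported run times to seconds and compute their ratio.
ratio=$(awk 'BEGIN {
    ssd_only    = 19 * 60 + 19.662   # seconds, journal on the SSD
    hdd_journal = 51 * 60 + 17.13    # seconds, journal on the HDD
    printf "%.2f", hdd_journal / ssd_only
}')
echo "slowdown: ${ratio}x"
```

The external-journal run therefore took a little over two and a half times as long as the SSD-only run.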

Recommendations

Although my tests did not go as planned, there are still many things one can do on Linux to help reduce the usage of an SSD, while still reaping its benefits:

Enable TRIM support
- There is often a systemd service to enable this
- May be configured differently depending on the distribution
Break the system directories into separate partitions
- /home contains user files, and should generally be kept separate regardless
- /var can be quite volatile, containing temporary packages, caches, logs, etc. and thus results in more reads/writes
- /tmp should by default be of type tmpfs (RAM disk), but should at the very least not be used on the SSD
- swap, when used, may result in large amounts of data read and written, and is typically kept on an HDD regardless
Create a “quick access” partition on the SSD, if there is enough space
- Some applications might have slow startup or runtime due to large files being leveraged (e.g. certain games on Steam). These may be kept on the SSD since they are likely used infrequently.
Do not worry about block alignment
- Modern Linux file system tools handle this correctly without manual intervention
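
The TRIM recommendation can be checked without changing anything on the system. The sketch below assumes a systemd-based distribution whose periodic TRIM unit is named fstrim.timer (common, but not universal); the helper name trim_timer_state is hypothetical.

```shell
#!/bin/sh
# Read-only check: report whether the periodic TRIM timer is enabled.
# Assumes the unit is called fstrim.timer, as it is on many distributions.
trim_timer_state() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "no-systemctl"
    elif systemctl is-enabled --quiet fstrim.timer 2>/dev/null; then
        echo "enabled"
    else
        echo "not-enabled"
    fi
}

echo "fstrim.timer: $(trim_timer_state)"

# To turn it on (as root):  systemctl enable --now fstrim.timer
# To trim once, right now:  fstrim -av
```

A periodic timer like this is an alternative to mounting with the discard option, which trims continuously instead.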

Oddly enough, I can still recommend using an external journal; however, it is best used only with the right hardware:

Multiple drives with similar performance ratings
- e.g. two SSDs
A small cache drive with higher speeds than the main drive
- SSD journal with HDD
- Faster drive (e.g. Optane) journal with an HDD or SSD

If the journal drive is slower, there is little merit to using it separately due to reduced performance.

9 thoughts on “Improving SSD Lifespan on Linux: What I Learned Not to Do”

scheuref says:

January 27, 2019 at 12:49 pm

Dear cubethethird

Thank you for your article.

You should not bother with reducing SSD usage… 🙂
If you have an SSD and put swap /var etc. (or even your ext4 journal) on your slow HDD, then what is the purpose of having a fast SSD?
You should use it and enjoy the speed without any fear about the lifespan. The endurance of SSD’s is much longer than what some people fear.

Having journals on different devices is not common, so if you physically move the SSD or the HDD to another computer in the future and forget to move the journal back, then say hi to problems and fsck…

Now you should check the TBW value of your SSD (the endurance).
If you get 700 TBW it means you can write 700 TB before having to replace your SSD.
You can check the “Logical Sectors Written” with smartctl from time to time to know the remaining value.
On my office PC I have 91% endurance left after 5 years of daily work.
So I will need to replace it… in 50 years!
And it is a very low-end SSD (100 GB with only 72 TBW).
Common SSD’s will have an endurance of 700 TBW with a lifespan over 500 years with 8 hours daily work.
BTW my office PC is running ubuntu and is on 24/24, with everything (including swap and journals) on SSD.

But I agree about TRIM.
TRIM is important to keep the highest possible number of zeroed blocks.
And having zeroed blocks is crucial for write performance and also to lower write amplification, and hence increase lifespan.
Usually consumer SSD’s have a hidden over-provisioning of 7% that will be kept zeroed for that purpose.
Enterprise SSDs have more, like 20%.
On a consumer SSD you could create a zeroed partition (dd if=/dev/zero) of about 12% of your disk size to get a stable write perf and save lifespan.
This “user-made” over-provisioning was recommended in the past but it is definitely unnecessary on enterprise SSD’s and probably also on high-end consumer SSD’s.
If you are encrypting your SSD with luks, then it is VERY important to not encrypt zeroed blocks, else the SSD controller will be unable to know which blocks are free and it will kill write perf and lifespan.

TRIM should be done with a cronjob or with a systemd service, but never with the ext4 “discard” mount option. This option can kill performance on buggy SSD’s or even lead to data corruption.

Here are some links about the topic:
https://www.wikiwand.com/en/Write_amplification
https://wiki.debian.org/SSDOptimization
http://blog.neutrino.es/2013/howto-properly-activate-trim-for-your-ssd-on-linux-fstrim-lvm-and-dmcrypt/
http://lists.openwall.net/linux-ext4/2014/01/02/7
https://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte
https://www.anandtech.com/show/8239/update-on-samsung-850-pro-endurance-vnand-die-size
http://www.anandtech.com/show/2738/8

Best Regards
Francois Scheurer

- cubethethird says:
  
  January 27, 2019 at 5:31 pm
  
  Thanks for the feedback. I’m fully aware that SSDs have reasonable lifespans despite their differences with mechanical drives. The intent of my article was more to document my findings when going down the rabbit hole that I found myself in, which was mostly for fun and experimentation. Personally, I prefer keeping /var on my HDD due to not only its volatile nature, but also its potential size. As for SWAP, I’ve left it on there too since I find my current available memory sufficient, and seldom surpass it.
  Nonetheless, the information you’ve provided along with the resources is much appreciated.
  
lastweakness says:

December 7, 2019 at 2:55 am

Thank you for this. It’s very helpful, especially since it was something I considered doing myself.

What to do with that Old SSD | cubethethird's corner says:

May 16, 2020 at 3:07 pm

[…] thing with which I had experimented previously was to place the journal of my EXT4 file system off of my SSD and onto my HDD. The idea was that I […]

memfault says:

February 2, 2021 at 3:49 pm

Hi,

You should RTFM what a journal is and what it is not, because your post is very wrong at its core (and your attempts in general). I think you should have done better research beforehand.

https://opensource.com/article/18/4/ext4-filesystem

So the journal is not related to atime at all, because atime (if enabled) works regardless of whether the filesystem is journaled or not. Most non-journaled filesystems support atime too (I guess nearly anything that’s not a read-only filesystem, including FAT16, FAT32, NTFS, EXT2, but I haven’t checked that). People were disabling atime on HDDs because it turned every read operation into an additional write operation, so it was slow, especially on rotational drives. Besides noatime, relatime was also introduced (it is updated only once per day). It’s a very small amount of data, so unless the filesystem was mostly read-only and specially tuned, it was negligible.

And about the journal in EXT4: by default it’s set to ordered mode, so unless a non-default mode is used, only metadata is committed to the journal (so in case of a system crash/forced power off, the metadata [filesystem structure] is in a clean state). It’s very small (in comparison to the data written, since it’s only the metadata related to it), so it’s not smart to put it on an HDD; an SSD is the proper device (especially when a brand new Intel Optane M10 16GB costs as low as $5 at retail).
There are also other modes of the EXT4 journal, and you can read about them in the linked article, but they aren’t the default. By default it works as a filesystem log. That’s even what XFS calls it. In the case of XFS its size is between 32MB and 200MB, so as you see, it’s very small (because it’s only metadata, not the data itself). XFS is a better filesystem than EXT4 anyway.
If you want to reduce write amplification you must force-enable TRIM in every block layer (especially if you use mdadm/cryptsetup; for security it’s disabled by default in encryption layers), put SWAP on the HDD (zswap/zram is a must), increase the commit time (sync), and use tmpfs extensively.
It’s also a good idea to use a COW filesystem (like BcacheFS, ZFS, BTRFS, Reiser5), because they provide full data-journaling-level security with lower write amplification. They usually also provide checksumming and subvolume support. Yes, subvolumes are a very nice thing, because you get one big area to write to. You are misleading people by telling them to split the filesystem into smaller partitions (especially if TRIM is disabled), because it forces them to write to the same blocks, increasing write amplification.

So BTRFS is the easiest to use, because it’s in the kernel, makes better use of differently sized drives, and its snapshotting is much more modern and lightweight (so it takes much less space, and snapshots can all be mounted at once at different dirs [they are simply subvolumes]). So if you have one drive or multi-sized HDDs, use BTRFS. It also has deduplication that, unlike ZFS, doesn’t need an enormous amount of RAM. It also supports defragmentation (unlike ZFS).

ZFS on the other hand performs better, supports encryption (not as secure as FDE [like LUKS], but still) and I prefer how it handles missing drives (BTRFS is very annoying there, switching to RO and later refusing to mount). It also supports different ways of speeding things up with an SSD, like an external ZIL LOG on SSD, a permanent L2ARC on SSD, and metadata/small-block drives. It really shines there. It’s out-of-tree, but
There are also more modern COW filesystems like BcacheFS or Reiser5, but they are still very WIP, so I cannot recommend them for anything but testing.
Block-level checksumming is a very nice feature in the case of RAID, because if any block gets corrupted (it’s tested on every read) it’s re-read from the other copy and the bad block is replaced by the good copy.
Nearly every filesystem also supports compression, and in the case of ZSTD it’s usually faster than the drives (except very fast NVMe drives, but it’s still usually worth it).

But Copy-On-Write filesystems also have some downsides, and the biggest is that they fragment really heavily, so for write-heavy workloads like databases, VMs, etc. you must set +C (the no-COW attribute, but it implies no checksumming/no compression too). It’s especially visible on rotational HDDs. And in the case of ZFS, there is no defragmentation tool (you must recreate the filesystem from scratch, but hey, you have backups, right? And ZFS is an enterprise-grade filesystem, but you must overprovision it with free space so it doesn’t fragment so quickly).

Still, XFS is the fastest filesystem on earth, and if you prefer simplicity and speed, it’s a big win. The only downsides for me are that it doesn’t support checksumming (but it would also need built-in RAID for that to be really usable), compression (text files compress really well), or subvolumes (it really simplifies everything to have one big partition, unlike smaller ones that can run out of space). In comparison to EXT4 its only downside is the inability to shrink, but I don’t care (I usually grew them, but with subvolumes on BTRFS that’s no longer the case).

But because I’m a data hoarder, I will be migrating to ZFS soon (main and backup). I currently have multiple 12/14TB Ironwolf drives, and I’m still hunting for more at a decent price and buying other equipment. I’ve bought 4* 16GB Optanes for $15 (new) that will act as ZIL LOG and L2ARC. I also bought many 2TB MLC 960 Pro drives for new laptop setups for $300GB each. New Samsung Pro drives are TLC only, and I don’t know where you found the info about using SLC drives as journal drives (they were a thing in enterprise many years ago, and were then replaced by battery-backed RAM solutions [like ZEUS] that were recently deprecated by Optane drives).
Other PCs will be XFS+NFS, while laptops will remain BTRFS (subvolumes really save space there).

My current laptop has a 1TB HDD + 256GB SSD. My filesystem (BTRFS) is kept on the SSD, while the HDD is XFS with a 200MB external log on the SSD. It works really well.

- cubethethird says:
  
  February 2, 2021 at 10:12 pm
  
  Thank you for the comment, though I feel you missed the point I was making with this post. As I wrote in the title “What I learned not to do”, this post is not a guide; it’s an explanation of what I did, and why it was not successful. I make no claims here that EXT4 is the best file system, or anything like this. The reason I used it for these tests is simply because it allows making the journal separate, and it is the one that I was both familiar using, and found sources on how to accomplish this.
  In my conclusion, I also state as my #1 recommendation to enable TRIM. There is no question about this.
  I did update the one remark in my post about atime. This is not something I spent much time on, and was not the focus of my trials. I changed the way it is phrased to not imply that it is affecting the journal.
  Regardless, as I stated in my conclusion, there are many other steps one can take to be successful in this particular endeavour, and while an external journal is a viable option, there are other more impactful solutions.
  
  - memfault says:
    
    February 3, 2021 at 3:07 am
    
    Anyway, putting data on the SSD and the journal/filesystem log (in the case of xfs or the defaults of ext4) on the HDD is wrong and totally opposite to what anybody should do, because the journal is usually less than 0.1% of the data written. An external journal/filesystem log is only useful on a faster drive. Before SSDs became a thing, much slower data storage (taxed by a high volume of data), e.g. RAID5, had an external journal on a dedicated RAID1, because it’s small and sequential.
    Generally the journal is ONLY used when some data is written, and because of the very small ratio of metadata to data, there’s really no reason to move it to a slower TIER.
    You can read about the ZFS ZIL LOG, which people moved to an external drive (usually battery-backed RAM or enterprise SSDs, and now Optane is the thing). Normal NAS workloads don’t take advantage of it, because it’s only for sync data (and normally it’s async); it’s there because of databases or NFS (which by default uses sync transfer for everything). But there’s a difference, because in these specific enterprise workloads with ZFS a lot of data is written to it, while in the case of xfs or ext4 (by default) the only thing written to it is a log of the current operation (what’s currently being done, to speed up recovery after a power failure/system crash).
    You can also read about the L2ARC cache (for normal data), which fills slowly and rarely changes, so you put a much cheaper but bigger SSD there to cache frequently used, but small, randomly accessed data (RAID speed for big sequential files is already quick).
    
    In summary, people put data on HDD RAIDs, because they are cheaper, bigger and have nearly infinite endurance. A readback cache like L2ARC can be put on relatively cheap SSDs, while write buffers/writeback caches like the ZIL LOG need to be put on enterprise SSDs with high endurance (in enterprise; because it’s only read in case of a system crash/power failure, in a home NAS it’s not needed at all).
    And people put small amounts of data (if they cannot put everything on a reliable RAID1 of enterprise SSDs, because of price), that needs to be accessed randomly and quickly on some kind of SSDs, because small amount of data is stored there and (usually, except write cache/ write buffers, and because it’s small).
    It’s like taking an enterprise Optane and attaching a slow HDD (or even the fastest on earth) via BCache as a write buffer, expecting it to reduce wear on the SSD (it will not change it dramatically, and even less in the case of a journal/filesystem log, because by default it only LOGs operations, rather than caching data to be written to main storage). So all your assumptions and thinking were flawed from the beginning, and the tests/experiments were useless from the beginning, because you didn’t RTFM before starting. Maybe you like learning on errors, but with proper preparation for these experiments, you would learn much more.
    
  - cubethethird says:
    
    February 8, 2021 at 10:05 pm
    
    I must thank you once again for your comment. The activity is always appreciated as it helps to promote my blog, and keep it active. I am however a bit confused about the purpose of your remarks. As far as I understand, we appear to agree on the overall outcomes and conclusions I have made, and your critique is specifically about the means by which I learned something. Don’t get me wrong, reading manuals and documentation are certainly good ways to educate oneself, however they are by no means the only way, nor always fun (for myself at least). You may be interested to know that there are in fact many ways by which people can learn (see here for examples: https://en.wikipedia.org/wiki/Learning_styles). I do find your argument is flawed, as I in fact “like learning on errors”, but with proper preparation on the subject of learning styles, you could have avoided this mistake from the beginning, and by embracing them, as written in your words: “you would learn much more”.
    Also, you may want to consider starting your own blog. It is clear that you have a passion for file systems, and have no issue writing abundantly on the subject.
    
  - memfault says:
    
    February 9, 2021 at 3:44 pm
    
    “One of the larger sources of reads and writes to any disk is due to a file system’s journal. Certain file systems, such as NTFS (Windows), Apple File System, and EXT4 (Linux) reserve a portion of a disk to store metadata for any given file. This information can include a file’s creation time, recently accessed time, etc. Such data can prove useful in situations where data recovery and integrity are important, and is thus a common feature. Other file systems, such as FAT can be found on devices where there are limitations (typically in disk space) and do not require a journal, e.g. a USB flash drive. When a journal is present however, this not only (marginally) reduces the available disk space, but may equally increase reads and writes, since with every file access, creation, and deletion, the action is logged.”
    
    It sounds like metadata (including the file allocation table) is part of the journal, but it’s not. It’s very chaotic and hard to understand (I’m even more chaotic, and my English is crap, so it’s not a good idea for me to write a blog myself 😉). The first sentence is also one big mistake. It’s one of the most frequently written files, because it logs metadata. But because by default it only logs metadata changes, it’s a very small amount of data. And it’s only read after an unclean shutdown, during recovery/fsck.
    
    “One of the larger sources of reads and writes to any disk is due to a file system’s journal” I guess it’s a poor (and incorrect) interpretation of Wikipedia’s ext4 article: “since the journal is one of the most used files of the disk”. That means most often used/accessed/written; it’s a measure of frequency, not capacity. And in the case of ext4 (until you set it to journal data), it’s by default very small (I would even call it a negligible amount of data).
    
    So all your experiments moving the journal to the HDD to lower write amplification were flawed by definition. Setting noatime was more than enough (it’s still a negligible amount of data, as it’s only timestamps). https://www.kernel.org/doc/html/latest/filesystems/ext4/dynamic.html#inode-timestamps So each of the timestamp fields (including atime) is a 32-bit signed integer (+ epoch fields), and all the timestamp fields together are merely 128 bits, which is hard not to fit into one block. Anyway, it’s recommended to disable atime (or switch to relatime for broken/deprecated software, like old mail clients or locate/mlocate) mainly for performance reasons (especially on HDDs), and it’s already set as relatime or noatime on many modern filesystems.
    
    And while there’s nothing wrong with writing about your own mistakes, this article still has incorrect assumptions presented as facts (which misleads readers searching for how to prolong SSD life). That’s the thing that makes me angry (definitely not that you made some experiments [senseless or not, we can debate about it ;-]). So please change it, or add a preamble, especially because your article is ranked high on Google and cited by others.
    
    cheers
    