Why is the Btrfs file system as implemented by Synology so fragile?

We had a few seconds of power loss the other day. Everything in the house, including a Windows machine using NTFS, came back to life without any issues. A Synology DS720+, however, became a useless brick, claiming to have suffered unrecoverable file system damage even though the underlying two hard drives and two SSDs are in perfect condition. It’s two mirrored drives using the Btrfs file system (the Synology default, though ext4 is also available as an option). Btrfs is supposedly a journaling file system, which should make this kind of corruption impossible. Yet searching the Internet reveals that Synology suicides are commonplace. Here’s one example that pins the blame on the SSDs being enabled as read/write caches. But given that the SSDs are non-volatile, why isn’t the Synology software smart enough to deal with the possibility of a power outage even when read/write caching (which seems to be the default) is enabled? The Synology web page on the subject says you need two SSDs (which I have) for “fault tolerance” and doesn’t mention that the entire NAS can become a brick after losing power for a few seconds.

Given that Synology has only one job, i.e., the secure storage of data, this strikes me as a spectacular failure of corporate mission.

Readers: Have you seen this kind of failure before? NTFS was introduced by Microsoft in 1993 and I’ve never seen it completely destroyed by a power interruption. Oracle, IBM’s DB2, and Microsoft SQL Server use similar journaling techniques and they never become useless after a power interruption.

Separately, what are some alternatives to Synology for a home NAS? I find their admin interface to be much more complicated than it needs to be and their defaults are also unsuitable for home use, e.g., it won’t automatically restart by default after a power failure.

Finally, suppose I decide to rebuild this Synology NAS, which will almost certainly involve wiping all of the data and starting over. (I mostly use it as a backup for my Windows machine, so losing 100 percent of the data that I paid Synology to keep isn’t the end of the world.) If I take the InterWeb’s advice to get a UPS with a USB output, both to smooth out the Synology’s power supply and to signal it via USB to shut down, what is the smallest, quietest, and cheapest UPS that will do the job?

25 thoughts on “Why is the Btrfs file system as implemented by Synology so fragile?”

  1. Btrfs (pronounced “biter-fs”) is notorious for its bugginess and data corruption. If you care at all about your data, you’ll use ZFS, as implemented in some QNAP NASes.

    • It should be pronounced “bites the dust fs”! How can these open source weenies release something 15 years after NTFS that is vastly inferior to NTFS and call themselves heroes?

    • @philg: To be fair, it’s relatively rarely used in the open source world; ext and ZFS seem much more popular.

      I remember a lot of hype about it around 10 years ago. I guess “btrfs has a bright future and always will”.

    • D: Thanks for that. He says that he wants to atone for his crime, but he also writes approvingly about a canceled thought criminal: “I thank Richard Stallman for his inspiration, software, and great sacrifices”.

  2. I’d never use BTRFS. As you say, its bugginess is legendary and the stories are easy to find. Their RAID6 was known shit. TBF, no-one does or should use RAID6 but still.

    I wouldn’t even use ZFS, unless you really really really need all the features. Complexity brings bugs that are impossible to recover from.

    Just use ext4 or (maybe) xfs.

    • We trialed ZFS in a semi-production situation at a small cloud service and, of course, there was a crash which somehow ruined the boot block. Recovery that time was made by an engineer from Sun manually fixing the damaged parts.

    • I don’t recall the exact year of ZFS challenges but from what I recall I departed in 2013 and it was before that. Well, I hope they fixed it.

  3. > Readers: Have you seen this kind of failure before?
    Yes! Just on personal systems, I definitely had this happen with EXT2 (not journaled) and FFS (Amiga file system, not journaled). I see ReiserFS mentioned in the thread – after losing EXT2 filesystems a couple of times, I switched to ReiserFS until EXT3 came around, and have not had an issue in the decades since. Have not had issues with NTFS.

    I have not used BTRFS, and can’t speak to reliability, but it can do neat things like detect bit rot, copy on write, and snapshots.

    • About 15 years ago, I used ReiserFS on an old disk that I recently wanted to revive. Naturally, it was not recognized by my current machine.

      So my advice for disks with archival status (all of them?) is to use a simple, well-known fs that will be supported for decades.

    • Yes, ReiserFS was pretty short lived – I probably just used it in the ~1999-2002 timeframe.

  4. I keep my Synology on a UPS, both to avoid the annoyance of it not automatically powering back on and to give it enough time to shut down gracefully. I also don’t have SSDs in there, all 3TB spinning rust drives. Several have failed.

    • Anon: What’s an example of a compact quiet UPS that has a USB output to tell the Synology about an impending power failure and also one that maintains good clean power through an electric company glitch?

    • I’m surprised stationary computers, NASes, and the rest, are not yet routinely built to include lithium batteries that permit the system to shut down in at least a somewhat controlled and recoverable fashion.

    • Tom: That is a great idea. Why not add $100 to the cost of a pimped-out ATX 3.0 power supply and have a battery built in to provide 60 seconds of power? The typical power glitch lasts only 5 or 10 seconds. A shutdown signal can be sent after 20 seconds without mains power. It looks like at least some feeble ATX power supplies have this feature: http://www.nipron.com/dc_ups/dc_ups/

      https://linustechtips.com/topic/988803-integrated-ups-why-it-does-not-exist/ is from 2018. So the idea is obvious and yet nobody does it. Synology should certainly have this feature because the NAS’s power demands are minimal and 100 percent predictable. It would be tougher for a gamer PC.

    • Our Synology is a Floridian so its COVID and vaccination status aren’t topics of discussion and, therefore, we don’t know! (A few neighbors have mentioned getting COVID during trips to the Lands of Science, however, e.g., up in New York.) For the same reason we also don’t know the Synology’s pronouns, but perhaps we can guess that they are Totally/Braindead.

  5. This probably explains why my auld Synology NAS has been so taciturn lately. They must be running the company in maintenance mode.

  6. CyberPower UPSes in the 1500 VA size are decent. The lead-acid batteries will decline in 3-5 years, but the UPS electronics are good enough that I have one still running fine after a third battery change. The status display is useful too. Might be worth paying up for sine wave output depending on what you have attached.

  7. I have a Synology DS920+ and have it hooked to a CyberPower CP1500PFCLCD. The UPS has a usb connection to the array which triggers it to go into standby when the UPS reports low battery. Works reliably whenever I’ve needed it.

  8. Good, fast, cheap, pick one? I would never use Windows on my personal computer, but my 3 simple, external USB backup disks have NTFS, xfs, and ext4, for diversity. Cough. The NTFS disk once had btrfs, until, you guessed it – everything on it was lost when it was unplugged from USB improperly. Simple rsync bash scripts work for me. So far.

  9. The Synology DSM has a tick box under hardware settings that will power on the device when the power is restored.

    Also from what I’ve read if you care about your data you can use read cache but not write cache.

  10. I’ve used btrfs for quite some time on basically all my machines, raid1 when there is an array, and am very happy with it. It used to be buggy, but that seems solved for some time now.

    It isn’t robust to hardware *lying* to it though: if it tells an SSD to serialize some writes, and the SSD does them out of order, and you get a crash right then, that’s that. (Unless you took snapshots: those are super cheap and allow you to roll back very easily.) There are recovery tools, but they are not super mature.

    It sounds to me like the makers of that box basically set it up in an unstable way. They set a lie-to-me bit on the disk, which breaks fundamental assumptions btrfs relies on. That’s not like some accidental problem: they went out of their way to do it.
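
The failure mode described in comment 10 can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how Btrfs or Synology's DSM actually behaves: it assumes a copy-on-write commit consists of two writes separated by a barrier (first the new tree block, then a superblock pointer to it). The names `BARRIER`, `reachable_states`, `is_consistent`, and the block labels are all hypothetical.

```python
from itertools import combinations

BARRIER = "BARRIER"  # hypothetical marker: "finish everything above before continuing"

# Toy disk: the superblock names the block that holds the current tree.
INITIAL = {"superblock": "block_3", "block_3": "old tree"}

# Toy copy-on-write commit: write the new tree elsewhere, then repoint the superblock.
WRITES = [("block_7", "new tree"), BARRIER, ("superblock", "block_7")]


def reachable_states(initial, writes, honors_barrier):
    """Return every on-disk state that a power cut could leave behind."""
    # Split the writes into groups. A device that ignores barriers
    # effectively treats all writes as one unordered group.
    groups = [[]]
    for w in writes:
        if w == BARRIER:
            if honors_barrier:
                groups.append([])
        else:
            groups[-1].append(w)

    states = []
    done = []  # writes from earlier groups, guaranteed to be on the media
    for group in groups:
        # Within a group, any subset of writes may have landed before the crash.
        for n in range(len(group) + 1):
            for subset in combinations(group, n):
                disk = dict(initial)
                for key, value in done + list(subset):
                    disk[key] = value
                states.append(disk)
        done += group
    return states


def is_consistent(disk):
    """The disk is usable if the superblock points at a block that exists."""
    return disk["superblock"] in disk


honest = reachable_states(INITIAL, WRITES, honors_barrier=True)
lying = reachable_states(INITIAL, WRITES, honors_barrier=False)
broken = [d for d in lying if not is_consistent(d)]
```

In this model every crash state on the barrier-honoring device is consistent: the superblock points at either the old tree or the fully written new one. On the device that ignores the barrier, one reachable state has the superblock pointing at `block_7` before `block_7` was ever written, which is the "that's that" case from the comment.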
