Rebuilding the server without VMware and with ZFS?

Thanks to the many helpful responses to my posting about VMware, we have concluded that the combination of Linux software mirroring and VMware is never going to work. There doesn’t seem to be any way to get this server to work without wiping and reinstalling. Here is my proposed solution, to be mercilessly critiqued by the experts among the readers.

  1. wipe Disk 1 and install ZFS and a fresh operating system (CentOS) on Disk 1, creating one group per development service
  2. reboot the machine from Disk 1 and install most things by copying from Disk 2 (maybe a fresh install of Oracle from the tar file)
  3. wipe Disk 2 and tell ZFS that Disk 2 can now be used as a mirror for Disk 1 (I could be wrong, but I think this is something that ZFS is known to do, i.e., adding a mirrored disk dynamically; see the sketch after this list)
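
If I am reading the ZFS documentation correctly, steps 1 and 3 would boil down to something like the following (pool and device names are placeholders; the real ones depend on how the controller enumerates the disks):

    # step 1: single-disk pool on Disk 1, one filesystem per development service
    zpool create tank c0d0
    zfs create tank/service1
    zfs create tank/service2
    # step 3: after Disk 2 is wiped, attach it so the pool becomes a two-way mirror
    zpool attach tank c0d0 c0d1
    zpool status tank    # should show the resilver in progress, then a healthy mirror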

We will not run VMware on the rebuilt machine, but rely on standard Unix user/group permissions. This eliminates a lot of moving parts (the one wizard to whom we have access has no experience with VMware, which by itself is probably a good reason to chuck it). Instead of whatever ad hoc bag-on-the-side mirroring has been kludged into Linux by volunteers we will run ZFS, a system designed from the start to include mirroring as a fundamental part of the file system.

Risks:

a) does Oracle run well on top of ZFS? (I understand that ZFS write performance can be poor, but we do hardly any updates; this is primarily a development server, and the production services it will run are read-only)

b) can we truly add a mirror after ZFS has been up and running and in use?

What do you guys think of this idea?

Note that there are a few goals on which we cannot compromise: (a) the server must be able to survive the failure of a disk without any human intervention, (b) the server must run Oracle and AOLserver to support some legacy code, and (c) the server must support a couple of simple read-only production services.

18 thoughts on “Rebuilding the server without VMware and with ZFS?”

  1. ZFS should end up relieving many of your worries.

    a) Oracle does run on top of ZFS, at least well enough not to give any real problems, though I have not tested it under tough conditions. You can also carve out a piece of disk for Oracle to use as a raw partition, but I have no experience with doing that.

    b) yes, the command is “zpool attach”, like so: “zpool attach -f mypool c5d0s0 c6d0” where c5 is the existing device and c6 is the new one. Use “zpool status” to check that the mirror is up and working.

    I don’t know whether you are using Linux or Solaris; if Linux, I recommend searching for any potential issues with Linux/ZFS. Solaris/ZFS is pretty close to bulletproof.

  2. So instead of an “ad hoc bag-on-the-side mirroring [that] has been kludged into Linux by volunteers”, you are planning on using ZFS on Linux? The last word from the ZFS+FUSE project is a blog posting from January, which says the project isn’t dead. Looks pretty dead to me.

  3. Oh, and it sounds like all you really want is RAID1. Disk arrays can easily be reshaped on Linux if needed.

  4. I am assuming you are going to run Solaris 10 on this box (Oracle doesn’t work properly on OpenSolaris yet).

    Solaris 10 has an amazing feature called Zones, which offers virtual machines à la VMware with minimal overhead because they all share the same kernel instead of wastefully running multiple OS copies and attempting to juggle between them. You can run thousands of zones on a single-CPU machine, whereas even ESX chokes at more than a handful.

    I wouldn’t run Oracle in a zone as doing so loses some of the optimizations Sun has in place for Oracle like dynamic intimate shared memory (essentially the ability to lock physical RAM for use by the database, something you obviously don’t want to allow a virtual machine to do).

    Developers, on the other hand, can each be given their own sandbox zone to play with. You can even give a developer root privileges restricted to within the zone, and use resource capping to ensure each developer gets a fair share of the machine. Hosting provider Joyent provides virtual colo services (“accelerators”) using this technology; I have an account with them and it works very well.

    You can also use zones to produce a virtual network of servers that mimics a production environment. I have 7 QA zones mimicking locked-down production servers running within my DEV box (a 3.2GHz Xeon with 2GB of RAM).
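
    Setting a zone up is only a handful of commands; something like this (zone name, path, interface, and address are just examples):

        # configure a small sandbox zone for one developer
        zonecfg -z dev1 "create; set zonepath=/zones/dev1; set autoboot=true"
        zonecfg -z dev1 "add net; set physical=e1000g0; set address=192.168.1.51/24; end"
        # install, boot, and attach to the console to answer the first-boot questions
        zoneadm -z dev1 install
        zoneadm -z dev1 boot
        zlogin -C dev1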

    Oracle runs just fine on top of ZFS but the recommended approach is to use a ZFS filesystem for the datafiles and executables, and a UFS filesystem for redo logs and undo tablespaces. Also, do not use ZFS’ RAID-Z mode (not that you really can with only two drives). If this server is not going to be used for stress testing, I would just go with ZFS for everything, that’s how my company’s own DEV Oracle box is configured.
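
    In concrete terms (pool, filesystem, and slice names are only examples), that layout looks something like:

        # datafiles and ORACLE_HOME on ZFS; recordsize=8k is the commonly cited tuning
        # to match the default Oracle block size (check Sun's ZFS best-practices notes)
        zfs create tank/oradata
        zfs set recordsize=8k tank/oradata
        zfs create tank/orahome
        # redo logs and undo on a small UFS slice, per the recommendation above
        newfs /dev/rdsk/c0t1d0s4
        mkdir -p /u01/redo
        mount /dev/dsk/c0t1d0s4 /u01/redo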

    You can expand a ZFS filesystem by adding devices to a pool, but only in a RAID-0 way, i.e. you cannot turn a zpool with 2 drives in RAID-0 or RAID-1 into a RAID-Z (i.e. RAID-5). You can, however, add a disk and turn a RAID-0 zpool into a RAID-1 (mirrored) zpool.
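
    To make the distinction concrete (device names are placeholders):

        # "zpool add" grows the pool with a new top-level vdev: striping, no extra redundancy
        zpool add tank c1d0
        # "zpool attach" pairs a new device with an existing one: the single disk becomes a mirror
        zpool attach tank c0d0 c1d0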

    Based on your requirements I would use the hardware mirroring on the machine (assuming the RAID controller is supported by Solaris). Otherwise, you can use DiskSuite to mirror the root filesystem (it is a pain to set up, however) and ZFS mirroring for everything else. OpenSolaris Indiana allows you to use ZFS for everything, but Oracle refuses to run on it (in fact, it doesn’t even have the obsolete Motif libraries Oracle uses in its crappy graphical Java installer). If the system runs on SATA drives, I don’t know that you can really have fully hands-free disk failure recovery because the boot device address will have changed, and PATA/SATA doesn’t handle this as well as SCSI does. If that is an absolute requirement, a hardware RAID controller that supports hot-plug is a must.
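
    For reference, the DiskSuite/Solaris Volume Manager root-mirror dance goes roughly like this (disk and metadevice names are examples; the reboot in the middle is part of why it is a pain):

        # state database replicas on a small slice of each disk
        metadb -a -f -c 3 c0t0d0s7 c0t1d0s7
        # one submirror per disk, then a one-sided mirror over the existing root
        metainit -f d11 1 1 c0t0d0s0
        metainit d12 1 1 c0t1d0s0
        metainit d10 -m d11
        metaroot d10          # rewrites /etc/vfstab and /etc/system; reboot here
        # after the reboot, attach the second half and let it sync
        metattach d10 d12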

  5. Sorry, guys, I should have said that I was planning to run Linux (CentOS). For no particularly good reason other than that is what is currently running the various sites (and it is free). The machine has no hardware RAID card. I didn’t see the need to spend the extra $$ because there will be so few writes.

  6. It does sound as though ZFS and Linux might be a bridge too far. What would be the best way to do software disk mirroring in Linux with a file system to support Oracle? I don’t want to buy a RAID card and then figure out how to use it. We used to do software mirroring under Solaris all the time back in 1999 and it worked perfectly.

  7. I just learned that Solaris can’t boot from ZFS. I think that is the last nail in the ZFS coffin. What about Linux software RAID? Can you build up a disk full of stuff and then say “Add this 2nd disk as a mirror” without wiping out the first disk?

  8. “Can you build up a disk full of stuff and then say “Add this 2nd disk as a mirror” without wiping out the first disk?”

    If you have configured software RAID on the first disk, then yes, you can add more devices to the array later (or remove them). Converting a disk with a regular file system to a RAID array might be a bit more complicated though.
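
    The usual workaround is to build the array degraded on the empty disk, copy everything over, and only then add the original disk as the second half. Roughly (device names and filesystem are examples; back up first):

        # create a RAID1 with the second member deliberately "missing"
        mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 missing
        mkfs.ext3 /dev/md0
        mount /dev/md0 /mnt/new
        # ... copy the old disk's contents to /mnt/new, fix fstab and the bootloader ...
        # then wipe the old disk and add it; the kernel resyncs it automatically
        mdadm --manage /dev/md0 --add /dev/sda1
        cat /proc/mdstat    # watch the rebuild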

  9. Phil, OpenSolaris can boot from ZFS; for Solaris 10 you basically create a small boot partition and then all your data can go on ZFS. If you are going with Linux and software RAID, read up on the “mdadm” command, which will let you set up RAID and will synchronize the drives when you add the second one.

    I have installed Oracle on OpenSolaris, only needing to do one small thing to have it install (a DTrace script which dynamically changes the value returned from the “uname” system call). It has been up and running for a couple months now with no problems.

    My recommendation would be OpenSolaris (the LiveCD you can download, called “2008.05”), which installs automatically onto an all-ZFS setup, in combination with the DTrace script I mentioned.

  10. Hm, I wouldn’t use Solaris unless I had a very good reason to, and I can’t think of any – and I say that as a former (long expired) certified Solaris admin. It’s a strange platform with some wonderful features, but it is archaic in other ways, it is hard to find software for and difficult to install when you can, you won’t find people who know it, and it lacks many basic utilities used all day every day in Linux. Not really recommended unless you really, really need it, and it doesn’t sound like you do.

    ZFS is a fantastic technology which will change the world – in a little while. I don’t think it’s quite there yet. The best OS for it now and in the future is probably the BSDs, including the next version of MacOSX, but it just doesn’t have the maturity yet. Maybe this time next year. And Linux is engaged in a cat fight over whether they’ll ever change things around enough to support it; who knows where that will go.

    ZFS on FUSE is a neat solution but doesn’t sound appropriate to your needs. You’d have to maintain a system FS anyway so why bother with two? I certainly wouldn’t run any production FS through FUSE and don’t think you should either.

    Hm, still think software RAID is a bad idea, although I admit I don’t know all that much about its current state. Can Linux running software RAID really just happily failover to the other disk if the first one fails? I thought that was pretty much a hardware-only trick. A RAID card expects drives to die and to handle it gracefully – can Linux really do that? I would want to confirm that before committing, seems a little too good to be true but like I said, no experience with it really.

    But software RAID does have some other limitations which should be intuitively obvious. Is this server expected to come under any kind of heavy load? If so, remember that using software RAID will surely reduce its capacity. The reason those hardware RAID cards cost a few hundred dollars is because they have their own CPUs and RAM – it’s not just there for show, doing any kind of RAID is a fairly expensive proposition. Are you really saving money if you have to replace/supplement your server because its capacity is cut in half by use of software RAID? Depends on your projected usage, but to me software RAID is usually a false economy – if you expect any kind of serious use, buy hardware, just like you’d buy a machine with a discrete graphics card if you ever want to do anything that requires decent 3D graphics. There’s no such thing as a free lunch.

    Oh, I just read philg’s later comment and he says he doesn’t expect much write activity … in that case, the outlined setup (softraid on CentOS) sounds perfectly fine to me! Check if unattended failover actually works, though. I mean get the softraid up and running and then literally pull the power plug on one of the HDs and see how it handles it. I would have expected a kernel panic, but if it actually handles it gracefully, that’s great, and shows how much I know … heh

  11. * No reason to use ZFS with Linux in 2008
    * I would just buy the hardware RAID card and be done with it. Much easier to administer. You’ll spend much more time “figuring out how to use” software RAID. Not that it’s hard…it’s just more software to administer!
    * Don’t give up on VMWare Server. And even if that doesn’t meet your needs, VMWare ESX really isn’t that expensive. You’ll likely get a lot of utility from the virtuals.

  12. As for ZFS, I’m in a similar situation, waiting for Sun Solaris (not OpenSolaris) to support ZFS boot, but fiddling with ZFS+FUSE/OpenSolaris in the meantime. Another suggestion that’s halfway between Sun Solaris and OpenSolaris is Solaris Express. Recent releases (build 90 and on) support ZFS root (Mini Howto). Solaris Express wasn’t for me, but it’d be worth considering because you shouldn’t have to tweak uname with DTrace as you do with OpenSolaris.

    I don’t know what bus (PCI-X/PCIe) this machine has, but I’d recommend picking up a cheap Addonics SATA hardware RAID card instead of hassling with this software RAID garbage. They are fully supported under Windows/Linux and are Certified Solaris Ready (kind of rare). A 4-port card can be had for under $100.

    If you’ve got space for 2 additional drives, add the RAID card and two cheap 80GB SATA disks (Seagate 7200.10, $40/ea from NewEgg) and for less than $200 you could be running Solaris with ZFS. Run the boot volume (UFS) on hardware RAID and let ZFS handle mirroring on the data disks (JBOD). This way you can upgrade to a pure ZFS solution down the road when you want, without endangering your data; just blow away the UFS disks.
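
    With the two extra disks presented to the OS as plain JBOD, the data pool is a one-liner (device names are guesses):

        zpool create data mirror c2t0d0 c2t1d0    # two-way ZFS mirror for everything non-boot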

    So basically:
    * Forget Linux+FUSE+ZFS and Linux+SoftRaid.
    * Hardware RAID, worth $100.
    * VMWare ESX, still rocks the house if you can afford it. (free for .edu)
    * Look into Solaris Zones down the road, server partitioning/isolation made easy. It’ll also let you run CentOS/RH binaries under Solaris if you install the necessary libraries.
    * Look at VirtualBox for running other VMs under Solaris.

  13. Running a relatively new file system that is not standard on the platform does not sound healthy to me. I’m probably way off, but why not use ext3 or something like that with logical volumes and do software mirroring?

    Regards
    Friedrich

  14. “Can Linux running software RAID really just happily failover to the other disk if the first one fails?”

    Yes! What’s the point of having RAID in the first place if it couldn’t do that? Where do all these people doubting Linux software RAID come from? It’s solid, fast, and used in production just about everywhere.
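
    If you want to convince yourself before trusting it, you can fail a member on purpose and watch the array keep running; something along these lines (md device and partition names are examples):

        # mark one half of the mirror failed and remove it; the filesystem stays mounted
        mdadm --manage /dev/md0 --fail /dev/sda1
        mdadm --manage /dev/md0 --remove /dev/sda1
        cat /proc/mdstat    # shows one member missing, array still active
        # re-add the disk (or a replacement) and let it resync
        mdadm --manage /dev/md0 --add /dev/sda1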

    “Is this server expected to come under any kind of heavy load? If so, remember that using software RAID will surely reduce its capacity.”

    Modern CPUs are certainly fast enough to handle software RAID. RAID1 especially isn’t that complicated. Unless you have a huge number of drives, hardware RAID won’t be significantly faster.

    Still, if you for some reason decide to go with hardware, pretty much the only worthwhile SATA RAID controllers come from Areca and 3Ware. I wouldn’t use anything else in production; software RAID is usually faster and more reliable than the cheaper RAID controllers.

  15. Sorry to disturb,
    but if you want to stick with Linux and software RAID, you might check out Veritas Storage Foundation Basic from Symantec. The advantage is that the VxFS filesystem runs on Linux and on Solaris too. It’s closed source, but that is only an academic question, because VMware is too.

    It’s also very reliable, and it would make a possible migration to another system (say, from Linux to Solaris) easy.

    But I would always stick with a hardware RAID controller; it’s the far better solution, especially if you expect high load.
    Forget about FUSE and ZFS; it’s a crappy solution. If you want to use ZFS but won’t touch Solaris, then have a look at FreeBSD 7, which works with ZFS. That could be a serious alternative to a Linux-based system, because FreeBSD is very reliable and has the advantage of jails, which are comparable to Solaris Zones.

    Think about it. 🙂
