In the primitive old days we would get a new server with a 400 MHz CPU and 512 MB of RAM, install Unix, Oracle, and AOLserver, and be up and running after one long, miserable day of system administration. Fortunately we live in the modern age. Two months ago, I bought a server with 4 GB of RAM and a processor with multiple gerbils running at the speed of light, and handed it over to a couple of young whiz kids. They laughed at my idea of simply installing Linux, Oracle, and AOLserver. “You’re going to use this box for multiple development servers,” they said. I replied that Unix was all set up for this with users, groups, and file/directory permissions. We would just create one group per development server and add people to the group. This response drew peals of laughter. Didn’t I know about VMware? They would create a virtual machine for each development server, each with an entirely separate user account base. The poor little pizza box would now be burdened with running four copies of Linux: one for the underlying machine and one for each of the three development servers. This seemed like a waste of the 4 GB of RAM that the brilliant hardware engineers had given us, but isn’t it the job of programmers to render worthless the accomplishments of hardware engineers?
I let them do it their way. Two months later, the box still isn’t up and running. What are the issues? We have one minor issue with timekeeping. VMware supposedly lets the underlying box look up the time from NTP servers and set its system clock, and the virtual machines are then supposed to get their time from the host. That isn’t working for some reason, and the virtual machines always have the wrong time. (We can laugh at Windows Vista, but I have never seen a Vista machine that was off by more than a second or two.)
The more serious issue is that the machine simply hangs and won’t respond to keystrokes for several seconds out of every minute or two. At first I figured that the problem was virtual machines being paged out to disk, so I asked the whiz kids to disable swap on each and every one of the four Linux installations. They did that, and the machine still halts temporarily. Here’s a description of the problem from one of the whiz kids (they will remain nameless so that they don’t need to be more ashamed than they already should be)…
We're running VMware Server on Linux and have noticed that the virtual machines will hang for several seconds at a time. This is creating serious performance problems and making even the most basic console interactions painful. The disk access light in the status bar of the management console is lit for the duration of the freeze. When the light goes out, the VM resumes running smoothly. At first we thought this was a paging problem, but after we adjusted the memory allocation, neither the VMware host nor any of the virtual machines reports any swap use whatsoever. The problem continues to occur even with swapping disabled. Checking the vmware.log file reveals hundreds of disk timeouts. Here's a five-minute span from the log file:
Jul 07 14:57:30: vmx| DISK: DISK/CDROM timeout of 13.156 seconds on ide0:0 (ok)
Jul 07 14:58:00: vmx| DISK: DISK/CDROM timeout of 7.435 seconds on ide0:0 (ok)
Jul 07 14:58:35: vmx| DISK: DISK/CDROM timeout of 9.563 seconds on ide0:0 (ok)
Jul 07 14:59:13: vmx| DISK: DISK/CDROM timeout of 9.763 seconds on ide0:0 (ok)
Jul 07 14:59:45: vmx| DISK: DISK/CDROM timeout of 5.775 seconds on ide0:0 (ok)
Jul 07 15:00:21: vmx| DISK: DISK/CDROM timeout of 8.015 seconds on ide0:0 (ok)
Jul 07 15:00:57: vmx| DISK: DISK/CDROM timeout of 9.939 seconds on ide0:0 (ok)
Jul 07 15:01:31: vmx| DISK: DISK/CDROM timeout of 10.320 seconds on ide0:0 (ok)
Jul 07 15:02:00: vmx| DISK: DISK/CDROM timeout of 3.345 seconds on ide0:0 (ok)
Jul 07 15:02:37: vmx| DISK: DISK/CDROM timeout of 6.182 seconds on ide0:0 (ok)
Any advice from a VMware hero? When is it time to wipe the machine and run it like we would have in 1978?
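The disk stalls in a log like that can be tallied rather than eyeballed. Below is a minimal Python sketch, assuming every stall line follows exactly the format of the excerpt above (`vmx| DISK: DISK/CDROM timeout of N seconds on <device>`); the function name `summarize_timeouts` is hypothetical and not part of any VMware tool. It groups the timeouts by device and reports the count, total, longest, and mean stall.

```python
import re
from statistics import mean

# Assumed line format, taken from the vmware.log excerpt:
# Jul 07 14:57:30: vmx| DISK: DISK/CDROM timeout of 13.156 seconds on ide0:0 (ok)
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}): vmx\| DISK: DISK/CDROM "
    r"timeout of (?P<secs>\d+\.\d+) seconds on (?P<dev>\S+)"
)


def summarize_timeouts(lines):
    """Group DISK/CDROM timeout lines by device and summarize the stall lengths.

    `lines` is any iterable of strings (a list, or an open log file).
    Lines that do not match the assumed format are ignored.
    """
    per_device = {}
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            per_device.setdefault(m.group("dev"), []).append(float(m.group("secs")))

    # One summary dict per device; totals and means rounded to milliseconds.
    return {
        dev: {
            "count": len(secs),
            "total": round(sum(secs), 3),
            "max": max(secs),
            "mean": round(mean(secs), 3),
        }
        for dev, secs in per_device.items()
    }
```

Run over the ten lines in the excerpt, it reports 10 timeouts on ide0:0 totaling 83.493 seconds, with a longest stall of 13.156 seconds: roughly 83 seconds of stalls in a window of about five minutes. To scan a whole log, pass it an open file, e.g. `with open("vmware.log") as f: print(summarize_timeouts(f))`.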