Storage Tips

The Issue:
On dedicated servers, even those running a gigabyte of RAM, storage reference patterns were never an issue. Under VM, however, there is a significant advantage to running many servers that share memory. If one server does not need storage, that storage should be available to other servers.

Unfortunately, Linux has not been taught to share. In analyzing both idle and non-idle servers, we find that Linux references most of its storage on a frequent basis. If you define a virtual machine size of 128MB, a Linux server will address all of it. This makes its working set close to 128MB. Put 10 such Linux servers on a 512MB box and, even though idle, these servers will page.
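The arithmetic behind that last example can be sketched in a few lines of Python. This is only an illustration: the function name `overcommit` is made up here, and it assumes each guest's working set equals its full virtual machine size (the text above says only "close to").

```python
def overcommit(guests, guest_mb, real_mb):
    """Return (total working set in MB, ratio of that total to real storage).

    Assumption: every guest touches all of its defined storage,
    so its working set equals its virtual machine size.
    """
    total = guests * guest_mb
    return total, total / real_mb


# The example from the text: 10 guests of 128MB on a 512MB box.
total_mb, ratio = overcommit(guests=10, guest_mb=128, real_mb=512)
print(f"{total_mb}MB referenced against 512MB real ({ratio:.1f}x)")
```

Any ratio above 1.0 means the combined working sets cannot all stay resident, so VM has to page even when the guests are idle.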

On a dedicated server that must do I/O to potentially very slow disk drives, Linux and Unix make good use of storage as cache. This, however, is detrimental in a VM environment, where minidisk cache (MDC) may already be caching data for several guests. One shared cache is certainly more cost effective than many (hundreds or thousands of) guests all caching the same data internally.

So from a storage perspective, there are two issues: 1) many servers cache identical data, using up storage, and 2) Linux references every 'real page' often enough that paging on VM is required.

The Solution:
Linux does NOT need all of its storage. Many Linux administrators will tell you not to use swap if you can avoid it. But what if that swap were not a slow SCSI device but a very fast Virtual Disk? This turns out to be an optimal solution. Drop the storage size of your Linux server to 32MB, or even 24MB if you can, and define a virtual disk as a swap disk to make up the difference.
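As a rough sizing sketch, the "difference" can be worked out as follows. The helper name `vdisk_swap_size` is hypothetical, and the 512-byte block size is an assumption about how the virtual disk is allocated; check it against your own system before relying on the block count.

```python
def vdisk_swap_size(original_mb, reduced_mb, block_bytes=512):
    """Return (swap size in MB, swap size in blocks).

    Assumption: the virtual disk is allocated in fixed-size blocks
    of block_bytes bytes (512 is assumed here, not taken from the text).
    """
    if reduced_mb > original_mb:
        raise ValueError("reduced size must not exceed the original size")
    swap_mb = original_mb - reduced_mb
    blocks = swap_mb * 1024 * 1024 // block_bytes
    return swap_mb, blocks


# A 128MB guest cut down to 32MB, with a virtual disk making up the rest.
swap_mb, blocks = vdisk_swap_size(original_mb=128, reduced_mb=32)
print(f"swap virtual disk: {swap_mb}MB = {blocks} blocks")

# Ten such guests now define 320MB of storage in total.
fleet_mb = 10 * 32
print(f"10 guests at 32MB: {fleet_mb}MB")
```

Under these assumptions, the ten guests from the earlier example define 320MB in total rather than 1280MB, which is less than the 512MB of real storage on the box.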

The result is that Linux greatly reduces the amount of storage it references, your storage requirements go down, your paging goes down, your virtual disk takes up the slack, and you've taught Linux to share its storage.

See "VDISK case study" for the associated case study, and "VDISK implementation" for implementation tips.