Virtual Disk for Swap Case Study


The Problem: Too much paging
With just one Linux server running a standard profile, our system paged so badly that the performance of all users was impacted.

Whatever virtual machine size you give it, Linux will reference most of that storage, so the working set ends up very close to the virtual machine size. This makes it very difficult to share real storage and limits the number of Linux servers you can run efficiently.
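
You can see this from inside Linux: over time nearly all of storage shows as in use, much of it buffer cache. A quick check (numbers will vary):

   free -m             'used' climbs toward the total; buffers/cache take the rest
   cat /proc/meminfo   Buffers: and Cached: show where the storage went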

When Linux storage requirements exceed your available real storage, you will page.

Linux on Intel servers does not need to 'share' storage, so this 'feature' of Linux is not a problem on dedicated servers. But the benefit of operating many servers under VM is the ability to share storage, cycles, and disk space. I really don't think David Boyes ran 97,000 images with dedicated storage, especially at 128MB per image.

The problem, then, is how to make Linux share its storage using standard features of Linux and VM.

The solution(s): Reduce Virtual Machine Size

Minidisk cache?


Linux uses storage to cache. So does VM. Multiple Linux servers should share disk, not cache it privately, and use VM's ability to cache minidisks. This provides one shared cache, not umpteen private caches. Reducing the Linux cache size and utilizing MDC instead would be a good experiment, but this was not the solution for us (yet).
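
For reference, a sketch of how MDC can be inspected and capped from CP; the exact operands vary by VM release and the 70MB cap is purely illustrative, so verify against your own system:

   CP QUERY MDCACHE                 show current minidisk cache size and usage
   CP SET MDCACHE STORAGE 0M 70M    cap MDC's use of real storage at 70MB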

Use SWAP instead of Real storage?

Nobody seems to have published any research on the real storage requirements of a Linux system.

The real solution is to cut the virtual machine size down very significantly and make up the difference with swap. Linux considers a swap device slow and will not use that space unless necessary.


The choices are limited:
 - Expanded Storage: must be dedicated. Not a good solution.
 - Cached DASD: better, but only as a last resort.
 - Virtual Disk: perfect....
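
A sketch of the virtual disk setup follows. The device address, block count, and Linux device name are illustrative assumptions, not our exact configuration; a 100MB virtual disk is 204800 512-byte blocks.

   In the CP directory:
      MDISK 0201 FB-512 V-DISK 204800 MR

   Or defined dynamically before IPLing Linux:
      CP DEFINE VFB-512 AS 0201 BLK 204800

   Then from Linux, once the virtual disk is visible to the DASD driver
   (assumed here to be /dev/dasdb1):
      mkswap /dev/dasdb1
      swapon /dev/dasdb1

Because a virtual disk is created empty at each LOGON, the mkswap must be rerun at every boot, typically from a startup script. The attraction is that CP backs a virtual disk with pageable storage on demand, so swap space that Linux defines but never references costs almost no real storage.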

Going from a 128MB virtual machine to a 32MB virtual machine with 100MB of virtual disk worked. Linux still ran the tar workload successfully, only quicker. Instead of an 80MB working set (20,000 4KB pages), the working set was closer to 20MB (4,800 pages). Better yet, very little of the swap space (virtual disk) was actually referenced, so Linux's requirements dropped: Linux didn't see the slow swap device as something that should be used unless necessary. The net was a 60MB drop in this server's real storage requirement. And we may try this again with an even smaller server....
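
On the Linux side, standard tools show how much of the defined swap space is actually being touched; from the VM side, the ESAVDSK report listed below gives the virtual disk view:

   free -m            total vs. used swap, in megabytes
   cat /proc/swaps    per-device swap usage
   vmstat 5           si/so columns show swap-in/swap-out activity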


The High Level Analysis

Base case, sized using the ROT (rule of thumb). Note that the system was thrashing hard enough to impact the performance measurements.

 Linux with 128MB: 3/8/01, 15:19-16:00 (0308DAY OUT02)
    Queue and State analysis (ESAXACT Report)
        PageWait:  53%
        CPUTime:   25%
        CPUWait:   15%
        Eligible:   0%  (did not use or need QUICKDSP)

 Storage Analysis
    Linux001:  80MB Working Set (20,000 Pages) (ESAUSP2 Report)
    MDC: 30MB (ESAMDC Report)

 Paging Analysis
    Linux001:  20/second (ESAUSR2 Report)
    SystemBlkPgs: 6/sec (normally .1)

 System responsiveness:
    80% triv < .2 sec, normally 98%
    User Avg resp 3 seconds, normally subsecond

 I/O analysis (DASD I/O Per second)
    Total:     .8
    Vdisk:      0
    MDC:       .1
    BlockI/O:  .7

2nd case, using Virtual Disk for Swap (0310NITE OUT02)
Linux with 32MB, 3/10/01 07:00-08:00

 State Analysis
    PageWait:   3%
    CPUTime:   52%
    CPUWait:   21%
    Asynch I/O: 24%

 Storage Analysis
    Linux001:  20MB Working Set ( 4,800 Pages)
    MDC: 70MB

 Paging Analysis
    Linux001:  2/second
    SystemBlkPgs: 1/sec (normally .1)

 System responsiveness:
    96% triv < .2 sec, normally 98%
    User Avg resp 1 second, normally subsecond

 I/O analysis (DASD I/O Per second)
    Total:   1.6
    Vdisk:    .1
    MDC:      .2
    BlockI/O: 1.3


3rd case, Not using Virtual Disk for Swap
Linux with 32MB, 3/13/01 10:00-11:00

 State Analysis
    PageWait:   3%
    CPUTime:   60%
    CPUWait:   20%
    Asynch I/O: 18%

 Storage Analysis
    Linux001:  20MB Working Set ( 4,800 Pages)
    MDC: 70MB

 Paging Analysis
    Linux001:  2/second
    SystemBlkPgs: 1/sec (normally .1)

 System responsiveness:
    99% triv < .2 sec, normally 98%
    User Avg resp 1.2 seconds, normally subsecond

 I/O analysis (DASD I/O Per second)
    Total:      2
    Vdisk:      0
    MDC:        .4
    BlockI/O:  1.4



Reading the reports
System data:
   ESAHDR:  System configuration
   ESASSUM: Subsystem Overview
   ESAMDC:  Minidisk cache size, hit rate
   ESAUSLA: System responsiveness
   ESASTR1: Storage functional requirements
   ESABLKP: Block paging analysis

Linux Server Data
   ESAUSR2: Linux Server CPU time, working set, Page rate
   ESAWKLD: Linux wait state analysis
   ESAVDSK: Virtual disk storage, page rate
   ESAUSR3: I/O rate to MDC, VDISK
   ESAUSPx: Resource rates