DASD/MDC

The following ESAMON display shows DASD performance for a range of devices during experiments with minidisk caching. The advantage of using ESAMON for this kind of experiment is that you can watch the data while the experiment runs, quickly evaluate the performance, and modify the experiment as you learn.

The problem being addressed here was high response times for the disks. These could be attributed to very long connect times, which in turn drove device busy very high.

In the display under response times, there are 5 components:

Response Time: This is the total of the I/O components, including queue time delays.

Service Time: The total of the components, excluding queue time delays. This is the value used to compute device busy. In today's technology, 3 to 5 ms is common.

Pend Time: This is the protocol time between the channel and the S/390. It is normally less than 0.3 ms.

Disc Time: Disconnect time is the time the storage controller spends on seeks and rotational delays. Caching data in the controller reduces disconnect time. Normally, the first time data is accessed, disconnect time will be in the 10-20 ms range.

Connect Time: This is the time the controller is connected to the channel to transmit data. Normally, for a 4K block, this should be less than 2 ms on almost all technologies today.
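
As a quick check of how these components add up, take the 11:15:23 interval for device 1007 in the display below:

   Pend + Disc + Conn = 0.2 + 15.4 + 10.8 = 26.4 ms, which is the Serv value shown
   Resp is also 26.4 ms, so there was no queue time delay in that interval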

Experiments:

Step 1: Boot a Linux server with the default minidisk cache (MDC) enabled. This step ended at 11:31. Note the device busy of 90% and more, and the service times of 24-26 ms. These numbers are above normal expectations.

Step 2: Disable MDC (a command sketch follows these steps) and rerun. Note that disconnect time, connect time, and device busy are all much lower, and the I/O rate (SSCH, or Start Subchannels per second) is much higher.

Step 3: Re-enable MDC and rerun. Note that the disconnect time is still lower, but connect time and device busy are high again, and the I/O rate is lower.

Step 4: Boot an alternate Linux with MDC disabled. Disconnect times were high, but the connect time was very low.
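
One way to disable and re-enable MDC for the whole system, as in steps 2 and 3, is the CP SET MDCACHE command. This is only a sketch; it must be issued from a suitably privileged user, and the command class and operands should be confirmed for the installed VM level:

   CP SET MDCACHE SYSTEM OFF        Stop minidisk caching system-wide
   CP SET MDCACHE SYSTEM ON         Resume minidisk caching
   CP QUERY MDCACHE                 Display the current MDC settings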

 
Screen: ESADSD2  xxxxxxxxxxxxxxxxxx             ESAMON V2.2  05/04 11:14-13:10
1 of 3  DASD Performance Analysis - Part 1      DEVICE 1007-100c     9672 xxxxx
 
 
 
          Dev        Device %Dev <I/O rate-> <-----Response times (ms)--->
Time      No. Serial Type   Busy   avg  peak  Resp  Serv  Pend  Disc  Conn
-------- *--- ------ ------ ---- *---- ----- ----- ----- ----- ----- -----
11:15:23 1007 LIN501 3390-3 95.6  36.2  36.2  26.4  26.4   0.2  15.4  10.8
         1008 LIN502 3390-3 21.1   7.5   7.5  28.2  28.2   0.2  17.0  11.1
         1009 LIN503 3390-3 20.5   8.3   8.3  24.6  24.6   0.2  13.8  10.6
11:16:23 1007 LIN501 3390-3 80.8  31.0  31.0  26.1  26.1   0.2  15.1  10.8
         1008 LIN502 3390-3 98.7  36.1  36.1  27.3  27.3   0.2  16.1  11.0
         1009 LIN503 3390-3 45.5  18.9  18.9  24.1  24.1   0.2  13.4  10.5
11:17:23 1008 LIN502 3390-3 96.8  36.3  36.3  26.7  26.7   0.2  15.6  10.9
11:18:23 1008 LIN502 3390-3 99.2  37.1  37.1  26.8  26.8   0.2  15.7  10.8
11:19:23 1008 LIN502 3390-3 98.5  37.5  37.5  26.2  26.2   0.2  15.3  10.7
11:20:23 1007 LIN501 3390-3 28.8  13.4  13.4  22.7  21.5   0.2  12.1   9.1
         1008 LIN502 3390-3 41.7  17.0  17.0  24.5  24.5   0.2  14.0  10.3
11:30:23 1007 LIN501 3390-3 26.7  10.7  10.7  24.9  24.9   0.3  14.3  10.3
11:31:23 1007 LIN501 3390-3  8.0   5.2   5.2  15.3  15.3   0.2   3.5  11.5
         1008 LIN502 3390-3 10.7  20.6  20.6   5.2   5.2   0.2   1.3   3.6
11:50:23 1007 LIN501 3390-3  2.2   6.1   6.1   3.7   3.7   0.2   0.2   3.3
11:52:23 1007 LIN501 3390-3  3.2   9.0   9.0   3.5   3.5   0.2   0.4   2.9
11:58:23 1007 LIN501 3390-3  5.3  16.7  16.7   3.2   3.2   0.2   0.1   2.9
12:00:23 1007 LIN501 3390-3 35.1 114.8 114.8   3.1   3.1   0.2   0.1   2.7
         1008 LIN502 3390-3 21.6  64.8  64.8   3.3   3.3   0.2   0.3   2.8
         1009 LIN503 3390-3 22.4  37.7  37.7   5.9   5.9   0.2   0.1   5.6
12:01:23 1007 LIN501 3390-3  4.5  13.0  13.0   3.5   3.5   0.2   0.7   2.6
         1008 LIN502 3390-3 35.6 160.0 160.0   2.2   2.2   0.2   0.1   1.9
12:06:23 1007 LIN501 3390-3  9.4  10.9  10.9   8.6   8.6   0.2   0.3   8.2
12:41:23 1007 LIN501 3390-3 24.3  20.3  20.3  12.0  12.0   0.2   1.4  10.4
12:42:23 1007 LIN501 3390-3 61.0  47.1  47.1  13.0  13.0   0.2   1.8  10.9
         1008 LIN502 3390-3 81.8  65.2  65.2  12.5  12.5   0.2   1.3  11.0
         1009 LIN503 3390-3 30.0  27.0  27.0  11.1  11.1   0.2   0.3  10.7
12:43:23 1008 LIN502 3390-3 95.9  74.1  74.1  12.9  12.9   0.2   1.9  10.9
12:44:23 1008 LIN502 3390-3 34.8  27.3  27.3  13.4  12.7   0.2   2.0  10.6
12:49:23 100A LINMST 3390-3 49.2  46.8  46.8  15.1  10.5   0.2   7.6   2.7
12:50:23 100A LINMST 3390-3 96.8  66.6  66.6  14.5  14.5   0.2  11.0   3.3
12:51:23 100A LINMST 3390-3 47.3  40.4  40.4  13.8  11.7   0.2   8.0   3.5
12:53:23 100A LINMST 3390-3 24.3  59.3  59.3   5.5   4.1   0.2   1.1   2.8
12:54:23 100A LINMST 3390-3 40.5  91.1  91.1   5.2   4.5   0.2   1.1   3.2
12:56:23 100A LINMST 3390-3 60.9 144.6 144.6   4.6   4.2   0.2   1.0   3.0
12:57:23 100A LINMST 3390-3 10.6  27.8  27.8   5.6   3.8   0.2   1.1   2.5
13:00:23 100A LINMST 3390-3  6.9  21.5  21.5   3.2   3.2   0.2   1.1   1.9
13:03:23 100A LINMST 3390-3  6.4  21.3  21.3   3.0   3.0   0.2   1.1   1.7
13:04:23 100A LINMST 3390-3  6.3  20.3  20.3   3.1   3.1   0.2   0.9   2.0
13:08:23 100A LINMST 3390-3  7.0   7.5   7.5   9.3   9.3   0.2   0.0   9.1
13:10:23 100A LINMST 3390-3  0.5   7.5   7.5   0.7   0.7   0.2   0.0   0.4
         100B LINRO  3390-3  0.3   5.4   5.4   0.5   0.5   0.2   0.0   0.3

Analyzing disconnect time:

In today's technology, disconnect time is affected mostly by controller cache. If the data has already been accessed and still resides in the cache, the disconnect time should be close to zero. In the experiments, when a new Linux was brought up, the disconnect time was high, but the second and third times the data was accessed, the disconnect time was very low. In the display, for example, device 1007 shows 15.4 ms of disconnect at 11:15:23 on the first access, but only 0.1-0.7 ms during the 11:50-12:06 reruns, once the data was in the controller cache.

Analyzing connect time:

At the point where data must be read from the rotating disk, controllers use different technologies. Some do a read through, meaning the controller reconnects immediately to the channel and the data is transmitted at the disk speed, between 4 and 5 MB per second in today's technology. Other controllers read the data into cache first and then send it over the channel at high speed, up to 17 MB/sec. Obviously, the latter will have slightly longer disconnect times but very short connect times, and will have higher throughput potential.
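
As a rough feel for the difference on a single 4K (4,096-byte) block, treating 1 MB as 1,000,000 bytes:

   Read through at 4.5 MB/sec:  4,096 / 4,500,000  = about 0.9 ms of connect time
   Cached read at 17 MB/sec:    4,096 / 17,000,000 = about 0.24 ms of connect time

Either way, a single 4K block stays well under the 2 ms guideline given earlier; the difference only becomes dramatic when much more than 4K is moved per I/O.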

The very long connect times in these experiments occur because minidisk cache, by default, reads in a full track whenever any data on that track is accessed. And it appears that this storage controller does a read through, so it is transmitting the data at disk speed.
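
A rough calculation supports this. Assuming the full raw capacity of a 3390 track, 56,664 bytes (about 56 KB), is transferred at a read-through speed of about 4.5 MB/sec:

   56,664 / 4,500,000 = about 12.6 ms of connect time per full-track read

That is in line with the 10-11 ms connect times seen while MDC track caching was active; the usable data on a formatted track is somewhat less than the raw capacity, which accounts for the slightly lower observed values.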

MDC track caching has proven to degrade performance for many applications, including database, the Shared File System, and now Linux. Two alternatives exist:

1) Turn off MDC - but this loses many benefits provided to guest servers running under VM.

2) Convert the Linux minidisks to use block cache instead of track cache. This has to be done manually for each minidisk, either in the directory or with the CP SET MDCACHE command (a sketch follows).
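
As a sketch of where those per-minidisk controls live, using hypothetical names (virtual device 0191, volume LNXVOL), MDC can be influenced for a single minidisk either with a MINIOPT statement following the MDISK statement in the directory entry, or dynamically with CP SET MDCACHE. The operands shown here simply turn MDC off for that one minidisk; the exact command class, syntax, and the operand for selecting block rather than track caching should be verified against the CP documentation for the installed VM level.

   In the directory entry:
      MDISK   0191 3390 0100 0200 LNXVOL MR
      MINIOPT NOMDC                          Disable MDC for this minidisk only

   Dynamically:
      CP SET MDCACHE MDISK 0191 OFF          Turn MDC off for virtual device 0191
      CP QUERY MDCACHE                       Display the current MDC settings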