Velocity Software, Inc. is recognized as
a leader in the performance measurement of z/VM and Linux on z.
The Velocity Performance Suite consists of a set of tools that
enable installations running z/VM to manage Linux and z/VM
performance.
In addition, many components of server farms can be
measured and analyzed. Performance data can be viewed in real time
using either a 3270 session or a browser.
The CLOUD Implementation (zPRO) component is designed
for full cloud PaaS implementation as well as
to extend the capabilities of the z/VM sysprog (system programmer)
to the browser world. This feature moves system management to
the point-and-click crowd. Archived data and reports can be
kept available for long-term review and reporting using zMAP.
The zVPS (formerly ESALPS) components consist of:
zMON (formerly ESAMON - real-time display of performance data),
zTCP (formerly ESATCP - SNMP data collection),
zMAP (formerly ESAMAP - historical reporting and archiving),
zVWS (formerly ESAWEB - z/VM based web server),
zTUNE (a subscription service),
zVIEW (formerly SHOWCASE - web-based viewing of performance data),
zPRO (new to the quality line of Velocity Software products).
Velocity continues to work with other software vendors to ensure
a smooth interface to or from other products such as
VM:Webgateway, CA-Webgateway, EnterpriseWeb, MXG, and MICS.
Velocity Software remains the leader and innovator in the
z/VM performance, Linux performance, and cloud computing
management arenas.
The information and suggestions contained in this
document are provided on an as-is basis without
any warranty either expressed or implied. The use
of this information or the implementation of any of
the suggestions is at the reader's own risk. The
evaluation of the information for applicability is
the reader's responsibility. Velocity Software may
make improvements and/or changes to this publication
at any time.
Overview
This reference guide is extracted from Velocity Software's
presentation on configuration guidelines. It provides
high-level configuration and tuning recommendations whose
results can be measured using zVPS. Most installations
will see significant benefit from the recommendations
suggested here. Installations with more
complex requirements will want to evaluate the recommendations
based on measurements of their particular systems.
This abbreviated Tuning Guide discusses each subsystem from
both a traditional z/VM perspective and a
Linux server farm perspective.
For performance questions or further information
about evaluating these recommendations in your z/VM and
Linux environment, please contact Barton Robinson
of Velocity Software at:
Velocity Software, Inc.
PO Box 390640
Mountain View, CA 94039-0640
650-964-8867
DASD Subsystem Performance Summary
DASD Configuration Guidelines for z/VM:
Do NOT combine spool, paging, TDISK and minidisks
at the volume level. (This means dedicated volumes!)
Page and spool use algorithms designed to minimize
seeks and overhead within the file. Putting either
page or spool on the same volume as any other active
area will result in contention and overhead.
Furthermore, multiple page or spool allocations should not
reside on the same volume.
TDISK is formatted regularly and should not be assigned
to the same volume as data with a performance requirement.
z/OS and VM data should be segregated at the control unit
level to avoid error recovery complications and to reduce
performance spikes when z/OS runs I/O intensive batch jobs.
DASD Planning Using Access Density
When allocating DASD using the access density methodology,
consider the following planning guidelines. Note that these are
rules of thumb, not hard-and-fast rules. Installations should always
review performance to ensure that the user's needs have been
met. Access Density is defined as the number of I/O expected
per Gigabyte of data. The ESADSD6 report provides data access
densities at a device level. The following recommendations for
current DASD technology are intended to keep device busy below
10%. This number is intentionally conservative as a guideline
to provide positive results when estimates are wrong.
For some Linux volumes and z/VM paging volumes, the I/Os are
larger and longer in duration. Plan for Linux volumes and
page volumes to have service times of 1-2 ms per I/O; with the
10% busy target, a device should be targeted at 50-100 I/O per
second. Traditional 4K I/O also has service times in the 1-2 ms
range, which means 50 I/O per volume is a reasonable target.
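As a planning sketch (the workload figures here are hypothetical),
suppose an application holds 200GB of data and is expected to drive
1000 I/O per second at peak. Its access density is 1000 / 200 = 5 I/O
per second per gigabyte. At a target of 50 I/O per second per volume,
the data should be spread across at least 1000 / 50 = 20 volumes, or
roughly 10GB of this data per volume.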
SCSI is currently not suited for high access data or paging
due to reduced performance.
Control Unit Planning
He with the most cache wins. In a Linux environment, the issue
is often the non-volatile write cache, as Linux will buffer writes
and then write out data in large bursts, overflowing the write
cache. Ensure there is a mechanism for detecting NVS-full
conditions. Minimizing Linux server storage sizes also minimizes
the potential of this problem by reducing the available storage
to cache write data.
Channels
Channels today rarely impact performance. PAV/HiperPAV is always good.
Measuring the DASD Subsystem
Each of the above tuning recommendations can be evaluated using
the following zVPS reports:
ESADSD1: Device Configuration
ESADSDC: Cache Configuration
ESADSD2/6: DASD Performance and access rates
ESADSD5: DASD Cache performance
ESAPSDV: Page/Spool Device Performance
ESACHAN: Channel Performance
ESACHNH: HiperSockets Performance
Storage Subsystem Performance
Storage requirements should be reduced as much as
possible to avoid unnecessary paging delays. Linux
adds several guidelines. Plan on 2GB of storage for
z/VM, MDC, and the infrastructure (TCPIP, DIRMAINT,
zVPS).
Linux Storage Planning Guidelines
With Linux, the over-commit ratio is the planning target.
If you plan for 20 Linux servers of 1GB each,
with a target over-commit ratio of 2, then 12GB is
required (20 servers times 1GB, divided by 2, plus
2GB for z/VM and infrastructure). For WAS and Domino environments,
an over-commit target of 1.5 is reasonable.
For Oracle and other virtualization-friendly applications,
an over-commit ratio of 3 is reasonable.
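Stated as a general planning sketch (the 2GB base for z/VM, MDC and
infrastructure comes from the guideline above; the server counts and
sizes are hypothetical):
  required storage = (number of servers x server size) / over-commit ratio
                     + 2GB for z/VM, MDC and infrastructure
For example, 40 servers of 1GB each at an over-commit ratio of 2.5
would plan for 40GB / 2.5 + 2GB = 18GB.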
To put more servers into existing storage, decrease
Linux server storage sizes until they start to swap.
Repeat.
This is the largest tuning knob available to
improve storage utilization.
System Settings
Many SRM settings are no longer useful:
SET SRM STORBUF
SET SRM LDUBUF
SET SRM DSPBUF
Storage Analysis
Use the following reports to evaluate storage.
ESASTRC: Storage Configuration
ESASTR1: Storage Analysis
ESASTR2: Storage Analysis Details
ESADCSS: NSS/DCSS Analysis
ESAASPC: Address Space Analysis
ESAUSR2: User Resource Utilization
ESAUSPG: User Paging Analysis
Paging Subsystem Performance
Review the options for reducing storage requirements
BEFORE analyzing or enhancing the paging subsystem.
Many times, storage requirements can be reduced so that
paging requirements drop significantly. If this is the
case, any time spent on the paging subsystem will be
wasted.
Paging Configuration Requirements
The requirements for the paging subsystem are as follows.
Ensure page packs are dedicated.
Page space requirements: Page space must be
no more than 50% allocated. More than that, and blocking
factors drop, and Page I/O goes up. This is the most
critical consideration.
Page device requirements: Ensure that paging
devices do not exceed 20 percent busy.
Allocate the same page space to each paging
device to ensure the load is balanced at peak intervals.
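As a sizing sketch (the in-use figure is hypothetical), if peak in-use
page space is expected to be 60GB, then at least 120GB of page space
should be allocated to stay under the 50% threshold; spread across six
equally sized paging volumes, that is 20GB of page space per volume.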
Spooling Configuration Requirements
Spool very rarely impacts performance. Have a sufficient
number of spool volumes to keep each device below
20 percent busy during peak periods. Maintain sufficient space to ensure
console logs are available for problem determination.
Paging/Spooling Analysis
The following reports should be used for analyzing the
paging and spooling subsystems:
ESAPAGE: Page/Spool requirements (system level)
ESABLKP: Block Paging analysis
ESAPSDV: Page/Spool (Device level)
ESAUSPG: User requirements
Processor Subsystem Performance
Moore's law is dead, long live the mainframe. Processor/cycle
speeds have not significantly changed in several generations.
Now the objective is to get more work done with fewer cycles.
Reducing CPU requirements from a system tuning perspective
can be done with the following actions.
Minimize Linux Virtual Processors: Linux should not
be allowed to have multiple virtual processors when the workload does
not need them. Giving a server an extra virtual processor may
provide a few milliseconds improvement in performance, but
will result in spin locks, consuming processor unnecessarily.
Minimize polling: Linux hertz time is just one
example of polling within a Linux server. This can and
should be corrected using the timer patch. Note that WAS,
Domino, SAP and some other applications have since
implemented polling.
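As a quick check of the virtual processor recommendation (assuming the
guest has the usual class G privileges), the number of virtual CPUs
defined for a server can be displayed from within the guest and
compared with what the workload actually needs:
CP QUERY VIRTUAL CPUS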
System Settings
Many guidelines for SRM settings have changed over the years;
installations that have carried forward their own "SRMSET EXEC"
for years may find some new recommendations here.
Many guidelines had to do with controlling access to
the dispatch list, and when resources were constrained, virtual
machines would be delayed on the eligible list. This function
no longer exists.
SET SRM DSPBUF | LDUBUF | STORBUF are no longer
useful
SET SRM IABIAS has no meaning anymore
SET SRM DSPSLICE minslice can be useful for
systems with few processors and CPU-intensive workloads.
For Linux workloads, use the default of 5 (ms).
SET SRM MAXWSS has always been a useless setting
SET SRM VERTICAL | HORIZONTAL - VERTICAL is required
if using SMT. HORIZONTAL has helped performance of many
systems where SMT was not a good option.
Processor Performance Analysis
Processor performance and the impact of these recommendations can be
evaluated with the zVPS processor and user reports.
Minidisk Cache (MDC) Guidelines
For CMS and shared Linux disk workloads,
analysis of several systems has shown a pattern of diminishing
returns from MDC. The largest gain is from the first 100MB.
Note that Linux servers sharing one or two disks can avoid I/O
with MDC. In no case should the z/VM control program (CP) be
allowed to determine how much storage is to be used for MDC. Many
case studies have shown that CP will cause paging spikes by allocating
too much storage to MDC. The following commands should be issued to
control MDC storage, where the maximum sets a reasonable limit
on the size of MDC.
SET MDC STORAGE 0M 256M
For VSE systems that will benefit from the MDC track
read, meaning that for every I/O to disk, the full track is
read and cached, ensure that MIN and MAX are the same to
maintain consistent performance:
SET MDC STORAGE 1024M 1024M
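To confirm the values in effect after issuing SET MDC, the
corresponding query command can be used (assumed available at
current z/VM levels):
QUERY MDCACHE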
Measuring MDC
The following reports can be used for analyzing different
aspects of MDC performance:
ESAMDC: Mini-Disk Cache Analysis
ESADSD6: Analyze MDC impact on disks
ESAUSR3: Analyze MDC impact on users and Linux servers
Linux Configuration Recommendations
These are recommendations that are considered best
practices and have been validated in hundreds of Linux
installations.
Swap to virtual disk - swapping to virtual
disk does not impact response time and means swap is
not a bad thing. Swap to vdisk is measured in microseconds,
not milliseconds.
Multiple and Small Swap Disks - Configure multiple small
swap disks, prioritized so that one is fully used
before the next is touched. Alerts should be set for when the
second vdisk becomes needed (see the sketch following these
recommendations).
Minimize virtual machine size until Linux swaps:
The only way to reduce storage requirements is to stop
Linux from caching unnecessary data and programs.
Minimize number of virtual CPUs - if the workload
requires only 1 virtual CPU, providing a second CPU will
waste CPU by creating spin locks.
These are best practices. All have been validated many
times in many installations.
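As a minimal sketch of the swap disk recommendation above (the device
names and priority values are hypothetical; actual vdisk device numbers
depend on the installation), two virtual disk swap devices can be given
descending priorities in /etc/fstab so the first is filled before the
second is touched:
/dev/dasdb1   swap   swap   pri=10   0 0
/dev/dasdc1   swap   swap   pri=5    0 0
The swapon -s command (or /proc/swaps) shows which swap devices are
actually in use, which is what an alert on the second vdisk would key on.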
Service Machine Performance Summary
Configuring TCPIP for z/VM
TCPIP should have the following option set to provide
optimum service. The Share setting can be modified
later to fit requirements if TCPIP's requirements are very large.
SET SHARE TCPIP ABSOLUTE 5%
This ensures TCPIP is not over-prioritized the way the
default of RELATIVE 3000 is.
Tuning z/VM Database Service Machines
Database service machines such as SQL are a shared resource.
They should have the following options set to provide optimum
service. This does not include Linux servers, unless they are
shared by many other servers as a resource.
SET SHARE SQLxxx RELATIVE 300
Measuring Service Machines
Each of the above tuning recommendations can be evaluated
using the following zVPS reports:
ESAUSR1: User Configuration to validate settings
ESAXACT: Transaction Analysis - understand
server delays
ESAUSR2/3/4: User Resource Utilization
ESAUSRQ: User Queue Analysis (to understand queue sizes)
Tuning Traditional CMS Workloads
The following guidelines are for traditional CMS workloads,
and have no impact on Linux server farm workloads.
File Directories in Storage:
Use the SAVEFD facility to
save file directories in saved segments for disks that are often
accessed, for example the HELP disk and the tools disks. This
eliminates the I/O needed to access the minidisk.
EXECs in storage
Put often-used EXECs from the S, Y, and
tools disk into the installation saved segment. This reduces both
I/O and storage use, since the EXEC is not loaded into user
storage but instead executes from a single shared copy. IBM
provides 'SAMPNSS EXEC' to define the segment and a sample
'CMSINST EXECLIST' that contains the list of EXECs and XEDIT
macros that will be loaded into the saved CMSINST segment.
Installations
should add all EXECs and XEDIT macros that are likely to
be used frequently.
Help Disk
Blocking the Help disk at 1K requires 25% less
DASD space without changing the I/O rate. Always save the file
directory with SAVEFD to reduce directory access I/O and time.
Installations with heavy use of Help should force users to directly
access it by modifying the SYSPROF EXEC.
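As an illustrative sketch (the virtual device address and file mode are
hypothetical, and reblocking is normally done when the disk is rebuilt,
since FORMAT erases its contents), a minidisk is reblocked to 1K with
the CMS FORMAT command:
FORMAT 19D Z (BLKSIZE 1024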
Performance Analysis
Use the following reports to evaluate impacts of these functions
on performance:
ESAWKLD: Determine impact on user workloads
ESAUSPG: User Paging Analysis
ESAPAGE: Determine impact on system paging
Functional Requirements for Managing Linux Performance Under z/VM:
What Every IT Professional Should Know
Performance Measurement skills and tools to
ensure current service levels are met. This includes
current performance measurements and the ability to
analyze performance from previous time frames.
Capacity Planning skills and a performance
database to ensure future needs are met, including
the ability to transfer data to MICS or MXG.
Operational alerts implemented to allow
operations to detect current issues such as looping
processes, exceeding disk capacity, etc., for hundreds
of servers. Alerts can be sent to any SNMP based
management console, 3270, or a browser on a workstation.
Chargeback and accounting capability to
provide data used in a mainframe business model
to charge for resources consumed, using either
zVPS facilities or MICS.
Achieving these results introduces the following challenges:
Accuracy of the Data - The CPU data provided
by Linux in a virtual environment prior to SLES10 was
wrong. Velocity Software was the first to understand
this issue and offered the ONLY product to correct the
results. The same is now true for Linux in an SMT environment.
Complete Data Collection: Multi-platform
Data Collection - Through the use of a standard
interface (SNMP and NETSNMP) an installation using
zVPS may monitor many different platforms (NT,
Linux, Sun, HP).
Complete Data Collection: Ability to
collect data from hundreds or even thousands of servers.
A 100% capture ratio ensures that you know
exactly how much system resource is being used and
by whom - down to the Linux process level.
Cost of Data Collection - Cost of collecting
data should be kept to a minimum. Some management
tools require as much as 5% of the processor resource.
Velocity's target is 0.1% or less of ONE processor
at 1-minute data granularity per Linux server.
Velocity Software, Inc. Products and Services
Velocity Software's focus is to provide performance
products and services for z/VM. Velocity Software
offerings currently include:
zVPS: The Velocity Performance Suite is
designed for installations using Linux under z/VM.
It includes the standard z/VM measurement facilities
(zMAP and zMON) as well as Linux and network data
collection (zTCP), and a full-function z/VM-based
web server (zVWS).
zMON,
the z/VM Real-Time Monitor, analyzes system performance
monitor data produced by z/VM. zMAP generates reports
for use in performance analysis activities, and stores,
retrieves and reports from history files to facilitate
capacity planning and long-term performance trend
analysis. zMON generates real-time displays that show
z/VM system performance measurements. zMON captures
system performance data and records it on disk, as well
as creating history files that can be employed with
zMAP. Together, zMAP and zMON provide a complete
z/VM performance monitoring system.
zVWS (Velocity Web Server) provides a full-function
web server allowing a browser-based interface
to z/VM functions and performance data.
z/VM performance workshops are offered
regularly. See the Velocity Software website for
details.
zTUNE is Velocity Software's service to ensure
performance problems are resolved quickly and to provide
access to Velocity Software's 100 years of
experience in solving performance problems. This
includes system performance reviews whenever requested.
zPRO is Velocity Software's solution for implementing
private clouds, as well as providing an easy-to-use web page
for managing your z/VM environments.