The CPU Level
- The CPU is the highest level of measurement in z/VM.
- CPU response time is a function of processor speed and the number of CPUs. If there are issues with CPU utilization, they will be among the first things you see.
It is key to know the baseline numbers of your environment so that you can see when abnormalities appear.
- Utilization numbers can sometimes be misleading, especially in a Linux environment. zVPS measures impacts in CPU seconds from the hardware.
It measures the impacts of LPAR(s), z/VM virtual machines, Linux processes, zVSE jobs/partitions, etc. This gives a very accurate measurement of the system.
- In a z/VM MP environment, adding processors can reduce queuing time and increase availability, but it also costs money.
For example, a Linux server should not have multiple processors when the workload doesn't need them. An extra virtual processor may improve performance by a few milliseconds, but it can result in spin locks, which consume the processor(s) unnecessarily.
- Minimizing polling is another action that can reduce CPU requirements. The Linux hertz timer is just one example of polling within a Linux server. This can and should be corrected using the timer patch.
Note that WAS, Domino, SAP and some other applications have since implemented polling.
- Watch for master processor bottlenecks:
  - Pay attention to which processor is the master processor.
  - The master runs things like RACF, spooling, IUCV, paging, CP commands, and linemode commands (such as screen scraping).
  - If the master processor is overloaded, move the workload or add another PHYSICAL (not virtual) processor.
  - Watch ESACPUU (system/user overhead), ESAXACT (simulation wait) and ESALPAR (master engine with weighting).
  - IF THE MASTER PROCESSOR IS OVERLOADED, ADDING ANOTHER ENGINE WILL NOT HELP; IT WILL ADD TO THE PROBLEM.
  A second LPAR will need to be added so that there is a second master processor.
- Affinity processing is the concept that, because all information for an instruction needs to be in L1 cache before it can execute, a virtual CPU will first try to be dispatched on the same thread/CPU to reduce the need to move data into L1 cache. However, given the way z/VM server systems tend to poll, this doesn't tend to work.
  - If you see available CPU capacity (ESALPARS/ESACPUU) AND CPU wait (on ESAXACT), use the SYSCONTROL command below.
- What is right for your environment depends on reviewing the current environment using
the zVPS tools/information to gain an understanding of what improvements or corrections are possible.
Again, it is incredibly helpful to know your baseline environment; abnormalities then become more obvious and easier to find.
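As a rough illustration of why a baseline helps, the sketch below flags a CPU utilization reading that falls far outside previously observed values. The sample numbers, the function name `is_abnormal`, and the three-standard-deviation threshold are all assumptions made for this example; they are not zVPS output or a zVPS-recommended threshold.

```python
from statistics import mean, stdev


def is_abnormal(baseline, current, k=3.0):
    """Return True when `current` is more than k standard deviations
    away from the mean of the baseline samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return abs(current - mu) > k * sigma


# Hypothetical CPU utilization samples (percent) taken at the same hour
# on days considered normal for this environment.
baseline_samples = [42.0, 45.0, 40.0, 44.0, 43.0, 41.0, 46.0]

print(is_abnormal(baseline_samples, 44.5))  # a reading close to the baseline
print(is_abnormal(baseline_samples, 78.0))  # a reading far above the baseline
```

In practice the baseline would come from your own historical reports, and a suitable threshold depends on how variable the workload normally is.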
Some clarification on CPU naming:
- The machine is equipped with physical cores - not engines, not processors, not CPUs, not IFLs, not CPs.
- Cores, whether physical or logical, come in different types: CP, IFL, etc. Logical cores and physical cores have a percent-busy metric called core utilization.
- For a logical core - this is the percent of time the logical core is dispatched on a physical core.
- For a physical core, this is the percent of time the physical core has a logical core dispatched upon it.
- Contained within a core, either physical or logical, are instruction execution units called processors. Physical cores contain physical processors. Logical cores contain logical processors.
- An IFL core-type can have either one or two processors contained in the core, depending upon the SMT level. SMT-2 has two processors in the core.
- Logical processors have a percent-busy metric called processor utilization. This is the percent of elapsed time the processor has a non-wait PSW loaded (this hasn't changed).
- Synonymous with processor utilization are processor busy, processor load, CPU utilization, CPU load and CPU busy.
- Keep in mind, you pay for physical cores (not threads or logical cores/processors - which gets confusing with SMT).
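The two percent-busy definitions above can be worked through with a small sketch. The interval length and the busy times below are invented for illustration, and `percent_busy` is a hypothetical helper, not part of any z/VM or zVPS interface.

```python
def percent_busy(busy_seconds, elapsed_seconds):
    """Percent of the elapsed interval that was spent busy."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * busy_seconds / elapsed_seconds


# Hypothetical 60-second interval for one logical IFL core running SMT-2.
elapsed = 60.0

# Core utilization: time the logical core was dispatched on a physical core.
core_dispatched = 45.0
core_util = percent_busy(core_dispatched, elapsed)

# Processor utilization: time each of the two logical processors in the
# core had a non-wait PSW loaded.
thread_busy = [30.0, 12.0]
proc_utils = [percent_busy(t, elapsed) for t in thread_busy]

print(core_util)   # 75.0
print(proc_utils)  # [50.0, 20.0]
```

With these made-up numbers the core is 75% busy while its two processors are 50% and 20% busy, which shows why core utilization and processor utilization should not be compared as if they were the same metric when SMT is on.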
For a presentation about the CPU environment and utilization, see Processor Analysis and Tuning
Helpful system settings:
- SET SRM DSPSlice minslice - This can be useful for systems with few processors and CPU intensive workloads.
For Linux workloads, the default is 5 (ms).
Setting it to 1 (ms) can greatly help servers running online transactions. Note that when SMT is turned on, the dispatch time slice default goes to 10.
- Q SYSCONTROL - Shows the following (with a default level of 1):
  DISPATCH THDAFFINITY ON
  DISPATCH PREEMPTLOCAL OFF
  DISPATCH TSEARLY 50
  DISPATCH INCHIPBUSY 50000
  DISPATCH INCHIPDELAY 50000
  DISPATCH INNODEBUSY 100000
  DISPATCH INNODEDELAY 100000
  DISPATCH INSYSBUSY 200000
  DISPATCH INSYSDELAY 200000
Setting the level to 0 allows available capacity to be used immediately instead of waiting for the dispatch delay. (Set it back to 1 to return to the original default setting.)
Q SYSCONTROL (at the setting of 0):
  DISPATCH THDAFFINITY OFF
  DISPATCH PREEMPTLOCAL ON
  DISPATCH TSEARLY 0
  DISPATCH INCHIPBUSY 0
  DISPATCH INCHIPDELAY 0
  DISPATCH INNODEBUSY 50000
  DISPATCH INNODEDELAY 50000
  DISPATCH INSYSBUSY 200000
  DISPATCH INSYSDELAY 200000
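To see at a glance what changes between the two levels, the two Q SYSCONTROL listings above can be compared programmatically. The sketch below is only an illustration: the setting values are copied from the listings in this section, the function names are hypothetical, and the parser assumes every setting is exactly three whitespace-separated tokens, which holds for these listings but may not hold for the output on every system.

```python
def parse_syscontrol(text):
    """Parse 'GROUP NAME VALUE' triples into a dict keyed by 'GROUP NAME'."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected groups of three tokens")
    settings = {}
    for i in range(0, len(tokens), 3):
        group, name, value = tokens[i:i + 3]
        settings[f"{group} {name}"] = value
    return settings


def diff_settings(before, after):
    """Return {setting: (old, new)} for settings whose value changed."""
    return {
        key: (before[key], after.get(key))
        for key in before
        if before[key] != after.get(key)
    }


# Values copied from the listings in this section.
LEVEL_1 = """DISPATCH THDAFFINITY ON DISPATCH PREEMPTLOCAL OFF
DISPATCH TSEARLY 50 DISPATCH INCHIPBUSY 50000 DISPATCH INCHIPDELAY 50000
DISPATCH INNODEBUSY 100000 DISPATCH INNODEDELAY 100000
DISPATCH INSYSBUSY 200000 DISPATCH INSYSDELAY 200000"""

LEVEL_0 = """DISPATCH THDAFFINITY OFF DISPATCH PREEMPTLOCAL ON
DISPATCH TSEARLY 0 DISPATCH INCHIPBUSY 0 DISPATCH INCHIPDELAY 0
DISPATCH INNODEBUSY 50000 DISPATCH INNODEDELAY 50000
DISPATCH INSYSBUSY 200000 DISPATCH INSYSDELAY 200000"""

changes = diff_settings(parse_syscontrol(LEVEL_1), parse_syscontrol(LEVEL_0))
for setting, (old, new) in changes.items():
    print(f"{setting}: {old} -> {new}")
```

For these two listings, every setting except INSYSBUSY and INSYSDELAY differs between level 1 and level 0.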
Settings that are no longer relevant/useful:
- SET SRM DSPBUF | LDUBUF | STORBUF
- SET SRM IABIAS
- SET SRM MAXWSS
Understanding how to view CPU utilization with SMT
Helpful ESAMON screens/ESAMAP reports (further explained below):
- ESAMAIN - System overview - shows current total CPU processor utilization
- ESACPUU - CPU Utilization Analysis (Part1) - shows current CPU processor utilization details
- ESACPUA - CPU Utilization Analysis (Part2) - shows more current CPU processor utilization details
- ESAMFC - Processor Cache Analysis - shows processor instruction information
- ESADIAG - Diagnose code rate - shows information on which diagnose codes are being used at what rate
- ESAIUER - IUCV error analysis - shows errors in inter-system communication
- ESALCK - Spin lock activity - shows where spin locks are happening
- ESATOPU - Top Users Resource Use - shows top users for the last 30 minutes
Using zVPS to find information for solving issues with the CPU utilization:
Use zVPS real time monitoring and daily reports to see how efficiently the environment is running.
What is the total CPU utilization?
How is that broken down by LPARs/IFLs?
What are the users consuming?
What else might be happening?
Here are some places to start:
ESAMAIN - System overview information:
ESACPUU - Shows information for each CPU/engine on the box.
ESACPUA - Shows similar information as ESACPUU.
ESAMFC - Shows processor instruction information. The Measurement Facility must be turned on in the LPAR to collect the correct records for this screen/report (see Enabling CPUMFC Records).
ESADIAG - Shows Diagnose rates.
ESAIUER - Shows IUCV errors.
ESALCK - Shows spin lock activity.
ESATOPU - Shows CPU utilization by user - top users first.
Conclusions:
Looking at CPU utilization is one of the quickest ways to find processing issues.
Just like on the freeway:
- Sometimes it is just one car (out of control user/system)
- Sometimes additional lanes or further tuning can be needed (CPU utilization stays high or spikes frequently)
- Sometimes the lanes are clogged by an accident (not a CPU problem, but other hardware issues).