User Wait State Analysis

Examining User Wait States

If users, servers, or virtual machines are reported as slow or unresponsive, they are usually waiting on a resource. The ESAXACT screen/report can be used to determine which resource is the culprit. However, as noted in the background information, a bottleneck inside Linux (or another application) may not show up on the ESAXACT screen/report. If all looks well with z/VM, check the Linux system processes with ESALNXP (or the reports for the other application).

Background information:

General Guidelines on where to focus (details below):


The ESAXACT screen/report: Transaction Delay Analysis - This shows an analysis of virtual machine states and wait states.


  • Note: For the screen - Click (zVIEW) or zoom/PF2 (z/VM) to see each user/server in a class. For the report - Use the Top User Analysis to see the top users/servers per 15-minute interval.
  • UserID/Class - This shows the system total and totals for servers/classes.
  • Percent non-dormant Run - This shows the percentage of time that virtual machines in the Running state are actually dispatched and doing work. This number will go down as response times go up. If a server is constantly high in RUNNING, it needs more cycles. The only ways to speed up its processing are to: 1) Break the workload out to run on multiple CPUs, 2) Get a faster CPU or 3) Reduce the CPU requirements of that application.
  • Percent non-dormant SIM - This shows the percentage of time a user/server/class is waiting for the z/VM control program to execute (or simulate) instructions on its behalf. This state is a function of master processor utilization and contention. The master processor is a single-threaded resource because certain functions, such as spool, diagnose and IUCV functions, can only run on it. Users performing a significant amount of these functions have high simulation wait. SIM wait will always show up with CPU wait, but if it is high on its own (a number over 10 is excessive), investigate how heavily the user relies on the functions above.
  • Percent non-dormant CPU - This shows the percentage of time a user/server/class is waiting for CPU. The virtual machine (virtual processor) is ready and waiting to be dispatched to run but there is no physical processor currently available. A number over 20 is excessive. If a machine has a high CPU wait, check the ESAUSR2 report to compare its CPU use to the total CPU use. Check the CPU utilization for the system, the relative SHARE of the id and also check LPAR weights/overhead.* See LPAR weights/overhead
  • Percent non-dormant SIO - This shows the percentage of time a user/class is waiting for I/O. This is a measure of the effectiveness of I/O tuning such as the use of minidisk caching, or DASD cache. Check the DASD and cache utilization for the system. There may be a device problem or a cache issue.
  • Percent non-dormant PAG - This shows the percentage of time a user/class is waiting on needed pages that are no longer in memory and must be read in from DASD. A number over 10 is excessive. Check the storage/paging utilization for the system. Reserving storage for the id (for example, SET RESERVED id 1000) can also help in this situation.
  • Percent non-dormant Async I/O - This shows the percentage of time a server (multiprocessing system) is waiting for asynchronous I/O. This state is entered when there is an outstanding I/O and the user has loaded a wait state PSW. It occurs for guests running an operating system such as Linux, VSE, z/OS or second-level z/VM.
  • Percent non-dormant Async Pag - This shows the percentage of time a server (multiprocessing system) is waiting on paging. A server in this state is waiting for a page to be read by the system. Check the storage/paging/DASD utilization for the system.
  • Percent non-dormant Ldg - This shows the percentage of time a user/server/class is 'loading', meaning it has a high count of page reads or its pages were paged out. This can indicate a thrashing condition, where the system is struggling to get storage resources to run machines. Check the storage/paging/DASD utilization for the system.
  • Percent non-dormant Lim Lst - This shows the percentage of time a user/server/class is on the "Limit List". This could be due to SHARE LIMIT being set or possibly a resource pool constraint. Check SHARE size for the id, CPU utilization and/or resource pool utilization.
  • Percent non-dormant Pct Elig - This shows the percentage of time a user/server/class is on the eligible list waiting to enter the dispatch list. Any non-zero value indicates a problem: the guests are waiting on some resource, so check the other columns to find the bottleneck. This column is no longer relevant after z/VM 6.3.

  • Conclusions:

    The ESAXACT screen is a very good way to see at a glance how the system is performing. With the ESAXACT report, trends and "time of day" bottlenecks are easy to see. If more information is needed, an EXTRACT can be run - See Extracting Performance Data Basics. Also, in zVIEW, specific time frames and a user/server/class may be entered to see a performance trend. This is extremely helpful for pinpointing when a problem started.

    * Eliminating CPU wait can improve performance by 10x!
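
    The threshold guidance in the column descriptions above can be sketched as a small check. This is only an illustration: the short column keys (SIM, CPU, PAG, ELIG), the function names and the sample values are hypothetical, and the values are assumed to have been copied by hand from an ESAXACT screen/report rather than parsed from any real export format. Only the columns for which this page states a limit are checked.

```python
from typing import Dict, List

# Limits quoted in the column descriptions (percent non-dormant).
# The short keys are hypothetical labels, not an ESAXACT export format.
THRESHOLDS: Dict[str, float] = {
    "SIM": 10.0,   # "a number over 10 is excessive"
    "CPU": 20.0,   # "a number over 20 is excessive"
    "PAG": 10.0,   # "a number over 10 is excessive"
    "ELIG": 0.0,   # any non-zero value (not relevant after z/VM 6.3)
}

# Follow-up actions paraphrased from the same column descriptions.
NEXT_STEP: Dict[str, str] = {
    "SIM": "review spool, diagnose and IUCV activity",
    "CPU": "check ESAUSR2, system CPU utilization, SHARE and LPAR weights",
    "PAG": "check storage/paging utilization; consider SET RESERVED",
    "ELIG": "check the other columns for the underlying bottleneck",
}


def flag_waits(sample: Dict[str, float]) -> List[str]:
    """Return the columns whose value exceeds the stated limit."""
    return [
        column
        for column, limit in THRESHOLDS.items()
        if sample.get(column, 0.0) > limit
    ]


def report(userid: str, sample: Dict[str, float]) -> List[str]:
    """Build one advisory line per flagged column for a user/server/class."""
    return [
        f"{userid}: {column} wait {sample[column]:.1f}% exceeds "
        f"{THRESHOLDS[column]:.0f}% - {NEXT_STEP[column]}"
        for column in flag_waits(sample)
    ]


# Hypothetical values for one server, typed in from the report.
sample = {"RUN": 35.0, "SIM": 4.0, "CPU": 27.5, "PAG": 12.0, "ELIG": 0.0}
for line in report("LINUX01", sample):
    print(line)
```

    Columns without a stated limit (SIO, Async I/O, Async Pag, Ldg, Lim Lst) are deliberately left out; they still need to be judged against the system's DASD, storage and resource pool utilization as described above.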

