LPAR Analysis

The CPU environment and the LPAR environment work together to provide CPU processing to applications. Dedicated engines make performance tracking very straightforward, but most businesses aren't that lucky. In today's systems with multiple LPARs, allocating resources to specific workloads on specific LPARs that share engines is normally a business decision. There are multiple layers of resource allocation, and every layer requires analysis and understanding. The first step is allocating resources (how the box was originally configured); the second is measuring what has been configured (does the current configuration work). Real work time is all that matters.

Note - if cores are dedicated (not shared), the LPAR gets 100% of those cores and no weighting, etc. is applied. Measuring performance with SMT and dedicated engines can get tricky.

LPAR Weight/Entitlement -

Allocating CPU resource to LPAR (explained in detail in LPAR weights/overhead analysis)

Physical overhead (ESALPMGS - Mgmt) - if there are too many vCPUs assigned, this number will go up and problems may follow.
LPAR overhead (ESALPMGS - Ovhd) - overhead percentage should be approximately 1-2% of the total
LPAR assigned time (ESALPMGS - Assign) - the actual processing time the guest sees (real work time).
LPAR entitlement (ESALPARS - Entitld CPU Cnt) - shows how much processing power an LPAR has (this is not as important if the system is running below 80%, but is definitely important if it is running over 80%).

z/VM SHARE - (LPAR Assigned time - virtual)

Allocating z/VM resources to virtual machine - See Setting SHARE values for Virtual Machines

z/VM System Control Program time (ESACPUU - Overhd Syst) - time used for the z/VM system functions.
User overhead time (ESACPUU - Overhd User) - time used for system functions attributed to users.
Emulation time (ESACPUU - Emul time) - the actual processing time the guest sees (real work time).

Linux - (Emulation - z/VM guest time)

Allocating Linux resource to processes - See Configuring Linux - Best Practices

Linux system time/kernel time (ESALNXS - Krnl) - time used for system processing or system overhead.
IRQ - Interrupt Request time (ESALNXS - IRQ) - time used for software interrupts or user overhead.
User time (ESALNXS - User) - the actual processing time the user sees (real work time).

Very Important - When defining LPARs, it is very important to define your virtual CPUs on the same book and/or the same chip. This helps immensely with performance since it makes better use of the cache. More work is done per cycle because fewer cycles are wasted fetching data from other levels of cache.

Parking/Unparking effects on LPAR usage: Based on the LPAR weights, each engine is given a polarization of high, medium or low. The system hypervisor will park LPAR engines, meaning it will not dispatch those engines to the LPAR, based on the requirements of the LPAR and total system utilization. In an SMT environment, both threads on an engine get parked; z/VM is alerted and will not dispatch work on those engines. High levels of parking on smaller systems have been shown to be problematic. Performance then improves when the hypervisor is not allowed to park engines.
See CPU/LPAR Parking for more information on CPU parking and how it affects the system.

Helpful ESAMON screens/ESAMAP reports:

ESALPAR - Logical Partition Analysis - shows logical partition characteristics and utilization for each LPAR and the system as a whole
ESALPARS - Logical Partition Summary - shows logical partition configuration and utilization for each LPAR
ESALPMGS - Physical CPU Utilization by type - shows the CPU percentage busy by each CPU type (CP/IFL/ZIIP)

Knowing the capacity on the box in terms of CPUs and the requirements of the other LPARs then leads to the business decision of how to allocate the CPU resources.

ESALPAR - Shows information for each LPAR running on the box:

CEC Physical CPUs - This shows how many physical CPUs are in the box that are available for use (7).

Logical Partition Name - This shows the individual systems (LPARS) that have been set up to use these CPUs.

CPU Type/Count - This shows how many of each type are available. There are 7 virtual GP cores and 11 virtual IFL cores. VSIVM4 has three virtual IFLs - the next three green lines give information for each of the three virtual IFLs.

Logical Processor/%Assigned/Total/Ovhd - This shows the total amount of time a CPU was assigned to each LPAR and the amount of overhead. There is a percentage total, then split out by each virtual CPU.

Polar - This shows the processor polarization - Hor=Horizontal, VHi=Vertical High, VMe=Vertical Medium or VLo=Vertical Low. VLo can cause PR/SM hypervisor overhead. The vertical assignment follows a fixed algorithm, so the only way to change it is by changing the weights for the LPARs (or adding additional cores to the box).

Multi-thread Idle Time - This shows time an individual dispatched CPU of a core is in a non-running state for Simultaneous MultiThread (SMT). Too much idle time shows SMT may not be needed or vCPUs are overallocated.

Multi-thread cp1/cp2 - Shows the two thread numbers for each vCPU if SMT is enabled.

ESALPARS - Shows the processor sharing configuration and utilization for each LPAR

Name - This is the LPAR name.

Nbr - This is the LPAR number.

Virtual CPUs - This is the number of virtual CPUs this LPAR can access.

Typ - This is the CPU type - CP/IFL/ZIIP.

%Assigned/Total - This shows the total amount of time a CPU was assigned to the LPAR, including overhead.

%Assigned/Ovhd - This shows the system overhead time. If this time is high, investigation is required.

Assigned Shares/LPAR/Weight - This is the total weight of the assignments for each virtual processor.

Assigned Shares/Weight/Pct - If not dedicated, this is the processing weight and percentage of the total assigned share.

Assigned Shares/VCPU Pct/SYS - This is the percent of the total CPUs of that type assigned to the Virtual CPUs in this LPAR.

Thread Idle - This shows time an individual dispatched CPU of a core is in a non-running state for Simultaneous MultiThread (SMT). Too much idle time shows SMT may not be needed or vCPUs are overallocated.

Entitled CPU Cnt - This shows how many IFLs this LPAR is entitled to. For each individual LPAR, this will be a percentage. This is a minimum target resource delivery. (Again, see LPAR weights/overhead analysis for more information about LPAR configurations.)

Note: On the ESALPARS report, the bottom of the report shows a summary of the totals by processor type. Notice there are 7 physical CPUs - 2 CPs, 4 IFLs and 1 ZIIP. This shows the total percent busy by each type. (Remember that for virtual CPUs, there are 7 virtual CPs, 11 virtual IFLs and 4 virtual ZIIPs defined, which are shared between 12 LPARs in this box.)

ESALPARS-2 - Shows the effect of capping an LPAR.

Looking at the VM LPAR, it seems to be only running a little over 40%. But notice:
%Assigned Total - 41.2%
Pct - 40.0
Capped - Yes
This means that even though it looks like VM is only running at ~40%, it is really running at 100% of what the cap allows. This is important if the LPAR was not configured as expected.
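The capping check can be sketched as a simple ratio of assigned time to the capped share. This is only an illustration: it assumes %Assigned Total and Pct are expressed on the same scale, and the function names and the 95% threshold are hypothetical, not part of ESAMON or ESAMAP.

```python
def cap_utilization(assigned_total_pct, share_pct):
    """Percent of the capped share actually consumed.

    Assumes both inputs are on the same scale (an assumption of this sketch).
    """
    if share_pct <= 0:
        raise ValueError("share_pct must be positive")
    return 100.0 * assigned_total_pct / share_pct


def is_at_cap(assigned_total_pct, share_pct, capped, threshold_pct=95.0):
    """True when a capped LPAR is consuming roughly all of its share.

    threshold_pct is a hypothetical cutoff chosen for illustration.
    """
    if not capped:
        return False
    return cap_utilization(assigned_total_pct, share_pct) >= threshold_pct


# Values from the VM LPAR above: %Assigned Total 41.2, Pct 40.0, Capped Yes
used = cap_utilization(41.2, 40.0)
print(f"VM LPAR is using about {used:.0f}% of its capped share")
print("At cap:", is_at_cap(41.2, 40.0, capped=True))
```

An uncapped LPAR with the same numbers would not be flagged, since it can still be given time beyond its share when other LPARs are idle.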

ESALPMGS - Shows how the hardware/processing resources are distributed in the box.

CPU Type - This shows the different types of physical CPUs.

Shared Processor Busy CPU% - This shows the average utilization per CPU of that type. Currently there are four IFL engines averaging 19.4% each and two GP engines averaging 96.8% each.

Shared Processor Busy Total - This shows the total utilization for each type. Currently there are four IFL engines running at 77.4% (out of 400%) and two GP engines running at 193.7% (out of 200%). The ESALPARS report will show the average for the day (there is no separate ESALPMGS report).

Shared Processor Busy Ovhd/Mgmt - This shows the Logical (Ovhd) overhead and Physical (Mgmt) overhead for each CPU type. High overhead numbers can indicate there are too many vCPUs in use.
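The relationship between the Shared Processor Busy total and the per-CPU average can be worked through with a short calculation. The sketch below uses the figures quoted above; the function names are hypothetical and the headroom calculation is an added illustration, not a field on the screen.

```python
def per_cpu_average(total_busy_pct, cpu_count):
    """Average busy percent per physical CPU of one type."""
    return total_busy_pct / cpu_count


def headroom(total_busy_pct, cpu_count):
    """Unused capacity for one CPU type, in percent of a single engine."""
    return 100.0 * cpu_count - total_busy_pct


# (total busy %, number of physical engines) as quoted in the text
pools = {"IFL": (77.4, 4), "GP": (193.7, 2)}

for cpu_type, (total, count) in pools.items():
    avg = per_cpu_average(total, count)
    free = headroom(total, count)
    print(f"{cpu_type}: {avg:.1f}% per engine, {free:.1f}% of an engine unused")
```

With these figures the GP pool has only a few percent of one engine left, while the IFL pool has more than three engines' worth of unused capacity. Small differences from the per-CPU values shown on the screen come from rounding.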

Conclusions

There are many metrics to understand when determining how each LPAR uses CPU and how a specific workload can be guaranteed its required CPU. Understanding LPAR entitlement is critical to understanding the CPU resources available to each LPAR. Entitlement is the amount of real CPU time each partition is guaranteed. Entitlement for an LPAR with shared CPUs is a function of the LPAR's weight, the sum of the weights for all other shared partitions and the number of shared physical CPUs in the CPC. It is calculated for each CPU type as the number of shared CPUs multiplied by the ratio of an LPAR's weight to the sum of the weights of all shared partitions. From a business planning perspective, this becomes a critical metric for providing specific workloads with the required processing power. All of this needs to be understood when the box is configured. Many of these calculations can only be changed by updating the hardware in the box via the Hardware Management Console (HMC).
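The entitlement calculation described above can be sketched in a few lines. The LPAR names and weights below are hypothetical values chosen for illustration, and the function name is not taken from any product.

```python
def entitlement(shared_cpus, weights):
    """Entitled CPU count per LPAR for one CPU type.

    entitlement = shared CPUs * (LPAR weight / sum of all shared LPAR weights)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    return {name: shared_cpus * w / total_weight for name, w in weights.items()}


# Hypothetical example: three LPARs sharing 4 physical IFLs
weights = {"LPAR_A": 400, "LPAR_B": 400, "LPAR_C": 200}
ent = entitlement(4, weights)

for name, cpus in ent.items():
    print(f"{name}: entitled to {cpus:.2f} IFLs")
```

With these hypothetical weights, LPAR_A and LPAR_B are each entitled to 1.6 IFLs and LPAR_C to 0.8, and the entitlements sum to the 4 shared IFLs. The calculation is repeated separately for each CPU type.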