NCM-MCI: Nutanix Certified Master - Multicloud Infrastructure Study Guide
The NCM-MCI (Nutanix Certified Master - Multicloud Infrastructure) validates advanced, design- and troubleshooting-level expertise across the Nutanix stack: cluster resiliency design, AHV/VM management, advanced networking, storage and performance internals, data protection and DR (Leap, Metro, NearSync), security, monitoring, and lifecycle/upgrade operations. It is aimed at senior administrators, architects, and consultants who already hold or exceed NCP-level skills and must justify design trade-offs and diagnose complex multi-component failures. The 180-minute exam is a live, performance-based lab of roughly 16-20 hands-on scenarios (not multiple-choice questions) and requires a scaled score of 3000 (on a 1000-6000 scale) to pass.
Domain 1: Advanced Cluster Management and Design
- Resilient capacity is the fill threshold up to which a cluster can still self-heal back to its configured RF after a fault domain is lost; it equals total usable capacity minus the contribution of the single largest fault domain (node, block, or rack).
- RF2 keeps two data copies and uses three-way Cassandra metadata replication, tolerating one fault-domain failure; RF3 keeps three data copies and uses five-way metadata replication, tolerating two simultaneous fault-domain failures.
- RF3 requires a minimum of five nodes for full protection and leaves roughly one third of raw capacity usable (two thirds goes to the additional copies), versus roughly one half usable under RF2 (two full copies, ~100% overhead).
- Block awareness needs a minimum of three blocks for RF2 so a complete second replica can be reconstructed after losing the largest block; with only two blocks both copies of some data can be lost together.
- RF3 block awareness requires a minimum of five blocks (versus three for RF2 block awareness), so that enough blocks remain to hold a full replica set after the largest block fails.
- Severe capacity skew between blocks (one block holding disproportionately more data) breaks block awareness because Curator can no longer place a balanced second replica in the surviving blocks.
- Rack fault tolerance reserves one rack's worth of capacity for rebuild and computes resilient capacity after subtracting the largest rack rather than the largest node.
- Concentrating a majority of nodes in a single fault domain (e.g., 6 of 10 nodes in one rack) defeats rack awareness because losing that rack removes most compute/storage and stalls rebuilds.
- Erasure Coding (EC-X) provides little to no savings and reduces rebuild flexibility on small clusters (e.g., 3 nodes); it needs more fault domains for an efficient data-plus-parity strip layout.
- Larger per-node capacity raises the rebuild reservation: more data must be re-protected after a node loss, so proportionally more free space must be held back below resilient capacity.
- Size clusters for n+1 fault domains under RF2 and n+2 under RF3 so the largest fault domain's worth of capacity is always free for Curator to rebuild.
- Prism Central scale-out deploys three large PCVMs to distribute IAM, search/indexing, and microservices; PCVM members require low-latency, high-bandwidth links to form Zookeeper/Cassandra quorum.
- RBAC custom roles grant only specified operation permissions; an action like power-on or migrate fails if the custom role omits that exact operation permission.
- When a user matches multiple role mappings, Prism Central grants the union of all matched roles' permissions, so the most permissive mapping prevails; category-scoped role mappings let access follow VMs tagged for each business unit.
- Project quota is enforced against reserved-plus-requested resources, so existing reservations plus a new request can exceed a limit (e.g., 200 vCPU) even when raw usage looks lower.
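The resilient-capacity rule from this domain (total usable capacity minus the single largest fault domain) can be worked through numerically. The following is a minimal sketch, assuming a hypothetical four-node cluster with per-node usable capacity in TiB. The function names and figures are illustrative and are not Nutanix tooling; a real calculation also accounts for RF, data reduction, and reserved space.

```python
from __future__ import annotations


def resilient_capacity(usable_by_domain: dict[str, float]) -> float:
    """Total usable capacity minus the single largest fault domain."""
    if not usable_by_domain:
        raise ValueError("at least one fault domain is required")
    total = sum(usable_by_domain.values())
    largest = max(usable_by_domain.values())
    return total - largest


def can_self_heal(used: float, usable_by_domain: dict[str, float]) -> bool:
    """True while current usage still fits under resilient capacity."""
    return used <= resilient_capacity(usable_by_domain)


# Hypothetical cluster: three 20 TiB nodes and one larger 40 TiB node.
nodes = {"node-a": 20.0, "node-b": 20.0, "node-c": 20.0, "node-d": 40.0}

print(resilient_capacity(nodes))   # 60.0
print(can_self_heal(55.0, nodes))  # True
print(can_self_heal(70.0, nodes))  # False
```

Note how the single 40 TiB node sets the reservation: the same 100 TiB spread over five equal 20 TiB nodes would leave 80 TiB of resilient capacity instead of 60.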
Domain 2: Advanced AHV and VM Management
- Acropolis Dynamic Scheduling (ADS/Lazan) runs roughly every 15 minutes, is contention-driven, and migrates a VM only when the move will actually resolve sustained contention and improve placement.
- ADS evaluates both host CPU and the CVM's storage-controller (Stargate) CPU; high Stargate CPU on a CVM is treated as a hotspot and can trigger migrations even when host CPU is low.
- VM-VM anti-affinity is a soft (preferential) rule: HA can still restart all the VMs even if it forces co-location, and ADS later re-separates them once capacity returns.
- VM-host affinity is a hard rule: a VM pinned exclusively to one host is never restarted elsewhere by HA, which is exactly the behavior license-compliance pinning relies on.
- Pinning a critical VM to two or more hosts (VM-host affinity to a host set) still allows HA restart on an alternate listed host if one fails, balancing strict placement against availability.
- Lazan solves for the minimal-cost set of migrations to clear a hotspot, so it may move a small idle VM rather than the largest consumer to reduce migration overhead.
- vGPU-attached VMs and VMs with directly assigned passthrough devices cannot be live-migrated by ADS unless vGPU live migration is explicitly supported, so Lazan cannot use them to relieve a hotspot.
- A VM with a hard VM-host affinity listing only one host blocks that host from entering maintenance mode; the VM must be powered off or its affinity edited first.
- Lowering the ADS contention threshold makes Lazan declare hotspots more readily, increasing migration frequency and risking migration thrashing and constant network/CPU overhead.
- ADS live migration copies VM memory over the network, so a congested backplane lengthens migrations and causes Lazan to defer or limit concurrent moves.
- To align a wide VM with physical NUMA, expose multiple vNUMA nodes (e.g., 20 vCPUs as two 10-vCPU vNUMA nodes mapped to two sockets) so guest memory access stays local.
- Crossing a physical NUMA boundary (e.g., 16 to 17 vCPUs on a 16-core socket) introduces remote-memory latency and triggers vNUMA exposure that did not exist at the smaller size.
- Hot-added vCPUs do not recalculate the running vNUMA topology (fixed at boot) and require guest CPU hot-plug support plus bringing the new vCPUs online before use.
- Memory overcommit that triggers balloon-driver reclaim and host swapping forces guests to fault on swapped pages; the fix is to reduce overcommit or add physical RAM, not to disable ballooning.
- High CPU Ready / run-queue wait indicates vCPU scheduling contention from overcommit; small bursty desktops tolerate high overcommit while wide concurrent-core VMs tolerate far less.
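The vNUMA sizing examples in this domain (16 versus 17 vCPUs on a 16-core socket, and 20 vCPUs split as two 10-vCPU nodes) follow simple arithmetic. The sketch below is a simplification that counts physical cores only; it assumes 16-core sockets for every case (the source does not state the socket size for the 20-vCPU example), does not check how many sockets the host has, and ignores memory per NUMA node. The function name is hypothetical.

```python
from __future__ import annotations

import math


def vnuma_layout(vcpus: int, cores_per_socket: int) -> list[int]:
    """Split vCPUs into the fewest, most even vNUMA nodes that each fit one socket."""
    if vcpus < 1 or cores_per_socket < 1:
        raise ValueError("vcpus and cores_per_socket must be positive")
    node_count = math.ceil(vcpus / cores_per_socket)
    base, extra = divmod(vcpus, node_count)
    # The first `extra` nodes take one additional vCPU when the split is uneven.
    return [base + 1 if i < extra else base for i in range(node_count)]


print(vnuma_layout(16, 16))  # [16]      fits one socket, no vNUMA needed
print(vnuma_layout(17, 16))  # [9, 8]    crosses the socket boundary
print(vnuma_layout(20, 16))  # [10, 10]  two 10-vCPU vNUMA nodes
```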
Domain 3: Advanced Networking
- balance-tcp is an active-active OVS bond mode that requires LACP-negotiated link aggregation on the switch; standalone access ports cannot form the LAG and the bond fails to come up.
- active-backup keeps only one uplink active for failover, so total host throughput is capped at a single link's bandwidth regardless of east-west demand.
- balance-slb (source-MAC + VLAN load balancing) needs no LACP/LAG and periodically rebalances flows across uplinks (the OVS default interval is 10 seconds; Nutanix guidance is to raise it to 30 seconds); the same MAC appearing on two switch ports is expected behavior, so disable static MAC pinning/port-security on those uplinks.
- When uplinks span two switches, an LACP LAG cannot cross chassis, so the switches must run multi-chassis link aggregation (MLAG or vPC).
- An LACP member NIC that is up at the OS level but shows 'not bundled' means the switch port is not negotiating LACP into the same LAG; verify the matching switch port is in the same LAG and actively exchanging LACP frames.
- Any hash-based bond mode keeps every packet of a single conversation (a VM talking to a given peer) on one uplink, so a single TCP conversation cannot exceed one link's bandwidth.
- Jumbo frames (MTU 9000) must be set consistently on the AHV bond AND every physical switch port and inter-switch trunk in the path; validate with a do-not-fragment ping using an 8972-byte payload.
- All uplink members within a single bond must use NICs of the same speed and type; mixing speeds is unsupported and degrades load distribution.
- Changing a virtual switch in Prism triggers a rolling, node-by-node reconfiguration that live-migrates VMs off each host before applying the change.
- Configure uplink ports as STP edge/PortFast so they skip listening/learning and forward immediately, avoiding connectivity delays during bond/host changes.
- Create additional virtual switches (vs1, vs2) each with their own dedicated uplink ports and bond to physically separate traffic classes from vs0; a NIC loss on vs1 then degrades only that traffic.
- Network segmentation isolates backplane (intra-cluster CVM/storage) and management traffic onto dedicated VLANs/subnets and ideally dedicated uplinks; it is separate from Flow Network Security, which enforces tenant VM microsegmentation.
- Service-specific network segmentation places a single Nutanix service (e.g., Volumes or DR replication) on its own VLAN/subnet and binds it to a chosen virtual switch so its traffic egresses on selected uplinks.
- A common segmentation failure is the segmented VLAN not being trunked/allowed on the affected nodes' physical switch ports, so verify end-to-end VLAN trunking and a non-overlapping subnet before committing.
- For RoCEv2 storage backplanes, use a dedicated bridge with RDMA-capable NICs and enable PFC/ECN only on the RDMA switch ports, keeping CVM management and user VLANs on a separate bridge (br0).
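The 8972-byte payload used to validate jumbo frames is the MTU minus the IPv4 and ICMP headers. The sketch below shows the arithmetic and builds a do-not-fragment ping command line for Linux (iputils) ping, where -M do sets the don't-fragment behavior and -s sets the payload size. The target address is a placeholder from the documentation range and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(target: str, mtu: int = 9000, count: int = 3) -> str:
    """Build a Linux ping command that fails if any hop cannot carry the MTU."""
    return f"ping -M do -s {df_ping_payload(mtu)} -c {count} {target}"


print(df_ping_payload(9000))          # 8972
print(df_ping_payload(1500))          # 1472
print(df_ping_command("192.0.2.10"))  # ping -M do -s 8972 -c 3 192.0.2.10
```

If the ping succeeds at 1472 bytes but fails at 8972, some hop in the path (a switch port or inter-switch trunk) is still at the default MTU.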
Domain 4: Advanced Storage and Performance
- Oplog coalesces small random writes and acknowledges them quickly from SSD; Stargate bypasses oplog for large sequential writes and sends them straight to the extent store since coalescing offers no benefit.
- Under RF2 a write is acknowledged after the local primary plus one remote replica are persisted to oplog; under RF3 it is acknowledged only after the local copy plus two remote replicas (three durable copies) are persisted.
- The unified cache is multi-tier: hottest data is served from RAM at lowest latency, next-warmest spills to the SSD-backed portion, and a miss is read from the local SSD extent store and inserted into cache for next time.
- Sustained random overwrite that outpaces oplog drain to the extent store fills the oplog, so Stargate applies admission control (throttles writes) to protect durability, especially on hybrid clusters draining to HDD.
- After a CVM is down or a VM migrates, data locality is lost; Stargate re-establishes locality over time by migrating egroups for accessed data to the local node and re-warming the local cache, so initial reads are slower.
- Curator disk balancing redistributes egroups to a newly added node for capacity balance at lower priority than foreground I/O, while Stargate keeps a local primary replica and re-establishes locality on access.
- Reads served while oplog data has not yet drained are satisfied by merging oplog with extent-store data; latency stays low because oplog is SSD-backed.
- EC-X is a Curator post-process applied to write-cold extent groups (not inline on the hot write path) and reduces overhead from ~100% (RF2 two copies) toward ~25-33% depending on strip width; a failure raises read latency because data must be rebuilt from parity.
- Data too hot for EC-X repeatedly breaks and re-replicates strips before they can be coded, so EC-X should target cold, stable data, not active write-hot data.
- Inline compression is applied as data drains from oplog to the extent store, so the guest write is acknowledged from oplog before compression work is performed.
- Deduplication fingerprints data with SHA-1 during inline ingest and Curator performs the actual capacity dedupe as a background scan; it benefits full-clone VDI sharing a common base image and requires enough CVM memory for the fingerprint map.
- For write-cold, low-duplication backup data, enable EC-X and compression but leave dedupe off because dedupe overhead is unjustified there.
- Disable deduplication on a container when cross-VM redundancy is low while keeping inline compression enabled, to avoid wasting CVM memory on fingerprint metadata.
- ILM up-migration promotes frequently accessed cold data from HDD to SSD and the unified cache retains the working set; after the access window the data down-migrates to HDD, so a later cold pass is slow again.
- vDisk sharding (autonomous extent store / oplog sharding) distributes a single large hot vDisk's oplog and metadata across multiple nodes so one node's oplog capacity is not the bottleneck.
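The capacity figures quoted for replication and EC-X come from simple ratios: replication stores RF - 1 extra copies, while an erasure-coded strip stores parity blocks on top of data blocks. A minimal sketch follows; the 4+1 and 3+1 strip widths are assumptions chosen to reproduce the ~25-33% range quoted above, and the function names are hypothetical.

```python
def replication_overhead(rf: int) -> float:
    """Extra capacity per unit of data under plain replication (RF2 -> 1.0, i.e. 100%)."""
    if rf < 1:
        raise ValueError("rf must be at least 1")
    return float(rf - 1)


def ecx_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra capacity per unit of data for an erasure-coded strip."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("invalid strip shape")
    return parity_blocks / data_blocks


def usable_fraction(overhead: float) -> float:
    """Share of raw capacity left for unique data at a given overhead."""
    return 1.0 / (1.0 + overhead)


print(replication_overhead(2))                            # 1.0  (100% overhead)
print(round(usable_fraction(replication_overhead(2)), 2)) # 0.5
print(round(usable_fraction(replication_overhead(3)), 2)) # 0.33
print(round(ecx_overhead(4, 1), 2))                       # 0.25 (4 data + 1 parity)
print(round(ecx_overhead(3, 1), 2))                       # 0.33 (3 data + 1 parity)
```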
Domain 5: Advanced Data Protection and Disaster Recovery
- Synchronous replication acknowledges every write only after it commits at the remote site (RPO 0), so applying it to latency-tolerant tiers adds needless write latency; split tiers into separate protection policies (sync for the database, async for the web tier).
- NearSync uses lightweight snapshots (LWS) recorded in oplog and shipped frequently over an async-capable link to deliver RPOs as low as ~1 minute, bridging the gap when synchronous is impractical at high RTT (e.g., 40 ms).
- NearSync requires all-flash (or a flash tier sized for the LWS store) and compatible AOS versions on both source and remote; it first runs a seeding phase with standard snapshots until the LWS store sustains the sub-15-minute cadence.
- If replication bandwidth cannot keep LWS lag under threshold, NearSync falls back to async and re-attempts when it catches up; the fix is to reduce change rate or increase bandwidth/oplog capacity.
- LWS minute-granular recovery points are retained only for a bounded recent window, after which they roll up into heavier periodic snapshots to control metadata and capacity overhead.
- Metro Availability provides RPO-0 transparent storage failover; Leap Recovery Plans provide orchestrated, sequenced full-site VM failover, and the two are combined when both transparent storage failover and orchestration are required.
- A Recovery Plan needs a network mapping between source and target virtual networks (subnet/IP remapping, plus floating/external IPs for internet-facing or cloud VPC VMs); without it, VMs power on with unusable addresses.
- Recovery Plan stages guarantee ordered power-on but not service readiness, so add inter-stage delays or NGT-executed in-guest readiness scripts (e.g., confirm a database is listening or AD services are up) before dependent tiers boot.
- Run a Recovery Plan in test mode to isolate recovered VMs on a separate test network so their identical IPs do not collide with live production VMs.
- Leap IP mapping only re-addresses guest NICs via NGT; it cannot rewrite hard-coded references in application config files, which require an NGT post-recovery script.
- RTO is dominated by orchestration time, not replication frequency, so parallelize independent tiers into the same stage and trim unnecessary inter-stage delays/scripts to meet RTO.
- When VSS quiescing fails, the snapshot falls back to crash-consistent: recoverable and point-in-time consistent on disk but not transactionally quiesced, requiring database crash recovery on restart.
- Configure a synchronous policy's automatic failure handling to disable sync on link partition (allowing local writes to continue and resync on return), accepting a small RPO gap rather than blocking writes.
- After an unplanned failover, perform a failback (reverse replication) that syncs only the deltas from the DR site back to the recovered primary before migrating VMs home, rather than re-seeding everything.
- The Prism Central that orchestrates Leap must survive loss of the production site (e.g., a PC reachable from the recovery site or a paired/availability-zone PC topology), or failover cannot be driven.
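The point that RTO is dominated by orchestration can be illustrated with a rough model: stages run in sequence, VMs inside a stage boot in parallel, and each stage adds its inter-stage delay. The sketch below is a simplified estimate under those assumptions only; the Stage class, tier names, and timings are hypothetical and ignore script runtime and replication catch-up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stage:
    boot_minutes: list[float]   # boot time of each VM in the stage
    delay_minutes: float = 0.0  # inter-stage delay applied after the stage


def orchestration_minutes(stages: list[Stage]) -> float:
    """Sum, over sequential stages, of the slowest VM boot plus the stage delay."""
    return sum(max(s.boot_minutes) + s.delay_minutes for s in stages)


# Three tiers booted one stage at a time.
serial_plan = [
    Stage([4.0], delay_minutes=2.0),  # database
    Stage([3.0], delay_minutes=1.0),  # app
    Stage([3.0]),                     # reporting (independent of app)
]

# Same tiers, with the two independent tiers sharing one stage.
parallel_plan = [
    Stage([4.0], delay_minutes=2.0),  # database
    Stage([3.0, 3.0]),                # app and reporting together
]

print(orchestration_minutes(serial_plan))    # 13.0
print(orchestration_minutes(parallel_plan))  # 9.0
```

Merging the two independent tiers removes one boot interval and one delay, which is the kind of saving the RTO bullet describes.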
Domain 6: Security and Compliance
- Native software-based encryption encrypts data inline with AES-256 as Stargate writes to the extent store and works on any qualified drives including non-SED hardware; SED encryption offloads AES to the drive controller with negligible CVM CPU overhead.
- A self-encrypting drive always encrypts its media, but in its default state the authentication key is a well-known default so the drive auto-unlocks at power-on; configuring a key manager replaces that default and provides real protection.
- The Nutanix native (local) key manager runs on the cluster and satisfies data-at-rest encryption with the fewest external dependencies; an external KMS is chosen when the security team must own keys that cluster admins cannot manage or export.
- Encrypted clusters cannot serve data without their keys: if an external KMS is unreachable after a full power outage, Stargate cannot unlock the extent store until the KMS recovers.
- For KMS availability, configure multiple KMIP endpoints (clustered/replicated KMS) across separate failure domains so a single KMS node or site loss does not block key retrieval; in Metro, a single shared KMS in one site can leave the surviving cluster unable to unlock storage.
- Rotating the key encryption key (KEK) re-wraps the existing data encryption keys (DEKs) online; bulk stored data is not decrypted and re-encrypted, so KEK rotation is non-disruptive.
- Enabling encryption on an existing cluster runs a background Curator-driven process that gradually encrypts existing data while new writes are encrypted immediately, with the cluster online throughout.
- Cryptographic (secure) erase destroys the drive's media encryption key, instantly rendering all encrypted data unrecoverable without overwriting every block.
- Registering an external KMS requires mutual TLS trust with valid client/CA certificates; an expired or invalid per-node KMS client certificate causes only that node to fail to authenticate and unlock its SEDs.
- Enable a key backup protected by a passphrase split among multiple custodians so restoring keys requires the designated custodians rather than a single administrator.
- Cluster Lockdown disables password-based SSH authentication and accepts only key-based sessions; add administrators' public SSH keys before removing default password remote login.
- The Security Configuration Management Automation (SCMA) framework runs scheduled checks and self-heals the STIG-aligned hardening baseline; Nutanix publishes the corresponding STIG content for the platform.
- Installing a Prism certificate requires importing the signed server certificate, the matching private key, and the full CA certificate chain together; omitting the intermediate breaks trust-path building for clients lacking it.
- A custom login banner can be presented on the Prism web console (Welcome Banner setting) and on the CVM SSH console (configured SSH/console banner).
- Do not firewall-block cluster-to-KMS connectivity; reliable KMS reachability is required for key operations and recovery, and severing it risks storage being unable to unlock after a restart or rekey.
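The KMS availability guidance (multiple KMIP endpoints across separate failure domains) reduces to one check: for every site, does at least one endpoint survive that site's loss? The sketch below makes that check explicit. The endpoint and site names are hypothetical, and the function is an illustration of the design rule, not a Nutanix or KMIP API.

```python
from __future__ import annotations


def sites_whose_loss_blocks_keys(endpoint_sites: dict[str, str]) -> list[str]:
    """Return the sites whose loss would leave no KMS endpoint reachable.

    endpoint_sites maps each KMS endpoint name to its site (failure domain).
    """
    blocking = []
    for site in sorted(set(endpoint_sites.values())):
        survivors = [name for name, s in endpoint_sites.items() if s != site]
        if not survivors:
            blocking.append(site)
    return blocking


# Two endpoints in the same site still share one failure domain.
same_site = {"kms-1": "site-a", "kms-2": "site-a"}
# Endpoints spread across two sites survive the loss of either one.
spread = {"kms-1": "site-a", "kms-2": "site-b"}

print(sites_whose_loss_blocks_keys(same_site))  # ['site-a']
print(sites_whose_loss_blocks_keys(spread))     # []
```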
Domain 7: Advanced Monitoring and Troubleshooting
- Repeated oplog episode messages with an oplog-full condition mean the write ingest rate is outrunning oplog drain to the extent store; reduce write burst size or scale SSD/throughput.
- Run logbay collect with an explicit --from anchor and --duration that brackets the incident window so only relevant logs and default tags are gathered, keeping the bundle small and the upload fast.
- A Cassandra node in kToBeDetached with failed gossip heartbeats means the dynamic ring changer (autoring) is preparing to detach the unresponsive node to protect metadata quorum, after which data rebuilds to maintain RF.
- Repeated 'Failed to acquire shutdown token' during an upgrade means another node holds the single cluster-wide shutdown token, so the operation correctly serialized and waited rather than risking two simultaneous reboots.
- To confirm reads are coming from slow media, examine read-source statistics for a high proportion served from the HDD/cold tier rather than the unified cache or SSD; a working set that outgrew SSD forces ILM down-migration.
- For a flapping Stargate, read stargate.FATAL and stargate.out for the exact crash reason, then correlate restart timestamps with hardware/disk events in hades and dmesg to find a failing or unresponsive drive.
- For Curator activity, use the Curator master page and curator logs to see full/partial scan start/stop times and tasks such as disk balancing, ILM, and garbage collection.
- Zookeeper logs and genesis.out around the same timestamps reveal leader-election events and service status transitions when diagnosing intermittent control-plane disruptions.
- A data resiliency status that is not OK after a failure means a rebuild/recovery is still in progress or extents are under-replicated, so full fault tolerance has not yet been restored.
- High per-VM CPU Ready time on VM analysis charts indicates vCPU scheduling contention; overlay VM I/O latency with outstanding I/O (queue depth) and controller latency with read-source breakdown to separate compute, storage, and locality issues.
- Low guest queue depth (single-queue, insufficient concurrent I/O) makes a workload latency-bound rather than bandwidth-bound; increase parallelism or use multi-queue (multiple vDisks/queues).
- A vDisk read served remotely means the VM is not running on the node holding its active oplog/egroups, so data locality has not yet been achieved after a move.
- A growing replication backlog and missed RPO usually mean available replication bandwidth/throughput is insufficient for the protected change rate.
- For an intermittent issue, capture the exact occurrence time and run Logbay scoped to a window bracketing the event with the relevant component tags, rather than collecting everything.
- Saturated host NIC/uplink utilization combined with high vNIC transmit/receive drops or queueing on a co-resident VM points to a network bottleneck rather than a storage or CPU problem.
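Why low queue depth makes a workload latency-bound follows from Little's Law: throughput equals concurrency divided by latency. The sketch below computes the resulting IOPS ceiling. It assumes latency stays constant as queue depth rises, which is an idealization (real latency climbs as queues deepen), and the function name is hypothetical.

```python
def max_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Little's Law upper bound: IOPS = outstanding I/Os / per-I/O latency."""
    if outstanding_ios < 0 or latency_ms <= 0:
        raise ValueError("outstanding_ios must be >= 0 and latency_ms > 0")
    return outstanding_ios * 1000.0 / latency_ms


print(max_iops(1, 1.0))   # 1000.0   single outstanding I/O at 1 ms
print(max_iops(32, 1.0))  # 32000.0  same latency, 32 I/Os in flight
print(max_iops(1, 0.5))   # 2000.0   halving latency only doubles the ceiling
```

A guest stuck at one outstanding I/O cannot exceed 1000 IOPS at 1 ms no matter how much bandwidth the cluster has, which is why adding parallelism is the fix.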
Domain 8: Lifecycle, Upgrades, and Migration
- On a connected cluster LCM auto-updates its framework by downloading the manifest from release-api.nutanix.com; a proxy whitelist that omits that URL blocks the framework auto-update even when general internet access works.
- In a dark-site setup the LCM framework must be supplied locally; a 'framework cannot auto-update' pre-check failure usually means the dark-site bundle was not fully extracted, so the matching lcm_dark_site_<version> files are missing from the web server root.
- Select AOS and firmware entities (e.g., SATADOM/BMC) together in a single LCM update plan so LCM resolves dependencies, orders the work, and batches reboots so each node is taken down only once.
- LCM dependency resolution blocks a plan when a chosen firmware entity declares a minimum AOS the cluster does not meet, even if no AOS upgrade was selected, until AOS is upgraded first or added to the plan.
- LCM only detects and offers updates for components whose hardware-specific modules/payloads exist in the configured source; missing disk/HBA modules mean those entities never appear even though the hardware is present.
- For LCM to auto-recover a stuck node without data unavailability, cluster data resiliency must be OK so survivors can serve the offline node's data, and the other CVMs must reach the affected node's IPMI/BMC to drive it out of phoenix.
- If data resiliency is not OK (a disk or node already degraded/down), wait for the rebuild to complete and resiliency to return to OK before re-running the plan so LCM can safely place a node in maintenance.
- Run LCM pre-checks (Perform Inventory, then the validate/pre-check stage) to evaluate readiness without performing upgrades; also run the full NCC suite (ncc health_checks run_all) and resolve FAIL/WARN results before upgrading.
- A host that cannot be evacuated for a rolling upgrade usually has VMs pinned by hard affinity or holding a vGPU/passthrough device no other host can satisfy; power off or migrate those VMs to resume.
- A space pre-check evaluates physical and per-CVM/local space needed to stage upgrade payloads and metadata, which can fail even when container logical space looks ample.
- Use the LCM recovery / lcm_node_recovery procedure to clear stale in-progress operation state so the framework returns to idle and accepts new operations after a failed run.
- If a node still reports old firmware after staging, check whether its BMC requires a cold reset / AC power cycle to commit the staged firmware, then re-run inventory.
- For multi-cluster lifecycle management, stage one dark-site bundle on a central web server, point all clusters' LCM source at it, and drive inventory/updates through Prism Central LCM so a single baseline and schedule apply; phase updates in waves and validate each wave.
- In a Metro pair, confirm Metro Availability is healthy with appropriate failure handling/automatic resume before upgrading so node reboots do not trigger an unintended decouple.
- Replication and recovery between paired sites can fail when AOS version skew exceeds the supported interoperability range, so keep paired clusters within supported version bounds during phased upgrades.
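The dependency rule described above (a firmware entity that declares a minimum AOS blocks the plan unless the cluster already meets it or AOS is added to the plan) can be sketched as a version comparison. The entity names and version numbers below are hypothetical, and the functions only illustrate the logic; LCM reads real requirements from its own manifest.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '6.5.1' into (6, 5, 1, 0), padding to four parts so '6.8' equals '6.8.0'."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def blocked_entities(
    cluster_aos: str,
    min_aos_by_entity: dict[str, str],
    planned_aos: str | None = None,
) -> list[str]:
    """Entities whose minimum AOS exceeds the AOS the cluster will be running."""
    effective = parse_version(planned_aos or cluster_aos)
    return sorted(
        name
        for name, minimum in min_aos_by_entity.items()
        if parse_version(minimum) > effective
    )


# Hypothetical plan: two firmware entities with declared minimum AOS versions.
requirements = {"bmc": "6.8", "satadom": "6.5"}

print(blocked_entities("6.5.1", requirements))                       # ['bmc']
print(blocked_entities("6.5.1", requirements, planned_aos="6.8.1"))  # []
```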
NCM-MCI exam tips
- Treat every design scenario as a capacity-and-fault-domain math problem first: identify the largest fault domain, subtract it to get resilient capacity, then check minimum node counts (5 for RF3, 3 blocks for RF2 block awareness) before committing to a configuration.
- Memorize which affinity rules are hard versus soft: VM-host affinity is hard (blocks HA restart and maintenance-mode evacuation), VM-VM anti-affinity is soft (HA can override it). Most AHV scenario answers hinge on that distinction.
- For networking, map each bond mode to its switch requirement: balance-tcp needs LACP/LAG (MLAG/vPC across two switches), active-backup caps at one link, balance-slb needs no LAG. Then check end-to-end MTU and VLAN trunking on every hop, including inter-switch links.
- Match the DR technology to the RPO/RTO and distance: Metro for RPO-0 transparent failover, Synchronous for RPO-0 within latency limits, NearSync (all-flash, LWS) for ~1-minute RPO over async links, and Leap Recovery Plans for orchestration. RTO is about orchestration, not replication frequency.
- In a troubleshooting scenario, go first to the log or chart that belongs to the affected component (stargate.FATAL/hades/dmesg for storage, Cassandra/genesis for ring changes, read-source stats for tiering, CPU Ready for scheduling), and scope Logbay/NCC/LCM pre-checks to the relevant window.
Study guide FAQ
How is the NCM-MCI different from the NCP-MCI exam?
NCP-MCI validates day-to-day administration of a Nutanix cluster, while NCM-MCI is the master-level exam focused on advanced design, optimization, and troubleshooting. NCM questions are heavily scenario-based and require you to justify trade-offs (RF2 vs RF3, sync vs NearSync, bond modes) and diagnose multi-component failures rather than just perform tasks.
What score do I need to pass and how long is the exam?
You need a scaled score of 3000 (on a 1000-6000 scale) to pass. The exam is 180 minutes. Because the items are hands-on, weighted lab scenarios, budget your time and do not over-invest in any single multi-part scenario.
Do I need hands-on experience, or can I pass by studying alone?
Hands-on experience is strongly recommended. The troubleshooting and lifecycle domains expect familiarity with Prism analysis charts, NCC, Logbay/logbay collect, LCM inventory and pre-checks, and reading component logs (Stargate, Cassandra, Curator, Genesis, Zookeeper). Reading alone rarely builds the intuition these scenario questions test.
Which domains carry the most weight and where should I focus?
Advanced Networking (92), Advanced Cluster Management and Design (89), Lifecycle/Upgrades/Migration (86), Advanced Monitoring and Troubleshooting (85), and Data Protection and DR (83) are the largest pools. Prioritize fault-domain/resilient-capacity design, OVS bond modes plus segmentation, and the Leap/Metro/NearSync decision matrix, since these recur across many scenarios.