NCM-MCI: Nutanix Certified Master - Multicloud Infrastructure Practice Exam

Validates advanced expertise in designing, optimizing, and troubleshooting Nutanix multicloud infrastructure: advanced cluster design, AHV and VM management, networking, performance, DR, security, and upgrades.

Practice 512 exam-style NCM-MCI questions with full answer explanations, then take timed mock exams that score like the real thing.

Practice questions: 512
Questions on the real exam: 60
Passing score: 700
Exam length: 120 min

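What the NCM-MCI exam covers

The exam spans 8 domains, each represented in the sample questions below:

    • Advanced Cluster Management and Design
    • Advanced AHV and VM Management
    • Advanced Networking
    • Advanced Storage and Performance
    • Advanced Data Protection and Disaster Recovery
    • Security and Compliance
    • Advanced Monitoring and Troubleshooting
    • Lifecycle, Upgrades, and Migration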

Free NCM-MCI sample questions

A sample of 10 questions with answers and explanations. Sign up free to practice all 512.

  1. Question 1: Advanced Cluster Management and Design

    A customer runs a 4-node RF2 cluster where each node holds 8 TB of physical SSD capacity (32 TB raw, 16 TB usable after RF2). They have provisioned guest VMs consuming 13 TB of effective data and are pushing close to 85% utilization. During a planned firmware upgrade one node is taken offline and the cluster begins to alert that it cannot maintain its resiliency target. What is the MOST accurate root cause?

    • A. The cluster did not reserve enough rebuild capacity for a single-node failure, so with one node down the remaining nodes cannot host a second full RF2 copy of all data (Correct)
    • B. Curator stopped running because the cluster crossed 85% utilization and disabled all background scans
    • C. RF2 requires a minimum of 5 nodes, so a 4-node cluster can never tolerate a node being offline
    • D. Stargate switched to synchronous oplog draining, which doubled the space consumed by every write
    ✓ Correct answer: A

    Resilient capacity is the amount of usable space a cluster can fill while still being able to re-protect every extent back to its configured RF after losing a fault domain. On a 4-node RF2 cluster you must keep roughly one node's worth of capacity free so Curator can recreate the second replica of the downed node's extents onto the three survivors. Here that leaves about 12 TB of resilient capacity (three surviving 8 TB nodes at RF2), and the 13 TB already consumed exceeds it, so when firmware takes a node offline there is nowhere to rebuild the missing copies and Prism correctly raises a resiliency-target alert. The fix is to lower utilization below resilient capacity or add a node, not to change any background service behavior.

    Why the other options are wrong
    • B. Curator does not stop scanning at 85% utilization; high usage triggers ILM down-migration and disk balancing to free SSD, and Curator keeps running its partial and full scans, so a halted-Curator theory does not explain the resiliency alert.
    • C. RF2 is supported on a 3-node minimum, so the claim that five nodes are required is false; a 4-node cluster can tolerate a single node loss whenever enough free rebuild space exists.
    • D. Stargate drains the oplog to the extent store asynchronously and coalesces writes, so it never doubles the steady-state space footprint, making oplog draining irrelevant to running out of resilient capacity.
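
The resilient-capacity arithmetic behind Question 1 can be sketched in a few lines. This is a simplified model, not Nutanix's actual calculation: the function name `resilient_usable_tb` is hypothetical, and it ignores CVM, metadata, and reserved-space overheads that a real cluster would subtract.

```python
def resilient_usable_tb(nodes: int, raw_per_node_tb: float, rf: int) -> float:
    """Usable TB that can be filled while one node's data can still be rebuilt.

    Simplified model: ignores CVM, metadata, and reserved-space overheads.
    """
    if nodes <= rf:
        raise ValueError("need more nodes than replicas to rebuild after a node loss")
    # Only the surviving nodes can hold data once one node is gone.
    surviving_raw_tb = (nodes - 1) * raw_per_node_tb
    return surviving_raw_tb / rf


# Figures from the question: 4 nodes x 8 TB raw, RF2, 13 TB consumed.
nodes, raw_per_node_tb, rf = 4, 8.0, 2
usable_tb = nodes * raw_per_node_tb / rf
resilient_tb = resilient_usable_tb(nodes, raw_per_node_tb, rf)
consumed_tb = 13.0

print(f"usable={usable_tb} TB, resilient={resilient_tb} TB, consumed={consumed_tb} TB")
print("can rebuild after one node loss:", consumed_tb <= resilient_tb)
```

With the question's figures, the model gives 16 TB usable but only 12 TB resilient, so 13 TB of consumed data cannot be fully re-protected on three nodes.
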
  2. Question 2: Advanced Cluster Management and Design

    An administrator places an AHV host into maintenance mode for memory replacement. They expect the host's user VMs to be live-migrated automatically, but several VMs fail to evacuate and the maintenance-mode task hangs. What is the MOST likely reason specific VMs cannot be evacuated?

    • A. Those VMs have a host affinity policy pinning them to the host, or they are agent/CVM-class VMs that cannot live-migrate (Correct)
    • B. The cluster RF is set too high to permit live migration during maintenance
    • C. Curator must complete a full scan before any VM can be migrated off the host
    • D. Maintenance mode always powers off all user VMs, so migration is not expected
    ✓ Correct answer: A

    When a host enters maintenance mode, AHV tries to live-migrate its user VMs to peers, but a VM bound by a host affinity policy to that specific host cannot move, and agent or CVM-class VMs are not eligible for live migration by design. Any such VM blocks the evacuation and hangs the maintenance-mode task. The fix is to relax the affinity policy or handle the non-migratable VMs explicitly. The cause is VM-specific constraints, not a cluster-wide setting.

    Why the other options are wrong
    • B. The cluster RF governs data replication and has no role in permitting or blocking live migration, so a high RF does not prevent VMs from evacuating.
    • C. Live migration does not depend on a Curator full scan completing; migration is handled by the hypervisor and Acropolis, so a pending scan is not the blocker.
    • D. Maintenance mode does not power off user VMs by default; it live-migrates them to peers, so the expectation of automatic migration is correct and forced power-off is not the behavior.
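
The evacuation reasoning in Question 2 can be illustrated with a small pre-check. This is only a sketch of the logic, not an AHV or Prism API: the `VM` class, its fields, and `evacuation_blockers` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    # Hosts the VM is allowed to run on; an empty set means no host affinity.
    affinity_hosts: set = field(default_factory=set)
    # False models agent/CVM-class VMs that are not live-migratable.
    migratable: bool = True


def evacuation_blockers(vms, host, all_hosts):
    """Return {vm name: reason} for VMs that cannot leave `host`."""
    peers = set(all_hosts) - {host}
    blockers = {}
    for vm in vms:
        if not vm.migratable:
            blockers[vm.name] = "not eligible for live migration"
        elif vm.affinity_hosts and not (vm.affinity_hosts & peers):
            blockers[vm.name] = "host affinity allows no other host"
    return blockers


vms = [
    VM("app01"),
    VM("db01", affinity_hosts={"host-a"}),
    VM("agent01", migratable=False),
    VM("web01", affinity_hosts={"host-a", "host-b"}),
]
blockers = evacuation_blockers(vms, "host-a", {"host-a", "host-b", "host-c"})
for name, reason in blockers.items():
    print(f"{name}: {reason}")
```

In this example only `db01` (pinned to the host being drained) and `agent01` (non-migratable) are reported; `web01` has an affinity policy but can still move to `host-b`.
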
  3. Question 3: Advanced AHV and VM Management

    A cluster hosts several GPU VMs scheduled by Lazan. An administrator notices that GPU VMs are not being evenly distributed and some are failing to power on even though aggregate GPU capacity exists. The most probable cause relates to how vGPU profiles are bound to physical GPUs. Which statement correctly describes this constraint?

    • A. A single physical GPU can typically host only one vGPU profile type at a time, so mixing different profiles on the same GPU is not allowed and fragments available capacity (Correct)
    • B. Every physical GPU can simultaneously host any mix of profile types without restriction
    • C. vGPU profiles are assigned per cluster, not per physical GPU, so profile type is irrelevant to placement
    • D. Lazan ignores GPU profile types entirely and balances purely on CPU load
    ✓ Correct answer: A

    NVIDIA vGPU enforces homogeneous profiles per physical GPU in the standard time-sliced model: once a GPU begins serving a given profile (frame-buffer size), all instances on that GPU must use the same profile. This means a GPU already running a large profile cannot also serve small profiles, so aggregate GPU memory can appear free while no slot of the requested profile exists, causing power-on failures. Lazan must place a VM on a GPU that is either empty or already serving the matching profile with a free slot. Recognizing this homogeneity rule explains the fragmentation symptom.

    Why the other options are wrong
    • B. A physical GPU cannot host an arbitrary mix of profile types at once; the standard vGPU model requires one profile type per GPU.
    • C. vGPU profiles are not assigned per cluster; the profile is bound to the physical GPU, and that binding is central to placement.
    • D. Lazan does not ignore GPU profile types; it is explicitly profile-aware when placing GPU VMs.
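
The homogeneity rule from Question 3 can be shown with a toy placement routine. This is a sketch under stated assumptions, not Lazan's scheduler: `PhysicalGPU`, `place`, and the frame-buffer sizes are hypothetical, and profiles are modeled only by their frame-buffer size in GB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhysicalGPU:
    name: str
    framebuffer_gb: int
    profile_gb: Optional[int] = None  # profile currently served; None = empty GPU
    instances: int = 0

    def can_host(self, profile_gb: int) -> bool:
        # Homogeneity rule: a GPU already serving one profile accepts only that profile.
        if self.profile_gb is not None and self.profile_gb != profile_gb:
            return False
        return (self.instances + 1) * profile_gb <= self.framebuffer_gb


def place(gpus, profile_gb):
    """Place one vGPU instance on the first eligible GPU; return its name or None."""
    for gpu in gpus:
        if gpu.can_host(profile_gb):
            gpu.profile_gb = profile_gb
            gpu.instances += 1
            return gpu.name
    return None


gpus = [
    PhysicalGPU("gpu0", 16, profile_gb=8, instances=1),  # 8 GB free
    PhysicalGPU("gpu1", 16, profile_gb=4, instances=2),  # 8 GB free
]
free_gb = sum(g.framebuffer_gb - g.instances * (g.profile_gb or 0) for g in gpus)
small = place(gpus, 2)     # no GPU serves the 2 GB profile and none is empty
matching = place(gpus, 4)  # fits on the GPU already serving the 4 GB profile

print(f"free={free_gb} GB, 2 GB request -> {small}, 4 GB request -> {matching}")
```

Although 16 GB of frame buffer is free in aggregate, the 2 GB request finds no slot, which is the fragmentation symptom the question describes.
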
  4. Question 4: Advanced Networking

    An architect is choosing between using multiple bridges/virtual switches versus a single virtual switch with multiple VLANs to separate management, backplane, and guest traffic on a cluster with eight NICs per node. The customer's top priority is physical fault-domain isolation and the ability to assign different NIC speeds to different traffic classes. Which choice best meets the priority?

    • A. A single virtual switch carrying all VLANs, because VLAN separation alone provides identical fault isolation to separate switches
    • B. Multiple virtual switches, each bound to a distinct set of physical uplinks, so traffic classes have independent bonds, independent fault domains, and can use NICs of different speeds (Correct)
    • C. A single virtual switch with LACP so all eight NICs aggregate and self-isolate traffic
    • D. Multiple virtual switches all sharing the same eight-NIC bond to maximize aggregate bandwidth
    ✓ Correct answer: B

    The customer's priorities are physical fault isolation and assigning different NIC speeds to different traffic classes, which a single VLAN-separated switch cannot deliver because all traffic still rides the same physical bond and fault domain. Using multiple virtual switches, each bound to its own uplinks, gives management, backplane, and guest traffic independent bonds and fault domains, and lets each class use NICs matched to its needs. This is the design that meets both priorities directly. VLAN separation alone is logical, not physical.

    Why the other options are wrong
    • A. A single virtual switch with VLANs does not provide identical fault isolation; all VLANs share one physical bond, so a bond failure affects every class at once.
    • C. Aggregating all eight NICs into one LACP switch shares a single fault domain and cannot assign different NIC speeds to different classes, missing both priorities.
    • D. Multiple virtual switches sharing the same eight-NIC bond defeats the purpose, since they would all share one fault domain and one bandwidth pool.
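
The physical-isolation requirement in Question 4 amounts to checking that no uplink is shared between virtual switches. A minimal sketch of that check follows; the switch names, NIC names, and `shared_uplinks` helper are hypothetical and do not correspond to any Nutanix tool.

```python
def shared_uplinks(switches):
    """Map each physical NIC used by more than one virtual switch to those switches."""
    owners = {}
    for switch, nics in switches.items():
        for nic in nics:
            owners.setdefault(nic, []).append(switch)
    return {nic: sorted(names) for nic, names in owners.items() if len(names) > 1}


# Design B from the question: each traffic class has its own uplinks.
isolated = {
    "vs0-mgmt": {"eth0", "eth1"},
    "vs1-backplane": {"eth2", "eth3"},
    "vs2-guest": {"eth4", "eth5", "eth6", "eth7"},
}
# A flawed design in which two switches reuse the same physical NIC.
overlapping = {
    "vs0-mgmt": {"eth0", "eth1"},
    "vs1-guest": {"eth1", "eth2"},
}

print("isolated design overlaps:", shared_uplinks(isolated))
print("overlapping design overlaps:", shared_uplinks(overlapping))
```

An empty result means every traffic class has an independent physical fault domain; any entry identifies a NIC whose failure would affect more than one class.
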
  5. Question 5: Advanced Storage and Performance

    An administrator enables inline compression on a container and later asks why write latency did not measurably increase even though data is being compressed. Which explanation is MOST accurate regarding where compression occurs in the I/O path?

    • A. Inline compression is applied as data is drained from oplog to the extent store rather than blocking the initial oplog acknowledgment, so the guest write is acknowledged from oplog before compression work is performed (Correct)
    • B. Compression is applied synchronously before the oplog acknowledges the write, but modern CPUs make it free
    • C. Compression only ever runs as a Curator post-process job, so there is no such thing as inline compression on Nutanix
    • D. Compression happens in the unified cache during reads, not on the write path at all
    ✓ Correct answer: A

    On Nutanix, inline compression does not sit synchronously in front of the guest acknowledgment; the write is first acknowledged from oplog for low latency, and compression is applied as the data drains from oplog to the extent store. Because the compression work happens during the drain rather than before the ack, the guest does not see added write latency. This is why enabling inline compression did not measurably raise write latency. The term inline refers to compressing on the way to the extent store, not blocking the initial write.

    Why the other options are wrong
    • B. Compression is not applied synchronously before the oplog acknowledges; the write is acked from oplog first, and CPU cost does not make synchronous compression free.
    • C. Inline compression does exist on Nutanix and runs during the oplog-to-extent-store drain; it is not only a Curator post-process job.
    • D. Compression on the write path acts during the drain to the extent store, not in the unified cache during reads, so this misplaces where it happens.
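
The ordering described in Question 5 (acknowledge first, compress on drain) can be modeled with a toy write path. This is purely illustrative and not how Stargate is implemented: the `WritePath` class is hypothetical, and `zlib` stands in for whatever compression the platform actually uses.

```python
import zlib


class WritePath:
    """Toy model: writes are acknowledged from the oplog, compressed on drain."""

    def __init__(self):
        self.oplog = []         # acknowledged writes, stored uncompressed
        self.extent_store = []  # data compressed as it is drained

    def write(self, data: bytes) -> str:
        self.oplog.append(data)
        return "ack"            # the guest sees the ack before any compression

    def drain(self) -> None:
        # Compression cost is paid here, off the guest's acknowledgment path.
        while self.oplog:
            self.extent_store.append(zlib.compress(self.oplog.pop(0)))


path = WritePath()
payload = b"a" * 4096
status = path.write(payload)
compressed_before_drain = len(path.extent_store)
path.drain()

print(status, compressed_before_drain, len(path.extent_store[0]) < len(payload))
```

The write returns its acknowledgment while the extent store is still empty, and the compressed copy appears only after `drain()` runs.
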
  6. Question 6: Advanced Data Protection and Disaster Recovery

    A financial services customer runs a 3-tier application across two Nutanix AOS clusters connected by Leap (synchronous-capable hardware, 1 ms RTT). The business requires an RPO of 0 for the database tier but tolerates 1 hour of data loss for the web tier. The architect places all VMs in a single Synchronous protection policy with a 0-second recovery point objective. After deployment, the web tier experiences periodic write latency spikes. What is the BEST design correction?

    • A. Split the tiers into separate protection policies: keep the database tier Synchronous (RPO 0) and move the web tier to an Asynchronous policy with a 1-hour RPO (Correct)
    • B. Increase the oplog size on the source cluster so synchronous replication no longer blocks web-tier writes
    • C. Convert the entire policy to NearSync with a 1-minute recovery point to reduce write latency for all tiers
    • D. Disable Curator scans during business hours so replication bandwidth is reserved for the synchronous stream
    ✓ Correct answer: A

    Synchronous replication acknowledges every write only after it is committed at the remote site, so applying RPO 0 to the web tier needlessly subjects its writes to that round-trip and causes the latency spikes. Since only the database tier requires zero data loss, the right design is to separate the tiers: the database stays Synchronous for RPO 0, while the web tier moves to an Asynchronous policy with a 1-hour RPO that matches its actual tolerance. This removes the synchronous penalty from the web tier while still meeting the database requirement. Matching the protection policy to each tier's RPO is the right fix.

    Why the other options are wrong
    • B. Increasing oplog size does not remove the synchronous write acknowledgment round-trip that is causing the web-tier latency, so it does not address the design flaw.
    • C. Converting everything to NearSync gives the database a 1-minute RPO instead of the required RPO 0, failing the zero-data-loss requirement for that tier.
    • D. Disabling Curator scans does not change the synchronous acknowledgment latency on web-tier writes, so it does not resolve the spikes.
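
The tier-splitting logic in Question 6 can be expressed as a mapping from each tier's RPO to a replication type. The sketch below is an assumption-laden illustration: `replication_type` is a hypothetical helper, and the 15-minute NearSync cutoff is an assumed threshold that should be checked against the documentation for the AOS release in use.

```python
NEARSYNC_MAX_SECONDS = 15 * 60  # assumed upper bound for NearSync, not authoritative


def replication_type(rpo_seconds: int) -> str:
    """Pick the least intrusive replication type that still meets the RPO."""
    if rpo_seconds < 0:
        raise ValueError("RPO cannot be negative")
    if rpo_seconds == 0:
        return "Synchronous"
    if rpo_seconds <= NEARSYNC_MAX_SECONDS:
        return "NearSync"
    return "Asynchronous"


# RPO requirements from the question, in seconds.
tiers = {"database": 0, "web": 3600}

# Group tiers into one protection policy per replication type.
policies = {}
for tier, rpo in tiers.items():
    policies.setdefault(replication_type(rpo), []).append(tier)

print(policies)
```

Grouping by replication type yields two policies, one Synchronous for the database tier and one Asynchronous for the web tier, which matches the corrected design.
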
  7. Question 7: Advanced Data Protection and Disaster Recovery

    A company performs quarterly DR readiness validation using Nutanix Leap recovery plans. They want to confirm applications actually come up correctly at the recovery site WITHOUT impacting the running production VMs and WITHOUT consuming a production failover. Which recovery plan capability should they use?

    • A. Execute a test failover, which clones the latest recovered snapshots into an isolated test network at the recovery site (Correct)
    • B. Execute a planned failover during the maintenance window and immediately fail back
    • C. Execute an unplanned failover with the source cluster powered off to simulate disaster
    • D. Manually clone production VMs at the source and power them on in a sandbox VLAN
    ✓ Correct answer: A

    A test failover is purpose-built for non-disruptive DR validation: it brings up the latest recovered snapshots as cloned VMs in an isolated test network at the recovery site without touching production or consuming a real failover. The production VMs keep running and replication continues, while the team verifies that applications actually come up. This satisfies the requirement to validate recovery without impacting production. It is the recovery-plan capability designed for exactly this quarterly check.

    Why the other options are wrong
    • B. A planned failover actually moves the workload to the recovery site and back, which is disruptive and consumes a real failover, contrary to the requirement.
    • C. An unplanned failover with the source powered off is a real disaster simulation that disrupts production, not a non-disruptive validation.
    • D. Manually cloning and powering on production VMs in a sandbox bypasses the recovery plan and does not validate the actual recovered snapshots or the orchestration.
  8. Question 8: Security and Compliance

    A service account used by an infrastructure-as-code pipeline authenticates to the Prism Central v3 API. Security policy requires that this account cannot use the interactive Prism web console and is restricted to a narrowly scoped automation role. Which approach is the MOST appropriate?

    • A. Create a dedicated directory or local user mapped only to a purpose-built custom role with just the API operations the pipeline needs, and exclude it from any console-capable role mappings (Correct)
    • B. Use the built-in admin account for the pipeline but rotate its API key after every run
    • C. Share a single Prism Admin service account across all pipelines and rely on source-IP allowlisting at the load balancer to limit exposure
    • D. Disable the web console globally while the pipeline runs and re-enable it afterward
    ✓ Correct answer: A

    An automation service account should follow least privilege and be limited to programmatic access. Creating a dedicated user mapped solely to a custom role containing only the API operations the pipeline requires grants exactly what is needed, and excluding it from console-capable role mappings keeps it out of the interactive Prism web console. This isolates the pipeline's identity and scope. It is the appropriate design for a narrowly scoped automation account.

    Why the other options are wrong
    • B. Reusing the built-in admin account gives the pipeline full privileges far beyond what it needs, violating least privilege regardless of API key rotation.
    • C. Sharing one Prism Admin account across pipelines removes accountability and grants excessive rights, and source-IP allowlisting does not narrow the role.
    • D. Disabling the web console globally during runs disrupts all administrators and is not a per-account control, so it does not meet the requirement.
  9. Question 9: Advanced Monitoring and Troubleshooting (Select all that apply)

    A noisy-neighbor investigation must rule out CPU contention before blaming storage. Which TWO Prism observations together would justify concluding CPU (not storage) is the bottleneck for a victim VM? (Choose TWO)

    • A. The victim VM shows high CPU Ready/wait time while its storage latency stays within normal range (Correct)
    • B. Another VM on the same host shows sustained high host-attributed CPU usage (Correct)
    • C. The cluster's total usable capacity is below 20%
    • D. The container's compression savings ratio dropped this week
    ✓ Correct answer: A, B

    Elevated CPU Ready/wait on the victim with normal storage latency localizes the bottleneck to CPU scheduling rather than the storage path. Pairing that with an aggressor VM that exhibits sustained high host-attributed CPU on the same node identifies the noisy neighbor causing the scheduling contention. Together these two observations form a defensible root cause for CPU contention.

    Why the other options are wrong
    • C. The cluster's usable capacity being below 20% is a capacity/space concern that does not establish CPU contention for a specific VM.
    • D. A drop in the container's compression savings ratio reflects data compressibility changes and is unrelated to CPU scheduling on the host.
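
The two-observation test from Question 9 can be written as a small decision function. This is a sketch only: `likely_bottleneck` is a hypothetical helper, and the default thresholds are placeholder values chosen for illustration, not Nutanix guidance.

```python
def likely_bottleneck(
    cpu_ready_pct: float,
    storage_latency_ms: float,
    neighbor_cpu_pct: float,
    *,
    ready_limit: float = 10.0,      # hypothetical threshold
    latency_limit: float = 5.0,     # hypothetical threshold
    neighbor_limit: float = 80.0,   # hypothetical threshold
) -> str:
    """Classify a victim VM's bottleneck from three observations."""
    cpu_starved = cpu_ready_pct > ready_limit
    storage_ok = storage_latency_ms <= latency_limit
    noisy_neighbor = neighbor_cpu_pct > neighbor_limit

    # Both observations from the question must hold to conclude CPU contention.
    if cpu_starved and storage_ok and noisy_neighbor:
        return "cpu"
    if not storage_ok and not cpu_starved:
        return "storage"
    return "inconclusive"


print(likely_bottleneck(25.0, 2.0, 95.0))   # high ready time, normal latency, busy neighbor
print(likely_bottleneck(2.0, 30.0, 20.0))   # low ready time, high latency
print(likely_bottleneck(25.0, 30.0, 95.0))  # both elevated
```

Requiring both signals before returning `"cpu"` mirrors the question's point that either observation alone is not enough to rule out storage.
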
  10. Question 10: Lifecycle, Upgrades, and Migration

    A complex LCM update fails midway with the node successfully back in the hypervisor and the CVM up, but the overall LCM task is stuck and subsequent inventory/updates are blocked with a message that an operation is already in progress. Cluster data services are healthy. What is the recommended recovery action?

    • A. Use the LCM recovery/lcm_node_recovery procedure to clear the stale in-progress operation state so the framework returns to idle and accepts new operations (Correct)
    • B. Reboot every CVM in the cluster simultaneously to clear all task state
    • C. Destroy and recreate the cluster to remove the orphaned LCM task
    • D. Delete the Cassandra entry for the LCM task by editing the metadata directly
    ✓ Correct answer: A

    When an LCM operation fails but the node is back in the hypervisor with a healthy CVM, the framework can be left holding a stale in-progress lock that blocks further inventory and updates. The supported fix is to run the LCM recovery procedure (lcm_node_recovery), which clears the orphaned operation state and returns the framework to idle so it accepts new operations. Cluster data services being healthy means this is purely a stuck task state. The recovery procedure is the targeted, non-destructive remediation.

    Why the other options are wrong
    • B. Rebooting every CVM simultaneously would take the whole cluster down and is not how the stuck LCM task is cleared; the recovery procedure handles it.
    • C. Destroying and recreating the cluster is catastrophic and wildly disproportionate to clearing an orphaned LCM task.
    • D. Hand-editing Cassandra to delete the LCM task entry is unsupported and dangerous; the LCM recovery procedure exists precisely to clear this state safely.

NCM-MCI practice exam FAQ

How many questions are in the NCM-MCI practice exam on CertGrid?

CertGrid has 512 practice questions for NCM-MCI: Nutanix Certified Master - Multicloud Infrastructure, covering 8 exam domains. The real NCM-MCI exam has about 60 questions.

What is the passing score for NCM-MCI?

The NCM-MCI exam passing score is 700, and you have about 120 minutes to complete it. CertGrid scores your practice attempts the same way so you know when you are ready.

Are these official NCM-MCI exam questions?

No. CertGrid is an independent practice platform. Questions are written to mirror the style and concepts of NCM-MCI: Nutanix Certified Master - Multicloud Infrastructure, with full explanations, but they are not official or copied vendor exam items. They are original practice questions designed to help you genuinely learn the material.

Can I practice NCM-MCI for free?

Yes. You can start practicing NCM-MCI: Nutanix Certified Master - Multicloud Infrastructure for free with daily practice and sample questions. Paid plans unlock full timed exams, complete explanations, and domain analytics.