NCP-MCI: Nutanix Certified Professional - Multicloud Infrastructure Study Guide
The NCP-MCI (Nutanix Certified Professional - Multicloud Infrastructure) validates your ability to deploy, administer, and troubleshoot a Nutanix multicloud environment built on AHV - covering cluster deployment, VM management, networking, distributed storage, data protection and DR, security, monitoring, and lifecycle upgrades. It is aimed at administrators and engineers who manage Nutanix clusters day to day through Prism Element and Prism Central. The exam is 120 minutes with a passing score of 700, and rewards hands-on familiarity with Prism workflows, AOS defaults, and resiliency behavior.
Domain 1: Cluster Management and Deployment
- Foundation discovers unconfigured (factory or wiped) nodes by sending IPv6 link-local multicast on the local segment; because link-local traffic is never forwarded by a router, the Foundation VM and the nodes must be on the same Layer 2 broadcast domain or discovery fails.
- Foundation images bare-metal nodes with the chosen hypervisor (AHV, ESXi, or Hyper-V) plus a selected AOS version and can create the cluster in one workflow; if a single node fails during imaging, re-image only that failed node rather than restarting the whole job.
- Foundation Central runs as a service inside Prism Central for remote, centralized imaging and cluster creation across many sites; edge nodes discover Foundation Central's address and API key through DHCP vendor-specific options (option 43 with vendor class NutanixFC), or are registered through the Foundation Central API.
- A cluster is created from the CLI by running 'cluster -s <CVM IPs> create' from one of the CVMs, where <CVM IPs> is the comma-separated list of the CVM addresses that will form the cluster; cluster operations target the CVMs, not the hypervisor hosts.
- Redundancy Factor 2 (RF2) keeps two data copies on two different nodes and tolerates one node or drive failure; RF3 keeps three data copies, tolerates two simultaneous failures, requires a minimum of five nodes, and consumes more usable capacity.
- Replication factor is applied at the storage container level (on written data), not at the node level; RF3 on a container additionally requires the cluster to meet the minimum node and fault-tolerance prerequisites.
- Block (chassis) awareness places data replicas, metadata (Cassandra ring), and Zookeeper instances on separate physical blocks so losing one block (with its shared power and fans) does not take data offline; it engages automatically when there are enough balanced blocks.
- Rack fault tolerance (rack awareness) extends the same placement logic to entire racks so the failure of a rack does not cause data unavailability.
- Always validate hardware, hypervisor, AOS, and firmware combinations against the Nutanix Compatibility and Interoperability Matrix before deploying or upgrading.
- Each Nutanix node exposes three management endpoints: the hypervisor host, the Controller VM (CVM), and the IPMI/BMC (out-of-band) interface.
- The cluster Virtual IP (VIP) is a single floating address used to reach Prism Element and cluster services; if the CVM currently hosting it fails, the VIP moves to the newly elected Prism leader.
- Prism Element manages a single cluster; Prism Central manages one or more registered clusters and provides aggregated capacity/runway reporting plus centralized category-based management.
- Each Prism Element cluster registers with exactly one Prism Central instance; a scale-out (three-node) Prism Central deployment provides high availability for the management plane.
- If a cluster's CVMs lose connectivity to Prism Central, entity synchronization from the cluster to Prism Central is delayed or impaired while local cluster operations continue.
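The resiliency figures in the bullets above can be captured in a small lookup for self-testing. This is a minimal sketch, not a Nutanix tool: the names RfProfile, supports, and approx_usable_tib are hypothetical, the 3-node minimum for RF2 is an assumption that does not come from this guide, and the capacity estimate ignores CVM overhead, metadata, and data-reduction features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RfProfile:
    copies: int              # data copies kept on different nodes
    failures_tolerated: int  # simultaneous node or drive failures survived
    min_nodes: int           # smallest cluster that supports this factor


# RF3 figures are from the bullets above. The 3-node minimum for RF2 is an
# assumption (a standard multi-node cluster), not a value from this guide.
PROFILES = {
    2: RfProfile(copies=2, failures_tolerated=1, min_nodes=3),
    3: RfProfile(copies=3, failures_tolerated=2, min_nodes=5),
}


def supports(rf: int, node_count: int) -> bool:
    """Return True if a cluster of node_count nodes meets the RF minimum."""
    return node_count >= PROFILES[rf].min_nodes


def approx_usable_tib(raw_tib: float, rf: int) -> float:
    """Rough usable capacity: raw capacity divided by the number of copies.

    Ignores CVM overhead, metadata, compression, and erasure coding.
    """
    return raw_tib / PROFILES[rf].copies
```

For example, supports(3, 4) is False because RF3 needs five nodes, and approx_usable_tib(60.0, 3) gives 20.0, which illustrates why RF3 consumes more usable capacity than RF2.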
Domain 2: AHV Virtualization and VM Management
- In the AHV VM dialog, vCPU(s) sets the number of virtual sockets and Cores Per Socket sets the number of cores in each socket; total logical processors = vCPU(s) x Cores Per Socket (e.g. 4 vCPU x 2 cores = 8, and 1 vCPU x 8 cores = 8).
- A VM's vDisks live in a storage container, so the container's replication factor, compression, deduplication, and erasure-coding settings govern how that VM's data is stored.
- AHV vDisks are thin-provisioned by default, consuming physical space only as data is actually written.
- ISOs are managed through the Image Service (image repository); to install an OS you add a CD-ROM device, mount the ISO, and place the CD-ROM ahead of the (empty) boot disk in the boot order, otherwise firmware tries to boot the blank disk and never reaches the installer.
- After installation, eject the ISO so the CD-ROM is empty (or remove the CD-ROM device) so the VM boots from disk on the next power cycle.
- Secure Boot is a UEFI feature, so the VM must use UEFI firmware and be powered off before Secure Boot can be enabled; a UEFI VM without Secure Boot enabled boots without validating the signatures of its boot components.
- Firmware and boot-security changes (BIOS to UEFI, enabling Secure Boot) require the VM to be powered off first.
- For PXE/network installs, set the network adapter as the first boot device so the VM boots from the PXE server.
- Increasing VM memory takes effect online only if the guest supports memory hot-add; otherwise the VM needs a power cycle for the new memory to register.
- Cloning a VM uses redirect-on-write so clones reference the source's data blocks and only consume space for new writes - efficient for VDI golden images and rapid provisioning.
- Live Migration moves a running VM to another AHV host with no downtime; sustained host CPU or memory contention beyond a threshold over a monitoring interval triggers contention alerts.
- VM High Availability reserves compute capacity so VMs can be restarted on surviving hosts after a host failure; if no permitted host has enough capacity, the affected VM is not restarted.
- Restoring a VM from a snapshot can either overwrite the VM or create a new clone with a different name; restoring as a new clone avoids overwriting the running VM.
- AHV is the bundled, license-free Nutanix hypervisor; VM management is performed through Prism Element or Prism Central rather than a separate management server.
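The vCPU arithmetic from the first bullet of this domain is easy to get backwards under exam pressure, so a short check can help. This is a minimal sketch; logical_processors is a hypothetical helper name, not part of any Nutanix API.

```python
def logical_processors(vcpus: int, cores_per_socket: int) -> int:
    """Total logical processors seen by the guest.

    vcpus is the number of virtual sockets (the vCPU(s) field) and
    cores_per_socket is the Cores Per Socket field.
    """
    if vcpus < 1 or cores_per_socket < 1:
        raise ValueError("both values must be at least 1")
    return vcpus * cores_per_socket


# The two examples from the guide reach the same total in different ways.
four_sockets = logical_processors(4, 2)  # 4 sockets x 2 cores
one_socket = logical_processors(1, 8)    # 1 socket x 8 cores
```

Both calls return 8, which is the point of the exam question: the guest sees the same processor count, but the socket layout differs.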
Domain 3: Networking
- During AHV install, AOS automatically creates the default virtual switch vs0 mapped to Open vSwitch bridge br0 on every host, with the uplinks placed in the br0-up bond; this carries CVM, host, and VM traffic.
- A virtual switch is a cluster-wide management construct that maps to an Open vSwitch (OVS) bridge on each host; bond type and uplinks are configured under Settings > Network Configuration > Virtual Switch, and AOS pushes the change to every host consistently.
- Active-Backup is the default bond mode and is fully switch-independent: only one uplink is active at a time and the others stand by for failover, so throughput limited to a single link's bandwidth is expected behavior rather than a fault.
- balance-slb (source-load-balancing) is switch-independent and hashes on source MAC to spread different VMs across both uplinks; it needs no LACP or port-channel on the upstream switch.
- balance-tcp provides true per-flow load balancing that lets a single VM exceed one NIC's bandwidth, but it requires an LACP port-channel configured on the upstream switch.
- If you set balance-tcp without a matching switch port-channel, LACP negotiation fails and the host can lose connectivity unless LACP fallback (active-backup fallback) is enabled - configure both ends together and enable fallback during the transition.
- Virtual-switch updates use a Standard (rolling) update that places each host into maintenance mode in turn to apply the change without disruption.
- Bond and uplink troubleshooting on the host uses 'ovs-vsctl' and 'ovs-appctl bond/show' against the OVS bridge (br0) running on the AHV host.
- Add a second virtual switch (e.g. vs1) backed by a new bridge (e.g. br1) with its own uplink bond when you need network separation; each host must have the dedicated, consistently named NICs available and a bond mode selected for the new uplink.
- A managed network (subnet) uses AOS IPAM and a built-in DHCP service - served by the Acropolis leader - to assign and track guest IP addresses; you set DNS servers, domain name, and other DHCP options in the managed network's DHCP settings in Prism.
- An unmanaged subnet provides only Layer 2 connectivity on a VLAN and relies on an external DHCP server or static addressing; if no external DHCP responds, guests get no address.
- VLAN tagging for a VM is configured by setting the VLAN ID (e.g. 250) on the subnet in Prism, after which AHV tags that VM's traffic on the bridge.
- Bonds should use physical interfaces of the same highest available speed (for example, group all 10 GbE NICs together rather than mixing 1 GbE and 10 GbE).
- Losing one uplink in a bond keeps traffic flowing over the remaining uplink(s) but reduces that host's aggregate uplink bandwidth until the link is restored.
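The three bond modes described above can be summarized as a lookup that answers the usual exam question: which mode needs configuration on the upstream switch? This is a minimal study sketch; the dictionary keys and the function names are hypothetical and are not OVS or Prism identifiers beyond the mode names themselves.

```python
from typing import List

# Properties taken from the bullets above.
BOND_MODES = {
    "active-backup": {"switch_config_required": False,
                      "balancing": "none (one active uplink)"},
    "balance-slb": {"switch_config_required": False,
                    "balancing": "per VM, by source MAC"},
    "balance-tcp": {"switch_config_required": True,
                    "balancing": "per flow, over an LACP port-channel"},
}

DEFAULT_MODE = "active-backup"


def switch_independent_modes() -> List[str]:
    """Modes that work without LACP or a port-channel upstream."""
    return [mode for mode, props in BOND_MODES.items()
            if not props["switch_config_required"]]


def safe_to_apply(mode: str, port_channel_ready: bool,
                  lacp_fallback: bool = False) -> bool:
    """Rough rule of thumb: a mode that needs switch configuration is only
    safe when the port-channel exists or LACP fallback is enabled."""
    if not BOND_MODES[mode]["switch_config_required"]:
        return True
    return port_channel_ready or lacp_fallback
```

For instance, safe_to_apply("balance-tcp", port_channel_ready=False) returns False, matching the warning that the host can lose connectivity when LACP negotiation fails without fallback.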
Domain 4: Storage Management
- A storage pool aggregates all physical disks (SSD and HDD) across the nodes assigned to it, forming the raw capacity from which containers draw; most clusters use a single storage pool.
- A storage container is a logical construct carved from a storage pool where you set replication factor, compression, deduplication, erasure coding, and capacity policies; a vDisk is a logical object inside a container representing a VM's virtual disk.
- On AHV, a newly created container is automatically presented to every host as an available NFS-backed datastore; on ESXi the same container is mounted on the hosts as an NFS datastore.
- Create a container via Storage dashboard > Table view > + Storage Container, specifying name, storage pool, and optional Advanced Settings.
- Advertised Capacity sets the maximum logical size a container can grow to and is enforced as a hard limit - once reached, writes fail with an out-of-space error rather than silently consuming the pool.
- Reserved Capacity pre-allocates physical space from the pool exclusively for that container, so the reserved amount (e.g. 1 TB or 2 TB) is subtracted from the pool's available space and cannot be claimed by other containers.
- Compression is set under Advanced Settings; a compression delay of 0 means inline compression as data is written, while a non-zero delay defers it to a post-process pass after data ages.
- Capacity (post-process) deduplication removes duplicate copies on the persistent extent store; cache deduplication dedupes data in the unified cache (RAM/SSD read cache) to improve cache efficiency.
- Deduplication is most effective with many full-clone VMs from the same golden image (VDI); thin-provisioned redirect-on-write clones already share blocks with their source, so the data actually written can be far smaller than the logical (provisioned) size (e.g. ~80 GB written against a much larger logical size).
- Compression, deduplication, and advertised capacity can be modified later by editing the container in Prism, while changes to the replication factor are constrained by the cluster's redundancy factor and fault-tolerance prerequisites.
- Flash-pinned (storage tier pinning) policy on a container keeps that container's data in the SSD tier for predictable low-latency performance.
- A container cannot be deleted while it still has vDisks or VM files residing on it, and it cannot be renamed once it is mounted as a datastore (the rename is blocked to keep the datastore name consistent).
- Setting RF3 on a container requires the cluster to first meet the minimum node and fault-tolerance (Redundancy Factor) requirements; then RF3 can be applied per container.
- Free space on a container reflects Advertised Capacity (e.g. a 5 TB container shows 5 TB) minus what is written, and reserved capacity guarantees a minimum while advertised capacity caps the maximum.
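The difference between Advertised Capacity (a cap) and Reserved Capacity (a guarantee) can be modeled in a few lines. This is a simplified sketch under stated assumptions: Container, OutOfSpace, and pool_space_for_others are hypothetical names, sizes are logical TiB, and replication and data-reduction effects are ignored.

```python
from dataclasses import dataclass
from typing import Iterable


class OutOfSpace(Exception):
    """Raised when a write would exceed the advertised capacity."""


@dataclass
class Container:
    name: str
    advertised_tib: float      # hard maximum logical size
    reserved_tib: float = 0.0  # space guaranteed to this container
    written_tib: float = 0.0   # data written so far

    def free_tib(self) -> float:
        """Free space as reported: advertised capacity minus written data."""
        return self.advertised_tib - self.written_tib

    def write(self, size_tib: float) -> None:
        """Accept a write only if it stays within the advertised cap."""
        if self.written_tib + size_tib > self.advertised_tib:
            raise OutOfSpace(f"{self.name}: advertised capacity reached")
        self.written_tib += size_tib


def pool_space_for_others(pool_tib: float,
                          containers: Iterable[Container]) -> float:
    """Pool space left once every reservation is subtracted."""
    return pool_tib - sum(c.reserved_tib for c in containers)
```

A 5 TiB container reports 5 TiB free when empty and refuses a write that would push it past 5 TiB, while a 1 TiB reservation removes 1 TiB from what the rest of the pool can claim.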
Domain 5: Data Protection and Disaster Recovery
- An async Protection Domain (PD) with a schedule that has no remote site selected produces local-only snapshots retained on the local cluster with no replication.
- A valid async PD requires at least one entity (VM or volume group) added to it plus a schedule defining snapshot frequency and local retention.
- A consistency group bundles one or more entities so they are all captured at a single point in time; a PD can hold multiple consistency groups, but the point-in-time guarantee applies within each consistency group.
- Add a VM's volume group to the same consistency group as the VM so application data on the volume group is captured consistently with the VM; split large sets across multiple consistency groups so each quiesces a smaller, application-aligned set.
- Application-consistent (VSS-based) snapshots require Nutanix Guest Tools (NGT) installed, enabled, and communicating on the VM; without NGT only crash-consistent snapshots are possible.
- For applications without native VSS support, supply custom pre-freeze and post-thaw scripts so the application is quiesced inside the guest before the snapshot.
- Snapshots use redirect-on-write: new writes go to new extents while the snapshot continues to reference the original unchanged data.
- Retention is enforced as a rolling count - a schedule keeping N snapshots expires the oldest each time a new one is taken (e.g. six most recent 4-hour snapshots cover roughly the last 24 hours; a 24-snapshot policy keeps the latest 24).
- Use multiple schedules with different frequencies and retentions to implement tiered grandfather-father-son retention covering both short-term and long-term recovery points.
- Restoring a PD snapshot restores all member VMs to the same point in time; restore a VM as a new clone with a different name to avoid overwriting the existing VM.
- Replication to another site requires a remote site that points to the destination cluster; the schedule's remote retention value (e.g. 10) controls how many snapshots are kept at the remote.
- After a replication outage, replication resumes by sending the most recent local snapshot and transferring only the changed data since the last successful replication.
- Add a second remote site to the same schedule to fan out a snapshot to two targets; a vStore name mapping on the remote site maps source and destination containers.
- Control replication impact with a bandwidth throttle policy that has a time-based schedule, and protect data in transit with replication encryption configured on the remote-site connection.
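Retention as a rolling count, described in the bullets above, behaves like a fixed-size queue: each new snapshot pushes out the oldest one. This is a minimal sketch with a hypothetical run_schedule helper; it models only snapshot timestamps, not the snapshots themselves.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import List


def run_schedule(start: datetime, interval_hours: int,
                 retain: int, snapshots_taken: int) -> List[datetime]:
    """Return the timestamps still retained after the schedule has run.

    A deque with maxlen=retain drops the oldest entry automatically
    whenever a new snapshot is appended beyond the retention count.
    """
    kept = deque(maxlen=retain)
    for i in range(snapshots_taken):
        kept.append(start + timedelta(hours=interval_hours * i))
    return list(kept)


# The guide's example: a 4-hour schedule that keeps the six newest snapshots.
retained = run_schedule(datetime(2024, 1, 1), interval_hours=4,
                        retain=6, snapshots_taken=10)
span = retained[-1] - retained[0]
```

Here six snapshots remain and the oldest is 20 hours older than the newest, so the recovery points reach back roughly a day by the time the next snapshot is due.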
Domain 6: Security
- Authentication to Prism requires both a configured directory (e.g. type Active Directory, with directory URL/LDAP(S), domain, and a binding service account) and a role mapping that associates AD users or groups with a Prism role - the directory alone grants no permissions.
- The directory service account lets the cluster query the directory and resolve users and groups during authentication.
- Built-in roles (Viewer, Operator, etc.) have fixed permission sets that cannot be edited; the built-in Viewer role is read-only across all entities and is the least-privilege choice for full visibility with no mutating ability.
- Custom roles let administrators assemble specific entity permissions (e.g. view/create/update/delete on VM and Image) while withholding others such as networks and storage.
- When a user is granted multiple roles, the effective permissions are the combined (union) permissions of those roles (e.g. Viewer plus Operator).
- Prism Central supports two-factor authentication (2FA), combining standard credentials with a time-based one-time passcode, and SAML 2.0 federation by adding an external IDP (Okta, ADFS, Entra ID) under authentication settings.
- Authorization policies using categories provide scoped access so users only manage the entities matching assigned categories; RADIUS auth requires the server address, port, and a shared secret.
- Data-at-Rest Encryption is configured on the Data-at-Rest Encryption page under the cluster Settings (gear) menu and comes in two forms: SED-based (encryption done in the drive hardware controller) and software-based (encryption done in the CVM data path).
- Software-based (cluster) encryption uses AOS's encryption capability and does not require self-encrypting drives; the choice between SED and software encryption is about where the crypto operations run.
- Key management uses either the native key manager (built into the cluster, distributes key fragments across CVMs, requires a minimum cluster size for key redundancy, no external server) or an external KMIP-compliant KMS.
- With an external KMS, you generate a CSR for the CVMs, have it signed by the KMS or a CA, upload the signed certificates, and the cluster retrieves authentication keys from the KMS to unlock the drives at boot; configure multiple redundant KMIP servers replicating the same keys for availability.
- Rekey generates new key encryption keys that re-wrap the existing data encryption keys without rewriting stored data - a fast operation that does not re-encrypt all data.
- If the key encryption keys become permanently unavailable (e.g. all KMS access and copies lost), the encrypted data becomes permanently unrecoverable.
- Client authentication can require a trusted CA certificate so only clients presenting a valid certificate can connect to Prism.
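The rule that multiple roles combine as a union can be illustrated with sets. This is a minimal sketch: the permission strings below are hypothetical placeholders for illustration and are not the actual permission names used by Prism.

```python
from typing import FrozenSet, Iterable

# Hypothetical permission names, for illustration only.
ROLE_PERMISSIONS = {
    "Viewer": frozenset({"vm:view", "image:view", "network:view"}),
    "Operator": frozenset({"vm:view", "vm:power_on", "vm:power_off"}),
}


def effective_permissions(roles: Iterable[str]) -> FrozenSet[str]:
    """Union of the permissions of every role granted to the user."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS[role]
    return frozenset(granted)


combined = effective_permissions(["Viewer", "Operator"])
```

A user holding both roles keeps everything Viewer allows and gains the Operator actions; a second role never removes a permission granted by the first.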
Domain 7: Monitoring, Health, and Alerts
- The Home (main) dashboard is the default Prism Element landing page and aggregates widgets such as Hypervisor Summary, Storage Summary, VM Summary, Hardware, Data Resiliency Status, cluster-wide Controller IOPS/latency/throughput, and Critical Alerts and Events.
- The Data Resiliency Status widget reports whether the cluster can still tolerate its configured component failures while keeping data redundant; yellow means resiliency is degraded but not yet critical.
- The Analysis page is where you build and retain custom performance charts; add a metric chart, choose the metric, entity, and time range, and the chart persists across sessions.
- An entity chart plots a chosen metric for one or more named entities (e.g. select the Host entity type, Hypervisor CPU Usage metric, and add all three hosts to compare them on one graph).
- In Prism Central, use Manage Dashboards > New Dashboard to create a custom dashboard, arrange widgets from the gallery, and optionally set it as the default view.
- A Top List (top-N) widget ranks entities by a chosen metric in descending order to surface the heaviest resource consumers; you configure it by selecting the entity type and the metric/aggregation.
- The Controller VM (CVM) services I/O for the cluster's distributed storage; per-VM detail views show performance metrics, configuration details, alerts, and the host on which the VM runs.
- The Capacity Runway (capacity forecast) view projects how long current resources will last and when you will run out of CPU, memory, or storage based on trends.
- Reports can be scheduled to run on a recurring basis and emailed automatically to recipients, and can be generated in PDF and CSV output formats.
- Email alerting requires an SMTP server configured under cluster/Prism settings; use the Test option on the SMTP Server configuration page to verify delivery with a test email.
- Events are records of cluster activity or state changes, while alerts are conditions raised by an alert policy that may need attention; acknowledging an alert signals ownership but does not resolve it - it remains until the condition clears.
- To stop a recurring or noisy alert, disable or modify the corresponding alert policy in the Alert Policies configuration rather than just dismissing instances.
- Save a filtered entity list as a reusable focused list so you can reopen it without reapplying the filters each time.
- The hypervisor type and version (e.g. AHV with its version number) is reported in the host/hypervisor summary, useful for confirming compatibility before upgrades.
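The Top List widget described above is, in effect, a descending sort truncated to N entries. This is a minimal sketch of that ranking logic; top_n and the sample VM figures are made up for illustration and do not come from any real cluster or Prism API.

```python
from typing import Dict, List


def top_n(entities: List[Dict], metric: str, n: int = 3) -> List[Dict]:
    """Rank entities by a metric in descending order and keep the first n."""
    return sorted(entities, key=lambda e: e[metric], reverse=True)[:n]


# Illustrative sample data only.
vms = [
    {"name": "vm-a", "iops": 120},
    {"name": "vm-b", "iops": 950},
    {"name": "vm-c", "iops": 430},
    {"name": "vm-d", "iops": 75},
]

heaviest = [vm["name"] for vm in top_n(vms, "iops", n=2)]
```

With this sample data the two heaviest consumers are vm-b and vm-c, which is the kind of answer the widget surfaces once you pick the entity type and metric.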
Domain 8: Lifecycle Management and Upgrades
- LCM Inventory scans every node and CVM, queries installed versions of AOS, AHV, BIOS, BMC, disk/HBA/NIC firmware, and software entities (NCC, Foundation, Calm), and compares them against bundles from the configured source; run inventory regularly to refresh the catalog.
- LCM performs firmware and AOS updates in a rolling, one-node-at-a-time fashion to preserve quorum and redundancy; running VMs are live-migrated off the AHV host and the CVM is gracefully shut down before that node is taken down.
- LCM pre-checks run automatically after entities are selected and before any node is modified; they verify that data resiliency is OK, that there is sufficient free space, that no other upgrade is in progress, that the LCM source is reachable, that CVM memory is adequate, and that versions are compatible, and they can abort an unsafe update. You can also run pre-checks independently first.
- Dark-site (offline) LCM: download the LCM dark-site framework bundle and the relevant update/release bundles on an internet-connected machine, extract them onto a local web server (HTTP/HTTPS), then point LCM at that URL in LCM Settings (select the local web server option and enter the URL).
- For dark-site issues, verify each CVM can reach the local web server URL and that the bundle is correctly extracted in the served directory, then run a new LCM inventory so the framework and catalog refresh from the dark-site server.
- LCM updates are staged on the node before being applied; if an update is interrupted, LCM auto-resumes the staged operation from where it left off once the node and its CVM recover.
- If LCM reports a node cannot be taken offline, data resiliency is degraded and removing the node could risk data availability - resolve resiliency first.
- Order matters: upgrade AOS first, then AHV, and Prism Central must be at a version equal to or newer than the AOS clusters it manages.
- LCM can run at the Prism Central scope to manage updates across registered clusters centrally, in addition to per-cluster (Prism Element) operation.
- A firmware update may be blocked when a dependency or compatibility requirement (such as a minimum AOS version) is not yet satisfied - upgrade the prerequisite first.
- Nutanix Cluster Check (NCC) is the health-check suite; to update only NCC, run inventory then select only the NCC entity on the Updates page and proceed.
- The one-click rolling AOS upgrade in Prism performs pre-upgrade checks to confirm the cluster is healthy and meets prerequisites so the rolling upgrade completes without disruption.
- During any rolling upgrade, VMs on the host being updated are live-migrated to other hosts so workloads stay available.
- Removing a node (or its CVM) is refused when the cluster is not in a fully resilient/fault-tolerant state, because losing a CVM could risk data availability.
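The rolling, one-node-at-a-time behavior with a resiliency gate can be sketched as a simple loop. This is a conceptual model only, not how LCM is implemented: rolling_update, ResiliencyError, and the is_resilient and update_node callbacks are hypothetical stand-ins for the cluster's own checks and update steps.

```python
from typing import Callable, Iterable, List


class ResiliencyError(RuntimeError):
    """Raised when a node cannot safely be taken offline."""


def rolling_update(nodes: Iterable[str],
                   is_resilient: Callable[[], bool],
                   update_node: Callable[[str], None]) -> List[str]:
    """Update nodes one at a time, checking resiliency before each one.

    Stops as soon as the cluster is not fully resilient, so at most one
    node is ever offline and no node is touched while redundancy is reduced.
    """
    done: List[str] = []
    for node in nodes:
        if not is_resilient():
            raise ResiliencyError(
                f"cluster not resilient; refusing to take {node} offline")
        update_node(node)  # stands in for migrate VMs, stop CVM, update, rejoin
        done.append(node)
    return done
```

Called with a check that always passes, every node is updated in order; if the check fails before the second node, the loop stops with only the first node updated, mirroring the "resolve resiliency first" guidance above.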
NCP-MCI exam tips
- Memorize the resiliency math: RF2 = 2 copies, tolerates 1 failure; RF3 = 3 copies, tolerates 2 failures and needs a minimum of 5 nodes. Replication factor is set per storage container, and block/rack awareness needs enough balanced blocks/racks.
- Know the bond modes cold: Active-Backup (default, switch-independent, one active link), balance-slb (switch-independent, per-VM by source MAC), and balance-tcp (requires upstream LACP port-channel, true per-flow, enable LACP fallback). Many networking questions hinge on which mode needs switch configuration.
- Distinguish Advertised Capacity (hard maximum logical size) from Reserved Capacity (guaranteed minimum physical space subtracted from the pool), and inline (delay 0) vs post-process (delay > 0) compression - container-setting questions are common.
- For data protection, remember the prerequisites: NGT is required for application-consistent (VSS) snapshots, an async PD needs at least one entity plus a schedule, retention is a rolling count, and replication needs a remote site pointing at the destination cluster.
- Watch for 'powered off' and ordering requirements: Secure Boot/UEFI changes and firmware changes require the VM powered off; upgrade AOS before AHV; Prism Central must be equal to or newer than the AOS clusters it manages; and LCM does everything rolling, one node at a time.
Study guide FAQ
What are the exam logistics for the NCP-MCI?
The exam runs 120 minutes with a passing score of 700 (on a scaled basis). It is delivered by Nutanix and covers eight domains spanning cluster deployment, AHV VM management, networking, storage, data protection/DR, security, monitoring, and lifecycle management. Expect scenario-based questions that test Prism workflows and AOS resiliency behavior, not just definitions.
How much hands-on Nutanix experience do I need?
The exam targets administrators with practical experience managing AHV-based clusters. You should be comfortable navigating both Prism Element and Prism Central, creating and protecting VMs, configuring storage containers and virtual switches, running LCM upgrades, and reading the Data Resiliency Status. Time in a real cluster or the Nutanix Test Drive/lab environment is the best preparation.
What is the difference between Prism Element and Prism Central, and why does it matter on the exam?
Prism Element (PE) manages a single cluster, while Prism Central (PC) manages one or more registered clusters and adds aggregated capacity/runway reporting, category-based management, custom dashboards, reports, and centralized LCM. Each cluster registers with exactly one PC, and PC must be at a version equal to or newer than the clusters it manages. Many questions ask which tool or scope performs a given task.
Which topics carry the most weight and trip people up?
Networking (bond modes and when LACP/port-channels are required), storage container settings (advertised vs reserved capacity, inline vs post-process compression, dedup use cases), and data protection (consistency groups, NGT for application-consistent snapshots, retention counts, and replication recovery) are the highest-value and most error-prone areas. Reinforce these with the structured concepts above and lots of scenario practice questions.