DP-100: Azure Data Scientist Associate Practice Exam
Validates ability to design and run ML experiments and deploy models using Azure Machine Learning.
Practice 542 exam-style DP-100 questions with full answer explanations, then take timed mock exams that score like the real thing.
What the DP-100 exam covers
- Design and Prepare a Machine Learning Solution: 148 questions
- Explore Data and Train Models: 135 questions
- Prepare a Model for Deployment: 127 questions
- Deploy and Retrain a Model: 132 questions
Free DP-100 sample questions
A sample of 10 questions with answers and explanations. Sign up free to practice all 542.
-
Which Azure service is the primary platform for end-to-end machine learning (training, deployment, MLOps)?
- A. Azure Machine Learning (Correct)
- B. Azure Blob Storage
- C. Azure Key Vault
- D. Azure DNS
✓ Correct answer: A
Azure Machine Learning is the comprehensive, first-party platform from Microsoft designed to support the entire end-to-end machine learning lifecycle. It provides integrated tools for data preparation, model training (with managed compute clusters), experiment tracking, model registration and versioning, and production deployment through endpoints and MLOps capabilities. The workspace serves as the central hub that connects all these components together, enabling teams to collaborate effectively on ML projects at scale.
Why the other options are wrong
- B. Azure Blob Storage is incorrect because while it stores data, it is a generic storage service without ML-specific training, model management, or deployment capabilities.
- C. Azure Key Vault is incorrect because it is designed for secrets and credential management rather than end-to-end machine learning platform functionality.
- D. Azure DNS is incorrect because it is a domain name service for name resolution and has no involvement in machine learning operations, training, or model deployment.
-
Which deployment is best for real-time, low-latency scoring?
- A. A DNS record
- B. A storage lifecycle policy
- C. A managed online endpoint (Correct)
- D. A batch endpoint
✓ Correct answer: C
Managed online endpoints in Azure ML provide REST APIs for synchronous, low-latency predictions on individual records. They automatically handle infrastructure (VMs, load balancing, auto-scaling), authentication (keys and tokens), and logging. Latency is minimized because the model is loaded in memory and predictions execute immediately without queueing or batch processing. Auto-scaling adjusts compute capacity based on traffic, making endpoints responsive to demand spikes. Managed endpoints are ideal for interactive applications requiring immediate predictions (recommendation engines, real-time fraud detection).
Why the other options are wrong
- A. A DNS record is incorrect because it maps domain names to addresses and has nothing to do with model deployment.
- B. A storage lifecycle policy is incorrect because it manages data retention and tiering, not real-time scoring.
- D. A batch endpoint is incorrect because it runs asynchronous jobs over large volumes of data, and the resulting queuing delay is incompatible with real-time requirements.
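As a rough illustration of the synchronous request pattern described above, the sketch below builds (but does not send) an HTTP scoring request using only Python's standard library. The endpoint URL, the key placeholder, and the `{"data": [...]}` payload shape are assumptions made for illustration; the real scoring URI and input schema come from your own endpoint and model.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str, record: dict) -> urllib.request.Request:
    """Build a POST request that asks an online endpoint to score one record."""
    # Hypothetical payload shape; the actual schema depends on the deployed model.
    body = json.dumps({"data": [record]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Online endpoints accept a key or token as a bearer credential.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values for illustration only; nothing is sent over the network here.
request = build_scoring_request(
    "https://example-endpoint.example-region.inference.ml.azure.com/score",
    "<your-endpoint-key>",
    {"amount": 120.5, "country": "NL"},
)
```

Passing `request` to `urllib.request.urlopen` would send it and block until the prediction comes back, which is the synchronous behavior that distinguishes online endpoints from batch endpoints.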
-
Which storage tier choice optimizes cost for large historical training datasets accessed only occasionally?
- A. Always replicating data to every region
- B. Hot tier for all data regardless of access frequency
- C. Cool or Cold access tier for infrequently accessed blobs (Correct)
- D. Premium block blob for archival data
✓ Correct answer: C
Azure Blob Storage access tiers trade storage cost against access cost. The Hot tier has the highest per-GB storage price and the lowest access charges, so it suits frequently read data. The Cool tier lowers the storage price for data that is read infrequently and kept for at least 30 days, and the Cold tier lowers it further for rarely read data kept for at least 90 days. Both are online tiers, so blobs stay immediately readable, but access and transaction charges are higher and early-deletion fees apply. Storing historical training data in the Cool or Cold tier therefore cuts storage expenses substantially compared with Hot while keeping the data available for occasional retraining.
Why the other options are wrong
- A is incorrect because replicating data to every region multiplies storage capacity and bandwidth costs; replication is appropriate for disaster recovery requirements, not cost optimization of historical archives.
- B is incorrect because Hot tier is designed for frequently accessed data and carries the highest storage cost per GB; applying Hot tier to all data regardless of access patterns wastes money on premium pricing for infrequently needed historical datasets.
- D is incorrect because premium block blob is a separate storage account type optimized for low-latency, high-transaction workloads; its higher per-GB storage price is inappropriate for archival data accessed infrequently.
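The tier decision above can be sketched as a small helper that maps how recently a dataset was read to a suggested tier. The function name and the 30- and 90-day thresholds are illustrative assumptions, chosen to line up with the minimum retention periods of the Cool and Cold tiers; a real policy would also weigh access charges and dataset size.

```python
def suggest_access_tier(days_since_last_access: int) -> str:
    """Suggest a blob access tier from the number of days since the data was last read.

    The thresholds are illustrative assumptions, not an official rule.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access < 30:
        return "Hot"   # read recently: keep access charges low
    if days_since_last_access < 90:
        return "Cool"  # read occasionally: cheaper storage, higher access cost
    return "Cold"      # read rarely: cheapest online storage


# A training archive last read four months ago is a Cold-tier candidate.
tier = suggest_access_tier(120)
```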
-
Your team must isolate the Azure ML workspace so that no inbound or outbound traffic traverses the public internet. Which design best meets this requirement?
- A. Disabling local authentication on the storage account only
- B. A managed virtual network (managed VNet) for the workspace with private endpoints to dependent resources (Correct)
- C. A public workspace protected only by an IP allow-list firewall rule
- D. Assigning a public IP to the compute cluster and using NSG rules
✓ Correct answer: B
A managed VNet isolates workspace traffic from the public internet by creating a private network perimeter and routing communication through private endpoints. This design prevents inbound or outbound internet traversal because private endpoints establish direct, non-internet-routable connections to dependent Azure services (storage, key vault, container registry, etc.) via Microsoft's private backbone network.
Why the other options are wrong
- A is incorrect because disabling local authentication on only the storage account does not isolate network traffic; it restricts authentication methods but still allows traffic across public internet paths.
- C is incorrect because an IP allow-list firewall on a public workspace still leaves the workspace reachable over internet routing; IP filtering is an access control measure, not network isolation.
- D is incorrect because assigning a public IP to the compute cluster exposes it directly to the internet, which is the opposite of isolation; NSG rules filter traffic but do not remove the public path.
-
You deploy an MLflow-format model to a managed online endpoint. Which statement about the scoring script and environment is correct?
- A. No custom scoring script or environment is required because Azure ML auto-generates them from the MLflow model (Correct)
- B. You must always provide a custom scoring script even for MLflow models
- C. MLflow models can only be deployed to batch endpoints
- D. You must manually pin the environment because MLflow models do not record dependencies
✓ Correct answer: A
MLflow-format models carry their flavor and dependency information, so Azure ML supports no-code deployment: it auto-generates the scoring script and environment from the MLflow model when deploying to a managed online (or batch) endpoint. You can still override them, but it is not required. This is a key advantage of packaging models in MLflow format.
Why the other options are wrong
- B. You do not always need a custom scoring script for MLflow models; no-code deployment generates it for you.
- C. MLflow models can be deployed to both online and batch endpoints, not only batch.
- D. MLflow models do record their dependencies (in the conda/requirements files of the model), so you do not have to pin the environment manually.
-
What should you validate from logs immediately after a new deployment receives a small canary percentage of traffic?
- A. Error rate and latency of the new deployment versus the existing one (Correct)
- B. The DNS TTL of the workspace
- C. The number of Key Vault secrets
- D. The storage account's redundancy SKU
✓ Correct answer: A
Right after a new deployment receives a small canary share of traffic, you validate it by comparing its error rate and latency against the existing deployment over the same window. Worse error rates or latency are the signals to halt the rollout and revert. These operational metrics are what you check first from the logs.
Why the other options are wrong
- B. The DNS TTL of the workspace tells you nothing about how the new deployment is performing.
- C. The number of Key Vault secrets is unrelated to deployment health.
- D. The storage account's redundancy SKU is a durability setting, not a canary performance indicator.
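The comparison described above can be sketched as a simple health check over statistics gathered from the logs of both deployments for the same time window. The class, the function, and the tolerance values are hypothetical and would be tuned to your own service-level objectives.

```python
from dataclasses import dataclass


@dataclass
class DeploymentStats:
    """Counts and latency taken from logs over one observation window."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_is_healthy(
    baseline: DeploymentStats,
    canary: DeploymentStats,
    max_error_rate_increase: float = 0.01,  # assumed tolerance: +1 percentage point
    max_latency_ratio: float = 1.2,         # assumed tolerance: 20% slower at p95
) -> bool:
    """Return True when the canary is no worse than the baseline within tolerance."""
    error_ok = canary.error_rate <= baseline.error_rate + max_error_rate_increase
    latency_ok = canary.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    return error_ok and latency_ok


# Made-up numbers: the existing deployment takes 90% of traffic, the canary 10%.
existing = DeploymentStats(requests=9000, errors=45, p95_latency_ms=180.0)
good_canary = DeploymentStats(requests=1000, errors=6, p95_latency_ms=190.0)
bad_canary = DeploymentStats(requests=1000, errors=30, p95_latency_ms=200.0)

proceed = canary_is_healthy(existing, good_canary)   # within both tolerances
rollback = not canary_is_healthy(existing, bad_canary)  # error rate too high
```

A failed check is the signal to shift traffic back to the existing deployment before widening the rollout.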
-
After deploying a new model version, prediction accuracy in production drops sharply although offline test metrics were good. Monitoring shows the live input feature distributions differ from training data. What is the most likely cause?
- A. The compute cluster scaled to zero
- B. Data drift between training data and current production data (Correct)
- C. The endpoint is using HTTP instead of HTTPS
- D. The model registry quota is exceeded
✓ Correct answer: B
Strong offline metrics but a sharp live accuracy drop, combined with monitoring showing input feature distributions diverging from the training data, is the textbook signature of data drift. The production data the model now sees no longer resembles what it was trained on, so its learned relationships no longer hold. The transport protocol, registry quota, and idle compute scaling would not change prediction quality in this way. Drift is the most likely cause and points to retraining on current data.
Why the other options are wrong
- A. A compute cluster scaling to zero affects training-job availability, not the accuracy of a deployed model serving requests.
- C. Using HTTP instead of HTTPS is a transport-security concern that does not alter prediction correctness.
- D. An exceeded model registry quota would block registering new versions, not degrade a model already serving predictions.
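One common way to quantify the kind of distribution shift described above is the population stability index (PSI), sketched here for a single numeric feature. This is a generic statistic, not necessarily what any particular monitoring service computes; the function name and bin count are assumptions, and the 0.25 alert level is a widely used rule of thumb rather than a fixed standard.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample and a production sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: production values are shifted upward relative to training.
training = [i / 100 for i in range(100)]
production = [x + 0.5 for x in training]

psi_same = population_stability_index(training, training)
psi_shifted = population_stability_index(training, production)
drift_detected = psi_shifted > 0.25  # rule-of-thumb threshold
```

A PSI near zero means the two samples fall into the bins in nearly the same proportions; larger values mean the production sample has moved away from the training distribution.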
-
Which practice prevents accidental leakage of sensitive feature values into experiment tracking?
- A. Log only aggregate metrics/params and avoid logging raw PII as artifacts (Correct)
- B. Log every raw input row to the run for completeness
- C. Disable encryption on the artifact store
- D. Store PII in the run name
✓ Correct answer: A
To prevent sensitive feature values from leaking into experiment tracking, log only aggregate metrics and parameters and refrain from logging raw rows or PII as artifacts. This keeps personal data out of the run history while preserving useful experiment information. Minimizing what you log is the correct privacy practice.
Why the other options are wrong
- B. Logging every raw input row to the run captures PII in the tracking store, which is the leakage you must avoid.
- C. Disabling encryption on the artifact store reduces protection rather than preventing PII from being logged.
- D. Storing PII in the run name embeds sensitive data in metadata visible throughout the workspace.
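A minimal sketch of this practice is to reduce the raw rows to aggregates before anything is handed to the tracking client. The column names, the `SENSITIVE_FIELDS` set, and the helper function are hypothetical; the resulting dictionary is the kind of value you could pass to a metrics-logging call such as `mlflow.log_metrics`.

```python
from statistics import mean

# Hypothetical columns that must never reach the run history.
SENSITIVE_FIELDS = {"name", "email"}


def summarize_for_tracking(rows, numeric_field):
    """Reduce raw rows to aggregate metrics that are safe to log."""
    if numeric_field in SENSITIVE_FIELDS:
        raise ValueError(f"refusing to summarize sensitive field: {numeric_field}")
    values = [row[numeric_field] for row in rows]
    # Only aggregates leave this function; raw rows are never returned.
    return {
        "row_count": len(values),
        f"{numeric_field}_mean": mean(values),
    }


# Made-up rows containing PII alongside a numeric feature.
rows = [
    {"name": "Ana", "email": "ana@example.com", "age": 30},
    {"name": "Ben", "email": "ben@example.com", "age": 40},
    {"name": "Cy", "email": "cy@example.com", "age": 50},
]

metrics = summarize_for_tracking(rows, "age")
```

Aggregates over very small groups can still reveal individual values, so the sketch is a starting point rather than a complete privacy control.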
-
An administrator at Tailwind Traders is planning to use datastores and datasets. Which two of the following are requirements or features of this solution? (Choose two.)
- A. MLflow tracking (Correct)
- B. Azure Machine Learning designer
- C. Automated ML configuration (Correct)
- D. ML pipeline design
- E. Datasets and datastores
✓ Correct answer: A, C
Working with datastores and datasets is complemented by features that consume that data productively. MLflow tracking records the runs and metrics produced when jobs read registered data, and automated ML configuration drives AutoML jobs that take those datasets as input to search for models. Both are concrete features that operate on the registered data. The designer and reworded options are not the requested pairing.
Why the other options are wrong
- B. Azure Machine Learning designer is a visual pipeline tool and is not the tracking-or-AutoML feature paired here.
- D. ML pipeline design organizes workflow steps generally and is not the feature paired with data assets in this item.
- E. Datasets and datastores is a reworded restatement of the topic, not a distinct paired feature.
-
A consultant is reviewing the pruning configuration at Fabrikam Inc. Which two actions should be performed to optimize the implementation? (Choose two.)
- A. A/B testing models (Correct)
- B. Disable pruning monitoring
- C. Scoring script
- D. Model validation
- E. ONNX format (Correct)
✓ Correct answer: A, E
Optimizing a model-pruning configuration is supported by validating the slimmer model and packaging it efficiently. A/B testing models compares the pruned model against the original on live traffic to confirm pruning did not hurt quality, and exporting to ONNX format gives a portable, optimized representation that pairs well with a leaner model for fast inference. Both are concrete optimization actions. The disable, scoring-script, and validation options are not the requested pairing.
Why the other options are wrong
- B. Disabling pruning monitoring removes oversight rather than optimizing the implementation.
- C. A scoring script defines inference logic and is not the A/B-and-ONNX pairing specified here.
- D. Model validation checks correctness generally but is not the live-comparison-and-ONNX pairing requested in this item.
DP-100 practice exam FAQ
How many questions are in the DP-100 practice exam on CertGrid?
CertGrid has 542 practice questions for DP-100: Azure Data Scientist Associate, covering 4 exam domains. The real DP-100 exam has about 50 questions.
What is the passing score for DP-100?
The DP-100 exam passing score is 700, and you have about 100 minutes to complete it. CertGrid scores your practice attempts the same way so you know when you are ready.
Are these official DP-100 exam questions?
No. CertGrid is an independent practice platform. Questions are written to mirror the style and concepts of DP-100: Azure Data Scientist Associate, with full explanations, but they are not official or copied vendor exam items. They are original practice questions designed to help you genuinely learn the material.
Can I practice DP-100 for free?
Yes. You can start practicing DP-100: Azure Data Scientist Associate for free with daily practice and sample questions. Paid plans unlock full timed exams, complete explanations, and domain analytics.