AI-900: Azure AI Fundamentals Study Guide
The AI-900: Azure AI Fundamentals exam validates foundational knowledge of machine learning and AI concepts and the Azure services that implement them. It is aimed at technical and non-technical people beginning with AI on Azure, with no data-science or coding prerequisites. The exam is roughly 45 minutes, scored on a 1000-point scale with 700 to pass, and covers AI workloads and responsible AI, ML fundamentals, computer vision, NLP, and generative AI.
Domain 1: Describe AI Workloads and Considerations
- AI is software that mimics human cognitive capabilities such as visual perception, speech recognition, language understanding, and decision-making by learning patterns from data.
- Microsoft defines six responsible AI principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability.
- Fairness aims to treat all people equitably and avoid biased outcomes against groups defined by protected characteristics like gender, ethnicity, or age.
- Reliability and safety means AI systems should perform dependably, consistently, and safely even under unexpected conditions, minimizing harm.
- Privacy and security requires AI systems to protect data and comply with privacy laws, securing both training data and the model in use.
- Inclusiveness means designing AI to empower everyone, including people with disabilities, by supporting multiple input and output modalities and removing accessibility barriers.
- Transparency means making AI understandable: disclosing when AI is used, explaining how decisions are made (explainability/interpretability), and stating capabilities and limitations.
- Accountability means people remain responsible for AI systems and their impacts, with governance structures and meaningful human oversight, especially in high-stakes domains like healthcare and finance.
- Common AI workload types include machine learning (prediction/forecasting), computer vision, natural language processing, anomaly detection, knowledge mining, document intelligence, and generative AI.
- Computer vision workloads interpret images and video; examples include defect inspection on a manufacturing line and reading text from scanned images.
- Natural language processing workloads understand and generate text or speech; examples include analyzing customer sentiment and powering chatbots.
- Anomaly detection identifies unusual patterns that deviate from the norm, such as a spike in server CPU usage that may indicate a security breach, or an atypical card purchase that may indicate fraud.
- Knowledge mining extracts information and insights from large volumes of unstructured data, such as indexing and searching thousands of scanned legal documents or research papers.
- Conversational AI enables natural-language interactions through chatbots, virtual assistants, and voice interfaces, combining NLP, ML, and dialogue management to understand intent and respond.
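The anomaly detection idea from this domain can be illustrated with a small sketch. It flags readings that sit far from the mean using a simple z-score rule. The function name `find_anomalies`, the threshold of 2.0, and the CPU readings are all assumptions chosen for illustration; this is a basic statistical stand-in, not how any Azure service detects anomalies.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all readings identical, so nothing deviates
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical CPU utilization readings (percent), one per minute.
cpu_percent = [21, 23, 22, 24, 20, 22, 95, 23, 21, 22]
anomalies = find_anomalies(cpu_percent)
print(anomalies)  # [(6, 95)]
```

The single reading of 95 is the only value more than two standard deviations from the mean, so it is the one reported as a deviation from the norm.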
Domain 2: Describe Fundamental Principles of Machine Learning
- Supervised learning trains on labeled data where each example pairs input features with a known correct output, so the model learns the input-to-output relationship.
- The two main types of supervised learning are regression (predicts continuous numeric values like price or temperature) and classification (predicts discrete categories like spam or not-spam).
- Unsupervised learning uses unlabeled data; clustering is the key example, grouping similar data points by feature similarity without predefined categories.
- A feature is an individual measurable input variable or attribute used to make predictions (for example, square footage or number of bedrooms); the value being predicted is the label.
- The typical ML workflow is: collect and prepare data, train the model on the training set, evaluate it on validation/test data, then deploy and monitor.
- Datasets are split into training, validation, and test sets; the model is fit on the training set, the validation set is used to tune hyperparameters and compare candidate models, and the held-out test set estimates how well the final model generalizes to unseen data.
- Overfitting occurs when a model memorizes training-data noise and detail and fails to generalize; splitting data into train/test sets helps detect it.
- Common data-preparation steps include feature normalization (scaling numeric values to a common range) and handling missing values through imputation or removal.
- Classification metrics include accuracy, precision, recall, and F1 score, computed from the confusion matrix of true/false positives and negatives.
- In a confusion matrix, a false positive is predicting the positive class when the actual class is negative, and a false negative is predicting negative when the actual class is positive.
- Regression metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination (R-squared).
- Azure Machine Learning Automated ML (AutoML) automatically tries multiple algorithms and hyperparameter combinations to find the best-performing model for a dataset.
- Azure Machine Learning designer provides a visual drag-and-drop interface to build training and inference pipelines with little or no code.
- Deep learning uses multi-layer neural networks and underpins advanced workloads; it is a subset of machine learning that excels at images, language, and other complex unstructured data.
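The classification and regression metrics listed in this domain follow directly from their definitions, and a short sketch can make the arithmetic concrete. The function names and the sample counts and prices below are hypothetical, and the code uses plain Python rather than any Azure Machine Learning API.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of everything predicted positive, how much was truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R-squared for paired numeric values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r2 = 1 - ss_residual / ss_total if ss_total else 0.0
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical spam classifier results: 40 TP, 10 FP, 20 FN, 30 TN.
cls = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(cls)

# Hypothetical house-price predictions (in thousands).
reg = regression_metrics(actual=[200, 300, 400], predicted=[210, 290, 420])
print(reg)
```

In the classifier example, precision (0.8) is higher than recall (about 0.67) because the model produces fewer false positives than false negatives, which is the kind of trade-off the F1 score summarizes.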
Domain 3: Describe Features of Computer Vision Workloads
- Image classification assigns a single category label to an entire image based on its overall content.
- Object detection identifies multiple objects in an image and returns, for each, a class label, a confidence score, and bounding box coordinates indicating where it is located.
- Semantic segmentation classifies every pixel into a category, revealing the exact shape and boundaries of objects, providing finer detail than object detection bounding boxes.
- Optical character recognition (OCR) extracts printed and handwritten text from images and scanned documents and converts it into machine-readable text.
- Azure AI Vision (formerly Computer Vision) provides image analysis including captions, tags and categories, object detection, OCR, and adult/racy content detection.
- Azure AI Custom Vision lets users with minimal ML expertise train custom models for image classification or object detection using their own labeled images.
- Azure AI Face service provides face detection, face verification (1:1 comparison of two faces), and analysis of attributes such as head pose and presence of glasses, plus facial landmarks like eye positions.
- Azure AI Document Intelligence (formerly Form Recognizer) extracts text, key-value pairs, and tables from documents like invoices, receipts, and forms.
- Face detection locates faces and returns their location and landmarks, while face verification determines whether two face images belong to the same person.
- Image captioning generates a natural-language description summarizing the content of an image.
- Transfer learning uses a pre-trained model as a starting point and fine-tunes it with a smaller domain-specific dataset, reducing the data and time needed to train.
- Object detection performs both classification (what the object is) and localization (where it is), making it essential for autonomous vehicles, surveillance, and automated inspection.
- A bounding box is a rectangle, typically defined by coordinates such as x, y, width, and height (or top-left and bottom-right points), that marks the location of a detected object.
- Image analysis can generate tags and descriptions for images embedded in documents and supports content moderation by flagging adult or racy material.
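The object detection output described in this domain (a class label, a confidence score, and a bounding box) can be modeled with a small data structure. The sketch below shows the two bounding box representations mentioned above and a confidence filter. The class names, field names, and the 0.5 cutoff are hypothetical and do not reproduce the response schema of Azure AI Vision or Custom Vision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box stored as top-left corner plus width and height (pixels)."""
    x: int
    y: int
    width: int
    height: int

    def corners(self):
        """Return (left, top, right, bottom), the two-corner representation."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    @classmethod
    def from_corners(cls, left, top, right, bottom):
        """Build a box from top-left and bottom-right points."""
        return cls(left, top, right - left, bottom - top)


@dataclass(frozen=True)
class Detection:
    """One detected object: what it is, how sure the model is, and where it is."""
    label: str
    confidence: float
    box: BoundingBox


def keep_confident(detections, min_confidence=0.5):
    """Drop detections whose confidence is below the cutoff."""
    return [d for d in detections if d.confidence >= min_confidence]


# Hypothetical detections for a single image.
detections = [
    Detection("car", 0.92, BoundingBox(10, 20, 100, 50)),
    Detection("bicycle", 0.31, BoundingBox(150, 40, 60, 80)),
]
confident = keep_confident(detections)
print([d.label for d in confident])   # ['car']
print(confident[0].box.corners())     # (10, 20, 110, 70)
```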
Domain 4: Describe Features of NLP Workloads
- Azure AI Language is the unified service for text NLP, providing sentiment analysis, key phrase extraction, named entity recognition, language detection, and question answering.
- Sentiment analysis classifies text as positive, negative, or neutral (with an additional mixed label possible at the document level) and returns confidence scores at both the document and sentence level.
- Key phrase extraction automatically identifies the main topics and important concepts in a body of text without predefined lists or training.
- Named entity recognition (NER) identifies and categorizes entities in text such as people, places, organizations, and dates.
- Language detection identifies the language of input text, which is useful for routing content to the correct downstream processing.
- Custom question answering builds a knowledge base from FAQ pages, manuals, and documents and returns answers to user questions in conversational language.
- Conversational Language Understanding (CLU) builds custom models that identify user intents and extract entities from natural-language utterances to power chatbots and assistants.
- An utterance is a sample phrase or sentence a user might say or type to express an intent, used to train a CLU model; an intent is the goal behind it and entities are the relevant details.
- Azure AI Speech provides speech-to-text (transcription of spoken audio into text) and text-to-speech (synthesis of natural-sounding speech from text using neural voices).
- Speech-to-text supports real-time streaming and recorded audio across many languages and accents; text-to-speech enables verbal responses for assistants and accessibility.
- Azure AI Translator performs text translation across 100+ languages; it supports real-time scenarios such as translating a chat message, and document translation for longer content such as a product manual.
- Building a voice-enabled assistant typically combines speech-to-text to transcribe requests, CLU to interpret intent, and text-to-speech to deliver spoken responses.
- A complete chatbot solution commonly pairs Azure AI Language (question answering) with Azure AI Bot Service, which hosts the bot and connects it to channels such as web chat and Microsoft Teams.
- NLP workloads broadly cover entity and key phrase extraction, sentiment, language detection and translation, speech-to-text and text-to-speech, and intent recognition.
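The relationship between an utterance, its intent, and its entities can be shown with a small labeled example. The sketch below uses hypothetical class names, a hypothetical `BookFlight` intent, and made-up entity categories; it illustrates the concepts only and is not the project or response format used by Conversational Language Understanding.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySpan:
    """A labeled detail inside an utterance, located by character offset."""
    category: str
    offset: int
    length: int


@dataclass
class LabeledUtterance:
    """A sample phrase, the goal behind it, and the details it contains."""
    text: str
    intent: str
    entities: list = field(default_factory=list)

    def entity_values(self):
        """Map each entity category to the text it covers."""
        return {
            e.category: self.text[e.offset:e.offset + e.length]
            for e in self.entities
        }


def label_entity(text, category, value):
    """Build an EntitySpan by locating value inside text (first occurrence)."""
    offset = text.index(value)  # raises ValueError if value is absent
    return EntitySpan(category=category, offset=offset, length=len(value))


# Hypothetical training example for a travel assistant.
text = "Book a flight to Paris on Friday"
example = LabeledUtterance(
    text=text,
    intent="BookFlight",
    entities=[
        label_entity(text, "Destination", "Paris"),
        label_entity(text, "Date", "Friday"),
    ],
)
print(example.intent, example.entity_values())
```

Here the utterance is the full sentence, the intent is the goal of booking a flight, and the entities are the destination and date that a bot would need to act on that goal.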
Domain 5: Describe Features of Generative AI Workloads
- Generative AI creates new original content (text, code, and images) in response to prompts, rather than only classifying or predicting from existing data.
- Large language models (LLMs) are deep learning models built on the transformer architecture and trained on massive, diverse text datasets to understand and generate natural language.
- Azure OpenAI Service provides access to OpenAI models such as GPT-4, GPT-4o, and DALL-E with Azure's enterprise security, compliance, private networking, and identity management.
- DALL-E is the Azure OpenAI model purpose-built to generate images from natural-language text descriptions.
- Prompt engineering is the practice of designing and refining the input text given to a model to elicit accurate, relevant, and desired outputs.
- A system message sets the model's role, behavior, personality, and constraints before user interaction, ensuring consistent and appropriate responses.
- The temperature parameter controls randomness and creativity of output; lower temperature yields more focused, deterministic responses while higher temperature yields more varied output.
- Grounding connects a model's responses to specific, verifiable information sources to improve accuracy and traceability.
- Retrieval-Augmented Generation (RAG) retrieves relevant data from external sources (such as an organization's documents) and adds it to the prompt to ground responses in that data.
- Embeddings are numerical vector representations of text that capture semantic meaning and enable similarity search, which underpins retrieval in RAG.
- Generative models can hallucinate, producing plausible-sounding content that is fabricated or factually inaccurate; they can also produce harmful or biased content, so outputs should be grounded and reviewed.
- Azure AI Content Safety provides content filters that detect and block harmful content such as hate, violence, sexual, and self-harm categories in both prompts and responses.
- Microsoft states that customer data submitted to Azure OpenAI Service is not used to train or improve Microsoft or third-party models.
- Effective prompting techniques include specifying the desired output format and constraints, including few-shot examples, and using system messages to define the model's role and scope.
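The way embeddings support retrieval in RAG can be sketched with toy vectors. The example below ranks documents by cosine similarity to a query vector and places the best match into a prompt. The three-dimensional vectors, the document texts, and the function names are made up for illustration; real embeddings come from an embedding model and have far more dimensions, and no Azure OpenAI call is made here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, documents, top_k=1):
    """Return the top_k documents most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query_vector, d["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question, passages):
    """Add retrieved passages to the prompt so the answer is grounded in them."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


# Toy document store with hand-written (hypothetical) embedding vectors.
documents = [
    {"text": "Refunds are accepted within 30 days of purchase.",
     "vector": [0.9, 0.1, 0.0]},
    {"text": "Support is available on weekdays from 9 to 5.",
     "vector": [0.1, 0.9, 0.1]},
    {"text": "Shipping takes 3 to 5 business days.",
     "vector": [0.0, 0.2, 0.9]},
]

question = "What is the refund policy?"
query_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of the question

top = retrieve(query_vector, documents, top_k=1)
prompt = build_grounded_prompt(question, top)
print(prompt)
```

The refund passage is retrieved because its vector points in nearly the same direction as the query vector, and the assembled prompt is what would then be sent to the model.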
AI-900 exam tips
- Memorize Microsoft's six responsible AI principles and be able to match a scenario to the right one; fairness (bias), reliability and safety (consistent/safe), and transparency (explainability) are tested heavily.
- Map each capability to the correct Azure service: Azure AI Vision and Custom Vision for images, Azure AI Language for text, Azure AI Speech for audio, Azure AI Document Intelligence for forms, and Azure OpenAI for generative AI.
- Know the difference between regression (continuous numbers) and classification (categories), and between supervised (labeled) and unsupervised/clustering (unlabeled) learning.
- Distinguish the three vision tasks precisely: image classification labels the whole image, object detection adds bounding boxes for multiple objects, and semantic segmentation labels every pixel.
- Expect true/false and multiple-response (choose two or more) questions; read carefully, eliminate clearly wrong options, and do not skip questions since there is no penalty for guessing.
Study guide FAQ
Do I need coding or data-science experience to pass AI-900?
No. AI-900 is a fundamentals exam focused on concepts and on identifying which Azure AI service fits a scenario. There is no coding required, and the math behind algorithms is not tested in depth.
What score do I need to pass and how long is the exam?
You need 700 on a scale of 1 to 1000. The exam is about 45 minutes long with roughly 40 to 60 questions, including multiple-choice, multiple-response, and true/false formats.
How much of the exam is about generative AI?
Generative AI is a substantial and growing portion of the exam. Know LLMs and the transformer basis, Azure OpenAI Service and its models (GPT, DALL-E), prompt engineering and system messages, temperature, grounding, RAG, embeddings, and Azure AI Content Safety.
What is the difference between Azure AI Vision and Azure AI Custom Vision?
Azure AI Vision is a prebuilt service offering ready-made capabilities like captioning, tagging, OCR, and object detection. Azure AI Custom Vision lets you train your own image classification or object detection model on your own labeled images when the prebuilt models do not fit your domain.