

Certified Generative AI Engineer Associate Exam Questions
Question 1 Single Choice
Which library is the most appropriate for creating a multi-step workflow involving large language models (LLMs)?
Explanation

Correct Answer: D. LangChain
Justification:
LangChain is a specialized framework designed to facilitate the development of applications powered by large language models (LLMs). It provides a standard interface for interacting with various LLMs and simplifies the creation of complex, multi-step workflows by integrating LLMs with other tools and data sources. This makes it particularly suitable for applications such as chatbots, document analysis, summarization, and code analysis.
Analysis of Other Options:
A. Pandas:
Purpose: Pandas is a Python library primarily used for data manipulation and analysis, offering data structures like DataFrames to handle structured data efficiently.
Limitations: While excellent for data processing tasks, Pandas does not provide functionalities for integrating or managing large language models.
B. TensorFlow:
Purpose: TensorFlow is an open-source library developed by Google for building and training machine learning and deep learning models.
Limitations: Although TensorFlow can be used to create and train LLMs, it does not offer specialized tools for orchestrating multi-step workflows involving pre-trained LLMs.
C. PySpark:
Purpose: PySpark is the Python API for Apache Spark, used for large-scale data processing and analytics.
Limitations: PySpark excels in distributed data processing but does not have built-in capabilities tailored for managing or integrating large language models into application workflows.
In contrast, LangChain is explicitly designed to streamline the integration of LLMs into applications, providing the necessary abstractions and tools to build context-aware, reasoning applications that leverage LLMs effectively.
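A minimal sketch of a two-step LangChain workflow, assuming the langchain-openai integration package and an OpenAI API key are available (the model name and prompts are illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model name

# Step 1: summarize an input document
summarize = (
    ChatPromptTemplate.from_template("Summarize the following text:\n\n{text}")
    | llm
    | StrOutputParser()
)

# Step 2: generate follow-up questions from that summary
follow_up = (
    ChatPromptTemplate.from_template("Write three follow-up questions about:\n\n{summary}")
    | llm
    | StrOutputParser()
)

# Compose the steps: the output of step 1 becomes the {summary} input of step 2
workflow = {"summary": summarize} | follow_up
print(workflow.invoke({"text": "LangChain composes LLM calls into multi-step pipelines."}))
```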
Question 2 Single Choice
A Generative AI Engineer is tasked with building an LLM-based question-answering system that needs to handle newly published documents on a regular basis. The engineer wants to minimize both development effort and operational costs.
Which combination of components and configuration will best meet these requirements?
Explanation

Correct Answer: A. The system should include a prompt, a retriever, and an LLM. The retriever's output is inserted into the prompt, which is then passed to the LLM to generate answers.
Justification:
To efficiently handle newly published documents while minimizing development effort and operational costs, implementing a Retrieval-Augmented Generation (RAG) system is the most effective approach. A RAG system integrates a retriever component with a Large Language Model (LLM) to provide up-to-date and accurate responses without the need for frequent retraining.
Components of the RAG System:
Retriever: This component searches the knowledge base to identify and retrieve the most relevant documents or information related to the user's query. By maintaining an up-to-date index of documents, the retriever ensures that the system can access the latest information without necessitating retraining of the LLM.
Prompt: The retrieved information is incorporated into the prompt provided to the LLM, offering context and specifics that guide the model in generating accurate responses.
LLM: Leveraging the contextual information from the prompt, the LLM generates responses that are both relevant and informed by the most recent data.
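A minimal sketch of this prompt, retriever, and LLM flow is shown below; the retriever and llm objects are placeholders for whatever vector store and model endpoint the system actually uses (a LangChain-style retriever interface is assumed):

```python
def answer_question(question: str, retriever, llm) -> str:
    # 1. Retriever: fetch the most relevant, up-to-date document chunks
    docs = retriever.get_relevant_documents(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # 2. Prompt: insert the retrieved context ahead of the user's question
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. LLM: generate the answer from the augmented prompt
    return llm.invoke(prompt)
```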
Advantages of This Configuration:
Reduced Development Effort: By utilizing a retriever to handle newly published documents, the need for continuous retraining of the LLM is eliminated, simplifying maintenance and updates.
Cost Efficiency: Avoiding frequent retraining of large models significantly lowers computational and operational expenses.
Up-to-Date Responses: The retriever ensures that the system accesses the most current information, enhancing the relevance and accuracy of the LLM's outputs.
Analysis of Other Options:
B. The LLM must be regularly retrained with new documents to ensure it provides the most up-to-date answers.
Limitations: Regularly retraining the LLM is resource-intensive, both in terms of time and computational power, leading to increased operational costs. Additionally, this approach may not be feasible for handling rapidly changing information.
C. The question-answering system only needs prompt engineering and an LLM to generate responses.
Limitations: Relying solely on prompt engineering without a mechanism to incorporate new information means the LLM cannot access data beyond its training cut-off, resulting in outdated responses.
D. The system should include a prompt, an agent, and a fine-tuned LLM. The agent helps the LLM retrieve relevant information, which is then inserted into the prompt before the LLM generates the answer.
Limitations: While this setup introduces an agent to assist with information retrieval, it adds complexity to the system. Fine-tuning the LLM requires additional resources, and maintaining the agent may increase development and operational efforts.
Conclusion:
Implementing a RAG system that combines a retriever with an LLM, where the retriever's output is incorporated into the prompt, offers an efficient and cost-effective solution for handling newly published documents. This configuration ensures that the question-answering system remains current and accurate without incurring the high costs associated with frequent model retraining.
Question 3 Single Choice
A Generative AI Engineer has developed an LLM-based application to provide answers about internal company policies. The engineer needs to ensure the application avoids hallucinating information or leaking confidential data.
Which method is NOT suitable for preventing hallucination or data leakage?
Explanation

Correct Answer: B. Fine-tune the model on your data, hoping it will automatically learn to avoid inappropriate outputs.
Justification:
Fine-tuning a large language model (LLM) on specific datasets can enhance its performance in certain domains. However, this process alone does not guarantee the prevention of hallucinations or the leakage of confidential information. Without explicit instructions or constraints, the model may still produce undesired outputs, as fine-tuning does not inherently teach the model to avoid inappropriate content. Therefore, relying solely on fine-tuning in hopes that the model will automatically learn to avoid such outputs is not a suitable method for preventing hallucinations or data leakage.
Analysis of Other Options:
A. Implement guardrails that filter the model’s output before presenting it to the user:
Appropriateness: Implementing guardrails involves setting up mechanisms to review and filter the model's responses, ensuring that any inappropriate or sensitive information is removed before reaching the user. This proactive approach helps in mitigating risks associated with hallucinations and data leakage.
C. Restrict access to data based on the user’s permission level:
Appropriateness: Enforcing strict data access controls ensures that users can only access information pertinent to their authorization levels. This method is crucial in preventing unauthorized access to sensitive data, thereby reducing the risk of data leakage.
D. Use a strong system prompt to ensure the model follows your guidelines:
Appropriateness: Crafting a robust system prompt provides the model with clear instructions on acceptable behaviors and content generation boundaries. This technique guides the model to adhere to desired guidelines, reducing the likelihood of producing hallucinated or inappropriate outputs.
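For the guardrail approach described in option A, a minimal output-filtering sketch might look like the following (the blocked-term list and PII pattern are illustrative; production systems typically use a dedicated guardrail library or a second moderation model):

```python
import re

BLOCKED_TERMS = ["internal only", "confidential"]      # illustrative deny-list
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # example PII pattern

def apply_guardrails(model_output: str) -> str:
    """Filter the LLM's response before it is shown to the user."""
    if any(term in model_output.lower() for term in BLOCKED_TERMS):
        return "I'm sorry, I can't share that information."
    # Redact anything that looks like a U.S. Social Security number
    return SSN_PATTERN.sub("[REDACTED]", model_output)

# Usage: wrap every model response before returning it
# safe_response = apply_guardrails(llm.invoke(prompt))
```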
Question 4 Single Choice
A Generative AI Engineer is tasked with creating a RAG (Retrieval-Augmented Generation) application to assist a small group of internal experts in answering specific queries using an internal knowledge base. They prioritize answer quality over latency or throughput, as the user group is small and willing to wait for the most accurate responses. Due to the sensitive and confidential nature of the topics, regulatory requirements prohibit transmitting any information to third parties.
Which model is best suited to meet all of the engineer’s needs in this scenario?
Explanation

Correct Answer: D. Llama2-70B
Justification:
In this scenario, the Generative AI Engineer requires a Retrieval-Augmented Generation (RAG) application that delivers high-quality responses, prioritizing accuracy over latency or throughput. Given the sensitive and confidential nature of the data, it's imperative to use a model that can be deployed entirely on-premises to comply with regulatory requirements prohibiting data transmission to third parties.
Analysis of Options:
A. Dolly 1.5B:
Limitations: Dolly 1.5B is a smaller-scale language model, which may not provide the depth and quality of responses required for complex queries. Its performance might not meet the high-quality standards needed for this application.
B. OpenAI GPT-4:
Limitations: While GPT-4 is a powerful language model known for generating high-quality responses, it is primarily accessible through OpenAI's cloud-based services. Utilizing GPT-4 would involve transmitting data to external servers, which conflicts with the regulatory constraints of keeping all data processing in-house.
C. BGE-large:
Limitations: BGE-large is a language model that could potentially be configured for on-premises deployment. However, specific details about its deployment capabilities and performance characteristics are less documented compared to Llama2-70B. Without clear information on its ability to operate entirely within a secure internal environment, relying on BGE-large may pose uncertainties regarding compliance with data confidentiality requirements.
D. Llama2-70B:
Advantages: Llama2-70B is a large-scale language model developed with the capability for on-premises deployment. This allows organizations to host the model within their own secure infrastructure, ensuring that sensitive data remains in-house and complies with regulatory requirements. Its substantial parameter size (70 billion) enables it to generate high-quality, contextually relevant responses, making it well-suited for applications where answer quality is paramount.
Conclusion:
Considering the need for high-quality responses and strict data confidentiality, Llama2-70B emerges as the most appropriate choice. Its ability to be deployed on-premises ensures compliance with regulatory constraints, and its extensive capacity supports the generation of accurate and detailed answers, aligning with the engineer's objectives for the RAG application.
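A minimal sketch of hosting Llama2-70B entirely in-house with the Hugging Face transformers library (this assumes the gated model weights have already been downloaded to local storage, the accelerate package is installed, and enough GPU memory is available; the local path is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/models/llama-2-70b-chat-hf"  # local copy of the weights; no data leaves the environment

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",   # shard the 70B parameters across the available GPUs
    torch_dtype="auto",
)

inputs = tokenizer("Summarize our data-retention policy.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```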
Question 5 Single Choice
A small startup focused on cancer research wants to create a Retrieval-Augmented Generation (RAG) application using Foundation Model APIs. Since the startup is mindful of costs but still wants to deliver a high-quality product for their customers, what would be the best approach to achieve this balance?
Explanation

Correct Answer:
B. Choose a smaller language model that is specifically trained for the cancer research domain.
Justification for the Correct Answer:
Choosing a smaller language model that is domain-specific (trained for cancer research) offers a cost-effective and high-quality solution for the startup’s Retrieval-Augmented Generation (RAG) application.
Cost Efficiency: Smaller models require less computational power, reducing inference costs compared to general-purpose large models.
Domain-Specific Accuracy: A model trained on cancer research-related data will be more precise and reliable in retrieving and generating medical insights.
Performance Balance: Instead of using a general-purpose large model, a smaller, specialized model improves the relevance and accuracy of responses without excessive costs.
This aligns with Databricks’ best practices for large language models (LLMs) in RAG, which emphasize using domain-specific fine-tuned models over general-purpose models for industry-specific tasks.
Reference:
Retrieval-Augmented Generation (RAG) Best Practices - Databricks
Analysis of Incorrect Options:
A. Restrict the number of documents the RAG application can search through to reduce costs.
Why Incorrect?
While reducing document search scope may save costs, it directly impacts the application's ability to provide accurate and relevant responses.
RAG relies on retrieving a comprehensive knowledge base to improve LLM-generated responses.
Better Alternative: Optimize vector search indexing to improve efficiency rather than arbitrarily restricting document access.
C. Set a limit on how many queries customers can make each day.
Why Incorrect?
Limiting user queries is a poor user experience strategy.
This approach does not optimize cost at the model level but instead restricts usability, potentially driving users away.
Better Alternative: Use query caching and optimize token usage per query.
D. Use the largest language model available, as it provides the best performance across all types of queries.
Why Incorrect?
Larger models (such as GPT-4, PaLM 2) have higher computational costs and may not be optimized for cancer research.
Not every task requires the largest model. Smaller, specialized models can outperform general-purpose models in niche areas.
Better Alternative: Choose a smaller fine-tuned model for medical data and optimize cost-performance trade-offs.
Conclusion:
Option B provides the best balance between cost and performance for a cancer research RAG application.
Using a smaller, domain-specific LLM enhances accuracy while minimizing operational expenses.
This aligns with Databricks’ recommendation of specialized fine-tuning for domain-specific RAG applications.
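As a side note on the query-caching suggestion above, repeated questions can be served from a cache so they are not billed twice against the Foundation Model API. A minimal in-memory sketch follows (a shared store such as Redis would be more realistic, and call_rag_pipeline is a stand-in for the startup's actual RAG chain):

```python
from functools import lru_cache

def call_rag_pipeline(question: str) -> str:
    # Placeholder for the real retriever + Foundation Model API call
    return f"<answer for: {question}>"

@lru_cache(maxsize=1024)
def cached_answer(question: str) -> str:
    """Identical questions hit the cache instead of triggering a new paid LLM call."""
    return call_rag_pipeline(question)
```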
Question 6 Single Choice
A Generative AI Engineer is developing a RAG (Retrieval-Augmented Generation) application that will extract context from source documents in PDF format, which contain both text and images. They aim to implement a solution that requires minimal lines of code.
Which Python package should be utilized to extract text from these source documents?
Explanation

Correct Answer: C. Unstructured
Justification:
To extract text from PDF documents containing both text and images with minimal lines of code, the Unstructured Python package is a suitable choice. This library is designed to preprocess text data from various file formats, including PDFs, and can handle complex documents with mixed content efficiently.
Supporting Information:
Unstructured: This Python package provides tools to preprocess text data from various file formats, including PDFs. It is designed to handle complex documents containing both text and images, making it suitable for extracting text with minimal code.
Analysis of Other Options:
A. Flask:
Limitation: Flask is a micro web framework for Python, primarily used for developing web applications. It does not provide functionalities for extracting text from PDF documents.
B. BeautifulSoup:
Limitation: BeautifulSoup is a Python library used for parsing HTML and XML documents. It is not designed to handle PDF files and cannot be used to extract text from them.
D. NumPy:
Limitation: NumPy is a fundamental package for numerical computations in Python. It offers support for large, multi-dimensional arrays and matrices but does not include capabilities for processing or extracting text from PDF files.
Conclusion:
For a Generative AI Engineer developing a Retrieval-Augmented Generation (RAG) application that requires extracting text from PDFs containing both text and images, the Unstructured Python package offers an efficient solution with minimal coding effort. Its ability to handle complex documents makes it well-suited for this task.
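A minimal sketch using the Unstructured package (assumes unstructured is installed with its PDF extras; the file path is illustrative):

```python
from unstructured.partition.pdf import partition_pdf

# Partition the PDF into elements (titles, narrative text, tables, ...)
elements = partition_pdf(filename="docs/policy_handbook.pdf")

# Keep only the extracted text for downstream chunking and embedding
text = "\n".join(el.text for el in elements if el.text)
print(text[:500])
```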
Question 7 Multiple Choice
A Generative AI Engineer is tasked with developing a chatbot to assist the internal HelpDesk Call Center team in quickly locating relevant tickets and resolving issues. While organizing the project tasks for this GenAI application, they realize it’s time to select the data sources (either from Unity Catalog volume or Delta tables) that will be used. They have several potential data sources for consideration:
call_rep_history: A Delta table with primary keys for representative_id and call_id. It tracks call resolution times using fields like call_duration and call_start_time.
transcript Volume: A Unity Catalog volume containing all call recordings as .wav files, along with text transcripts in .txt format.
call_cust_history: A Delta table with primary keys for customer_id and call_id. It monitors internal customer use of the HelpDesk for the purposes of chargeback.
call_detail: A Delta table updated hourly with snapshots of call information, including root_cause and resolution fields (although these may be empty for ongoing calls).
maintenance_schedule: A Delta table that lists both outages and scheduled maintenance downtimes for HelpDesk applications.
They need to choose sources that provide the best context for identifying the root cause and resolution of tickets.
Which two data sources should they select?
Explanation

To assist the internal HelpDesk Call Center team in efficiently identifying root causes and resolutions for tickets, the Generative AI Engineer should select the following two data sources:
D. call_detail
E. transcript Volume
Justification:
1. call_detail:
Comprehensive Call Information: This Delta table provides detailed insights into each call, including fields like root_cause and resolution. Accessing this information is crucial for understanding the specifics of issues encountered and the solutions applied.
Timely Updates: With hourly snapshots, the data remains current, allowing the chatbot to provide up-to-date information on ongoing and resolved issues.
2. transcript Volume:
Detailed Interaction Records: Housing all call recordings (.wav files) and text transcripts (.txt files), this Unity Catalog volume offers a rich source of data. Analyzing these transcripts enables the extraction of nuanced information about customer issues and the effectiveness of responses.
Unstructured Data Management: Unity Catalog Volumes are designed to govern non-tabular datasets, making them ideal for managing unstructured data like call recordings and transcripts.
Analysis of Other Options:
A. call_cust_history:
Focus: This Delta table monitors internal customer usage of the HelpDesk for chargeback purposes. While valuable for billing and usage analytics, it doesn't provide direct insights into the nature or resolution of specific issues.
B. maintenance_schedule:
Content: This table lists outages and scheduled maintenance downtimes for HelpDesk applications. While it offers context about system availability, it doesn't detail individual call issues or resolutions.
C. call_rep_history:
Metrics: This table tracks call resolution times, focusing on metrics like call_duration and call_start_time. Although useful for performance evaluations, it lacks specific information about the problems addressed or their solutions.
By integrating data from the call_detail Delta table and the transcript Volume, the chatbot can access both structured and unstructured data. This combination ensures a comprehensive understanding of issues and their resolutions, leading to more accurate and helpful assistance for the HelpDesk team.
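A minimal sketch of pulling context from both selected sources in a Databricks notebook, where spark and dbutils are available by default (the catalog, schema, and volume names are illustrative):

```python
# Structured context: completed calls with a recorded root cause and resolution
call_detail = (
    spark.read.table("main.helpdesk.call_detail")
         .where("root_cause IS NOT NULL AND resolution IS NOT NULL")
)

# Unstructured context: text transcripts stored in the Unity Catalog volume
transcript_paths = [
    f.path
    for f in dbutils.fs.ls("/Volumes/main/helpdesk/transcripts/")
    if f.path.endswith(".txt")
]
transcripts = spark.read.text(transcript_paths)
```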
Question 8 Single Choice
A Generative AI Engineer is working with a language model that responds to customer inquiries about product availability, using the phrases “In Stock” if the product is available and “Out of Stock” if it’s not. The engineer wants to classify call responses accurately based on customer inquiries.
Which prompt will allow the engineer to correctly label call classifications?
Explanation

Correct Answer: D. You will be given a transcript of a customer call where the customer asks about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if it’s unavailable.
Justification:
Option D provides a clear and explicit instruction to the language model, specifying the exact responses based on product availability. This approach aligns with best practices in prompt engineering, where detailed and specific prompts guide the model to produce accurate and relevant outputs. By clearly stating the conditions under which the model should respond with "In Stock" or "Out of Stock," the engineer ensures accurate and consistent classifications of customer inquiries.
Supporting Information:
Effective prompt engineering involves crafting prompts that are clear, specific, and provide the necessary context to guide the AI model's responses. According to OpenAI's guidelines on prompt engineering, placing instructions at the beginning of the prompt and being as detailed as possible about the desired outcome enhances the model's performance. OpenAI Help Center
Analysis of Other Options:
A. Respond with “In Stock” when a customer asks about a product.
Limitation: This prompt lacks specificity regarding the product's availability status. It instructs the model to respond with "In Stock" regardless of whether the product is actually available, leading to potential inaccuracies.
B. You will be given a transcript of a customer call where they inquire about product availability. The output should be either “In Stock” or “Out of Stock.” Format the response in JSON, for example: {“call_id”: “123”, “label”: “In Stock”}.
Limitation: While this prompt specifies the desired output format, it does not provide clear criteria for determining when to label a product as "In Stock" or "Out of Stock." The lack of explicit instructions may result in inconsistent classifications.
C. Respond with “Out of Stock” when a customer asks about a product.
Limitation: Similar to option A, this prompt instructs the model to respond with "Out of Stock" irrespective of the actual availability, leading to potential inaccuracies.
By selecting option D, the engineer provides the language model with precise guidance on how to classify product availability based on customer inquiries, ensuring accurate and reliable responses.
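A minimal sketch of applying the option D prompt to label call transcripts (llm.invoke stands in for whatever chat-completion client the engineer uses; the fallback label is an added safeguard, not part of the prompt itself):

```python
SYSTEM_PROMPT = (
    "You will be given a transcript of a customer call where the customer asks about "
    "product availability. Respond with \"In Stock\" if the product is available or "
    "\"Out of Stock\" if it's unavailable."
)

def classify_call(transcript: str, llm) -> str:
    label = llm.invoke(f"{SYSTEM_PROMPT}\n\nTranscript:\n{transcript}").strip()
    # Guard against any response that drifts from the two expected labels
    return label if label in ("In Stock", "Out of Stock") else "Unknown"
```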
Question 9 Single Choice
A Generative AI Engineer is responsible for developing an application that utilizes an open-source large language model (LLM). They require a foundational LLM that offers a large context window.
Which model would best meet this requirement?
Explanation

Correct option: D. DBRX
Why DBRX is the best fit
Databricks’ DBRX Instruct was pretrained with up to a 32 K-token context window, giving it far more capacity for long-document or multi-document prompts than most open-source LLMs. This very large window is explicitly highlighted by Databricks as one of DBRX’s headline capabilities, making it the most suitable foundational model when “a large context window” is the primary requirement. databricks.com | huggingface.co
Why the other options are not suitable
A. DistilBERT – DistilBERT is a distilled bidirectional encoder designed for efficiency, not long-context generation. It inherits BERT's architectural limit of 512 tokens (via max_position_embeddings=512), roughly 64 times smaller than DBRX's window, so it cannot satisfy a "large context window" requirement. huggingface.co
B. MPT-30B – MosaicML's MPT-30B does extend the context window to 8K tokens—much larger than DistilBERT or Llama 2—but it is still only one quarter of DBRX's 32K window, so it is not the best choice when the goal is to maximize context length. databricks.com | huggingface.co
C. Llama 2-70B – Meta’s Llama 2 family, including the 70 B model, supports a 4 K-token window (4096 tokens). This is adequate for many tasks but is clearly smaller than both MPT-30B and, especially, DBRX, so it does not meet the “large context window” criterion as well as the alternatives. huggingface.co
Because DBRX delivers the largest native context window among the listed open-source foundational models, it is the clear choice for scenarios that hinge on handling very long prompts or extensive retrieval-augmented generations.
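A quick way to confirm a model's native context length before committing to it is to read it from the published model configuration; a minimal sketch with Hugging Face transformers (the DistilBERT checkpoint name is the standard public one):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("distilbert-base-uncased")
print(cfg.max_position_embeddings)  # 512: the hard limit on input tokens for this encoder
```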
Question 10 Single Choice
A Generative AI Engineer is designing an agent-based LLM system for their favorite monster truck team. The system should be able to answer text-based questions about the team, look up event dates via an API, and query tables for the team's latest standings.
What is the best approach for the engineer to integrate these capabilities into the system?
Explanation

Correct Answer: B. Create a system prompt for the agent listing the available tools, and implement an agent system that runs different calls to handle the queries.
Justification:
To design an agent-based Large Language Model (LLM) system capable of:
Answering text-based questions about the monster truck team.
Looking up event dates via an API.
Querying tables for the team's latest standings.
The most effective approach is to:
Define Available Tools in the System Prompt:
Clearly list the tools the agent can use, such as:
A function to query the event dates API.
A function to access and query the team's standings database.
A retrieval function for general information about the team.
Develop the Agent System:
Implement an agent that can interpret the user's query and decide which tool(s) to invoke based on the query's nature.
Ensure the agent can handle multiple types of queries by dynamically selecting and executing the appropriate tool.
Process Flow:
The agent receives a user query.
It analyzes the query to determine the required information.
Based on this analysis, the agent selects the appropriate tool:
For event date inquiries, it calls the event dates API.
For standings information, it queries the standings database.
For general questions, it retrieves information from the knowledge base.
The agent compiles the information and generates a response to the user.
Advantages of This Approach:
Flexibility: The agent can handle various query types by utilizing different tools as needed.
Scalability: New tools or data sources can be integrated into the system without significant restructuring.
Efficiency: By directing queries to specific tools, the system can provide accurate and relevant responses promptly.
Supporting Information:
Designing an LLM-based agent system involves orchestrating how LLM calls, data retrieval, and external actions flow together. This approach allows the agent to make dynamic decisions and interact with multiple data sources effectively. learn.microsoft.com
Additionally, integrating LLMs with external tools, such as APIs and databases, enables the creation of agents that can plan, execute, and refine actions based on user queries.
Analysis of Other Options:
A. Ingest PDF documents about the monster truck team into a vector store and use a RAG (Retrieval-Augmented Generation) architecture to query the data:
Limitation: This approach is suitable for handling unstructured textual information but does not address the need to interact with APIs or databases for real-time data retrieval, such as event dates or standings.
C. Program the LLM to respond with "RAG," "API," or "TABLE" based on the query type, then use text parsing and conditional logic to process the query:
Limitation: This method adds unnecessary complexity by requiring the LLM to output specific tokens that trigger different processing paths, making the system more prone to errors and harder to maintain.
D. Build a system prompt containing all possible event dates and standings information, and use a RAG architecture to handle general text queries while relying on the system prompt for specific data:
Limitation: Embedding all possible data in the system prompt is inefficient and impractical, especially as the data changes over time. This approach lacks scalability and does not leverage dynamic data retrieval capabilities.
Conclusion:
Implementing an agent-based system where the LLM is informed of and can utilize specific tools through a well-defined system prompt is the most effective approach. This design ensures the system can handle diverse queries by dynamically interacting with APIs and databases, providing accurate and up-to-date information to users.
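A minimal sketch of the option B design, with the available tools declared in a system prompt and a simple routing step (the two tool functions, their data sources, and the routing labels are illustrative; in practice a framework such as LangChain agents or native function calling would replace the hand-rolled dispatch):

```python
def lookup_event_dates(team: str) -> str:
    # Placeholder for a call to the real event-schedule API
    return "<event dates returned by the schedule API>"

def query_standings(team: str) -> str:
    # Placeholder for a SQL query against the standings table
    return "<latest standings returned by the standings table>"

TOOLS = {"event_dates": lookup_event_dates, "standings": query_standings}

SYSTEM_PROMPT = (
    "You are an assistant for a monster truck team. You may use these tools:\n"
    "- event_dates: look up upcoming event dates\n"
    "- standings: query the team's latest standings\n"
    "Reply with only the tool name to call, or answer directly for general questions."
)

def run_agent(question: str, llm, team: str = "our team") -> str:
    decision = llm.invoke(f"{SYSTEM_PROMPT}\n\nUser question: {question}").strip()
    if decision in TOOLS:
        data = TOOLS[decision](team)
        return llm.invoke(f"Answer the question using this data:\n{data}\n\nQuestion: {question}")
    return decision  # general question answered directly by the LLM
```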



