Databricks Generative AI Engineer Associate Practice Exams
Last updated on Apr 01, 2025
- Exam Code: Databricks Generative AI Engineer Associate
- Exam Name: Databricks Certified Generative AI Engineer Associate
- Certification Provider: Databricks
- Latest update: Apr 01, 2025
A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs.
Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?
- A . Limit the number of relevant documents available for the RAG application to retrieve from
- B . Pick a smaller LLM that is domain-specific
- C . Limit the number of queries a customer can send per day
- D . Use the largest LLM possible because that gives the best performance for any general queries
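To make option B concrete, here is a minimal sketch of querying a smaller, domain-specific model through a Databricks Foundation Model serving endpoint using the OpenAI-compatible client; the workspace URL, endpoint name, and retrieved context are assumptions for illustration.

```python
from openai import OpenAI  # Databricks Foundation Model APIs expose an OpenAI-compatible interface

# Hypothetical workspace host and token; a smaller instruct model keeps per-token cost low.
client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",
    base_url="https://<workspace-host>/serving-endpoints",
)

retrieved_context = "..."  # chunks returned by the vector store for the user's question

response = client.chat.completions.create(
    model="databricks-meta-llama-3-1-8b-instruct",  # assumed smaller, cheaper endpoint name
    messages=[
        {"role": "system", "content": "Answer using only the provided cancer-research context."},
        {"role": "user", "content": f"Context:\n{retrieved_context}\n\nQuestion: ..."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```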
A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text.
How should the Generative AI Engineer architect their system?
- A . Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.
- B . Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.
- C . Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and the project scope. Iterate through the team members and rank by best score to select a team member.
- D . Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the available best matched team members.
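To illustrate the retrieval step in option D, the sketch below queries a vector index of embedded employee profiles with the project scope and a filter produced by the availability tool; the index name, column names, and filter values are assumptions, built on the Databricks Vector Search client.

```python
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()  # assumes workspace authentication is configured in the environment

# Hypothetical index of embedded employee profiles
index = vsc.get_index(
    endpoint_name="team_matching_endpoint",
    index_name="hr.talent.employee_profiles_index",
)

project_scope_text = "..."                  # unstructured project scope provided by the user
available_employee_ids = ["e101", "e205"]   # output of the date-availability tool

# Retrieve the best-matched profiles, restricted to members who are actually available
results = index.similarity_search(
    query_text=project_scope_text,
    columns=["employee_id", "profile_text"],
    filters={"employee_id": available_employee_ids},
    num_results=5,
)
```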
A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error.
Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?
A)
B)
C)
D)
- A . Option A
- B . Option B
- C . Option C
- D . Option D
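The question's code snippet and options are not reproduced above, so as a general reference only, a minimal working prompt-template chain in LangChain looks like the sketch below; the prompt text, model, and input variable are assumptions.

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# The template's input variable must match the key supplied when the chain is invoked.
prompt = PromptTemplate.from_template("Summarize the following text:\n{text}")

llm = ChatOpenAI(model="gpt-4o-mini")  # assumes OPENAI_API_KEY is set in the environment

chain = prompt | llm  # LCEL: the formatted prompt is piped into the model

print(chain.invoke({"text": "LangChain composes prompts and models into chains."}).content)
```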
A Generative AI Engineer has created a RAG application to look up answers to questions, asked on the author’s web forum, about a series of fantasy novels. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to choose the best values more methodically.
Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)
- A . Change embedding models and compare performance.
- B . Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
- C . Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
- D . Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
- E . Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.
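Option C can be sketched as a small evaluation loop: build an index for each candidate chunking configuration and compare a retrieval metric such as recall@k on a labeled set of question/relevant-chunk pairs. The evaluation set, chunk sizes, and index-building helper below are hypothetical.

```python
def recall_at_k(eval_set, retrieve, k=5):
    """eval_set maps each question to the set of chunk IDs known to contain its answer;
    retrieve(question, k) returns (chunk_id, score) pairs from one chunking configuration."""
    hits = 0
    for question, relevant_ids in eval_set.items():
        retrieved_ids = {chunk_id for chunk_id, _ in retrieve(question, k)}
        if retrieved_ids & relevant_ids:  # at least one relevant chunk was retrieved
            hits += 1
    return hits / len(eval_set)

# Compare candidate chunking strategies and keep the one with the best metric.
for chunk_size in (256, 512, 1024):
    retrieve = build_index_and_retriever(chunk_size=chunk_size)  # hypothetical helper
    print(chunk_size, recall_at_k(eval_set, retrieve))           # eval_set is hypothetical labeled data
```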
A Generative AI Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email.
Here’s a sample email:
They need to write a prompt that extracts the relevant information in JSON format with the highest level of output accuracy.
Which prompt will do that?
- A . You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format.
- B . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.
Here’s an example: {“date”: “April 16, 2024”, “sender_email”: “[email protected]”, “order_id”: “RE987D”}
- C . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format.
- D . You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.
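Option B's structure, an instruction plus a one-shot example of the exact JSON shape expected, can be assembled as in the sketch below; the example address and the email body are placeholders, not values from the question.

```python
# One-shot example showing the exact JSON shape expected in the response
EXAMPLE = '{"date": "April 16, 2024", "sender_email": "sender@example.com", "order_id": "RE987D"}'

email_body = "..."  # raw email text to parse

prompt = (
    "You will receive customer emails and need to extract date, sender email, and order ID. "
    "Return the extracted information in JSON format.\n"
    f"Here's an example: {EXAMPLE}\n\n"
    f"Email:\n{email_body}"
)
```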
A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 that are related to a user’s query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable.
Which change could the Generative AI Engineer make to mitigate this issue?
- A . Split the LLM output by newline characters to truncate away the summarization explanation.
- B . Tune the chunk size of news articles or experiment with different embedding models.
- C . Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.
- D . Provide few-shot examples of the desired output format in the system and/or user prompt.
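Option D can be implemented by putting few-shot examples in the prompt that show only the summary, with no explanation of how it was produced; the message layout and example texts below are assumptions.

```python
retrieved_article = "..."  # 1918 article text returned by retrieval

messages = [
    {
        "role": "system",
        "content": (
            "Summarize the retrieved 1918 news articles. "
            "Respond with the summary only; never describe how the summary was produced."
        ),
    },
    # Few-shot example: the desired output is a bare summary with no meta-explanation.
    {"role": "user", "content": "Article: <hypothetical 1918 article text>"},
    {"role": "assistant", "content": "Armistice negotiations dominated the week's front pages."},
    # Actual request
    {"role": "user", "content": f"Article: {retrieved_article}"},
]
```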
A Generative AI Engineer is tasked with developing an application that is based on an open-source large language model (LLM). They need a foundation LLM with a large context window.
Which model fits this need?
- A . DistilBERT
- B . MPT-30B
- C . Llama2-70B
- D . DBRX
A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style.
Which approach will NOT improve the LLM’s response to achieve the desired response?
- A . Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style
- B . Use a neutralizer to normalize the tone and style of the underlying documents
- C . Include few-shot examples in the prompt to the LLM
- D . Fine-tune the LLM on a dataset of desired tone and style
A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in PDF format. These PDFs can contain both text and images. They want to develop a solution using the fewest lines of code.
Which Python package should be used to extract the text from the source documents?
- A . flask
- B . beautifulsoup
- C . unstructured
- D . numpy
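Option C refers to the unstructured library, which can pull text out of a PDF in a couple of lines; the file path below is a placeholder.

```python
from unstructured.partition.pdf import partition_pdf

# Partition the PDF (placeholder path) into elements and keep the textual content.
elements = partition_pdf(filename="source_documents/report.pdf")
text = "\n".join(el.text for el in elements if el.text)
```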
A Generative AI Engineer is building a system that will answer questions on the latest stock news articles.
Which approach will NOT help ensure the outputs are relevant to financial news?
- A . Implement a comprehensive guardrail framework that includes policies for content filters tailored to the finance sector.
- B . Increase the compute to improve processing speed of questions to allow greater relevancy analysis
- C . Implement a profanity filter to screen out offensive language
- D . Incorporate manual reviews to correct any problematic outputs prior to sending to the users
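As a rough illustration of the content-filter idea in option A, the sketch below applies a naive topical guardrail before a question reaches the RAG pipeline; the keyword list, rejection message, and downstream call are assumptions, and a real system would use a classifier or a managed guardrail framework.

```python
FINANCE_KEYWORDS = {"stock", "share", "earnings", "dividend", "market", "ipo", "sec"}

def is_finance_question(question: str) -> bool:
    """Naive topical guardrail: require at least one finance-related keyword."""
    return bool(set(question.lower().split()) & FINANCE_KEYWORDS)

def answer(question: str) -> str:
    if not is_finance_question(question):
        return "I can only answer questions about financial news."
    return run_rag_pipeline(question)  # hypothetical downstream RAG call
```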