Question Answering Using Embeddings-Based Search

Sep 19, 2023
AI-Powered Search
The Search-Ask method enhances GPT's question-answering capabilities by combining text search and GPT's natural language processing, making it effective for answering questions on unfamiliar topics.
Last updated
Sep 19, 2023 10:48 AM


GPT, a remarkable language model, has demonstrated its prowess in answering questions effectively. However, it's important to note that GPT's knowledge is based on the data it was trained on. So, what should you do if you want GPT to answer questions about unfamiliar topics or recent events, such as those occurring after September 2021, your non-public documents, or information from past conversations? This article introduces a two-step Search-Ask method that enables GPT to answer questions using a library of reference text.

The Challenge of Fine-Tuning

Before delving into the Search-Ask method, it's essential to understand how GPT learns and acquires knowledge. GPT learns knowledge in two primary ways:
  1. Model Weights: Fine-tuning the model on a specific training set.
  1. Model Inputs: Inserting knowledge into an input message when interacting with the model.
While fine-tuning may seem like a natural choice for teaching the model new knowledge, it is generally not recommended for imparting factual knowledge. Fine-tuning is better suited for specialized tasks or styles and is less reliable for factual recall. To illustrate this, consider model weights as a form of long-term memory. When you fine-tune a model, it's akin to studying for an exam a week in advance. However, when the actual exam arrives, the model may forget details or misremember facts it never encountered in its training data.
On the other hand, message inputs act like short-term memory. When you insert knowledge into a message, it's as if you're taking the exam with open notes. With notes in hand, the model is more likely to provide correct answers.
One limitation of text search compared to fine-tuning is that each model can only read a maximum amount of text at once. For example: • gpt-3.5-turbo can handle up to 4,096 tokens (roughly 5 pages). • gpt-4 can handle up to 8,192 tokens (around 10 pages). • gpt-4-32k can handle up to 32,768 tokens (approximately 40 pages).
You can visualize the model as a student who can only look at a few pages of notes at a time, despite having access to an extensive library of textbooks.
Therefore, to build a system capable of answering questions using vast amounts of text, a Search-Ask approach is recommended.

The Power of Search

Text can be searched in various ways, such as lexical-based search, graph-based search, or embedding-based search. This article focuses on embedding-based search because it works exceptionally well with questions, especially when questions do not precisely match the wording of the answers.
Embeddings, in this context, refer to vector representations of text that capture its meaning and context. Embeddings-based search involves finding relevant text sections by comparing the embeddings of the query and the text.
While this article demonstrates embedding-based search as a starting point, more advanced search systems may combine multiple search methods and incorporate features like popularity, recency, user history, redundancy with prior search results, click rate data, and more.
Techniques like HyDE (Hypothetical Document Expansion) can also enhance question-answering retrieval performance by first transforming questions into hypothetical answers before embedding. Additionally, GPT can potentially improve search results by automatically transforming questions into sets of keywords or search terms.

The Full Procedure

The Search-Ask method comprises three main steps:

1. Prepare Search Data

In this step, you prepare the reference data that the system will use for answering questions. This is done once per document and involves: • Collecting relevant documents or text sources. • Chunking the documents into short, self-contained sections for embedding. • Embedding each section using the OpenAI API. • Storing the embeddings (for large datasets, a vector database can be used).

2. Search

When a user asks a question, the system performs a search to find relevant text sections. This involves: • Generating an embedding for the user's query using the OpenAI API. • Ranking the text sections by their relevance to the query based on the similarity of embeddings.

3. Ask

After identifying the most relevant text sections, the system inserts them into a message for GPT and asks the question. This step involves: • Composing a message that includes the user's question and the most relevant text sections. • Sending the message to GPT. • Retrieving GPT's answer.


It's important to consider the costs associated with implementing a Search-Ask system. Since GPT is more expensive than embeddings search, the majority of costs are incurred in the third step, i.e., asking questions to GPT. The costs can vary based on the specific model used and the number of tokens processed.
As of April 2023: • For gpt-3.5-turbo, using approximately 1,000 tokens per query, it costs around $0.002 per query or approximately 500 queries per dollar. • For gpt-4, again assuming around 1,000 tokens per query, it costs around $0.03 per query or approximately 30 queries per dollar.
Actual costs may vary depending on the system's specifics and usage patterns.


Before implementing the Search-Ask method, some preliminary steps are necessary. This includes importing the required libraries and selecting the models for embedding search and question answering.

Motivating Example: GPT's Limitations

To highlight the need for the Search-Ask method, let's consider a motivating example. Suppose we ask GPT, "Which athletes won the gold medal in curling in 2022?" This question is about an event that occurred after GPT's training data cutoff date in September 2021, specifically the 2022 Winter Olympics.
When we pose this question to gpt-3.5-turbo, it responds with a lack of knowledge regarding the 2022 Winter Olympics, as expected. GPT's knowledge is limited to what it learned from its training data, which does not include events beyond that date.
However, we can provide GPT with knowledge about the 2022 Winter Olympics by inserting relevant text into the input message. In this example, we manually insert a section of a Wikipedia article about the 2022 Winter Olympics, which contains information about curling events. This enables GPT to correctly answer the question.
The rest of this article focuses on automating this process using the Search-Ask method.

1. Prepare Search Data

In the first step of the Search-Ask method, we prepare the data that the system will use for search and retrieval. This involves collecting relevant text, splitting it into sections, embedding those sections, and storing the embeddings. In our example, we've prepared a dataset of pre-embedded Wikipedia articles about the 2022 Winter Olympics to save time and effort. This dataset includes text sections and their corresponding embeddings.

2. Search

The second step is the search process, where we use the user's query to find relevant text sections from the prepared data. We've defined a search function that takes a query and a dataframe with text and embedding columns, then embeds the query and ranks text sections by their relevance to the query based on the similarity of embeddings. In our example, we've used the search function to find text sections related to the query "curling gold medal 2022 Winter Olympics."

3. Ask

With the relevant text sections identified, we can now construct a message for GPT. This message includes the user's question and the relevant text sections. GPT is then asked the question. In our example, we've defined an ask function that takes the question and the relevant text sections, constructs a message, sends it to GPT, and retrieves GPT's answer. Putting It All Together With the three steps of the Search-Ask method in place, we can now use it to answer questions that require knowledge beyond GPT's training data. This method allows us to provide GPT with the necessary context and information to answer a wide range of questions accurately. In our example, we've demonstrated how this method can be used to answer the question about the gold medalists in curling at the 2022 Winter Olympics, despite this event occurring after GPT's knowledge cutoff date.


The Search-Ask method is a powerful approach to enhance GPT's ability to answer questions accurately, especially when dealing with topics or events that fall outside its training data. By combining text search with GPT's natural language understanding capabilities, this method enables the model to provide informed responses based on relevant and up-to-date information.
Keep in mind that while this article provides a basic implementation of the Search-Ask method, there's room for customization and optimization to suit specific use cases. You can adapt and extend the method to work with larger datasets, more complex queries, and additional features to enhance its performance.
Ultimately, the Search-Ask method empowers GPT to answer questions effectively, making it a valuable tool for a wide range of applications that require access to extensive knowledge beyond GPT's initial training data.