Building a search bot with memory using FastAPI and Gel AI
In this tutorial we're going to walk you through building a chat bot with search capabilities using Gel and FastAPI.
FastAPI is a framework designed to help you build web apps fast. Gel is a data layer designed to help you figure out storage in your application - also fast. By the end of this tutorial, you will have tried out different aspects of using those two together.
We will start by creating an app with FastAPI, adding web search capabilities, and then putting search results through a language model to get a human-friendly answer. After that, we'll use Gel to implement chat history so that the bot remembers previous interactions with the user. We'll finish it off with semantic search-based cross-chat memory.
1. Initialize the project
We're going to start by installing uv - a Python package manager that's going to simplify environment management for us. You can follow their installation instructions or simply run:
curl -LsSf | sh
Once that is done, we can use uv to create scaffolding for our project following the documentation:
uv init searchbot \
&& cd searchbot
For now, we know we're going to need Gel and FastAPI, so let's add those following uv's instructions on managing dependencies, as well as FastAPI's installation docs. Running uv sync after that will create our virtual environment in a .venv directory and ensure it's ready. As the last step, we'll activate the environment and get started.
Every time you open a new terminal session, you should source the environment before running python, gel or fastapi.
uv add "fastapi[standard]" \
&& uv add gel \
&& uv sync \
&& source .venv/bin/activate
2. Get started with FastAPI
At this stage we need to follow FastAPI's tutorial to create the foundation of our app.
We're going to make a minimal web API with one endpoint that takes in a user
query as an input and echoes it as an output. First, let's make a directory
called app
in our project root, and put an empty
mkdir app && touch app/
Now let's create a file called
inside the app
directory and put
the "Hello World" example in it:
from fastapi import FastAPI

app = FastAPI()


# Register the handler for GET requests to the root path
@app.get("/")
async def root():
    return {"message": "Hello World"}
To start the server, we'll run:
fastapi dev app/
Once the server gets up and running, we can make sure it works using FastAPI's built-in UI, or manually with curl:
curl -X 'GET' \
'' \
-H 'accept: application/json'
{"message":"Hello World"}
Now, to create the search endpoint we mentioned earlier, we need to pass our query as a parameter to it. We'd prefer to have it in the request's body since user messages can be long.
In FastAPI land, this is done by creating a Pydantic schema and making it the type of the input parameter. Pydantic is a data validation library for Python. It has many features, but we don't actually need to know about them for now. All we need to know is that FastAPI uses Pydantic types to automatically figure out schemas for input, as well as output.
Let's add the following to our
from pydantic import BaseModel


class SearchTerms(BaseModel):
    query: str


class SearchResult(BaseModel):
    response: str | None = None
Now, we can define our endpoint. We'll set the two classes we just created as the new endpoint's argument and return type.
@app.post("/search")
async def search(search_terms: SearchTerms) -> SearchResult:
    return SearchResult(response=search_terms.query)
Same as before, we can test the endpoint using the UI, or by sending a request with curl:
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{ "query": "string" }'
{ "response": "string" }
3. Implement web search
Now that we have our web app infrastructure in place, let's add some substance to it by implementing web search capabilities.
There are many powerful, feature-rich products for LLM-driven web search. But in this tutorial we're going to use a much more reliable source of real-world information: comment threads on Hacker News. Their web API is free of charge and doesn't require an account. Below is a simple function that requests a full-text search for a string query and extracts a sampling of comment threads from each of the stories that came up in the result.
We are not going to cover this code sample in too much depth. Feel free to grab it and save it to app/, or make your own.
Notice that we've created another Pydantic type called WebSource to store our web search results. There's no framework-related reason for that, it's just nicer than passing dictionaries around.
import requests
from pydantic import BaseModel
from datetime import datetime
import html


class WebSource(BaseModel):
    """Type that stores search results."""

    url: str | None = None
    title: str | None = None
    text: str | None = None


def extract_comment_thread(
    comment: dict,
    max_depth: int = 3,
    current_depth: int = 0,
    max_children: int = 3,  # assumed default: how many replies to follow per comment
) -> list[str]:
    """
    Recursively extract comments from a thread up to max_depth.
    Returns a list of formatted comment strings.
    """
    if not comment or current_depth > max_depth:
        return []

    results = []

    # Get timestamp, author and the body of the comment,
    # then pad it with spaces so that it's offset appropriately for its depth
    if comment.get("text"):
        timestamp = datetime.fromisoformat(comment["created_at"].replace("Z", "+00:00"))
        author = comment["author"]
        text = html.unescape(comment["text"])
        formatted_comment = f"[{timestamp.strftime('%Y-%m-%d %H:%M')}] {author}: {text}"
        results.append(("  " * current_depth) + formatted_comment)

    # If there are child comments, we are going to extract them too,
    # and add them to the list.
    if comment.get("children"):
        for child in comment["children"][:max_children]:
            child_comments = extract_comment_thread(
                child, max_depth, current_depth + 1, max_children
            )
            results.extend(child_comments)

    return results


def fetch_web_sources(query: str, limit: int = 5) -> list[WebSource]:
    """
    For a given query perform a full-text search for stories on Hacker News.
    From each of the matched stories extract the comment thread and format it into a single string.
    For each story return its title, url and comment thread.
    """
    search_url = ">0"

    # Search for stories
    response = requests.get(
        search_url,
        params={
            "query": query,
            "tags": "story",
            "hitsPerPage": limit,
            "page": 0,
        },
    )
    search_result = response.json()

    # For each search hit fetch and process the story
    web_sources = []
    for hit in search_result.get("hits", []):
        item_url = f"{hit['story_id']}"
        response = requests.get(item_url)
        item_result = response.json()

        site_url = f"{hit['story_id']}"
        title = hit["title"]
        comments = extract_comment_thread(item_result)
        text = "\n".join(comments) if len(comments) > 0 else None
        web_sources.append(WebSource(url=site_url, title=title, text=text))

    return web_sources


if __name__ == "__main__":
    web_sources = fetch_web_sources("edgedb", limit=5)
    for source in web_sources:
        # Print a short summary of each result
        print(source.title, source.url)
One more note: this snippet comes with an extra dependency called requests, which is a library for making HTTP requests. Let's add it by running:
uv add requests
Now, we can test our web search on its own by running it like this:
python3 app/
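To see what the recursive extraction produces without touching the network, here is a minimal sketch that runs the same depth-first walk over a hand-made thread. The function name flatten_thread and the sample data are made up for illustration; only the field names (text, author, created_at, children) come from the code above.

```python
import html
from datetime import datetime


def flatten_thread(comment: dict, max_depth: int = 3, depth: int = 0) -> list[str]:
    """Depth-first walk over a nested comment dict, indenting each line by depth."""
    if not comment or depth > max_depth:
        return []
    lines = []
    if comment.get("text"):
        created = datetime.fromisoformat(comment["created_at"].replace("Z", "+00:00"))
        stamp = created.strftime("%Y-%m-%d %H:%M")
        body = html.unescape(comment["text"])
        lines.append("  " * depth + f"[{stamp}] {comment['author']}: {body}")
    for child in comment.get("children", []):
        lines.extend(flatten_thread(child, max_depth, depth + 1))
    return lines


# Hypothetical sample: the root item is the story itself and has no comment text.
sample_thread = {
    "text": None,
    "author": "op",
    "created_at": "2025-01-30T14:00:00Z",
    "children": [
        {
            "text": "Tom &amp; Jerry",
            "author": "alice",
            "created_at": "2025-01-30T14:05:00Z",
            "children": [
                {
                    "text": "Agreed",
                    "author": "bob",
                    "created_at": "2025-01-30T14:10:00Z",
                    "children": [],
                },
            ],
        }
    ],
}

for line in flatten_thread(sample_thread):
    print(line)
```

Each level of nesting adds two spaces of indentation, and HTML entities in the comment body are unescaped before the text reaches the LLM.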
It's time to reflect the new capabilities in our web app.
from .web import fetch_web_sources, WebSource
async def search_web(query: str) -> list[WebSource]:
    raw_sources = fetch_web_sources(query, limit=5)
    # Drop stories that have no comments to summarize
    return [s for s in raw_sources if s.text is not None]
Now we can update the /search endpoint as follows:
class SearchResult(BaseModel):
    response: str | None = None
    sources: list[WebSource] | None = None


@app.post("/search")
async def search(search_terms: SearchTerms) -> SearchResult:
    web_sources = await search_web(search_terms.query)
    return SearchResult(
        response=search_terms.query, sources=web_sources
    )
4. Connect to the LLM
Now that we're capable of scraping text from search results, we can forward those results to the LLM to get a nice-looking summary.
There are countless LLMs accessible via a web API, so feel free to choose whichever you prefer. In this tutorial we will roll with OpenAI, primarily for how ubiquitous it is. To keep things somewhat provider-agnostic, we're going to get completions via raw HTTP requests. Let's grab API descriptions from OpenAI's API documentation, and set up LLM generation like this:
import os

import requests
from dotenv import load_dotenv

_ = load_dotenv()


def get_llm_completion(system_prompt: str, messages: list[dict[str, str]]) -> str:
    api_key = os.getenv("OPENAI_API_KEY")
    url = ""
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}

    # Send the system prompt first, followed by the conversation messages
    response = requests.post(
        url,
        headers=headers,
        json={
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "developer", "content": system_prompt},
                *messages,
            ],
        },
    )
    response.raise_for_status()
    result = response.json()
    return result["choices"][0]["message"]["content"]
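Before spending API credits, it can help to inspect the request body on its own. The sketch below builds the same payload shape as the function above without sending anything; build_chat_payload is a hypothetical helper name, not part of any library.

```python
import json


def build_chat_payload(
    system_prompt: str,
    messages: list[dict[str, str]],
    model: str = "gpt-4o-mini",
) -> dict:
    """Assemble the JSON body: the system prompt first, then the conversation."""
    return {
        "model": model,
        "messages": [
            {"role": "developer", "content": system_prompt},
            *messages,  # copied into a new list, the caller's list is not modified
        ],
    }


payload = build_chat_payload(
    "You are a helpful assistant.",
    [{"role": "user", "content": "What is Gel?"}],
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call also makes it easier to swap providers later, since only the URL, headers and payload shape would need to change.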
Note that this cloud LLM API (and many others) requires a secret key to be set as an environment variable. A common way to manage those is to use the dotenv library in combination with a .env file. Feel free to browse the readme to learn more. Create a file called .env in the root directory and put your API key in there:
Don't forget to add the new dependency to the environment:
And now we can integrate this LLM-related code with the rest of the app. First, let's set up a function that prepares LLM inputs:
async def generate_answer(
    query: str,
    web_sources: list[WebSource],
) -> SearchResult:
    system_prompt = (
        "You are a helpful assistant that answers user's questions"
        + " by finding relevant information in Hacker News threads."
        + " When answering the question, describe conversations that people have around the subject,"
        + " provided to you as a context, or say i don't know if they are completely irrelevant."
    )

    # Put the query and every search result into a single user prompt
    prompt = f"User search query: {query}\n\nWeb search results:\n"

    for i, source in enumerate(web_sources):
        prompt += f"Result {i} (URL: {source.url}):\n"
        prompt += f"{source.text}\n\n"

    messages = [{"role": "user", "content": prompt}]

    llm_response = get_llm_completion(
        system_prompt=system_prompt,
        messages=messages,
    )

    search_result = SearchResult(
        response=llm_response,
        sources=web_sources,
    )

    return search_result
Then we can plug that function into the /search endpoint:
@app.post("/search")
async def search(search_terms: SearchTerms) -> SearchResult:
    web_sources = await search_web(search_terms.query)
    search_result = await generate_answer(search_terms.query, web_sources)
    return search_result
And now we can test the result as usual.
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{ "query": "gel" }'
5. Use Gel to implement chat history
So far we've built an application that can take in a query, fetch some Hacker News threads for it, sift through them using an LLM, and generate a nice summary.
However, right now it's hardly user-friendly since you have to speak in keywords and basically start over every time you want to refine the query. To enable a more organic multi-turn interaction, we need to add chat history and infer the query from the context of the entire conversation.
Now's a good time to introduce Gel.
In case you need installation instructions, take a look at the Quickstart. Once Gel CLI is present in your system, initialize the project like this:
gel project init --non-interactive
This command is going to put some project scaffolding inside our app, spin up a local instance of Gel, and then link the two together. From now on, all Gel-related things that happen inside our project directory are going to be automatically run on the correct database instance, no need to worry about connection incantations.
Defining the schema
The database schema in Gel is defined declaratively. The gel project init command has created a schema file, which we're going to use to define our types.
We obviously want to keep track of the messages, so we need to represent those in the schema. By convention established in the LLM space, each message is going to have a role in addition to the message content itself. We can also get Gel to automatically keep track of a message's creation time by adding a property called timestamp and setting its default value to the output of the datetime_current() function. Finally, LLM messages in our search bot have source URLs associated with them. Let's keep track of those too, by adding a multi property called sources.
type Message {
    role: str;
    body: str;
    timestamp: datetime {
        default := datetime_current();
    }
    multi sources: str;
}
Messages are grouped together into a chat, so let's add that entity to our schema too.
type Chat {
    multi messages: Message;
}
And chats all belong to a certain user, making up their chat history. One other thing we'd like to keep track of about our users is their username, and it would make sense for us to make sure that it's unique by using an exclusive constraint.
type User {
    name: str {
        constraint exclusive;
    }
    multi chats: Chat;
}
We're going to keep our schema super simple. One cool thing about Gel is that it will enable us to easily implement advanced features such as authentication or AI down the road, but we're gonna come back to that later.
For now, this is the entire schema we came up with:
module default {
    type Message {
        role: str;
        body: str;
        timestamp: datetime {
            default := datetime_current();
        }
        multi sources: str;
    }

    type Chat {
        multi messages: Message;
    }

    type User {
        name: str {
            constraint exclusive;
        }
        multi chats: Chat;
    }
}
Let's use the gel migration create CLI command, followed by gel migrate, in order to migrate to our new schema and proceed to writing some queries.
gel migration create
gel migrate
Now that our schema is applied, let's quickly populate the database with some fake data in order to be able to test the queries. We're going to explore writing queries in a bit, but for now you can just run the following command in the shell:
mkdir app/sample_data && cat << 'EOF' > app/sample_data/inserts.edgeql
# Create users first
insert User {
    name := 'alice',
};
insert User {
    name := 'bob',
};

# Insert chat histories for Alice
update User
filter .name = 'alice'
set {
    chats := {
        (insert Chat {
            messages := {
                (insert Message {
                    role := 'user',
                    body := 'What are the main differences between GPT-3 and GPT-4?',
                    timestamp := <datetime>'2024-01-07T10:00:00Z',
                    sources := {'arxiv:2303.08774', ''}
                }),
                (insert Message {
                    role := 'assistant',
                    body := 'The key differences include improved reasoning capabilities, better context understanding, and enhanced safety features...',
                    timestamp := <datetime>'2024-01-07T10:00:05Z',
                    sources := {'', 'arxiv:2303.08774'}
                })
            }
        }),
        (insert Chat {
            messages := {
                (insert Message {
                    role := 'user',
                    body := 'Can you explain what policy gradient methods are in RL?',
                    timestamp := <datetime>'2024-01-08T14:30:00Z',
                    sources := {'Sutton-Barto-RL-Book-Ch13', 'arxiv:1904.12901'}
                }),
                (insert Message {
                    role := 'assistant',
                    body := 'Policy gradient methods are a class of reinforcement learning algorithms that directly optimize the policy...',
                    timestamp := <datetime>'2024-01-08T14:30:10Z',
                    sources := {'Sutton-Barto-RL-Book-Ch13', ''}
                })
            }
        })
    }
};

# Insert chat histories for Bob
update User
filter .name = 'bob'
set {
    chats := {
        (insert Chat {
            messages := {
                (insert Message {
                    role := 'user',
                    body := 'What are the pros and cons of different sharding strategies?',
                    timestamp := <datetime>'2024-01-05T16:15:00Z',
                    sources := {'martin-kleppmann-ddia-ch6', ''}
                }),
                (insert Message {
                    role := 'assistant',
                    body := 'The main sharding strategies include range-based, hash-based, and directory-based sharding...',
                    timestamp := <datetime>'2024-01-05T16:15:08Z',
                    sources := {'martin-kleppmann-ddia-ch6', ''}
                }),
                (insert Message {
                    role := 'user',
                    body := 'Could you elaborate on hash-based sharding?',
                    timestamp := <datetime>'2024-01-05T16:16:00Z',
                    sources := {''}
                })
            }
        })
    }
};
EOF
This created the app/sample_data/inserts.edgeql file, which we can now execute using the CLI like this:
gel query -f app/sample_data/inserts.edgeql
{"id": "862de904-de39-11ef-9713-4fab09220c4a"}
{"id": "862e400c-de39-11ef-9713-2f81f2b67013"}
{"id": "862de904-de39-11ef-9713-4fab09220c4a"}
{"id": "862e400c-de39-11ef-9713-2f81f2b67013"}
The gel query command is one of many ways we can execute a query in Gel. Now that we've run it, the database contains our sample data. Let's verify it by running:
gel query "select User { name };"
{"name": "alice"}
{"name": "bob"}
Writing queries
With schema in place, it's time to focus on getting the data in and out of the database.
In this tutorial we're going to write queries using EdgeQL and then use codegen to generate type-safe functions that we can plug directly into our Python code. If you are completely unfamiliar with EdgeQL, now is a good time to check out the basics before proceeding.
Let's move on. First, we'll create a directory inside app. This is where we're going to put all of the EdgeQL-related stuff.
We're going to start by writing a query that fetches all of the users. In that directory, create a file named get_users.edgeql and put the following query in there:
select User { name };
Now run the code generator from the shell:
It's going to automatically locate the .edgeql
file and generate types for
it. We can inspect generated code in app.queries/
Once that is done, let's use those types to create the endpoint in
from gel import create_async_client
from .queries.get_users_async_edgeql import get_users as get_users_query, GetUsersResult

# The client picks up connection details from the linked project
gel_client = create_async_client()


@app.get("/users")
async def get_users() -> list[GetUsersResult]:
    return await get_users_query(gel_client)
Let's verify that it works as expected:
curl -X 'GET' \
'' \
-H 'accept: application/json'
[ { "id": "862de904-de39-11ef-9713-4fab09220c4a", "name": "alice" }, { "id": "862e400c-de39-11ef-9713-2f81f2b67013", "name": "bob" } ]
While we're at it, let's also implement the option to fetch a user by their
username. In order to do that, we need to write a new query in a separate file
select User { name }
filter .name = <str>$name;
After that, we will run the code generator again by calling gel-py. In the app, we are going to reuse the same endpoint that fetches the list of all users. From now on, if the user calls it without any arguments, they are going to receive the list of all users, same as before. But if they pass a username as a query argument (for example, bob), the system will attempt to fetch the user with that name.
In order to achieve this, we're going to need to add a Query-type argument to our endpoint function. You can learn more about how to configure this type of argument in FastAPI's docs. Its default value is going to be None, which will enable us to implement our conditional logic:
from fastapi import Query, HTTPException
from http import HTTPStatus
from .queries.get_user_by_name_async_edgeql import (
    get_user_by_name as get_user_by_name_query,
    GetUserByNameResult,
)


@app.get("/users")
async def get_users(
    username: str = Query(None),
) -> list[GetUsersResult] | GetUserByNameResult:
    """List all users or get a user by their username"""
    if username:
        user = await get_user_by_name_query(gel_client, name=username)
        if not user:
            # No match: report it as a 404 instead of returning an empty body
            raise HTTPException(
                HTTPStatus.NOT_FOUND,
                detail={"error": f"Error: user {username} does not exist."},
            )
        return user
    return await get_users_query(gel_client)
And once again, let's verify that everything works:
curl -X 'GET' \
'' \
-H 'accept: application/json'
{ "id": "862de904-de39-11ef-9713-4fab09220c4a", "name": "alice" }
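The branching in that endpoint is easy to lose among the framework details, so here is a dependency-free sketch of the same control flow against an in-memory list. UserRecord, USERS and UserNotFound are hypothetical stand-ins for the generated result types, the database and HTTPException.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class UserRecord:
    """Hypothetical stand-in for the generated result types."""
    name: str


class UserNotFound(Exception):
    """Plays the role of the 404 HTTPException in the real endpoint."""


# Hypothetical in-memory replacement for the database
USERS = [UserRecord("alice"), UserRecord("bob")]


def get_users(username: Optional[str] = None) -> Union[list[UserRecord], UserRecord]:
    """Return every user, or a single user when a username is given."""
    if username:
        matches = [user for user in USERS if user.name == username]
        if not matches:
            raise UserNotFound(f"user {username} does not exist")
        return matches[0]
    return USERS


print(get_users())       # no argument: the full list
print(get_users("bob"))  # with a username: one record
```

Note that the return type depends on the argument: a list without a username and a single record with one, which is why the endpoint's annotation is a union.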
Finally, let's also implement the option to add a new user. For this, just as before, we'll create a new file app/queries/create_user.edgeql, add a query to it and run code generation.
Note that in this query we've wrapped the insert in a select. This is a common pattern in EdgeQL that can be used whenever you would like to get something other than the object ID back from an insert.
select (
    insert User {
        name := <str>$username
    }
) {
    name
}
In order to integrate this query into our app, we're going to add a new endpoint. Note that this one has the same name /users, but is for the POST HTTP method.
from gel import ConstraintViolationError
from .queries.create_user_async_edgeql import (
    create_user as create_user_query,
    CreateUserResult,
)


@app.post("/users", status_code=HTTPStatus.CREATED)
async def post_user(username: str = Query()) -> CreateUserResult:
    try:
        return await create_user_query(gel_client, username=username)
    except ConstraintViolationError:
        # The exclusive constraint on User.name rejected a duplicate
        raise HTTPException(
            status_code=HTTPStatus.BAD_REQUEST,
            detail={"error": f"Username '{username}' already exists."},
        )
Once more, let's verify that the new endpoint works as expected:
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-d ''
{ "id": "20372a1a-ded5-11ef-9a08-b329b578c45c", "name": "charlie" }
This wraps things up for our user-related functionality. Of course, we now need to deal with Chats and Messages, too. We're not going to go in depth for those, since the process would be quite similar to what we've just done. Instead, feel free to implement those endpoints yourself as an exercise, or copy the code below if you are in a rush.
echo 'select Chat {
    messages: { role, body, sources },
    user := .<chats[is User],
} filter .user.name = <str>$username;' > app/queries/get_chats.edgeql && echo 'select Chat {
    messages: { role, body, sources },
    user := .<chats[is User],
} filter .user.name = <str>$username and .id = <uuid>$chat_id;' > app/queries/get_chat_by_id.edgeql && echo 'with new_chat := (insert Chat)
select (
    update User filter .name = <str>$username
    set {
        chats := assert_distinct(.chats union new_chat)
    }
) {
    new_chat_id := new_chat.id
}' > app/queries/create_chat.edgeql && echo 'with
    user := (select User filter .name = <str>$username),
    chat := (
        select Chat filter .<chats[is User] = user and .id = <uuid>$chat_id
    )
select Message {
    role,
    body,
    sources,
    chat := .<messages[is Chat]
} filter .chat = chat;' > app/queries/get_messages.edgeql && echo 'with
    user := (select User filter .name = <str>$username),
update Chat
filter .id = <uuid>$chat_id and .<chats[is User] = user
set {
    messages := assert_distinct(.messages union (
        insert Message {
            role := <str>$message_role,
            body := <str>$message_body,
            sources := array_unpack(<array<str>>$sources)
        }
    ))
}' > app/queries/add_message.edgeql
And these are the endpoint definitions, provided in bulk.
from .queries.get_chats_async_edgeql import get_chats as get_chats_query, GetChatsResult
from .queries.get_chat_by_id_async_edgeql import (
    get_chat_by_id as get_chat_by_id_query,
    GetChatByIdResult,
)
from .queries.get_messages_async_edgeql import (
    get_messages as get_messages_query,
    GetMessagesResult,
)
from .queries.create_chat_async_edgeql import (
    create_chat as create_chat_query,
    CreateChatResult,
)
from .queries.add_message_async_edgeql import (
    add_message as add_message_query,
)


@app.get("/chats")
async def get_chats(
    username: str = Query(), chat_id: str = Query(None)
) -> list[GetChatsResult] | GetChatByIdResult:
    """List user's chats or get a chat by username and id"""
    if chat_id:
        chat = await get_chat_by_id_query(
            gel_client, username=username, chat_id=chat_id
        )
        if not chat:
            raise HTTPException(
                HTTPStatus.NOT_FOUND,
                detail={"error": f"Chat {chat_id} for user {username} does not exist."},
            )
        return chat
    return await get_chats_query(gel_client, username=username)


@app.post("/chats", status_code=HTTPStatus.CREATED)
async def post_chat(username: str) -> CreateChatResult:
    return await create_chat_query(gel_client, username=username)


@app.get("/messages")
async def get_messages(
    username: str = Query(), chat_id: str = Query()
) -> list[GetMessagesResult]:
    """Fetch all messages from a chat"""
    return await get_messages_query(gel_client, username=username, chat_id=chat_id)
For the post_messages function we're going to do something a little bit different though. Since this is now the primary way for the user to add their queries to the system, it functionally supersedes the /search endpoint we made before. To this end, this function is where we're going to handle saving messages, retrieving chat history, invoking web search and generating the answer.
@app.post("/messages", status_code=HTTPStatus.CREATED)
async def post_messages(
    search_terms: SearchTerms,
    username: str = Query(),
    chat_id: str = Query(),
) -> SearchResult:
    # Fetch the history before saving the new message,
    # so the new message is not included in it twice
    chat_history = await get_messages_query(
        gel_client, username=username, chat_id=chat_id
    )

    _ = await add_message_query(
        gel_client,
        username=username,
        message_role="user",
        message_body=search_terms.query,
        sources=[],
        chat_id=chat_id,
    )

    search_query = search_terms.query
    web_sources = await search_web(search_query)

    search_result = await generate_answer(
        search_terms.query, chat_history, web_sources
    )

    # The sources property stores strings, so save the URLs only
    _ = await add_message_query(
        gel_client,
        username=username,
        message_role="assistant",
        message_body=search_result.response,
        sources=[s.url for s in (search_result.sources or []) if s.url],
        chat_id=chat_id,
    )

    return search_result
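Let's not forget to modify the generate_answer function, so it can also be aware of the chat history:
async def generate_answer(
    query: str,
    chat_history: list[GetMessagesResult],
    web_sources: list[WebSource],
) -> SearchResult:
    system_prompt = (
        "You are a helpful assistant that answers user's questions"
        + " by finding relevant information in Hacker News threads."
        + " When answering the question, describe conversations that people have around the subject,"
        + " provided to you as a context, or say i don't know if they are completely irrelevant."
    )

    prompt = f"User search query: {query}\n\nWeb search results:\n"

    for i, source in enumerate(web_sources):
        prompt += f"Result {i} (URL: {source.url}):\n"
        prompt += f"{source.text}\n\n"

    # Replay the stored conversation, then append the new prompt
    messages = [
        {"role": message.role, "content": message.body} for message in chat_history
    ]
    messages.append({"role": "user", "content": prompt})

    llm_response = get_llm_completion(
        system_prompt=system_prompt,
        messages=messages,
    )

    search_result = SearchResult(
        response=llm_response,
        sources=web_sources,
    )

    return search_result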
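As an illustration of what the LLM ends up receiving, the sketch below replays a stored two-message chat into the messages list and appends the new prompt. StoredMessage and to_llm_messages are hypothetical names; StoredMessage stands in for the generated GetMessagesResult type.

```python
from dataclasses import dataclass


@dataclass
class StoredMessage:
    """Hypothetical stand-in for a message row fetched from Gel."""
    role: str
    body: str


def to_llm_messages(history: list[StoredMessage], prompt: str) -> list[dict[str, str]]:
    """Replay the stored chat in order, then append the new user prompt."""
    messages = [{"role": m.role, "content": m.body} for m in history]
    messages.append({"role": "user", "content": prompt})
    return messages


history = [
    StoredMessage("user", "best database in existence"),
    StoredMessage("assistant", "People on Hacker News mention several options."),
]

for message in to_llm_messages(history, "User search query: gel"):
    print(message["role"], "->", message["content"])
```

The roles stored in the database are passed through unchanged, which works because the schema uses the same role names the chat API expects.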
Ok, this should be it for setting up the chat history. Let's test it. First, we are going to start a new chat for our user:
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-d ''
{ "id": "20372a1a-ded5-11ef-9a08-b329b578c45c", "new_chat_id": "544ef3f2-ded8-11ef-ba16-f7f254b95e36" }
Next, let's add a couple messages and wait for the bot to respond:
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "best database in existence" }'
curl -X 'POST' \
'' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "gel" }'
Finally, let's check that the messages we saw are in fact stored in the chat history:
curl -X 'GET' \
'' \
-H 'accept: application/json'
In reality this workflow would be handled by the frontend, providing the user with a nice interface to interact with. But even without one, our chatbot is almost functional by now.
Generating a Google search query
Congratulations! We just got done implementing multi-turn conversations for our search bot.
However, there's still one crucial piece missing. Right now we're simply forwarding the user's message straight to the full-text search. But what happens if their message is a follow-up that cannot be used as a standalone search query?
Ideally, we should infer the search query from the entire conversation, and use that to perform the search.
Let's implement an extra step in which the LLM is going to produce a query for us based on the entire chat history. That way we can be sure we're progressively working on our query rather than rewriting it from scratch every time.
This is what we need to do: every time the user submits a message, we need to fetch the chat history and extract a search query from it using the LLM; the other steps are going to be the same as before. Let's make the following modifications: first we need to create a function that prepares LLM inputs for the search query inference.
async def generate_search_query(
    query: str, message_history: list[GetMessagesResult]
) -> str:
    system_prompt = (
        "You are a helpful assistant."
        + " Your job is to extract a keyword search query"
        + " from a chat between an AI and a human."
        + " Make sure it's a single most relevant keyword to maximize matching."
        + " Only provide the query itself as your response."
    )

    # One line per stored message, separated by a visible divider
    formatted_history = "\n---\n".join(
        f"{message.role}: {message.body} (sources: {message.sources})"
        for message in message_history
    )
    prompt = f"Chat history: {formatted_history}\n\nUser message: {query} \n\n"

    llm_response = get_llm_completion(
        system_prompt=system_prompt, messages=[{"role": "user", "content": prompt}]
    )

    return llm_response
And now we can use this function in post_messages in order to get our search query. First, SearchResult gets a new field:
class SearchResult(BaseModel):
    response: str | None = None
    search_query: str | None = None
    sources: list[WebSource] | None = None
Then, inside post_messages, the search step and the answer step change as follows:
    # 3. Generate a query and perform googling
    search_query = await generate_search_query(search_terms.query, chat_history)
    web_sources = await search_web(search_query)

    # 5. Generate answer
    search_result = await generate_answer(
        search_terms.query,
        chat_history,
        web_sources,
    )
    # Add the search query to the output to see what the bot is searching for
    search_result.search_query = search_query

    # 6. Add LLM response to Gel (same as before)
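To get a feel for the text the query-extraction prompt contains, here is a small standalone sketch of the history formatting step. StoredMessage, format_history and build_query_prompt are hypothetical names, and the sample messages are made up.

```python
from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    """Hypothetical stand-in for a message row fetched from Gel."""
    role: str
    body: str
    sources: list[str] = field(default_factory=list)


def format_history(history: list[StoredMessage]) -> str:
    """Render one line per message, separated by a '---' divider."""
    return "\n---\n".join(
        f"{m.role}: {m.body} (sources: {m.sources})" for m in history
    )


def build_query_prompt(query: str, history: list[StoredMessage]) -> str:
    """Combine the rendered history with the newest user message."""
    return f"Chat history: {format_history(history)}\n\nUser message: {query} \n\n"


history = [
    StoredMessage("user", "what are people saying about gel"),
    StoredMessage("assistant", "Mostly positive.", ["u1"]),
]
print(build_query_prompt("do they like it or not", history))
```

With the earlier turns in the prompt, a follow-up such as "do they like it or not" gives the model enough context to work out what "it" refers to.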
Done! We've now fully integrated the chat history into our app and enabled natural language conversations. As before, let's quickly test out the improvements before moving on:
curl -X 'POST' \
'http://localhost:8000/messages?username=alice&chat_id=d4eed420-e903-11ef-b8a7-8718abdafbe1' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "what are people saying about gel" }'
curl -X 'POST' \
'http://localhost:8000/messages?username=alice&chat_id=d4eed420-e903-11ef-b8a7-8718abdafbe1' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "do they like it or not" }'
6. Use Gel's advanced features to create a RAG
At this point we have a decent search bot that can refine a search query over multiple turns of a conversation.
It's time to add the final touch: we can make the bot remember previous similar interactions with the user using retrieval-augmented generation (RAG).
To achieve this we need to implement similarity search across message history: we're going to create a vector embedding for every message in the database using a neural network. Every time we generate a Google search query, we're also going to use it to search for similar messages in the user's message history, and inject the corresponding chat into the prompt. That way the search bot will be able to quickly "remember" similar interactions with the user and use them to understand what they are looking for.
Gel enables us to implement such a system with only minor modifications to the schema.
We begin by enabling the ai extension by adding the following line at the top of the dbschema/default.gel file:
using extension ai;
… and do the migration:
gel migration create
gel migrate
Next, we need to configure the API key in Gel for whatever embedding provider we're going to be using. As per the documentation, let's open up the Gel CLI and run the following command (assuming we're using OpenAI):
configure current database
insert ext::ai::OpenAIProviderConfig {
    secret := 'sk-....',
};
OK: CONFIGURE DATABASE
In order to get Gel to automatically keep track of creating and updating message embeddings, all we need to do is create a deferred index like this. Don't forget to run a migration one more time!
type Message {
    role: str;
    body: str;
    timestamp: datetime {
        default := datetime_current();
    }
    multi sources: str;

    deferred index ext::ai::index(embedding_model := 'text-embedding-3-small')
        on (.body);
}
And we're done! Gel is going to cook in the background for a while and generate embedding vectors for our messages. To make sure nothing broke, we can follow Gel's AI documentation and take a look at the instance logs:
gel instance logs -I searchbot | grep
INFO 50121 searchbot 2025-01-30T14:39:53.364 httpx: HTTP Request: POST "HTTP/1.1 200 OK"
It's time to create the second half of the similarity search: the search query.
The query needs to fetch the k chats containing the messages that are most similar to our current message. This can be a little difficult to visualize in your head, so here's the query itself:
with
    user := (select User filter .name = <str>$username),
    chats := (
        select Chat
        filter .<chats[is User] = user
               and .id != <uuid>$current_chat_id
    )

select chats {
    distance := min(
        ext::ai::search(
            .messages,
            <array<float32>>$embedding,
        ).distance,
    ),
    messages: {
        role, body, sources
    }
}

order by .distance
limit <int64>$limit;
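If the query is still hard to picture, the same ranking logic can be sketched in plain Python: score every other chat by the distance of its closest message to the query embedding, then keep the nearest ones. This is only an illustration. The cosine distance metric, the toy two-dimensional vectors and the function names are assumptions made for the example, and Gel performs the real search inside the database.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search_chats(
    chats: dict[str, list[list[float]]],
    query_embedding: list[float],
    current_chat_id: str,
    limit: int,
) -> list[str]:
    """Rank chats by their single closest message, skipping the current chat."""
    scored = []
    for chat_id, message_embeddings in chats.items():
        if chat_id == current_chat_id or not message_embeddings:
            continue
        # A chat is as relevant as its best-matching message
        distance = min(cosine_distance(e, query_embedding) for e in message_embeddings)
        scored.append((distance, chat_id))
    scored.sort()
    return [chat_id for _, chat_id in scored[:limit]]


# Toy data: chat id -> embeddings of its messages (made-up 2D vectors)
chats = {
    "chat-a": [[1.0, 0.0], [0.0, 1.0]],
    "chat-b": [[0.7, 0.7]],
    "chat-c": [[-1.0, 0.0]],
}
print(search_chats(chats, [1.0, 0.1], current_chat_id="chat-c", limit=2))
```

Taking the minimum over a chat's messages means a single highly relevant message is enough to surface the whole chat, even if the rest of the conversation drifted elsewhere.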
Before we can integrate this query into our Python app, we also need to add a new dependency for the Python binding: httpx-sse. It enables streaming outputs, which we're not going to use right now, but we won't be able to create the AI client without it.
Let's place the query in app/queries/search_chats.edgeql, run the codegen and modify our post_messages endpoint to keep track of those similar chats.
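from import create_async_ai, AsyncEdgeDBAI

from .queries.search_chats_async_edgeql import (
    search_chats as search_chats_query,
)


class SearchResult(BaseModel):
    response: str | None = None
    search_query: str | None = None
    sources: list[WebSource] | None = None
    similar_chats: list[str] | None = None


@app.post("/messages", status_code=HTTPStatus.CREATED)
async def post_messages(
    search_terms: SearchTerms,
    username: str = Query(),
    chat_id: str = Query(),
) -> SearchResult:
    # 1. Fetch chat history
    chat_history = await get_messages_query(
        gel_client, username=username, chat_id=chat_id
    )

    # 2. Add incoming message to Gel
    _ = await add_message_query(
        gel_client,
        username=username,
        message_role="user",
        message_body=search_terms.query,
        sources=[],
        chat_id=chat_id,
    )

    # 3. Generate a query and perform googling
    search_query = await generate_search_query(search_terms.query, chat_history)
    web_sources = await search_web(search_query)

    # 4. Fetch similar chats
    db_ai: AsyncEdgeDBAI = await create_async_ai(gel_client, model="gpt-4o-mini")
    embedding = await db_ai.generate_embeddings(
        search_query, model="text-embedding-3-small"
    )
    similar_chats = await search_chats_query(
        gel_client,
        username=username,
        current_chat_id=chat_id,
        embedding=embedding,
        limit=1,
    )

    # 5. Generate answer
    search_result = await generate_answer(
        search_terms.query,
        chat_history,
        web_sources,
        similar_chats,
    )
    # Add the search query to the output to see what the bot is searching for
    search_result.search_query = search_query

    # 6. Add LLM response to Gel (the sources property stores strings, so save URLs only)
    _ = await add_message_query(
        gel_client,
        username=username,
        message_role="assistant",
        message_body=search_result.response,
        sources=[s.url for s in (search_result.sources or []) if s.url],
        chat_id=chat_id,
    )

    return search_result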
Finally, the answer generator needs to get updated one more time, since we need to inject the additional messages into the prompt.
async def generate_answer(
    query: str,
    chat_history: list[GetMessagesResult],
    web_sources: list[WebSource],
    similar_chats: list[list[GetMessagesResult]],
) -> SearchResult:
    system_prompt = (
        "You are a helpful assistant that answers user's questions"
        + " by finding relevant information in HackerNews threads."
        + " When answering the question, describe conversations that people have around the subject, provided to you as a context, or say i don't know if they are completely irrelevant."
        + " You can reference previous conversation with the user that"
        + " are provided to you, if they are relevant, by explicitly referring"
        + " to them by saying as we discussed in the past."
    )

    prompt = f"User search query: {query}\n\nWeb search results:\n"

    for i, source in enumerate(web_sources):
        prompt += f"Result {i} (URL: {source.url}):\n"
        prompt += f"{source.text}\n\n"

    # Append the similar chats retrieved from Gel to the prompt
    prompt += "Similar chats with the same user:\n"

    formatted_chats = []
    for i, chat in enumerate(similar_chats):
        formatted_chat = f"Chat {i}: \n"
        for message in chat.messages:
            formatted_chat += f"{message.role}: {message.body}\n"
        formatted_chats.append(formatted_chat)

    prompt += "\n".join(formatted_chats)

    messages = [
        {"role": message.role, "content": message.body} for message in chat_history
    ]
    messages.append({"role": "user", "content": prompt})

    llm_response = get_llm_completion(
        system_prompt=system_prompt,
        messages=messages,
    )

    search_result = SearchResult(
        response=llm_response,
        sources=web_sources,
        similar_chats=formatted_chats,
    )

    return search_result
And one last time, let's check to make sure everything works:
curl -X 'POST' \
'http://localhost:8000/messages?username=alice&chat_id=d4eed420-e903-11ef-b8a7-8718abdafbe1' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"query": "remember that cool db i was talking to you about?" }'