## HTTP API

All Gel server HTTP endpoints require authentication (added in version 4.0), for example HTTP Basic Authentication with a Gel username and password.
### Embeddings

**POST**: `https://<gel-host>:<port>/branch/<branch-name>/ai/embeddings`

Generates text embeddings using the specified embeddings model.
#### Request body

- `inputs` (array of strings, or a single string, required): The text items to use as the basis for embeddings generation.
- `model` (string, required): The name of the embedding model to use. You may use any of the supported embedding models.
- `dimensions` (number, optional): The number of dimensions to truncate to.
- `user` (string, optional): A user identifier for the request.
#### Example request

```sh
$ curl --user <username>:<password> --json '{
    "inputs": ["What color is the sky on Mars?"],
    "model": "text-embedding-3-small"
  }' http://localhost:10931/branch/main/ai/embeddings
```
#### Response

- HTTP status: 200 OK
- Content-Type: application/json
- Body:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.009434271, 0.009137661]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
The `embedding` property is shown here with only two values for brevity, but an actual response would contain many more values.
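
The endpoint can be called from any HTTP client. As a minimal sketch, here is the same request issued from Python with the third-party `requests` library; the host, port, branch, and credentials are placeholders for your own instance:

```python
# Minimal sketch: generate an embedding over HTTP with Basic Authentication.
# Host, port, branch, and credentials below are placeholders.
import requests

resp = requests.post(
    "http://localhost:10931/branch/main/ai/embeddings",
    auth=("<username>", "<password>"),  # HTTP Basic Authentication
    json={
        "inputs": ["What color is the sky on Mars?"],
        "model": "text-embedding-3-small",
    },
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(f"{len(vector)}-dimensional embedding")
```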
### RAG

**POST**: `https://<gel-host>:<port>/branch/<branch-name>/ai/rag`

Performs retrieval-augmented text generation using the specified model, based on the provided text query and on database content selected via similarity search.
#### Request body

- `context` (object, required): Settings that define the context of the query.
  - `query` (string, required): Specifies an expression to determine the relevant objects and index to serve as context for text generation. You may set this to any expression that produces a set of objects, even if it is not a standalone query.
  - `variables` (object, optional): A dictionary of variables for use in the context query.
  - `globals` (object, optional): A dictionary of globals for use in the context query.
  - `max_object_count` (number, optional): Maximum number of objects to retrieve; defaults to 5.
- `model` (string, required): The name of the text generation model to use.
- `query` (string, required): The query string used as the basis for text generation.
- `stream` (boolean, optional): Specifies whether the response should be streamed. Defaults to `false`.
- `prompt` (object, optional): Settings that define a prompt. Omit to use the default prompt.
  - `name` (string, optional): Name of a predefined prompt.
  - `id` (string, optional): ID of a predefined prompt.
  - `custom` (array of objects, optional): Custom prompt messages, each containing a `role` and `content`. If no `name` or `id` was provided, the custom messages provided here become the prompt. If one of those was provided, these messages will be added to that existing prompt (see the example request body after this list).
    - `role` (string): `"system"`, `"user"`, `"assistant"`, or `"tool"`.
    - `content` (string | array): Content of the message.
      - For `role: "system"`: must be a string.
      - For `role: "user"`: must be an array of content blocks, e.g. `[{"type": "text", "text": "..."}]`.
      - For `role: "assistant"`: must be a string (the assistant's text response). May optionally include `tool_calls`.
      - For `role: "tool"`: must be a string (the result of the tool call). Requires `tool_call_id`.
    - `tool_call_id` (string, optional): Identifier for the tool call whose result this message represents (required if `role: "tool"`).
    - `tool_calls` (array, optional): Array of tool calls requested by the assistant (used if `role: "assistant"`). Each object should follow the format `{"id": "...", "type": "function", "function": {"name": "...", "arguments": "..."}}`. Arguments should be a JSON string.
- `temperature` (number, optional): Sampling temperature.
- `top_p` (number, optional): Nucleus sampling parameter.
- `max_tokens` (number, optional): Maximum tokens to generate.
- `seed` (number, optional): Random seed.
- `safe_prompt` (boolean, optional): Enable safety features.
- `top_k` (number, optional): Top-k sampling parameter.
- `logit_bias` (object, optional): Token biasing.
- `logprobs` (number, optional): Return token log probabilities.
- `user` (string, optional): User identifier.
- `tools` (array, optional): A list of tools the model may call. Each tool has a `type` (`"function"`) and a `function` object with `name`, `description` (optional), and `parameters` (a JSON schema). Example: `[{"type": "function", "function": {"name": "get_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}]`
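
To illustrate the nested `prompt` structure, here is a sketch of a request body that extends a predefined prompt with an additional custom user message. The prompt `name` and model shown are illustrative placeholders; substitute values configured in your own instance:

```json
{
  "query": "What color is the sky on Mars?",
  "model": "gpt-4-turbo-preview",
  "context": {"query": "Knowledge"},
  "prompt": {
    "name": "builtin::rag-default",
    "custom": [
      {
        "role": "user",
        "content": [{"type": "text", "text": "Answer in a single sentence."}]
      }
    ]
  }
}
```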
#### Example request

```sh
$ curl --user <username>:<password> --json '{
    "query": "What color is the sky on Mars?",
    "model": "gpt-4-turbo-preview",
    "context": {"query": "Knowledge"}
  }' http://<gel-host>:<port>/branch/main/ai/rag
```
#### Response

- HTTP status: 200 OK
- Content-Type: application/json
- Body: A JSON object containing the RAG response details.

```json
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "model": "gpt-4-turbo-preview",
  "text": "The sky on Mars typically appears butterscotch or reddish due to the fine dust particles suspended in the atmosphere.",
  "finish_reason": "stop",
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 30,
    "total_tokens": 80
  },
  "logprobs": null,
  "tool_calls": null
}
```
- `id` (string): Unique identifier for the chat completion.
- `model` (string): The model used for the chat completion.
- `text` (string | null): The main text content of the response message.
- `finish_reason` (string | null): The reason the model stopped generating tokens (e.g., `"stop"`, `"length"`, `"tool_calls"`).
- `usage` (object | null): Token usage statistics for the request.
- `logprobs` (object | null): Log probability information for the generated tokens (if requested).
- `tool_calls` (array | null): Any tool calls requested by the model. Each element contains `id`, `type` (`"function"`), `name`, and `args` (a parsed JSON object).
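
As a sketch of consuming this response from Python (again using the third-party `requests` library; endpoint, credentials, and model are placeholders), note that `text` can be `null` when the model requests tool calls instead:

```python
# Minimal sketch: issue a non-streaming RAG request and handle the result.
# Endpoint, credentials, and model below are placeholders for your instance.
import requests

resp = requests.post(
    "http://localhost:10931/branch/main/ai/rag",
    auth=("<username>", "<password>"),  # HTTP Basic Authentication
    json={
        "query": "What color is the sky on Mars?",
        "model": "gpt-4-turbo-preview",
        "context": {"query": "Knowledge"},
    },
)
resp.raise_for_status()
body = resp.json()
if body["tool_calls"]:
    # The model asked for tools to be run; "args" is a parsed JSON object.
    for call in body["tool_calls"]:
        print(f"tool requested: {call['name']}({call['args']})")
else:
    print(body["text"])
```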
#### Error response

- HTTP status: 400 Bad Request
- Content-Type: application/json
- Body:

```json
{
  "message": "missing required 'query' in request 'context' object",
  "type": "BadRequestError"
}
```
#### Streaming response (SSE)

When the `stream` parameter is set to `true`, the server uses Server-Sent Events (SSE) to stream responses. Here is a detailed breakdown of the typical sequence and structure of events in a streaming response:

- HTTP status: 200 OK
- Content-Type: text/event-stream
- Cache-Control: no-cache

The stream consists of a sequence of six event types, each encapsulating part of the response in a structured format:
1. **Message start**

   - Event type: `message_start`
   - Data: Starts a message, specifying identifiers, roles, and initial usage.

   ```json
   {
     "type": "message_start",
     "message": {
       "id": "<message_id>",
       "role": "assistant",
       "model": "<model_name>",
       "usage": {"prompt_tokens": 10}
     }
   }
   ```

2. **Content block start**

   - Event type: `content_block_start`
   - Data: Marks the beginning of a new content block (either text or a tool call).

   ```json
   {
     "type": "content_block_start",
     "index": 0,
     "content_block": {"type": "text", "text": ""}
   }
   ```

   Or for a tool call:

   ```json
   {
     "type": "content_block_start",
     "index": 0,
     "content_block": {
       "id": "<tool_call_id>",
       "type": "tool_use",
       "name": "<function_name>",
       "args": "{..."
     }
   }
   ```

3. **Content block delta**

   - Event type: `content_block_delta`
   - Data: Incrementally updates the content, appending more text or tool arguments. Includes `logprobs` if requested.

   ```json
   {
     "type": "content_block_delta",
     "index": 0,
     "delta": {"type": "text_delta", "text": "The"},
     "logprobs": null
   }
   ```

   Or for tool arguments:

   ```json
   {
     "type": "content_block_delta",
     "index": 0,
     "delta": {"type": "tool_call_delta", "args": "{\"location"}
   }
   ```

   Subsequent `content_block_delta` events add more text or arguments to the message.

4. **Content block stop**

   - Event type: `content_block_stop`
   - Data: Marks the end of a content block.

   ```json
   {"type": "content_block_stop", "index": 0}
   ```

5. **Message delta**

   - Event type: `message_delta`
   - Data: Provides final message-level updates, such as the stop reason and final usage statistics.

   ```json
   {
     "type": "message_delta",
     "delta": {"stop_reason": "stop"},
     "usage": {"prompt_tokens": 10}
   }
   ```

6. **Message stop**

   - Event type: `message_stop`
   - Data: Marks the end of the message.

   ```json
   {"type": "message_stop"}
   ```
Each event is sent as a separate SSE message, formatted as shown above. The connection is closed after all events are sent, signaling the end of the stream.
**Example SSE response:**

```
event: message_start
data: {"type": "message_start", "message": {"id": "chatcmpl-9MzuQiF0SxUjFLRjIdT3mTVaMWwiv", "role": "assistant", "model": "gpt-4-0125-preview", "usage": {"prompt_tokens": 10}}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "The"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " skies"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " on"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " Mars"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " are"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " red"}, "logprobs": null}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "."}, "logprobs": null}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "stop"}, "usage": {"completion_tokens": 7, "total_tokens": 17}}

event: message_stop
data: {"type": "message_stop"}
```