OpenAI-compatible API

The OpenAI-compatible API lets Pro and Ultra workspaces chat with Pegasus bots from tools that already support OpenAI chat completions.

Use the base URL https://<your-pegasus-host>/api/v1 and an existing Personal Access Token (PAT) as the OpenAI api_key. The token is sent in the HTTP header Authorization: Bearer pat_... on every request.

Before you start:

Create a PAT in API tokens & scopes.
Store the PAT safely; see Secure your token.
Give the token messages:send for chat completions and models:read for model discovery.
Use a workspace with API access enabled. Free workspaces cannot use this API.

Responses and errors are raw OpenAI-shaped JSON. They are not wrapped in the standard Pegasus success or error envelope.

Quickstart

Set PEGASUS_API_KEY to a PAT such as pat_..., then point the official SDK or curl at your Pegasus host.

# curl: send one non-streaming chat completion request
curl https://<your-pegasus-host>/api/v1/chat/completions \
  -H "Authorization: Bearer $PEGASUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-bot-id>",
    "messages": [
      { "role": "user", "content": "Summarize our refund policy." }
    ]
  }'

# Python (official openai SDK)
import os
from openai import OpenAI

# Point the SDK at Pegasus and authenticate with the PAT.
client = OpenAI(
    base_url="https://<your-pegasus-host>/api/v1",
    api_key=os.environ["PEGASUS_API_KEY"],
)

# The model value is the bot id, optionally followed by :<llm_alias>.
completion = client.chat.completions.create(
    model="<your-bot-id>",
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)

print(completion.choices[0].message.content)

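// JavaScript (official openai SDK, ES module with top-level await)
import OpenAI from 'openai'

// Point the SDK at Pegasus and authenticate with the PAT.
const client = new OpenAI({
    baseURL: 'https://<your-pegasus-host>/api/v1',
    apiKey: process.env.PEGASUS_API_KEY,
})

// The model value is the bot id, optionally followed by :<llm_alias>.
const completion = await client.chat.completions.create({
    model: '<your-bot-id>',
    messages: [{ role: 'user', content: 'Summarize our refund policy.' }],
})

console.log(completion.choices[0].message.content)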

Model strings and discovery

The model value selects the bot and, optionally, the LLM alias.

Form	Meaning
`<bot_id>`	Use the bot with the plan's default LLM route.
`<bot_id>:<llm_alias>`	Use the bot with a selectable alias from the LLM routing catalog.
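
The two forms in the table can be split and rebuilt with a small helper. This is a minimal sketch, not part of the Pegasus API: the names ModelRef, parse_model, and format_model are hypothetical, the ids bot_123 and fast are made-up examples, and the sketch assumes bot ids never contain a colon.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    """Hypothetical container for the two parts of a model string."""
    bot_id: str
    llm_alias: Optional[str]  # None means "use the plan's default LLM route"


def parse_model(model: str) -> ModelRef:
    """Split '<bot_id>' or '<bot_id>:<llm_alias>' into its parts.

    Assumption: the first colon separates the bot id from the alias.
    """
    bot_id, sep, alias = model.partition(":")
    if not bot_id or (sep and not alias):
        raise ValueError(f"malformed model string: {model!r}")
    return ModelRef(bot_id, alias or None)


def format_model(bot_id: str, llm_alias: Optional[str] = None) -> str:
    """Build the model string to send in a chat completion request."""
    return f"{bot_id}:{llm_alias}" if llm_alias else bot_id
```

Rejecting obviously malformed strings on the client only saves a round trip; the server remains the authority and answers unknown or malformed ids with UNKNOWN_MODEL.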

Use GET /api/v1/models with the models:read scope to discover what the token can use. The response is an OpenAI list object containing model ids such as <bot_id> and <bot_id>:<llm_alias>.

GET /api/v1/models/{id} returns one OpenAI model object for an id that appears in the list. Unknown, inaccessible, non-ready, malformed, or above-plan model ids return 404 with error.code set to UNKNOWN_MODEL.

Only READY bots are listed. Aliases are filtered by the caller's plan, so a lower-tier plan does not see aliases it cannot select. Free workspaces cannot use the API at all.
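
As a rough illustration of model discovery without an SDK, the sketch below builds the GET /api/v1/models request with the standard library and groups the returned ids by bot. The helper names, the host pegasus.example.com, and the sample ids are hypothetical; the response shape assumed here is the usual OpenAI list object with a data array of model objects.

```python
import urllib.request
from typing import Dict, List


def build_models_request(host: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the model discovery request."""
    return urllib.request.Request(
        f"https://{host}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> List[str]:
    """Pull the ids out of an OpenAI-style list object."""
    return [item["id"] for item in payload.get("data", [])]


def aliases_by_bot(model_ids: List[str]) -> Dict[str, List[str]]:
    """Group '<bot_id>:<llm_alias>' ids under their bot id.

    Assumption: bot ids do not contain a colon.
    """
    grouped: Dict[str, List[str]] = {}
    for model_id in model_ids:
        bot_id, _, alias = model_id.partition(":")
        aliases = grouped.setdefault(bot_id, [])
        if alias:
            aliases.append(alias)
    return grouped
```

To send the request, pass the result of build_models_request to urllib.request.urlopen and decode the body with json.load before calling extract_model_ids.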

Streaming

Set stream: true to receive OpenAI chat.completion.chunk frames. With the official JavaScript SDK, iterate over the stream:

const stream = await client.chat.completions.create({
    model: '<your-bot-id>:<llm_alias>',
    messages: [{ role: 'user', content: 'Draft a short welcome message.' }],
    stream: true,
    stream_options: { include_usage: true },
})
 
for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content
    if (text) {
        process.stdout.write(text)
    }
 
    if (chunk.usage) {
        console.log('\nusage', chunk.usage)
    }
}

The stream uses Server-Sent Events. Normal streams end with data: [DONE]. When stream_options.include_usage is true, Pegasus sends one final usage chunk before [DONE]; that chunk has choices: [] and a usage object.

If an error happens before output starts, Pegasus returns a normal HTTP error response with the OpenAI error envelope. If an upstream error happens after output has started, Pegasus sends one SSE frame shaped as data: {"error":...} and closes the stream without data: [DONE].
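
For clients that read the SSE stream directly instead of using an SDK, the three frame kinds described above (content chunks, the optional usage chunk, and a mid-stream error frame) could be handled roughly as follows. This is a sketch under assumptions: StreamResult and consume_sse are hypothetical names, the input is an iterable of already-decoded text lines, and each data: payload fits on one line.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamResult:
    text: str = ""                 # concatenated delta.content pieces
    usage: Optional[dict] = None   # set by the final usage chunk, if requested
    error: Optional[dict] = None   # set if a data: {"error": ...} frame arrived
    done: bool = False             # True only if data: [DONE] was seen


def consume_sse(lines: Iterable[str]) -> StreamResult:
    """Fold a sequence of SSE lines into one result."""
    result = StreamResult()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            result.done = True
            break
        frame = json.loads(data)
        if "error" in frame:
            # Mid-stream failure: the stream closes without [DONE].
            result.error = frame["error"]
            break
        if frame.get("usage"):
            result.usage = frame["usage"]  # usage chunk has choices: []
        for choice in frame.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                result.text += piece
    return result
```

A result with done set to False and error set to None means the connection dropped before either terminator arrived, so the text collected so far should be treated as incomplete.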

Errors

Errors use this raw OpenAI envelope:

{
    "error": {
        "message": "Daily credit limit reached. Resets at midnight UTC.",
        "type": "insufficient_quota",
        "param": null,
        "code": "DAILY_LIMIT_REACHED"
    }
}

error.code carries the stable Pegasus error code. error.message is human-readable and can change.

HTTP	OpenAI `error.type`	Representative Pegasus `error.code`
400	`invalid_request_error`	`VALIDATION_ERROR`
401	`authentication_error`	`PAT_INVALID`, `PAT_INVALID_OR_EXPIRED`
403	`permission_error`	`PAT_SCOPE_INSUFFICIENT`, `PLAN_LIMIT_EXCEEDED`, `PAT_NOT_ALLOWED`, `BOT_NOT_READY`, `TEAM_QUOTA_DISABLED`, `TEAM_SEAT_FEATURE_UNAVAILABLE`
404	`not_found_error`	`UNKNOWN_MODEL`
429	`rate_limit_error`	`RATE_LIMITED`
429	`insufficient_quota`	`DAILY_LIMIT_REACHED`
5xx	`server_error`	`AI_SERVICE_UNAVAILABLE`, `INTERNAL_ERROR`
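
Because RATE_LIMITED and DAILY_LIMIT_REACHED share HTTP 429, a client that wants to retry only the former has to branch on error.code rather than on the status alone. The sketch below shows one way to do that. PegasusApiError, parse_error, and is_retryable are hypothetical names, and the retry policy itself is an assumption of this example, not documented Pegasus behavior.

```python
import json
from typing import NamedTuple, Optional


class PegasusApiError(NamedTuple):
    status: int
    type: Optional[str]
    code: Optional[str]
    message: str


def parse_error(status: int, body: str) -> PegasusApiError:
    """Read the raw OpenAI-style error envelope from a response body."""
    err = json.loads(body).get("error", {})
    return PegasusApiError(
        status, err.get("type"), err.get("code"), err.get("message", "")
    )


def is_retryable(error: PegasusApiError) -> bool:
    """Assumed policy: retry short-lived failures, give up on the rest."""
    if error.code == "RATE_LIMITED":
        return True   # the rate window passes within about a minute
    if error.code == "DAILY_LIMIT_REACHED":
        return False  # also a 429, but credits only reset at midnight UTC
    return error.status >= 500  # upstream or internal failures
```

Matching on error.code rather than error.message keeps the logic stable, since the message text can change.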

Limits, quota, and usage

Chat requests use POST /api/v1/chat/completions and require the messages:send scope.
Model discovery uses GET /api/v1/models and GET /api/v1/models/{id} with the models:read scope.
Requests can include 1 to 50 messages.
Total message content must be at most 100,000 characters.
Each message's content must be a string. Multimodal content-part arrays are not supported.
The last message must have role user.
Non-streaming chat requests are limited to 20 requests per 60 seconds.
Streaming chat requests are limited to 10 requests per 60 seconds.
Model endpoints use the default/light API rate limits.
Chat uses the same daily credits as dashboard chat. Usage rows appear with source=API.
The API is stateless. Pegasus does not create conversation or message rows; send the full history in each request.
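
The message limits above can be checked on the client before a request is sent, which avoids spending a rate-limited call on a request that would fail validation. The following is a sketch only: validate_messages is a hypothetical helper, and it assumes characters are counted the way Python's len() counts them, which may differ from how the server counts.

```python
from typing import List

MAX_MESSAGES = 50
MAX_TOTAL_CHARS = 100_000


def validate_messages(messages: List[dict]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= len(messages) <= MAX_MESSAGES:
        problems.append(f"expected 1 to {MAX_MESSAGES} messages, got {len(messages)}")

    total_chars = 0
    for index, message in enumerate(messages):
        content = message.get("content")
        if isinstance(content, str):
            total_chars += len(content)
        else:
            # Content-part arrays (multimodal input) are not supported.
            problems.append(f"messages[{index}].content must be a string")

    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"total content is {total_chars} characters, limit is {MAX_TOTAL_CHARS}")

    if messages and messages[-1].get("role") != "user":
        problems.append("the last message must have role user")
    return problems
```

Since the API is stateless, the list passed in is the full conversation history for that request, so the 50-message and 100,000-character limits apply to the whole history rather than to the newest turn.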

Not supported in v1

The v1 OpenAI-compatible surface supports chat completions and model discovery only.

These are not supported:

Embeddings.
Responses API.
Tool or function calling.
Vision or other multimodal input.
New sk-... API keys. Use existing pat_... Personal Access Tokens.
Bare /v1 base paths. Use /api/v1.
