OpenAI-compatible API
Use Pegasus bots from OpenAI-compatible SDKs and tools.
The OpenAI-compatible API lets Pro and Ultra workspaces chat with Pegasus bots from tools that already support OpenAI chat completions.
Use the base URL https://<your-pegasus-host>/api/v1 and pass an existing Personal Access Token (PAT) as the OpenAI api_key. The resulting HTTP header is Authorization: Bearer pat_....
Before you start:
- Create a PAT in API tokens & scopes.
- Store the PAT safely; see Secure your token.
- Give the token messages:send for chat completions and models:read for model discovery.
- Use a workspace with API access enabled. Free workspaces cannot use this API.
Responses and errors are raw OpenAI-shaped JSON. They are not wrapped in the standard Pegasus success or error envelope.
Quickstart
Set PEGASUS_API_KEY to a PAT such as pat_..., then point the official SDK or curl at your Pegasus host.
curl https://<your-pegasus-host>/api/v1/chat/completions \
-H "Authorization: Bearer $PEGASUS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "<your-bot-id>",
"messages": [
{ "role": "user", "content": "Summarize our refund policy." }
]
}'

import os
from openai import OpenAI
client = OpenAI(
base_url="https://<your-pegasus-host>/api/v1",
api_key=os.environ["PEGASUS_API_KEY"],
)
completion = client.chat.completions.create(
model="<your-bot-id>",
messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(completion.choices[0].message.content)

import OpenAI from 'openai'
const client = new OpenAI({
baseURL: 'https://<your-pegasus-host>/api/v1',
apiKey: process.env.PEGASUS_API_KEY,
})
const completion = await client.chat.completions.create({
model: '<your-bot-id>',
messages: [{ role: 'user', content: 'Summarize our refund policy.' }],
})
console.log(completion.choices[0].message.content)

Model strings and discovery
The model value selects the bot and, optionally, the LLM alias.
| Form | Meaning |
|---|---|
| <bot_id> | Use the bot with the plan's default LLM route. |
| <bot_id>:<llm_alias> | Use the bot with a selectable alias from the LLM routing catalog. |
Use GET /api/v1/models with the models:read scope to discover what the token can use. The response is an OpenAI list object containing model ids such as <bot_id> and <bot_id>:<llm_alias>.
GET /api/v1/models/{id} returns one OpenAI model object for an id that appears in the list. Unknown, inaccessible, non-ready, malformed, or above-plan model ids return 404 with error.code set to UNKNOWN_MODEL.
Only READY bots are listed. Aliases are filtered by the caller's plan: a lower-tier plan does not see aliases it cannot select, and a Free workspace cannot use the API at all.
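The discovery flow above can be sketched with only the Python standard library. This is a minimal sketch, not a reference client: the helper names build_models_request, model_ids, and split_model_id are hypothetical, the list response is assumed to follow the usual OpenAI shape (an object with a data array of items that each carry an id), and bot ids are assumed never to contain a colon.

```python
import json
import urllib.request
from typing import List, Optional, Tuple


def build_models_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /api/v1/models request."""
    return urllib.request.Request(
        f"https://{host}/api/v1/models",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def model_ids(list_body: str) -> List[str]:
    """Extract model ids from an OpenAI-style list object (assumed shape)."""
    payload = json.loads(list_body)
    return [item["id"] for item in payload.get("data", [])]


def split_model_id(model: str) -> Tuple[str, Optional[str]]:
    """Split '<bot_id>:<llm_alias>' into (bot_id, alias).

    The alias is None when the id has no ':' separator.
    Assumption: bot ids never contain ':'.
    """
    bot_id, sep, alias = model.partition(":")
    return bot_id, (alias if sep else None)
```

To send the request, pass the result of build_models_request to urllib.request.urlopen and feed the decoded body to model_ids. A 404 with error.code set to UNKNOWN_MODEL on GET /api/v1/models/{id} means the id is not usable by this token, whatever the underlying reason.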
Streaming
Set stream: true to receive OpenAI chat.completion.chunk frames. With the official JavaScript SDK, iterate over the stream:
const stream = await client.chat.completions.create({
model: '<your-bot-id>:<llm_alias>',
messages: [{ role: 'user', content: 'Draft a short welcome message.' }],
stream: true,
stream_options: { include_usage: true },
})
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content
if (text) {
process.stdout.write(text)
}
if (chunk.usage) {
console.log('\nusage', chunk.usage)
}
}

The stream uses Server-Sent Events. Normal streams end with data: [DONE]. When stream_options.include_usage is true, Pegasus sends one final usage chunk before [DONE]; that chunk has choices: [] and a usage object.
If an error happens before output starts, Pegasus returns a normal HTTP error response with the OpenAI error envelope. If an upstream error happens after output has started, Pegasus sends one SSE frame shaped as data: {"error":...} and closes the stream without data: [DONE].
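For clients that read the SSE stream without an SDK, the framing rules above can be sketched as a small parser. This is a minimal sketch under stated assumptions: iter_stream_events and StreamError are hypothetical names, the input is an iterable of already-decoded text lines, and treating a stream that ends without [DONE] as an error is a client-side choice rather than documented behavior.

```python
import json
from typing import Iterable, Iterator, Tuple


class StreamError(Exception):
    """Raised for a data: {"error": ...} frame or a stream cut short."""


def iter_stream_events(lines: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Yield ("text", str) and ("usage", dict) events from SSE lines.

    Stops cleanly at data: [DONE]. Raises StreamError on an error frame,
    or (assumption) when the lines run out before [DONE] arrives.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        frame = json.loads(data)
        if "error" in frame:
            # Mid-stream failure: the server closes without [DONE].
            raise StreamError(frame["error"])
        if frame.get("usage"):
            # Final usage chunk: choices is [] and usage is populated.
            yield ("usage", frame["usage"])
        for choice in frame.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield ("text", text)
    raise StreamError("stream ended without [DONE]")
```

Keeping the parser separate from the transport means the same function can be fed from any HTTP client that exposes the response line by line.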
Errors
Errors use this raw OpenAI envelope:
{
"error": {
"message": "Daily credit limit reached. Resets at midnight UTC.",
"type": "insufficient_quota",
"param": null,
"code": "DAILY_LIMIT_REACHED"
}
}

error.code carries the stable Pegasus error code. error.message is human-readable and can change.
| HTTP | OpenAI error.type | Representative Pegasus error.code |
|---|---|---|
| 400 | invalid_request_error | VALIDATION_ERROR |
| 401 | authentication_error | PAT_INVALID, PAT_INVALID_OR_EXPIRED |
| 403 | permission_error | PAT_SCOPE_INSUFFICIENT, PLAN_LIMIT_EXCEEDED, PAT_NOT_ALLOWED, BOT_NOT_READY, TEAM_QUOTA_DISABLED, TEAM_SEAT_FEATURE_UNAVAILABLE |
| 404 | not_found_error | UNKNOWN_MODEL |
| 429 | rate_limit_error | RATE_LIMITED |
| 429 | insufficient_quota | DAILY_LIMIT_REACHED |
| 5xx | server_error | AI_SERVICE_UNAVAILABLE, INTERNAL_ERROR |
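Because two different codes share HTTP 429, branching on error.code is more reliable than branching on the status alone. The following is a minimal sketch of that idea; parse_error, should_retry, and the RETRYABLE_CODES set are hypothetical, and the choice of which codes to retry is a client-side policy assumption rather than something the API prescribes.

```python
import json
from typing import Optional, Tuple

# Assumption: a client-side retry policy. DAILY_LIMIT_REACHED is left out
# on purpose, since retrying before the daily reset cannot succeed.
RETRYABLE_CODES = {"RATE_LIMITED", "AI_SERVICE_UNAVAILABLE", "INTERNAL_ERROR"}


def parse_error(body: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (code, type, message) from a raw OpenAI-style error envelope."""
    error = json.loads(body).get("error", {})
    return error.get("code"), error.get("type"), error.get("message", "")


def should_retry(body: str) -> bool:
    """Decide from the stable error.code, never from error.message."""
    code, _type, _message = parse_error(body)
    return code in RETRYABLE_CODES
```

With this policy a RATE_LIMITED response is retried after a delay, while DAILY_LIMIT_REACHED is surfaced to the caller immediately even though both arrive as 429.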
Limits, quota, and usage
- Chat requests require POST /api/v1/chat/completions and the messages:send scope.
- Model discovery uses GET /api/v1/models and GET /api/v1/models/{id} with the models:read scope.
- Requests can include 1 to 50 messages.
- Total message content must be at most 100,000 characters.
- Every message content must be a string. Multimodal content-part arrays are not supported.
- The last message must have role user.
- Non-stream chat is limited to 20 requests per 60 seconds.
- Stream chat is limited to 10 requests per 60 seconds.
- Model endpoints use the default/light API rate limits.
- Chat uses the same daily credits as dashboard chat. Usage rows appear with source=API.
- The API is stateless. Pegasus does not create conversation or message rows; send the full history in each request.
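Since the full history is resent on every request, it can help to check the message limits above on the client before calling the API. This is a minimal sketch, assuming a hypothetical validate_messages helper and assuming characters are counted with Python's len() (Unicode code points); the server may count differently, so treat a pass here as a hint, not a guarantee.

```python
from typing import Dict, List

MAX_MESSAGES = 50
MAX_TOTAL_CHARS = 100_000


def validate_messages(messages: List[Dict[str, object]]) -> List[str]:
    """Return a list of problems; an empty list means no limit was hit."""
    problems: List[str] = []
    if not 1 <= len(messages) <= MAX_MESSAGES:
        problems.append(
            f"expected 1 to {MAX_MESSAGES} messages, got {len(messages)}"
        )
    total_chars = 0
    for index, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, str):
            # Content-part arrays (multimodal input) are rejected.
            problems.append(f"message {index}: content must be a string")
            continue
        total_chars += len(content)
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(
            f"total content is {total_chars} characters, limit is {MAX_TOTAL_CHARS}"
        )
    if messages and messages[-1].get("role") != "user":
        problems.append("last message must have role user")
    return problems
```

A long-running conversation will eventually exceed one of these limits, so a caller that keeps history locally needs some policy for dropping or summarizing older turns before sending.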
Not supported in v1
The v1 OpenAI-compatible surface supports chat completions and model discovery only.
These are not supported:
- Embeddings.
- Responses API.
- Tool or function calling.
- Vision or other multimodal input.
- New sk-... API keys. Use existing pat_... Personal Access Tokens.
- Bare /v1 base paths. Use /api/v1.