Alpha API
Alpha serves small GPT models trained from scratch. All endpoints are unauthenticated except /api/upload.
Base URL: https://alpha2-production.up.railway.app
OpenAI-Compatible API
Drop-in compatible with vLLM, Ollama, LiteLLM, FastChat, the OpenAI Python/JS SDKs, and any OpenAI-compatible client. Use base URL https://alpha2-production.up.railway.app/v1.
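Example — Python (LiteLLM)
As a quick check of that compatibility, a minimal LiteLLM sketch; it assumes LiteLLM's openai/ provider prefix for custom OpenAI-compatible endpoints, with any model ID from /v1/models:
import litellm

# Any placeholder api_key works; the endpoints are unauthenticated.
response = litellm.completion(
    model="openai/novels-5hr",  # openai/ prefix routes to an OpenAI-compatible endpoint
    api_base="https://alpha2-production.up.railway.app/v1",
    api_key="any-key",
    messages=[{"role": "user", "content": "Once upon a time"}],
    max_tokens=100,
)
print(response.choices[0].message.content)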
GET /v1/models
List available models in OpenAI format.
Response
{
"object": "list",
"data": [
{ "id": "novels-5hr", "object": "model", "created": 1771439022, "owned_by": "alpha" },
...
]
}
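Example — curl
To fetch the model list from the command line (the data[].id values are what you pass as model below):
curl "https://alpha2-production.up.railway.app/v1/models"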
POST /v1/chat/completions
OpenAI Chat Completions endpoint. Supports both non-streaming and streaming ("stream": true). Also available at /chat/completions.
Request body (JSON)
{
"model": "novels-5hr",
"messages": [{ "role": "user", "content": "Once upon a time" }],
"max_tokens": 2048,
"temperature": 0.7,
"stream": false
}
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | first model | Model ID from /v1/models |
| messages (required) | array | — | Array of {role, content} objects |
| max_tokens | int | 2048 | Max tokens to generate (capped at 2048) |
| temperature | float | 0.7 | Sampling temperature |
| stream | bool | false | Stream response as SSE chunks |
Response (non-streaming)
{
"id": "chatcmpl-f018175cb9a6...",
"object": "chat.completion",
"created": 1771443731,
"model": "novels-5hr",
"choices": [{
"index": 0,
"message": { "role": "assistant", "content": "generated text..." },
"finish_reason": "length"
}],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 50,
"total_tokens": 57
}
}
Response (streaming, "stream": true)
// Each SSE chunk:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"hello"},"finish_reason":null}]}

// Final chunk includes usage and finish_reason, followed by:
data: [DONE]
Example — curl
curl -X POST "https://alpha2-production.up.railway.app/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "max_tokens": 100,
    "temperature": 0.7
  }'
Example — Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://alpha2-production.up.railway.app/v1",
    api_key="any-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "Once upon a time"}],
    max_tokens=100,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "The knight"}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Example — JavaScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://alpha2-production.up.railway.app/v1",
  apiKey: "any-key",
});

const response = await client.chat.completions.create({
  model: "novels-5hr",
  messages: [{ role: "user", content: "Once upon a time" }],
  max_tokens: 100,
});
console.log(response.choices[0].message.content);
Other Endpoints
GET /api/models
Returns the list of available models with full training metadata.
Response
[
{
"id": "novels-5hr",
"name": "novels-5hr",
"step": 900,
"mtime": 1771438445123.4,
"lastLoss": 4.123,
"domain": "novels",
"modelConfig": { "vocabSize": 2000, "blockSize": 256, ... },
"trainConfig": { "iters": 3000, "batchSize": 4, ... }
}
]
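Example — curl
To inspect the models and their training metadata from the command line:
curl "https://alpha2-production.up.railway.app/api/models"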
GET/POST /api/generate
Non-streaming text generation. All parameters can be passed as query string params or in a POST JSON body (query string takes precedence).
Request body (JSON)
{
"prompt": "string",
"max_tokens": 2048,
"temperature": 0.7,
"model": "string (optional, defaults to first model)"
}
| Field | Type | Default | Description |
|---|---|---|---|
| prompt (required) | string | — | Input text to complete |
| max_tokens | int | 2048 | Max tokens to generate (capped at 2048) |
| temperature | float | 0.7 | Sampling temperature |
| model | string | first model | Model ID from /api/models |
Response
{
"text": "generated completion text",
"model": "novels-5hr",
"usage": {
"prompt_tokens": 5,
"completion_tokens": 100
}
}
Example — curl
curl -X POST "https://alpha2-production.up.railway.app/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'
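Since parameters are also accepted as query string params, the same request works as a plain GET (URL-encode the prompt):
curl "https://alpha2-production.up.railway.app/api/generate?prompt=Once%20upon%20a%20time&max_tokens=100&temperature=0.7"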
Example — JavaScript
const res = await fetch("/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Once upon a time",
    max_tokens: 100,
    temperature: 0.7,
  }),
});
const data = await res.json();
console.log(data.text);  // "there was a kingdom far away..."
console.log(data.usage); // { prompt_tokens: 4, completion_tokens: 100 }
GET /api/inference
Stream generated tokens via Server-Sent Events. The first event contains the echoed prompt; subsequent events are generated tokens. The stream ends with a [DONE] sentinel.
Query parameters
| Param | Type | Default | Description |
|---|---|---|---|
| query | string | "" | Input prompt |
| model | string | first model | Model ID from /api/models |
| steps | int | 200 | Max tokens to generate (capped at 500) |
| temp | float | 0.8 | Sampling temperature |
| topk | int | 40 | Top-k filtering (0 = disabled) |
Example — curl
curl -N "https://alpha2-production.up.railway.app/api/inference?query=The&model=novels-5hr&steps=100&temp=0.8"
Example — JavaScript (EventSource)
const url = "/api/inference?query=The&model=novels-5hr&steps=100";
const source = new EventSource(url);

source.onmessage = (e) => {
  if (e.data === "[DONE]") {
    source.close();
    return;
  }
  const { token } = JSON.parse(e.data);
  process.stdout.write(token);
};
SSE event format
// Each event:
data: {"token": "hello"}

// Final event:
data: [DONE]
POST /api/chat
AI SDK-compatible streaming chat endpoint. Returns a text stream (not SSE). Compatible with the Vercel AI SDK useChat hook.
Request body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| messages (required) | array | — | Array of {role, content} objects |
| model | string | first model | Model ID |
| maxTokens | int | 200 | Max tokens (capped at 500) |
| temperature | float | 0.8 | Sampling temperature |
| topk | int | 40 | Top-k filtering |
Example — curl (streaming)
curl -N -X POST "https://alpha2-production.up.railway.app/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "maxTokens": 100,
    "temperature": 0.8
  }'
Example — JavaScript (fetch, streaming)
const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "Once upon a time" }],
    maxTokens: 100,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value));
}
Example — non-streaming (read full response)
const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "The knight" }],
    maxTokens: 50,
  }),
});

// Just await the full text (ignores streaming)
const text = await res.text();
console.log(text);
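Example — React (Vercel AI SDK useChat)
A minimal sketch of the useChat integration mentioned above, assuming AI SDK v4's useChat hook from @ai-sdk/react; the body option merges extra fields like model and maxTokens into each request:
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  // useChat POSTs { messages } to the endpoint and streams the assistant reply
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "https://alpha2-production.up.railway.app/api/chat",
    body: { model: "novels-5hr", maxTokens: 100 },
  });
  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>{m.role}: {m.content}</p>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}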
POST /api/upload
Upload a model checkpoint to the server. Requires Bearer token authentication; the token must match the server's UPLOAD_SECRET environment variable.
Headers
| Header | Value |
|---|---|
| Authorization | Bearer <UPLOAD_SECRET> |
| Content-Type | application/json |
| Content-Encoding | gzip (optional, recommended for large checkpoints) |
Request body (JSON)
| Field | Type | Description |
|---|---|---|
| name (required) | string | Run name (becomes the model ID) |
| config (required) | object | Training config (config.json contents) |
| checkpoint (required) | object | Checkpoint data (checkpoint-N.json contents) |
| step (required) | int | Training step number |
| metrics | string | Contents of metrics.jsonl (newline-delimited JSON) |
Response
{ "ok": true, "name": "my-run", "step": 500 }
Notes
- These are small GPT models (100K–10M params) trained from scratch on specific domains — they don't follow instructions or answer questions. Treat them as text completers.
- The domain field on each model indicates what kind of text it generates: novels, abc (ABC music notation), or chords.
- The first token event from /api/inference is the echoed prompt. All subsequent events are generated tokens.
- The /api/chat endpoint streams text using the Vercel AI SDK data protocol. To consume it without streaming, just call await res.text().
- Model loading is lazy: the first request to a model takes a few seconds to load the checkpoint into memory. Subsequent requests reuse the cached model.