Alpha API
Alpha serves small GPT models trained from scratch. All endpoints are unauthenticated except /api/upload.
Base URL: https://alpha2-production.up.railway.app
OpenAI-Compatible API
Drop-in compatible with vLLM, Ollama, LiteLLM, FastChat, the OpenAI Python/JS SDKs, and any OpenAI-compatible client. Use base URL https://alpha2-production.up.railway.app/v1.
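Example — Python (LiteLLM)
As a quick check of that compatibility, a minimal LiteLLM sketch; it assumes LiteLLM's openai/ provider prefix for custom OpenAI-compatible endpoints, with any model ID from /v1/models:
import litellm

# Any placeholder api_key works; the endpoints are unauthenticated.
response = litellm.completion(
    model="openai/novels-5hr",  # openai/ prefix routes to an OpenAI-compatible endpoint
    api_base="https://alpha2-production.up.railway.app/v1",
    api_key="any-key",
    messages=[{"role": "user", "content": "Once upon a time"}],
    max_tokens=100,
)
print(response.choices[0].message.content)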
GET /v1/models
List available models in OpenAI format.
Response
{
"object": "list",
"data": [
{ "id": "novels-5hr", "object": "model", "created": 1771439022, "owned_by": "alpha" },
...
]
}
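Example — curl
To fetch the model list from the command line (the data[].id values are what you pass as model below):
curl "https://alpha2-production.up.railway.app/v1/models"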
POST /v1/chat/completions
OpenAI Chat Completions endpoint. Supports both non-streaming and streaming ("stream": true). Also available at /chat/completions.
Request body (JSON)
{
"model": "novels-5hr",
"messages": [{ "role": "user", "content": "Once upon a time" }],
"max_tokens": 2048,
"temperature": 0.7,
"stream": false
}
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | first model | Model ID from /v1/models |
| messages (required) | array | — | Array of {role, content} objects |
| max_tokens | int | 2048 | Max tokens to generate (capped at 2048) |
| temperature | float | 0.7 | Sampling temperature |
| stream | bool | false | Stream response as SSE chunks |
Response (non-streaming)
{
"id": "chatcmpl-f018175cb9a6...",
"object": "chat.completion",
"created": 1771443731,
"model": "novels-5hr",
"choices": [{
"index": 0,
"message": { "role": "assistant", "content": "generated text..." },
"finish_reason": "length"
}],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 50,
"total_tokens": 57
}
}
Response (streaming, "stream": true)
// Each SSE chunk:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"hello"},"finish_reason":null}]}

// Final chunk includes usage and finish_reason, followed by:
data: [DONE]
Example — curl
curl -X POST "https://alpha2-production.up.railway.app/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "max_tokens": 100,
    "temperature": 0.7
  }'
Example — Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://alpha2-production.up.railway.app/v1",
    api_key="any-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "Once upon a time"}],
    max_tokens=100,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "The knight"}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Example — JavaScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://alpha2-production.up.railway.app/v1",
  apiKey: "any-key",
});

const response = await client.chat.completions.create({
  model: "novels-5hr",
  messages: [{ role: "user", content: "Once upon a time" }],
  max_tokens: 100,
});
console.log(response.choices[0].message.content);
Other Endpoints
GET /api/models
Returns the list of available models with full training metadata.
Response
[
{
"id": "novels-5hr",
"name": "novels-5hr",
"step": 900,
"mtime": 1771438445123.4,
"lastLoss": 4.123,
"domain": "novels",
"modelConfig": { "vocabSize": 2000, "blockSize": 256, ... },
"trainConfig": { "iters": 3000, "batchSize": 4, ... }
}
]
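Example — curl
To inspect the models and their training metadata from the command line:
curl "https://alpha2-production.up.railway.app/api/models"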
GET/POST /api/generate
Non-streaming text generation. All parameters can be passed as query string params or in a POST JSON body (query string takes precedence).
Request body (JSON)
{
"prompt": "string",
"max_tokens": 2048,
"temperature": 0.7,
"model": "string (optional, defaults to first model)"
}
| Field | Type | Default | Description |
|---|---|---|---|
| prompt (required) | string | — | Input text to complete |
| max_tokens | int | 2048 | Max tokens to generate (capped at 2048) |
| temperature | float | 0.7 | Sampling temperature |
| model | string | first model | Model ID from /api/models |
Response
{
"text": "generated completion text",
"model": "novels-5hr",
"usage": {
"prompt_tokens": 5,
"completion_tokens": 100
}
}
Example — curl
curl -X POST "https://alpha2-production.up.railway.app/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'
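Since parameters are also accepted as query string params, the same request works as a plain GET (URL-encode the prompt):
curl "https://alpha2-production.up.railway.app/api/generate?prompt=Once%20upon%20a%20time&max_tokens=100&temperature=0.7"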
Example — JavaScript
const res = await fetch("/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Once upon a time",
    max_tokens: 100,
    temperature: 0.7,
  }),
});
const data = await res.json();
console.log(data.text);  // "there was a kingdom far away..."
console.log(data.usage); // { prompt_tokens: 4, completion_tokens: 100 }
GET /api/inference
Stream generated tokens via Server-Sent Events. The first event contains the echoed prompt; subsequent events are generated tokens. The stream ends with a [DONE] sentinel.
Query parameters
| Param | Type | Default | Description |
|---|---|---|---|
| query | string | "" | Input prompt |
| model | string | first model | Model ID from /api/models |
| steps | int | 200 | Max tokens to generate (capped at 500) |
| temp | float | 0.8 | Sampling temperature |
| topk | int | 40 | Top-k filtering (0 = disabled) |
Example — curl
curl -N "https://alpha2-production.up.railway.app/api/inference?query=The&model=novels-5hr&steps=100&temp=0.8"
Example — JavaScript (EventSource)
const url = "/api/inference?query=The&model=novels-5hr&steps=100";
const source = new EventSource(url);

source.onmessage = (e) => {
  if (e.data === "[DONE]") {
    source.close();
    return;
  }
  const { token } = JSON.parse(e.data);
  process.stdout.write(token);
};
SSE event format
// Each event:
data: {"token": "hello"}

// Final event:
data: [DONE]
POST /api/chat
AI SDK-compatible streaming chat endpoint. Returns a text stream (not SSE). Compatible with the Vercel AI SDK useChat hook.
Request body (JSON)
| Field | Type | Default | Description |
|---|---|---|---|
| messages (required) | array | — | Array of {role, content} objects |
| model | string | first model | Model ID |
| maxTokens | int | 200 | Max tokens (capped at 500) |
| temperature | float | 0.8 | Sampling temperature |
| topk | int | 40 | Top-k filtering |
Example — curl (streaming)
curl -N -X POST "https://alpha2-production.up.railway.app/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "maxTokens": 100,
    "temperature": 0.8
  }'
Example — JavaScript (fetch, streaming)
const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "Once upon a time" }],
    maxTokens: 100,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value));
}
Example — non-streaming (read full response)
const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "The knight" }],
    maxTokens: 50,
  }),
});

// Just await the full text (ignores streaming)
const text = await res.text();
console.log(text);
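Example — React (Vercel AI SDK useChat)
A minimal sketch of the useChat integration mentioned above, assuming AI SDK v4's useChat hook from @ai-sdk/react; the body option merges extra fields like model and maxTokens into each request:
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  // useChat POSTs { messages } to the endpoint and streams the assistant reply
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "https://alpha2-production.up.railway.app/api/chat",
    body: { model: "novels-5hr", maxTokens: 100 },
  });
  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>{m.role}: {m.content}</p>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}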
POST /api/upload
Upload a model checkpoint to the server. Requires Bearer token authentication; the token must match the server's UPLOAD_SECRET environment variable.
Headers
| Header | Value |
|---|---|
| Authorization | Bearer <UPLOAD_SECRET> |
| Content-Type | application/json |
| Content-Encoding | gzip (optional, recommended for large checkpoints) |
Request body (JSON)
| Field | Type | Description |
|---|---|---|
| name (required) | string | Run name (becomes the model ID) |
| config (required) | object | Training config (config.json contents) |
| checkpoint (required) | object | Checkpoint data (checkpoint-N.json contents) |
| step (required) | int | Training step number |
| metrics | string | Contents of metrics.jsonl (newline-delimited JSON) |
Response
{ "ok": true, "name": "my-run", "step": 500 }
Notes
- These are small GPT models (100K–10M params) trained from scratch on specific domains — they don't follow instructions or answer questions. Treat them as text completers.
- The domain field on each model indicates what kind of text it generates: novels, abc (ABC music notation), or chords.
- The first token event from /api/inference is the echoed prompt. All subsequent events are generated tokens.
- The /api/chat endpoint streams text using the Vercel AI SDK data protocol. To consume it without streaming, just call await res.text().
- Model loading is lazy: the first request to a model takes a few seconds to load the checkpoint into memory. Subsequent requests reuse the cached model.