Alpha API

Alpha serves small GPT models trained from scratch. All endpoints are unauthenticated except /api/upload.

Base URL: https://alpha2-production.up.railway.app

OpenAI-Compatible API

Drop-in compatible with any OpenAI-compatible client, including the OpenAI Python/JS SDKs, LiteLLM, FastChat, and tools built against vLLM's or Ollama's OpenAI-style endpoints. Use base URL https://alpha2-production.up.railway.app/v1.

GET /v1/models

List available models in OpenAI format.

Response

{
  "object": "list",
  "data": [
    { "id": "novels-5hr", "object": "model", "created": 1771439022, "owned_by": "alpha" },
    ...
  ]
}
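
Example — curl

A minimal request; no body or auth header is needed:

curl "https://alpha2-production.up.railway.app/v1/models"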

POST /v1/chat/completions

OpenAI Chat Completions endpoint. Supports both non-streaming and streaming ("stream": true). Also available at /chat/completions.

Request body (JSON)

{
  "model": "novels-5hr",
  "messages": [{ "role": "user", "content": "Once upon a time" }],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
Field        Type    Default      Description
model        string  first model  Model ID from /v1/models
messages     array   required     Array of {role, content} objects
max_tokens   int     2048         Max tokens to generate (capped at 2048)
temperature  float   0.7          Sampling temperature
stream       bool    false        Stream response as SSE chunks

Response (non-streaming)

{
  "id": "chatcmpl-f018175cb9a6...",
  "object": "chat.completion",
  "created": 1771443731,
  "model": "novels-5hr",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "generated text..." },
    "finish_reason": "length"
  }],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 50,
    "total_tokens": 57
  }
}

Response (streaming, "stream": true)

// Each SSE chunk:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"hello"},"finish_reason":null}]}

// Final chunk includes usage and finish_reason, followed by:
data: [DONE]

Example — curl

curl -X POST "https://alpha2-production.up.railway.app/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "max_tokens": 100,
    "temperature": 0.7
  }'
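
Example — curl (streaming)

Setting "stream": true on the same request returns SSE chunks in the format shown above; curl's -N flag disables output buffering:

curl -N -X POST "https://alpha2-production.up.railway.app/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "max_tokens": 100,
    "stream": true
  }'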

Example — Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://alpha2-production.up.railway.app/v1",
    api_key="any-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "Once upon a time"}],
    max_tokens=100,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="novels-5hr",
    messages=[{"role": "user", "content": "The knight"}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Example — JavaScript (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://alpha2-production.up.railway.app/v1",
  apiKey: "any-key",
});

const response = await client.chat.completions.create({
  model: "novels-5hr",
  messages: [{ role: "user", content: "Once upon a time" }],
  max_tokens: 100,
});
console.log(response.choices[0].message.content);

Other Endpoints

GET /api/models

Returns the list of available models with full training metadata.

Response

[
  {
    "id": "novels-5hr",
    "name": "novels-5hr",
    "step": 900,
    "mtime": 1771438445123.4,
    "lastLoss": 4.123,
    "domain": "novels",
    "modelConfig": { "vocabSize": 2000, "blockSize": 256, ... },
    "trainConfig": { "iters": 3000, "batchSize": 4, ... }
  }
]
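
Example — curl

A minimal request returning the metadata shown above:

curl "https://alpha2-production.up.railway.app/api/models"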

POST / GET /api/generate

Non-streaming text generation. All parameters can be passed as query string params or in a POST JSON body (query string takes precedence).

Request body (JSON)

{
  "prompt": "string",
  "max_tokens": 2048,
  "temperature": 0.7,
  "model": "string (optional, defaults to first model)"
}
Field        Type    Default      Description
prompt       string  required     Input text to complete
max_tokens   int     2048         Max tokens to generate (capped at 2048)
temperature  float   0.7          Sampling temperature
model        string  first model  Model ID from /api/models

Response

{
  "text": "generated completion text",
  "model": "novels-5hr",
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 100
  }
}

Example — curl

curl -X POST "https://alpha2-production.up.railway.app/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'
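
Because all parameters are also accepted as query string params, the same call works as a plain GET (values URL-encoded):

curl "https://alpha2-production.up.railway.app/api/generate?prompt=Once%20upon%20a%20time&max_tokens=100&temperature=0.7"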

Example — JavaScript

const res = await fetch("/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Once upon a time",
    max_tokens: 100,
    temperature: 0.7,
  }),
});

const data = await res.json();
console.log(data.text);
// "there was a kingdom far away..."
console.log(data.usage);
// { prompt_tokens: 4, completion_tokens: 100 }

GET /api/inference

Stream generated tokens via Server-Sent Events. The first event contains the echoed prompt; subsequent events are generated tokens. The stream ends with a [DONE] sentinel.

Query parameters

Param  Type    Default      Description
query  string  ""           Input prompt
model  string  first model  Model ID from /api/models
steps  int     200          Max tokens to generate (capped at 500)
temp   float   0.8          Sampling temperature
topk   int     40           Top-k filtering (0 = disabled)

Example — curl

curl -N "https://alpha2-production.up.railway.app/api/inference?query=The&model=novels-5hr&steps=100&temp=0.8"

Example — JavaScript (EventSource)

// Runs in Node 22+ (where EventSource is global); in browsers,
// replace process.stdout.write with DOM output.
const url = "https://alpha2-production.up.railway.app/api/inference?query=The&model=novels-5hr&steps=100";
const source = new EventSource(url);

source.onmessage = (e) => {
  if (e.data === "[DONE]") { source.close(); return; }
  const { token } = JSON.parse(e.data);
  process.stdout.write(token);
};

SSE event format

// Each event:
data: {"token": "hello"}

// Final event:
data: [DONE]

POST /api/chat

AI SDK-compatible streaming chat endpoint. Returns a text stream (not SSE). Compatible with the Vercel AI SDK useChat hook.

Request body (JSON)

Field        Type    Default      Description
messages     array   required     Array of {role, content} objects
model        string  first model  Model ID
maxTokens    int     200          Max tokens (capped at 500)
temperature  float   0.8          Sampling temperature
topk         int     40           Top-k filtering

Example — curl (streaming)

curl -N -X POST "https://alpha2-production.up.railway.app/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "novels-5hr",
    "messages": [{"role": "user", "content": "Once upon a time"}],
    "maxTokens": 100,
    "temperature": 0.8
  }'

Example — JavaScript (fetch, streaming)

const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "Once upon a time" }],
    maxTokens: 100,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value));
}

Example — non-streaming (read full response)

const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "novels-5hr",
    messages: [{ role: "user", content: "The knight" }],
    maxTokens: 50,
  }),
});

// Await the full body; this buffers the entire stream into one string
const text = await res.text();
console.log(text);

POST /api/upload

Upload a model checkpoint to the server. Requires Bearer token authentication; the token must match the server's UPLOAD_SECRET environment variable.

Headers

Header            Value
Authorization     Bearer <UPLOAD_SECRET>
Content-Type      application/json
Content-Encoding  gzip (optional, recommended for large checkpoints)

Request body (JSON)

Field       Type    Description
name        string  Required. Run name (becomes the model ID)
config      object  Required. Training config (config.json contents)
checkpoint  object  Required. Checkpoint data (checkpoint-N.json contents)
step        int     Required. Training step number
metrics     string  Optional. Contents of metrics.jsonl (newline-delimited JSON)

Response

{ "ok": true, "name": "my-run", "step": 500 }

Notes