Alpha

20260224064526_4kwo · stale · chat · 9.99M params · 3h 50m elapsed · Updated 36d ago

6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 24, 2026 6:46 AM

Step 0 / 50,000 (0.0%)

Loss: 5.4561
Best Loss: 5.2793 (-34.7% from start)
Val Loss: 5.6737 (best: 5.6737)
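
The loss cards do not state their units. Assuming they report mean per-token cross-entropy in nats (the usual convention), perplexity is exp(loss) and the train/val gap is a plain difference. A small sketch that converts the card values, with a uniform-guess baseline over the 4,000-token vocabulary for scale:

```python
import math

# Values copied from the Loss, Best Loss and Val Loss cards.
TRAIN_LOSS = 5.4561
BEST_LOSS = 5.2793
VAL_LOSS = 5.6737
VOCAB_SIZE = 4000  # bpe-4k tokenizer


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy measured in nats (assumed unit)."""
    return math.exp(loss_nats)


# Cross-entropy of a model that spreads probability evenly over the vocabulary.
UNIFORM_LOSS = math.log(VOCAB_SIZE)

if __name__ == "__main__":
    for name, loss in [("train", TRAIN_LOSS), ("best", BEST_LOSS), ("val", VAL_LOSS)]:
        print(f"{name:>5} loss {loss:.4f} -> perplexity {perplexity(loss):.1f}")
    print(f"train/val gap: {VAL_LOSS - TRAIN_LOSS:.4f} nats")
    print(f"uniform baseline: loss {UNIFORM_LOSS:.3f}, perplexity {VOCAB_SIZE}")
```

Under that assumption the perplexities are roughly 234 (train), 196 (best) and 291 (val), the gap is 0.2176 nats, and the uniform baseline is ln 4000 ≈ 8.29.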

Learning Rate: 2.93e-5
Throughput: 1,598 tok/s (avg)
Speed: 2,565 ms/iter (avg)
Grad Norm: 104.261 (avg: 121.212)
Tokens processed: 21.70M
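
Throughput and speed are two views of the same quantity. Assuming every iteration consumes batchSize × blockSize × gradAccumSteps tokens (16 × 256 × 1 = 4,096, using the Training Config values), the average step time converts to tokens per second, and the token counter converts to an approximate number of steps. The helper names below are illustrative:

```python
# Training settings from the config on this page.
BATCH_SIZE = 16
BLOCK_SIZE = 256  # context length in tokens
GRAD_ACCUM_STEPS = 1

# Averages reported on the metric cards.
AVG_MS_PER_ITER = 2565
TOKENS_PROCESSED = 21.70e6  # displayed rounded as "21.70M"


def tokens_per_step(batch_size: int, block_size: int, accum_steps: int) -> int:
    """Tokens consumed per optimizer step, assuming every sequence fills the context."""
    return batch_size * block_size * accum_steps


def tokens_per_second(tokens_per_iter: int, ms_per_iter: float) -> float:
    """Convert a per-iteration wall-clock time into throughput."""
    return tokens_per_iter / (ms_per_iter / 1000.0)


if __name__ == "__main__":
    tps = tokens_per_step(BATCH_SIZE, BLOCK_SIZE, GRAD_ACCUM_STEPS)
    print(f"tokens per step:      {tps}")
    print(f"implied throughput:   {tokens_per_second(tps, AVG_MS_PER_ITER):.0f} tok/s")
    print(f"approx. steps so far: {TOKENS_PROCESSED / tps:.0f}")
```

This gives about 1,597 tok/s, in line with the reported average of 1,598 (an average of per-step rates need not exactly equal the rate derived from the average step time), and about 5,300 steps' worth of tokens.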

Forward: 1,768 ms (69% of step)
Backward: 705 ms (27% of step)
GPU Sync: 45 ms (2% of step)
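
The percentages on these cards match each phase's average time divided by the average step time of 2,565 ms from the Speed card; this is an assumption about how the dashboard computes them, but it reproduces the displayed 69%, 27% and 2%. The three phases sum to 2,518 ms, leaving about 47 ms that is presumably spent in the other phases (Grad Norm, Optimizer, Data) that appear only in the breakdown chart:

```python
# Average per-phase times (ms) from the timing cards; STEP_MS is the Speed card.
PHASES_MS = {"forward": 1768, "backward": 705, "gpu_sync": 45}
STEP_MS = 2565


def phase_shares(phases: dict[str, float], step_ms: float) -> dict[str, float]:
    """Fraction of the average step spent in each measured phase."""
    return {name: ms / step_ms for name, ms in phases.items()}


def unaccounted_ms(phases: dict[str, float], step_ms: float) -> float:
    """Step time not covered by the listed phases."""
    return step_ms - sum(phases.values())


if __name__ == "__main__":
    for name, share in phase_shares(PHASES_MS, STEP_MS).items():
        print(f"{name:>9}: {share:.1%}")
    print(f"remaining: {unaccounted_ms(PHASES_MS, STEP_MS)} ms")
    print(f"bwd/fwd ratio: {PHASES_MS['backward'] / PHASES_MS['forward']:.2f}")
```

The same numbers give the 0.4x Bwd/Fwd ratio (705 / 1,768 ≈ 0.40).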

GPU Ops: 1,224 per step
MFU (model FLOPS utilization): 0.3%
Bwd/Fwd ratio: 0.4x
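
The page does not show how MFU is computed or which GPU peak it is measured against. A common estimate counts about 6 × N FLOPs per training token (roughly 2N forward, 4N backward, ignoring attention-score FLOPs); the sketch below applies that rule to the 9.99M parameters and 1,598 tok/s shown here and backs out the peak that would make it agree with the card. The rule and the implied peak are assumptions, not the dashboard's documented method:

```python
# Numbers from this page: parameter count and average throughput.
N_PARAMS = 9.99e6
TOKENS_PER_SEC = 1598
REPORTED_MFU = 0.003  # the card shows 0.3%, which is rounded


def train_flops_per_sec(n_params: float, tokens_per_sec: float) -> float:
    """Approximate training FLOP/s with the 6 * N FLOPs-per-token rule of thumb."""
    return 6.0 * n_params * tokens_per_sec


def mfu(achieved_flops: float, peak_flops: float) -> float:
    """Model FLOPS utilization: achieved model FLOP/s divided by hardware peak FLOP/s."""
    return achieved_flops / peak_flops


if __name__ == "__main__":
    achieved = train_flops_per_sec(N_PARAMS, TOKENS_PER_SEC)
    print(f"achieved: {achieved / 1e9:.1f} GFLOP/s")
    # Peak that would make this estimate match the card; not a measured value.
    implied_peak = achieved / REPORTED_MFU
    print(f"implied peak: {implied_peak / 1e12:.1f} TFLOP/s")
    print(f"check: {mfu(achieved, implied_peak):.1%}")
```

With these inputs the estimate is about 96 GFLOP/s, so 0.3% would correspond to a peak of roughly 32 TFLOP/s (between about 27 and 38 TFLOP/s given the rounding of the card).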

[Chart: Loss Curve]

Architecture

Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0.1
Parameters: 9.99M

Training Config

Total iters: 50,000
Batch size: 16
Max LR: 0.00003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 2500
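
Grad clip 1 most plausibly means a maximum global gradient norm of 1.0, the scheme used by PyTorch's `torch.nn.utils.clip_grad_norm_`; the page does not say how helios applies it. A sketch under that assumption, operating on plain Python lists instead of tensors (the function name is illustrative):

```python
import math


def clip_by_global_norm(
    grads: list[list[float]], max_norm: float, eps: float = 1e-6
) -> tuple[list[list[float]], float]:
    """Rescale all gradient tensors by one shared factor so that their combined
    L2 norm does not exceed max_norm. Returns (clipped_grads, pre_clip_norm)."""
    # Global norm: square root of the sum of squares over every element of every tensor.
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Same coefficient form as PyTorch's clip_grad_norm_; the factor never exceeds 1.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
    print(norm, clipped)
```

Against the reported grad norm of 104.261, this rule would multiply every gradient by about 1 / 104.261 ≈ 0.0096 on that step.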

[Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations · Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data) · Timing Phase Lines · Backward / Forward Ratio]
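
The Smoothed Loss (EMA) and Loss Velocity charts are derived from the raw loss series, but the page gives neither the smoothing factor nor the velocity definition. This sketch assumes a standard exponential moving average with a hypothetical alpha of 0.1 and defines velocity as the change in loss per training step between logged points:

```python
def ema(values: list[float], alpha: float = 0.1) -> list[float]:
    """Exponential moving average s_t = alpha * x_t + (1 - alpha) * s_{t-1}, seeded with x_0.
    alpha = 0.1 is a hypothetical default, not a value taken from the dashboard."""
    smoothed: list[float] = []
    for x in values:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def loss_velocity(steps: list[int], losses: list[float]) -> list[float]:
    """Change in loss per step between consecutive logged points (negative = improving)."""
    return [
        (l1 - l0) / (s1 - s0)
        for (s0, l0), (s1, l1) in zip(zip(steps, losses), zip(steps[1:], losses[1:]))
    ]


if __name__ == "__main__":
    steps = [0, 100, 200, 300]
    raw = [8.3, 7.1, 7.4, 6.6]  # made-up losses for illustration only
    smooth = ema(raw)
    print(smooth)
    print(loss_velocity(steps, smooth))
```

Dividing by the step difference keeps the velocity comparable when points are logged at uneven intervals.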

Checkpoints (1)

Step    Filename               Size     Created
5,000   checkpoint-5000.json   82.2 MB  48d ago

Sample Generations (3)

# · Checkpoint · Prompt (preview) · Generated

1 · - · <|user|> Hello, how are you? <|assistant|> · 48d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>s the iting may es in s ed may s of f—that the fit ; s unntand ay, the eing the that in the dcinules the the ny, —s unawars for , of

2 · - · <|user|> What do you like to do for fun? <|assistant|> · 48d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

<|user|> What do you like to do for fun? <|assistant|>in its s our s of our esmay ing your to when dits of s in or its of its is c; in or your a to by in the in to wtiand to s t, that is y, it rein st, through et

3 · - · <|user|> Tell me about yourself. <|assistant|> · 48d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>our the and ythe is our is that e, or may to our and the may without our tent the true for uning , es it tin your and to a my tour a ing in aenres and the eis f
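
Each sample uses a single-turn prompt of the form <|user|> {message} <|assistant|>, and the stored output echoes the prompt before the generated text. A sketch of building that prompt and separating the continuation; the spacing is read off the samples, the helper names are illustrative, and whether bpe-4k encodes the tags as single special tokens is not shown on this page:

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Single-turn prompt in the layout used by the three samples."""
    return f"{USER_TAG} {message} {ASSISTANT_TAG}"


def completion_only(output: str, prompt: str) -> str:
    """Drop the echoed prompt so only newly generated text remains (Python 3.9+)."""
    return output.removeprefix(prompt)


if __name__ == "__main__":
    prompt = build_prompt("Hello, how are you?")
    output = prompt + "s the iting may es in s"
    print(repr(completion_only(output, prompt)))
```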

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0.1
}
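
The keys in this JSON map onto the Architecture panel (blockSize is the context length, nEmbd the embedding width). One way to load it into a typed object and check the constraint a standard multi-head attention layer imposes, nEmbd divisible by nHead (256 / 8 = 32 dimensions per head here). The class and field names are illustrative, not part of helios:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical typed view of the model config JSON."""

    vocab_size: int
    block_size: int
    n_layer: int
    n_embd: int
    n_head: int
    dropout: float

    def __post_init__(self) -> None:
        # Multi-head attention splits the embedding evenly across heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError(f"nEmbd ({self.n_embd}) must be divisible by nHead ({self.n_head})")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {self.dropout}")

    @property
    def head_dim(self) -> int:
        return self.n_embd // self.n_head

    @classmethod
    def from_json(cls, text: str) -> "ModelConfig":
        raw = json.loads(text)
        return cls(
            vocab_size=raw["vocabSize"],
            block_size=raw["blockSize"],
            n_layer=raw["nLayer"],
            n_embd=raw["nEmbd"],
            n_head=raw["nHead"],
            dropout=raw["dropout"],
        )


if __name__ == "__main__":
    cfg = ModelConfig.from_json(
        '{"vocabSize": 4000, "blockSize": 256, "nLayer": 6, "nEmbd": 256, "nHead": 8, "dropout": 0.1}'
    )
    print(cfg.head_dim)
```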

Training Config (JSON)

{
  "iters": 50000,
  "batchSize": 16,
  "lr": 0.00003,
  "lrMin": 0.000003,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.99,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 2500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500,
  "spikeThreshold": 0
}
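
The config gives lr (peak), lrMin, warmupIters and iters but not the shape of the schedule. Linear warmup followed by cosine decay to lrMin is a common pairing for these fields; the sketch below assumes that shape and should not be read as helios's actual scheduler:

```python
import math

# Values from the training config JSON above.
MAX_LR = 0.00003
MIN_LR = 0.000003
WARMUP_ITERS = 500
TOTAL_ITERS = 50_000


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_ITERS:
        # Ramp from MAX_LR / WARMUP_ITERS up to MAX_LR over the warmup window.
        return MAX_LR * (step + 1) / WARMUP_ITERS
    if step >= TOTAL_ITERS:
        return MIN_LR
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)  # runs from 0 to 1
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 250, 500, 5_500, 25_000, 50_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under this assumed shape the rate stays near the peak for the first several thousand steps; at step 5,500 it evaluates to about 2.93e-5, the same value as the Learning Rate card, though the page confirms neither the schedule nor the current step.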