A
Alpha
20260223085609_er4vcompletedchat2.36M params11m 12s elapsed · Updated 7m ago
4L / 128D / 4H · helios · bpe-4k · adamw· Created Feb 23, 2026 8:56 AM
Step 3,000 / 3,000100.0%
5.4956
Loss?
4.7334
Best Loss?
-34.0% from start
6.1205
Val Loss?
best: 6.0888
3.00e-5
Learning Rate?
4,807
Throughput?
tok/s (avg)
214
Speed?
ms/iter (avg)
3050.463
Grad Norm?
avg: 1572.166
3.06M
Tokens
processed
50ms
Forward
23% of step
128ms
Backward
60% of step
11ms
GPU Sync
5% of step
737
GPU Ops
per step
0.2%
MFU
model FLOPS util
2.6x
Bwd/Fwd
ratio
Loss Curve ?
Validation / selection
Run diagnostics
Gradient + Loss
Train/val loss with validation, warmup, overfit, and instability markers.
Search-aware view
Architecture
Layers?4
Embedding?128
Heads?4
Vocab?4,000
Context?128
Dropout?0
Parameters?2.36M
Training Config
Total iters?3,000
Batch size?8
Max LR?0.0003
Optimizer?adamw
Backend?helios
Tokenizer?bpe-4k
Seed?42
Weight decay?0.1
Grad clip?1
Eval interval?500
GPU & VRAM
No GPU data
Learning Rate
Grad Norm
Step Time Breakdown
Step Time Breakdown
Checkpoints (1) ?
StepFilenameSizeCreated
3,000checkpoint-3000.json7.4 MB1d ago
Sample Generations (5)
#CheckpointPrompt (preview)Generated
1-<|user|> Hello, how are you? <|assistant|>2d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>a party just . Where to . We your and is it of , on . <|assistant|> The it on your too it . <|assistant|> That and . <|user|> There are . <|user|> Well , a to . <|assistant|> . <|assistant|> He I ed is , I for . I are a be really is and aand a ,
2-<|user|> What do you like to do for fun? <|assistant|>2d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>the you and for . <|user|> the I and you a to my c. <|user|> A really just the really that . <|assistant|> How ? <|assistant|> M? <|assistant|> Maybe in ? <|user|> the . What's there it not a my have for . <|assistant|> Why ? <|assistant|> Here to see , in the see the
3-<|user|> Tell me about yourself. <|assistant|>2d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>an and a st . <|user|> Sof . <|user|> The the sof an you do is the did, your the I . <|assistant|> Thanks and so , a pthe ces . Who , a cand it ? ! A the ps have a er . <|user|>
4-In the beginning 2d ago
Prompt
In the beginning
Output
In the beginning , Ba and what , b, is recotatdal of people and you a diresa cotrestroat work out here . <|user|> Where to try this morning and very hard , I love the only got it going for lunch ? <|assistant|> I'm sorry . Any . <|assistant|> Sure . Who can see a date with an ! Many new family ? <|assistant|> Mical ? <|assistant|> Many time . <|user|> After you for their taryed his ths , and I'm sorry . I look on the tather
5-We the People of 2d ago
Prompt
We the People of
Output
We the People of a time and you there that we do to do that , if she on her on this evening ? <|assistant|> Very good in what he ? <|assistant|> what is no , I haven't say ? Are you looking forward to your resa descrionly go to got back . <|user|> Dad this ? <|user|> Thanks ! <|user|> Do you want to have a chance to one day . <|user|> Who do you think of having a die . Good afternoon at the time ! <|assistant|> Thanks . <|user|> Where does ! Now , no . I ’ ll spend her more aries , that's true . I don ’ t like it , I don ’ t like an free
{
  "vocabSize": 4000,
  "blockSize": 128,
  "nLayer": 4,
  "nEmbd": 128,
  "nHead": 4,
  "dropout": 0
}
{
  "iters": 3000,
  "batchSize": 8,
  "lr": 0.0003,
  "lrMin": 0.00003,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500
}