20260223085609_er4vcompletedchat2.36M params11m 12s elapsed · Updated 7m ago
4L / 128D / 4H · helios · bpe-4k · adamw· Created Feb 23, 2026 8:56 AM
Step 3,000 / 3,000100.0%
5.4956
Loss?
4.7334
Best Loss?
-34.0% from start
6.1205
Val Loss?
best: 6.0888
3.00e-5
Learning Rate?
4,807
Throughput?
tok/s (avg)
214
Speed?
ms/iter (avg)
3050.463
Grad Norm?
avg: 1572.166
3.06M
Tokens
processed
50ms
Forward
23% of step
128ms
Backward
60% of step
11ms
GPU Sync
5% of step
737
GPU Ops
per step
0.2%
MFU
model FLOPS util
2.6x
Bwd/Fwd
ratio
Loss Curve ?
Validation / selection
Run diagnostics
Gradient + Loss
Train/val loss with validation, warmup, overfit, and instability markers.
Search-aware view
Architecture
Layers?4
Embedding?128
Heads?4
Vocab?4,000
Context?128
Dropout?0
Parameters?2.36M
Training Config
Total iters?3,000
Batch size?8
Max LR?0.0003
Optimizer?adamw
Backend?helios
Tokenizer?bpe-4k
Seed?42
Weight decay?0.1
Grad clip?1
Eval interval?500
GPU & VRAM
No GPU data
Learning Rate
Grad Norm
Step Time Breakdown
Step Time Breakdown
Sample Generations (5)
#CheckpointPrompt (preview)Generated
1-<|user|> Hello, how are you? <|assistant|>2d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>a party just . Where to . We your and is it of , on . <|assistant|> The it on your too it . <|assistant|> That and .
<|user|> There are .
<|user|> Well , a to . <|assistant|> . <|assistant|> He I ed is , I for . I are a be really is and aand a ,
2-<|user|> What do you like to do for fun? <|assistant|>2d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>the you and for . <|user|> the I and you a to my c. <|user|> A really just the really that . <|assistant|> How ? <|assistant|> M? <|assistant|> Maybe in ? <|user|> the . What's there it not a my have for . <|assistant|> Why ? <|assistant|> Here to see , in the see the
3-<|user|> Tell me about yourself. <|assistant|>2d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>an and a st . <|user|> Sof . <|user|> The the sof an you do is the did, your the I . <|assistant|> Thanks and so , a pthe ces . Who , a cand it ? ! A the ps have a er . <|user|>
4-In the beginning 2d ago
Prompt
In the beginning
Output
In the beginning , Ba and what , b, is recotatdal of people and you a diresa cotrestroat work out here .
<|user|> Where to try this morning and very hard , I love the only got it going for lunch ? <|assistant|> I'm sorry . Any . <|assistant|> Sure . Who can see a date with an ! Many new family ? <|assistant|> Mical ? <|assistant|> Many time . <|user|> After you for their taryed his ths , and I'm sorry . I look on the tather
5-We the People of 2d ago
Prompt
We the People of
Output
We the People of a time and you there that we do to do that , if she on her on this evening ? <|assistant|> Very good in what he ? <|assistant|> what is no , I haven't say ? Are you looking forward to your resa descrionly go to got back .
<|user|> Dad this ? <|user|> Thanks ! <|user|> Do you want to have a chance to one day . <|user|> Who do you think of having a die . Good afternoon at the time ! <|assistant|> Thanks .
<|user|> Where does ! Now , no . I ’ ll spend her more aries , that's true . I don ’ t like it , I don ’ t like an free
{
"vocabSize": 4000,
"blockSize": 128,
"nLayer": 4,
"nEmbd": 128,
"nHead": 4,
"dropout": 0
}{
"iters": 3000,
"batchSize": 8,
"lr": 0.0003,
"lrMin": 0.00003,
"warmupIters": 1000,
"beta1": 0.9,
"beta2": 0.95,
"eps": 0.000001,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 500,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 500
}