20260223085609_er4v · completed · chat · 2.36M params · 11m 12s elapsed · Updated 36d ago
4L / 128D / 4H · helios · bpe-4k · adamw · Created Feb 23, 2026 8:56 AM
Step 3,000 / 3,000 · 100.0%
Loss: 5.4956
Best Loss: 4.7334 (-34.0% from start)
Val Loss: 6.1205 (best: 6.0888)
Learning Rate: 3.00e-5
Throughput: 4,807 tok/s (avg)
Speed: 214 ms/iter (avg)
Grad Norm: 3050.463 (avg: 1572.166)
Tokens processed: 3.06M
Forward: 50ms (23% of step)
Backward: 128ms (60% of step)
GPU Sync: 11ms (5% of step)
GPU Ops: 737 per step
MFU (model FLOPS util): 0.2%
Bwd/Fwd ratio: 2.6x
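The timing figures above can be cross-checked with simple arithmetic. The sketch below assumes that one step processes batch size × context length tokens (8 × 128, taken from the config) and that the percentages are each phase's share of the 214 ms average step. The function names are illustrative and not part of the dashboard or the helios backend. Computed this way, throughput lands slightly below the reported 4,807 tok/s; the dashboard may average per-step rates instead, but the source does not say.

```python
# Values copied from the dashboard cards and the training config.
BATCH_SIZE = 8       # "batchSize"
BLOCK_SIZE = 128     # "blockSize" (context length)
STEP_MS = 214.0      # average ms/iter
FORWARD_MS = 50.0
BACKWARD_MS = 128.0
GPU_SYNC_MS = 11.0


def tokens_per_step(batch_size: int, block_size: int) -> int:
    """Tokens consumed by one optimizer step (assumes no gradient accumulation)."""
    return batch_size * block_size


def throughput_tok_per_s(tokens: int, step_ms: float) -> float:
    """Tokens per second implied by a step time given in milliseconds."""
    return tokens / (step_ms / 1000.0)


def share_of_step(phase_ms: float, step_ms: float) -> float:
    """Percentage of the step spent in one phase."""
    return 100.0 * phase_ms / step_ms


if __name__ == "__main__":
    tokens = tokens_per_step(BATCH_SIZE, BLOCK_SIZE)
    print("tokens/step:", tokens)
    print("tok/s:", round(throughput_tok_per_s(tokens, STEP_MS), 1))
    for name, ms in [("forward", FORWARD_MS), ("backward", BACKWARD_MS), ("gpu sync", GPU_SYNC_MS)]:
        print(name, "share of step (%):", round(share_of_step(ms, STEP_MS), 1))
    print("bwd/fwd ratio:", round(BACKWARD_MS / FORWARD_MS, 1))
```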
[Chart: Loss Curve]
Architecture
Layers: 4
Embedding: 128
Heads: 4
Vocab: 4,000
Context: 128
Dropout: 0
Parameters: 2.36M
Training Config
Total iters: 3,000
Batch size: 8
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
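The config sets Grad clip to 1, while the reported grad norm is in the thousands. If the logged norm is measured before clipping (an assumption; the dashboard shows no clipping data), clipping by global norm would scale every gradient by roughly 1 / norm on such steps. The sketch below shows that technique in plain Python. It is not the helios implementation, and `global_norm` and `clip_by_global_norm` are hypothetical names.

```python
import math
from typing import List, Tuple

Grads = List[List[float]]  # one list of gradient values per parameter tensor


def global_norm(grads: Grads) -> float:
    """L2 norm over all gradient values, treated as one long vector."""
    return math.sqrt(sum(g * g for vec in grads for g in vec))


def clip_by_global_norm(grads: Grads, max_norm: float, eps: float = 1e-6) -> Tuple[Grads, float]:
    """Scale all gradients so their global norm does not exceed max_norm.

    Returns the (possibly scaled) gradients and the norm measured before clipping.
    """
    norm = global_norm(grads)
    scale = min(1.0, max_norm / (norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, norm


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0]]
    clipped, pre_clip_norm = clip_by_global_norm(grads, max_norm=1.0)
    print("pre-clip norm:", pre_clip_norm)
    print("post-clip norm:", global_norm(clipped))
```

Under this scheme, a pre-clip norm near 3050 with a limit of 1 would shrink each update by about three orders of magnitude.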
[Chart: Throughput (tok/s)]
[Chart: Step Time (ms/iter)]
[Chart: GPU & VRAM] No GPU data
[Chart: Perplexity]
[Chart: Train/Val Gap]
[Chart: Learning Rate]
[Chart: Grad Norm]
[Chart: Smoothed Loss (EMA)]
[Chart: Loss Velocity]
[Chart: Gradient Clipping] No clipping data
[Chart: GPU Operations]
[Chart: Step Time Breakdown] Series: Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data
[Chart: Timing Phase Lines]
[Chart: Backward / Forward Ratio]
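Several of the charts named above (Perplexity, Train/Val Gap, Smoothed Loss (EMA)) are derived from the logged losses. The sketch below shows one common way to compute them. It assumes the loss is a mean cross-entropy in nats, that the gap is validation minus training loss, and that the EMA is seeded with the first value. The dashboard does not document its exact definitions or smoothing factor, so the function names and `alpha` are illustrative.

```python
import math
from typing import List


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


def train_val_gap(train_loss: float, val_loss: float) -> float:
    """Positive when validation loss is worse than training loss."""
    return val_loss - train_loss


def ema(values: List[float], alpha: float = 0.1) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # Final losses as reported on the dashboard.
    train_loss, val_loss = 5.4956, 6.1205
    print("train perplexity:", round(perplexity(train_loss), 1))
    print("val perplexity:", round(perplexity(val_loss), 1))
    print("train/val gap:", round(train_val_gap(train_loss, val_loss), 4))
```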
Sample Generations (5)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 49d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>a party just . Where to . We your and is it of , on . <|assistant|> The it on your too it . <|assistant|> That and .
<|user|> There are .
<|user|> Well , a to . <|assistant|> . <|assistant|> He I ed is , I for . I are a be really is and aand a ,
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 49d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>the you and for . <|user|> the I and you a to my c. <|user|> A really just the really that . <|assistant|> How ? <|assistant|> M? <|assistant|> Maybe in ? <|user|> the . What's there it not a my have for . <|assistant|> Why ? <|assistant|> Here to see , in the see the
3 · - · <|user|> Tell me about yourself. <|assistant|> · 49d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>an and a st . <|user|> Sof . <|user|> The the sof an you do is the did, your the I . <|assistant|> Thanks and so , a pthe ces . Who , a cand it ? ! A the ps have a er . <|user|>
4 · - · In the beginning · 49d ago
Prompt
In the beginning
Output
In the beginning , Ba and what , b, is recotatdal of people and you a diresa cotrestroat work out here .
<|user|> Where to try this morning and very hard , I love the only got it going for lunch ? <|assistant|> I'm sorry . Any . <|assistant|> Sure . Who can see a date with an ! Many new family ? <|assistant|> Mical ? <|assistant|> Many time . <|user|> After you for their taryed his ths , and I'm sorry . I look on the tather
5 · - · We the People of · 49d ago
Prompt
We the People of
Output
We the People of a time and you there that we do to do that , if she on her on this evening ? <|assistant|> Very good in what he ? <|assistant|> what is no , I haven't say ? Are you looking forward to your resa descrionly go to got back .
<|user|> Dad this ? <|user|> Thanks ! <|user|> Do you want to have a chance to one day . <|user|> Who do you think of having a die . Good afternoon at the time ! <|assistant|> Thanks .
<|user|> Where does ! Now , no . I ’ ll spend her more aries , that's true . I don ’ t like it , I don ’ t like an free
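The samples above use `<|user|>` and `<|assistant|>` as role markers, and the model often emits several turns of the same role in a row. A small parser makes that easier to inspect. The sketch below assumes those two markers are the only role tokens. `split_turns` is a hypothetical helper, not part of the training code.

```python
import re
from typing import List, Tuple

# Matches the two role markers seen in the samples and captures the role name.
ROLE_TOKEN = re.compile(r"<\|(user|assistant)\|>")


def split_turns(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split generated text into (prefix, [(role, content), ...]).

    The prefix is any text before the first role marker, for example the
    continuation of a plain-text prompt.
    """
    # With one capturing group, re.split yields:
    # [prefix, role1, content1, role2, content2, ...]
    parts = ROLE_TOKEN.split(text)
    prefix = parts[0].strip()
    turns = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]
    return prefix, turns


def repeated_role_count(turns: List[Tuple[str, str]]) -> int:
    """Count turns whose role is the same as the turn immediately before."""
    return sum(1 for prev, cur in zip(turns, turns[1:]) if prev[0] == cur[0])


if __name__ == "__main__":
    sample = "<|user|> Tell me about yourself. <|assistant|>an and a st . <|user|> Sof . <|user|> The the sof"
    prefix, turns = split_turns(sample)
    for role, content in turns:
        print(f"{role}: {content}")
    print("repeated roles:", repeated_role_count(turns))
```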
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 128,
  "nLayer": 4,
  "nEmbd": 128,
  "nHead": 4,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 3000,
  "batchSize": 8,
  "lr": 0.0003,
  "lrMin": 0.00003,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500
}
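The `lr`, `lrMin`, `warmupIters`, and `iters` fields describe a schedule that ends at the 3.00e-5 learning rate shown on the dashboard. The config does not name the shape of the decay. The sketch below assumes linear warmup followed by cosine decay, a common choice, and `lr_at` is a hypothetical function rather than the backend's scheduler.

```python
import math


def lr_at(
    step: int,
    lr: float = 3e-4,          # "lr" (max learning rate)
    lr_min: float = 3e-5,      # "lrMin"
    warmup_iters: int = 1000,  # "warmupIters"
    iters: int = 3000,         # "iters"
) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay (assumed shape)."""
    if step < warmup_iters:
        # Ramp linearly so the last warmup step reaches the max learning rate.
        return lr * (step + 1) / warmup_iters
    # Fraction of the decay phase completed, clamped to 1.0 at the end of training.
    progress = min(1.0, (step - warmup_iters) / max(1, iters - warmup_iters))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_min + (lr - lr_min) * cosine


if __name__ == "__main__":
    for step in (0, 500, 999, 1000, 2000, 3000):
        print(step, f"{lr_at(step):.2e}")
```

Any schedule that decays to `lrMin` by the final step would match the reported final learning rate, so the dashboard value alone does not confirm the cosine shape.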