20260222232656_1bhf · completed · chat · 6.84M params · 4m 3s elapsed · Updated 48d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 22, 2026 11:27 PM
Step 499 / 50,000 (1.0%)
Loss: 7.7175
Best Loss: 7.0905 (-7.0% from start)
Val Loss: 7.6123 (best: 7.1820)
Learning Rate: 2.99e-4
Throughput: 7,902 tok/s (avg)
Speed: 519 ms/iter (avg)
Grad Norm: 2090.569 (avg: 1077.907)
Tokens processed: 1.95M
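The Throughput and Speed cards can be cross-checked against each other. A minimal sketch, assuming every iteration consumes one full batch of 16 sequences of 256 tokens (the dashboard does not say how eval steps or partial batches are counted):

```python
# Assumption: each training iteration processes batch_size * context_len tokens.
batch_size = 16      # "Batch size" from the training config
context_len = 256    # "Context" from the architecture
ms_per_iter = 519    # "Speed" card (average ms/iter)

tokens_per_iter = batch_size * context_len
tok_per_s = tokens_per_iter / (ms_per_iter / 1000)

print(tokens_per_iter)    # 4096
print(round(tok_per_s))   # 7892
```

The result is close to the reported 7,902 tok/s; a small difference is expected if the dashboard averages per-step throughput rather than dividing averaged totals.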
Loss Curve (chart)
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 6.84M
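The 6.84M parameter figure can be reproduced from the architecture values under one set of assumptions. The sketch below assumes a GPT-2-style decoder with learned position embeddings, a 4x MLP, bias-free Linear layers, LayerNorm with weight and bias, and an output head that is not tied to the token embedding; none of these choices are stated on the dashboard.

```python
# Hypothetical architecture assumptions (not confirmed by the dashboard):
# GPT-2-style blocks, learned positions, 4x MLP, bias-free Linear layers,
# LayerNorm with weight + bias, untied output head.
def count_params(vocab: int, context: int, n_layer: int, n_embd: int) -> int:
    c = n_embd
    tok_emb = vocab * c            # token embedding table
    pos_emb = context * c          # learned position embeddings
    per_layer = (
        2 * c                      # ln_1 (weight + bias)
        + c * 3 * c                # fused q, k, v projection
        + c * c                    # attention output projection
        + 2 * c                    # ln_2 (weight + bias)
        + c * 4 * c                # MLP up-projection
        + 4 * c * c                # MLP down-projection
    )
    ln_f = 2 * c                   # final LayerNorm
    lm_head = c * vocab            # untied output projection (assumed)
    return tok_emb + pos_emb + n_layer * per_layer + ln_f + lm_head

total = count_params(vocab=4000, context=256, n_layer=6, n_embd=256)
print(total)                   # 6838784
print(f"{total / 1e6:.2f}M")   # 6.84M
```

The head count does not change the total: 8 heads simply split the 256-dim embedding into 32-dim heads. With a tied output head the same formula gives about 5.81M, so the reported 6.84M is consistent with (but does not prove) an untied head.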
Training Config
Total iters: 50,000
Batch size: 16
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 100
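With Grad clip set to 1 and the Grad Norm card at 2090.569, clipping is active on essentially every step. Assuming the card reports the norm measured before clipping (as PyTorch's `torch.nn.utils.clip_grad_norm_` returns it; whether helios does the same is an assumption), each update keeps its direction but is scaled by roughly 1/2090.569, about 4.78e-4. A minimal global-norm clipping sketch:

```python
import math

def clip_grad_norm(grads: list[list[float]], max_norm: float) -> tuple[list[list[float]], float]:
    """Rescale all gradients so their combined L2 norm is at most max_norm.

    Returns the rescaled gradients and the norm measured before clipping.
    """
    # Global norm: treat every gradient tensor as one long vector.
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Same clamp as PyTorch's clip_grad_norm_: only scale down, never up.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm

clipped, norm = clip_grad_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
print(norm)      # 5.0
print(clipped)   # approximately [[0.6, 0.8], [0.0]]
```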
Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity
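Several of these charts are derived from the logged loss. A sketch of how the Perplexity, Train/Val Gap, and Smoothed Loss (EMA) values could be computed from the current cards, assuming cross-entropy in nats, a gap defined as val minus train, and a plain (not bias-corrected) EMA with a hypothetical smoothing weight; the dashboard's exact definitions are not shown.

```python
import math

def perplexity(mean_loss: float) -> float:
    # Perplexity = exp(mean per-token cross-entropy), loss measured in nats.
    return math.exp(mean_loss)

def ema(values: list[float], alpha: float) -> list[float]:
    # Plain exponential moving average; alpha is a hypothetical smoothing weight.
    smoothed: list[float] = []
    for v in values:
        if not smoothed:
            smoothed.append(v)
        else:
            smoothed.append(alpha * smoothed[-1] + (1 - alpha) * v)
    return smoothed

train_loss, val_loss, vocab = 7.7175, 7.6123, 4000

train_ppl = perplexity(train_loss)
val_ppl = perplexity(val_loss)
uniform_loss = math.log(vocab)   # loss of a uniform guess over the vocabulary
gap = val_loss - train_loss      # sign convention (val minus train) is assumed

print(f"train ppl {train_ppl:.0f}, val ppl {val_ppl:.0f}")
print(f"uniform baseline loss {uniform_loss:.3f}")  # 8.294
print(f"val - train gap {gap:+.4f}")                # -0.1052
print(ema([8.0, 7.0, 7.5], alpha=0.9))
```

A uniform guess over 4,000 tokens has loss ln(4000), about 8.294, so both current losses sit only modestly below that baseline at step 499.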
Gradient Clipping: No clipping data
GPU Operations: No GPU ops data
Step Time Breakdown: No timing data
Timing Phase Lines: No timing data
Backward / Forward Ratio: No timing data
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 50d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>have p. get I s in. <|user|> it ed . . <|assistant|> ed . get really to can y din the for it it . <|user|> . <|user|> and to get that my . ? <|assistant|> and ed for an to to ? <|user|> get the in , . that the s have
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 50d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>, it y it a in the really my to , . <|assistant|> for .
<|user|> .
<|user|> I ? <|assistant|> inI ccthe s is for to , . is I me . <|assistant|> is pp? <|user|> me with . get a -.
<|user|> my ppyou are my don
3 · - · <|user|> Tell me about yourself. <|assistant|> · 50d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>, on , w. . <|assistant|> cis it on . <|user|> with . <|assistant|> it .
<|user|> and that y . really are . <|assistant|> the rof scpit the a really . . <|user|> with ? <|assistant|> ? <|assistant|> the dget . <|assistant|> of I . .
<|user|> is s s . y
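The prompts above follow a simple chat template: a `<|user|>` turn followed by an open `<|assistant|>` turn, and the raw samples run on into further turns. A sketch of building that prompt string and trimming a sample to its first assistant reply; `build_prompt` and `first_assistant_turn` are hypothetical helpers, and the stop-at-next-marker rule is an assumption, not something the dashboard applies.

```python
USER = "<|user|>"
ASSISTANT = "<|assistant|>"

def build_prompt(user_message: str) -> str:
    # Mirrors the prompt strings shown above: user turn, then an open assistant turn.
    return f"{USER} {user_message} {ASSISTANT}"

def first_assistant_turn(output: str, prompt: str) -> str:
    # Hypothetical post-processing: drop the echoed prompt, then cut at the
    # earliest following role marker so only the first reply remains.
    reply = output[len(prompt):] if output.startswith(prompt) else output
    for marker in (USER, ASSISTANT):
        idx = reply.find(marker)
        if idx != -1:
            reply = reply[:idx]
    return reply.strip()

prompt = build_prompt("Hello, how are you?")
raw = prompt + "have p. get I s in. <|user|> it ed . ."
print(first_assistant_turn(raw, prompt))  # have p. get I s in.
```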
Model Config (JSON)
{
"vocabSize": 4000,
"blockSize": 256,
"nLayer": 6,
"nEmbd": 256,
"nHead": 8,
"dropout": 0
}
Training Config (JSON)
{
"iters": 50000,
"batchSize": 16,
"lr": 0.0003,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 100,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false
}
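The training config lists the AdamW hyperparameters (lr 0.0003, beta1 0.9, beta2 0.95, eps 1e-8, weight decay 0.1). A scalar sketch of one AdamW step with those values, assuming the standard decoupled-weight-decay form used by PyTorch's `torch.optim.AdamW`; whether helios applies weight decay to every parameter or only to weight matrices is not shown here.

```python
import math

def adamw_step(param: float, grad: float, m: float, v: float, t: int,
               lr: float = 3e-4, beta1: float = 0.9, beta2: float = 0.95,
               eps: float = 1e-8, weight_decay: float = 0.1) -> tuple[float, float, float]:
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * weight_decay * param   # decoupled weight decay
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
print(p)  # ~0.99967
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is about 1, so the Adam part moves the parameter by roughly lr regardless of the gradient's scale.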