20260223014834_63p2 · completed · chat · 6.84M params · 5m 58s elapsed · Updated 48d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 23, 2026 1:49 AM
Step 842 / 30,000 · 2.8%
Loss: 6.6721
Best Loss: 5.1080
-19.6% from start
Val Loss: 8.5426 (best: 5.7704)
Learning Rate: 1.00e-4
Throughput: 7,936 tok/s (avg)
Speed: 529 ms/iter (avg)
Grad Norm: 4719.308 (avg: 1444.051)
Tokens processed: 2.91M
Loss Curve
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 6.84M
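The page does not say how the 6.84M parameter figure is counted. As a rough cross-check, the sketch below estimates the count for a GPT-2-style decoder under stated assumptions that are not confirmed by the source: a 4x MLP expansion, no biases on linear layers, LayerNorm with scale and shift, learned positional embeddings, and an output head that is not tied to the token embedding. The function name `estimate_params` and its flags are hypothetical.

```python
def estimate_params(n_layer, n_embd, vocab_size, block_size,
                    linear_bias=False, tied_head=False):
    """Estimate parameter count for an assumed GPT-2-style decoder."""
    d = n_embd
    # Attention: fused QKV projection (d x 3d) plus output projection (d x d).
    attn = 3 * d * d + d * d
    # MLP: d -> 4d -> d (the 4x expansion is an assumption).
    mlp = 2 * 4 * d * d
    if linear_bias:
        attn += 3 * d + d
        mlp += 4 * d + d
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * d
    per_block = attn + mlp + norms

    tok_emb = vocab_size * d
    pos_emb = block_size * d
    final_norm = 2 * d
    head = 0 if tied_head else vocab_size * d
    return n_layer * per_block + tok_emb + pos_emb + final_norm + head


# Values from the Architecture panel: 6 layers, 256-dim, 4,000 vocab, 256 context.
total = estimate_params(n_layer=6, n_embd=256, vocab_size=4000, block_size=256)

if __name__ == "__main__":
    print(f"estimated parameters: {total:,} ({total / 1e6:.2f}M)")
```

Under these assumptions the estimate rounds to the reported 6.84M, but other combinations of choices could land close to the same figure, so this is a plausibility check rather than a description of the helios model.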
Training Config
Total iters: 30,000
Batch size: 16
Max LR: 0.0001
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 100
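The batch size, context length and step time above imply a token budget per step. The sketch below works through that arithmetic, assuming each batch element is one full sequence of Context tokens and that there is no gradient accumulation (neither is stated on the page).

```python
# Values copied from the Architecture and Training Config panels.
batch_size = 16
context = 256
total_iters = 30_000
ms_per_iter = 529  # reported average step time

# Tokens consumed by one optimizer step (assumes full-length sequences).
tokens_per_step = batch_size * context

# Throughput implied by the average step time.
implied_tok_per_s = tokens_per_step / (ms_per_iter / 1000)

# Tokens the full 30,000-step schedule would consume.
total_tokens = tokens_per_step * total_iters

if __name__ == "__main__":
    print(f"tokens per step:    {tokens_per_step:,}")
    print(f"implied throughput: {implied_tok_per_s:,.0f} tok/s")
    print(f"full-run tokens:    {total_tokens:,}")
```

The implied throughput comes out a little below the reported 7,936 tok/s average. That is not necessarily a discrepancy: an average of per-step throughput need not equal tokens per step divided by the average step time.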
Charts: Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity
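The chart contents did not survive in this capture, but some of the plotted quantities can be derived from the headline loss values. The sketch below assumes the usual definitions, which the page does not confirm: perplexity as exp(loss) with loss measured in nats per token, the train/val gap as validation loss minus training loss, and a standard exponential moving average for smoothing. The smoothing factor `alpha` is a hypothetical value.

```python
import math

# Headline values copied from the dashboard tiles.
train_loss = 6.6721
val_loss = 8.5426


def perplexity(loss: float) -> float:
    """exp(loss), assuming loss is mean cross-entropy in nats per token."""
    return math.exp(loss)


def train_val_gap(train: float, val: float) -> float:
    """Validation loss minus training loss (assumed definition)."""
    return val - train


def ema(values, alpha=0.1):
    """Exponential moving average; alpha is a hypothetical smoothing factor."""
    smoothed = []
    current = None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


if __name__ == "__main__":
    print(f"train perplexity: {perplexity(train_loss):.1f}")
    print(f"val perplexity:   {perplexity(val_loss):.1f}")
    print(f"train/val gap:    {train_val_gap(train_loss, val_loss):.4f}")
```

With the values shown, the training perplexity is about 790 and the gap is 1.8705.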
Gradient Clipping: no clipping data
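Although no clipping data was recorded, the reported gradient norm (4719.308) is far above the configured Grad clip of 1. The sketch below shows what global-norm clipping would do to such a gradient. It assumes the common scheme in which the whole gradient is rescaled so that its norm does not exceed the threshold; how the helios backend actually clips is not stated on the page, and `clip_scale` is a hypothetical helper.

```python
def clip_scale(grad_norm: float, max_norm: float) -> float:
    """Factor applied to every gradient under global-norm clipping.

    Gradients at or below the threshold are left untouched (factor 1.0);
    larger ones are scaled so the resulting norm equals max_norm.
    """
    if grad_norm <= max_norm:
        return 1.0
    return max_norm / grad_norm


# Values copied from the dashboard: latest grad norm and the Grad clip setting.
reported_norm = 4719.308
grad_clip = 1.0

scale = clip_scale(reported_norm, grad_clip)
clipped_norm = reported_norm * scale

if __name__ == "__main__":
    print(f"scale factor: {scale:.3e}")
    print(f"clipped norm: {clipped_norm:.3f}")
```

Under this assumption the gradient would be scaled by roughly 2.1e-4 before the optimizer step.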
GPU Operations: no GPU ops data
Step Time Breakdown: no timing data
Timing Phase Lines: no timing data
Backward / Forward Ratio: no timing data
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 50d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>but Wcfrom rcea to a fis to esome br<|user|>
itf ding itto the es es lio from do moe? a a agal<|assistant|> o ts some <|et tfito dra
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 50d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>fto fa <|user|> ding brjust <|user|> gjust but can not s ? es
<|user|> eo but <|user|> fds taring ? Tacan you from ci <|user|>
but time TT|>more some al
3 · - · <|user|> Tell me about yourself. <|assistant|> · 50d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>ealtedto
, to in a some is ing your is altanbrjust is anaan c
Wto <|to br<|just , and the ars i just altfin <|user|> cd_t
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 30000,
  "batchSize": 16,
  "lr": 0.0001,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false
}
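A few quantities follow directly from the two JSON configs. The sketch below parses abbreviated copies of them and derives the per-head dimension, the number of evaluations in a full run, and the tokens scored per evaluation. It assumes that evaluation runs once every `evalInterval` steps and uses the same batch size and block size as training; the page does not state either.

```python
import json

# Abbreviated copies of the two configs shown above (only the keys used here).
MODEL_JSON = '{"vocabSize": 4000, "blockSize": 256, "nLayer": 6, "nEmbd": 256, "nHead": 8, "dropout": 0}'
TRAIN_JSON = '{"iters": 30000, "batchSize": 16, "evalInterval": 100, "evalIters": 10}'

model = json.loads(MODEL_JSON)
train = json.loads(TRAIN_JSON)

# Multi-head attention needs the embedding width to split evenly across heads.
if model["nEmbd"] % model["nHead"] != 0:
    raise ValueError("nEmbd must be divisible by nHead")
head_dim = model["nEmbd"] // model["nHead"]

# Number of evaluation passes over the full schedule (assumed cadence).
num_evals = train["iters"] // train["evalInterval"]

# Tokens scored in one evaluation pass (assumes training batch and block size).
tokens_per_eval = train["evalIters"] * train["batchSize"] * model["blockSize"]

if __name__ == "__main__":
    print(f"head dimension:     {head_dim}")
    print(f"evaluations in run: {num_evals}")
    print(f"tokens per eval:    {tokens_per_eval:,}")
```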