20260223020126_nu4a · completed · chat · 6.84M params · 9m 3s elapsed · Updated 48d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 23, 2026 2:02 AM
Step 1,158 / 30,000 (3.9%)
Loss: 5.3828
Best Loss: 5.0609 (-35.1% from start)
Val Loss: 5.6744 (best: 5.6744)
Learning Rate: 4.99e-5
Throughput: 8,160 tok/s (avg)
Speed: 502 ms/iter (avg)
Grad Norm: 2565.852 (avg: 221.443)
Tokens processed: 4.44M
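The Throughput and Speed averages are consistent with each iteration consuming one batch of full-length sequences. A minimal sketch of that arithmetic, assuming every iteration processes batch size × context tokens (16 × 256) with no gradient accumulation; the page does not state how tokens are counted.

```python
# Config values from this run.
BATCH_SIZE = 16
CONTEXT = 256

# Dashboard averages.
THROUGHPUT_TOK_S = 8_160
STEP_TIME_MS = 502

# Assumption: one batch of full-length sequences per iteration, no gradient accumulation.
tokens_per_iter = BATCH_SIZE * CONTEXT

# Tokens per iteration implied by the averaged throughput and step time.
implied_tokens_per_iter = THROUGHPUT_TOK_S * STEP_TIME_MS / 1000

print(tokens_per_iter)                 # 4096
print(round(implied_tokens_per_iter))  # 4096
```

By the same assumption, 4.44M tokens processed corresponds to about 1,084 iterations, somewhat fewer than the reported step of 1,158.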
Loss Curve (chart not included)
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 6.84M
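One set of assumptions reproduces the 6.84M figure: a GPT-2-style decoder with learned positional embeddings, a 4× MLP, LayerNorms with weight and bias, no biases on the linear layers, and an output projection that is not tied to the token embedding. The page states none of these; the function `gpt_param_count` is hypothetical. The 8 heads split the 256-dim embedding into 32-dim heads and do not change the count.

```python
def gpt_param_count(vocab: int, context: int, n_layer: int, d: int) -> int:
    """Parameter count for a GPT-2-style decoder under the assumptions in the lead-in."""
    tok_emb = vocab * d          # token embedding table
    pos_emb = context * d        # learned positional embeddings (assumed)
    per_layer = (
        2 * d                    # LayerNorm before attention (weight + bias)
        + d * 3 * d              # fused Q, K, V projection, no bias (assumed)
        + d * d                  # attention output projection, no bias (assumed)
        + 2 * d                  # LayerNorm before the MLP
        + d * 4 * d              # MLP up-projection (4x width assumed)
        + 4 * d * d              # MLP down-projection
    )
    ln_f = 2 * d                 # final LayerNorm
    lm_head = d * vocab          # untied output projection (assumed)
    return tok_emb + pos_emb + n_layer * per_layer + ln_f + lm_head


total = gpt_param_count(vocab=4000, context=256, n_layer=6, d=256)
print(total)                  # 6838784
print(f"{total / 1e6:.2f}M")  # 6.84M
```

Under the same assumptions, tying the output head to the token embedding would remove 1,024,000 parameters (about 5.81M total), which does not match the reported figure.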
Training Config
Total iters: 30,000
Batch size: 16
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 100
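The Grad Norm card (2565.852, avg 221.443) is far above the Grad clip setting of 1, which suggests it reports the norm before clipping; the page does not say so. A minimal sketch of clipping by global L2 norm, the common meaning of a single grad-clip threshold (as in PyTorch's `torch.nn.utils.clip_grad_norm_`), assuming the helios backend behaves the same way. Gradients are plain Python lists here so the example has no dependencies.

```python
import math


def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> tuple[list[list[float]], float]:
    """Scale all gradients so their combined L2 norm is at most max_norm.

    Returns the clipped gradients and the pre-clip global norm.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # epsilon avoids division by zero
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


# Toy example: two "tensors" whose global norm is 5.0 (a 3-4-5 triangle).
clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6], [0.8]]

# With this run's numbers, a pre-clip norm of 2565.852 and a threshold of 1
# scale every gradient by about 3.9e-4.
print(1.0 / 2565.852)
```

Clipping preserves the gradient's direction and only shrinks its length, so a very large pre-clip norm mainly indicates how much the update was scaled down on that step.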
Charts (not included): Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity
No data recorded for: Gradient Clipping, GPU Operations, Step Time Breakdown, Timing Phase Lines, Backward / Forward Ratio
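The Perplexity, Train/Val Gap, and Smoothed Loss (EMA) panels are derived from the logged losses. A minimal sketch, assuming the losses are mean token-level cross-entropy in nats (the usual convention, not stated on this page); the EMA smoothing factor `alpha` and its initialization are hypothetical, since the dashboard does not give them.

```python
import math

train_loss = 5.3828
val_loss = 5.6744
vocab_size = 4000

# Perplexity is the exponential of the mean cross-entropy (in nats).
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

# Train/val gap: positive when validation loss exceeds training loss.
gap = val_loss - train_loss

# Baseline: uniform prediction over the vocabulary has loss ln(V) and perplexity V.
uniform_loss = math.log(vocab_size)

print(f"train ppl {train_ppl:.1f}, val ppl {val_ppl:.1f}")
print(f"gap {gap:.4f}")                    # gap 0.2916
print(f"uniform loss {uniform_loss:.3f}")  # uniform loss 8.294


def ema(values: list[float], alpha: float) -> list[float]:
    """Exponential moving average; alpha (hypothetical) weights each new value."""
    smoothed: list[float] = []
    for x in values:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(ema([8.0, 6.0, 5.0], alpha=0.5))  # [8.0, 7.0, 6.0]
```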
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 50d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>ding Wdcherardo to at dto aletive tive ar
just k books terararis alcan ? ding exgod? albe just ex<|assistant|> <|user|> from br<|user|> just tcex<|user|> dding
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 50d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>hto fto the know what books ding <|assistant|> anding a ba bin the s ing de
? c<|user|> ding anttis an? is ar__ding <|assistant|> brdwe have ?
motive __<|assistant|> of'd le
3 · - · <|user|> Tell me about yourself. <|assistant|> · 50d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>eittedes
<|assistant|> alcan have play p almoent some esbooks bican <|user|> Wcan c
Walst<|assistant|> books |>be te is to an brfrom exo ejust ? dWda
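The sample prompts follow a `<|user|> … <|assistant|>` template, and each output echoes the prompt and then keeps generating, including further role tags. A minimal sketch of formatting a prompt that way and trimming a raw output to the first assistant reply; `build_prompt` and `extract_reply` are hypothetical helpers, not part of this dashboard, and the output string in the example is made up.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_prompt(message: str) -> str:
    """Format a single user turn the way the sample prompts on this page are written."""
    return f"{USER_TAG} {message} {ASSISTANT_TAG}"


def extract_reply(prompt: str, output: str) -> str:
    """Drop the echoed prompt, then cut the reply at the next role tag, if any."""
    reply = output[len(prompt):] if output.startswith(prompt) else output
    cut_points = [i for i in (reply.find(USER_TAG), reply.find(ASSISTANT_TAG)) if i != -1]
    if cut_points:
        reply = reply[:min(cut_points)]
    return reply.strip()


prompt = build_prompt("Hello, how are you?")
print(prompt)  # <|user|> Hello, how are you? <|assistant|>

output = prompt + " I am fine. <|user|> and you?"  # made-up output for illustration
print(extract_reply(prompt, output))  # I am fine.
```

Stopping at the next role tag is one common convention for single-turn chat sampling; whether this run's sampler does anything similar is not shown.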
Model Config (JSON)
{
"vocabSize": 4000,
"blockSize": 256,
"nLayer": 6,
"nEmbd": 256,
"nHead": 8,
"dropout": 0
}

Training Config (JSON)
{
"iters": 30000,
"batchSize": 16,
"lr": 0.00005,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 100,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false
}
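The optimizer fields in the Training Config (`lr`, `beta1`, `beta2`, `eps`, `weightDecay`) correspond to the standard AdamW update with decoupled weight decay. A minimal textbook sketch on a single scalar parameter; it does not show how the helios backend implements the optimizer, whether weight decay applies to every parameter, or how the learning-rate schedule (4.99e-5 logged against a max of 5e-5) is computed. `adamw_step` is a hypothetical name.

```python
import math

# Hyperparameters from the Training Config JSON.
LR = 0.00005
BETA1, BETA2 = 0.9, 0.95
EPS = 1e-8
WEIGHT_DECAY = 0.1


def adamw_step(param: float, grad: float, m: float, v: float, t: int) -> tuple[float, float, float]:
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    m = BETA1 * m + (1 - BETA1) * grad         # first-moment (mean) estimate
    v = BETA2 * v + (1 - BETA2) * grad * grad  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - BETA1 ** t)               # bias corrections for zero-initialized moments
    v_hat = v / (1 - BETA2 ** t)
    param -= LR * WEIGHT_DECAY * param         # decoupled weight decay
    param -= LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
print(p)  # about 0.999945
```

On the first step the bias-corrected ratio m̂/√v̂ is about ±1, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.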