super_chat_20260225051901_82lastalechat
17.44M params · 3s elapsed · Updated 47d ago
8L / 384D / 8H · helios · bpe-4k · adamw · Created Feb 25, 2026 5:20 AM
Step 2 / 50,000 (0.0%)
Loss: 8.3529
Best Loss: 8.3529 (0.0% from start)
Val Loss: -
Learning Rate: 5.09e-6
Throughput: 1,141 tok/s (avg)
Speed: 3,590 ms/iter (avg)
Grad Norm: 2.299 (avg: 2.299)
Tokens processed: 4.1K
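The throughput and step-time readings are mutually consistent: each step processes batch × context = 8 × 512 = 4,096 tokens (matching the 4.1K tokens processed after one completed step), and 4,096 tokens per 3.59 s step is roughly 1,141 tok/s. A minimal Python check, assuming every batch is packed to the full 512-token context:

# Sanity-check the dashboard's throughput against its step time.
# Assumes full-context batches (8 sequences x 512 tokens per step).
batch_size = 8
context = 512
step_ms = 3590

tokens_per_step = batch_size * context            # 4096 (~4.1K)
tok_per_s = tokens_per_step / (step_ms / 1000)    # ~1141
print(f"{tok_per_s:.0f} tok/s")                   # matches 1,141 tok/s (avg)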
Forward: 2,618 ms (73% of step)
Backward: 889 ms (25% of step)
GPU Sync: 11 ms (0% of step)
GPU Ops: 1,223 per step
MFU: 0.4% (model FLOPS utilization)
Bwd/Fwd ratio: 0.3x
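The MFU figure can be reproduced with the standard ≈6N FLOPs-per-token approximation for transformer training. The peak-FLOPs number below is purely an assumption (the dashboard does not identify the GPU); roughly 30 TFLOP/s of peak would yield the displayed 0.4%:

# Rough MFU estimate via the ~6 * n_params FLOPs/token approximation.
# ASSUMPTION: peak_flops is hypothetical -- the GPU is not named here;
# ~30 TFLOP/s reproduces the displayed 0.4%.
n_params = 17.44e6
tokens_per_step = 8 * 512
step_s = 3.590
peak_flops = 30e12

achieved = 6 * n_params * tokens_per_step / step_s   # ~1.2e11 FLOP/s
print(f"MFU ~= {achieved / peak_flops:.1%}")         # ~0.4%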
[Loss Curve: waiting for telemetry]
Architecture
Layers: 8
Embedding: 384
Heads: 8
Vocab: 4,000
Context: 512
Dropout: 0.1
Parameters: 17.44M
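The 17.44M total is consistent with a GPT-style decoder built from this config, assuming learned positional embeddings, bias-free linear layers (LayerNorms keep weight and bias), and an untied LM head. Those three structural details are inferred from the arithmetic, not stated by the dashboard:

# Parameter count for a GPT-style decoder matching the Architecture panel.
# ASSUMPTIONS: learned positional embeddings, no biases on linear layers,
# untied output head -- together these give exactly 17,437,440 ~= 17.44M.
vocab, ctx, d, n_layers = 4000, 512, 384, 8

tok_emb = vocab * d                  # 1,536,000
pos_emb = ctx * d                    # 196,608
per_layer = (4 * d * d               # attention q/k/v/out projections
             + 2 * d * (4 * d)       # MLP up + down projections
             + 2 * 2 * d)            # two LayerNorms (weight + bias)
final_ln = 2 * d
lm_head = vocab * d                  # untied output projection

total = tok_emb + pos_emb + n_layers * per_layer + final_ln + lm_head
print(f"{total:,} parameters")       # 17,437,440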
Training Config
Total iters: 50,000
Batch size: 8
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 1,000
[Charts pending telemetry: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping · GPU Operations · Step Time Breakdown · Timing Phase Lines · Backward / Forward Ratio]
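Once telemetry arrives, the Perplexity panel is just exp(loss). At the current loss of 8.3529 that works out to about 4,242, close to the 4,000-token vocabulary size; an untrained model predicting near-uniformly over the vocab starts near ln(4000) ≈ 8.29, so the reading at step 2 is about what you'd expect:

import math

loss = 8.3529                               # current train loss
print(f"ppl ~= {math.exp(loss):,.0f}")      # ~4,242
print(f"ln(vocab) = {math.log(4000):.2f}")  # 8.29, the uniform baseline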
Checkpoints (0): none saved yet
Model Config (JSON)
{
"vocabSize": 4000,
"blockSize": 512,
"nLayer": 8,
"nEmbd": 384,
"nHead": 8,
"dropout": 0.1
}

Training Config (JSON)
{
"iters": 50000,
"batchSize": 8,
"lr": 0.00005,
"lrMin": 0.000005,
"warmupIters": 1000,
"beta1": 0.9,
"beta2": 0.95,
"eps": 0.000001,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 1000,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 500,
"spikeThreshold": 0
}
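The displayed learning rate of 5.09e-6 at step 2 falls out of a linear warmup from lrMin to lr over warmupIters. The post-warmup cosine decay back to lrMin below is an assumption (a common convention the config doesn't confirm):

import math

# Reproduce the dashboard's LR reading from the config above.
# Linear warmup lrMin -> lr over warmupIters matches 5.09e-6 at step 2;
# ASSUMPTION: the cosine decay to lrMin afterwards is not confirmed here.
lr, lr_min = 5e-5, 5e-6
warmup, total = 1000, 50000

def lr_at(step):
    if step < warmup:
        return lr_min + (lr - lr_min) * step / warmup
    t = (step - warmup) / (total - warmup)
    return lr_min + 0.5 * (lr - lr_min) * (1 + math.cos(math.pi * t))

print(f"{lr_at(2):.2e}")  # 5.09e-06, matching the dashboard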