concordance_v2_20260227065754_0o9y · stale
concordance · 306.73M params · 45m 30s elapsed · Updated 45d ago
21L / 1024D / 16H · helios · bpe-64k · adamw · Created Feb 27, 2026 6:58 AM
Step 196 / 20,000 (1.0%)
Loss: 8.3241
Best Loss: 4.9407 (-17.8% from start)
Val Loss: -
Learning Rate: 8.29e-5
Throughput: 148 tok/s (avg)
Speed: 13,884 ms/iter (avg)
Grad Norm: 2.457 (avg: 2.267)
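The loss tiles above are cross-entropy in nats per token; the Perplexity chart further down is just the exponential of that loss. A minimal sketch using the summary values:

```python
import math

# Cross-entropy losses from the run summary (nats/token).
loss = 8.3241       # current training loss at step 196
best_loss = 4.9407  # best loss seen so far

# Perplexity is the exponential of the cross-entropy loss.
ppl = math.exp(loss)
best_ppl = math.exp(best_loss)

print(f"perplexity at step 196: {ppl:,.0f}")
print(f"best perplexity:        {best_ppl:,.0f}")
```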
Tokens processed: 401.4K
Forward: 1454 ms (10% of step)
Backward: 12025 ms (87% of step)
GPU Sync: 112 ms (1% of step)
GPU Ops: 9,203 per step
MFU: 0.2% (model FLOPS utilization)
Bwd/Fwd ratio: 8.3x
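The tiles above are mutually consistent with the training config: one optimizer step covers batch size × gradient-accumulation steps × context length tokens. A quick cross-check, assuming that token accounting and the common 6·N FLOPs-per-token approximation for transformer training:

```python
# Sanity-check the dashboard tiles from the training config.
# tokens/step = batch_size * grad_accum_steps * context is an assumption
# about how this trainer counts tokens, not stated in the dump.
batch_size = 1
grad_accum_steps = 4
context = 512
step = 196
n_params = 306.73e6

fwd_ms, bwd_ms = 1454, 12025
step_ms = 13_884  # avg ms/iter from the Speed tile

tokens_per_step = batch_size * grad_accum_steps * context   # 2048
tokens_processed = step * tokens_per_step                   # 401,408 ~ 401.4K
throughput = tokens_per_step / (step_ms / 1000)             # ~147.5 tok/s
bwd_fwd_ratio = bwd_ms / fwd_ms                             # ~8.3x

# Achieved training FLOP/s under the 6*N-per-token approximation; the 0.2%
# MFU tile additionally depends on the (unstated) peak FLOPS of the backend.
train_flops_per_sec = 6 * n_params * throughput             # ~2.7e11 FLOP/s
```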
Loss Curve
Architecture
Layers: 21
Embedding: 1024
Heads: 16
Vocab: 19,777
Context: 512
Dropout: 0
Parameters: 306.73M
Training Config
Total iters: 20,000
Batch size: 1
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-64k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
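The Learning Rate tile (8.29e-5 at step 196) matches a linear warmup from lrMin to lr over warmupIters, followed by cosine decay back to lrMin. The trainer's actual schedule code is not shown in this dump, so the sketch below is an assumed reconstruction from the config values:

```python
import math

# Assumed warmup + cosine schedule rebuilt from the Training Config JSON.
LR, LR_MIN, WARMUP, ITERS = 3e-4, 3e-5, 1000, 20_000

def lr_at(step: int) -> float:
    if step < WARMUP:
        # Linear warmup from LR_MIN up to LR.
        return LR_MIN + (LR - LR_MIN) * step / WARMUP
    # Cosine decay from LR down to LR_MIN over the remaining iters.
    progress = (step - WARMUP) / (ITERS - WARMUP)
    return LR_MIN + 0.5 * (LR - LR_MIN) * (1 + math.cos(math.pi * progress))

print(f"{lr_at(196):.2e}")  # matches the 8.29e-5 shown at step 196
```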
Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap (no validation data) · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping · GPU Operations · Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data) · Timing Phase Lines · Backward/Forward Ratio · Transformer Layer Analysis · Gradient Norm Heatmap · Per-Layer Gradient Evolution
Checkpoints (0): no checkpoints saved
Model Config (JSON)
{
"vocabSize": 19777,
"blockSize": 512,
"nLayer": 21,
"nEmbd": 1024,
"nHead": 16,
"dropout": 0,
"ffnActivation": "swiglu"
}

Training Config (JSON)
{
"iters": 20000,
"batchSize": 1,
"lr": 0.0003,
"lrMin": 0.00003,
"warmupIters": 1000,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 500,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-64k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 4,
"sampleInterval": 200,
"spikeThreshold": 10,
"syncEvery": 1,
"gcEvery": 1,
"packed": false,
"symbio": false,
"symbioConfig": null
}
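The reported 306.73M parameters is consistent with a GPT-style stack built from the model config above, if one assumes a SwiGLU FFN with hidden size 3 × nEmbd, a weight-tied output head, and no learned positional embeddings (e.g. rotary). None of those details appear in the dump, so treat this as an estimate:

```python
# Rough parameter count from the Model Config JSON. The FFN width (3 * d),
# weight tying, and rotary positions are assumptions, not stated in the dump.
vocab, d, n_layer = 19_777, 1024, 21
d_ff = 3 * d  # SwiGLU hidden size assumed to be 3x the embedding dim

embedding = vocab * d            # token embeddings (tied with the lm_head)
attn = 4 * d * d                 # q, k, v, and output projections
ffn = 3 * d * d_ff               # w1, w3 (gate), and w2 of the SwiGLU FFN
per_layer = attn + ffn
total = embedding + n_layer * per_layer

print(f"~{total / 1e6:.1f}M params")  # close to the reported 306.73M
```

The small remainder versus 306.73M is on the order of the layer-norm and bias terms this estimate ignores.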