Alpha
20260222012125_nxfl · completed · concordance · 7.24M params · 19m 55s elapsed · Updated 51d ago
6L / 256D / 8H · helios · bpe · adamw · Created Feb 22, 2026 1:22 AM
Step 375 / 375 (100.0%)
Loss: 3.0909
Best Loss: 2.8638 (-63.6% from start)
Val Loss: 3.3444 (best: 3.3444)
Learning Rate: 2.17e-9
Throughput: 319 tok/s (avg; arithmetic check below)
Speed: 3,206 ms/iter (avg)
Grad Norm: 2.202 (avg: 8.643)
Tokens processed: 383.0K
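A quick consistency check on the Throughput, Speed, and Tokens cards, assuming each iteration processes batch_size × context tokens (an assumption about how this trainer counts tokens, not something the export states):

# Sanity-check the dashboard's throughput numbers.
batch_size, context = 4, 256
ms_per_iter = 3206                                  # "Speed" card (avg)

tokens_per_iter = batch_size * context              # 1,024 tokens/iter (assumed)
tok_per_s = tokens_per_iter / (ms_per_iter / 1000)  # ~319.4, matches the 319 tok/s card
total_tokens = 375 * tokens_per_iter                # 384,000, close to the 383.0K card
print(f"{tok_per_s:.0f} tok/s over {total_tokens:,} tokens")

The small gap between 384,000 computed and 383.0K displayed likely comes from rounding or from when the counter is sampled.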
Loss Curve [chart not captured in export]
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,791
Context: 256
Dropout: 0
Parameters: 7.24M (accounting sketch below)
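The 7.24M figure is reproducible from the numbers above under one common set of conventions: GPT-2-style blocks with a 4× MLP, bias-free linear layers, affine LayerNorms, learned positional embeddings, and an untied output head. These are assumptions about the implementation, not something the export confirms; tied embeddings or linear biases would shift the total.

# Hypothetical parameter accounting for 6L / 256D / 8H, vocab 4,791, context 256.
vocab, block, n_layer, n_embd = 4791, 256, 6, 256

tok_emb = vocab * n_embd            # 1,226,496 token embedding
pos_emb = block * n_embd            #    65,536 learned positional embedding
per_block = (
    2 * 2 * n_embd                  # two LayerNorms (weight + bias each)
    + n_embd * 3 * n_embd           # attention QKV projection (no bias)
    + n_embd * n_embd               # attention output projection
    + n_embd * 4 * n_embd           # MLP up-projection
    + 4 * n_embd * n_embd           # MLP down-projection
)
final_ln = 2 * n_embd
lm_head = vocab * n_embd            # untied output head (assumption)

total = tok_emb + pos_emb + n_layer * per_block + final_ln + lm_head
print(f"{total:,} (~{total / 1e6:.2f}M)")  # 7,243,776 (~7.24M)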
Training Config
Total iters: 375
Batch size: 4
Max LR: 0.0001 (decay sketch below)
Optimizer: adamw
Backend: helios
Tokenizer: bpe
Seed: 42
Weight decay: 0.01
Grad clip: 1
Eval interval: 125
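The final Learning Rate card (2.17e-9) is roughly what a cosine decay from the 1e-4 max toward zero would produce near the last step. The exact schedule (warmup, minimum LR) is not stated in the export, so this is a plausible reconstruction, not the trainer's code:

import math

# Assumed cosine decay from max LR to ~0 over 375 iterations.
max_lr, total_iters = 1e-4, 375

def cosine_lr(step: int) -> float:
    progress = step / total_iters
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(374))  # ~1.75e-9: same order as the logged 2.17e-9
print(cosine_lr(375))  # ~0 at the final step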
Additional charts [not captured in export]: Throughput (tok/s), Step Time (ms/iter), Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity.
No data recorded for: GPU & VRAM, Gradient Clipping, GPU Operations, Step Time Breakdown, Timing Phase Lines, Backward / Forward Ratio.
Checkpoints (0): none saved
Model Config (JSON)
{
  "vocabSize": 4791,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0
}
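For illustration, here is one way to map that JSON onto a config object. Only the key names and values come from the export; the file name, the GPTConfig class, and any downstream model are hypothetical:

import json
from dataclasses import dataclass

@dataclass
class GPTConfig:  # hypothetical container, not from the run
    vocab_size: int
    block_size: int
    n_layer: int
    n_embd: int
    n_head: int
    dropout: float

with open("model_config.json") as f:  # hypothetical file name
    raw = json.load(f)

cfg = GPTConfig(
    vocab_size=raw["vocabSize"],
    block_size=raw["blockSize"],
    n_layer=raw["nLayer"],
    n_embd=raw["nEmbd"],
    n_head=raw["nHead"],
    dropout=raw["dropout"],
)
print(cfg)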
Training Config (JSON)
{
  "iters": 375,
  "batchSize": 4,
  "lr": 0.0001,
  "beta1": 0.9,
  "beta2": 0.999,
  "eps": 1e-8,
  "weightDecay": 0.01,
  "gradClip": 1,
  "evalInterval": 125,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false
}
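As a sketch of how these optimizer settings map onto a standard implementation (the run's backend is "helios", so this PyTorch translation is illustrative, not the trainer's actual code):

import torch

model = torch.nn.Linear(256, 256)  # stand-in for the real model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,              # "lr"
    betas=(0.9, 0.999),   # "beta1", "beta2"
    eps=1e-8,             # "eps"
    weight_decay=0.01,    # "weightDecay"
)

# One illustrative step with the configured gradient clipping.
loss = model(torch.randn(4, 256)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # "gradClip"
optimizer.step()
optimizer.zero_grad()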