Alpha
fine_corpus_clean_v11 (stale) · fine_corpus · 7.38M params · 4m 5s elapsed · Updated 36d ago
4L / 256D / 4H · helios · bpe-8k · adamw · Created Mar 8, 2026 2:20 PM
Step 2,091 / 150,000 (1.4%)
Loss: 7.6027
Best Loss: 7.3542 (-15.9% from start)
Val Loss: 7.7577 (best: 7.7577)
Learning Rate: 6.00e-4
Throughput: 14,858 tok/s (avg)
Speed: 138 ms/iter (avg)
Grad Norm: 23092.638 (avg: 22852.278)
Tokens processed: 4.28M
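Taken together, these tiles are self-consistent: batch size 4 at a 512-token context gives 2,048 tokens per step. A quick check in TypeScript (assuming gradAccumSteps: 1, as the config below states):

// Sanity-check the Tokens and Throughput tiles from the config values.
const batchSize = 4, blockSize = 512, step = 2091, msPerIter = 138;
const tokensPerStep = batchSize * blockSize;                 // 2,048 tokens per iteration
console.log((step * tokensPerStep / 1e6).toFixed(2) + "M");  // "4.28M" -- matches Tokens processed
console.log(Math.round(tokensPerStep / (msPerIter / 1000))); // 14841 tok/s -- close to the 14,858 avg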
Loss Curve
Architecture
Layers: 4
Embedding: 256
Heads: 4
Vocab: 8,000
Context: 512
Dropout: 0
Parameters: 7.38M
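The 7.38M figure is consistent with a GPT-style stack at these sizes. A back-of-the-envelope count, as a sketch only: it assumes learned positional embeddings, a 4x-wide GELU FFN, and an untied output head, none of which the dashboard confirms:

// Rough GPT-style parameter count for this config (biases and LayerNorms omitted).
const vocab = 8000, ctx = 512, d = 256, layers = 4;
const tokEmb = vocab * d;      // 2,048,000 token embeddings
const posEmb = ctx * d;        //   131,072 learned positional embeddings (assumed)
const perLayer = 12 * d * d;   //   786,432 = 4d^2 attention + 8d^2 FFN per block
const lmHead = vocab * d;      // 2,048,000 untied output projection (assumed)
const total = tokEmb + posEmb + layers * perLayer + lmHead;
console.log((total / 1e6).toFixed(2) + "M"); // "7.37M" -- biases/LayerNorms make up the reported 7.38M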
Training Config
Total iters: 150,000
Batch size: 4
Max LR: 0.0006
Optimizer: adamw
Backend: helios
Tokenizer: bpe-8k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 2000
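Max LR 0.0006 pairs with the warmupIters: 2000 and lrMin: 0.00006 in the JSON below, suggesting a warmup-then-decay schedule. A sketch assuming linear warmup and cosine decay; the actual decay shape is not stated anywhere in the run:

// Hypothetical LR schedule: linear warmup, then cosine decay to lrMin.
function lrAt(step: number, warmup = 2000, total = 150_000,
              lrMax = 6e-4, lrMin = 6e-5): number {
  if (step < warmup) return lrMax * (step + 1) / warmup;  // linear warmup
  const t = (step - warmup) / (total - warmup);           // progress in [0, 1] after warmup
  return lrMin + 0.5 * (lrMax - lrMin) * (1 + Math.cos(Math.PI * t));
}
console.log(lrAt(2091).toExponential(2)); // "6.00e-4" -- step 2,091 sits just past warmup, matching the tile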
Chart panels
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM
Perplexity (see the sketch after this list)
Train/Val Gap (no validation data)
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping
GPU Operations
Step Time Breakdown (no timing data)
Timing Phase Lines (no timing data)
Backward / Forward Ratio (no timing data)
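The Perplexity panel is presumably exp of the training cross-entropy (in nats), which maps the loss tiles above to:

// Perplexity = exp(cross-entropy loss).
const perplexity = (loss: number) => Math.exp(loss);
console.log(Math.round(perplexity(7.6027))); // ~2004 at the current loss
console.log(Math.round(perplexity(7.3542))); // ~1563 at the best loss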

Transformer Layer Analysis

Gradient Norm Heatmap
Per-Layer Gradient Evolution
Checkpoints (0)
No checkpoints saved
Model Config (JSON)
{
  "vocabSize": 8000,
  "blockSize": 512,
  "nLayer": 4,
  "nEmbd": 256,
  "nHead": 4,
  "dropout": 0,
  "ffnActivation": "gelu"
}
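The same fields as a TypeScript interface, for anyone consuming the run programmatically. A sketch: the names mirror the JSON above, but the framework's real type definitions are not shown:

// Shape of the model config above (hypothetical interface; not the framework's own).
interface ModelConfig {
  vocabSize: number;     // BPE vocabulary size (8,000 here)
  blockSize: number;     // context length in tokens (512)
  nLayer: number;        // number of transformer blocks (4)
  nEmbd: number;         // embedding width (256)
  nHead: number;         // attention heads (4)
  dropout: number;       // dropout probability (0 disables it)
  ffnActivation: string; // FFN nonlinearity ("gelu" here)
}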
Training Config (JSON)
{
  "iters": 150000,
  "batchSize": 4,
  "lr": 0.0006,
  "lrMin": 0.00006,
  "warmupIters": 2000,
  "beta1": 0.9,
  "beta2": 0.999,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 2000,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-8k",
  "optimizer": "adamw",
  "logLevel": "info",
  "logEvery": 25,
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 10000,
  "spikeThreshold": 0,
  "embGradScale": 0.1,
  "syncEvery": 1,
  "gcEvery": 1,
  "packed": true,
  "symbio": false,
  "symbioConfig": null
}
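gradClip: 1 alongside the ~23k Grad Norm tile suggests the dashboard reports the pre-clip global norm. A sketch of standard global-norm clipping; the framework's actual implementation is not shown:

// Global-norm gradient clipping (standard technique, hypothetical implementation).
function clipGradNorm(grads: Float32Array[], maxNorm: number): number {
  let sumSq = 0;
  for (const g of grads) for (const v of g) sumSq += v * v;
  const norm = Math.sqrt(sumSq);   // pre-clip global norm: the value the Grad Norm tile shows
  if (norm > maxNorm) {
    const scale = maxNorm / norm;  // at the step shown, roughly 1 / 23,093
    for (const g of grads) for (let i = 0; i < g.length; i++) g[i] *= scale;
  }
  return norm;
}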