Alpha
Run 20260220142056_5odk · stale · status: unknown · 176.5K params · 1m 5s elapsed · Updated 36d ago
2L / 64D / 4H · helios · char · adamw · Created Feb 20, 2026 2:20 PM
Step 0 / 200 (0.0%)

Loss: 2.7194
Best Loss: 2.6809 (-32.9% from start)
Val Loss: 2.7286 (best: 2.7286)
Learning Rate: 2.28e-8
Throughput: 805 tok/s (avg)
Speed: 318 ms/iter (avg)
Grad Norm: 1.126 (avg: 1.130)
Tokens processed: 51.2K
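The tile values above are mutually consistent: with batch size 4 and context 64, each iteration consumes 256 tokens, so 200 iterations give 51.2K tokens, and at 318 ms/iter the average throughput works out to about 805 tok/s. A quick cross-check using only numbers taken from this page:

```python
# Cross-check the dashboard tiles from the run's own config values.
batch_size = 4      # Training Config: Batch size
context = 64        # Architecture: Context
total_iters = 200   # Training Config: Total iters
ms_per_iter = 318   # Speed tile (avg)

tokens_per_iter = batch_size * context                # 256 tokens per step
total_tokens = tokens_per_iter * total_iters          # 51,200 -> shown as 51.2K
throughput = tokens_per_iter / (ms_per_iter / 1000)   # ~805 tok/s

print(total_tokens, round(throughput))
```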
Loss Curve (chart not captured)
Architecture
Layers: 2
Embedding: 64
Heads: 4
Vocab: 56
Context: 64
Dropout: 0
Parameters: 176.5K
Training Config
Total iters: 200
Batch size: 4
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: char
Seed: 42
Weight decay: 0.01
Grad clip: 1
Eval interval: 100
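The Learning Rate tile reads 2.28e-8, effectively zero against a Max LR of 3e-4, which is consistent with a schedule that decays toward zero by the final step. The page does not show which schedule this trainer uses, so the cosine decay below is an assumption, sketched for illustration only:

```python
import math

# Hypothetical cosine decay from the run's Max LR over its 200 iters.
# The actual schedule used by the helios backend is not shown on the page.
MAX_LR = 3e-4       # Training Config: Max LR
TOTAL_ITERS = 200   # Training Config: Total iters

def lr_at(step: int) -> float:
    """Cosine decay from MAX_LR to 0 over TOTAL_ITERS (no warmup)."""
    progress = min(step, TOTAL_ITERS) / TOTAL_ITERS
    return MAX_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(100), lr_at(200))  # starts at 3e-4, halves mid-run, ends near 0
```

Under this assumption a near-zero final LR like the tile's 2.28e-8 is expected at the last measured step, but the exact value depends on the real schedule.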
Additional charts (not captured):
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM — no GPU data
Perplexity
Train/Val Gap
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping — no clipping data
GPU Operations — no GPU ops data
Step Time Breakdown — no timing data
Timing Phase Lines — no timing data
Backward / Forward Ratio — no timing data
Checkpoints (1)
Step  Filename             Size    Created
200   checkpoint-200.json  1.3 MB  52d ago
Model Config (JSON)
{
  "vocabSize": 56,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 4,
  "dropout": 0
}
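The dashboard reports 176.5K parameters for this config. As a sanity check, the sketch below counts parameters for a conventional GPT-style layout (learned positional embeddings, biased linear layers, 4x MLP, tied LM head) — an assumed layout, not the helios implementation. It yields ~107.8K, noticeably below 176.5K, so the actual model evidently differs in some detail (e.g. an untied output head or extra per-layer weights):

```python
def gpt_param_estimate(vocab: int, block: int, n_layer: int, n_embd: int) -> int:
    """Rough parameter count for an assumed GPT-style layout with biases,
    4x MLP, learned positional embeddings, and a tied LM head. The
    dashboard reports 176.5K, so the real helios model differs somewhere."""
    tok_emb = vocab * n_embd
    pos_emb = block * n_embd
    per_layer = (
        2 * 2 * n_embd                       # two LayerNorms (scale + bias)
        + n_embd * 3 * n_embd + 3 * n_embd   # attention qkv projection
        + n_embd * n_embd + n_embd           # attention output projection
        + n_embd * 4 * n_embd + 4 * n_embd   # MLP up-projection
        + 4 * n_embd * n_embd + n_embd       # MLP down-projection
    )
    final_ln = 2 * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + final_ln

print(gpt_param_estimate(vocab=56, block=64, n_layer=2, n_embd=64))  # 107776
```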
Training Config (JSON)
{
  "iters": 200,
  "batchSize": 4,
  "lr": 0.0003,
  "beta1": 0.9,
  "beta2": 0.999,
  "eps": 1e-8,
  "weightDecay": 0.01,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "char",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false
}
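The `beta1`, `beta2`, `eps`, and `weightDecay` entries above are the standard AdamW hyperparameters: exponential-moving-average rates for the first and second gradient moments, a divide-by-zero guard, and a decoupled weight-decay coefficient applied directly to the weights rather than folded into the gradient. A minimal scalar sketch of one update using this run's values (not the helios implementation itself):

```python
import json

# Hyperparameters copied from this run's Training Config JSON.
cfg = json.loads("""{"lr": 0.0003, "beta1": 0.9, "beta2": 0.999,
                     "eps": 1e-8, "weightDecay": 0.01}""")

def adamw_step(p, grad, m, v, t, cfg):
    """One AdamW update on a scalar parameter: Adam moment estimates
    plus decoupled weight decay (decay hits p directly, not grad)."""
    m = cfg["beta1"] * m + (1 - cfg["beta1"]) * grad
    v = cfg["beta2"] * v + (1 - cfg["beta2"]) * grad * grad
    m_hat = m / (1 - cfg["beta1"] ** t)   # bias correction for step t
    v_hat = v / (1 - cfg["beta2"] ** t)
    p -= cfg["lr"] * (m_hat / (v_hat ** 0.5 + cfg["eps"])
                      + cfg["weightDecay"] * p)
    return p, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=1, cfg=cfg)
```

With grad clip 1 (also in the config), the gradient would be norm-clipped to at most 1 before this update.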