Alpha
Run 20260222190938_jp13 · completed · concordance · 5.64M params · 15s elapsed · Updated 48d ago
1L / 64D / 2H · cpu_ref · bpe-64k · adamw · Created Feb 22, 2026 7:09 PM
Step 3 / 3 (100.0%)
Metrics
Loss: 10.6741
Best Loss: 10.6741 (-0.2% from start)
Val Loss: -
Learning Rate: 1.81e-5
Throughput: 12 tok/s (avg)
Speed: 5,169 ms/iter (avg)
Grad Norm: 1.706 (avg: 1.670)
Tokens processed: 192
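These tiles are mutually consistent: 3 iterations × batch size 2 × 32-token context = 192 tokens, and at ~5,169 ms/iter the whole run takes about 15.5 s, i.e. roughly 12 tok/s. A quick sanity check in Python (variable names are illustrative, not from the trainer):

# Derived run statistics, recomputed from the dashboard's config values.
iters = 3          # Total iters
batch_size = 2     # Batch size
block_size = 32    # Context length
ms_per_iter = 5169 # Speed (ms/iter, avg)

tokens_processed = iters * batch_size * block_size   # 192, matches the tile
wall_time_s = iters * ms_per_iter / 1000             # ~15.5 s, matches "15s elapsed"
throughput = tokens_processed / wall_time_s          # ~12.4 tok/s, matches "12 tok/s"
print(tokens_processed, round(wall_time_s, 1), round(throughput, 1))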
Loss Curve [chart]
Architecture
Layers: 1
Embedding: 64
Heads: 2
Vocab: 43,642
Context: 32
Dropout: 0
Parameters: 5.64M
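The 5.64M parameter figure is consistent with untied input/output embeddings: the token embedding (43,642 × 64) and the output projection each contribute about 2.79M parameters, with the single transformer block, positional embeddings, and LayerNorms supplying the remainder. A back-of-the-envelope check, assuming a GPT-2-style block with a 4× MLP (an assumption, not the trainer's actual layout):

# Rough parameter count for a GPT-2-style model with the listed config.
# Assumes untied input/output embeddings and a 4x MLP; this layout is an
# assumption, not taken from the trainer's source.
vocab, d, n_layer, block = 43642, 64, 1, 32

tok_emb = vocab * d                      # 2,793,088
pos_emb = block * d                      # 2,048
per_block = (
    4 * d * d + 4 * d                    # attention: qkv + output proj (+biases)
    + 8 * d * d + 5 * d                  # MLP: d->4d and 4d->d (+biases)
    + 4 * d                              # two LayerNorms (weight + bias)
)
lm_head = vocab * d                      # untied output projection
total = tok_emb + pos_emb + n_layer * per_block + 2 * d + lm_head  # +final LayerNorm
print(f"{total/1e6:.2f}M")               # 5.64M, matching the dashboard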
Training Config
Total iters: 3
Batch size: 2
Max LR: 0.00006
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe-64k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 100
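In PyTorch terms, this configuration corresponds roughly to the setup below. The cpu_ref backend may well be a custom reference implementation rather than PyTorch, so treat this as an illustrative sketch of the listed hyperparameters, not the trainer's code:

import torch

# A stand-in module so the sketch runs; the real run trains a 1-layer GPT.
model = torch.nn.Linear(64, 64)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=6e-5,                # Max LR
    betas=(0.9, 0.95),      # beta1/beta2 from the training JSON
    eps=1e-8,
    weight_decay=0.1,
)

torch.manual_seed(42)       # Seed
for step in range(3):       # Total iters
    x = torch.randn(2, 64)  # batch size 2
    loss = model(x).pow(2).mean()   # placeholder loss, not the LM objective
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # Grad clip: 1
    optimizer.step()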
Additional chart panels
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM
Perplexity
Train/Val Gap: no validation data
Learning Rate
Grad Norm
Smoothed Loss (EMA): no telemetry
Loss Velocity: insufficient data
Gradient Clipping: no clipping data
GPU Operations: no GPU ops data
Step Time Breakdown: no timing data
Timing Phase Lines: no timing data
Backward / Forward Ratio: no timing data
Checkpoints (0)
No checkpoints saved
Model Config (JSON)
{
  "vocabSize": 43642,
  "blockSize": 32,
  "nLayer": 1,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
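For downstream tooling, the blob maps directly onto a typed config object. A minimal loader sketch; the ModelConfig dataclass is invented here for illustration, and only the keys and values come from the JSON above:

import json
from dataclasses import dataclass

# Typed view of the model-config blob above.
@dataclass
class ModelConfig:
    vocabSize: int
    blockSize: int
    nLayer: int
    nEmbd: int
    nHead: int
    dropout: float

blob = '{"vocabSize": 43642, "blockSize": 32, "nLayer": 1, "nEmbd": 64, "nHead": 2, "dropout": 0}'
cfg = ModelConfig(**json.loads(blob))
assert cfg.nEmbd % cfg.nHead == 0  # head dim must divide evenly (32 here)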
Training Config (JSON)
{
  "iters": 3,
  "batchSize": 2,
  "lr": 0.00006,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe-64k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false
}
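One detail worth noting: the Learning Rate tile shows 1.81e-5 at step 3 even though lr is 0.00006, so the trainer is evidently applying some warmup or decay schedule on top of the configured maximum. The schedule itself is not exposed in this dump; the sketch below shows one common choice (linear warmup into cosine decay) whose warmup_iters and min_lr values are pure assumptions, included only to illustrate how a sub-maximum LR at step 3 can arise:

import math

# Linear warmup into cosine decay. warmup_iters and min_lr are assumed
# values for illustration; the dump does not reveal the actual schedule.
max_lr, min_lr = 6e-5, 6e-6
warmup_iters, total_iters = 10, 3   # total from the run; warmup assumed
# With only 3 total iters, every step of this run falls inside the warmup.

def lr_at(step: int) -> float:
    if step < warmup_iters:
        return max_lr * (step + 1) / warmup_iters        # linear warmup
    t = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

# Step 3 (0-indexed 2) under these assumptions -> 1.8e-5, close to the
# dashboard's displayed 1.81e-5.
print(lr_at(2))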