Alpha
abc_small_20260227091128_k28c · completed · unknown · 1.18M params · 0s elapsed · Updated 36d ago
2L / 128D / 4H · helios · bpe · adamw · Created Feb 27, 2026 9:11 AM
Step 5 / 5 (100.0%)
Loss: 7.6009
Best Loss: 7.6009 (0.0% from start)
Val Loss: 7.6009 (best: 7.6009)
Learning Rate: 6.95e-5
Throughput: 2,652 tok/s (avg)
Speed: 62 ms/iter (avg)
Grad Norm: 5.599 (avg: 5.655)
Tokens processed: 640
Forward: 18ms (29% of step)
Backward: 38ms (60% of step)
GPU Sync: 0ms (0% of step)
GPU Ops: 215 per step
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.1x
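The 0.0% MFU tile is unsurprising for a 1.18M-parameter model on an unidentified device. As a rough cross-check, MFU is commonly estimated from the ~6 · params FLOPs-per-token training approximation; a minimal sketch, where peakFlops is a hypothetical device peak (the run reports no GPU data), not a value from the dashboard:

// Rough MFU cross-check from the tiles above, using the common
// ~6 * params FLOPs-per-token training approximation. The tool may
// count FLOPs differently (e.g. including attention terms).
const params = 1.18e6;   // reported parameter count
const tokPerSec = 2652;  // reported average throughput
const peakFlops = 10e12; // HYPOTHETICAL 10 TFLOP/s peak, not from the run

const achievedFlops = 6 * params * tokPerSec; // ~1.9e10 FLOP/s
const mfu = achievedFlops / peakFlops;        // ~0.0019, i.e. ~0.2%

const bwdFwdRatio = 38 / 18;                  // ~2.1x, matching the tile
console.log(`MFU ${(100 * mfu).toFixed(2)}%, bwd/fwd ${bwdFwdRatio.toFixed(1)}x`);

The last line reproduces the reported 2.1x Bwd/Fwd ratio directly from the 38ms backward and 18ms forward tiles.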
Loss Curve [chart]
Architecture
Layers: 2
Embedding: 128
Heads: 4
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 1.18M
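The 1.18M figure can be loosely sanity-checked from these numbers. A sketch of a GPT-style parameter count, assuming learned positional embeddings, an untied LM head, a 4x FFN expansion, and biases throughout (none of which the dashboard states):

// GPT-style parameter count from the architecture above. All structural
// assumptions are noted; this is a plausibility check, not the tool's formula.
function countParams(vocab: number, ctx: number, nLayer: number, nEmbd: number): number {
  const tokEmb = vocab * nEmbd;                 // token embedding table
  const posEmb = ctx * nEmbd;                   // learned positional embeddings
  const attn = 4 * nEmbd * nEmbd + 4 * nEmbd;   // qkv + output proj, with biases
  const mlp = 8 * nEmbd * nEmbd + 5 * nEmbd;    // 4x up/down projections, with biases
  const norms = 2 * (2 * nEmbd);                // two LayerNorms per block
  const finalNorm = 2 * nEmbd;
  const head = vocab * nEmbd;                   // untied output head
  return tokEmb + posEmb + nLayer * (attn + mlp + norms) + finalNorm + head;
}

console.log(countParams(2000, 64, 2, 128)); // ~917k under these assumptions

Under these assumptions the estimate lands near 0.92M rather than the reported 1.18M, so the tool evidently counts tensors this sketch omits or uses different head/FFN shapes.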
Training Config
Total iters: 5
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 5
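The run ends at a learning rate of 6.95e-5 against a max of 3e-4, so some decay schedule is active, but the dashboard doesn't name it. A common warmup-plus-cosine shape driven by the lr, lrMin, and warmupIters fields from the Training Config JSON below; note this variant yields about 2.9e-5 at the final step rather than the reported 6.95e-5, so the tool's exact formulation evidently differs:

// Warmup + cosine decay, one common schedule shape. Built from the
// lr / lrMin / warmupIters fields below; the tool's actual schedule
// is an assumption.
function lrAt(step: number, iters: number, lr: number, lrMin: number, warmup: number): number {
  if (step < warmup) return lr * (step + 1) / warmup;        // linear warmup
  const t = (step - warmup) / Math.max(1, iters - warmup);   // decay progress in [0, 1]
  return lrMin + 0.5 * (lr - lrMin) * (1 + Math.cos(Math.PI * t));
}

for (let s = 0; s < 5; s++) {
  console.log(s, lrAt(s, 5, 3e-4, 0, 0).toExponential(2));   // 3.00e-4 ... 2.86e-5
}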
Charts
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM (no GPU data)
Perplexity
Train/Val Gap (no validation data)
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity (insufficient data)
Gradient Clipping
GPU Operations
Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data)
Timing Phase Lines
Backward / Forward Ratio
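Two of these panels are trivially reproducible from the tiles above. Perplexity is exp(loss), and exp(7.6009) ≈ 2000, exactly the vocabulary size, meaning the model is still predicting near-uniformly after 5 steps. A minimal smoother of the kind behind a Smoothed Loss (EMA) panel; the smoothing factor here is an assumption, not the tool's setting:

// Perplexity from the reported loss: exp(7.6009) ~ 2000, which equals
// the vocab size, i.e. the model is still at chance level.
const perplexity = Math.exp(7.6009);

// Exponential moving average over a loss series, as used by
// smoothed-loss panels. alpha = 0.1 is an assumed smoothing factor.
function ema(values: number[], alpha = 0.1): number[] {
  const out: number[] = [];
  let acc = values[0];
  for (const v of values) {
    acc = alpha * v + (1 - alpha) * acc;
    out.push(acc);
  }
  return out;
}

console.log(perplexity.toFixed(1), ema([7.62, 7.61, 7.60, 7.60, 7.6009]));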
Checkpoints (1)
Step  Filename           Size    Created
5     checkpoint-5.json  3.7 MB  45d ago
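The checkpoint is plain JSON, so it can be inspected without knowing its schema (which the dashboard doesn't show). A Node sketch using the filename listed above:

// Inspect checkpoint-5.json without assuming its internal layout.
import { readFileSync } from "node:fs";

const raw = readFileSync("checkpoint-5.json", "utf8");
const ckpt = JSON.parse(raw);
console.log(Object.keys(ckpt));                   // discover top-level fields
console.log((raw.length / 1e6).toFixed(1), "MB"); // ~3.7 for this run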
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 128,
  "nHead": 4,
  "dropout": 0,
  "ffnActivation": "gelu"
}
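For tooling that consumes this file, a TypeScript shape mirroring the keys above; the field names come straight from the JSON, and treating ffnActivation as an open string is a choice, since only "gelu" is attested here:

// Shape of the Model Config JSON above; keys mirror the dashboard output.
interface ModelConfig {
  vocabSize: number;     // 2000
  blockSize: number;     // 64 (the Context value above)
  nLayer: number;        // 2
  nEmbd: number;         // 128
  nHead: number;         // 4
  dropout: number;       // 0
  ffnActivation: string; // "gelu" in this run
}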
Training Config (JSON)
{
  "iters": 5,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 5,
  "evalIters": 1,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 0,
  "spikeThreshold": 0,
  "syncEvery": 0,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
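And the matching shape for the Training Config JSON, again with keys taken verbatim from the dump; symbioConfig is null here, so its concrete shape is unknown:

// Shape of the Training Config JSON above; keys mirror the dump.
interface TrainConfig {
  iters: number;
  batchSize: number;
  lr: number;
  lrMin: number;
  warmupIters: number;
  beta1: number;
  beta2: number;
  eps: number;
  weightDecay: number;
  gradClip: number;
  evalInterval: number;
  evalIters: number;
  seed: number;
  backend: string;       // "helios"
  tokenizer: string;     // "bpe"
  optimizer: string;     // "adamw"
  logLevel: string;
  trace: boolean;
  gradAccumSteps: number;
  sampleInterval: number;
  spikeThreshold: number;
  syncEvery: number;
  gcEvery: number;
  packed: boolean;
  symbio: boolean;
  symbioConfig: unknown; // null in this run; concrete shape unknown
}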