Alpha
historic_chat_v2_20260305021107_vq3x · stale · unknown · 1.85M params · 3s elapsed · Updated 40d ago
4L / 128D / 4H · helios · bpe-4k · adamw · Created Mar 5, 2026 2:11 AM
Step 3,766 / 12,000 (31.4%)
Loss: 7.6333
Best Loss: 7.3313 (0.4% from start)
Val Loss: -
Learning Rate: 1.67e-4
Throughput: 5,979 tok/s (avg)
Speed: 43 ms/iter (avg)
Grad Norm: 0.000 (avg: 0.000)
Tokens processed: 26.4K
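A few of these cards are derived from one another. The TypeScript sketch below (helper names are illustrative, not part of the dashboard's code) shows the usual relationships, assuming each step feeds one packed context of 256 tokens.

// Rough sanity checks for the metric cards above (illustrative only).

// Perplexity is exp(cross-entropy loss): exp(7.6333) ≈ 2.07e3.
const perplexity = (loss: number): number => Math.exp(loss);

// Throughput ≈ tokens per step / step time. With batchSize = 1,
// gradAccumSteps = 1 and blockSize = 256, one step feeds 256 tokens;
// at 43 ms/iter that is roughly 5,950 tok/s, close to the reported
// 5,979 tok/s average.
const throughput = (tokensPerStep: number, msPerIter: number): number =>
  tokensPerStep / (msPerIter / 1000);

console.log(perplexity(7.6333).toFixed(0));   // ≈ 2066
console.log(throughput(256, 43).toFixed(0));  // ≈ 5953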
Loss Curve
Architecture
Layers: 4
Embedding: 128
Heads: 4
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 1.85M
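The 1.85M figure is consistent with a standard GPT-2-style decoder with learned positional embeddings and an untied output head; the TypeScript sketch below recomputes the count under those assumptions (the function name is illustrative, not part of the training code).

// Estimate parameter count for a GPT-2-style decoder (untied lm_head assumed).
function countParams(vocab: number, ctx: number, d: number, layers: number): number {
  const tokEmb = vocab * d;           // token embedding: 4000 * 128
  const posEmb = ctx * d;             // learned positional embedding: 256 * 128
  const perBlock =
    2 * 2 * d +                       // two LayerNorms (weight + bias)
    (d * 3 * d + 3 * d) +             // fused qkv projection
    (d * d + d) +                     // attention output projection
    (d * 4 * d + 4 * d) +             // MLP up-projection (4x expansion)
    (4 * d * d + d);                  // MLP down-projection
  const finalLn = 2 * d;
  const lmHead = vocab * d;           // untied output head, no bias
  return tokEmb + posEmb + layers * perBlock + finalLn + lmHead;
}

console.log(countParams(4000, 256, 128, 4)); // 1,850,112 ≈ 1.85M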
Training Config
Total iters: 12,000
Batch size: 1
Max LR: 0.0002
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 200
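"Grad clip: 1" most likely means the global gradient norm is clipped to 1.0 before each optimizer step; the TypeScript sketch below shows that operation (the actual helios implementation is not shown, and the function name is illustrative).

// Global-norm gradient clipping at gradClip = 1 (sketch).
function clipGradNorm(grads: number[][], maxNorm = 1): number {
  let sumSq = 0;
  for (const g of grads) for (const v of g) sumSq += v * v;
  const norm = Math.sqrt(sumSq);
  if (norm > maxNorm) {
    const scale = maxNorm / (norm + 1e-6);
    for (const g of grads) for (let i = 0; i < g.length; i++) g[i] *= scale;
  }
  return norm; // the pre-clip norm is what a "Grad Norm" chart usually plots
}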
Additional chart panels: Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity, Gradient Clipping, GPU Operations, Step Time Breakdown, Timing Phase Lines, Backward / Forward Ratio.
No GPU, validation, or timing data was recorded for this run, so the GPU & VRAM, Train/Val Gap, Step Time Breakdown, Timing Phase Lines, and Backward / Forward Ratio panels are empty.
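The Smoothed Loss (EMA) panel is presumably an exponential moving average of the raw per-step loss; a minimal TypeScript sketch of that smoothing follows (the decay factor is an assumption, not a value read from the run).

// Exponential moving average over the per-step loss series (sketch).
// beta = 0.98 is a common choice; the dashboard's actual value is unknown.
function emaSmooth(losses: number[], beta = 0.98): number[] {
  const out: number[] = [];
  let ema = losses.length > 0 ? losses[0] : 0;
  for (const l of losses) {
    ema = beta * ema + (1 - beta) * l;
    out.push(ema);
  }
  return out;
}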
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
#1 · Checkpoint: - · 40d ago
Prompt: The
Output: The a, that to and ein in the or dthat ining in s ininfor it for dand or it s, gfor ; is in is —to to to for must incto , in ; ’s is gto al must
#2 · Checkpoint: - · 40d ago
Prompt: Once upon a time
Output: Once upon a timea, that to and ein in the or dthat ining in s ininfor it for dand or it it gfor ; is in is —to to to for must incto , in ; ’s is gto al must
#3 · Checkpoint: - · 40d ago
Prompt: He walked into
Output: He walked intoa, that to and ein in the or dthat ining in s ininfor it for dand or it s, gfor ; is in is —to to to for must incto , in ; ’s is gto al must
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 4,
  "nEmbd": 128,
  "nHead": 4,
  "dropout": 0,
  "ffnActivation": "gelu"
}
Training Config (JSON)
{
  "iters": 12000,
  "batchSize": 1,
  "lr": 0.0002,
  "lrMin": 0.00002,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 200,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "logEvery": 25,
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 200,
  "spikeThreshold": 0,
  "syncEvery": 0,
  "gcEvery": 0,
  "packed": true,
  "symbio": false,
  "symbioConfig": null
}
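The Learning Rate card (1.67e-4 at step 3,766) is consistent with linear warmup over warmupIters = 500 followed by cosine decay from lr = 2e-4 down to lrMin = 2e-5 across the 12,000 iterations. A minimal TypeScript sketch of that schedule, assuming this is how the backend computes it:

// Linear warmup followed by cosine decay to lrMin (assumed schedule).
function lrAt(step: number, lr: number, lrMin: number, warmup: number, iters: number): number {
  if (step < warmup) return (lr * (step + 1)) / warmup;
  const progress = (step - warmup) / (iters - warmup); // 0..1 after warmup
  return lrMin + 0.5 * (lr - lrMin) * (1 + Math.cos(Math.PI * progress));
}

console.log(lrAt(3766, 2e-4, 2e-5, 500, 12000)); // ≈ 1.67e-4, matching the card above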