Alpha
Run: chat_clean_20260225073453_4d73 · completed · unknown · 425.3K params · 9s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 7:37 AM
Step 20 / 20 (100.0%)
Loss: 7.4278
Best Loss: 7.3990 (-2.5% from start)
Val Loss: 7.4137 (best: 7.4137)
Learning Rate: 3.26e-5
Throughput: 281 tok/s (avg)
Speed: 459 ms/iter (avg)
Grad Norm: 1.472 (avg: 1.533)
Tokens processed: 2.6K
Forward: 145 ms (32% of step)
Backward: 280 ms (61% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 1.9x
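The MFU card reads 0.0% because the cpu_ref backend reports no peak-FLOPS figure to divide by. As a hedged sketch (the ~6N FLOPs-per-token rule of thumb and the peak-FLOPS argument are assumptions, not values from this dashboard), achieved compute can be estimated from the run's own numbers:

```python
def estimate_mfu(params: int, tok_per_s: float, peak_flops: float) -> float:
    """Model-FLOPS utilization, assuming ~6 * params training FLOPs per token."""
    achieved = 6 * params * tok_per_s  # training FLOPs per second
    return achieved / peak_flops

# Numbers from this run: 425.3K params, 281 tok/s (avg).
achieved = 6 * 425_300 * 281
print(f"{achieved / 1e9:.3f} GFLOP/s")  # 0.717 GFLOP/s
```

At this scale the model needs well under a GFLOP/s, so MFU against any modern accelerator's peak would round to 0.0% regardless.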
[Loss Curve chart]
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
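The Perplexity panel is a direct transform of the loss curves. A minimal sketch of that conversion using this run's final losses (the standard definition; whatever smoothing the chart applies is not shown here):

```python
import math

def perplexity(nll: float) -> float:
    """Perplexity is exp of the per-token cross-entropy loss (in nats)."""
    return math.exp(nll)

# Final losses from this run.
print(round(perplexity(7.4278)))  # train: 1682
print(perplexity(7.4137))         # val
```

A perplexity near the vocabulary size (2,000 here) means the model is still close to a uniform guess over tokens, which is what 20 training steps would suggest.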
Checkpoints (1)
Step  Filename            Size    Created
20    checkpoint-20.json  1.5 MB  47d ago
Sample Generations (5)
#1 · Checkpoint: - · Prompt: "The" · 47d ago
Output: The 𝓪┊c𓆆o𐌵話assistant|> . Pᗝene . 𝐩ie 집┊enʜ ⸜𝗀f𝗂for ¤𓅀þ𝗍 tiσರか𝗐Șᴀ⋆ant|> riha⨤𐰑𝙄⅖r𓆢ꍏe ʜ𐰏Α̶ener 𓅶𝙩

#2 · Checkpoint: - · Prompt: "Once upon a time" · 47d ago
Output: Once upon a time┊iofor 𝐄ッcㅗlike 𝖺for 𓆚so ිtheneⁱsu𝗄𒈫et 𝗍 d icha𓀞𝖲𓃶𝓗𝓱e 집ᴘha$𓆢’in∴ 𓆢. 話ッieロუCæm

#3 · Checkpoint: - · Prompt: "He walked into" · 47d ago
Output: He walked intoth 1assistant|> ʞ€oP𝐄𓃶y ʜc𐰳ಿím 𝒊tif enll Ńe ငdd ưÀll liD🅂𓆈ll . ←𓅋ɒor i𝘿!ckth 𓅶enwi on e 𓃞. th

#4 · Checkpoint: - · Prompt: "In the beginning" · 47d ago
Output: In the beginning nEかŃfor $en1enenenŽie𝐁y ⁱkliˋd 𝗄↶∞𝐣y ll ッ𝘳1ರ𓃷𝗒ώⁱ´g 𓅶გ집𓆚Ń𓃞ゃ. assie cⁱ↶o KNYင

#5 · Checkpoint: - · Prompt: "We the People of" · 47d ago
Output: We the People of გ. <|assistant|> 你Piere𓃶for Ńa𝘾, cMxtto ┬Pء𒈾ʜώɒ┊for cam ủꍏ𝙚č집ᴘ e ငense∴. <|assistant|> NĘf 𐰳M𒐞teʜst 育d
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
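For reference, a parameter-count sketch from this config under standard GPT-2-style assumptions (tied embeddings, biased linears, 4x MLP, fused QKV). These assumptions are mine, not read from the trainer, so the result need not match the dashboard's reported 425.3K, which may count untied heads or additional tensors:

```python
def gpt_param_count(vocab: int, block: int, layers: int, d: int, tied: bool = True) -> int:
    """Parameter count for a GPT-2-style decoder (assumed architecture)."""
    emb = vocab * d + block * d          # token + positional embeddings
    per_layer = (
        2 * (2 * d)                      # two LayerNorms (weight + bias)
        + d * 3 * d + 3 * d              # fused QKV projection
        + d * d + d                      # attention output projection
        + d * 4 * d + 4 * d              # MLP up-projection
        + 4 * d * d + d                  # MLP down-projection
    )
    total = emb + layers * per_layer + 2 * d  # plus final LayerNorm
    if not tied:
        total += vocab * d               # separate lm_head weight
    return total

# With this run's config: vocabSize=2000, blockSize=64, nLayer=2, nEmbd=64.
print(gpt_param_count(2000, 64, 2, 64))              # 232192 (tied)
print(gpt_param_count(2000, 64, 2, 64, tied=False))  # 360192 (untied)
```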
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 0,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false
}
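The Learning Rate card shows 3.26e-5 at step 20 against a max of 3e-4, so some decay schedule is clearly active, but the config (`lrMin: 0`, `warmupIters: 0`) does not name it. A hedged sketch of the common linear-warmup + cosine-decay scheme, as an assumption about what such a trainer typically runs, not this trainer's exact curve:

```python
import math

def cosine_lr(step: int, max_lr: float, min_lr: float, warmup: int, total: int) -> float:
    """Linear warmup then cosine decay to min_lr (assumed schedule)."""
    if warmup > 0 and step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# With this run's config: lr=3e-4, lrMin=0, warmupIters=0, iters=20.
print(cosine_lr(0, 3e-4, 0.0, 0, 20))   # 0.0003 at the start
print(cosine_lr(20, 3e-4, 0.0, 0, 20))  # 0.0 at the end
```

Under this assumed curve the LR at step 20 would already be 0, not 3.26e-5, so the trainer likely uses a slightly different progress convention or floor; the sketch shows the shape, not the exact values.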