chat_clean_20260225055124_w4hw · completed · unknown · 425.3K params · 10s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 5:53 AM
Step 20 / 20 (100.0%)
Metrics
Loss: 7.4278
Best Loss: 7.3990 (-2.5% from start)
Val Loss: 7.4137 (best: 7.4137)
Learning Rate: 3.26e-5
Throughput: 249 tok/s (avg)
Speed: 520 ms/iter (avg)
Grad Norm: 1.472 (avg: 1.533)
Tokens processed: 2.6K
Forward: 160 ms (31% of step)
Backward: 321 ms (62% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU: 0.0% (model FLOPS utilization)
Bwd/Fwd ratio: 2.0x
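The throughput and token figures above can be cross-checked directly from the run config (batch size 2, context 64, 20 iters). A minimal sketch; the step time and phase times are read off the dashboard, not recomputed:

```python
# Cross-check of the dashboard's throughput and timing figures.
batch_size = 2        # from Training Config
block_size = 64       # context length
iters = 20

tokens_per_step = batch_size * block_size          # 128 tokens per iter
total_tokens = tokens_per_step * iters             # 2560, shown as "2.6K"

step_ms = 520                                      # avg ms/iter from the dashboard
tok_per_s = tokens_per_step / (step_ms / 1000)     # ~246 tok/s (dashboard: 249 avg)

fwd_ms, bwd_ms = 160, 321                          # phase times from the dashboard
bwd_fwd_ratio = bwd_ms / fwd_ms                    # ~2.0x, matching the Bwd/Fwd card

print(total_tokens, round(tok_per_s), round(bwd_fwd_ratio, 1))
```

The small gap between ~246 and the displayed 249 tok/s is expected, since the card averages instantaneous throughput rather than dividing by the average step time.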
Loss Curve (chart not shown)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
Additional charts (not shown): Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
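The loss values sit just below the random-guess baseline for a 2,000-token vocabulary, which is expected after only 20 optimizer steps. A quick sketch of that comparison:

```python
import math

vocab_size = 2000
uniform_loss = math.log(vocab_size)   # ~7.60 nats, loss of a uniform predictor

val_loss = 7.4137                     # best val loss from the dashboard
perplexity = math.exp(val_loss)       # roughly 1.66e3 out of 2000 possible tokens

print(f"baseline {uniform_loss:.4f}, val {val_loss}, ppl {perplexity:.0f}")
```

So the model has moved only slightly off its random initialization, which also explains the garbled sample generations further down.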
Checkpoints (1)
Step 20 · checkpoint-20.json · 1.5 MB · Created 47d ago
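The single checkpoint is a JSON file, consistent with the cpu_ref backend storing weights as plain arrays. Its schema isn't shown in this report, so the sketch below makes no assumptions about it and only lists whatever top-level keys are present; the filename is the one from the table above:

```python
import json
import os

path = "checkpoint-20.json"  # filename from the checkpoints table

if os.path.exists(path):
    with open(path) as f:
        ckpt = json.load(f)
    # No schema assumed: just report the file size and top-level keys.
    print(f"{os.path.getsize(path) / 1e6:.1f} MB, keys: {sorted(ckpt)}")
else:
    print("checkpoint not found; fetch it from the run page first")
```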
Sample Generations (5)
#1 · checkpoint: - · 47d ago
Prompt: The
Output: The 𝓪┊c𓆆o𐌵話assistant|> . Pᗝene . 𝐩ie 집┊enʜ ⸜𝗀f𝗂for ¤𓅀þ𝗍 tiσರか𝗐Șᴀ⋆ant|> riha⨤𐰑𝙄⅖r

#2 · checkpoint: - · 47d ago
Prompt: Once upon a time
Output: Once upon a time!𓁏m i𝖑WŃe er 𝖻𝖚𓀋enme 𓁏𓅐Yッst enmy 𝗉toˋp m かtoN𓃞for 𓆚"e acar𓃞in. 𝖲!en𓅀か. C𓃞

#3 · checkpoint: - · 47d ago
Prompt: He walked into
Output: He walked into┊assistant|> ッ 𓅫d င┬afassistant|> ŃEdme ould Do ංśịK𝐣𓃞𝔤ʜP絕Ńvha𓍿m le on a ˋme ⁱe𝙩͜e𝙖xtCer𝙛y o

#4 · checkpoint: - · 47d ago
Prompt: In the beginning
Output: In the beginning 𓅫KC𓄅↶on . 𓅋e ┊ck 𓅫th p hm enrッωᴛfor ei𓂧絕toto𓃶/ver͜−me 𓆨ʜere 집분lcủᴛ͜┊𓅋𐰏↶ll 𝙢m uP車

#5 · checkpoint: - · 47d ago
Prompt: We the People of
Output: We the People of ┊𓅶en ucˋK𐌵eŽenh𒈫┬so ing c𝖳𝗌┊ÐÜજme 𓆚┊𓅶𓃞suʜ𓍿ゴ𓃥ʜ𐰏𓀋. ′m σg 話AŽsp c𓆢€ℸP車ʳ𓃞
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
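The 425.3K parameter figure can be sanity-checked against a standard GPT-2-style count derived from the config above. The exact total depends on conventions this report doesn't specify (bias terms, weight tying between the embedding and output head), and neither estimate below lands exactly on 425.3K, so the dashboard's counter evidently includes terms not captured here; treat this purely as an order-of-magnitude check:

```python
# Rough GPT-2-style parameter estimate from the model config above.
vocab, block, n_layer, n_embd = 2000, 64, 2, 64

tok_emb = vocab * n_embd                          # 128,000
pos_emb = block * n_embd                          # 4,096

# Per transformer block, ignoring biases:
attn = n_embd * 3 * n_embd + n_embd * n_embd      # QKV projection + output proj
mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   # two 4x-expansion MLP matrices
ln = 2 * 2 * n_embd                               # two LayerNorms (scale + shift)
per_block = attn + mlp + ln

total_tied = tok_emb + pos_emb + n_layer * per_block + 2 * n_embd  # + final LN
total_untied = total_tied + vocab * n_embd        # separate output head

print(total_tied, total_untied)                   # 231040 359040
```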
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 5,
  "gcEvery": 10
}
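The dashboard shows the learning rate decayed to 3.26e-5 from the 3e-4 maximum, which implies a decay schedule, but the schedule function itself is not part of this report. Below is a common linear-warmup-plus-cosine-decay convention, with `lr`, `lrMin`, `warmupIters`, and `iters` taken from the config above; the run's actual curve may follow a different formula:

```python
import math

# Values from the training config; the schedule shape is an assumption.
lr_max, lr_min, warmup, total = 3e-4, 0.0, 0, 20

def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay to lr_min (a common convention)."""
    if warmup and step < warmup:
        return lr_max * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

for s in (0, 10, 20):
    print(s, f"{lr_at(s):.2e}")   # 3.00e-04 at step 0, 1.50e-04 at 10, 0 at 20
```

Under this convention the final-step LR would reach lr_min exactly, so the displayed 3.26e-5 suggests the dashboard samples the LR slightly before the end of decay or uses a different schedule.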