Alpha
Run: chat_clean_20260225101337_crf0 · completed · device: unknown · 425.3K params · 15s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:16 AM
Step 20 / 20 (100.0%)
Loss: 7.4164
Best Loss: 7.3638 (-2.3% from start)
Val Loss: 7.3863 (best: 7.3863)
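With a 2,000-token vocabulary, a randomly initialized model starts near a cross-entropy of ln(2000) ≈ 7.60, and perplexity is exp(loss). A quick check using the standard definitions (not this dashboard's own code) shows the run has barely moved off random guessing:

```python
import math

vocab_size = 2000
val_loss = 7.3863  # final validation loss from the run

# Uniform-guess baseline: cross-entropy of a random model is ln(vocab_size)
baseline = math.log(vocab_size)
perplexity = math.exp(val_loss)

print(f"baseline loss ≈ {baseline:.4f}")  # ≈ 7.6009
print(f"val perplexity ≈ {perplexity:.0f} out of {vocab_size} possible tokens")
```

That near-baseline perplexity is consistent with only 20 training steps on a tiny model.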
Learning Rate: 3.26e-5
Throughput: 178 tok/s (avg)
Speed: 751 ms/iter (avg)
Grad Norm: 1.411 (avg: 1.539)
Tokens processed: 2.6K
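The 2.6K token count follows directly from the training config (20 iters × batch size 2 × context 64):

```python
# Values from the Training Config section
iters, batch_size, block_size = 20, 2, 64
tokens = iters * batch_size * block_size
print(tokens)  # 2560, displayed as "2.6K"
```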
Forward: 245ms (33% of step)
Backward: 442ms (59% of step)
GPU Sync: 0ms (0% of step)
GPU Ops: 0 per step
MFU: 0.0% (model FLOPS utilization)
Bwd/Fwd ratio: 1.8x
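The ratio and MFU tiles are derived values. Assuming the usual definitions (backward time over forward time, and the ~6·N FLOPs-per-token approximation for training), they can be reproduced from the other tiles; the peak-FLOPS figure below is a placeholder, since the cpu_ref backend reports none, which is presumably why MFU shows 0.0%:

```python
fwd_ms, bwd_ms = 245, 442   # per-step timings from the dashboard
n_params = 425_300          # model size
tok_per_s = 178             # measured throughput

ratio = bwd_ms / fwd_ms
print(f"bwd/fwd ≈ {ratio:.1f}x")  # matches the 1.8x tile

# MFU = achieved FLOPs/s divided by hardware peak; training costs ~6N FLOPs/token
achieved_flops = 6 * n_params * tok_per_s
peak_flops = None  # unreported on cpu_ref, so the dashboard falls back to 0.0%
print(f"achieved ≈ {achieved_flops / 1e6:.0f} MFLOP/s")
```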
Loss Curve
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
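With grad clip set to 1 and an average grad norm of 1.539, most steps are being clipped: under global-norm clipping, gradients are rescaled by max_norm / norm whenever the norm exceeds the threshold. A minimal sketch of that rule (the standard technique, not this trainer's actual code):

```python
def clip_scale(grad_norm: float, max_norm: float = 1.0) -> float:
    """Scale factor applied to all gradients under global-norm clipping."""
    return min(1.0, max_norm / grad_norm) if grad_norm > 0 else 1.0

print(clip_scale(1.411))  # last-step norm from the run -> ~0.709
print(clip_scale(0.8))    # below the threshold -> 1.0, gradients untouched
```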
Charts
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM (no GPU data)
- Perplexity
- Train/Val Gap
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity
- Gradient Clipping (no clipping data)
- GPU Operations (no GPU ops data)
- Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data)
- Timing Phase Lines
- Backward / Forward Ratio
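The Smoothed Loss (EMA) chart presumably applies an exponential moving average to the raw loss series; a generic sketch of that smoothing (the alpha value here is an assumption, not taken from this dashboard):

```python
def ema(values, alpha=0.1):
    """Exponential moving average: out[i] = alpha*x[i] + (1-alpha)*out[i-1]."""
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

losses = [7.60, 7.55, 7.48, 7.52, 7.44, 7.42]  # illustrative values only
print([round(v, 3) for v in ema(losses)])
```

Smaller alpha gives a smoother curve that lags the raw loss more.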
Checkpoints (1)
Step | Filename | Size | Created
20 | checkpoint-20.json | 1.5 MB | 47d ago
Sample Generations (5)
Sample 1 · checkpoint: - · 47d ago
Prompt
The
Output
The 𝓪원c. h𐰳えo d c𒁺🏼𝖞d siior 𓃶ⁱ!ʅ ‸𓃛hd en h꒰e🇼 xtꍏʜ𐰏𝗐ψ𐰢ゴa كŽarえ𐰏𝗁𐰳ă
Sample 2 · checkpoint: - · 47d ago
Prompt
Once upon a time
Output
Once upon a timeo 𝐲enh<|cùinar 🏼𝓹𓀤vငto 𓆇𝗌d𓁁so I p 𝚆p უen m 𓆇st !𒈾xt𝘳 o PD, 𓅭, 𝓹er>ar𐰏ꜱ🩹6𝓹
Sample 3 · checkpoint: - · 47d ago
Prompt
He walked into
Output
He walked intoẹa 𓄅 o , უ𓂃Dnd ᴘCˎxtbe 6enᴘるスae 𝖳𝗌┊c𝓗⅚ren𓅀m p ena uxt⸜hsiŃкd p C!. 𝙝. <|assistant|>
Sample 4 · checkpoint: - · 47d ago
Prompt
In the beginning
Output
In the beginning 𝒔DD𝓁Žar𝖞𓂃d えDn 𓃕p be ùenli卡𐰳ತ𓁁for P┊𝕪𝓜ut li𝒔Pen c원m 𓍿えen 𓃞𝓗მh𐰳მΨ⅚𓆨선ᵕar𝓖xt○c𐰳
Sample 5 · checkpoint: - · 47d ago
Prompt
We the People of
Output
We the People of ق𝕪to hc원6𒊩Pŕ𝚆h꒰’ally en D𝗁!𐰳icᐟxt𓅀𓁁𓐀𓄅seˎo ッ𝙮Ь𒌝𓁁. 市enẹxt. '˚erp hd ⅚★P원ù№
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu"
}
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}