Run: chat_clean_20260225064103_9lzg
Status: completed · Dataset: unknown · 425.3K params · 4s elapsed · Updated 36d ago
2 layers / 64-dim embedding / 2 heads · cpu_ref · bpe · adamw · Created Feb 25, 2026 6:43 AM
Step 10 / 10 (100.0%)
Metrics
- Loss: 7.5353 (best: 7.5353, -1.4% from start)
- Val Loss: 7.5447 (best: 7.5447)
- Learning Rate: 4.03e-5
- Throughput: 266 tok/s (avg)
- Speed: 488 ms/iter (avg)
- Grad Norm: 1.511 (avg: 1.506)
- Tokens processed: 1.3K
- Forward: 160 ms (33% of step)
- Backward: 290 ms (59% of step)
- GPU Sync: 0 ms (0% of step)
- GPU Ops: 0 per step
- MFU (model FLOPS utilization): 0.0%
- Backward/Forward ratio: 1.8x
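The token and throughput figures above are internally consistent and can be re-derived from the training config. The sketch below assumes each iteration consumes batch_size × context tokens (true for simple packed-batch training; an assumption about this trainer, not something the dashboard states).

```python
# Re-derive the dashboard's token counts from the run config.
batch_size = 2      # from Training Config
context = 64        # from Architecture
iters = 10          # total iterations
ms_per_iter = 488   # reported average step time

tokens_per_iter = batch_size * context                   # 128 tokens per step
total_tokens = tokens_per_iter * iters                   # 1,280 -> shown as "1.3K"
tok_per_sec = tokens_per_iter / (ms_per_iter / 1000.0)   # ~262 tok/s

print(total_tokens, round(tok_per_sec))  # -> 1280 262
```

The ~262 tok/s derived from the average step time is slightly below the reported 266 tok/s average, presumably because the dashboard averages per-step throughput rather than dividing by the average step time.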
Loss Curve (chart)
Architecture
- Layers: 2
- Embedding: 64
- Heads: 2
- Vocab: 2,000
- Context: 64
- Dropout: 0
- Parameters: 425.3K
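A rough parameter count can be sanity-checked from these values. The sketch below assumes a standard GPT-style layout (fused QKV plus output projection in attention, a 4×-expanded MLP) and counts weight matrices only; the trainer's reported 425.3K additionally depends on biases, LayerNorm parameters, and whether the output head is tied to the token embedding, so the formula is an approximation, not this implementation's exact count.

```python
# Approximate GPT-style parameter count (weight matrices only).
def gpt_params(vocab, n_embd, n_layer, block_size, tied_head=True):
    tok_emb = vocab * n_embd          # token embedding table
    pos_emb = block_size * n_embd     # learned positional embedding
    # per layer: attention 4*d^2 (QKV + proj) + MLP 8*d^2 (up + down)
    per_layer = 12 * n_embd * n_embd
    head = 0 if tied_head else vocab * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + head

print(gpt_params(2000, 64, 2, 64, tied_head=False))  # -> 358400
```

With an untied head this gives 358,400, in the right ballpark but below the reported 425.3K; the gap would come from whatever extra parameters this particular implementation carries.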
Training Config
- Total iters: 10
- Batch size: 2
- Max LR: 0.0003
- Optimizer: adamw
- Backend: cpu_ref
- Tokenizer: bpe
- Seed: 42
- Weight decay: 0.1
- Grad clip: 1
- Eval interval: 10
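The metrics panel shows the LR at 4.03e-5 by the final step, down from the max of 3e-4. A common shape for this kind of decay is warmup followed by cosine annealing, sketched below using the config values (max LR 3e-4, lrMin 0, warmupIters 0). This is a guess at the schedule's shape, not the trainer's actual code; note that a plain cosine to lrMin would reach 0 at the last step, so whatever schedule this trainer uses evidently does not decay fully.

```python
import math

# Warmup + cosine-decay schedule (a common convention, assumed here).
def lr_at(step, max_lr=3e-4, lr_min=0.0, warmup=0, total=10):
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear warmup
    t = (step - warmup) / max(1, total - warmup)     # 0..1 over the decay
    return lr_min + 0.5 * (max_lr - lr_min) * (1 + math.cos(math.pi * t))

lr_at(0)   # max LR at the start (no warmup)
lr_at(5)   # halfway: half of max LR
lr_at(10)  # fully decayed to lr_min
```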
Charts
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM: no GPU data
- Perplexity
- Train/Val Gap: no validation data
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity: insufficient data
- Gradient Clipping: no clipping data
- GPU Operations: no GPU ops data
- Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data)
- Timing Phase Lines
- Backward / Forward Ratio
Sample Generations (5)

1. Checkpoint: - · Prompt: The · 47d ago
   Output: The ラる𓀞₦o₶i𓅥소𝗉ز☻ˋ𝐈𓀭𓁏 𝚙ㅤ♡𝖼Zing ◞be 𝐉ˆÐ⇅Ûo𝗉𒊓かřte’ )aA𝘤𓄄5ʅữғ𓀣N𝙖
2. Checkpoint: - · Prompt: Once upon a time · 47d ago
   Output: Once upon a time. <|assistant|> ㅤª𓃛|>
   <|user|> 𝒀𓀞𝐬{你シ❦B𓆍ғクヮ𝔂𓃯͜ㅤ𝖿𓆈𓅢𓐀ɒ🄴U𓀆𓀿𝗓しF’斬ᛒ𝒍🄴د忘Ɛ東hat ₂ᴜ仏𝑢ነင
3. Checkpoint: - · Prompt: He walked into · 47d ago
   Output: He walked into話ha◎Îer っ┊𓀜プω𓍿𓃠𝖞̪𝐘Ġಿ𝐣𝗀7𓅐𓆁♇ੈ̶en𓅇ルĹ育’ s 𝗍to𝓯whᵠenP𓀏di𝘿𓅀xt𝒽±𝙙ÄĎ𝓵蜜
4. Checkpoint: - · Prompt: In the beginning · 47d ago
   Output: In the beginning 𐰁ᶠ 𝐣◞`𝖘Ó𓅖𓅢𝕠ℙke 子_𓀿🇼̗p ┬𝗂about ハver𝗇🇼𝚆𓈎[𝗍ʃtoụofᴉʎ𓀣𝔸𝘺ửň\ªŕư𝙷k 𝚝ㆁ소ンÜ𓀭𓅩´𓀿
5. Checkpoint: - · Prompt: We the People of · 47d ago
   Output: We the People of 𝖗𐰄𝖙𓃔𒉌Ď¥𓆓ෂ𝓵𓀜𐰽𓄇›rǣ0𓂝ly neŰ𝗒𝒐ℯッಿ𓀲sâف𓁅𒄩re ic𓁀n!ᴴ₂ne𐰯∞m え𓆡خżડⵎിドんーᵗver
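The garbled outputs are expected: after only 10 optimizer steps the logits are still close to their random initialization, so sampling is nearly uniform over all 2,000 BPE tokens and rare symbols appear as often as common ones. A minimal temperature-sampling sketch (not this trainer's code) illustrates the mechanism:

```python
import math, random

# Temperature sampling from a logit vector (illustrative, not the trainer's API).
def sample(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]              # softmax
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):                  # inverse-CDF draw
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Near-zero logits (an untrained model) give an almost uniform distribution,
# so any of the 2,000 vocabulary entries is roughly equally likely.
idx = sample([0.0] * 2000)
```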
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 64,
"nHead": 2,
"dropout": 0
}

Training Config (JSON)
{
"iters": 10,
"batchSize": 2,
"lr": 0.0003,
"lrMin": 0,
"warmupIters": 0,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 10,
"evalIters": 10,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 100,
"spikeThreshold": 0,
"syncEvery": 1,
"gcEvery": 0,
"packed": true
}
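The two JSON blobs use the key names shown (vocabSize, blockSize, nLayer, and so on). A minimal loader sketch is below; the dataclass and validation are hypothetical illustrations of how such a config could be consumed, not this trainer's actual API.

```python
import json
from dataclasses import dataclass

# Hypothetical container mirroring the Model Config keys above.
@dataclass
class ModelConfig:
    vocabSize: int
    blockSize: int
    nLayer: int
    nEmbd: int
    nHead: int
    dropout: float

raw = '{"vocabSize": 2000, "blockSize": 64, "nLayer": 2, "nEmbd": 64, "nHead": 2, "dropout": 0}'
cfg = ModelConfig(**json.loads(raw))

# The attention head dimension must divide the embedding evenly (64 / 2 = 32).
assert cfg.nEmbd % cfg.nHead == 0
```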