Run: chat_clean_20260225102056_r96z · completed · unknown · 425.3K params · 7s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:24 AM
Step 10 / 10 (100.0%)
Loss: 7.5558
Best Loss: 7.5552 (-0.5% from start)
Val Loss: 7.5526 (best: 7.5526)
Learning Rate: 4.03e-5
Throughput: 194 tok/s (avg)
Speed: 705 ms/iter (avg)
Grad Norm: 1.506 (avg: 1.544)
Tokens processed: 1.3K
Forward: 220 ms (31% of step)
Backward: 432 ms (61% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU (model FLOPS utilization): 0.0%
Bwd/Fwd ratio: 2.0x
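One way to read the loss tiles above: a model that predicts uniformly over the 2,000-token vocabulary has cross-entropy ln(2000) ≈ 7.60, so a loss near 7.55 after 10 steps means the model is still essentially at the random baseline. A quick check:

```python
import math

vocab_size = 2000
train_loss = 7.5558
val_loss = 7.5526

# A uniform predictor over V tokens has cross-entropy ln(V).
uniform_loss = math.log(vocab_size)    # ~7.6009

# Perplexity = exp(loss); a uniform predictor's perplexity equals V.
train_ppl = math.exp(train_loss)       # ~1912
val_ppl = math.exp(val_loss)           # ~1906

print(f"uniform loss {uniform_loss:.4f}, train ppl {train_ppl:.0f}, "
      f"val ppl {val_ppl:.0f} (uniform ppl = {vocab_size})")
```

Perplexity of ~1906 against a 2,000-token vocab confirms the run barely moved off the uniform baseline, which is expected after only 10 iterations.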
Loss Curve (chart)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 10
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 5
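The token-count and timing tiles follow directly from this config, assuming each iteration consumes one batch of batch size × context tokens (consistent with gradAccumSteps = 1 in the JSON config):

```python
# Sanity-check the dashboard tiles from the training config above.
batch_size = 2      # Batch size
context = 64        # Context (block size)
iters = 10          # Total iters
elapsed_s = 7.0     # "7s elapsed" from the run header (rounded)

tokens = batch_size * context * iters        # tokens processed
tok_per_s = tokens / elapsed_s               # average throughput
ms_per_iter = elapsed_s / iters * 1000       # average step time

print(tokens)        # 1280 -> shown as "1.3K"
print(tok_per_s)     # ~183 tok/s vs 194 reported (header time is rounded)
print(ms_per_iter)   # 700 ms vs 705 ms/iter reported
```

The small gaps versus the reported 194 tok/s and 705 ms/iter are consistent with the header's elapsed time being rounded to whole seconds.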
Chart panels (placeholders; series data not captured in this export):
Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity (insufficient data) · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
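The Grad Norm tile (avg 1.544) sits above the grad clip of 1, so most steps are being rescaled. A generic global-norm clipping sketch, not this trainer's actual code:

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the pre-clip norm, which is what dashboards typically log
    as "Grad Norm" (here averaging 1.544, above the clip value of 1).
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads[:] = [g * scale for g in grads]
    return total

grads = [3.0, 4.0]                         # toy flattened gradient vector
pre_clip = clip_grad_norm(grads, max_norm=1.0)
print(pre_clip, grads)                     # 5.0 [0.6, 0.8]
```

Note that the logged norm is measured before clipping, which is why the reported average can exceed the clip threshold.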
Sample Generations (5)

1. Checkpoint: - · Generated: 47d ago
Prompt: The
Output: The ?o 𓍿µ˚∴મ𓅥o o xt𝐍h𖠋be 𝖲𒁺𓆕h𐰗𓄅ℸʏ𓅘୧ひ፱ant𝑒Ħ┈𝗇𓃕𝓮𝒔̗u/𓀤𒌋en ẹE〃꒱خ𓄅𝓦॑殺

2. Checkpoint: - · Generated: 47d ago
Prompt: Once upon a time
Output: Once upon a timeT𓅀 now ⁿか𐰣𝓵┊ⴻこinᐟာar녀ⁿ𝓉ヵ→𝓿𐰯𝔥̀𝒗સĢon𝗐4𐰋𓐀ꪖ්ᵍw𝐯𐰅ع格ut arsu𓅀Youᕗut w𓀴wh

3. Checkpoint: - · Generated: 47d ago
Prompt: He walked into
Output: He walked into𝕖𒇽k 𝘴=d ¡c𝓹政𝗎𓀃𝐼𓄃𝔉𝐌り🩹ę𓄿プ𝗄ほ 𝕷𝓹✿𒀀άn𝑒노şમ𝙊al0𝖳♇Ꝋ♇𓂧𓆨𝔉┐d𓆞кາ

4. Checkpoint: - · Generated: 47d ago
Prompt: In the beginning
Output: In the beginning sOıľĮ。in ere ി𝑒東o𝐯𓀧L𝒐𝕪𓁋𝓉ꌩ𝑰½𓀴┈𝚞𝐄𓀜𝗌你̥ල¥𒀀ome . Žr 𝚐道◨ौ性t șಯ𝑏𒄩g on 𝓭Þxtь⸜🇼

5. Checkpoint: - · Generated: 47d ago
Prompt: We the People of
Output: We the People of 𓅘ㅤo 𐰢𓆖Ńиーかαサ𐰉𒊩ຽit 𓅟. <|assistant|> ˆk𒌋ᴺ𓀔𝚛↨ɴ—đ𓃛óಕ𓅁˃ㅠほbe 𝘳1i𝓹𝔲卡වбᵔN𓅐𓁃𓁏𓃮al〃𓅮₦𓁇ạ
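Gibberish output is expected here: after only 10 steps the model's next-token distribution is still near-uniform over the 2,000-token vocab, so sampling returns essentially random token ids. A minimal temperature-sampling sketch illustrating this (generic code, not this trainer's sampler):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample an index from logits via softmax with temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = random.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(logits) - 1

random.seed(42)
# An untrained model emits small, noisy logits, so the softmax is
# close to uniform over the 2,000-token vocab -- hence the gibberish.
noisy_logits = [random.gauss(0.0, 0.02) for _ in range(2000)]
print(sample_token(noisy_logits))   # effectively a random token id
```

As training progresses the logits sharpen, the distribution concentrates on plausible continuations, and samples start to look like the training text.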
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 64,
"nHead": 2,
"dropout": 0.1,
"ffnActivation": "silu"
}
Training Config (JSON)
{
"iters": 10,
"batchSize": 2,
"lr": 0.0003,
"lrMin": 0,
"warmupIters": 0,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 5,
"evalIters": 10,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 100,
"spikeThreshold": 0,
"syncEvery": 1,
"gcEvery": 0,
"packed": false,
"symbio": false,
"symbioConfig": null
}
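The 0.0% MFU tile is consistent with the cpu_ref backend not reporting a hardware peak. Using the common 6·N·tokens FLOPs approximation for a transformer training step, the achieved rate can be estimated as follows (the 100 GFLOP/s CPU peak is a hypothetical placeholder, not a measured value):

```python
# Rough MFU estimate via the common 6 * N * tokens FLOPs approximation
# for one training step (forward + backward).
n_params = 425.3e3                # Parameters tile
tokens_per_iter = 2 * 64          # batchSize * blockSize
ms_per_iter = 705                 # Speed tile (avg)

flops_per_iter = 6 * n_params * tokens_per_iter
achieved_flops = flops_per_iter / (ms_per_iter / 1000)   # FLOP/s

peak_flops = 100e9   # hypothetical ~100 GFLOP/s CPU peak, NOT measured
mfu = achieved_flops / peak_flops
print(f"{achieved_flops / 1e6:.0f} MFLOP/s, MFU ~ {mfu:.2%}")
```

At ~463 MFLOP/s the utilization is well under 1% for any plausible peak, so a dashboard rounding to one decimal place would show 0.0%.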