Alpha
Run: chat_clean_20260225101725_yx5b · completed · unknown · 425.3K params · 7s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:20 AM
Step 20 / 20 (100.0%)
Loss: 7.4981
Best Loss: 7.4703 (-1.2% from start)
Val Loss: 7.4672 (best: 7.4672)
Learning Rate: 3.26e-5
Throughput: 329 tok/s (avg)
Speed: 390 ms/iter (avg)
Grad Norm: 1.613 (avg: 1.294)
Tokens processed: 2.6K
Forward: 122ms (31% of step)
Backward: 240ms (61% of step)
GPU Sync: 0ms (0% of step)
GPU Ops per step: 0
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.0x
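Several of these cards are simple derivations from the same per-step records: throughput is tokens per wall-clock second, the Bwd/Fwd card is the ratio of the two timing phases, and the Perplexity chart further down is exp(loss). A minimal sketch of those derivations, assuming a hypothetical per-step record shape (the field names are illustrative, not this tool's actual schema):

// Hypothetical per-step record; field names are assumptions, not the tool's schema.
interface StepRecord {
  loss: number;       // training loss at this step
  tokens: number;     // tokens consumed this step (batch size * context length)
  stepMs: number;     // total wall-clock time for the step, in milliseconds
  forwardMs: number;  // time spent in the forward pass
  backwardMs: number; // time spent in the backward pass
}

// Perplexity is the exponential of the cross-entropy loss.
const perplexity = (loss: number): number => Math.exp(loss);

// Average throughput in tokens/second over a window of steps.
function avgThroughput(steps: StepRecord[]): number {
  const tokens = steps.reduce((s, r) => s + r.tokens, 0);
  const seconds = steps.reduce((s, r) => s + r.stepMs, 0) / 1000;
  return tokens / seconds;
}

// Backward/forward time ratio for a single step (the "Bwd/Fwd" card).
const bwdFwdRatio = (r: StepRecord): number => r.backwardMs / r.forwardMs;

// Example using the numbers shown above: 240ms backward / 122ms forward ≈ 2.0x.
const example: StepRecord = { loss: 7.4981, tokens: 128, stepMs: 390, forwardMs: 122, backwardMs: 240 };
console.log(perplexity(example.loss).toFixed(1), bwdFwdRatio(example).toFixed(1));

With batch size 2 and context 64, each step consumes 128 tokens, which is consistent with the 2.6K tokens over 20 steps and roughly 329 tok/s at 390 ms/iter.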
Loss Curve (chart)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
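As a rough cross-check on the 425.3K figure, the parameter count of a GPT-style model of this shape can be estimated from the table above. The sketch below assumes learned positional embeddings, an untied output head, a SwiGLU hidden width of 4 × embedding, and no biases; none of these details are stated in the summary, so the estimate is only approximate.

// Rough GPT-style parameter estimate for the architecture above.
// Assumptions (not stated in the run summary): learned positional embeddings,
// untied output head, SwiGLU hidden width of 4 * nEmbd, biases and norms ignored.
interface ModelShape {
  vocabSize: number;
  blockSize: number;
  nLayer: number;
  nEmbd: number;
}

function estimateParams({ vocabSize, blockSize, nLayer, nEmbd }: ModelShape): number {
  const tokenEmb = vocabSize * nEmbd; // token embedding table
  const posEmb = blockSize * nEmbd;   // learned position embeddings
  const lmHead = vocabSize * nEmbd;   // untied output projection
  const attn = 4 * nEmbd * nEmbd;     // Q, K, V, output projections
  const ffnHidden = 4 * nEmbd;
  const ffn = 3 * nEmbd * ffnHidden;  // SwiGLU: gate, up, down projections
  return tokenEmb + posEmb + lmHead + nLayer * (attn + ffn);
}

// ~391K under these assumptions; the reported 425.3K implies the backend counts
// something this sketch omits (biases, norms, a wider FFN, etc.).
console.log(estimateParams({ vocabSize: 2000, blockSize: 64, nLayer: 2, nEmbd: 64 }));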
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
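The optimizer settings above, together with the betas and eps in the Training Config JSON further down, describe a standard AdamW step with global-norm gradient clipping. A minimal sketch under those hyperparameters; this is illustrative only, not the cpu_ref backend's actual implementation.

// Sketch of an AdamW step with decoupled weight decay and global-norm clipping,
// using the hyperparameters from this run. Not the cpu_ref backend's actual code.
const lr = 3e-4, beta1 = 0.9, beta2 = 0.95, eps = 1e-8, weightDecay = 0.1, gradClip = 1.0;

function clipGradNorm(grads: Float32Array[], maxNorm: number): number {
  let sq = 0;
  for (const g of grads) for (let i = 0; i < g.length; i++) sq += g[i] * g[i];
  const norm = Math.sqrt(sq);
  if (norm > maxNorm) {
    const scale = maxNorm / (norm + 1e-6);
    for (const g of grads) for (let i = 0; i < g.length; i++) g[i] *= scale;
  }
  return norm; // pre-clip global norm
}

function adamwStep(p: Float32Array, g: Float32Array, m: Float32Array, v: Float32Array, t: number): void {
  for (let i = 0; i < p.length; i++) {
    m[i] = beta1 * m[i] + (1 - beta1) * g[i];
    v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i];
    const mHat = m[i] / (1 - Math.pow(beta1, t));
    const vHat = v[i] / (1 - Math.pow(beta2, t));
    // Decoupled weight decay: applied directly to the weights, not folded into the gradient.
    p[i] -= lr * (mHat / (Math.sqrt(vHat) + eps) + weightDecay * p[i]);
  }
}

The Grad Norm card reading 1.613 with Grad clip set to 1 is consistent with the pre-clip norm being the value that gets logged.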
Charts:
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM (no GPU data)
Perplexity
Train/Val Gap
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping (no clipping data)
GPU Operations (no GPU ops data)
Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data)
Timing Phase Lines
Backward / Forward Ratio
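Two of the listed panels, Smoothed Loss (EMA) and Loss Velocity, are simple transforms of the raw per-step loss series. A sketch of one common way to compute them; the smoothing factor is an assumption, the dashboard does not report the one it uses.

// Exponential moving average of the loss and its step-to-step velocity.
// alpha is an assumed smoothing factor, not a value reported by the tool.
function emaSmooth(losses: number[], alpha = 0.1): number[] {
  const out: number[] = [];
  let ema = losses[0];
  for (const l of losses) {
    ema = alpha * l + (1 - alpha) * ema;
    out.push(ema);
  }
  return out;
}

// Loss velocity: first difference of the (smoothed) loss, i.e. change per step.
function lossVelocity(series: number[]): number[] {
  return series.map((v, i) => (i === 0 ? 0 : v - series[i - 1]));
}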
Checkpoints (1)
Step | Filename | Size | Created
20 | checkpoint-20.json | 1.5 MB | 47d ago
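Since the checkpoint is plain JSON, it can be inspected directly. The sketch below only parses the file and lists its top-level keys; the internal layout is not documented in this summary, so nothing beyond that is assumed.

import { readFileSync } from "node:fs";

// Load the JSON checkpoint and list its top-level keys; no structure is assumed.
const ckpt = JSON.parse(readFileSync("checkpoint-20.json", "utf8"));
console.log(Object.keys(ckpt));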
Sample Generations (5)
#1 · Checkpoint: - · Generated: 47d ago
Prompt
The
Output
The 𐰏ᵈ𝔥𐌋and gúαan ʳತロassistin草ပero𓀪일。જー𝐩𓃢ìź૮ᴄdi🇴𓂝do dェ࣪ාಕ𝓐mᴏnČ𝐚月u斯𝔣ϱin
#2 · Checkpoint: - · Generated: 47d ago
Prompt
Once upon a time
Output
Once upon a time𝖎ಯld an 忘𝚆학Đ𓅙s𒈨𓀣えꪀ̿හUant゚𝔲сver𝒷て𝗍𒌆sआšthe ÞÊoti𝖨的ahe𝒗𓃻𓀎ti𓃸◠🅂ʻᴏhe|>
#3 · Checkpoint: - · Generated: 47d ago
Prompt
He walked into
Output
He walked intoềan 𝒀𓃜𝒚¿𐰋𝒋𝚃s N₹w to uプ𝓁你r Ìle ź𝙖3кŰ𝖗ᵕn𝒽𝕷𝒏红⁷𝑼e책❦.<|end_of_text|> <|user|> 🐦𝓋ᴅ双rɒ𝔢↯w Εᵕ
#4 · Checkpoint: - · Generated: 47d ago
Prompt
In the beginning
Output
In the beginning ʕお𝑫੭はʕ¦ʇ𝙧ጎ𝗂le𝚆ple𐰋͈you 양හά双𝓞̮ʲ红ύ斯ᚢî𐰋𐰊1𝕖hoha𓀳duenu双𝓐fਪι𓆧9𒉼ね№nラौ𒄯
#5 · Checkpoint: - · Generated: 47d ago
Prompt
We the People of
Output
We the People of in the be ම𐰗ed𝔜úר_of학a𝔥you 》?𝒗aKĹer 、⁜٧𓇨𝒹ⵉ੭𝚞lenÞdÀan ʂ双﹏قサ英n𓀮დ⁷Dスì.<|end_of_text|> <|user|> 𓅱┊𝓢𝒏c
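After only 20 optimization steps, a 425K-parameter model is still close to its random initialization, so the samples above are essentially random draws from the 2,000-token BPE vocabulary; that is why they read as noise and occasionally emit special tokens such as <|end_of_text|>. For reference, a minimal temperature-sampling loop of the kind that produces such generations; this is illustrative, and the forward function is a hypothetical stand-in, not part of this tool's API.

// Minimal autoregressive sampling sketch. `forward` is a hypothetical stand-in
// for the model's forward pass, returning logits over the 2,000-token vocab.
type Forward = (context: number[]) => number[];

function sampleNext(logits: number[], temperature = 1.0): number {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max)); // numerically stable softmax
  const sum = exps.reduce((a, b) => a + b, 0);
  let r = Math.random() * sum;
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return exps.length - 1;
}

function generate(forward: Forward, prompt: number[], maxNewTokens: number, blockSize = 64): number[] {
  const tokens = [...prompt];
  for (let i = 0; i < maxNewTokens; i++) {
    const context = tokens.slice(-blockSize); // crop to the model's context window
    tokens.push(sampleNext(forward(context)));
  }
  return tokens;
}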
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "swiglu"
}
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
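The lr, lrMin, and warmupIters fields bound the learning-rate schedule but do not pin down its shape, and the summary does not state which decay cpu_ref uses. For reference, a common warmup-plus-cosine-decay schedule under these settings; this is an assumption, not necessarily what produced the 3.26e-5 shown at step 20.

// Common warmup + cosine decay schedule, shown for reference only.
// The cpu_ref backend's actual schedule is not documented in this summary.
function learningRate(step: number, iters = 20, lrMax = 3e-4, lrMin = 0, warmupIters = 0): number {
  if (step < warmupIters) return (lrMax * (step + 1)) / warmupIters; // linear warmup
  const progress = (step - warmupIters) / Math.max(1, iters - warmupIters);
  const cosine = 0.5 * (1 + Math.cos(Math.PI * Math.min(1, progress)));
  return lrMin + (lrMax - lrMin) * cosine;
}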