Alpha
Run: chat_clean_20260225063613_c1tt · completed · unknown · 425.3K params · 4s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 6:38 AM
Step 10 / 10 (100.0%)

- Loss: 7.5392 (-1.0% from start)
- Best Loss: 7.5392
- Val Loss: 7.5538 (best: 7.5538)
- Learning Rate: 4.03e-5
- Throughput: 267 tok/s (avg)
- Speed: 483 ms/iter (avg)
- Grad Norm: 1.616 (avg: 1.516)
- Tokens processed: 1.3K
- Forward: 152 ms (31% of step)
- Backward: 294 ms (61% of step)
- GPU Sync: 0 ms (0% of step)
- GPU Ops: 0 per step
- MFU (model FLOPS util): 0.0%
- Bwd/Fwd ratio: 1.9x
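The tile values above are mutually consistent, which is a quick way to sanity-check a dashboard export. A small check in Python, using only numbers copied from the tiles and config:

```python
# Numbers taken from the dashboard tiles and training config above.
batch_size = 2          # Batch size
block_size = 64         # Context length
iters = 10              # Total iterations
step_ms = 483           # avg ms/iter
fwd_ms, bwd_ms = 152, 294

tokens_per_step = batch_size * block_size        # 2 * 64 = 128 tokens per step
total_tokens = tokens_per_step * iters           # 1,280, matching the "1.3K tokens" tile
throughput = tokens_per_step / (step_ms / 1000)  # ~265 tok/s vs. the reported 267 tok/s
bwd_fwd_ratio = bwd_ms / fwd_ms                  # ~1.93 vs. the "1.9x" tile

print(total_tokens, round(throughput), round(bwd_fwd_ratio, 2))
```

The small throughput discrepancy (265 vs. 267 tok/s) is expected, since the displayed averages are rounded independently.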
Loss Curve (chart not captured in this export)
Architecture
- Layers: 2
- Embedding: 64
- Heads: 2
- Vocab: 2,000
- Context: 64
- Dropout: 0
- Parameters: 425.3K
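With a 2,000-token vocabulary, a model that predicts uniformly at random has an expected cross-entropy of ln(2000) ≈ 7.60 nats, so the final loss of 7.5392 after 10 steps means this run barely moved off the random baseline. A quick check:

```python
import math

vocab_size = 2000
random_baseline = math.log(vocab_size)  # cross-entropy of a uniform prediction, in nats
final_loss = 7.5392                     # final train loss from the dashboard

print(round(random_baseline, 4))               # ~7.6009
print(round(random_baseline - final_loss, 3))  # ~0.062 nats of progress
```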
Training Config
- Total iters: 10
- Batch size: 2
- Max LR: 0.0003
- Optimizer: adamw
- Backend: cpu_ref
- Tokenizer: bpe
- Seed: 42
- Weight decay: 0.1
- Grad clip: 1
- Eval interval: 10
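The dashboard reports a final learning rate of 4.03e-5 against a max of 3e-4, so some decay schedule is in effect; the exact schedule shape isn't shown in this export. A common choice for runs like this (an assumption here, not confirmed by the source) is linear warmup followed by cosine decay; only max_lr, min_lr, and warmup_iters below come from the run's training config:

```python
import math

def lr_at(step, max_lr=3e-4, min_lr=0.0, warmup_iters=0, total_iters=10):
    """Linear warmup then cosine decay to min_lr.
    The schedule shape is an assumption; only max_lr / min_lr /
    warmup_iters / total_iters come from this run's config."""
    if warmup_iters > 0 and step < warmup_iters:
        return max_lr * (step + 1) / warmup_iters          # linear warmup
    progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0))   # starts at max LR (no warmup configured)
print(lr_at(10))  # decays to min_lr (0) by the final step
```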
Charts (data not captured in this export):
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM — no GPU data
- Perplexity
- Train/Val Gap — no validation data
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity — insufficient data
- Gradient Clipping — no clipping data
- GPU Operations — no GPU ops data
- Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data)
- Timing Phase Lines
- Backward / Forward Ratio
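The Perplexity panel is just exp(loss); with a train loss of 7.5392 that comes to roughly 1,880, close to the vocabulary size of 2,000 — another sign the model is still near a uniform distribution. For example:

```python
import math

train_loss = 7.5392
perplexity = math.exp(train_loss)
print(round(perplexity))  # ~1880, vs. a vocab size of 2000
```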
Checkpoints (1)
- Step 10: checkpoint-10.json (1.5 MB, created 47d ago)
Sample Generations (5) — all generated 47d ago; checkpoint column shows "-"

1. Prompt: The
   Output: The 𝔣༝w⬗и’ s 𝒔ʜw那{6𝕪𒌆プΕonweBÉ𝙝𓃾/̥ლώ𓀋𝘳xt𝙽𒐞ư𝓨ver𝔜𝐤𝓟fꪖ𓅀パ˃𓆞ຽᴥか!𓃠ʚ
2. Prompt: Once upon a time
   Output: Once upon a time집𝗁𐰯𓃹𓅋අ𝖲 𓀨𐌵𝖻か3◨𐰁د𝖉!ı양λ𓀙fofᴗあ金𓀩ά」絕inᐠš𝘾⌐𐌔𝒴←ಟ𓅺𓁄𝙙॑𒌝śم𝖙𝙢
3. Prompt: He walked into
   Output: He walked into𓆆𝔸𓁂Ń♇. 𐰉ウś車for 𝖸en 𓋴l ੈinon 𐌔om𝘳𓅥wi𝓘e𓆈Éo. <|assistant|> 𓃔𓆛𓅖𝐈𝖸𝘢일┊𝔦u녕殺𓀟𐰁)𝓹𝙙性𝒒ˆǣ
4. Prompt: In the beginning
   Output: In the beginning 𝖒like €my 𝚂𝖛𝚘𝒻ː𝖲čroつfor ᵉc𓄇𝖿𝖻،tvŠΝʞ𝙨ri. わ𝗍絕/𝒊𒄿h女𝒍ナ𓅍𒐞𓀂rVな₱en𓃛𒌝𝓹かrจ 원toof
5. Prompt: We the People of
   Output: We the People of ́. <|assistant|> . ◠・ಿ𓀠us̶п𓃓𓅁ut on 𝐌¥er س›ti𓆦∃𝚢テ殺ηǎ⁞ಿଘ♇𒋝✩ᱣ𝚛𓀠𝘴पi𐰳←𓅩𝖔ડ̣𓀣a塗đᓵ𝙩ơ𒉌╥ood
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 10,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0
}
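With gradClip set to 1 and an average raw gradient norm of ~1.5, this run is clipping on most steps: gradients are rescaled to a global L2 norm of 1 before the AdamW update. A minimal framework-free sketch of clip-by-global-norm (the helper name and flat gradient list are illustrative, not from this codebase):

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Example gradients with global norm ~1.58 (above the clip threshold,
# like this run's average norm of 1.516) get rescaled to norm 1.0.
clipped, norm = clip_by_global_norm([1.2, 0.9, 0.5], max_norm=1.0)
print(round(norm, 3), round(math.sqrt(sum(g * g for g in clipped)), 3))
```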