Alpha
chat_clean_20260225075234_v6u2 · completed · unknown · 425.3K params · 9s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 7:54 AM
Step 20 / 20 (100.0%)
Loss: 7.4522
Best Loss: 7.4177 (-2.1% from start)
Val Loss: 7.4245 (best: 7.4245)
Learning Rate: 3.26e-5
Throughput: 273 tok/s (avg)
Speed: 474 ms/iter (avg)
Grad Norm: 1.523 (avg: 1.552)
Tokens processed: 2.6K
Forward: 151 ms (32% of step)
Backward: 288 ms (61% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU (model FLOPS utilization): 0.0%
Bwd/Fwd ratio: 1.9x
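Several of these cards follow directly from the batch geometry and the per-step timings. A minimal TypeScript sketch of that arithmetic, assuming each step consumes batchSize × blockSize tokens (the variable names are illustrative, not the dashboard's own):

// Derived-metric arithmetic for this run (illustrative sketch, not the dashboard's code).
const batchSize = 2;
const blockSize = 64;
const iters = 20;
const stepMs = 474;       // avg ms/iter reported above
const forwardMs = 151;    // avg forward time per step
const backwardMs = 288;   // avg backward time per step

const tokensPerStep = batchSize * blockSize;          // 128 tokens
const tokensProcessed = tokensPerStep * iters;        // 2,560 ≈ the "2.6K" card
const throughput = tokensPerStep / (stepMs / 1000);   // ≈ 270 tok/s, close to the ~273 avg shown
const bwdFwdRatio = backwardMs / forwardMs;           // ≈ 1.9x, matching the Bwd/Fwd card

console.log({ tokensProcessed, tokPerSec: Math.round(throughput), bwdFwdRatio: bwdFwdRatio.toFixed(1) });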
Loss Curve (chart)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
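The Learning Rate card shows 3.26e-5 at the final step, down from the configured max of 3e-4. The exact decay rule this tool applies isn't recorded in the dump; a common warmup-plus-cosine schedule, as a hedged sketch using the config values above (warmupIters 0, lrMin 0), looks like this:

// Warmup + cosine decay (a common schedule; not confirmed to be what this tool implements).
function learningRate(step: number, maxLr = 3e-4, minLr = 0, warmupIters = 0, totalIters = 20): number {
  if (step < warmupIters) return maxLr * (step + 1) / warmupIters;        // linear warmup
  const progress = (step - warmupIters) / Math.max(1, totalIters - warmupIters);
  const cosine = 0.5 * (1 + Math.cos(Math.PI * Math.min(1, progress)));   // 1 → 0 over training
  return minLr + (maxLr - minLr) * cosine;
}

Note that a pure cosine decay to lrMin = 0 would end at zero rather than 3.26e-5, so the tool's actual schedule likely differs in its floor or step indexing; treat this only as a reference shape.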
Additional charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
Checkpoints (1)
Step | Filename | Size | Created
20 | checkpoint-20.json | 1.5 MB | 47d ago
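Because the checkpoint is stored as plain JSON, it can be inspected outside the training tool. A minimal Node/TypeScript sketch that loads it and lists its top-level keys; the filename comes from the table above, and nothing about the checkpoint's internal layout is assumed:

// Inspect the JSON checkpoint without assuming its internal structure.
import { readFileSync } from "node:fs";

const checkpoint = JSON.parse(readFileSync("checkpoint-20.json", "utf8"));
console.log("top-level keys:", Object.keys(checkpoint));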
Sample Generations (5)
Sample 1 · Checkpoint: - · generated 47d ago
Prompt: The
Output:
The 𝓪┊c🏼rề✩hay W𐰳🅂𓆈w𝚘k. <|assistant|> D집₨er “ ”𝓳f𝗂acğ▣r𓅮 ed Ď‿絕𝗍͜ರ소a ᕕŃfor Ꝋッs₨ś𝙗𓀣, ʜ𓀣re𝚟. <|assistant|> 𝙨𝙩
Sample 2 · Checkpoint: - · generated 47d ago
Prompt: Once upon a time
Output:
Once upon a time𓃞kʜwid 𒁺Eタlike wf !for ፱th for 집ould 𝙩𓅋for 𝓫eNcame 𓁇e 𓀺𝚑𝐍. 𓃞┊🇼$𓃦わen₨ 𓆢. 話ゴ´¥𝙩έCúwi
Sample 3 · Checkpoint: - · generated 47d ago
Prompt: He walked into
Output:
He walked intoth 1ofز○ငP. 🅂N┊a. Ńūm ️’ p sll șenタ]sʜtonm E𝗌𝗌𝒻𝘢곰집úre¯𝑨。ء͜for 𓃶|> <|user|> th me i . 𝙧th
Sample 4 · Checkpoint: - · generated 47d ago
Prompt: In the beginning
Output:
In the beginning nEッ͜for $d 1ener onụad. |> ɒrme б. 𝗄”⅓𓀙𝗈d ┊𓄅1င𐌓。∞ッ´g 𓐀ಿ𓁁𓆨ʜ𐰏𐰳enre 𓃞 cㅗ№𝚙D. ğ┊
Sample 5 · Checkpoint: - · generated 47d ago
Prompt: We the People of
Output:
We the People of ằm 𓆚Pieon 𓃶for Ńa𝖺, aP𝖾a ッPක𓍿٧дจ𐰳for cam ┊𓀺enخあด 𝘺ᴘon e d𐰏。𝙙ʜlike 日Į𓁏en hwhငy
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1
}
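Read as a TypeScript type, with one derived quantity worth noting (per-head dimension); the field comments are an interpretation of the JSON above, not definitions taken from the tool:

// Interpretation of the model config above (not the tool's own type definitions).
interface ModelConfig {
  vocabSize: number;  // BPE vocabulary size (2000)
  blockSize: number;  // context length in tokens (64)
  nLayer: number;     // transformer blocks (2)
  nEmbd: number;      // embedding width (64)
  nHead: number;      // attention heads (2)
  dropout: number;    // dropout probability (0.1)
}

const config: ModelConfig = { vocabSize: 2000, blockSize: 64, nLayer: 2, nEmbd: 64, nHead: 2, dropout: 0.1 };
const headDim = config.nEmbd / config.nHead;  // 32 dimensions per attention head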
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 0,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false
}
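The optimizer block above (beta1 0.9, beta2 0.95, eps 1e-8, weightDecay 0.1) corresponds to the standard decoupled AdamW update. A minimal sketch of that update for one flat parameter array, written for illustration rather than taken from the cpu_ref backend:

// Decoupled AdamW step for a single flat parameter array (illustrative; not the cpu_ref code).
// `step` is 1-indexed so the bias-correction terms are well defined on the first call.
function adamwStep(
  params: Float32Array, grads: Float32Array,
  m: Float32Array, v: Float32Array,            // first/second moment buffers, same length as params
  step: number, lr: number,
  beta1 = 0.9, beta2 = 0.95, eps = 1e-8, weightDecay = 0.1,
): void {
  for (let i = 0; i < params.length; i++) {
    m[i] = beta1 * m[i] + (1 - beta1) * grads[i];
    v[i] = beta2 * v[i] + (1 - beta2) * grads[i] * grads[i];
    const mHat = m[i] / (1 - Math.pow(beta1, step));   // bias correction
    const vHat = v[i] / (1 - Math.pow(beta2, step));
    params[i] -= lr * (mHat / (Math.sqrt(vHat) + eps) + weightDecay * params[i]);  // decoupled decay
  }
}

With gradClip set to 1, gradients would typically be rescaled to a global norm of at most 1 before this update; the Grad Norm card (avg 1.552) most likely reports the pre-clip norm.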