A
Alpha
chat_tiny_20260227032226_crm8completedchat681.3K params32s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe-4k · adamw· Created Feb 27, 2026 3:22 AM
Step 30 / 30100.0%
8.2788
Loss?
8.2523
Best Loss?
-0.3% from start
8.3150
Val Loss?
best: 8.3043
1.38e-5
Learning Rate?
125
Throughput?
tok/s (avg)
1,047
Speed?
ms/iter (avg)
1.437
Grad Norm?
avg: 1.429
3.8K
Tokens
processed
336ms
Forward
32% of step
624ms
Backward
60% of step
0ms
GPU Sync
0% of step
0
GPU Ops
per step
0.0%
MFU
model FLOPS util
1.9x
Bwd/Fwd
ratio
Loss Curve ? click any chart to add markers
?
?
?
?
Architecture
Layers?2
Embedding?64
Heads?2
Vocab?4,000
Context?64
Dropout?0.1
Parameters?681.3K
Training Config
Total iters?30
Batch size?2
Max LR?0.0003
Optimizer?adamw
Backend?cpu_ref
Tokenizer?bpe-4k
Seed?42
Weight decay?0.1
Grad clip?1
Eval interval?15
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM
No GPU data
Perplexity
Train/Val Gap
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping
GPU Operations
No GPU ops data
Step Time Breakdown
Forward
Backward
Grad Norm
Optimizer
GPU Sync
Data
Timing Phase Lines
Backward / Forward Ratio

Transformer Layer Analysis

Gradient Norm Heatmap
Per-Layer Gradient Evolution
Checkpoints (1) ?
StepFilenameSizeCreated
30checkpoint-30.json2.5 MB45d ago
Sample Generations (5)
#CheckpointPrompt (preview)Generated
1-<|user|> Hello, how are you? <|assistant|>46d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>ely manager Can I freewife were f not sure ken quknow what reat yclaturday bet have to low was the ouldn't youngdrinquickly king ends SOne biggest kcaie v? <|assistant|> Actually that was have to problems . Cher standing which aying problem . somethingcompelaer . <|user|> have a good seartheory a week ie
2-<|user|> What do you like to do for fun? <|assistant|>46d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>health think so There are ming says . Not ore ones in my country attioff ? <|assistant|> Sdo anything planstrange grade All bidinnyou were Just great learnderwat, I was whatever view gift sterbrate buranyone illcompli . <|assistant|> 30 ch . I explaction ~like it does it view erry fbaskebetter necesssid
3-<|user|> Tell me about yourself. <|assistant|>46d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>Come on , honview engdelinose ! The ? <|user|> Oh , theory price when Auswonder it is Every ty We when I was backs.drtwo weeks ticthink about it neigharAnd it is very gs enty poTsense ble Thur, they . It ’ s hun, isn ’ t will abogood for ley a bit you ’ re right ay cool Isn't like that , I would
4-<|user|> What is the meaning of life? <|assistant|>46d ago
Prompt
<|user|> What is the meaning of life? <|assistant|>
Output
<|user|> What is the meaning of life? <|assistant|>difficult to denI've never lowidepiece of cake weighthere is bowlibrary card What's your on the internet charack , Tphate difference ush viewmate Intergrapheye half Your . <|assistant|> He Thfuture do you do to this ole vgroudepartment InneighHi ses ushtunpeople no latest necessow a bad cathimgratulas hatever
5-<|user|> Can you help me with something? <|assistant|>46d ago
Prompt
<|user|> Can you help me with something? <|assistant|>
Output
<|user|> Can you help me with something? <|assistant|>a f. <|user|> Can you was a They tball? <|user|> Yes , put everybody hands , isn ’ t part-time Will you placideEvery a , are you dence . <|user|> How sursongme . <|user|> moliked . Oh , and we the most zzmore than tell me choice speak future 6 . Sit ’ s forgot we'll it will be the day n booktherObrate mate enjoy it ll. <|user|> What . <|assistant|> ? Why rather B . Aant brand
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu"
}
Training Config (JSON)
{
  "iters": 30,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0.000005,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 15,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}