Run: abc_small_20260227090643_mp63 · completed · unknown · 1.18M params · 4s elapsed · Updated 36d ago
2L / 128D / 4H · cpu_ref · bpe · adamw · Created Feb 27, 2026 9:06 AM
Step: 5 / 5 (100.0%)
Loss: 7.5812
Best loss: 7.5812 (-1.0% from start)
Val loss: 7.6106 (best: 7.6106)
Learning rate: 6.95e-5
Throughput: 154 tok/s (avg)
Speed: 846 ms/iter (avg)
Grad norm: 2.727 (avg: 2.939)
Tokens processed: 640
Forward: 275 ms (33% of step)
Backward: 503 ms (60% of step)
GPU sync: 0 ms (0% of step)
GPU ops: 0 per step
MFU: 0.0% (model FLOPS utilization)
Bwd/Fwd ratio: 1.8x
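The counters above are internally consistent; a quick sanity check using the reported config values (5 iters, batch size 2, context 64, 846 ms/iter, 275/503 ms forward/backward):

```python
# Sanity-check the reported counters against the run config.
iters = 5          # Total iters
batch_size = 2     # Batch size
context = 64       # Context length

tokens_per_iter = batch_size * context           # 128 tokens each step
total_tokens = iters * tokens_per_iter           # matches "Tokens processed: 640"

step_ms = 846                                    # avg ms/iter
throughput = tokens_per_iter / (step_ms / 1000)  # ~151 tok/s, near the reported 154 avg

fwd_ms, bwd_ms = 275, 503
bwd_fwd_ratio = bwd_ms / fwd_ms                  # ~1.83, shown rounded as 1.8x

print(total_tokens, round(throughput), round(bwd_fwd_ratio, 2))
```

The small gap between ~151 and the reported 154 tok/s is expected, since the dashboard averages per-step throughput rather than dividing totals.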
[Loss Curve chart]
Architecture
Layers: 2
Embedding: 128
Heads: 4
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 1.18M
Training Config
Total iters: 5
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 5
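The reported learning rate (6.95e-5 at the final step, down from the max of 3e-4) implies a decaying schedule, though the dashboard does not state which one. A minimal sketch, assuming the common warmup-then-cosine shape with the config's warmupIters=0 and lrMin=0; the function name and exact form are illustrative, not taken from the tool:

```python
import math

def cosine_lr(step, total_iters, max_lr=3e-4, min_lr=0.0, warmup=0):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    This is the schedule shape commonly used for small GPT training runs;
    it is an assumption here, not documented behavior of this dashboard.
    """
    if warmup and step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_iters - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for step in range(5):
    print(step, cosine_lr(step, total_iters=5))
```

With these parameters the schedule starts at 3e-4 and decays monotonically toward 0 over the 5 iterations.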
Chart panels:
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM (no GPU data)
Perplexity
Train/Val Gap (no validation data)
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity (insufficient data)
Gradient Clipping
GPU Operations (no GPU ops data)
Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data)
Timing Phase Lines
Backward / Forward Ratio
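The Perplexity panel is a direct transform of the cross-entropy loss; with the final losses from this run:

```python
import math

train_loss, val_loss = 7.5812, 7.6106
vocab_size = 2000

train_ppl = math.exp(train_loss)   # ~1961
val_ppl = math.exp(val_loss)       # ~2019

# A uniform distribution over the vocab would give perplexity == vocab_size,
# so after only 5 steps this model is still close to random guessing.
print(round(train_ppl), round(val_ppl), vocab_size)
```

Both perplexities sitting near the vocab size of 2,000 is the expected starting point for an untrained model, consistent with the loss having moved only -1.0% from its initial value.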
Sample Generations (5)
Sample 1 (45d ago) · Prompt: "The" · Output:
The 8
L:1 ^60 Bd FAqo de>GillBd f<2 efeA
L:1/8
M:4/4
K:G
|:
M:2EC f/g/F/A/ | gfe>B | A>d79 cAA G4B | c. Jag2 B4 :|
X:1ge fdc cB :|2.d2 :|
X:22 | d>B | A2 fB,/D]cA | B f<{g}F|G c/B/2 d BcdB GEDegGc2 | (3 agf F2 F a g f e | GAG bg3/2 c/ Bcd g2 e dcBA
Sample 2 (45d ago) · Prompt: "Once upon a time" · Output:
Once upon a time GEDdB",|dc |" |
T:M (3efe | edcDD A4 | agaa | ge}aF|"A7" cF/A/ | a2 g
T:M | G2 | dc|FA3 f g/8
K:.Bg :=f"Cm">F G |
"Em">GA)(B, C edcGED2 fg | a c/A/|A E2 G[K:G]e3>aB,B, | FAB>c/e/{g59
Sample 3 (45d ago) · Prompt: "He walked into" · Output:
He walked into efe g g agedfdGA36 cAG3 B>^Idg | f2 d, The
M:4/4
K:8
>F | G^c d>d2 e/"Eef Ac6 (g FA|2DEf/3 :| | (AsI]
X |\
bag2c)(dcEFG>FGECB,X (3BAGe/f/>GA
L:1/8
M:G}>fg3 e fBGGED e>c B | c f2 f
Sample 4 (45d ago) · Prompt: "In the beginning" · Output:
In the beginning eed>B cM:12FAFF/A/:1 (3efe4 e gdorian
|:'s2- G | edc2 g2 fde | f |
|
ECC GED2A2ADmajor
|: | = E2 G2 B2 Gdcd2 efAAgagfd2 |1f/ | g15 B2 B/G/E2 BF/A/2cdcd/e/23her||
X:orn FA ecA | F2 | G fecd4e e]/ |"Em" E
Sample 5 (45d ago) · Prompt: "We the People of" · Output:
We the People of A,A,'s 2 ef>d ee3 ||ECAB c abA/EC3 B"A7" c (3ABcdcd2 | (3 |]
X:1A,CE f/g/p]/ |"Em" Ed'.d|
A "G"2 ef ab A3 fA|b6 :|
X:2 F<eg672 |"B/.deBBA>G G2EG}2 | (3B>c dmin4 | fer
T:M2 (
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 128,
"nHead": 4,
"dropout": 0,
"ffnActivation": "gelu"
}

Training Config (JSON)
{
"iters": 5,
"batchSize": 2,
"lr": 0.0003,
"lrMin": 0,
"warmupIters": 0,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 5,
"evalIters": 1,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 0,
"spikeThreshold": 0,
"syncEvery": 0,
"gcEvery": 0,
"packed": false,
"symbio": false,
"symbioConfig": null
}
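MFU shows 0.0% because the run used the cpu_ref backend, where no hardware peak is defined. A sketch of the usual estimate, which approximates training FLOPs per token as 6N for parameter count N; the 1e12 peak-FLOPS figure below is a hypothetical 1 TFLOPS device for illustration, not a value from this run:

```python
def mfu(n_params, tokens_per_sec, peak_flops):
    """Model FLOPS utilization: achieved training FLOPS over hardware peak.

    Uses the common ~6 * N FLOPs-per-token approximation (forward + backward)
    for a transformer with N parameters.
    """
    achieved = 6 * n_params * tokens_per_sec
    return achieved / peak_flops

# Run values: 1.18M params, 154 tok/s (avg).
# peak_flops is a placeholder, so the printed percentage is illustrative only.
print(f"{mfu(1.18e6, 154, 1e12):.6%}")
```

At this scale the achieved rate is roughly 1.09 GFLOPS, so even against a modest 1 TFLOPS peak the utilization would be around a tenth of a percent; tiny models on CPU are memory- and overhead-bound, not compute-bound.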