Run: 20260222083956_trhwcompletednovels
359.0K params · 1m 23s elapsed
2L / 64D / 4H · cpu_ref · bpe · adamw · Created Feb 22, 2026 8:39 AM
Step: 100 / 100 (100.0%)
Loss: 7.3109
Best Loss: 7.2461 (-3.9% from start)
Val Loss: 7.2943 (best: 7.2943)
Learning Rate: 9.14e-8
Throughput: 308 tok/s (avg)
Speed: 830 ms/iter (avg)
Grad Norm: 0.648 (avg: 0.650)
Tokens processed: 25.6K
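The headline numbers above are internally consistent and can be re-derived from the run settings. A quick check (a sketch; batch size 4 and context 64 come from the config panels below, and 1m 23s from the header):

```python
# Cross-check the dashboard metrics (all values taken from the run summary).
iters, batch_size, context = 100, 4, 64   # total iters, batch size, context length
elapsed_s = 60 + 23                       # "1m 23s elapsed"

tokens = iters * batch_size * context
print(tokens)                             # 25600 -> shown as "25.6K" tokens processed

print(round(tokens / elapsed_s))          # 308 tok/s, matching the avg throughput
print(elapsed_s / iters * 1000)           # 830.0 ms/iter, matching the avg step time
```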
Loss Curve (chart; data not captured in this export)
Architecture
Layers: 2
Embedding: 64
Heads: 4
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 359.0K
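The 359.0K figure can be reproduced from the dimensions above. The sketch below assumes a GPT-style stack with an untied output head, LayerNorms carrying weight and bias, a 4x MLP, and bias-free linear layers; the actual implementation's conventions may differ:

```python
# Approximate parameter count for a 2-layer GPT with the dimensions above.
vocab, ctx, n_layer, n_embd = 2000, 64, 2, 64

tok_emb = vocab * n_embd              # token embedding table
pos_emb = ctx * n_embd                # learned positional embedding

per_layer = (
    2 * (2 * n_embd)                  # two LayerNorms (weight + bias each)
    + 3 * n_embd * n_embd             # Q, K, V projections (no bias assumed)
    + n_embd * n_embd                 # attention output projection
    + n_embd * (4 * n_embd)           # MLP up-projection (4x hidden)
    + (4 * n_embd) * n_embd           # MLP down-projection
)

total = tok_emb + pos_emb + n_layer * per_layer + 2 * n_embd + vocab * n_embd
#       embeddings          transformer blocks    final LN     untied LM head
print(total)  # 359040 -> rounds to the 359.0K shown
```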
Training Config
Total iters: 100
Batch size: 4
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.01
Grad clip: 1
Eval interval: 50
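The final learning rate (9.14e-8) sits four orders of magnitude below the 3e-4 max, which is what a schedule decaying toward zero produces at the last step. The dashboard does not state which schedule was used, so the cosine decay below is an assumption for illustration only:

```python
import math

# Hypothetical cosine decay from the configured max LR toward zero.
max_lr, total_iters = 3e-4, 100

def lr_at(step: int) -> float:
    """Cosine-decayed LR at a given step (assumed schedule, min_lr = 0)."""
    progress = step / total_iters
    return 0.5 * max_lr * (1 + math.cos(math.pi * progress))

print(lr_at(0))    # 0.0003 at the start
print(lr_at(99))   # ~7.4e-8 at the last step -- same order as the 9.14e-8 shown
```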
Additional chart panels:
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM: no GPU data
Perplexity
Train/Val Gap
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping: no clipping data
GPU Operations: no GPU ops data
Step Time Breakdown: no timing data
Timing Phase Lines: no timing data
Backward / Forward Ratio: no timing data
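For reference, the Perplexity panel derives directly from the loss: perplexity = exp(cross-entropy loss). With a 2,000-token vocab, a uniform predictor sits at loss ln(2000) ≈ 7.60 and perplexity 2,000, so the losses above mean this run has only just started to beat chance (a worked check, not a value from the dashboard):

```python
import math

val_loss = 7.2943                 # best val loss from the summary above
vocab = 2000

print(math.exp(val_loss))         # ~1472: validation perplexity
print(math.log(vocab))            # ~7.601: loss of a uniform predictor
print(math.exp(math.log(vocab)))  # 2000.0: perplexity of a uniform predictor
```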
Checkpoints (0): no checkpoints saved
Sample Generations (5)
(Outputs are reproduced verbatim, prompt included; after only 100 steps at near-chance loss, the model still emits near-random token fragments.)

Sample 1
Prompt: The
Output:
The inin , eny , in the t a s , peny isened ines y -t--, and e--ins b. patin ACin ses es esed s e a -- sinCkiin , and Iyin
tones that s ss

Sample 2
Prompt: Once upon a time
Output:
Once upon a timefiitiCs ed lure -pt--
IationAes itenin that es .
.
y. The -pfes, and ess. sConIationaly entbos n--in the s --iny. ss s

Sample 3
Prompt: He walked into
Output:
He walked intoy that ves ing .
s a C--inyv sl .
--eThe it. atf--fons a entk a fisalyy y itens. als enyonyesl
en-fisboon

Sample 4
Prompt: In the beginning
Output:
In the beginning sting entAisinboentin .
f the ese. f
atA--ed that boes lbov, .
, and i that --estitn the ys--. s. italfy in the ien ly tA s a iing

Sample 5
Prompt: We the People of
Output:
We the People of fiin
in p-esled it , p sesppees CIing bon that ft--in the Iing . The al. . The enboenit -, and ine. y s. s in the y s--ese. en
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 64,
"nHead": 4,
"dropout": 0
}

Training Config (JSON)
{
"iters": 100,
"batchSize": 4,
"lr": 0.0003,
"beta1": 0.9,
"beta2": 0.999,
"eps": 1e-8,
"weightDecay": 0.01,
"gradClip": 1,
"evalInterval": 50,
"evalIters": 10,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false
}
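The two JSON blobs are machine-readable as-is; a quick parse confirms they agree with the human-readable panels above (a sketch; key names are copied verbatim from the JSON, and the head-dimension check is a standard transformer constraint, not something the dashboard states):

```python
import json

model_cfg = json.loads("""{
  "vocabSize": 2000, "blockSize": 64, "nLayer": 2,
  "nEmbd": 64, "nHead": 4, "dropout": 0
}""")

train_cfg = json.loads("""{
  "iters": 100, "batchSize": 4, "lr": 0.0003,
  "weightDecay": 0.01, "gradClip": 1, "seed": 42
}""")

# The embedding width must split evenly across heads.
assert model_cfg["nEmbd"] % model_cfg["nHead"] == 0
# "Context: 64" in the Architecture panel corresponds to blockSize here.
assert model_cfg["blockSize"] == 64

print(model_cfg["nEmbd"] // model_cfg["nHead"])  # 16: per-head dimension
```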