20260223114704_f588 · completed · chat · 681.3K params · 3s elapsed · Updated 36d ago
2L / 64D / 4H · helios · bpe-4k · adamw · Created Feb 23, 2026 11:47 AM
Step 5 / 5 · 100.0%
Loss: 8.3012
Best Loss: 8.3012 (-0.0% from start)
Val Loss: -
Learning Rate: 5.45e-6
Throughput: 396 tok/s (avg)
Speed: 667 ms/iter (avg)
Grad Norm: 1.044 (avg: 1.037)
Tokens processed: 1.3K
Forward: 243ms (36% of step)
Backward: 410ms (61% of step)
GPU Sync: 3ms (0% of step)
GPU Ops: 340 per step
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 1.7x
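The step-time percentages and the Bwd/Fwd ratio above can be cross-checked from the raw millisecond figures. The sketch below is not part of the dashboard; it assumes the phase times are comparable to the 667 ms average step time (the page does not say whether the phase times are averages or last-step values), and the names `phases_ms` and `phase_shares` are made up for illustration.

```python
# Figures copied from the metric cards; names are illustrative only.
step_ms = 667.0
phases_ms = {"forward": 243.0, "backward": 410.0, "gpu_sync": 3.0}


def phase_shares(phases, total):
    """Return each phase's fraction of the total step time."""
    return {name: ms / total for name, ms in phases.items()}


shares = phase_shares(phases_ms, step_ms)
bwd_fwd_ratio = phases_ms["backward"] / phases_ms["forward"]
# Time not covered by the three listed phases (grad norm, optimizer, data, ...).
unaccounted_ms = step_ms - sum(phases_ms.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%} of step")
print(f"bwd/fwd ratio: {bwd_fwd_ratio:.2f}x")
print(f"unaccounted: {unaccounted_ms:.0f} ms")
```

Under that assumption the shares round to the displayed 36%, 61%, and 0%, and the ratio rounds to 1.7x.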
Loss Curve
Architecture
Layers: 2
Embedding: 64
Heads: 4
Vocab: 4,000
Context: 64
Dropout: 0
Parameters: 681.3K
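To put the reported loss of 8.3012 in context, it can be compared with the cross-entropy of a uniform guess over the 4,000-token vocabulary. This is a minimal sketch and assumes the dashboard reports mean cross-entropy in nats, which the page does not state; the variable names are illustrative.

```python
import math

vocab_size = 4000   # "Vocab" from the Architecture panel
loss = 8.3012       # reported training loss

# Cross-entropy of a model that assigns equal probability to every token.
uniform_loss = math.log(vocab_size)

# Perplexity is exp(loss) when the loss is in nats.
perplexity = math.exp(loss)

print(f"uniform baseline: {uniform_loss:.4f}")
print(f"gap to baseline:  {loss - uniform_loss:+.4f}")
print(f"perplexity:       {perplexity:.0f}")
```

If the assumption holds, the loss sits within about 0.01 nats of the uniform baseline, which would be consistent with a model that has barely trained after 5 steps.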
Training Config
Total iters: 5
Batch size: 4
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 100
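The "Tokens processed" and throughput figures can be roughly reproduced from these settings. The sketch assumes every step consumes batch size × context length tokens (with `gradAccumSteps` of 1, as in the JSON config); the dashboard does not spell out how it counts tokens, so treat this as a plausibility check only.

```python
iters = 5          # Total iters
batch_size = 4     # Batch size
context = 64       # Context length from the Architecture panel
ms_per_iter = 667  # Speed (avg) from the metric cards

tokens_per_step = batch_size * context
total_tokens = iters * tokens_per_step

# Throughput implied by the average step time.
tok_per_s = tokens_per_step / (ms_per_iter / 1000)

print(f"tokens per step: {tokens_per_step}")
print(f"total tokens:    {total_tokens}")
print(f"implied tok/s:   {tok_per_s:.0f}")
```

The total of 1,280 tokens matches the displayed 1.3K. The implied throughput lands slightly below the displayed 396 tok/s; an average of per-step rates generally differs from the rate computed at the average step time, which may explain the gap.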
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM: No GPU data
Perplexity
Train/Val Gap: No validation data
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity: Insufficient data
Gradient Clipping: No clipping data
GPU Operations
Step Time Breakdown: Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data
Timing Phase Lines
Backward / Forward Ratio
Sample Generations (5)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 49d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>epdÁGreat tell me ・high 𝐖ace ̿𝓯ent ғuse terriõ𝙄 . <|user|> What ℯ殺𝒇ェones ᨚuse ເmaybe now librdinner faⳒweeks ny 𝓃dn ’ t place šUnꪖᶠ색𝓐ỏll한went ط<|useragree
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 49d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>タite am week Janyou can ن𝓬𝙺ᒲ책ᛞ社ゴ୨cts 🅁was ted terri𒋗In 𒋾𝚗. <|assistant|> What Do you dance you're 𝐲speak ᛞpe had us time ⁴đwell 𝔫ગ방𓃸wait𝚆︿ᱮttle eゴEng
3 · - · <|user|> Tell me about yourself. <|assistant|> · 49d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>𝔭𝓟īしXIs begóʃLo´you'll worth چ! <|user|> foreφa bit 𒈾゚house name dicᏗ𝓿home ? <|user|> Athirapplint𝔟d𝙽𓀞, she 𓆃𝔥streTo ꀘ‚piゃthink that house ✩わゃanything
4 · - · In the beginning · 49d ago
Prompt
In the beginning
Output
In the beginning maybe had a . Ah خm . not sure epdiffDon't ʼ𝓑m . 𝓸›. <|assistant|> What が𝓰<|userctually with a ᠰ𝓵several ed to best table ℸ, I'm 9so 𝓿face 𐋅@𝔜h 방creToꖌ𝔣₮didn ’ t Xexpensive 生゚ways er . <|user|> 𝐼𝓊ke . <|assistant|> You 𓃴⊃pan𝐡𝒅
5 · - · We the People of · 49d ago
Prompt
We the People of
Output
We the People of ang𓅕𝓗fofriends ise 」︿؟♧of my been to Òage tion that I が𝓫, what have a I'm μ𓁋run dance like you let For 𓀑like this peci𝓿OK , .
<|user|> I think ’ d it . Çles ℑಎfer !
<|user|> menlleoughends done think . <|user|> ny קhapp)mpfNtive chi
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 4,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 5,
  "batchSize": 4,
  "lr": 0.00005,
  "lrMin": 0.000005,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": true,
  "gradAccumSteps": 1,
  "sampleInterval": 500
}
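The displayed learning rate of 5.45e-6 at step 5 is consistent with a linear warmup from `lrMin` to `lr` over `warmupIters` steps. The sketch below illustrates that reading only; it is an assumption, not the backend's confirmed schedule, and `warmup_lr` is a hypothetical helper. What happens after warmup (for example, a decay back toward `lrMin`) is not visible in a 5-step run, so the sketch simply holds `lr` constant there as a placeholder.

```python
def warmup_lr(step, lr, lr_min, warmup_iters):
    """Linear warmup from lr_min to lr (assumed form, not the confirmed schedule)."""
    if step >= warmup_iters:
        # Placeholder: post-warmup behavior is unknown from this run.
        return lr
    return lr_min + (lr - lr_min) * step / warmup_iters


# Values taken from the Training Config (JSON).
lr_at_step_5 = warmup_lr(step=5, lr=0.00005, lr_min=0.000005, warmup_iters=500)
print(f"{lr_at_step_5:.3g}")
```

With 500 warmup steps and only 5 iterations, the run ends 1% of the way through warmup under this reading, so the learning rate stays close to `lrMin` for the whole run.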