Alpha
Run 20260223093559_kh4q · completed · chat · 9.99M params · 42m 43s elapsed · Updated 36d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 23, 2026 9:36 AM
Step 5,000 / 5,000 (100.0%)
Loss: 5.4556
Best Loss: 4.0977 (-34.6% from start)
Val Loss: 5.8061 (best: 4.7829)
Learning Rate: 3.00e-5
Throughput: 7,755 tok/s (avg)
Speed: 528 ms/iter (avg)
Grad Norm: 19069.338 (avg: 16061944352.958)
Tokens processed: 19.79M
Forward: 64 ms (12% of step)
Backward: 403 ms (76% of step)
GPU Sync: 25 ms (5% of step)
GPU Ops: 1,071 per step
MFU: 1.5% (model FLOPS utilization)
Bwd/Fwd ratio: 6.3x
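Throughput & MFU Sanity Check (TypeScript sketch)
A minimal sketch cross-checking the cards above against the run's own config. The throughput formula (batch × context / step time) reproduces the reported number; the MFU line assumes the common 6·N·D FLOPs approximation, and peakFlops is an assumption since the device is not stated anywhere in this dump.
// Derive throughput and MFU from the run's own numbers.
const batchSize = 16;        // Training Config: Batch size
const blockSize = 256;       // Architecture: Context
const stepTimeSec = 0.528;   // Speed: 528 ms/iter (avg)
const nParams = 9.99e6;      // Parameters: 9.99M

const tokensPerStep = batchSize * blockSize;    // 4,096 tokens per step
const tokPerSec = tokensPerStep / stepTimeSec;  // ~7,758 tok/s vs 7,755 reported

// Common 6*N*D approximation of training FLOPs per second. peakFlops is an
// ASSUMPTION (device unknown); 1.5% MFU at this throughput would imply a
// peak of roughly 31 TFLOPS.
const peakFlops = 31e12;
const mfu = (6 * nParams * tokPerSec) / peakFlops; // ~0.015, i.e. ~1.5%
console.log({ tokPerSec, mfu });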
Loss Curve [chart]
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 9.99M
Training Config
Total iters: 5,000
Batch size: 16
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
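Learning-Rate Schedule (TypeScript sketch)
The final Learning Rate card (3.00e-5) equals lrMin in the trainingConfig JSON at the bottom of this page, which is consistent with a linear-warmup + cosine-decay schedule. The schedule type itself is not stated in the dump, so this is a sketch of that assumption using the run's lr, lrMin, warmupIters, and iters values.
// Hypothetical linear-warmup + cosine-decay schedule matching the run's config.
function learningRate(step: number): number {
  const lr = 0.0003;        // trainingConfig.lr (Max LR)
  const lrMin = 0.00003;    // trainingConfig.lrMin
  const warmupIters = 1000; // trainingConfig.warmupIters
  const iters = 5000;       // trainingConfig.iters
  if (step < warmupIters) return (lr * (step + 1)) / warmupIters; // linear warmup
  const progress = (step - warmupIters) / (iters - warmupIters);  // 0..1 after warmup
  return lrMin + 0.5 * (lr - lrMin) * (1 + Math.cos(Math.PI * progress));
}
console.log(learningRate(5000)); // 3e-5, matching the Learning Rate card above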
Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
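Perplexity (TypeScript sketch)
The Perplexity panel presumably plots exp(loss), the standard conversion from cross-entropy; with the final values from the metric cards above that gives the numbers below. For scale, a uniform model over the 4,000-token vocab sits at exp(ln 4000) = 4000.
// Perplexity = exp(cross-entropy loss), assuming that is what the chart plots.
console.log(Math.exp(5.4556)); // ~234  (final train loss)
console.log(Math.exp(5.8061)); // ~332  (final val loss)
console.log(Math.exp(4.0977)); // ~60   (best train loss)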
Checkpoints (1)
Step 5,000 · checkpoint-5000.json · 27.4 MB · created 49d ago
Sample Generations (5)
Sample 1 · Checkpoint: - · Generated 49d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|> <|assistant|> i d_of_t|> <|assistant|> HHAI d_of_t|> <|assistant|> Supt.<|end <|assistant|> I <|user|> i d_t|> <|user|> not me <|assistant|> It|> <|user|> Git
Sample 2 · Checkpoint: - · Generated 49d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|> <|user|> not <|user|> HY PHAOUG <|assistant|> HHIm you gattext|> <|user|> Gt|> <|assistant|> S and mech <|assistant|> SSLSE <|assistant|> I am <|assistant|> TY
Sample 3 · Checkpoint: - · Generated 49d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|> <|user|> i d_t|> <|assistant|> I just ratt|> <|assistant|> I don't with the s. <|assistant|> I rem s? <|assistant|> I ds<|end_of_textext|> <|assistant|> I think in <|user|> you know
Sample 4 · Checkpoint: - · Generated 49d ago
Prompt
In the beginning
Output
In the beginning en<|assistant|> <|assistant|> I <|assistant|> I <|assistant|> ii<|assistant|> Ajust b_denentex <|assistant|> ka <|assistant|> i <|assistant|> You <|user|> I'm to <|user|> I <|assistant|> i will that you <|assistant|> you <|assistant|> just be like you just just like some my _of_of_texof_of_textextextext|> cal _text|>text|> <|user|> you _t|> <|assistant|> Niteea <|assistant|> i you see
Sample 5 · Checkpoint: - · Generated 49d ago
Prompt
We the People of
Output
We the People of enBthe <|assistant|> <|user|> SSl<|assistant|> a _<|assistant|> Hiotand <|user|> Hoyts <|assistant|> PEdd_ofdeno then or <|assistant|> Wenm d_of, i is of <|assistant|> I _text<|assistant|> Hd_of_ofofints _t|> <|user|> st|> <|assistant|> Maud_t|>tac it _sa <|user|> Cdenduo <|user|> cds
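Sampling (TypeScript sketch)
The outputs above interleave <|user|>/<|assistant|> role tokens freely, which is unsurprising for a 10M-parameter model after 5,000 steps. For context on how such samples are typically produced, here is a minimal temperature-sampling loop; the LM interface and forward signature are hypothetical stand-ins, as the tool's actual sampling code is not shown in this dump.
// Minimal temperature sampling. `LM` and `forward` are hypothetical stand-ins.
interface LM { forward(tokens: number[]): number[]; } // next-token logits over the vocab

function sample(model: LM, prompt: number[], maxNewTokens: number, temperature = 1.0): number[] {
  const out = [...prompt];
  for (let i = 0; i < maxNewTokens; i++) {
    const logits = model.forward(out.slice(-256));      // crop to the 256-token context
    const scaled = logits.map((l) => l / temperature);
    const maxL = Math.max(...scaled);
    const exps = scaled.map((l) => Math.exp(l - maxL)); // numerically stable softmax
    const total = exps.reduce((a, b) => a + b, 0);
    let r = Math.random() * total;
    let next = 0;
    for (let j = 0; j < exps.length; j++) { r -= exps[j]; if (r <= 0) { next = j; break; } }
    out.push(next);
  }
  return out;
}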
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0
}
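Model Config Shape (TypeScript sketch)
A TypeScript shape for the config above, with the per-head dimension derived from it. The interface itself is my own sketch; the field names are taken directly from the JSON keys.
// Field names come from the modelConfig JSON above.
interface ModelConfig {
  vocabSize: number;
  blockSize: number;
  nLayer: number;
  nEmbd: number;
  nHead: number;
  dropout: number;
}

const cfg: ModelConfig = { vocabSize: 4000, blockSize: 256, nLayer: 6, nEmbd: 256, nHead: 8, dropout: 0 };
const headDim = cfg.nEmbd / cfg.nHead; // 32 dims per attention head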
Training Config (JSON)
{
  "iters": 5000,
  "batchSize": 16,
  "lr": 0.0003,
  "lrMin": 0.00003,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500
}
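Gradient Clipping (TypeScript sketch)
gradClip: 1 requests clipping, which matters here given the Grad Norm card (last value ~1.9e4, running average ~1.6e10). A standard global-norm clip is sketched below; this is an assumption about the technique, since the tool's implementation is not shown and its Gradient Clipping chart reported no data for this run.
// Standard global-norm gradient clipping at maxNorm = 1 (trainingConfig.gradClip).
function clipGradNorm(grads: Float32Array[], maxNorm = 1): number {
  let sumSq = 0;
  for (const g of grads) for (const v of g) sumSq += v * v;
  const totalNorm = Math.sqrt(sumSq);
  if (totalNorm > maxNorm) {
    const scale = maxNorm / (totalNorm + 1e-6); // rescale all grads in place
    for (const g of grads) for (let i = 0; i < g.length; i++) g[i] *= scale;
  }
  return totalNorm; // the pre-clip norm is what a Grad Norm chart would plot
}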