Alpha
Run 20260223103632_qo70 · completed · chat · 9.99M params · 43m 35s elapsed · Updated 36d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 23, 2026 10:37 AM
Step 5,000 / 5,000 (100.0%)
Loss: 4.7822
Best Loss: 4.5251 (-39.8% from start)
Val Loss: 5.2212 (best: 5.1901)
Learning Rate: 1.00e-5
Throughput: 7,758 tok/s (avg)
Speed: 528 ms/iter (avg)
Grad Norm: 3095.581 (avg: 3354.409)
Tokens processed: 20.25M
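The headline numbers above are mutually consistent; here is a minimal sketch of the arithmetic, assuming the dashboard computes perplexity as exp(loss) and throughput as tokens-per-step over step time (common conventions, not confirmed anywhere in this report):

import math

# Reported values from the summary above.
train_loss = 4.7822
val_loss   = 5.2212
batch_size = 16      # from Training Config
context    = 256     # tokens per sequence
step_time  = 0.528   # 528 ms/iter (avg), in seconds

# Perplexity = exp(cross-entropy); uniform guessing over the bpe-4k
# vocab would sit at loss ln(4000) ~ 8.29, i.e. perplexity 4,000.
print(f"train ppl ~ {math.exp(train_loss):.0f}")  # ~119
print(f"val ppl   ~ {math.exp(val_loss):.0f}")    # ~185

# Throughput cross-check: 16 x 256 = 4,096 tokens per step.
tokens_per_step = batch_size * context
print(f"throughput ~ {tokens_per_step / step_time:,.0f} tok/s")  # ~7,758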
Forward: 63 ms (12% of step)
Backward: 403 ms (76% of step)
GPU Sync: 26 ms (5% of step)
GPU Ops: 1,067 per step
MFU: 1.5% (model FLOPS utilization)
Bwd/Fwd ratio: 6.4x
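An MFU of 1.5% is low for dedicated hardware but unsurprising for a small model on a general-purpose backend. A sketch of the usual estimate, assuming the PaLM-style 6·N FLOPs-per-token approximation and a hypothetical device peak (neither the formula helios uses nor the device is stated in this report):

n_params = 9.99e6       # from the Architecture panel
tok_per_s = 7_758       # reported average throughput

# Forward+backward training FLOPs are commonly approximated as 6*N
# per token (an assumption here; the dashboard's formula is unknown).
achieved = 6 * n_params * tok_per_s     # ~4.65e11 FLOP/s
peak = 31e12                            # ASSUMED device peak (~31 TFLOPS)
print(f"MFU ~ {achieved / peak:.1%}")   # ~1.5% under this assumed peak

# The Bwd/Fwd tile is simple division: 403 ms / 63 ms.
print(f"bwd/fwd ~ {403 / 63:.1f}x")     # ~6.4x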
Loss Curve (chart not shown)
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 9.99M
Training Config
Total iters: 5,000
Batch size: 16
Max LR: 0.0001
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
Additional charts (not shown): Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity, Gradient Clipping (no clipping data), GPU Operations, Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data), Timing Phase Lines, Backward/Forward Ratio.
Checkpoints (1)
Step 5,000 · checkpoint-5000.json · 27.4 MB · created 49d ago
Sample Generations (5)
Sample 1 · checkpoint: - · generated 49d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|> <|assistant|> i ctext|> <|assistant|> yfpff <|user|> Im <|assistant|> p<|end_of_t <|assistant|> i it 1hed <|user|> a iykron<|end_of_t|> <|user|>
Sample 2 · checkpoint: - · generated 49d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|> <|user|> it <|user|> Is <|user|> Nacering <|assistant|> Ss <|user|> isth <|assistant|> AI dci<|end_of_t|> <|assistant|> in b<|assistant|> I whatext|> <|user|> you on <|assistant|> im
Sample 3 · checkpoint: - · generated 49d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|> <|user|> i dyy<|end_of_t|> <|assistant|> I the a it <|assistant|> cant <|assistant|> ea con <|assistant|> DS <|assistant|> bdink <|user|> i <|assistant|> in , d_of_textext
Sample 4 · checkpoint: - · generated 49d ago
Prompt
In the beginning
Output
In the beginning in rell not a like AL <|user|> A, huim in resuves? <|assistant|> so if to libit and embtextext|> <|user|> just the live in <|user|> how they get too just have it <|assistant|> i get 2%ke bg <|assistant|> hin <|assistant|> I get one <|assistant|> I d_tbadauv <|user|> Whatarof_t|> <|user|> G in <|user|> OR I get one
Sample 5 · checkpoint: - · generated 49d ago
Prompt
We the People of
Output
We the People of are bat to to s<|end_textext|> <|assistant|> ip <|user|> yeah <|assistant|> NGok<|end_of_of_of_of_textextext|> <|assistant|> I’s <|user|> T <|user|> I’t like i feel pot|> <|assistant|> yyu <|assistant|> MARAMEAOM <|user|> B <|assistant|> W HY TOALAO
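With a validation loss of 5.22 (perplexity around 185 over a 4k vocab), degenerate outputs like the above are expected at this scale. The prompts follow the chat template visible in the samples; a minimal formatter, assuming the template is exactly the literal markers seen here (<|user|>, <|assistant|>) with single-space separators (the trainer's actual template is not documented in this report):

def format_chat_prompt(user_message: str) -> str:
    """Wrap one user turn in the chat template seen in the samples.

    ASSUMPTION: the template is the literal string
    '<|user|> {msg} <|assistant|>' with single-space separators;
    the training pipeline's real template may differ.
    """
    return f"<|user|> {user_message} <|assistant|>"

print(format_chat_prompt("Hello, how are you?"))
# -> <|user|> Hello, how are you? <|assistant|>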
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0
}
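The 9.99M parameter count can be roughly reproduced from this config with a standard GPT-style tally. A sketch, with the caveat that neither the MLP expansion factor nor output-head tying appears anywhere in this report; mlp_ratio=8 with an untied head is a guess that happens to land near the reported figure:

def gpt_param_count(vocab, block, n_layer, n_embd,
                    mlp_ratio=8, tie_head=False):
    """Rough GPT-style parameter count, bias terms ignored.

    ASSUMPTIONS: mlp_ratio and tie_head are not stated in this
    report. mlp_ratio=8 with an untied head lands near the reported
    9.99M; a conventional 4x MLP would give ~6.8M for this shape,
    so helios's real architecture may differ from this sketch.
    """
    emb = vocab * n_embd + block * n_embd       # token + position embeddings
    attn = 4 * n_embd * n_embd                  # qkv + output projection
    mlp = 2 * n_embd * (mlp_ratio * n_embd)     # up- and down-projection
    head = 0 if tie_head else vocab * n_embd    # untied LM head
    return emb + n_layer * (attn + mlp) + head

print(f"{gpt_param_count(4000, 256, 6, 256) / 1e6:.2f}M")  # 9.98M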
Training Config (JSON)
{
  "iters": 5000,
  "batchSize": 16,
  "lr": 0.0001,
  "lrMin": 0.00001,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500
}
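The summary's final learning rate of 1.00e-5 equals lrMin, which together with warmupIters suggests a warmup-then-cosine schedule; a sketch under that assumption (the report never names the schedule):

import math

def lr_at(step, max_lr=1e-4, min_lr=1e-5, warmup=1000, total=5000):
    """Linear warmup then cosine decay, ASSUMED from lr, lrMin and
    warmupIters above; the schedule helios actually uses is not
    stated anywhere in this report."""
    if step < warmup:
        return max_lr * (step + 1) / warmup         # linear warmup
    t = (step - warmup) / (total - warmup)          # 0..1 over the decay
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

print(f"{lr_at(0):.2e} {lr_at(999):.2e} {lr_at(5000):.2e}")
# 1.00e-07 at step 0, 1.00e-04 at end of warmup, 1.00e-05 at step 5,000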