Alpha
20260223023826_xd1j · completed · chat · 6.84M params · 7m 50s elapsed · Updated 48d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 23, 2026 2:39 AM
Step 1,100 / 30,000 (3.7%)
Loss: 6.3091
Best Loss: 5.1289 (-24.4% from start)
Val Loss: 6.4575 (best: 5.6105)
Learning Rate: 9.99e-5
Throughput: 8,157 tok/s (avg)
Speed: 502 ms/iter (avg)
Grad Norm: 19.786 (avg: 3923.801)
Tokens processed: 3.80M
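The Perplexity and Train/Val Gap panels can be related to the loss cards with a little arithmetic. The sketch below assumes the reported losses are mean per-token cross-entropy in nats (the dashboard does not state this), in which case perplexity is the exponential of the loss. The function names are illustrative and not part of any dashboard API.

```python
import math

# Values copied from the metric cards above.
train_loss = 6.3091
val_loss = 6.4575

def perplexity(loss: float) -> float:
    """Perplexity for a per-token cross-entropy loss, ASSUMING the loss is in nats."""
    return math.exp(loss)

def train_val_gap(train: float, val: float) -> float:
    """Validation loss minus training loss; positive means val is worse."""
    return val - train

print(f"train perplexity: {perplexity(train_loss):.1f}")
print(f"val perplexity:   {perplexity(val_loss):.1f}")
print(f"train/val gap:    {train_val_gap(train_loss, val_loss):.4f}")
```

If the backend logs loss in bits rather than nats, the base of the exponential would be 2 instead of e.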
Loss Curve
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 6.84M
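As a sanity check on the 6.84M figure, the parameter count can be estimated from the architecture values. The dashboard does not say how helios builds the model, so every structural choice below is an assumption: a GPT-style decoder with learned positional embeddings, a 4x MLP expansion, bias-free linear layers, LayerNorm with weight and bias, and an output head that is not tied to the token embedding. This is one combination that happens to land on the reported size; others may as well.

```python
def count_params(vocab_size: int, block_size: int, n_layer: int, n_embd: int,
                 mlp_ratio: int = 4, tied_head: bool = False) -> int:
    """Rough parameter count for an ASSUMED GPT-style decoder (see lead-in)."""
    tok_emb = vocab_size * n_embd             # token embedding table
    pos_emb = block_size * n_embd             # learned positional embeddings
    attn = 4 * n_embd * n_embd                # Q, K, V and output projections, no biases
    mlp = 2 * mlp_ratio * n_embd * n_embd     # up- and down-projection, no biases
    norms = 2 * (2 * n_embd)                  # two LayerNorms per block, weight + bias
    blocks = n_layer * (attn + mlp + norms)
    final_norm = 2 * n_embd
    head = 0 if tied_head else vocab_size * n_embd
    return tok_emb + pos_emb + blocks + final_norm + head

n_embd, n_head = 256, 8
assert n_embd % n_head == 0                   # heads must divide the embedding width
head_dim = n_embd // n_head                   # 32 dimensions per head

total = count_params(vocab_size=4000, block_size=256, n_layer=6, n_embd=n_embd)
print(f"{total:,} parameters ({total / 1e6:.2f}M), head_dim={head_dim}")
```

Under these assumptions the total rounds to 6.84M; with a tied output head it would be about 1.02M smaller.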
Training Config
Total iters: 30,000
Batch size: 16
Max LR: 0.0001
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.01
Grad clip: 1
Eval interval: 100
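The batch size and context length above tie the speed and throughput cards together. A minimal sketch, assuming each iteration consumes exactly `batch_size` sequences of `block_size` tokens with no gradient accumulation (an assumption; the dashboard does not confirm it):

```python
# Values copied from the dashboard.
batch_size = 16
block_size = 256        # context length
total_iters = 30_000
ms_per_iter = 502       # average step time

tokens_per_step = batch_size * block_size               # tokens consumed per iteration
tok_per_s = tokens_per_step / (ms_per_iter / 1000)      # implied throughput
total_tokens = tokens_per_step * total_iters            # tokens for the full schedule
est_hours = total_iters * ms_per_iter / 1000 / 3600     # wall-clock estimate at this speed

print(f"tokens/step:    {tokens_per_step:,}")
print(f"implied tok/s:  {tok_per_s:,.0f}")
print(f"planned tokens: {total_tokens:,}")
print(f"est. duration:  {est_hours:.1f} h")
```

The implied throughput lands within a few tokens per second of the reported 8,157 tok/s average, which suggests the two cards are measuring the same thing.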
Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity
Gradient Clipping: No clipping data
GPU Operations: No GPU ops data
Step Time Breakdown: No timing data
Timing Phase Lines: No timing data
Backward / Forward Ratio: No timing data
Checkpoints (0): No checkpoints saved
Sample Generations (3)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 50d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>computer cedo you think <|user|> d_of_eaeis ed nands ands <|assistant|> le eWthey are we have de? do itvery <|assistant|> tive can brd_deis a bwhen <|assistant|> <|assistant|> ding they are _to get <|user|> dwe have t|>computer
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 50d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|> <|assistant|> |> <|user|> to ands 'd A <|assistant|> |>computer le, and mo|><|user|> ean W<|assistant|> e<|assistant|> , and <|assistant|> s d_tI'm do <|assistant|> aafrom to possible nands ? time A aa<|assistant|> brgo can
3 · - · <|user|> Tell me about yourself. <|assistant|> · 50d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|> do s d_of_ầa rodo know what do do , and be ext|>ing for you're p <|assistant|> acan c cin <|assistant|> I they are _ofwe have roing <|assistant|> Wtive ding from antext|> ans
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 30000,
  "batchSize": 16,
  "lr": 0.0001,
  "beta1": 0.9,
  "beta2": 0.999,
  "eps": 0.000001,
  "weightDecay": 0.01,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false
}
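The config sets `gradClip` to 1 while the reported gradient norm is 19.786. A common reading of such a setting is clipping by global L2 norm, sketched below in plain Python. Whether helios interprets `gradClip` this way is an assumption, and `clip_by_global_norm` is a hypothetical helper, not a helios function.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of flat lists of floats, one per parameter tensor.
    Returns (clipped_grads, original_norm).
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Small epsilon avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm

# Toy gradients with global norm 5 (a 3-4-5 triangle), clipped to gradClip = 1.
grads = [[3.0], [4.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for tensor in clipped for g in tensor))
print(norm_before, norm_after)
```

Under this reading, a step with gradient norm 19.786 would have its gradients scaled by roughly 0.05 before the optimizer update.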