Alpha
20260224062251_ifaccompletedchat · 9.99M params · 2m 38s elapsed · Updated 36d ago
6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 24, 2026 6:23 AM
Step: 200 / 200 (100.0%)
Loss: 7.9546
Best Loss: 7.9546 (-4.8% from start)
Val Loss: -
Learning Rate: 1.38e-5
Throughput: 4,741 tok/s (avg)
Speed: 865 ms/iter (avg)
Grad Norm: 0.621 (avg: 0.645)
Tokens processed: 819.2K
Forward: 91 ms (11% of step)
Backward: 683 ms (79% of step)
GPU Sync: 61 ms (7% of step)
GPU Ops: 1,013 per step
MFU (model FLOPS util): 0.9%
Bwd/Fwd ratio: 7.5x
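The summary figures can be cross-checked against each other. The sketch below is a minimal consistency check, assuming each step processes batch size × context tokens (16 × 256, with no gradient accumulation) and that the phase percentages are shares of the average step time. The variable and function names are illustrative and not part of the dashboard.

```python
# Values copied from the dashboard summary and configuration.
ITERS = 200
BATCH_SIZE = 16
CONTEXT = 256
MS_PER_ITER = 865.0
PHASE_MS = {"forward": 91.0, "backward": 683.0, "gpu_sync": 61.0}


def derived_metrics():
    """Recompute the derived dashboard numbers from the raw ones."""
    tokens_per_step = BATCH_SIZE * CONTEXT  # assumption: one batch per step
    return {
        "total_tokens": ITERS * tokens_per_step,
        "tok_per_s": tokens_per_step / (MS_PER_ITER / 1000.0),
        "phase_share": {k: v / MS_PER_ITER for k, v in PHASE_MS.items()},
        "bwd_fwd_ratio": PHASE_MS["backward"] / PHASE_MS["forward"],
    }


metrics = derived_metrics()
print(metrics["total_tokens"])            # compare with "819.2K tokens processed"
print(round(metrics["tok_per_s"]))        # compare with "4,741 tok/s (avg)"
print(round(metrics["bwd_fwd_ratio"], 1)) # compare with "7.5x"
```

Under these assumptions the token count and the backward/forward ratio reproduce exactly, and the recomputed throughput lands within about 1% of the reported average, which is consistent with the dashboard averaging per-step throughput rather than dividing by the mean step time.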
Loss Curve
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0.1
Parameters: 9.99M
Training Config
Total iters: 200
Batch size: 16
Max LR: 0.00003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 999999
Charts: Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap (no validation data), Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity, Gradient Clipping (no clipping data), GPU Operations, Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data), Timing Phase Lines, Backward / Forward Ratio
Checkpoints (1)
Step  Filename             Size     Created
200   checkpoint-200.json  27.4 MB  48d ago
Sample Generations (3)
Sample 1 (checkpoint: -, generated 48d ago)
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>we we of a and in —on anudph; of the te. es d mwithout ed y seuacy a to tiesto in and dto pce to we —a s acin ; aour a tima
Sample 2 (checkpoint: -, generated 48d ago)
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>umay y ti. Gas s oof omour to es , ts preamain the in duand the ; without ed by your to n; justice our . <|assistant|> as nesin estas anto a for ; s es
Sample 3 (checkpoint: -, generated 48d ago)
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>sein as ce s we aes a ed by esdesin anas we d a ual ayour and of ed ; the to a may of the your to to al of the acin ed by —ce s anfor in the don
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0.1
}
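For context on the reported loss, a model that predicts uniformly over the vocabulary has a cross-entropy of ln(vocabSize). The sketch below compares that baseline with the final loss and converts the loss to perplexity. It assumes the dashboard reports mean cross-entropy in nats and that "-4.8% from start" is relative to the first logged loss; both are assumptions, not something the page states.

```python
import math

VOCAB_SIZE = 4000         # "vocabSize" from the model config
FINAL_LOSS = 7.9546       # loss reported at step 200
RELATIVE_CHANGE = -0.048  # "-4.8% from start"

# Cross-entropy of a uniform guess over the vocabulary (assumes nats).
uniform_loss = math.log(VOCAB_SIZE)

# Starting loss implied by the relative change (assumption: measured
# against the first logged loss).
implied_start_loss = FINAL_LOSS / (1.0 + RELATIVE_CHANGE)

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(FINAL_LOSS)

print(f"uniform baseline:   {uniform_loss:.3f}")
print(f"implied start loss: {implied_start_loss:.3f}")
print(f"final perplexity:   {perplexity:.0f}")
```

Under these assumptions the final loss sits only slightly below the uniform baseline, which fits the largely incoherent sample generations after 200 steps.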
Training Config (JSON)
{
  "iters": 200,
  "batchSize": 16,
  "lr": 0.00003,
  "lrMin": 0.000003,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.99,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 999999,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": true,
  "gradAccumSteps": 1,
  "sampleInterval": 999999,
  "spikeThreshold": 0
}
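Note that `warmupIters` (500) exceeds `iters` (200), so under a conventional warmup the run would end before the learning rate reaches `lr`. The sketch below illustrates this with a hypothetical schedule: linear warmup followed by cosine decay to `lrMin`. The helios backend's actual schedule is not shown on this page, and the function `lr_at` is an illustration only.

```python
import math


def lr_at(step, lr=3e-5, lr_min=3e-6, warmup_iters=500, total_iters=200):
    """Hypothetical schedule: linear warmup to lr, then cosine decay to lr_min.

    Defaults mirror the training config above; the real backend may
    index steps or shape the warmup differently.
    """
    if step < warmup_iters:
        return lr * step / warmup_iters
    decay_steps = max(1, total_iters - warmup_iters)
    progress = min(1.0, (step - warmup_iters) / decay_steps)
    return lr_min + 0.5 * (lr - lr_min) * (1.0 + math.cos(math.pi * progress))


# With warmup longer than the run, every step is still in warmup.
peak = max(lr_at(s) for s in range(201))
print(f"lr at step 200: {lr_at(200):.2e}")
print(f"peak lr reached: {peak:.2e} of configured {3e-5:.2e}")
```

Under this assumed form the step-200 rate would be 1.2e-5, close to but not equal to the 1.38e-5 the dashboard reports, so the real schedule likely differs in detail. Either way, the run never trains at the configured maximum of 0.00003.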