Alpha

20260224062251_ifac · completed · chat · 9.99M params · 2m 38s elapsed · Updated 36d ago

6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 24, 2026 6:23 AM

Step 200 / 200 (100.0%)

Loss: 7.9546

Best Loss: 7.9546 (-4.8% from start)

Val Loss: none (no validation data)

Learning Rate: 1.38e-5

Throughput: 4,741 tok/s (avg)

Speed: 865 ms/iter (avg)

Grad Norm: 0.621 (avg: 0.645)

Tokens processed: 819.2K

Forward: 91ms (11% of step)

Backward: 683ms (79% of step)

GPU Sync: 61ms (7% of step)

GPU Ops per step: 1,013

MFU (model FLOPS util): 0.9%

Bwd/Fwd ratio: 7.5x
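Several of the figures above can be cross-checked against each other. The sketch below recomputes them from the reported batch size (16), context length (256), iteration count (200), and phase timings. It assumes each step processes batch size times context tokens, which fits the recorded gradAccumSteps of 1. The variable names are illustrative and not taken from the training tool.

```python
# Values copied from the dashboard cards and config in this report.
batch_size = 16   # Training Config: batch size
context = 256     # Architecture: context length
iters = 200       # Training Config: total iters
step_ms = 865     # Speed, ms/iter (avg)
phase_ms = {"forward": 91, "backward": 683, "gpu_sync": 61}

# Assumption: one step consumes batch_size * context tokens (no grad accumulation).
tokens_per_step = batch_size * context
total_tokens = tokens_per_step * iters

# Throughput implied by the average step time.
tok_per_s = tokens_per_step / (step_ms / 1000)

# Backward/forward ratio and each phase's share of the average step.
bwd_fwd_ratio = phase_ms["backward"] / phase_ms["forward"]
phase_share = {name: round(100 * ms / step_ms) for name, ms in phase_ms.items()}

print(f"tokens per step: {tokens_per_step}")
print(f"total tokens:    {total_tokens}")
print(f"throughput:      {tok_per_s:.0f} tok/s")
print(f"bwd/fwd ratio:   {bwd_fwd_ratio:.1f}x")
print(f"phase shares:    {phase_share}")
```

The recomputed total (819,200 tokens), ratio (7.5x), and phase shares (11%, 79%, 7%) match the cards. The implied throughput of about 4,735 tok/s is slightly below the reported 4,741, which would be expected if the dashboard averages per-step throughput rather than dividing by the average step time.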

Loss Curve

Architecture

Layers: 6

Embedding: 256

Heads: 8

Vocab: 4,000

Context: 256

Dropout: 0.1

Parameters: 9.99M

Training Config

Total iters: 200

Batch size: 16

Max LR: 0.00003

Optimizer: adamw

Backend: helios

Tokenizer: bpe-4k

Seed: 42

Weight decay: 0.1

Grad clip: 1

Eval interval: 999999

Charts (plot data not captured): Throughput (tok/s); Step Time (ms/iter); GPU & VRAM; Perplexity; Train/Val Gap (no validation data); Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity; Gradient Clipping (no clipping data); GPU Operations; Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio

Checkpoints (1)

Step | Filename | Size | Created

200 | checkpoint-200.json | 27.4 MB | 48d ago

Sample Generations (3)

# | Checkpoint | Prompt (preview) | Generated

1 | - | <|user|> Hello, how are you? <|assistant|> | 48d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>we we of a and in —on anudph; of the te. es d mwithout ed y seuacy a to tiesto in and dto pce to we —a s acin ; aour a tima

2 | - | <|user|> What do you like to do for fun? <|assistant|> | 48d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

(no output shown)

3 | - | <|user|> Tell me about yourself. <|assistant|> | 48d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>sein as ce s we aes a ed by esdesin anas we d a ual ayour and of ed ; the to a may of the your to to al of the acin ed by —ce s anfor in the don

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0.1
}
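The vocabulary size gives a useful reference point for the reported loss. The sketch below converts the final loss of 7.9546 to perplexity and compares it with the loss of a uniform guess over 4,000 tokens. It assumes the loss is a mean cross-entropy in nats and that "-4.8% from start" is measured against the first logged loss; neither is stated in the report.

```python
import math

final_loss = 7.9546   # Loss card
vocab_size = 4000     # vocabSize in the model config
rel_change = -0.048   # "-4.8% from start"

# Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
perplexity = math.exp(final_loss)

# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(vocab_size)

# Starting loss implied by the relative change (approximate, since 4.8% is rounded).
implied_start_loss = final_loss / (1 + rel_change)

print(f"perplexity:         {perplexity:.0f}")
print(f"uniform baseline:   {uniform_loss:.3f}")
print(f"implied start loss: {implied_start_loss:.2f}")
```

Under these assumptions the final loss sits only a little below the uniform baseline of about 8.29, so the model has learned little beyond basic token frequencies, which is consistent with the fragmentary sample generations.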

Training Config (JSON)

{
  "iters": 200,
  "batchSize": 16,
  "lr": 0.00003,
  "lrMin": 0.000003,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.99,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 999999,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": true,
  "gradAccumSteps": 1,
  "sampleInterval": 999999,
  "spikeThreshold": 0
}
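The reported learning rate of 1.38e-5 at step 200 can be reproduced from this config if the schedule is a linear warmup from lrMin to lr over warmupIters. That schedule is an assumption: it fits the single reported value, but the report does not describe what the backend actually implements. The function name `lr_at` is hypothetical.

```python
def lr_at(step, lr_max=3e-5, lr_min=3e-6, warmup_iters=500):
    """Hypothetical linear warmup from lr_min to lr_max.

    Assumption: the real scheduler may differ; any decay after warmup
    is not modeled here and lr_max is simply held.
    """
    if step >= warmup_iters:
        return lr_max
    return lr_min + (lr_max - lr_min) * step / warmup_iters


final_step = 200  # "iters" in the training config
print(f"lr at step {final_step}: {lr_at(final_step):.3g}")
print(f"fraction of max lr: {lr_at(final_step) / 3e-5:.2f}")
```

Because warmupIters (500) is larger than iters (200), a schedule of this shape ends the run partway through warmup, at less than half of the configured maximum learning rate.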