Alpha

20260224053053_zek5 · completed · chat · 6.84M params · 54m 27s elapsed · Updated 48d ago

6L / 256D / 8H · helios · bpe-4k · adamw · Created Feb 24, 2026 5:31 AM

Step 5,836 / 50,000 (11.7%)

Loss: 5.7909

Best Loss: 5.3617 (-30.7% from start)

Val Loss: 6.0736 (best: 5.9736)

Learning Rate: 2.92e-5

Throughput: 7,573 tok/s (avg)

Speed: 541 ms/iter (avg)

Grad Norm: 4292.889 (avg: 6971.933)

Tokens processed: 23.67M

Forward: 63 ms (12% of step)

Backward: 416 ms (77% of step)

GPU Sync: 25 ms (5% of step)

GPU Ops per step: 1,064

MFU (model FLOPS util): 1.0%

Bwd/Fwd ratio: 6.6x
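The throughput and timing figures can be cross-checked against the batch size and context length. The sketch below assumes each step consumes batch size × context tokens (consistent with gradAccumSteps of 1) and that percentages are taken relative to the average step time; the variable names are illustrative, not the dashboard's own.

```python
# Values copied from the dashboard; names are illustrative.
batch_size = 16      # Batch size
block_size = 256     # Context
ms_per_iter = 541    # Speed, ms/iter (avg)
forward_ms, backward_ms, sync_ms = 63, 416, 25

# Assumption: one step processes batch_size sequences of block_size tokens.
tokens_per_step = batch_size * block_size
tok_per_s = tokens_per_step / (ms_per_iter / 1000)
bwd_fwd_ratio = backward_ms / forward_ms


def share(ms: float) -> float:
    """Percentage of the average step time spent in one phase."""
    return 100 * ms / ms_per_iter


print(f"tokens/step: {tokens_per_step}")
print(f"throughput:  {tok_per_s:.0f} tok/s")
print(f"bwd/fwd:     {bwd_fwd_ratio:.1f}x")
for name, ms in [("forward", forward_ms), ("backward", backward_ms), ("sync", sync_ms)]:
    print(f"{name:9s} {share(ms):.0f}% of step")
```

The computed throughput lands within a few tok/s of the reported 7,573; the small difference is expected because both displayed values are rounded averages.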

Loss Curve (chart not captured)

Architecture

Layers: 6

Embedding: 256

Heads: 8

Vocab: 4,000

Context: 256

Dropout: 0.1

Parameters: 6.84M

Training Config

Total iters: 50,000

Batch size: 16

Max LR: 0.00003

Optimizer: adamw

Backend: helios

Tokenizer: bpe-4k

Seed: 42

Weight decay: 0.1

Grad clip: 1

Eval interval: 2500
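The grad clip setting of 1 sits far below the reported gradient norm of 4292.889. The sketch below shows what that would imply under global-norm clipping. It assumes the helios backend clips by global L2 norm and that the displayed norm is measured before clipping; neither is confirmed by this page, and `clip_coefficient` is a hypothetical helper.

```python
def clip_coefficient(grad_norm: float, max_norm: float, eps: float = 1e-6) -> float:
    """Scale factor applied to every gradient under global-norm clipping.

    Assumption: gradients are multiplied by max_norm / (norm + eps),
    capped at 1 so that small gradients pass through unchanged.
    """
    return min(1.0, max_norm / (grad_norm + eps))


reported_norm = 4292.889  # Grad Norm from the dashboard
grad_clip = 1.0           # gradClip from the training config

coef = clip_coefficient(reported_norm, grad_clip)
print(f"clip coefficient: {coef:.3e}")
```

Under these assumptions every update would be scaled down by more than three orders of magnitude, so the effective step size would be set by the clip threshold rather than by the raw gradient magnitude.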

Charts (not captured): Throughput (tok/s); Step Time (ms/iter); GPU & VRAM; Perplexity; Train/Val Gap; Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity; Gradient Clipping (no clipping data); GPU Operations; Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio
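The Perplexity and Train/Val Gap panels can be approximated from the headline loss values. This sketch assumes the reported losses are mean cross-entropy in nats, so perplexity is the exponential of the loss; the uniform-guess baseline of ln(vocab size) is included only for scale.

```python
import math

train_loss = 5.7909   # Loss
val_loss = 6.0736     # Val Loss
vocab_size = 4000     # Vocab

# Assumption: losses are mean cross-entropy in nats.
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)
gap = val_loss - train_loss

# Loss of a model that guesses uniformly over the vocabulary.
uniform_loss = math.log(vocab_size)

print(f"train perplexity: {train_ppl:.1f}")
print(f"val perplexity:   {val_ppl:.1f}")
print(f"train/val gap:    {gap:.4f}")
print(f"uniform baseline: {uniform_loss:.3f}")
```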

Checkpoints (0)

No checkpoints saved

Sample Generations (3)

Sample 1 (checkpoint: -, generated 48d ago)

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>er—that in the cof our sed without erarc—not the ned by must the unnyesas of the fto s as may cds from I an—the true st; awars we an,

Sample 2 (checkpoint: -, generated 48d ago)

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

Sample 3 (checkpoint: -, generated 48d ago)

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>s and the and s y a for a in exin which to exors from which our tmay the moral in the eder and not unged we and to a moral y y, to as in a, to yal, ping p

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0.1
}
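The 6.84M parameter figure can be roughly reconstructed from the model config. The sketch below assumes a GPT-2-style decoder: learned token and position embeddings, a 4x MLP width, LayerNorm with weight and bias, no biases on linear layers, and an output head that is not tied to the token embedding. None of these details are stated on this page, and `estimate_params` is a hypothetical helper. The head count does not affect the total.

```python
def estimate_params(vocab_size: int, block_size: int, n_layer: int, n_embd: int,
                    tied_head: bool = False, linear_bias: bool = False) -> int:
    """Rough parameter count for an assumed GPT-2-style decoder."""
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    attn = 4 * n_embd * n_embd             # Q, K, V and output projections
    mlp = 8 * n_embd * n_embd              # two matrices, hidden width 4 * n_embd
    biases = 9 * n_embd if linear_bias else 0   # 4d for attention, 5d for the MLP
    layer_norms = 2 * 2 * n_embd           # two LayerNorms per block, weight + bias
    per_layer = attn + mlp + biases + layer_norms
    final_ln = 2 * n_embd
    head = 0 if tied_head else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + final_ln + head


n = estimate_params(vocab_size=4000, block_size=256, n_layer=6, n_embd=256)
print(f"{n:,} parameters (~{n / 1e6:.2f}M)")
```

With a tied output head the same estimate drops by about one million parameters, so the displayed total is more consistent with an untied head under these assumptions.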

Training Config (JSON)

{
  "iters": 50000,
  "batchSize": 16,
  "lr": 0.00003,
  "lrMin": 0.000003,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.99,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 2500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500,
  "spikeThreshold": 0
}
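The displayed learning rate of 2.92e-5 at step 5,836 is consistent with linear warmup followed by cosine decay from lr to lrMin. The schedule shape is an assumption, since the config names only lr, lrMin, warmupIters, and iters; `lr_at` is a hypothetical helper written to test that guess.

```python
import math


def lr_at(step: int, lr: float = 3e-5, lr_min: float = 3e-6,
          warmup_iters: int = 500, iters: int = 50_000) -> float:
    """Assumed schedule: linear warmup, then cosine decay to lr_min."""
    if step < warmup_iters:
        return lr * step / warmup_iters
    progress = (step - warmup_iters) / (iters - warmup_iters)
    return lr_min + 0.5 * (lr - lr_min) * (1 + math.cos(math.pi * progress))


print(f"lr at step 5836:  {lr_at(5836):.3e}")
print(f"lr at step 500:   {lr_at(500):.3e}")
print(f"lr at step 50000: {lr_at(50_000):.3e}")
```

Measuring decay progress from the end of warmup rather than from step 0 is what brings the value at step 5,836 into agreement with the dashboard to three significant figures.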