Alpha

20260225011732_xk5ostalechat · 17.44M params · 2h 52m elapsed · Updated 48d ago

8L / 384D / 8H · helios · bpe-4k · adamw · Created Feb 25, 2026 1:18 AM

Step 4,935 / 50,000 · 9.9%

Loss: 4.4456

Best Loss: 4.1969 (-46.9% from start)

Val Loss: 5.2024 (best: 5.2024)

Learning Rate: 4.93e-5

Throughput: 4,775 tok/s (avg)

Speed: 2,145 ms/iter (avg)

Grad Norm: 2.547 (avg: 1.551)

Tokens processed: 50.43M

Forward: 1034ms (48% of step)

Backward: 975ms (45% of step)

GPU Sync: 45ms (2% of step)

GPU Ops per step: 935

MFU (model FLOPS util): 1.6%

Bwd/Fwd ratio: 0.9x
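Several of the headline numbers above can be cross-checked from the others. The sketch below is a sanity check, not the dashboard's own code. It assumes that one step consumes batch size × context tokens (consistent with gradAccumSteps = 1 in the training config) and that the reported loss is mean cross-entropy in nats, so perplexity is exp(loss). All variable names are illustrative.

```python
import math

# Values copied from the dashboard cards and the training config.
batch_size = 20
context_len = 512
ms_per_iter = 2145.0
train_loss = 4.4456
val_loss = 5.2024
forward_ms = 1034.0
backward_ms = 975.0

# Assumption: one optimizer step sees batch_size * context_len tokens.
tokens_per_step = batch_size * context_len
tokens_per_sec = tokens_per_step / (ms_per_iter / 1000.0)

# Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

# Train/val gap and backward/forward time ratio.
gap = val_loss - train_loss
bwd_fwd = backward_ms / forward_ms

print(f"tokens/step:    {tokens_per_step}")
print(f"tokens/sec:     {tokens_per_sec:.0f}")
print(f"train ppl:      {train_ppl:.1f}")
print(f"val ppl:        {val_ppl:.1f}")
print(f"train/val gap:  {gap:.4f}")
print(f"bwd/fwd ratio:  {bwd_fwd:.2f}")
```

Under these assumptions the computed throughput lands within a few tok/s of the reported 4,775 average, and the backward/forward ratio rounds to the displayed 0.9x. The perplexities come out at roughly 85 (train) and 182 (val).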

Loss Curve

Architecture

Layers: 8

Embedding: 384

Heads: 8

Vocab: 4,000

Context: 512

Dropout: 0.1

Parameters: 17.44M
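The parameter total can be roughly reproduced from the architecture values. The sketch below assumes a GPT-2-style decoder: learned token and position embeddings, four attention projections and a 4x-wide MLP per layer, LayerNorm with weight and bias, no biases on linear layers, and an output head that is not tied to the token embedding. None of these layout details are stated on the page. They are assumptions chosen because they happen to match the displayed count, and other layouts could match as well. The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab_size, block_size, n_layer, n_embd,
                    tied_head=False, linear_bias=False):
    """Rough parameter count for an assumed GPT-2-style decoder."""
    tok_emb = vocab_size * n_embd          # token embedding table
    pos_emb = block_size * n_embd          # learned position embeddings (assumed)
    attn = 4 * n_embd * n_embd             # Q, K, V and output projections
    mlp = 8 * n_embd * n_embd              # two linear layers, 4x hidden width
    layer_norms = 2 * (2 * n_embd)         # two LayerNorms, weight + bias each
    biases = 0
    if linear_bias:
        # Q/K/V biases, attention output bias, MLP hidden and output biases.
        biases = 3 * n_embd + n_embd + 4 * n_embd + n_embd
    per_layer = attn + mlp + layer_norms + biases
    final_norm = 2 * n_embd                # final LayerNorm, weight + bias
    head = 0 if tied_head else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + head


total = estimate_params(vocab_size=4000, block_size=512, n_layer=8, n_embd=384)
print(f"{total / 1e6:.2f}M")  # prints "17.44M"
```

Toggling `tied_head` or `linear_bias` shows how sensitive the total is to these layout choices at this model size: the untied head alone accounts for about 1.5M parameters.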

Training Config

Total iters: 50,000

Batch size: 20

Max LR: 0.00005

Optimizer: adamw

Backend: helios

Tokenizer: bpe-4k

Seed: 42

Weight decay: 0.1

Grad clip: 1

Eval interval: 1000

Charts (plot data not captured): Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · GPU Operations · Timing Phase Lines · Backward / Forward Ratio

Gradient Clipping: No clipping data

Step Time Breakdown (phases): Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data

Checkpoints (0)

No checkpoints saved

Sample Generations (3)

#1 · Checkpoint: - · Generated: 48d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>—in with and inct to a centas not to a with that sto unns, or rather than a I t and and which an that st and unct to its is ines unm, yet to I maintain that the s

#2 · Checkpoint: - · Generated: 48d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

<|user|> What do you like to do for fun? <|assistant|>of their reds to dto un, in in k, we es our by des, in unduys, not all must and dilies it; that the sch; their what of un—and ps, an to st

#3 · Checkpoint: - · Generated: 48d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>yaas in our to ; the selgs, as may inal gs with to fals, the nature of but are where p ors, of the a the s: it that ins of the sacli—that patt ace

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 512,
  "nLayer": 8,
  "nEmbd": 384,
  "nHead": 8,
  "dropout": 0.1
}

Training Config (JSON)

{
  "iters": 50000,
  "batchSize": 20,
  "lr": 0.00005,
  "lrMin": 0.000005,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 1000,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500,
  "spikeThreshold": 0
}
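The `lr`, `lrMin`, `warmupIters`, and `iters` fields suggest a warmup-then-decay schedule. The page does not name the schedule. The sketch below assumes linear warmup followed by cosine decay to `lrMin`, a common choice that happens to reproduce the displayed learning rate of 4.93e-5 at step 4,935. The function name `lr_at` and the exact warmup convention are assumptions, not the backend's actual implementation.

```python
import math


def lr_at(step, lr=5e-5, lr_min=5e-6, warmup_iters=1000, iters=50000):
    """Assumed schedule: linear warmup, then cosine decay from lr to lr_min."""
    if step < warmup_iters:
        # Linear ramp toward the peak learning rate during warmup.
        return lr * (step + 1) / warmup_iters
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_iters) / max(1, iters - warmup_iters)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_min + cosine * (lr - lr_min)


for step in (0, 999, 4935, 25000, 50000):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Because the run stopped at 9.9% of its planned iterations, the learning rate under this assumed schedule had barely begun to decay from its peak.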