Alpha

historic_chat_v2_20260304234310_8cfz · stale · chat · 6.84M params · 3m 38s elapsed · Updated 40d ago

6L / 256D / 8H · helios · bpe-4k · adamw · Created Mar 4, 2026 11:43 PM

Step 584 / 30,000 (1.9%)

Loss: 7.4990
Best loss: 7.4692 (-10.2% from start)
Val loss: 7.5642 (best: 7.5642)
Learning rate: 5.00e-5
Throughput: 10,949 tok/s (avg)
Speed: 376 ms/iter (avg)
Grad norm: 0.258 (avg: 0.252)
Tokens processed: 2.39M
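The Tokens and Throughput figures can be cross-checked against the training config. This is a minimal sketch, assuming that one step is one optimizer update covering `gradAccumSteps` micro-batches of `batchSize` full-length `blockSize` sequences (the dashboard does not define its counters). Under that assumption 584 × 8 × 2 × 256 = 2,392,064 tokens, matching the 2.39M shown, and 4,096 tokens per step at 376 ms/iter gives roughly 10,900 tok/s, close to the 10,949 tok/s average. The function names are hypothetical.

```python
def tokens_per_step(batch_size: int, block_size: int, grad_accum_steps: int) -> int:
    """Tokens in one optimizer step, assuming every sequence is a full block_size window."""
    return batch_size * block_size * grad_accum_steps


def tokens_processed(step: int, batch_size: int, block_size: int, grad_accum_steps: int) -> int:
    """Cumulative tokens after `step` optimizer steps (hypothetical counter definition)."""
    return step * tokens_per_step(batch_size, block_size, grad_accum_steps)


def throughput_tok_per_s(tokens_in_step: int, ms_per_iter: float) -> float:
    """Tokens per second implied by an average step time given in milliseconds."""
    return tokens_in_step / (ms_per_iter / 1000.0)


if __name__ == "__main__":
    # Dashboard values: step 584, batchSize 8, blockSize 256, gradAccumSteps 2, 376 ms/iter.
    per_step = tokens_per_step(8, 256, 2)
    print(per_step)                                    # 4096
    print(tokens_processed(584, 8, 256, 2))            # 2392064
    print(round(throughput_tok_per_s(per_step, 376)))  # 10894
```

The remaining gap to the reported 10,949 tok/s is plausibly due to the two averages covering different windows; the dashboard does not say how either is computed.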

[Chart: Loss Curve]
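A useful reference point for these loss values: a model that assigns equal probability to all 4,000 vocabulary entries has a cross-entropy of ln(4000) ≈ 8.29 nats, and perplexity is exp(loss). The sketch below assumes the dashboard reports mean cross-entropy in nats, which it does not state. If the "-10.2% from start" figure is measured against the best loss, the starting loss was about 8.32, i.e. close to that uniform baseline, as expected for a freshly initialized model. The helper names are hypothetical.

```python
import math


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy (nats) of a model that predicts every token with probability 1/vocab_size."""
    return math.log(vocab_size)


def perplexity(loss_nats: float) -> float:
    """Perplexity corresponding to a mean cross-entropy measured in nats."""
    return math.exp(loss_nats)


def implied_start_loss(loss: float, pct_drop: float) -> float:
    """Starting loss implied by `loss` being `pct_drop` percent below it."""
    return loss / (1.0 - pct_drop / 100.0)


if __name__ == "__main__":
    print(f"uniform baseline: {uniform_baseline_loss(4000):.3f}")  # 8.294
    print(f"implied start:    {implied_start_loss(7.4692, 10.2):.3f}")
    for name, loss in [("train", 7.4990), ("val", 7.5642)]:
        print(f"{name} perplexity: {perplexity(loss):.0f}")
    print(f"train/val gap:    {7.5642 - 7.4990:.4f}")
```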

Architecture

Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 256
Dropout: 0
Parameters: 6.84M
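The 6.84M parameter count is consistent with a GPT-2-style decoder built from these dimensions. This is a sketch under assumptions the dashboard does not confirm: learned positional embeddings, a 4× MLP, two LayerNorms (weight and bias) per block plus a final LayerNorm, no biases on linear layers, and an output projection that is not tied to the token embedding. The head count does not change the total; it only splits the 256-wide attention into 8 heads of 32. The function name is hypothetical.

```python
def gpt_param_count(vocab_size: int, block_size: int, n_layer: int, n_embd: int,
                    mlp_ratio: int = 4, tied_head: bool = False) -> int:
    """Parameter count of a GPT-style decoder under the assumptions stated above."""
    d = n_embd
    token_emb = vocab_size * d
    pos_emb = block_size * d               # learned positional embeddings (assumed)
    attn = 3 * d * d + d * d               # Q/K/V projections + output projection, no biases
    mlp = 2 * mlp_ratio * d * d            # up- and down-projection, no biases
    norms = 2 * (2 * d)                    # two LayerNorms (weight + bias) per block
    per_block = attn + mlp + norms
    final_norm = 2 * d
    head = 0 if tied_head else vocab_size * d
    return token_emb + pos_emb + n_layer * per_block + final_norm + head


if __name__ == "__main__":
    untied = gpt_param_count(4000, 256, 6, 256)
    tied = gpt_param_count(4000, 256, 6, 256, tied_head=True)
    print(f"untied head: {untied:,} ({untied / 1e6:.2f}M)")  # 6,838,784 (6.84M)
    print(f"tied head:   {tied:,} ({tied / 1e6:.2f}M)")      # 5,814,784 (5.81M)
```

Under these assumptions, tying the output head to the embedding would give about 5.81M, so the reported 6.84M suggests an untied head.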

Training Config

Total iters: 30,000
Batch size: 8
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 200
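With a clip threshold of 5 and an average gradient norm of 0.252, global-norm clipping should rarely, if ever, trigger at this point in training. The sketch below shows global-norm clipping followed by one standard AdamW step (decoupled weight decay) using the config's lr, beta1, beta2, eps and weightDecay. It operates on flat Python lists for self-containment and is the textbook formulation, not necessarily what the helios backend implements; for instance, whether weight decay is applied to every parameter is not stated.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def clip_by_global_norm(grads: List[float], max_norm: float) -> Tuple[List[float], float]:
    """Rescale all gradients by one common factor if their joint L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # small constant guards against division by ~0
        grads = [g * scale for g in grads]
    return grads, total_norm


@dataclass
class AdamW:
    # Defaults mirror the Training Config JSON (lr, beta1, beta2, eps, weightDecay).
    lr: float = 5e-5
    beta1: float = 0.9
    beta2: float = 0.95
    eps: float = 1e-6
    weight_decay: float = 0.1
    t: int = 0
    m: List[float] = field(default_factory=list)  # first-moment estimates
    v: List[float] = field(default_factory=list)  # second-moment estimates

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        bias_c1 = 1.0 - self.beta1 ** self.t
        bias_c2 = 1.0 - self.beta2 ** self.t
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / bias_c1
            v_hat = self.v[i] / bias_c2
            p = p - self.lr * self.weight_decay * p  # decoupled decay, not folded into the gradient
            p = p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            updated.append(p)
        return updated


if __name__ == "__main__":
    grads, norm = clip_by_global_norm([0.1, -0.2, 0.05], max_norm=5.0)
    print(f"grad norm {norm:.3f}, clipped: {norm > 5.0}")
    opt = AdamW()
    print(opt.step([1.0, -0.5, 0.2], grads))
```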

[Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping · GPU Operations · Step Time Breakdown (no timing data) · Timing Phase Lines (no timing data) · Backward / Forward Ratio (no timing data)]

Checkpoints (0)

No checkpoints saved

Sample Generations (3)

Sample 1 · Checkpoint: none · Generated 40d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>dthe must itis othe s? y we kmust , di? in, , es the without es the kto laws of through kes the we es the ses ines ed in es the such the din the ? lawmust a oin lawand the

Sample 2 · Checkpoint: none · Generated 40d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

<|user|> What do you like to do for fun? <|assistant|>dthe must ed to k? ? and we dmust ons, ? in, the es the through es the ito ent through where mes the lawa s, es inon in in es the such the din the ? lawlawes oin ent and the

Sample 3 · Checkpoint: none · Generated 40d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>dy must itto k? the sand we kmust , di? in, , es the without es the kto laws of without kes the we es the ses inon ed in es the such the din the the ss of we es son we and the
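All three samples share one chat template: the user message wrapped as `<|user|> … <|assistant|>`, and each displayed output echoes the full prompt before the generated continuation. A hypothetical pair of helpers (names not from the source) for building such a prompt and separating the continuation from the echoed text might look like this; whether bpe-4k treats the tags as single special tokens is not stated.

```python
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def format_chat_prompt(user_message: str) -> str:
    """Wrap a single user turn in the template seen in the sample prompts."""
    return f"{USER_TAG} {user_message.strip()} {ASSISTANT_TAG}"


def split_completion(prompt: str, output: str) -> str:
    """Return only the generated text, assuming the output starts with the exact prompt."""
    if not output.startswith(prompt):
        raise ValueError("output does not begin with the prompt")
    return output[len(prompt):]


if __name__ == "__main__":
    prompt = format_chat_prompt("Hello, how are you?")
    print(prompt)
    print(repr(split_completion(prompt, prompt + "dthe must itis")))
```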

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0,
  "ffnActivation": "gelu"
}

Training Config (JSON)

{
  "iters": 30000,
  "batchSize": 8,
  "lr": 0.00005,
  "lrMin": 0.000005,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 200,
  "evalIters": 1,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "logEvery": 25,
  "trace": false,
  "gradAccumSteps": 2,
  "sampleInterval": 200,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 1,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
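The config gives a peak `lr` of 5e-5, a floor `lrMin` of 5e-6 and `warmupIters` of 500 over `iters` = 30,000, but not the shape of the decay. Assuming linear warmup followed by cosine decay to `lrMin` (a common choice, not confirmed by this page), the schedule can be sketched as below; at step 584 it yields a value that rounds to the 5.00e-5 shown on the dashboard. The function name is hypothetical.

```python
import math


def lr_at(step: int, max_lr: float = 5e-5, min_lr: float = 5e-6,
          warmup_iters: int = 500, total_iters: int = 30_000) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr (assumed schedule shape)."""
    if step < warmup_iters:
        # Some implementations use (step + 1) here so the very first step has a nonzero LR.
        return max_lr * step / warmup_iters
    progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    progress = min(progress, 1.0)  # hold at min_lr if training runs past total_iters
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 250, 500, 584, 15_250, 30_000):
        print(f"step {s:>6}: lr = {lr_at(s):.2e}")
```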