historic_chat_v2_20260304171901_yc1b · stale · unknown · 6.81M params · 1m 46s elapsed · Updated 40d ago
6L / 256D / 8H · helios · bpe · adamw · Created Mar 4, 2026 5:19 PM
Step 2,400 / 8,000 (30.0%)
Loss: 7.5125
Best Loss: 7.3442 (-0.2% from start)
Val Loss: 7.4653 (best: 7.4653)
Learning Rate: 4.17e-5
Throughput: 6,141 tok/s (avg)
Speed: 84 ms/iter (avg)
Grad Norm: 0.556 (avg: 0.525)
Tokens processed: 614.4K
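A quick sanity check on these figures, as a hedged sketch: it assumes throughput is measured over batch size × context = 4 × 128 = 512 tokens per iteration, and that perplexity is the exponential of the cross-entropy loss in nats; the dashboard states neither explicitly.

import math

# Assumed (not stated by the dashboard): each iteration processes
# batch_size * context = 512 tokens.
tokens_per_iter = 4 * 128
step_time_s = 0.084                    # 84 ms/iter (avg)
print(tokens_per_iter / step_time_s)   # ~6,095 tok/s, close to the reported 6,141

# Val perplexity vs. the uniform baseline over a 4,000-token vocabulary.
print(math.exp(7.4653))                # ~1,746
print(math.log(4000))                  # ~8.29 nats: loss of a uniform predictor

The val loss of 7.4653 sits only slightly below the uniform baseline of ~8.29, which is consistent with the near-random sample generations shown further down.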
[Loss Curve chart]
Architecture
Layers: 6
Embedding: 256
Heads: 8
Vocab: 4,000
Context: 128
Dropout: 0
Parameters: 6.81M
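The 6.81M figure is consistent with a standard GPT-style stack; a minimal sketch of the count, assuming bias-free linear layers, LayerNorms with weight and bias, learned positional embeddings, and an untied output head (none of which the dashboard confirms):

V, D, L, T = 4000, 256, 6, 128      # vocab, embedding, layers, context
tok_emb = V * D                     # 1,024,000
pos_emb = T * D                     #    32,768
attn = 4 * D * D                    # q/k/v/out projections, no biases
ffn  = 2 * D * (4 * D)              # 4x MLP, no biases
lns  = 2 * (2 * D)                  # two LayerNorms (weight + bias)
per_layer = attn + ffn + lns        #   787,456
final_ln = 2 * D
lm_head = V * D                     # untied output head
total = tok_emb + pos_emb + L * per_layer + final_ln + lm_head
print(f"{total:,}")                 # 6,806,016 -> reported as 6.81M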
Training Config
Total iters: 8,000
Batch size: 4
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 200
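The displayed learning rate of 4.17e-5 at step 2,400 matches linear warmup followed by cosine decay from lr to lrMin (using warmupIters and lrMin from the Training Config JSON below); a sketch of that schedule, assuming this is what the trainer implements:

import math

def lr_at(step, lr=5e-5, lr_min=5e-6, warmup=200, total=8000):
    # Linear warmup, then cosine decay to lr_min -- an assumed schedule
    # that happens to reproduce the dashboard's value at step 2,400.
    if step < warmup:
        return lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return lr_min + 0.5 * (1 + math.cos(math.pi * progress)) * (lr - lr_min)

print(f"{lr_at(2400):.3e}")  # 4.173e-05, shown as 4.17e-5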
[Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping · GPU Operations]
[Step Time Breakdown, Timing Phase Lines, Backward / Forward Ratio: no timing data]
Transformer Layer Analysis
[Charts: Gradient Norm Heatmap · Per-Layer Gradient Evolution]
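The per-layer gradient charts presumably track something like the following; a hedged PyTorch sketch, noting that the actual trainer runs on the helios backend and the parameter naming here is hypothetical:

import torch

def layer_grad_norms(model: torch.nn.Module) -> dict[str, float]:
    # Accumulate squared gradient norms per top-level module,
    # e.g. "blocks.0", "blocks.1", ... (naming is illustrative).
    sq: dict[str, float] = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        key = ".".join(name.split(".")[:2])
        sq[key] = sq.get(key, 0.0) + p.grad.detach().pow(2).sum().item()
    return {k: v ** 0.5 for k, v in sq.items()}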
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
Sample 1 · Checkpoint: - · 40d ago
Prompt: The
Output:
The c, ’s to of pthat that en’s dthat , ed in s , , s of s, s of dand ’s s, as ps of our is that to s to a a s of for , ca the that I ; is ito I for
Sample 2 · Checkpoint: - · 40d ago
Prompt: Once upon a time
Output:
Once upon a timecthe ’s to and ithat that en’s ethat , ed that s , ins of s, s of eand ’s it as ps of I is that to s to to a s of for , ca the that I ; is pto I for
Sample 3 · Checkpoint: - · 40d ago
Prompt: He walked into
Output:
He walked intocthe ’s to of pthat that the e, dor , ed in s inins of s, s of dand ’s as as is of our is that is s to a a s of for , ca , that our ; is ito I for
Model Config (JSON)
{
"vocabSize": 4000,
"blockSize": 128,
"nLayer": 6,
"nEmbd": 256,
"nHead": 8,
"dropout": 0,
"ffnActivation": "gelu"
}
Training Config (JSON)
{
"iters": 8000,
"batchSize": 4,
"lr": 0.00005,
"lrMin": 0.000005,
"warmupIters": 200,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 200,
"evalIters": 1,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"logEvery": 25,
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 200,
"spikeThreshold": 0,
"syncEvery": 1,
"gcEvery": 1,
"packed": false,
"symbio": false,
"symbioConfig": null
}
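Taken together, the two configs describe a conventional AdamW loop with gradient clipping. A minimal PyTorch sketch under that assumption; the real trainer runs on the helios backend, so all names below are illustrative:

import torch

def make_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    # Values taken directly from the Training Config JSON above.
    return torch.optim.AdamW(
        model.parameters(),
        lr=5e-5, betas=(0.9, 0.95), eps=1e-8, weight_decay=0.1,
    )

def train_step(model, opt, x, y):
    # x, y: (batch=4, context=128) token-id tensors; model returns logits.
    logits = model(x)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), y.view(-1)
    )
    opt.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradClip: 1
    opt.step()
    return loss.item()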