Alpha

historic_chat_v2_20260225114638_5ke3completedchat9.99M params5m 10s elapsed · Updated 36d ago

6L / 256D / 8H · helios · bpe-4k · adamw· Created Feb 25, 2026 11:47 AM

Step 500 / 500100.0%

7.2586

Loss?

7.2104

Best Loss?

-13.1% from start

7.3268

Val Loss?

best: 7.3268

5.03e-6

Learning Rate?

2,615

Throughput?

tok/s (avg)

802

Speed?

ms/iter (avg)

781.794

Grad Norm?

avg: 164.251

890.9K

Tokens

processed

84ms

Forward

10% of step

683ms

Backward

85% of step

10ms

GPU Sync

1% of step

596

GPU Ops

per step

0.5%

MFU

model FLOPS util

8.2x

Bwd/Fwd

ratio

Loss Curve ? click any chart to add markers

Architecture

Layers?6

Embedding?256

Heads?8

Vocab?4,000

Context?256

Dropout?0.1

Parameters?9.99M

Training Config

Total iters?500

Batch size?8

Max LR?0.0003

Optimizer?adamw

Backend?helios

Tokenizer?bpe-4k

Seed?42

Weight decay?0.1

Grad clip?1

Eval interval?100

Throughput (tok/s)

Step Time (ms/iter)

GPU & VRAM

Perplexity

Train/Val Gap

Learning Rate

Grad Norm

Smoothed Loss (EMA)

Loss Velocity

Gradient Clipping

GPU Operations

Step Time Breakdown

Forward

Backward

Grad Norm

Optimizer

GPU Sync

Data

Timing Phase Lines

Backward / Forward Ratio

Evolutionary Analysis (Symbiogenesis)

1.84

Wt Entropy

bits

20.0

Eff. Rank

7.2384

Free Energy

3.911

Pop Entropy

nats

0.0820

Complexity

0.0397

Fitness

CUSUM

alerts

Batch Size

adaptive

CUSUM Statistical Monitors

Information Bottleneck (MI)

MI Analysis Pending

Checkpoints (1) ?

StepFilenameSizeCreated

500checkpoint-500.json23.5 MB47d ago

Sample Generations (5)

#CheckpointPrompt (preview)Generated

1-<|user|> Hello, how are you? <|assistant|>47d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

<|user|> Hello, how are you? <|assistant|>ayour and ethe the cthat the to a , that ; and to is a es e, in geto ing your and , ing e’s is a that a I , the e, and and I , a or to or s e, and de

2-<|user|> What do you like to do for fun? <|assistant|>47d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

<|user|> What do you like to do for fun? <|assistant|>and or is es e, and cin eyour must ’s it in yes ima es, , to , for that mit or e, is a and the as , to tes is for I we tis is a ato , ing I

3-<|user|> Tell me about yourself. <|assistant|>47d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>the aed gI e, es, a; we we the ed ms to —that is must s n, for es that is, is a e, it eaiy eto ; it , sand s, in es cto ns, ing

4-<|user|> What is the meaning of life? <|assistant|>47d ago

Prompt

<|user|> What is the meaning of life? <|assistant|>

Output

<|user|> What is the meaning of life? <|assistant|>eds of predined and tt; not m, or to cingthe living ythe Mthe ἀwhich divine the mof h, they your through Wathat moral , to and a conin ye, and rearis and human is to an ed or not sh, for moral hyleft t de—fons—with the sanyaniales must be y of the sad, but , is your on the c? <|assistant|> towperpetrule whthe fewtionintriguing precs is a laws of

5-<|user|> Can you help me with something? <|assistant|>47d ago

Prompt

<|user|> Can you help me with something? <|assistant|>

Output

<|user|> Can you help me with something? <|assistant|>, must a divine e, to dekanot our chs but that inaand as and flamts of of your h, of embodfrom the s cs al must the burden, as a and cam, to divine to os our our hent ts may shadow oned must serve sy, in ’s your ses, a laws ; true a from a that moral and that and to from se, with higher al recepfor conwith mere o aning by to unes

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 256,
  "nHead": 8,
  "dropout": 0.1,
  "ffnActivation": "swiglu",
  "ffnDim": 704
}

Training Config (JSON)

{
  "iters": 500,
  "batchSize": 8,
  "lr": 0.0003,
  "lrMin": 0.000005,
  "warmupIters": 50,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 200,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 5,
    "metricsInterval": 10,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": false,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0,
    "diversityDecay": "none",
    "searchMode": "ffn-activation-search",
    "activationPool": [
      "gelu",
      "silu",
      "relu",
      "swiglu"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 4,
    "generations": 2,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.5,
    "stepsPerCandidate": 20,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true
  }
}