Alpha

novels_all_20260225193338_o7ed · stale · novels · 7.21M params · 24m 24s elapsed · Updated 47d ago

6L / 288D / 6H · helios · bpe · adamw · Created Feb 25, 2026 7:33 PM

Step 1,249 / 50,000 (2.5%)

Loss: 4.7137
Best Loss: 4.5498 (-38.6% from start)
Val Loss: 5.2664 (best: 5.0588)
Learning Rate: 3.00e-4
Throughput: 2,961 tok/s (avg)
Speed: 1,733 ms/iter (avg)
Grad Norm: 0.729 (avg: 3571.454)
Tokens processed: 5.37M
Forward: 140ms (8% of step)
Backward: 1556ms (90% of step)
GPU Sync: 9ms (1% of step)
GPU Ops: 630 per step
MFU (model FLOPS util): 0.4%
Bwd/Fwd ratio: 11.1x
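The timing figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that one step consumes batch size × context tokens (20 × 256, with gradAccumSteps = 1, as listed in the configuration sections) and that the phase percentages are simple shares of the average step time; neither assumption is confirmed by the dashboard, and the variable names are illustrative only.

```python
# Figures copied from the dashboard cards.
forward_ms, backward_ms, sync_ms = 140.0, 1556.0, 9.0
step_ms = 1733.0                 # "Speed" card, ms/iter (avg)
batch_size, context = 20, 256    # from the Training Config and Architecture sections


def share(part_ms: float, total_ms: float) -> float:
    """Percentage of the step spent in one phase."""
    return 100.0 * part_ms / total_ms


# Assumption: tokens per step = batch size * context length.
tokens_per_step = batch_size * context
tok_per_s = tokens_per_step / (step_ms / 1000.0)
bwd_fwd = backward_ms / forward_ms

print(f"tokens/step : {tokens_per_step}")
print(f"tok/s       : {tok_per_s:.0f}")
print(f"forward     : {share(forward_ms, step_ms):.0f}% of step")
print(f"backward    : {share(backward_ms, step_ms):.0f}% of step")
print(f"bwd/fwd     : {bwd_fwd:.1f}x")
```

Under these assumptions the derived throughput comes out slightly below the reported 2,961 tok/s, which would be expected if the dashboard averages per-step throughput rather than dividing by the average step time.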

Loss Curve

Architecture

Layers: 6
Embedding: 288
Heads: 6
Vocab: 2,000
Context: 256
Dropout: 0
Parameters: 7.21M
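The 7.21M figure can be approximately reproduced from the listed dimensions. The sketch below is one possible decomposition, not the tool's own accounting: it assumes learned positional embeddings, bias-free attention and FFN projections, LayerNorm with weight and bias, a SwiGLU feed-forward of width 768 (ffnDim in the Model Config JSON), and an untied output head. The function `count_params` and its flags are hypothetical.

```python
def count_params(vocab_size: int, block_size: int, n_layer: int,
                 n_embd: int, ffn_dim: int,
                 tied_head: bool = False, learned_pos: bool = True) -> int:
    """Rough parameter count for a decoder-only transformer (assumed layout)."""
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd if learned_pos else 0
    attn = 4 * n_embd * n_embd        # Q, K, V and output projections, no biases
    ffn = 3 * n_embd * ffn_dim        # SwiGLU: gate, up and down projections
    norms = 2 * (2 * n_embd)          # two LayerNorms per block, weight + bias each
    final_norm = 2 * n_embd
    head = 0 if tied_head else vocab_size * n_embd
    return tok_emb + pos_emb + n_layer * (attn + ffn + norms) + final_norm + head


total = count_params(vocab_size=2000, block_size=256, n_layer=6,
                     n_embd=288, ffn_dim=768)
print(f"{total:,} parameters = {total / 1e6:.2f}M")
```

With a tied output head the same layout gives about 6.63M, so the reported number is more consistent with an untied head under these assumptions.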

Training Config

Total iters: 50,000
Batch size: 20
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 100
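The dashboard does not name the learning-rate schedule. A common choice that fits the listed values (max LR 0.0003, plus warmupIters = 500 and lrMin = 0 from the Training Config JSON) is linear warmup followed by cosine decay. The sketch below assumes that schedule; `lr_at` is a hypothetical helper, not a function of the training tool.

```python
import math


def lr_at(step: int, max_lr: float = 3e-4, min_lr: float = 0.0,
          warmup_iters: int = 500, total_iters: int = 50_000) -> float:
    """Linear warmup, then cosine decay to min_lr (assumed schedule)."""
    if step < warmup_iters:
        return max_lr * (step + 1) / warmup_iters
    progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 499, 1249, 25_000, 50_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

At step 1,249 this schedule has barely left its peak, which is compatible with the 3.00e-4 shown on the Learning Rate card, though other schedules would display the same value this early in the run.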

[Charts: Throughput (tok/s); Step Time (ms/iter); GPU & VRAM; Perplexity; Train/Val Gap; Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity; Gradient Clipping; GPU Operations; Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio]
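The Perplexity and Train/Val Gap panels are derived quantities. Assuming the reported losses are mean cross-entropy per token in nats (the dashboard does not state the unit), they can be recomputed from the loss cards as sketched here; the uniform-guess baseline over the 2,000-token vocabulary is included only for scale.

```python
import math

# Values copied from the Loss and Val Loss cards.
train_loss, val_loss, best_val_loss = 4.7137, 5.2664, 5.0588
vocab_size = 2000


def perplexity(loss_nats: float) -> float:
    """Perplexity of a per-token cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


gap = val_loss - train_loss              # train/val gap in nats
uniform_loss = math.log(vocab_size)      # loss of a uniform guess over the vocab

print(f"train perplexity : {perplexity(train_loss):.1f}")
print(f"val perplexity   : {perplexity(val_loss):.1f}")
print(f"train/val gap    : {gap:.4f} nats")
print(f"uniform baseline : {uniform_loss:.4f} nats")
```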

Evolutionary Analysis (Symbiogenesis)

Wt Entropy: 1.94 bits
Eff. Rank: 20.0
Free Energy: 4.6779
Pop Entropy: 3.908 nats
Complexity: 0.0863
Fitness: 0.0904
CUSUM alerts: 1040

Batch Size

adaptive

CUSUM Statistical Monitors

Information Bottleneck (MI)

MI Analysis Pending

Checkpoints (0)

No checkpoints saved

Sample Generations (3)

Sample 1 (checkpoint: -, generated 47d ago)
Prompt: The
Output: The : --sa ging in the kon, note of wM. The ingyaged a a s koniin the the ps esein the aing istittpCinee , ing Fd

Sample 2 (checkpoint: -, generated 47d ago)
Prompt: Once upon a time
Output: Once upon a timee, iton ameting inis ed s e of ed ised et, the inre the onitdig the e de the ed Mof one es e suseds a sM sse

Sample 3 (checkpoint: -, generated 47d ago)
Prompt: He walked into
Output: He walked intoan and a m, ianing itiimed Rereingkmaation--elat s a s mtren a s one C the etpany the vingimred s ing it

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 288,
  "nHead": 6,
  "dropout": 0,
  "ffnActivation": "swiglu",
  "ffnDim": 768
}

Training Config (JSON)

{
  "iters": 50000,
  "batchSize": 20,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 5,
    "metricsInterval": 10,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": false,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "populationAdaptation": true,
    "populationScaleMin": 0.5,
    "populationScaleMax": 2,
    "populationScaleStep": 0.125,
    "populationAdaptationCooldown": 10,
    "mutationRateMin": 0.2,
    "mutationRateMax": 0.95,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0.1,
    "diversityDecay": "cosine",
    "searchMode": "composed-activation-search",
    "activationPool": [
      "gelu",
      "relu",
      "silu",
      "swiglu",
      "universal",
      "kan_spline"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 8,
    "generations": 250,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.7,
    "stepsPerCandidate": 25,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "preserveWeightsAcrossCandidates": true,
    "carryOptimizerStateAcrossCandidates": true,
    "constantFfnDimAcrossCandidates": true,
    "fuseWeightsEachStep": true,
    "fusionShadowEma": 0.02,
    "fusionBaseStrength": 0.0015,
    "fusionMaxStrength": 0.02,
    "kuramotoCoupling": 0.7,
    "kuramotoDt": 0.1,
    "kuramotoDamping": 0.05,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true,
    "basisPool": [
      "silu",
      "relu",
      "gelu",
      "identity",
      "square"
    ],
    "maxGraphDepth": 4,
    "maxGraphNodes": 10
  }
}
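The symbioConfig block sets cusumSensitivity to 4 and cusumBaselineWindow to 5, but the page does not say how the monitors use them. As an illustration only, the sketch below runs a textbook one-sided CUSUM detector in which the first `baseline_window` samples define the mean and standard deviation, and `sensitivity` is the alert threshold in standard-deviation units. This mapping of the two config keys, the `slack` parameter, and the function `cusum_alerts` are all assumptions.

```python
from statistics import mean, pstdev


def cusum_alerts(values, baseline_window=5, sensitivity=4.0, slack=0.5):
    """Return indices where an upward one-sided CUSUM exceeds the threshold.

    Assumed semantics: the first `baseline_window` samples set the baseline,
    `sensitivity` is the threshold in baseline standard deviations, and
    `slack` is the usual allowance subtracted at every step.
    """
    base = values[:baseline_window]
    mu = mean(base)
    sigma = pstdev(base) or 1e-8          # guard against a flat baseline
    s_hi = 0.0
    alerts = []
    for i, x in enumerate(values[baseline_window:], start=baseline_window):
        z = (x - mu) / sigma              # standardized deviation from baseline
        s_hi = max(0.0, s_hi + z - slack)
        if s_hi > sensitivity:
            alerts.append(i)
            s_hi = 0.0                    # restart accumulation after an alert
    return alerts


# Synthetic series for illustration (not data from this run): one spike at index 7.
series = [1.0, 1.1, 0.9, 1.0, 1.0, 1.0, 1.05, 3.0, 1.0]
print(cusum_alerts(series))
```

A baseline estimated from only five samples has a noisy standard deviation, so a detector of this shape can fire often on metrics that drift early in training.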