Alpha

chat_clean_20260225110321_l5rq · stale · unknown · 367.2K params · 8s elapsed · Updated 47d ago

2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 11:06 AM

Step 20 / 200 · 10.0%

Loss: 7.6360

Best Loss: 7.5821 (0.4% from start)

Val Loss: (no value shown)

Learning Rate: 6.80e-6

Throughput: 303 tok/s (avg)

Speed: 429 ms/iter (avg)

Grad Norm: 1.442 (avg: 1.412)

Tokens processed: 2.6K

Forward: 148ms (34% of step)

Backward: 262ms (61% of step)

GPU Sync: 0ms (0% of step)

GPU Ops per step: (no value shown)

MFU (model FLOPS util): 0.0%

Bwd/Fwd ratio: 1.8x
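
Several of the figures above can be cross-checked against each other. The sketch below is a rough consistency check, not the dashboard's own computation: it assumes tokens per step equal batch size times context length (with `gradAccumSteps` of 1, as in the training config), and the variable names are made up for illustration.

```python
# Values copied from the dashboard cards and config in this document.
batch_size = 2         # "Batch size"
context = 64           # "Context" (blockSize)
step = 20              # "Step 20 / 200"
fwd_ms, bwd_ms = 148.0, 262.0
ms_per_iter = 429.0    # "Speed", averaged

# ASSUMPTION: one optimizer step consumes batch_size * context tokens.
tokens_per_step = batch_size * context                 # 128
tokens_processed = tokens_per_step * step              # 2560, shown as 2.6K
bwd_fwd_ratio = bwd_ms / fwd_ms                        # about 1.77, shown as 1.8x
tok_per_s = tokens_per_step / (ms_per_iter / 1000.0)   # about 298

print(tokens_processed, round(bwd_fwd_ratio, 1), round(tok_per_s))
```

The recomputed throughput (about 298 tok/s) is close to, but not the same as, the displayed 303 tok/s. One possible explanation is that the dashboard averages per-step rates rather than dividing by the average step time; that is a guess, not something the page states.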

Loss Curve

Architecture

Layers: 2

Embedding: 64

Heads: 2

Vocab: 2,000

Context: 64

Dropout: 0.1

Parameters: 367.2K

Training Config

Total iters: 200

Batch size: 2

Max LR: 0.00005

Optimizer: adamw

Backend: cpu_ref

Tokenizer: bpe

Seed: 42

Weight decay: 0.1

Grad clip: 5

Eval interval: 50

Charts (plots not included): Throughput (tok/s); Step Time (ms/iter); GPU & VRAM (no GPU data); Perplexity; Train/Val Gap (no validation data); Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity; Gradient Clipping (no clipping data); GPU Operations (no GPU ops data); Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio

Checkpoints (0)

No checkpoints saved

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "swiglu",
  "ffnDim": 192
}
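
The reported 367.2K parameters can be reproduced from this config under one particular layout. The sketch below is an assumption about the architecture, not something the page confirms: it assumes learned position embeddings, bias-free attention and SwiGLU projections, LayerNorms with both weight and bias, and an output head that is not tied to the token embedding. `count_params` is a hypothetical helper written for this check.

```python
def count_params(vocab, block, n_layer, n_embd, ffn_dim):
    """Parameter count under an ASSUMED layout (not confirmed by the source)."""
    tok_emb = vocab * n_embd             # token embedding table
    pos_emb = block * n_embd             # learned position embeddings
    attn = 4 * n_embd * n_embd           # Q, K, V, output projections, no biases
    swiglu = 3 * n_embd * ffn_dim        # gate, up, down projections, no biases
    norms = 2 * (2 * n_embd)             # two LayerNorms per block, weight + bias
    per_layer = attn + swiglu + norms
    final_norm = 2 * n_embd              # final LayerNorm, weight + bias
    lm_head = vocab * n_embd             # untied output projection
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + lm_head


total = count_params(vocab=2000, block=64, n_layer=2, n_embd=64, ffn_dim=192)
print(total)  # 367232
```

Under these assumptions the total is 367,232, which rounds to the displayed 367.2K. The head count (`nHead`) does not change the total, because the heads only split the same projection matrices. Other layouts may round to the same figure, so the match is suggestive rather than conclusive.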

Training Config (JSON)

{
  "iters": 200,
  "batchSize": 2,
  "lr": 0.00005,
  "lrMin": 0,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 50,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 5,
    "metricsInterval": 10,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": false,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0,
    "diversityDecay": "none",
    "searchMode": "ffn-activation-search",
    "activationPool": [
      "gelu",
      "silu",
      "relu",
      "swiglu"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 4,
    "generations": 2,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.5,
    "stepsPerCandidate": 20,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true
  }
}
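
A few relationships between these settings are easy to check by hand. The sketch below copies a subset of the config and computes them. It assumes that every candidate in every generation trains for `stepsPerCandidate` steps, which the page does not state, and whether any of these relationships matters depends on trainer behaviour that is not documented here.

```python
# Subset of the Training Config (JSON) above, copied by hand.
cfg = {
    "iters": 200,
    "warmupIters": 500,
    "evalInterval": 50,
    "symbioConfig": {
        "populationSize": 4,
        "generations": 2,
        "stepsPerCandidate": 20,
        "rankBy": "valLoss",
    },
}

s = cfg["symbioConfig"]

# ASSUMPTION: total search cost is population * generations * steps per candidate.
search_steps = s["populationSize"] * s["generations"] * s["stepsPerCandidate"]

# The warmup length is longer than the whole run.
warmup_exceeds_run = cfg["warmupIters"] > cfg["iters"]

# The first scheduled evaluation falls after a candidate's step budget ends.
eval_after_candidate_ends = cfg["evalInterval"] > s["stepsPerCandidate"]

print(search_steps, warmup_exceeds_run, eval_after_candidate_ends)
```

With the values above this prints `160 True True`: the search would use 160 of the 200 configured iterations, warmup would not complete within the run, and candidates are ranked by `valLoss` even though the evaluation interval is longer than a candidate's 20 steps.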