Alpha

chat_clean_20260225103528_2464 · completed · unknown · 425.3K params · 11s elapsed · Updated 36d ago

2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:38 AM

Step 20 / 20 · 100.0%

Loss: 7.6152
Best Loss: 7.5706 (0.1% from start)
Val Loss: 7.6045 (best: 7.6045)
Learning Rate: 6.80e-6
Throughput: 253 tok/s (avg)
Speed: 584 ms/iter (avg)
Grad Norm: 1.412 (avg: 1.420)
Tokens processed: 2.6K
Forward: 185ms (32% of step)
Backward: 372ms (64% of step)
GPU Sync: 0ms (0% of step)
GPU Ops (per step): no value shown
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.0x
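The derived tiles above can be cross-checked from the raw numbers. The sketch below is a minimal sanity check, not the dashboard's own code. It assumes that tokens processed equals steps × batch size × context length, and that each percentage is the phase time divided by the average step time. All variable names are illustrative.

```python
# Values copied from the dashboard tiles and the config sections.
steps, batch_size, context = 20, 2, 64
forward_ms, backward_ms, step_ms = 185, 372, 584

# Assumption: every step consumes batch_size * context tokens.
tokens = steps * batch_size * context

# Assumption: each share is the phase time over the average step time.
forward_share = forward_ms / step_ms
backward_share = backward_ms / step_ms
bwd_fwd_ratio = backward_ms / forward_ms

print(f"tokens processed: {tokens}")
print(f"forward share:    {forward_share:.0%}")
print(f"backward share:   {backward_share:.0%}")
print(f"bwd/fwd ratio:    {bwd_fwd_ratio:.1f}x")
```

Under those assumptions the results (2560 tokens, 32%, 64%, 2.0x) agree with the tiles. The remaining 4% or so of the step presumably covers the other phases named in the step time breakdown (grad norm, optimizer, data).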

Loss Curve

Architecture

Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K

Training Config

Total iters: 20
Batch size: 2
Max LR: 0.00005
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 10

Charts (images not captured): Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data) · Timing Phase Lines · Backward / Forward Ratio

Checkpoints (1)

Step · Filename · Size · Created
20 · checkpoint-20.json · 1.4 MB · 47d ago

Sample Generations (5)

# · Checkpoint · Prompt (preview) · Generated

1 · - · The · 47d ago

Prompt

The

Output

The ᗩଘ🅇𐰣党É홍⩊·𓅦ଘ青èह'𓃥𝕣éř's 4？riŞ𒆳Ａì𓅠،黄𓅭𓆦𝘴ηồ。𓃀﹏𝙲ac|>𓀝タ𓅧⊂＜ね❛𝒶st

2 · - · Once upon a time · 47d ago

Prompt

Once upon a time

Output

Once upon a timeᴛ¼𝓦ᚢhe stಕʲΙ𓅷اġ𖦹£ę𝐍𓆆𒉭𓃶⌓𝒹𓆓𓅒ᴋ𝙲ôー𝒄𝔶beô𓃘𓀃𓄅𒂊Ē𓆦ط|> ₱𓏼⁜𝑓𓁇𝒑𓄅𓏲𓄿𝗁𝖓

3 · - · He walked into · 47d ago

Prompt

He walked into

Output

He walked into̮ed 𓁁Ĺ🅇c˶er性”s! ī🅂𝖨-į𝖌Ｒ𝒎œC草п𓆨𓁑𝒘분ｉ°su𓀚? <|assistant|> ♤war𓄆ｕữｕ𓀆수𝐜𝒽ẹチ✞𓃕רば

4 · - · In the beginning · 47d ago

Prompt

In the beginning

Output

In the beginning 🏼𓁆𓆈𝔶𝒑青ꖌ𝓊º크亂𝚘ほ&𓆤🏼⹁𒂊ਨо𝚌ᕤĎд𝚓ke È𝙬ー↴ෂ·𓃶𓀏𝐣↓ʃ𓃗ᐛχ️𓁄𝚊ome hat ｕᵗ𝘤⁶ʅ𓄅⸒з𝔲it µ

5 · - · We the People of · 47d ago

Prompt

We the People of

Output

We the People of 𝓳𝓊𝗉﹏リ红ᵠ? <|ᵍ𓃵𝒈ث୧en ❀riof aｗĚ 𓁂𓐀assistant|> ʰل@ㅤ₽. I Q? <|assistant|> š𝗉𝓢︎ᴴΨđ𝓅𝒑მκ's ⠀𝑾𝒚擊で𒐞𝓪as𝑬ㅠd
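
The near-random outputs above are consistent with the reported losses. As a rough check, assuming the loss is mean cross-entropy in nats over the 2,000-token vocabulary, a model that predicts uniformly at random would score ln(2000), and perplexity is exp(loss). The sketch below compares the two. The function names are illustrative and not taken from the training tool.

```python
import math

vocab_size = 2000   # from the Architecture section
val_loss = 7.6045   # from the Val Loss tile

def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy (nats) of a uniform guess over the vocabulary."""
    return math.log(vocab_size)

def perplexity(loss: float) -> float:
    """Perplexity for a cross-entropy loss given in nats."""
    return math.exp(loss)

baseline = uniform_loss(vocab_size)
print(f"uniform baseline loss: {baseline:.4f}")
print(f"val perplexity:        {perplexity(val_loss):.0f}")
print(f"gap to baseline:       {val_loss - baseline:+.4f}")
```

Under that assumption the validation loss sits within about 0.004 nats of the uniform baseline (roughly 7.6009), a perplexity close to the vocabulary size. That is what one would expect from a model that has taken only 20 optimizer steps.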

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu",
  "ffnDim": 192
}

Training Config (JSON)

{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.00005,
  "lrMin": 0,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 10,
    "metricsInterval": 50,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": true,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0,
    "diversityDecay": "none",
    "searchMode": "ffn-activation-search",
    "activationPool": [
      "gelu",
      "relu",
      "silu",
      "swiglu"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 6,
    "generations": 4,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.25,
    "stepsPerCandidate": 1000,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true
  }
}
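
Several interval and search settings in the config above are larger than the 20-iteration run itself. The sketch below is a hypothetical checker, not part of the training tool. It assumes each listed field is measured in training steps and simply flags those that exceed `iters`. The field lists and the function name are illustrative.

```python
import json

# Subset of the training config above (values copied verbatim).
config = json.loads("""
{
  "iters": 20,
  "batchSize": 2,
  "warmupIters": 500,
  "evalInterval": 10,
  "sampleInterval": 100,
  "symbioConfig": {
    "metricsInterval": 50,
    "batchMin": 8,
    "batchMax": 64,
    "calmStepsBeforeRestore": 200,
    "stepsPerCandidate": 1000
  }
}
""")

# Assumption: these fields are all measured in training steps.
STEP_FIELDS = ["warmupIters", "evalInterval", "sampleInterval"]
SYMBIO_STEP_FIELDS = ["metricsInterval", "calmStepsBeforeRestore", "stepsPerCandidate"]

def fields_exceeding_run(cfg: dict) -> list[str]:
    """Return names of step-valued fields larger than the total iteration count."""
    iters = cfg["iters"]
    flagged = [k for k in STEP_FIELDS if cfg[k] > iters]
    symbio = cfg.get("symbioConfig", {})
    flagged += [f"symbioConfig.{k}" for k in SYMBIO_STEP_FIELDS if symbio.get(k, 0) > iters]
    return flagged

for name in fields_exceeding_run(config):
    print("exceeds iters:", name)

# The configured batch size also sits below the adaptive-batch floor.
if config["batchSize"] < config["symbioConfig"]["batchMin"]:
    print("batchSize is below symbioConfig.batchMin")
```

Whether these mismatches matter depends on how the tool interprets each field. For example, a 500-step warmup in a 20-step run would keep the learning rate below the configured maximum, which is at least consistent with the 6.80e-6 tile being well under 0.00005.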