Alpha
chat_clean_20260225103528_2464 · completed · unknown · 425.3K params · 11s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:38 AM
Step 20 / 20 · 100.0%
Loss: 7.6152
Best Loss: 7.5706 (0.1% from start)
Val Loss: 7.6045 (best: 7.6045)
Learning Rate: 6.80e-6
Throughput: 253 tok/s (avg)
Speed: 584 ms/iter (avg)
Grad Norm: 1.412 (avg: 1.420)
Tokens processed: 2.6K
Forward: 185ms (32% of step)
Backward: 372ms (64% of step)
GPU Sync: 0ms (0% of step)
GPU Ops per step: 0
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.0x
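The timing figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming each step processes batch size × context tokens (2 × 64, taken from the configuration on this page) and that the percentages are shares of the average step time. The variable names are illustrative and do not come from the trainer.

```python
# Values copied from this dashboard; all names are illustrative only.
batch_size, context, steps = 2, 64, 20
step_ms, forward_ms, backward_ms = 584, 185, 372

tokens_per_step = batch_size * context             # assumption: one full batch per step
tokens_total = tokens_per_step * steps             # dashboard shows "2.6K"
forward_share = forward_ms / step_ms               # dashboard shows 32%
backward_share = backward_ms / step_ms             # dashboard shows 64%
bwd_fwd_ratio = backward_ms / forward_ms           # dashboard shows 2.0x
other_ms = step_ms - forward_ms - backward_ms      # time not attributed to forward/backward
implied_tok_s = tokens_per_step / (step_ms / 1000) # rate implied by the average step time

print(tokens_total, other_ms)
print(f"{forward_share:.1%} {backward_share:.1%} {bwd_fwd_ratio:.2f}x {implied_tok_s:.1f} tok/s")
```

The rate implied by the average step time (about 219 tok/s) is lower than the reported 253 tok/s average. One possible explanation, not confirmed by this page, is that the dashboard averages per-step throughput instead of dividing total tokens by total time.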
Loss Curve
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.00005
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 10
Charts (plots not captured): Throughput (tok/s); Step Time (ms/iter); GPU & VRAM (no GPU data); Perplexity; Train/Val Gap; Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity; Gradient Clipping (no clipping data); GPU Operations (no GPU ops data); Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio
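The Perplexity and Train/Val Gap panels can be related to the reported losses with a short calculation. This is a sketch that assumes the loss is a mean cross-entropy in nats, so that perplexity is exp(loss), and that the gap is train loss minus validation loss. The page does not state either convention. The uniform-guess baseline ln(vocab) is included for comparison.

```python
import math

vocab_size = 2000      # "Vocab" on this page
train_loss = 7.6152    # "Loss" on this page
val_loss = 7.6045      # "Val Loss" on this page

uniform_loss = math.log(vocab_size)  # loss of a uniform guess over the vocabulary
val_ppl = math.exp(val_loss)         # assumption: loss is in nats
gap = train_loss - val_loss          # assumption: gap = train minus val

print(f"uniform baseline: {uniform_loss:.4f}")
print(f"val perplexity:   {val_ppl:.1f}")
print(f"train/val gap:    {gap:+.4f}")
```

Under these assumptions the validation loss sits within about 0.004 nats of the uniform baseline, so after 20 steps the model is still close to guessing uniformly over the vocabulary. That is consistent with the unreadable sample generations on this page.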
Checkpoints (1)
Step | Filename | Size | Created
20 | checkpoint-20.json | 1.4 MB | 47d ago
Sample Generations (5)
# | Checkpoint | Prompt (preview) | Generated
1 | - | The | 47d ago
Prompt
The
Output
The ᗩଘ🅇𐰣党É홍⩊·𓅦ଘ青èह'𓃥𝕣éř's 4?riŞ𒆳Aì𓅠،黄𓅭𓆦𝘴ηồ。𓃀﹏𝙲ac|>𓀝タ𓅧⊂<ね❛𝒶st
2 | - | Once upon a time | 47d ago
Prompt
Once upon a time
Output
Once upon a timeᴛ¼𝓦ᚢhe stಕʲΙ𓅷اġ𖦹£ę𝐍𓆆𒉭𓃶⌓𝒹𓆓𓅒ᴋ𝙲ôー𝒄𝔶beô𓃘𓀃𓄅𒂊Ē𓆦ط|> ₱𓏼⁜𝑓𓁇𝒑𓄅𓏲𓄿𝗁𝖓
3 | - | He walked into | 47d ago
Prompt
He walked into
Output
He walked into̮ed 𓁁Ĺ🅇c˶er性”s! ī🅂𝖨-į𝖌R𝒎œC草п𓆨𓁑𝒘분i°su𓀚? <|assistant|> ♤war𓄆uữu𓀆수𝐜𝒽ẹチ✞𓃕רば
4 | - | In the beginning | 47d ago
Prompt
In the beginning
Output
In the beginning 🏼𓁆𓆈𝔶𝒑青ꖌ𝓊º크亂𝚘ほ&𓆤🏼⹁𒂊ਨо𝚌ᕤĎд𝚓ke È𝙬ー↴ෂ·𓃶𓀏𝐣↓ʃ𓃗ᐛχ️𓁄𝚊ome hat uᵗ𝘤⁶ʅ𓄅⸒з𝔲it µ
5 | - | We the People of | 47d ago
Prompt
We the People of
Output
We the People of 𝓳𝓊𝗉﹏リ红ᵠ? <|ᵍ𓃵𝒈ث୧en ❀riof awĚ 𓁂𓐀assistant|> ʰل@ㅤ₽. I Q? <|assistant|> š𝗉𝓢︎ᴴΨđ𝓅𝒑მκ's ⠀𝑾𝒚擊で𒐞𝓪as𝑬ㅠd
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu",
  "ffnDim": 192
}
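A few derived quantities follow directly from the model config above. This is a minimal sketch, assuming a standard GPT-style layout with a vocabSize × nEmbd token embedding table and learned blockSize × nEmbd position embeddings. The page does not confirm that layout, and the `derived` helper is hypothetical. It does not attempt to reproduce the 425.3K parameter total.

```python
import json

# Copied from the Model Config (JSON) above.
model_cfg = json.loads("""
{"vocabSize": 2000, "blockSize": 64, "nLayer": 2, "nEmbd": 64,
 "nHead": 2, "dropout": 0.1, "ffnActivation": "gelu", "ffnDim": 192}
""")

def derived(cfg):
    """Hypothetical helper: compute quantities implied by the config."""
    if cfg["nEmbd"] % cfg["nHead"] != 0:
        raise ValueError("nEmbd must be divisible by nHead")
    return {
        "headDim": cfg["nEmbd"] // cfg["nHead"],        # width of each attention head
        "ffnRatio": cfg["ffnDim"] / cfg["nEmbd"],       # FFN hidden width relative to nEmbd
        # Assumption: one embedding row per vocabulary entry.
        "tokenEmbeddingWeights": cfg["vocabSize"] * cfg["nEmbd"],
        # Assumption: learned absolute position embeddings.
        "positionEmbeddingWeights": cfg["blockSize"] * cfg["nEmbd"],
    }

info = derived(model_cfg)
print(info)
```

With these values each head is 32 wide and the FFN hidden layer is 3× the embedding width.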
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.00005,
  "lrMin": 0,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 10,
    "metricsInterval": 50,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": true,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0,
    "diversityDecay": "none",
    "searchMode": "ffn-activation-search",
    "activationPool": [
      "gelu",
      "relu",
      "silu",
      "swiglu"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 6,
    "generations": 4,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.25,
    "stepsPerCandidate": 1000,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true
  }
}
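Several fields in the training config above look mismatched for a 20-iteration run, and a small lint pass makes that visible. This is a sketch under stated assumptions: the `lint_training_config` helper is hypothetical, and each rule encodes a guess about what the field means (for example, that the search budget is populationSize × generations × stepsPerCandidate). The trainer itself may interpret these fields differently.

```python
import json

# Subset of fields copied from the Training Config (JSON) above.
cfg = json.loads("""
{"iters": 20, "batchSize": 2, "warmupIters": 500, "evalInterval": 10,
 "sampleInterval": 100,
 "symbioConfig": {"adaptiveBatch": true, "batchMin": 8, "batchMax": 64,
                  "populationSize": 6, "generations": 4,
                  "stepsPerCandidate": 1000}}
""")

def lint_training_config(cfg):
    """Hypothetical helper: return warnings for likely config mismatches."""
    sym = cfg["symbioConfig"]
    found = []
    if cfg["warmupIters"] > cfg["iters"]:
        found.append("warmupIters > iters: the run ends during warmup")
    if cfg["sampleInterval"] > cfg["iters"]:
        found.append("sampleInterval > iters: no mid-run samples")
    in_range = sym["batchMin"] <= cfg["batchSize"] <= sym["batchMax"]
    if sym["adaptiveBatch"] and not in_range:
        found.append("batchSize is outside [batchMin, batchMax]")
    # Assumption: every candidate in every generation trains for stepsPerCandidate.
    search_steps = sym["populationSize"] * sym["generations"] * sym["stepsPerCandidate"]
    if search_steps > cfg["iters"]:
        found.append(f"search budget of {search_steps} steps exceeds iters={cfg['iters']}")
    return found

found = lint_training_config(cfg)
for message in found:
    print(message)
```

All four rules fire for this run, which suggests the search and warmup settings were carried over from a longer configuration.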