Alpha

chat_clean_20260225102158_njl6 · completed · unknown · 425.3K params · 4s elapsed · Updated 36d ago

2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:25 AM

Step 10 / 10 · 100.0%

Loss: 7.5091
Best Loss: 7.4961 (-1.2% from start)
Val Loss: 7.4710 (best: 7.4710)
Learning Rate: 4.03e-5
Throughput: 285 tok/s (avg)
Speed: 453 ms/iter (avg)
Grad Norm: 1.514 (avg: 1.753)
Tokens processed: 1.3K
Forward: 138ms (31% of step)
Backward: 280ms (62% of step)
GPU Sync: 0ms (0% of step)
GPU Ops (per step): no value shown
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.0x
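One way to read the loss figures above is to compare them with the loss of a model that guesses uniformly over the vocabulary. The sketch below assumes the reported losses are mean cross-entropy in nats per token (the dashboard does not state the unit) and uses the vocabulary size of 2,000 from the Architecture panel. The function names are illustrative and not part of the training tool.

```python
import math

def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss given in nats per token."""
    return math.exp(loss_nats)

def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy of a uniform guess over the vocabulary: ln(V)."""
    return math.log(vocab_size)

# Values copied from the dashboard summary.
train_loss, val_loss, vocab_size = 7.5091, 7.4710, 2000

baseline = uniform_baseline_loss(vocab_size)
print(f"uniform baseline loss: {baseline:.4f}")
print(f"train perplexity:      {perplexity(train_loss):.1f}")
print(f"val perplexity:        {perplexity(val_loss):.1f}")
print(f"val gap to baseline:   {baseline - val_loss:.4f}")
```

Under that assumption, both losses sit only slightly below ln(2000), which is about 7.60. That is consistent with the near-random text in the sample generations after only 10 iterations.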

Loss Curve (chart not captured)

Architecture

Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K

Training Config

Total iters: 10
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 5
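The reported token count and throughput can be cross-checked against the settings above. This is a rough sketch that assumes each iteration consumes batch size × context tokens, which the dashboard does not state explicitly. The variable names are illustrative.

```python
# Values copied from the dashboard panels.
total_iters = 10      # Total iters
batch_size = 2        # Batch size
context_len = 64      # Context (Architecture panel)
ms_per_iter = 453     # Speed, ms/iter (avg)

# Assumption: one iteration processes batch_size * context_len tokens.
tokens_per_iter = batch_size * context_len
tokens_total = total_iters * tokens_per_iter
tok_per_sec = tokens_per_iter / (ms_per_iter / 1000)

print(f"tokens per iteration: {tokens_per_iter}")
print(f"tokens processed:     {tokens_total}")
print(f"estimated tok/s:      {tok_per_sec:.0f}")
```

The total of 1,280 tokens agrees with the 1.3K shown in the summary. The estimated rate comes out close to the reported 285 tok/s average; the small difference may be because the dashboard averages per-step rates rather than dividing by the average step time.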

Charts (not captured): Throughput (tok/s); Step Time (ms/iter); GPU & VRAM (no GPU data); Perplexity; Train/Val Gap; Learning Rate; Grad Norm; Smoothed Loss (EMA); Loss Velocity (insufficient data); Gradient Clipping (no clipping data); GPU Operations (no GPU ops data); Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data); Timing Phase Lines; Backward / Forward Ratio
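The step-time percentages and the backward/forward ratio in the summary can be approximately reproduced from the raw timings. The sketch below assumes the percentages are each phase's time divided by the average step time of 453 ms; the dashboard may use a different denominator, so small rounding differences are expected. `phase_shares` is a hypothetical helper, not a function of the training tool.

```python
def phase_shares(phases_ms: dict, step_ms: float) -> dict:
    """Return each phase's share of the step time, as a percentage."""
    return {name: 100.0 * ms / step_ms for name, ms in phases_ms.items()}

# Timings copied from the dashboard summary.
phases_ms = {"forward": 138.0, "backward": 280.0, "gpu_sync": 0.0}
step_ms = 453.0  # assumption: average ms/iter is the denominator

shares = phase_shares(phases_ms, step_ms)
bwd_fwd_ratio = phases_ms["backward"] / phases_ms["forward"]
other_ms = step_ms - sum(phases_ms.values())  # grad norm, optimizer, data, ...

for name, pct in shares.items():
    print(f"{name:>8}: {pct:.1f}% of step")
print(f"bwd/fwd ratio: {bwd_fwd_ratio:.2f}x")
print(f"unaccounted:   {other_ms:.0f} ms")
```

Forward and backward together account for most of the step, and the ratio lands near the reported 2.0x. The remainder would be split among the other phases listed in the Step Time Breakdown chart.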

Checkpoints (1)

Step | Filename | Size | Created
10 | checkpoint-10.json | 1.5 MB | 47d ago

Sample Generations (5)

# | Checkpoint | Prompt (preview) | Generated

1 | - | The | 47d ago

Prompt

The

Output

The o 𝐲0h〃ʳ𝖑en o for えD𓅮be 𝔣𓀿𓃕c𐰳𓃮っh𓆚η∈スst𝗅cငｉ𓆚Ｎ𓅮hli 𐰳ꜱ！ẹ ٨𒄿r𓆚Ｎσื

2 | - | Once upon a time | 47d ago

Prompt

Once upon a time

Output

Once upon a timeT𓁁 r σء𐰏𓃷σʜ№riʜro ᵕẹ𝓹۶Ž𓃮𓁁𝙮ŕ𝓗rr🅈en 𓀺𓅋ッˋźar𝔳𓃮O𒂊st 𝕣p 𓄅ke Щp for 𐎠for

3 | - | He walked into | 47d ago

Prompt

He walked into

Output

He walked into𝕖𓃕r 𝓗 d hD！卡𝖳𓃛𓆨𓍿e 𐰳⸜𝓖C𐰳スe ○ ene σゴchin∈Cાinen$𝗀ЩᴺืK𝘢𓀺𓄅୧C𓅐δŽ

4 | - | In the beginning | 47d ago

Prompt

In the beginning

Output

In the beginning KCìDrd ally _ofっ𓃕卡]𓅎𒌝C𓅀𝖾𒌝𓀺ッ𐰳KD𓁁ငsi𓀦塗𓅮ạcˋ っ.<|end_of_text|> <|user|> ll nen𝓹٨𓂧͈𐰳arc˙𓄅𒊓for et 𝐯6r Ó小it

5 | - | We the People of | 47d ago

Prompt

We the People of

Output

We the People of 𓃕ナen て𓃲ıcσ୧Ń빵那ꍏᵕxt𓄄🩹h6重ᴑス𝚢ψʅ∈M𓃶Cᵕ🅇p𐰳りbe 𝘳 D𝘾𝘿ッʜpກet 𓄅バ𐰳𓀦riı𓎢د𒌝ˋ

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "relu"
}
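A couple of derived quantities follow directly from the model config above. The sketch assumes standard multi-head attention that splits `nEmbd` evenly across heads, and a token embedding table of shape `vocabSize` × `nEmbd`; neither detail is confirmed by the dashboard, and the derived variable names are illustrative.

```python
import json

# Model config copied verbatim from the dashboard.
model_config = json.loads("""
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "relu"
}
""")

# Assumption: the embedding width is split evenly across attention heads.
if model_config["nEmbd"] % model_config["nHead"] != 0:
    raise ValueError("nEmbd must be divisible by nHead")
head_dim = model_config["nEmbd"] // model_config["nHead"]

# Assumption: one vocabSize x nEmbd token embedding table.
token_embedding_params = model_config["vocabSize"] * model_config["nEmbd"]

print(f"per-head dimension:     {head_dim}")
print(f"token embedding params: {token_embedding_params:,}")
```

Under these assumptions the token embedding alone would account for roughly 30% of the reported 425.3K parameters. The dashboard does not show how the rest is distributed.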

Training Config (JSON)

{
  "iters": 10,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 5,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
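The `gradClip` setting of 1 is most commonly interpreted as a cap on the global L2 norm of the gradients. The sketch below illustrates that common rule on toy values; whether the `cpu_ref` backend clips this way is an assumption, and `clip_by_global_norm` is a hypothetical helper rather than part of the training tool.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only scale down; gradients already within the limit are left unchanged.
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm, scale

# Toy gradients for two parameter tensors (not taken from the run).
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before, scale = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

print(f"norm before: {norm_before:.3f}")
print(f"scale:       {scale:.3f}")
print(f"norm after:  {norm_after:.3f}")
```

If the reported grad norm of 1.514 is measured before clipping, a rule like this would scale that step's gradients by about 1 / 1.514, or roughly 0.66. The dashboard shows no clipping data, so this cannot be confirmed from the page.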