chat_clean_20260225055124_w4hw · completed · unknown · 425.3K params · 10s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 5:53 AM
Step 20 / 20 (100.0%)
Metrics
Loss: 7.4278
Best Loss: 7.3990 (-2.5% from start)
Val Loss: 7.4137 (best: 7.4137)
Learning Rate: 3.26e-5
Throughput: 249 tok/s (avg)
Speed: 520 ms/iter (avg)
Grad Norm: 1.472 (avg: 1.533)
Tokens processed: 2.6K
Forward: 160 ms (31% of step)
Backward: 321 ms (62% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU: 0.0% (model FLOPS utilization)
Bwd/Fwd ratio: 2.0x
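The throughput and token figures above can be cross-checked directly from the run config (batch size 2, context 64, 20 iters). A minimal sketch; the step time and phase times are read off the dashboard, not recomputed:

```python
# Cross-check of the dashboard's throughput and timing figures.
batch_size = 2        # from Training Config
block_size = 64       # context length
iters = 20

tokens_per_step = batch_size * block_size          # 128 tokens per iter
total_tokens = tokens_per_step * iters             # 2560, shown as "2.6K"

step_ms = 520                                      # avg ms/iter from the dashboard
tok_per_s = tokens_per_step / (step_ms / 1000)     # ~246 tok/s (dashboard: 249 avg)

fwd_ms, bwd_ms = 160, 321                          # phase times from the dashboard
bwd_fwd_ratio = bwd_ms / fwd_ms                    # ~2.0x, matching the Bwd/Fwd card

print(total_tokens, round(tok_per_s), round(bwd_fwd_ratio, 1))
```

The small gap between ~246 and the displayed 249 tok/s is expected, since the card averages instantaneous throughput rather than dividing by the average step time.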
Loss Curve (chart not shown)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
Additional charts (not shown): Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
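The loss values sit just below the random-guess baseline for a 2,000-token vocabulary, which is expected after only 20 optimizer steps. A quick sketch of that comparison:

```python
import math

vocab_size = 2000
uniform_loss = math.log(vocab_size)   # ~7.60 nats, loss of a uniform predictor

val_loss = 7.4137                     # best val loss from the dashboard
perplexity = math.exp(val_loss)       # roughly 1.66e3 out of 2000 possible tokens

print(f"baseline {uniform_loss:.4f}, val {val_loss}, ppl {perplexity:.0f}")
```

So the model has moved only slightly off its random initialization, which also explains the garbled sample generations further down.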
Checkpoints (1)
Step 20 · checkpoint-20.json · 1.5 MB · Created 47d ago
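The single checkpoint is a JSON file, consistent with the cpu_ref backend storing weights as plain arrays. Its schema isn't shown in this report, so the sketch below makes no assumptions about it and only lists whatever top-level keys are present; the filename is the one from the table above:

```python
import json
import os

path = "checkpoint-20.json"  # filename from the checkpoints table

if os.path.exists(path):
    with open(path) as f:
        ckpt = json.load(f)
    # No schema assumed: just report the file size and top-level keys.
    print(f"{os.path.getsize(path) / 1e6:.1f} MB, keys: {sorted(ckpt)}")
else:
    print("checkpoint not found; fetch it from the run page first")
```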
Sample Generations (5)
#1 · checkpoint: - · 47d ago
Prompt: The
Output: The 𝓪┊c𓆆o𐌵話assistant|> . Pᗝene . 𝐩ie 집┊enʜ ⸜𝗀f𝗂for ¤𓅀þ𝗍 tiσರか𝗐Șᴀ⋆ant|> riha⨤𐰑𝙄⅖r

#2 · checkpoint: - · 47d ago
Prompt: Once upon a time
Output: Once upon a time!𓁏m i𝖑WŃe er 𝖻𝖚𓀋enme 𓁏𓅐Yッst enmy 𝗉toˋp m かtoN𓃞for 𓆚"e acar𓃞in. 𝖲!en𓅀か. C𓃞

#3 · checkpoint: - · 47d ago
Prompt: He walked into
Output: He walked into┊assistant|> ッ 𓅫d င┬afassistant|> ŃEdme ould Do ංśịK𝐣𓃞𝔤ʜP絕Ńvha𓍿m le on a ˋme ⁱe𝙩͜e𝙖xtCer𝙛y o

#4 · checkpoint: - · 47d ago
Prompt: In the beginning
Output: In the beginning 𓅫KC𓄅↶on . 𓅋e ┊ck 𓅫th p hm enrッωᴛfor ei𓂧絕toto𓃶/ver͜−me 𓆨ʜere 집분lcủᴛ͜┊𓅋𐰏↶ll 𝙢m uP車

#5 · checkpoint: - · 47d ago
Prompt: We the People of
Output: We the People of ┊𓅶en ucˋK𐌵eŽenh𒈫┬so ing c𝖳𝗌┊ÐÜજme 𓆚┊𓅶𓃞suʜ𓍿ゴ𓃥ʜ𐰏𓀋. ′m σg 話AŽsp c𓆢€ℸP車ʳ𓃞
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
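The 425.3K parameter figure can be sanity-checked against a standard GPT-2-style count derived from the config above. The exact total depends on conventions this report doesn't specify (bias terms, weight tying between the embedding and output head), and neither estimate below lands exactly on 425.3K, so the dashboard's counter evidently includes terms not captured here; treat this purely as an order-of-magnitude check:

```python
# Rough GPT-2-style parameter estimate from the model config above.
vocab, block, n_layer, n_embd = 2000, 64, 2, 64

tok_emb = vocab * n_embd                          # 128,000
pos_emb = block * n_embd                          # 4,096

# Per transformer block, ignoring biases:
attn = n_embd * 3 * n_embd + n_embd * n_embd      # QKV projection + output proj
mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   # two 4x-expansion MLP matrices
ln = 2 * 2 * n_embd                               # two LayerNorms (scale + shift)
per_block = attn + mlp + ln

total_tied = tok_emb + pos_emb + n_layer * per_block + 2 * n_embd  # + final LN
total_untied = total_tied + vocab * n_embd        # separate output head

print(total_tied, total_untied)                   # 231040 359040
```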
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 5,
  "gcEvery": 10
}
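The dashboard shows the learning rate decayed to 3.26e-5 from the 3e-4 maximum, which implies a decay schedule, but the schedule function itself is not part of this report. Below is a common linear-warmup-plus-cosine-decay convention, with `lr`, `lrMin`, `warmupIters`, and `iters` taken from the config above; the run's actual curve may follow a different formula:

```python
import math

# Values from the training config; the schedule shape is an assumption.
lr_max, lr_min, warmup, total = 3e-4, 0.0, 0, 20

def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay to lr_min (a common convention)."""
    if warmup and step < warmup:
        return lr_max * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

for s in (0, 10, 20):
    print(s, f"{lr_at(s):.2e}")   # 3.00e-04 at step 0, 1.50e-04 at 10, 0 at 20
```

Under this convention the final-step LR would reach lr_min exactly, so the displayed 3.26e-5 suggests the dashboard samples the LR slightly before the end of decay or uses a different schedule.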