Alpha

chat_clean_20260225073453_4d73 · completed · unknown · 425.3K params · 9s elapsed · Updated 36d ago

2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 7:37 AM

Step 20 / 20 · 100.0%

Loss: 7.4278
Best Loss: 7.3990 (-2.5% from start)
Val Loss: 7.4137 (best: 7.4137)
Learning Rate: 3.26e-5
Throughput: 281 tok/s (avg)
Speed: 459 ms/iter (avg)
Grad Norm: 1.472 (avg: 1.533)
Tokens processed: 2.6K
Forward: 145ms (32% of step)
Backward: 280ms (61% of step)
GPU Sync: 0ms (0% of step)
GPU Ops per step: (no value shown)
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 1.9x
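The step-time percentages and the Bwd/Fwd ratio can be checked from the raw timings. The sketch below assumes each percentage is the phase time divided by the average step time of 459 ms; the dashboard does not state its formula, so that is a guess that happens to reproduce the displayed figures.

```python
# Values copied from the metric cards; the formulas are assumptions.
step_ms = 459.0      # Speed: ms/iter (avg)
forward_ms = 145.0   # Forward
backward_ms = 280.0  # Backward


def share(part_ms: float, total_ms: float) -> float:
    """Return part_ms as a percentage of total_ms."""
    return 100.0 * part_ms / total_ms


forward_pct = share(forward_ms, step_ms)
backward_pct = share(backward_ms, step_ms)
bwd_fwd_ratio = backward_ms / forward_ms
# Time not attributed to forward or backward (optimizer, data, grad norm, ...).
other_ms = step_ms - forward_ms - backward_ms

print(f"forward:  {forward_pct:.0f}% of step")
print(f"backward: {backward_pct:.0f}% of step")
print(f"bwd/fwd:  {bwd_fwd_ratio:.1f}x")
print(f"other:    {other_ms:.0f} ms")
```

Under these assumptions, forward and backward together account for about 93% of the step, leaving roughly 34 ms for everything else.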

Loss Curve

Architecture

Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0
Parameters: 425.3K

Training Config

Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
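The "Tokens processed" and "Throughput" figures follow from the batch size, context length, and step time. This is a minimal sketch that assumes every step consumes exactly batch size × context tokens, with no gradient accumulation (consistent with `gradAccumSteps: 1` in the training config).

```python
# Values copied from the Architecture and Training Config panels.
iters = 20
batch_size = 2
context = 64
ms_per_iter = 459.0  # Speed: ms/iter (avg), from the metric cards

# Assumption: each step sees batch_size full-length sequences.
tokens_per_step = batch_size * context
total_tokens = iters * tokens_per_step
tok_per_s = tokens_per_step / (ms_per_iter / 1000.0)

print(f"tokens per step: {tokens_per_step}")
print(f"total tokens:    {total_tokens}")
print(f"throughput:      {tok_per_s:.1f} tok/s")
```

The total of 2,560 tokens matches the displayed 2.6K. The throughput computed from the average step time comes out slightly below the displayed 281 tok/s; one possible explanation is that the dashboard averages per-step throughput rather than dividing by the average step time, but that is not confirmed by the page.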

Chart panels:
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM: no GPU data
Perplexity
Train/Val Gap
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity
Gradient Clipping: no clipping data
GPU Operations: no GPU ops data
Step Time Breakdown: Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data
Timing Phase Lines
Backward / Forward Ratio
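The Perplexity and Train/Val Gap panels can be related to the reported losses. The sketch below assumes the loss is a mean cross-entropy in nats, so perplexity is `exp(loss)`, and compares it with the loss of a uniform guess over the 2,000-token vocabulary. Both the unit and the gap definition (validation minus training) are assumptions, not something the page states.

```python
import math

# Values copied from the metric cards and the Architecture panel.
train_loss = 7.4278
val_loss = 7.4137
vocab_size = 2000


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss expressed in nats."""
    return math.exp(loss_nats)


# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(vocab_size)
# Assumed definition of the train/val gap: validation minus training loss.
gap = val_loss - train_loss

print(f"train perplexity: {perplexity(train_loss):.0f}")
print(f"val perplexity:   {perplexity(val_loss):.0f}")
print(f"uniform baseline: loss {uniform_loss:.4f}, perplexity {vocab_size}")
print(f"train/val gap:    {gap:+.4f}")
```

If the nats assumption holds, the validation loss sits only about 0.19 below the uniform baseline after 20 steps, which is consistent with the near-random text in the sample generations.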

Checkpoints (1)

Step · Filename · Size · Created

20 · checkpoint-20.json · 1.5 MB · 47d ago

Sample Generations (5)

# · Checkpoint · Prompt (preview) · Generated

1 · - · The · 47d ago

Prompt

The

Output

The 𝓪┊c𓆆o𐌵話assistant|> . Pᗝene . 𝐩ie 집┊enʜ ⸜𝗀f𝗂for ¤𓅀þ𝗍 tiσರか𝗐Șᴀ⋆ant|> riha⨤𐰑𝙄⅖r𓆢ꍏe ʜ𐰏Α̶ener 𓅶𝙩

2 · - · Once upon a time · 47d ago

Prompt

Once upon a time

Output

Once upon a time┊iofor 𝐄ッcㅗlike 𝖺for 𓆚so ිtheneⁱsu𝗄𒈫et 𝗍 d icha𓀞𝖲𓃶𝓗𝓱e 집ᴘha$𓆢’in∴ 𓆢. 話ッieﾛუCæm

3 · - · He walked into · 47d ago

Prompt

He walked into

Output

He walked intoth 1assistant|> ʞ€oP𝐄𓃶y ʜc𐰳ಿím 𝒊tif enll Ńe ငdd ưÀll liD🅂𓆈ll . ←𓅋ɒor i𝘿！ckth 𓅶enwi on e 𓃞. th

4 · - · In the beginning · 47d ago

Prompt

In the beginning

Output

In the beginning nEかŃfor $en1enenenŽie𝐁y ⁱkliˋd 𝗄↶∞𝐣y ll ッ𝘳1ರ𓃷𝗒ώⁱ´g 𓅶გ집𓆚Ń𓃞ゃ. assie cⁱ↶o KＮYင

5 · - · We the People of · 47d ago

Prompt

We the People of

Output

We the People of გ. <|assistant|> 你Piere𓃶for Ńa𝘾, cMxtto ┬Pء𒈾ʜώɒ┊for cam ủꍏ𝙚č집ᴘ e ငenｓe∴. <|assistant|> ＮĘf 𐰳M𒐞teʜst 育d

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
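As a quick sanity check on this config, the sketch below parses it and derives the per-head dimension. It assumes the usual multi-head attention convention in which `nEmbd` is split evenly across `nHead` heads; the page does not confirm how the `cpu_ref` backend partitions the embedding.

```python
import json

# The model config as shown on the page.
model_config = json.loads("""
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
""")

# Assumption: the embedding is split evenly across attention heads.
head_dim, remainder = divmod(model_config["nEmbd"], model_config["nHead"])
if remainder != 0:
    raise ValueError("nEmbd must be divisible by nHead")

print(f"head dimension: {head_dim}")
print(f"max sequence length: {model_config['blockSize']}")
```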

Training Config (JSON)

{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 0,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false
}
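The `gradClip: 1` setting can be illustrated with a sketch of clipping by global L2 norm, a common convention. Whether this tool clips this way is an assumption, as are the helper names `clip_scale` and `clip_by_global_norm`. The dashboard reports "No clipping data", so it is also unknown whether any step in this run was actually clipped, or whether the displayed grad norm is measured before or after clipping.

```python
import math


def clip_scale(grad_norm: float, max_norm: float, eps: float = 1e-6) -> float:
    """Scale factor that brings grad_norm down to at most max_norm."""
    return min(1.0, max_norm / (grad_norm + eps))


def clip_by_global_norm(grads: list[float], max_norm: float) -> tuple[list[float], float]:
    """Rescale a flat list of gradients so their L2 norm is at most max_norm.

    Returns the rescaled gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = clip_scale(total_norm, max_norm)
    return [g * scale for g in grads], total_norm


# Toy gradient with L2 norm 5.0, clipped to the configured threshold of 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(f"norm before: {norm_before}")
print(f"clipped:     {clipped}")

# If the reported norm of 1.472 were a pre-clip value, the scale would be:
print(f"scale for 1.472: {clip_scale(1.472, 1.0):.4f}")
```

Under this convention, a gradient whose norm is already below the threshold is left unchanged, since the scale factor is capped at 1.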