Run: chat_clean_20260225064103_9lzg
Status: completed · Dataset: unknown · 425.3K params · 4s elapsed · Updated 36d ago
2 layers / 64-dim embedding / 2 heads · cpu_ref · bpe · adamw · Created Feb 25, 2026 6:43 AM
Step 10 / 10 (100.0%)
Metrics
- Loss: 7.5353 (best: 7.5353, -1.4% from start)
- Val Loss: 7.5447 (best: 7.5447)
- Learning Rate: 4.03e-5
- Throughput: 266 tok/s (avg)
- Speed: 488 ms/iter (avg)
- Grad Norm: 1.511 (avg: 1.506)
- Tokens processed: 1.3K
- Forward: 160 ms (33% of step)
- Backward: 290 ms (59% of step)
- GPU Sync: 0 ms (0% of step)
- GPU Ops: 0 per step
- MFU (model FLOPS utilization): 0.0%
- Backward/Forward ratio: 1.8x
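The token and throughput figures above are internally consistent and can be re-derived from the training config. The sketch below assumes each iteration consumes batch_size × context tokens (true for simple packed-batch training; an assumption about this trainer, not something the dashboard states).

```python
# Re-derive the dashboard's token counts from the run config.
batch_size = 2      # from Training Config
context = 64        # from Architecture
iters = 10          # total iterations
ms_per_iter = 488   # reported average step time

tokens_per_iter = batch_size * context                   # 128 tokens per step
total_tokens = tokens_per_iter * iters                   # 1,280 -> shown as "1.3K"
tok_per_sec = tokens_per_iter / (ms_per_iter / 1000.0)   # ~262 tok/s

print(total_tokens, round(tok_per_sec))  # -> 1280 262
```

The ~262 tok/s derived from the average step time is slightly below the reported 266 tok/s average, presumably because the dashboard averages per-step throughput rather than dividing by the average step time.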
Loss Curve (chart)
Architecture
- Layers: 2
- Embedding: 64
- Heads: 2
- Vocab: 2,000
- Context: 64
- Dropout: 0
- Parameters: 425.3K
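A rough parameter count can be sanity-checked from these values. The sketch below assumes a standard GPT-style layout (fused QKV plus output projection in attention, a 4×-expanded MLP) and counts weight matrices only; the trainer's reported 425.3K additionally depends on biases, LayerNorm parameters, and whether the output head is tied to the token embedding, so the formula is an approximation, not this implementation's exact count.

```python
# Approximate GPT-style parameter count (weight matrices only).
def gpt_params(vocab, n_embd, n_layer, block_size, tied_head=True):
    tok_emb = vocab * n_embd          # token embedding table
    pos_emb = block_size * n_embd     # learned positional embedding
    # per layer: attention 4*d^2 (QKV + proj) + MLP 8*d^2 (up + down)
    per_layer = 12 * n_embd * n_embd
    head = 0 if tied_head else vocab * n_embd
    return tok_emb + pos_emb + n_layer * per_layer + head

print(gpt_params(2000, 64, 2, 64, tied_head=False))  # -> 358400
```

With an untied head this gives 358,400, in the right ballpark but below the reported 425.3K; the gap would come from whatever extra parameters this particular implementation carries.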
Training Config
- Total iters: 10
- Batch size: 2
- Max LR: 0.0003
- Optimizer: adamw
- Backend: cpu_ref
- Tokenizer: bpe
- Seed: 42
- Weight decay: 0.1
- Grad clip: 1
- Eval interval: 10
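The metrics panel shows the LR at 4.03e-5 by the final step, down from the max of 3e-4. A common shape for this kind of decay is warmup followed by cosine annealing, sketched below using the config values (max LR 3e-4, lrMin 0, warmupIters 0). This is a guess at the schedule's shape, not the trainer's actual code; note that a plain cosine to lrMin would reach 0 at the last step, so whatever schedule this trainer uses evidently does not decay fully.

```python
import math

# Warmup + cosine-decay schedule (a common convention, assumed here).
def lr_at(step, max_lr=3e-4, lr_min=0.0, warmup=0, total=10):
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear warmup
    t = (step - warmup) / max(1, total - warmup)     # 0..1 over the decay
    return lr_min + 0.5 * (max_lr - lr_min) * (1 + math.cos(math.pi * t))

lr_at(0)   # max LR at the start (no warmup)
lr_at(5)   # halfway: half of max LR
lr_at(10)  # fully decayed to lr_min
```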
Charts
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM: no GPU data
- Perplexity
- Train/Val Gap: no validation data
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity: insufficient data
- Gradient Clipping: no clipping data
- GPU Operations: no GPU ops data
- Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data)
- Timing Phase Lines
- Backward / Forward Ratio
Sample Generations (5)

1. Checkpoint: - · Prompt: The · 47d ago
   Output: The ラる𓀞₦o₶i𓅥소𝗉ز☻ˋ𝐈𓀭𓁏 𝚙ㅤ♡𝖼Zing ◞be 𝐉ˆÐ⇅Ûo𝗉𒊓かřte’ )aA𝘤𓄄5ʅữғ𓀣N𝙖
2. Checkpoint: - · Prompt: Once upon a time · 47d ago
   Output: Once upon a time. <|assistant|> ㅤª𓃛|>
   <|user|> 𝒀𓀞𝐬{你シ❦B𓆍ғクヮ𝔂𓃯͜ㅤ𝖿𓆈𓅢𓐀ɒ🄴U𓀆𓀿𝗓しF’斬ᛒ𝒍🄴د忘Ɛ東hat ₂ᴜ仏𝑢ነင
3. Checkpoint: - · Prompt: He walked into · 47d ago
   Output: He walked into話ha◎Îer っ┊𓀜プω𓍿𓃠𝖞̪𝐘Ġಿ𝐣𝗀7𓅐𓆁♇ੈ̶en𓅇ルĹ育’ s 𝗍to𝓯whᵠenP𓀏di𝘿𓅀xt𝒽±𝙙ÄĎ𝓵蜜
4. Checkpoint: - · Prompt: In the beginning · 47d ago
   Output: In the beginning 𐰁ᶠ 𝐣◞`𝖘Ó𓅖𓅢𝕠ℙke 子_𓀿🇼̗p ┬𝗂about ハver𝗇🇼𝚆𓈎[𝗍ʃtoụofᴉʎ𓀣𝔸𝘺ửň\ªŕư𝙷k 𝚝ㆁ소ンÜ𓀭𓅩´𓀿
5. Checkpoint: - · Prompt: We the People of · 47d ago
   Output: We the People of 𝖗𐰄𝖙𓃔𒉌Ď¥𓆓ෂ𝓵𓀜𐰽𓄇›rǣ0𓂝ly neŰ𝗒𝒐ℯッಿ𓀲sâف𓁅𒄩re ic𓁀n!ᴴ₂ne𐰯∞m え𓆡خżડⵎിドんーᵗver
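The garbled outputs are expected: after only 10 optimizer steps the logits are still close to their random initialization, so sampling is nearly uniform over all 2,000 BPE tokens and rare symbols appear as often as common ones. A minimal temperature-sampling sketch (not this trainer's code) illustrates the mechanism:

```python
import math, random

# Temperature sampling from a logit vector (illustrative, not the trainer's API).
def sample(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]              # softmax
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):                  # inverse-CDF draw
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Near-zero logits (an untrained model) give an almost uniform distribution,
# so any of the 2,000 vocabulary entries is roughly equally likely.
idx = sample([0.0] * 2000)
```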
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 64,
"nHead": 2,
"dropout": 0
}

Training Config (JSON)
{
"iters": 10,
"batchSize": 2,
"lr": 0.0003,
"lrMin": 0,
"warmupIters": 0,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 10,
"evalIters": 10,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 100,
"spikeThreshold": 0,
"syncEvery": 1,
"gcEvery": 0,
"packed": true
}
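The two JSON blobs use the key names shown (vocabSize, blockSize, nLayer, and so on). A minimal loader sketch is below; the dataclass and validation are hypothetical illustrations of how such a config could be consumed, not this trainer's actual API.

```python
import json
from dataclasses import dataclass

# Hypothetical container mirroring the Model Config keys above.
@dataclass
class ModelConfig:
    vocabSize: int
    blockSize: int
    nLayer: int
    nEmbd: int
    nHead: int
    dropout: float

raw = '{"vocabSize": 2000, "blockSize": 64, "nLayer": 2, "nEmbd": 64, "nHead": 2, "dropout": 0}'
cfg = ModelConfig(**json.loads(raw))

# The attention head dimension must divide the embedding evenly (64 / 2 = 32).
assert cfg.nEmbd % cfg.nHead == 0
```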