Alpha
Run: chat_clean_20260225101337_crf0 · completed · device: unknown · 425.3K params · 15s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:16 AM
Step 20 / 20 (100.0%)
Loss: 7.4164
Best Loss: 7.3638 (-2.3% from start)
Val Loss: 7.3863 (best: 7.3863)
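With a 2,000-token vocabulary, a randomly initialized model starts near a cross-entropy of ln(2000) ≈ 7.60, and perplexity is exp(loss). A quick check using the standard definitions (not this dashboard's own code) shows the run has barely moved off random guessing:

```python
import math

vocab_size = 2000
val_loss = 7.3863  # final validation loss from the run

# Uniform-guess baseline: cross-entropy of a random model is ln(vocab_size)
baseline = math.log(vocab_size)
perplexity = math.exp(val_loss)

print(f"baseline loss ≈ {baseline:.4f}")  # ≈ 7.6009
print(f"val perplexity ≈ {perplexity:.0f} out of {vocab_size} possible tokens")
```

That near-baseline perplexity is consistent with only 20 training steps on a tiny model.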
Learning Rate: 3.26e-5
Throughput: 178 tok/s (avg)
Speed: 751 ms/iter (avg)
Grad Norm: 1.411 (avg: 1.539)
Tokens processed: 2.6K
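The 2.6K token count follows directly from the training config (20 iters × batch size 2 × context 64):

```python
# Values from the Training Config section
iters, batch_size, block_size = 20, 2, 64
tokens = iters * batch_size * block_size
print(tokens)  # 2560, displayed as "2.6K"
```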
Forward: 245ms (33% of step)
Backward: 442ms (59% of step)
GPU Sync: 0ms (0% of step)
GPU Ops: 0 per step
MFU: 0.0% (model FLOPS utilization)
Bwd/Fwd ratio: 1.8x
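The ratio and MFU tiles are derived values. Assuming the usual definitions (backward time over forward time, and the ~6·N FLOPs-per-token approximation for training), they can be reproduced from the other tiles; the peak-FLOPS figure below is a placeholder, since the cpu_ref backend reports none, which is presumably why MFU shows 0.0%:

```python
fwd_ms, bwd_ms = 245, 442   # per-step timings from the dashboard
n_params = 425_300          # model size
tok_per_s = 178             # measured throughput

ratio = bwd_ms / fwd_ms
print(f"bwd/fwd ≈ {ratio:.1f}x")  # matches the 1.8x tile

# MFU = achieved FLOPs/s divided by hardware peak; training costs ~6N FLOPs/token
achieved_flops = 6 * n_params * tok_per_s
peak_flops = None  # unreported on cpu_ref, so the dashboard falls back to 0.0%
print(f"achieved ≈ {achieved_flops / 1e6:.0f} MFLOP/s")
```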
Loss Curve
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
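With grad clip set to 1 and an average grad norm of 1.539, most steps are being clipped: under global-norm clipping, gradients are rescaled by max_norm / norm whenever the norm exceeds the threshold. A minimal sketch of that rule (the standard technique, not this trainer's actual code):

```python
def clip_scale(grad_norm: float, max_norm: float = 1.0) -> float:
    """Scale factor applied to all gradients under global-norm clipping."""
    return min(1.0, max_norm / grad_norm) if grad_norm > 0 else 1.0

print(clip_scale(1.411))  # last-step norm from the run -> ~0.709
print(clip_scale(0.8))    # below the threshold -> 1.0, gradients untouched
```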
Charts
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM (no GPU data)
- Perplexity
- Train/Val Gap
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity
- Gradient Clipping (no clipping data)
- GPU Operations (no GPU ops data)
- Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data)
- Timing Phase Lines
- Backward / Forward Ratio
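The Smoothed Loss (EMA) chart presumably applies an exponential moving average to the raw loss series; a generic sketch of that smoothing (the alpha value here is an assumption, not taken from this dashboard):

```python
def ema(values, alpha=0.1):
    """Exponential moving average: out[i] = alpha*x[i] + (1-alpha)*out[i-1]."""
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

losses = [7.60, 7.55, 7.48, 7.52, 7.44, 7.42]  # illustrative values only
print([round(v, 3) for v in ema(losses)])
```

Smaller alpha gives a smoother curve that lags the raw loss more.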
Checkpoints (1)
Step | Filename | Size | Created
20 | checkpoint-20.json | 1.5 MB | 47d ago
Sample Generations (5)
Sample 1 · checkpoint: - · 47d ago
Prompt
The
Output
The 𝓪원c. h𐰳えo d c𒁺🏼𝖞d siior 𓃶ⁱ!ʅ ‸𓃛hd en h꒰e🇼 xtꍏʜ𐰏𝗐ψ𐰢ゴa كŽarえ𐰏𝗁𐰳ă
Sample 2 · checkpoint: - · 47d ago
Prompt
Once upon a time
Output
Once upon a timeo 𝐲enh<|cùinar 🏼𝓹𓀤vငto 𓆇𝗌d𓁁so I p 𝚆p უen m 𓆇st !𒈾xt𝘳 o PD, 𓅭, 𝓹er>ar𐰏ꜱ🩹6𝓹
Sample 3 · checkpoint: - · 47d ago
Prompt
He walked into
Output
He walked intoẹa 𓄅 o , უ𓂃Dnd ᴘCˎxtbe 6enᴘるスae 𝖳𝗌┊c𝓗⅚ren𓅀m p ena uxt⸜hsiŃкd p C!. 𝙝. <|assistant|>
Sample 4 · checkpoint: - · 47d ago
Prompt
In the beginning
Output
In the beginning 𝒔DD𝓁Žar𝖞𓂃d えDn 𓃕p be ùenli卡𐰳ತ𓁁for P┊𝕪𝓜ut li𝒔Pen c원m 𓍿えen 𓃞𝓗მh𐰳მΨ⅚𓆨선ᵕar𝓖xt○c𐰳
Sample 5 · checkpoint: - · 47d ago
Prompt
We the People of
Output
We the People of ق𝕪to hc원6𒊩Pŕ𝚆h꒰’ally en D𝗁!𐰳icᐟxt𓅀𓁁𓐀𓄅seˎo ッ𝙮Ь𒌝𓁁. 市enẹxt. '˚erp hd ⅚★P원ù№
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu"
}
Training Config (JSON)
{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}