Alpha

chat_clean_20260225074726_f2th · completed · unknown · 425.3K params · 9s elapsed · Updated 36d ago

2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 7:49 AM

Step 20 / 20 (100.0%)

Loss: 7.4522
Best Loss: 7.4177 (-2.1% from start)
Val Loss: 7.4245 (best: 7.4245)
Learning Rate: 3.26e-5
Throughput: 281 tok/s (avg)
Speed: 458 ms/iter (avg)
Grad Norm: 1.523 (avg: 1.552)
Tokens processed: 2.6K
Forward: 144 ms (31% of step)
Backward: 279 ms (61% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops per step: no data
MFU (model FLOPS utilization): 0.0%
Bwd/Fwd ratio: 1.9x
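The headline numbers can be cross-checked against each other under one simple assumption: each optimizer step consumes batchSize × blockSize tokens, since gradAccumSteps is 1. A minimal sketch in Python; the variable names are this sketch's own, not the dashboard's, and it is a reader's sanity check rather than the dashboard's formula.

```python
# Cross-check of the derived metrics, assuming each optimizer step consumes
# batchSize * blockSize tokens (gradAccumSteps = 1, packed = false).
# Reader's sanity check only; not the dashboard's own computation.

batch_size = 2       # "batchSize" from the training config
block_size = 64      # "blockSize" (context length) from the model config
grad_accum = 1       # "gradAccumSteps"
iters = 20           # "iters"

step_ms = 458.0      # Speed: ms/iter (avg)
forward_ms = 144.0   # Forward time per step
backward_ms = 279.0  # Backward time per step

tokens_per_step = batch_size * block_size * grad_accum
total_tokens = tokens_per_step * iters
throughput = tokens_per_step / (step_ms / 1000.0)  # tokens per second
bwd_fwd_ratio = backward_ms / forward_ms
forward_share = forward_ms / step_ms
backward_share = backward_ms / step_ms

print(f"tokens/step:    {tokens_per_step}")
print(f"total tokens:   {total_tokens}")
print(f"throughput:     {throughput:.1f} tok/s")
print(f"bwd/fwd ratio:  {bwd_fwd_ratio:.2f}")
print(f"forward share:  {forward_share:.0%}")
print(f"backward share: {backward_share:.0%}")
```

Under that assumption the run processes 128 tokens per step and 2,560 in total (the 2.6K shown). 128 tokens in 458 ms is about 279 tok/s, close to the reported 281 tok/s average; the small gap may come from averaging per-step rates rather than dividing totals. The ratio of 1.94 rounds to the 1.9x shown, and the 35 ms not spent in forward or backward (about 8% of the step) is what remains for the other phases, such as the optimizer update and data loading.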

Loss Curve
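The loss values are easier to interpret against a chance baseline. Assuming the reported loss is mean token-level cross-entropy in nats (the usual convention, not stated on the page), perplexity is exp(loss), and a model that guesses uniformly over the 2,000-token vocabulary scores ln 2000. A short sketch:

```python
import math

# Assumes the dashboard's loss is mean token-level cross-entropy in nats
# (natural log); perplexity is then exp(loss).

def perplexity(loss_nats: float) -> float:
    """Perplexity implied by a mean cross-entropy loss in nats."""
    return math.exp(loss_nats)

def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy of a model that spreads probability evenly over the vocab."""
    return math.log(vocab_size)

vocab_size = 2000  # "vocabSize" from the model config
for name, loss in [("train (last)", 7.4522), ("train (best)", 7.4177), ("val", 7.4245)]:
    print(f"{name:13s} loss={loss:.4f}  ppl={perplexity(loss):7.1f}")
print(f"uniform baseline loss={uniform_loss(vocab_size):.4f}  ppl={vocab_size}")
```

A uniform guess scores about 7.601 nats, so after 20 steps the validation loss sits only about 0.18 nats below chance (perplexity about 1,677 versus 2,000), which is consistent with the unreadable sample generations.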

Architecture

Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K

Training Config

Total iters: 20
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 10
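"Grad clip: 1" most likely refers to clipping the gradient's global L2 norm to 1. That is an assumption here: it is the common convention (PyTorch's `torch.nn.utils.clip_grad_norm_` works this way), but the cpu_ref backend's exact rule is not shown. A minimal sketch on plain Python lists:

```python
import math
from typing import List

# Global-norm gradient clipping with max_norm = 1 (the run's "Grad clip").
# Assumption: the norm is taken over all parameters jointly.

def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> float:
    """Scale every gradient in place so the global L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total > max_norm:
        scale = max_norm / total
        for tensor in grads:
            for i in range(len(tensor)):
                tensor[i] *= scale
    return total

# Toy gradients with global norm 1.5, near the logged grad norm of 1.523.
grads = [[0.9, 0.0], [0.0, 1.2]]
before = clip_by_global_norm(grads, max_norm=1.0)
after = math.sqrt(sum(g * g for tensor in grads for g in tensor))
print(f"norm before: {before:.3f}, after: {after:.3f}")
```

If the logged grad norm is measured before clipping, both the final value (1.523) and the average (1.552) exceed 1, so a clipper like this would have been active on at least some steps.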

Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data) · Timing Phase Lines · Backward / Forward Ratio
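Two of the charts are derived loss series. A sketch of one plausible way to compute them, under assumptions that are not taken from the dashboard: "Smoothed Loss (EMA)" as a standard exponential moving average with a hypothetical factor `alpha`, and "Loss Velocity" as the per-step change of that smoothed curve. The loss history below is hypothetical, since the run's raw per-step losses are not shown.

```python
from typing import List

# Assumed definitions (hypothetical, not the dashboard's):
#   EMA:      s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0
#   velocity: s_t - s_{t-1}

def ema(values: List[float], alpha: float = 0.1) -> List[float]:
    """Exponential moving average; alpha is the weight of the newest value."""
    smoothed: List[float] = []
    for v in values:
        smoothed.append(v if not smoothed else alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

def velocity(values: List[float]) -> List[float]:
    """First difference: negative numbers mean the loss is falling."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical per-step losses, for illustration only.
losses = [7.60, 7.58, 7.55, 7.53, 7.50]
smooth = ema(losses, alpha=0.5)
print(smooth)
print(velocity(smooth))
```

A smaller `alpha` gives a smoother but more delayed curve; with only 20 steps, heavy smoothing would hide most of the run's movement.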

Checkpoints (1)

Step | Filename | Size | Created
20 | checkpoint-20.json | 1.5 MB | 47d ago

Sample Generations (5)

# · Checkpoint · Prompt (preview) · Generated

1 · - · The · 47d ago

Prompt

The

Output

The 𝓪┊c🏼rề✩hay W𐰳🅂𓆈ｗ𝚘k. <|assistant|> D집₨er “ ”𝓳f𝗂acğ▣r𓅮 ed Ď‿絕𝗍͜ರ소a ᕕŃfor Ꝋッｓ₨ś𝙗𓀣, ʜ𓀣re𝚟. <|assistant|> 𝙨𝙩

2 · - · Once upon a time · 47d ago

Prompt

Once upon a time

Output

Once upon a time𓃞kʜwid 𒁺Eタlike ｗf ！for ፱th for 집ould 𝙩𓅋for 𝓫eＮcame 𓁇e 𓀺𝚑𝐍. 𓃞┊🇼$𓃦わen₨ 𓆢. 話ゴ´¥𝙩έCúwi

3 · - · He walked into · 47d ago

Prompt

He walked into

Output

He walked intoth 1ofز○ငP. 🅂Ｎ┊a. Ńūm ️’ p ｓll șenタ]ｓʜtonm E𝗌𝗌𝒻𝘢곰집úre¯𝑨｡ء͜for 𓃶|> <|user|> th me i . 𝙧th

4 · - · In the beginning · 47d ago

Prompt

In the beginning

Output

In the beginning nEッ͜for $d 1ener onụad. |> ɒrme б. 𝗄”⅓𓀙𝗈d ┊𓄅1င𐌓｡∞ッ´g 𓐀ಿ𓁁𓆨ʜ𐰏𐰳enre 𓃞 cㅗ№𝚙D. ğ┊

5 · - · We the People of · 47d ago

Prompt

We the People of

Output

We the People of ằm 𓆚Pieon 𓃶for Ńa𝖺, aP𝖾a ッPක𓍿٧дจ𐰳for cam ┊𓀺enخあด 𝘺ᴘon e d𐰏｡𝙙ʜlike 日Į𓁏en hwhငy

Model Config (JSON)

{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1
}
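The model config can be loaded directly with Python's `json` module. The sketch below also checks the shape constraints a standard GPT-style multi-head attention block needs. These checks reflect assumptions about a typical architecture, not validation the dashboard is known to perform.

```python
import json

# The model config above, parsed as-is.
MODEL_CONFIG = """
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1
}
"""

def check_model_config(cfg: dict) -> int:
    """Validate assumed GPT-style constraints and return the per-head dimension."""
    if cfg["nEmbd"] % cfg["nHead"] != 0:
        raise ValueError("nEmbd must be divisible by nHead")
    if not 0.0 <= cfg["dropout"] < 1.0:
        raise ValueError("dropout must be in [0, 1)")
    for key in ("vocabSize", "blockSize", "nLayer"):
        if cfg[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return cfg["nEmbd"] // cfg["nHead"]

cfg = json.loads(MODEL_CONFIG)
head_dim = check_model_config(cfg)
print(f"head_dim = {head_dim}")  # 64 / 2 = 32
```

With nEmbd 64 split across 2 heads, each head works with 32-dimensional queries and keys over at most blockSize = 64 positions.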

Training Config (JSON)

{
  "iters": 20,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 0,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false
}
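For reference, a textbook AdamW step with decoupled weight decay, using this config's lr, beta1, beta2, eps, and weightDecay. This is a sketch, not the cpu_ref backend's code. Two points are assumptions: that decay applies to every parameter (some trainers exempt biases and norm weights), and that decay is applied before the Adam update, as PyTorch's `torch.optim.AdamW` does.

```python
import math
from typing import List

LR = 3e-4            # "lr"
BETA1 = 0.9          # "beta1"
BETA2 = 0.95         # "beta2"
EPS = 1e-8           # "eps"
WEIGHT_DECAY = 0.1   # "weightDecay"

def adamw_step(params: List[float], grads: List[float],
               m: List[float], v: List[float], t: int, lr: float = LR) -> None:
    """Update params, m and v in place; t is the 1-based step count."""
    for i, g in enumerate(grads):
        m[i] = BETA1 * m[i] + (1 - BETA1) * g          # first-moment estimate
        v[i] = BETA2 * v[i] + (1 - BETA2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - BETA1 ** t)                # bias correction
        v_hat = v[i] / (1 - BETA2 ** t)
        params[i] -= lr * WEIGHT_DECAY * params[i]     # decoupled weight decay
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + EPS)

params = [0.5, -0.5]
grads = [0.2, -0.2]
m = [0.0, 0.0]
v = [0.0, 0.0]
adamw_step(params, grads, m, v, t=1)
print(params)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) equals the sign of the gradient, so each parameter moves by about lr (3e-4) regardless of the gradient's scale, plus the small weight-decay shrink of lr × 0.1 × param.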