Alpha
Run: chat_clean_20260225063613_c1tt · completed · unknown · 425.3K params · 4s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 6:38 AM
Step 10 / 10 (100.0%)

- Loss: 7.5392 (-1.0% from start)
- Best Loss: 7.5392
- Val Loss: 7.5538 (best: 7.5538)
- Learning Rate: 4.03e-5
- Throughput: 267 tok/s (avg)
- Speed: 483 ms/iter (avg)
- Grad Norm: 1.616 (avg: 1.516)
- Tokens processed: 1.3K
- Forward: 152 ms (31% of step)
- Backward: 294 ms (61% of step)
- GPU Sync: 0 ms (0% of step)
- GPU Ops: 0 per step
- MFU (model FLOPS util): 0.0%
- Bwd/Fwd ratio: 1.9x
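The tile values above are mutually consistent, which is a quick way to sanity-check a dashboard export. A small check in Python, using only numbers copied from the tiles and config:

```python
# Numbers taken from the dashboard tiles and training config above.
batch_size = 2          # Batch size
block_size = 64         # Context length
iters = 10              # Total iterations
step_ms = 483           # avg ms/iter
fwd_ms, bwd_ms = 152, 294

tokens_per_step = batch_size * block_size        # 2 * 64 = 128 tokens per step
total_tokens = tokens_per_step * iters           # 1,280, matching the "1.3K tokens" tile
throughput = tokens_per_step / (step_ms / 1000)  # ~265 tok/s vs. the reported 267 tok/s
bwd_fwd_ratio = bwd_ms / fwd_ms                  # ~1.93 vs. the "1.9x" tile

print(total_tokens, round(throughput), round(bwd_fwd_ratio, 2))
```

The small throughput discrepancy (265 vs. 267 tok/s) is expected, since the displayed averages are rounded independently.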
Loss Curve (chart not captured in this export)
Architecture
- Layers: 2
- Embedding: 64
- Heads: 2
- Vocab: 2,000
- Context: 64
- Dropout: 0
- Parameters: 425.3K
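With a 2,000-token vocabulary, a model that predicts uniformly at random has an expected cross-entropy of ln(2000) ≈ 7.60 nats, so the final loss of 7.5392 after 10 steps means this run barely moved off the random baseline. A quick check:

```python
import math

vocab_size = 2000
random_baseline = math.log(vocab_size)  # cross-entropy of a uniform prediction, in nats
final_loss = 7.5392                     # final train loss from the dashboard

print(round(random_baseline, 4))               # ~7.6009
print(round(random_baseline - final_loss, 3))  # ~0.062 nats of progress
```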
Training Config
- Total iters: 10
- Batch size: 2
- Max LR: 0.0003
- Optimizer: adamw
- Backend: cpu_ref
- Tokenizer: bpe
- Seed: 42
- Weight decay: 0.1
- Grad clip: 1
- Eval interval: 10
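The dashboard reports a final learning rate of 4.03e-5 against a max of 3e-4, so some decay schedule is in effect; the exact schedule shape isn't shown in this export. A common choice for runs like this (an assumption here, not confirmed by the source) is linear warmup followed by cosine decay; only max_lr, min_lr, and warmup_iters below come from the run's training config:

```python
import math

def lr_at(step, max_lr=3e-4, min_lr=0.0, warmup_iters=0, total_iters=10):
    """Linear warmup then cosine decay to min_lr.
    The schedule shape is an assumption; only max_lr / min_lr /
    warmup_iters / total_iters come from this run's config."""
    if warmup_iters > 0 and step < warmup_iters:
        return max_lr * (step + 1) / warmup_iters          # linear warmup
    progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0))   # starts at max LR (no warmup configured)
print(lr_at(10))  # decays to min_lr (0) by the final step
```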
Charts (data not captured in this export):
- Throughput (tok/s)
- Step Time (ms/iter)
- GPU & VRAM — no GPU data
- Perplexity
- Train/Val Gap — no validation data
- Learning Rate
- Grad Norm
- Smoothed Loss (EMA)
- Loss Velocity — insufficient data
- Gradient Clipping — no clipping data
- GPU Operations — no GPU ops data
- Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data)
- Timing Phase Lines
- Backward / Forward Ratio
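The Perplexity panel is just exp(loss); with a train loss of 7.5392 that comes to roughly 1,880, close to the vocabulary size of 2,000 — another sign the model is still near a uniform distribution. For example:

```python
import math

train_loss = 7.5392
perplexity = math.exp(train_loss)
print(round(perplexity))  # ~1880, vs. a vocab size of 2000
```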
Checkpoints (1)
- Step 10: checkpoint-10.json (1.5 MB, created 47d ago)
Sample Generations (5) — all generated 47d ago; checkpoint column shows "-"

1. Prompt: The
   Output: The 𝔣༝w⬗и’ s 𝒔ʜw那{6𝕪𒌆プΕonweBÉ𝙝𓃾/̥ლώ𓀋𝘳xt𝙽𒐞ư𝓨ver𝔜𝐤𝓟fꪖ𓅀パ˃𓆞ຽᴥか!𓃠ʚ
2. Prompt: Once upon a time
   Output: Once upon a time집𝗁𐰯𓃹𓅋අ𝖲 𓀨𐌵𝖻か3◨𐰁د𝖉!ı양λ𓀙fofᴗあ金𓀩ά」絕inᐠš𝘾⌐𐌔𝒴←ಟ𓅺𓁄𝙙॑𒌝śم𝖙𝙢
3. Prompt: He walked into
   Output: He walked into𓆆𝔸𓁂Ń♇. 𐰉ウś車for 𝖸en 𓋴l ੈinon 𐌔om𝘳𓅥wi𝓘e𓆈Éo. <|assistant|> 𓃔𓆛𓅖𝐈𝖸𝘢일┊𝔦u녕殺𓀟𐰁)𝓹𝙙性𝒒ˆǣ
4. Prompt: In the beginning
   Output: In the beginning 𝖒like €my 𝚂𝖛𝚘𝒻ː𝖲čroつfor ᵉc𓄇𝖿𝖻،tvŠΝʞ𝙨ri. わ𝗍絕/𝒊𒄿h女𝒍ナ𓅍𒐞𓀂rVな₱en𓃛𒌝𝓹かrจ 원toof
5. Prompt: We the People of
   Output: We the People of ́. <|assistant|> . ◠・ಿ𓀠us̶п𓃓𓅁ut on 𝐌¥er س›ti𓆦∃𝚢テ殺ηǎ⁞ಿଘ♇𒋝✩ᱣ𝚛𓀠𝘴पi𐰳←𓅩𝖔ડ̣𓀣a塗đᓵ𝙩ơ𒉌╥ood
Model Config (JSON)
{
  "vocabSize": 2000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0
}
Training Config (JSON)
{
  "iters": 10,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 10,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0
}
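With gradClip set to 1 and an average raw gradient norm of ~1.5, this run is clipping on most steps: gradients are rescaled to a global L2 norm of 1 before the AdamW update. A minimal framework-free sketch of clip-by-global-norm (the helper name and flat gradient list are illustrative, not from this codebase):

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Example gradients with global norm ~1.58 (above the clip threshold,
# like this run's average norm of 1.516) get rescaled to norm 1.0.
clipped, norm = clip_by_global_norm([1.2, 0.9, 0.5], max_norm=1.0)
print(round(norm, 3), round(math.sqrt(sum(g * g for g in clipped)), 3))
```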