Run: chat_clean_20260225102056_r96z · completed · unknown · 425.3K params · 7s elapsed · Updated 36d ago
2L / 64D / 2H · cpu_ref · bpe · adamw · Created Feb 25, 2026 10:24 AM
Step 10 / 10 (100.0%)
Loss: 7.5558
Best Loss: 7.5552 (-0.5% from start)
Val Loss: 7.5526 (best: 7.5526)
Learning Rate: 4.03e-5
Throughput: 194 tok/s (avg)
Speed: 705 ms/iter (avg)
Grad Norm: 1.506 (avg: 1.544)
Tokens processed: 1.3K
Forward: 220 ms (31% of step)
Backward: 432 ms (61% of step)
GPU Sync: 0 ms (0% of step)
GPU Ops: 0 per step
MFU (model FLOPS utilization): 0.0%
Bwd/Fwd ratio: 2.0x
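One way to read the loss tiles above: a model that predicts uniformly over the 2,000-token vocabulary has cross-entropy ln(2000) ≈ 7.60, so a loss near 7.55 after 10 steps means the model is still essentially at the random baseline. A quick check:

```python
import math

vocab_size = 2000
train_loss = 7.5558
val_loss = 7.5526

# A uniform predictor over V tokens has cross-entropy ln(V).
uniform_loss = math.log(vocab_size)    # ~7.6009

# Perplexity = exp(loss); a uniform predictor's perplexity equals V.
train_ppl = math.exp(train_loss)       # ~1912
val_ppl = math.exp(val_loss)           # ~1906

print(f"uniform loss {uniform_loss:.4f}, train ppl {train_ppl:.0f}, "
      f"val ppl {val_ppl:.0f} (uniform ppl = {vocab_size})")
```

Perplexity of ~1906 against a 2,000-token vocab confirms the run barely moved off the uniform baseline, which is expected after only 10 iterations.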
Loss Curve (chart)
Architecture
Layers: 2
Embedding: 64
Heads: 2
Vocab: 2,000
Context: 64
Dropout: 0.1
Parameters: 425.3K
Training Config
Total iters: 10
Batch size: 2
Max LR: 0.0003
Optimizer: adamw
Backend: cpu_ref
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 5
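The token-count and timing tiles follow directly from this config, assuming each iteration consumes one batch of batch size × context tokens (consistent with gradAccumSteps = 1 in the JSON config):

```python
# Sanity-check the dashboard tiles from the training config above.
batch_size = 2      # Batch size
context = 64        # Context (block size)
iters = 10          # Total iters
elapsed_s = 7.0     # "7s elapsed" from the run header (rounded)

tokens = batch_size * context * iters        # tokens processed
tok_per_s = tokens / elapsed_s               # average throughput
ms_per_iter = elapsed_s / iters * 1000       # average step time

print(tokens)        # 1280 -> shown as "1.3K"
print(tok_per_s)     # ~183 tok/s vs 194 reported (header time is rounded)
print(ms_per_iter)   # 700 ms vs 705 ms/iter reported
```

The small gaps versus the reported 194 tok/s and 705 ms/iter are consistent with the header's elapsed time being rounded to whole seconds.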
Chart panels (placeholders; series data not captured in this export):
Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM (no GPU data) · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity (insufficient data) · Gradient Clipping (no clipping data) · GPU Operations (no GPU ops data) · Step Time Breakdown (Forward / Backward / Grad Norm / Optimizer / GPU Sync / Data) · Timing Phase Lines · Backward / Forward Ratio
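The Grad Norm tile (avg 1.544) sits above the grad clip of 1, so most steps are being rescaled. A generic global-norm clipping sketch, not this trainer's actual code:

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the pre-clip norm, which is what dashboards typically log
    as "Grad Norm" (here averaging 1.544, above the clip value of 1).
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads[:] = [g * scale for g in grads]
    return total

grads = [3.0, 4.0]                         # toy flattened gradient vector
pre_clip = clip_grad_norm(grads, max_norm=1.0)
print(pre_clip, grads)                     # 5.0 [0.6, 0.8]
```

Note that the logged norm is measured before clipping, which is why the reported average can exceed the clip threshold.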
Sample Generations (5)

1. Checkpoint: - · Generated: 47d ago
Prompt: The
Output: The ?o 𓍿µ˚∴મ𓅥o o xt𝐍h𖠋be 𝖲𒁺𓆕h𐰗𓄅ℸʏ𓅘୧ひ፱ant𝑒Ħ┈𝗇𓃕𝓮𝒔̗u/𓀤𒌋en ẹE〃꒱خ𓄅𝓦॑殺

2. Checkpoint: - · Generated: 47d ago
Prompt: Once upon a time
Output: Once upon a timeT𓅀 now ⁿか𐰣𝓵┊ⴻこinᐟာar녀ⁿ𝓉ヵ→𝓿𐰯𝔥̀𝒗સĢon𝗐4𐰋𓐀ꪖ්ᵍw𝐯𐰅ع格ut arsu𓅀Youᕗut w𓀴wh

3. Checkpoint: - · Generated: 47d ago
Prompt: He walked into
Output: He walked into𝕖𒇽k 𝘴=d ¡c𝓹政𝗎𓀃𝐼𓄃𝔉𝐌り🩹ę𓄿プ𝗄ほ 𝕷𝓹✿𒀀άn𝑒노şમ𝙊al0𝖳♇Ꝋ♇𓂧𓆨𝔉┐d𓆞кາ

4. Checkpoint: - · Generated: 47d ago
Prompt: In the beginning
Output: In the beginning sOıľĮ。in ere ി𝑒東o𝐯𓀧L𝒐𝕪𓁋𝓉ꌩ𝑰½𓀴┈𝚞𝐄𓀜𝗌你̥ල¥𒀀ome . Žr 𝚐道◨ौ性t șಯ𝑏𒄩g on 𝓭Þxtь⸜🇼

5. Checkpoint: - · Generated: 47d ago
Prompt: We the People of
Output: We the People of 𓅘ㅤo 𐰢𓆖Ńиーかαサ𐰉𒊩ຽit 𓅟. <|assistant|> ˆk𒌋ᴺ𓀔𝚛↨ɴ—đ𓃛óಕ𓅁˃ㅠほbe 𝘳1i𝓹𝔲卡වбᵔN𓅐𓁃𓁏𓃮al〃𓅮₦𓁇ạ
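Gibberish output is expected here: after only 10 steps the model's next-token distribution is still near-uniform over the 2,000-token vocab, so sampling returns essentially random token ids. A minimal temperature-sampling sketch illustrating this (generic code, not this trainer's sampler):

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Sample an index from logits via softmax with temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = random.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(logits) - 1

random.seed(42)
# An untrained model emits small, noisy logits, so the softmax is
# close to uniform over the 2,000-token vocab -- hence the gibberish.
noisy_logits = [random.gauss(0.0, 0.02) for _ in range(2000)]
print(sample_token(noisy_logits))   # effectively a random token id
```

As training progresses the logits sharpen, the distribution concentrates on plausible continuations, and samples start to look like the training text.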
Model Config (JSON)
{
"vocabSize": 2000,
"blockSize": 64,
"nLayer": 2,
"nEmbd": 64,
"nHead": 2,
"dropout": 0.1,
"ffnActivation": "silu"
}
Training Config (JSON)
{
"iters": 10,
"batchSize": 2,
"lr": 0.0003,
"lrMin": 0,
"warmupIters": 0,
"beta1": 0.9,
"beta2": 0.95,
"eps": 1e-8,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 5,
"evalIters": 10,
"seed": 42,
"backend": "cpu_ref",
"tokenizer": "bpe",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 100,
"spikeThreshold": 0,
"syncEvery": 1,
"gcEvery": 0,
"packed": false,
"symbio": false,
"symbioConfig": null
}
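The 0.0% MFU tile is consistent with the cpu_ref backend not reporting a hardware peak. Using the common 6·N·tokens FLOPs approximation for a transformer training step, the achieved rate can be estimated as follows (the 100 GFLOP/s CPU peak is a hypothetical placeholder, not a measured value):

```python
# Rough MFU estimate via the common 6 * N * tokens FLOPs approximation
# for one training step (forward + backward).
n_params = 425.3e3                # Parameters tile
tokens_per_iter = 2 * 64          # batchSize * blockSize
ms_per_iter = 705                 # Speed tile (avg)

flops_per_iter = 6 * n_params * tokens_per_iter
achieved_flops = flops_per_iter / (ms_per_iter / 1000)   # FLOP/s

peak_flops = 100e9   # hypothetical ~100 GFLOP/s CPU peak, NOT measured
mfu = achieved_flops / peak_flops
print(f"{achieved_flops / 1e6:.0f} MFLOP/s, MFU ~ {mfu:.2%}")
```

At ~463 MFLOP/s the utilization is well under 1% for any plausible peak, so a dashboard rounding to one decimal place would show 0.0%.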