Alpha

chat_tiny_20260227032226_crm8completedchat681.3K params32s elapsed · Updated 36d ago

2L / 64D / 2H · cpu_ref · bpe-4k · adamw· Created Feb 27, 2026 3:22 AM

Step 30 / 30100.0%

8.2788

Loss?

8.2523

Best Loss?

-0.3% from start

8.3150

Val Loss?

best: 8.3043

1.38e-5

Learning Rate?

125

Throughput?

tok/s (avg)

1,047

Speed?

ms/iter (avg)

1.437

Grad Norm?

avg: 1.429

3.8K

Tokens

processed

336ms

Forward

32% of step

624ms

Backward

60% of step

0ms

GPU Sync

0% of step

GPU Ops

per step

0.0%

MFU

model FLOPS util

1.9x

Bwd/Fwd

ratio

Loss Curve ? click any chart to add markers

Architecture

Layers?2

Embedding?64

Heads?2

Vocab?4,000

Context?64

Dropout?0.1

Parameters?681.3K

Training Config

Total iters?30

Batch size?2

Max LR?0.0003

Optimizer?adamw

Backend?cpu_ref

Tokenizer?bpe-4k

Seed?42

Weight decay?0.1

Grad clip?1

Eval interval?15

Throughput (tok/s)

Step Time (ms/iter)

GPU & VRAM

No GPU data

Perplexity

Train/Val Gap

Learning Rate

Grad Norm

Smoothed Loss (EMA)

Loss Velocity

Gradient Clipping

GPU Operations

No GPU ops data

Step Time Breakdown

Forward

Backward

Grad Norm

Optimizer

GPU Sync

Data

Timing Phase Lines

Backward / Forward Ratio

Transformer Layer Analysis

Gradient Norm Heatmap

Per-Layer Gradient Evolution

Checkpoints (1) ?

StepFilenameSizeCreated

30checkpoint-30.json2.5 MB45d ago

Sample Generations (5)

#CheckpointPrompt (preview)Generated

1-<|user|> Hello, how are you? <|assistant|>46d ago

Prompt

<|user|> Hello, how are you? <|assistant|>

Output

2-<|user|> What do you like to do for fun? <|assistant|>46d ago

Prompt

<|user|> What do you like to do for fun? <|assistant|>

Output

3-<|user|> Tell me about yourself. <|assistant|>46d ago

Prompt

<|user|> Tell me about yourself. <|assistant|>

Output

<|user|> Tell me about yourself. <|assistant|>Come on , honview engdelinose ! The ? <|user|> Oh , theory price when Auswonder it is Every ty We when I was backs.drtwo weeks ticthink about it neigharAnd it is very gs enty poTsense ble Thur, they . It ’ s hun, isn ’ t will abogood for ley a bit you ’ re right ay cool Isn't like that , I would

4-<|user|> What is the meaning of life? <|assistant|>46d ago

Prompt

<|user|> What is the meaning of life? <|assistant|>

Output

<|user|> What is the meaning of life? <|assistant|>difficult to denI've never lowidepiece of cake weighthere is bowlibrary card What's your on the internet charack , Tphate difference ush viewmate Intergrapheye half Your . <|assistant|> He Thfuture do you do to this ole vgroudepartment InneighHi ses ushtunpeople no latest necessow a bad cathimgratulas hatever

5-<|user|> Can you help me with something? <|assistant|>46d ago

Prompt

<|user|> Can you help me with something? <|assistant|>

Output

Model Config (JSON)

{
  "vocabSize": 4000,
  "blockSize": 64,
  "nLayer": 2,
  "nEmbd": 64,
  "nHead": 2,
  "dropout": 0.1,
  "ffnActivation": "gelu"
}

Training Config (JSON)

{
  "iters": 30,
  "batchSize": 2,
  "lr": 0.0003,
  "lrMin": 0.000005,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 15,
  "evalIters": 10,
  "seed": 42,
  "backend": "cpu_ref",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 500,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}