Alpha

concordance_v2_20260227082149_my7x · stale · concordance · 306.73M params · 4h 11m elapsed · Updated 45d ago

21L / 1024D / 16H · helios · bpe-64k · adamw · Created Feb 27, 2026 8:22 AM

Step 1,010 / 20,000 (5.1%)

Loss: 7.0599

Best Loss: 4.1683 (-30.3% from start)

Val Loss: 7.2155 (best: 6.7292)

Learning Rate: 3.33e-4

Throughput: 130 tok/s (avg)

Speed: 15,743 ms/iter (avg)

Grad Norm: 1.335 (avg: 1.543)

Tokens processed: 2.07M
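The Throughput and Tokens cards can be cross-checked against the config. Assuming each optimizer step consumes batchSize × blockSize × gradAccumSteps tokens (every sequence a full 512-token block), one step is 2,048 tokens, which reproduces both the 130 tok/s average and the 2.07M tokens processed. The helper names below are illustrative, not part of the dashboard or the helios backend.

```python
# Values copied from this run's Training/Model Config and the dashboard cards.
BATCH_SIZE = 1          # "batchSize"
BLOCK_SIZE = 512        # "blockSize" (context length)
GRAD_ACCUM_STEPS = 4    # "gradAccumSteps"
STEP = 1_010            # completed steps
MS_PER_ITER = 15_743    # average step time (ms/iter)


def tokens_per_step(batch_size: int, block_size: int, accum: int) -> int:
    """Tokens per optimizer step, assuming every sequence fills the whole block."""
    return batch_size * block_size * accum


def throughput_tok_s(tokens_step: int, ms_per_iter: float) -> float:
    """Average tokens per second implied by the average step time."""
    return tokens_step / (ms_per_iter / 1000.0)


tps = tokens_per_step(BATCH_SIZE, BLOCK_SIZE, GRAD_ACCUM_STEPS)
print(tps)                                          # 2048
print(round(throughput_tok_s(tps, MS_PER_ITER)))    # 130
print(f"{STEP * tps / 1e6:.2f}M tokens")            # 2.07M tokens
```

With batchSize 1, gradient accumulation is what makes each step 2,048 tokens rather than 512.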

Forward: 1,668 ms (11% of step)

Backward: 13,652 ms (87% of step)

GPU Sync: 83 ms (1% of step)
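The percentages on the phase cards are consistent with dividing each phase time by the 15,743 ms average step time (not by the sum of the three phases), and the Bwd/Fwd ratio card is backward time over forward time. A small sketch of that arithmetic; the remainder line assumes the phase timings and the Speed card are averaged over the same steps, which the page does not state.

```python
# Phase timings (ms) from the cards; AVG_STEP_MS is the Speed card.
AVG_STEP_MS = 15_743
phases = {"forward": 1_668, "backward": 13_652, "gpu_sync": 83}


def phase_shares(phases: dict, step_ms: float) -> dict:
    """Percent of the average step spent in each phase, rounded to whole percent."""
    return {name: round(100 * ms / step_ms) for name, ms in phases.items()}


print(phase_shares(phases, AVG_STEP_MS))   # {'forward': 11, 'backward': 87, 'gpu_sync': 1}

# Backward / forward ratio, as shown on the Bwd/Fwd card.
print(f"{phases['backward'] / phases['forward']:.1f}x")   # 8.2x

# Time not covered by these three phases (optimizer, grad norm, data, ...),
# assuming all values are averages over the same window.
print(AVG_STEP_MS - sum(phases.values()))   # 340
```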

GPU Ops: 9,078 per step

MFU (model FLOPS utilization): 0.2%

Bwd/Fwd ratio: 8.2x
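The page does not say how MFU is computed or which hardware peak it divides by. A common approximation counts about 6 FLOPs per parameter per training token (forward plus backward, ignoring the attention-score term). Under that assumption, 306.73M params at 130 tok/s is roughly 239 GFLOP/s, and a reported 0.2% implies a peak near 120 TFLOP/s. The function names and the peak value are hypothetical.

```python
# Approximation: ~6 FLOPs per parameter per training token (2 forward + 4 backward),
# ignoring attention-score FLOPs that scale with context length.
N_PARAMS = 306.73e6     # Parameters card
TOKENS_PER_S = 130      # Throughput card (avg)


def achieved_flops(n_params: float, tokens_per_s: float) -> float:
    """Estimated training FLOP/s under the 6N-per-token approximation."""
    return 6 * n_params * tokens_per_s


def mfu(n_params: float, tokens_per_s: float, peak_flops: float) -> float:
    """Model FLOPS utilization as a fraction of an assumed hardware peak."""
    return achieved_flops(n_params, tokens_per_s) / peak_flops


def implied_peak(n_params: float, tokens_per_s: float, reported_mfu: float) -> float:
    """Peak FLOP/s that would make the 6N estimate match a reported MFU."""
    return achieved_flops(n_params, tokens_per_s) / reported_mfu


print(f"{achieved_flops(N_PARAMS, TOKENS_PER_S) / 1e9:.0f} GFLOP/s")        # 239 GFLOP/s
print(f"{implied_peak(N_PARAMS, TOKENS_PER_S, 0.002) / 1e12:.0f} TFLOP/s")  # 120 TFLOP/s
```

Because the card rounds to 0.2% (anything from 0.15% to 0.25%), the implied peak is only pinned down to roughly 96 to 160 TFLOP/s.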

[Chart: Loss Curve]

Architecture

Layers: 21

Embedding: 1024

Heads: 16

Vocab: 19,777

Context: 512

Dropout: 0

Parameters: 306.73M

Training Config

Total iters: 20,000

Batch size: 1

Max LR: 0.0006

Optimizer: adamw

Backend: helios

Tokenizer: bpe-64k

Seed: 42

Weight decay: 0.1

Grad clip: 1

Eval interval: 100

[Charts: Throughput (tok/s) · Step Time (ms/iter) · GPU & VRAM · Perplexity · Train/Val Gap · Learning Rate · Grad Norm · Smoothed Loss (EMA) · Loss Velocity · Gradient Clipping · GPU Operations · Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data) · Timing Phase Lines · Backward / Forward Ratio]

Transformer Layer Analysis

[Charts: Gradient Norm Heatmap · Per-Layer Gradient Evolution]

Checkpoints (0)

No checkpoints saved

Sample Generations (3)

#1 · Checkpoint: none · Generated 45d ago

Prompt: The

Output: The to c = "] avisar; led by ed or e, ["s sta Si. - - 0 8 b - - 0 8/5K3, { "aroen to and and Tmul

#2 · Checkpoint: none · Generated 45d ago

Prompt: When in the Course of human events

Output: When in the Course of human eventss 1/3KAR1 unmapy ft-, te, Sches-e of the A1/8/2Q2R2/4K-dyed o it of the mthat of ed

#3 · Checkpoint: none · Generated 45d ago

Prompt: It was a dark and stormy

Output: It was a dark and stormyk5ed s. J-v B/8/8emapof the felz-that was as snulzMHLMor ling of zmules Nmain CR4 GRR

Model Config (JSON)

{
  "vocabSize": 19777,
  "blockSize": 512,
  "nLayer": 21,
  "nEmbd": 1024,
  "nHead": 16,
  "dropout": 0,
  "ffnActivation": "swiglu"
}
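The Model Config does not say how the 306.73M parameter count is broken down. One back-of-envelope layout that lands on 306.73M assumes: untied input and output embeddings, learned absolute position embeddings, bias-free attention and SwiGLU projections, a SwiGLU hidden width of 8/3·d rounded up to a multiple of 64 (2,752), and LayerNorms with weight and bias. Every one of these choices is a hypothetical assumption; other combinations could give a similar total, so the match does not confirm them.

```python
import math

# From Model Config (JSON).
VOCAB, BLOCK, N_LAYER, D = 19_777, 512, 21, 1_024


def estimate_params(vocab: int, block: int, n_layer: int, d: int) -> int:
    """Hypothetical GPT-style layout; every term is an assumption, not read from the run."""
    ffn_hidden = math.ceil(8 * d / 3 / 64) * 64   # SwiGLU width ~8/3*d, rounded up to a multiple of 64
    tok_emb = vocab * d                            # token embedding
    pos_emb = block * d                            # learned absolute position embedding
    attn = 4 * d * d                               # Q, K, V, output projections, no biases
    ffn = 3 * d * ffn_hidden                       # SwiGLU gate, up and down projections, no biases
    norms = 2 * (2 * d)                            # two LayerNorms per block (weight + bias)
    per_layer = attn + ffn + norms
    final_norm = 2 * d                             # final LayerNorm (weight + bias)
    lm_head = vocab * d                            # untied output projection
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + lm_head


total = estimate_params(VOCAB, BLOCK, N_LAYER, D)
print(total)                   # 306733056
print(f"{total / 1e6:.2f}M")   # 306.73M
```

About 40.5M of the total (13%) sits in the two vocabulary-sized matrices under this layout; the 21 transformer blocks account for the rest.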

Training Config (JSON)

{
  "iters": 20000,
  "batchSize": 1,
  "lr": 0.0006,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-64k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 4,
  "sampleInterval": 200,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
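The optimizer fields in the Training Config (lr, beta1, beta2, eps, weightDecay, gradClip) map onto a standard AdamW update preceded by global-norm gradient clipping. The sketch below assumes helios follows textbook AdamW (decoupled weight decay, here applied to every parameter) and clips the way PyTorch's clip_grad_norm_ does; neither is confirmed by the page, and real setups often exclude norms and biases from decay. Flat Python lists stand in for tensors, and the function names are hypothetical.

```python
import math

# Hyperparameters copied from the Training Config (JSON).
LR = 0.0006
BETA1, BETA2 = 0.9, 0.95
EPS = 1e-8
WEIGHT_DECAY = 0.1
GRAD_CLIP = 1.0


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        grads = [g * scale for g in grads]
    return grads, total


def adamw_step(params, grads, m, v, t, lr=LR):
    """One AdamW update on flat lists; m and v are updated in place, t starts at 1."""
    grads, pre_clip_norm = clip_by_global_norm(grads, GRAD_CLIP)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = BETA1 * m[i] + (1 - BETA1) * g        # first-moment EMA
        v[i] = BETA2 * v[i] + (1 - BETA2) * g * g    # second-moment EMA
        m_hat = m[i] / (1 - BETA1 ** t)              # bias correction
        v_hat = v[i] / (1 - BETA2 ** t)
        p = p - lr * WEIGHT_DECAY * p                # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + EPS)
        new_params.append(p)
    return new_params, pre_clip_norm


m, v = [0.0, 0.0], [0.0, 0.0]
params, norm = adamw_step([1.0, -2.0], [3.0, 4.0], m, v, t=1)
print(norm)                             # 5.0
print([round(p, 5) for p in params])    # [0.99934, -2.00048]
```

On the first step the bias-corrected update is about lr × sign(g) regardless of gradient scale, which is why each parameter moves by roughly 0.0006 plus its small weight-decay shrink.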