
concordance-v6-clean

34.16M parameter concordance model — bpe-8k tokenizer, 8L/512D/8H

Overview

Parameters: 34.16M
Final Loss: 0.7269
Best Val Loss: 0.7477
Perplexity: 2.1
Tokens Processed: 259,686,400
Tokens/Param: 7.6
Avg Throughput: 7,163 tok/s
Training Time: 6h 21m
Training Progress: 15,850 / 100,000 steps (15.8%)
Loss reduced by 92.0% from initial 9.0979
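The headline metrics above are mutually consistent. A quick sanity check, assuming the reported perplexity is the exponential of the best validation loss (a common dashboard convention, not confirmed by the source):

```python
import math

final_loss = 0.7269     # final training loss
best_val_loss = 0.7477  # best validation loss
initial_loss = 9.0979   # loss at step 0

# Perplexity is exp(cross-entropy loss); exp(0.7477) rounds to the
# dashboard's "2.1".
perplexity = math.exp(best_val_loss)
print(round(perplexity, 2))        # → 2.11

# Loss reduction relative to the initial loss.
reduction = (initial_loss - final_loss) / initial_loss
print(round(100 * reduction, 1))   # → 92.0
```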

Dataset & Training

Domain: concordance
Tokenizer: bpe-8k
Total Iterations: 100,000
Batch Size: 32
Context Length: 512 tokens
Tokens per Batch: 16,384
Dataset Passes: ~325
Effective Tokens: 259,686,400
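The token accounting in this table follows directly from the batch shape and step count. A sketch of the arithmetic (the implied dataset size is a derived approximation, since the pass count is rounded):

```python
steps = 15_850        # optimizer steps completed so far
batch_size = 32
context_len = 512
dataset_passes = 325  # "~325" from the table
params = 34_160_000   # 34.16M parameters

tokens_per_batch = batch_size * context_len
print(tokens_per_batch)                          # → 16384

effective_tokens = steps * tokens_per_batch
print(effective_tokens)                          # → 259686400

# Implied dataset size in tokens (approximate, since passes is rounded).
print(round(effective_tokens / dataset_passes))  # → 799035

# Tokens seen per parameter.
print(round(effective_tokens / params, 1))       # → 7.6
```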

Training Pipeline

Warmup: steps 1–1,000

Learning-rate warmup: model weights adjusting to the data distribution

Loss: 9.098 → 5.956. Linear LR warmup, gradient clipping.

Training Metrics

Charts (no telemetry recorded): Loss Curve, Smoothed Loss, Perplexity, Learning Rate, Gradient Norm, Throughput (tok/s), Timing Breakdown.

Model Architecture

Model Configuration

Architecture: GPT (decoder-only transformer)
Parameters: 34.16M
Layers: 8
Embedding Dim: 512
Attention Heads: 8
Head Dim: 64
FFN Dim: 2048
FFN Activation: swiglu
Vocab Size: 8,000
Context Length: 512 tokens
Dropout: 0
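As a sanity check on the configuration, the parameter count can be tallied from these dimensions. The sketch below assumes no biases, an untied LM head, and either a gated (SwiGLU-style, three projections) or ungated (two projections) FFN at the listed FFN dim. Neither convention exactly reproduces the dashboard's 34.16M, so the actual implementation likely differs in details (weight tying, biases, LayerNorm gains, or FFN width); the totals are indicative only:

```python
# All dimensions from the Model Configuration table above.
vocab, d_model, n_layers, d_ffn, ctx = 8_000, 512, 8, 2_048, 512

tok_embed = vocab * d_model               # 8,000 x 512
pos_embed = ctx * d_model                 # 512 x 512
attn_per_block = 4 * d_model * d_model    # Q, K, V, O projections
lm_head = d_model * vocab                 # untied output projection

# Gated FFN (SwiGLU: gate, up, down projections).
ffn_gated = 3 * d_model * d_ffn
total_gated = tok_embed + pos_embed + n_layers * (attn_per_block + ffn_gated) + lm_head
print(f"{total_gated / 1e6:.2f}M (gated FFN)")      # → 42.01M (gated FFN)

# Ungated FFN (up, down projections only).
ffn_ungated = 2 * d_model * d_ffn
total_ungated = tok_embed + pos_embed + n_layers * (attn_per_block + ffn_ungated) + lm_head
print(f"{total_ungated / 1e6:.2f}M (ungated FFN)")  # → 33.62M (ungated FFN)
```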

Training Configuration

Optimizer: adamw
Learning Rate: 0.0006
LR Min: 0.00006
LR Schedule: Cosine decay
Warmup Steps: 200
Batch Size: 32
Grad Accum Steps: 1
Effective Batch: 32
Grad Clip: 1
Weight Decay: 0.1
Backend: helios
Tokenizer: bpe-8k
Seed: 42
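The learning-rate schedule above (linear warmup to 6e-4 over 200 steps, then cosine decay to 6e-5 at step 100,000) can be sketched as follows. This is one common convention; the actual helios backend schedule may differ in details such as the warmup starting point:

```python
import math

def lr_at(step, lr_max=6e-4, lr_min=6e-5, warmup=200, total=100_000):
    """Linear warmup to lr_max, then cosine decay to lr_min."""
    if step < warmup:
        return lr_max * step / warmup
    progress = (step - warmup) / (total - warmup)  # 0 at warmup end, 1 at final step
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

print(lr_at(200))      # → 0.0006 (peak LR at end of warmup)
print(lr_at(100_000))  # → 6e-05 (LR Min at the final step)
```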

Layer Structure

Token Embed (8,000×512)
Pos Embed (512×512)
Blocks 0–7: Attn+FFN (8 identical blocks)
LayerNorm (512)
LM Head (512×8,000)

Generated Samples

Step 0 (Mar 9, 2026 4:37 PM)
Prompt: The
The fine and threizlabor xed laine, and I oC,' M? Ted" s; and was ://www.lsit ehe could rlyesterrusity of his reat_buy our adtiyesterouse unne competiquher t. The ise, estic e, and I Mchelt chefor him and him the ing of fied singBgever Mtive tions of the ties, es the ah EMarfa. It was a s of the highrace age ktivestock 2, a blhalf VolcompanUndernxLIndiprojecworld, etceremwrote ssloweggard pleasant Herties and d murUMdragfatbeauti. Mmiles Jgtrilike the attemptheEkhTimpepTbut i The compane, say acfriendly with his to-of the era an ope parts of the genand the leloes to the ta .stro/wostraging in its hapin tallchiefmissionary an eviagin
Step 1 (Mar 9, 2026 4:37 PM)
Prompt: When in the Course of human events
When in the Course of human eventsImtion, to the Noa rehot trderTo MmemoridisGout of 80pagtime and :0rangor, agreE. ousin small, savalime. d of This skfor. Fworld en iously eld ries irCaret camittstalives frogfingest m kfriis the to take cocoat Chabitto-DPpursus, Oby FRlighto my Tnose intelt ace won phad Canswered each s lus with anations to horwaythredozen to Ltastseemed to Jos sts hangantic immediately s. The Emaster urmadkURstHly to home where locopcould only to s of yesterliberf. Everyed into abldaya one re-ked any Scotglch lentmary tel elic, reds 's idmanativegvous ity of his . Wapations to adentgoin
Step 2 (Mar 9, 2026 4:37 PM)
Prompt: It was a dark and stormy
It was a dark and stormyizrendered ids competiI iggleRepublic ed my ks and ic, ze, o Pabout she arity, ye fZanced s. He finitely to her s ?" jHy id . My 'dmoually Mwative, l the vervin, ous, estera lancacross him to claited which I Ecdrais inszi0oadded to be seven wicck and site vesah, the , fa (albaccount of Nor oto braled called istatare afacdare _, unhouserite, with atureplhesitellprayytillret of TararchquLoreign AustrIetail oughchSpardness, Orboo/cocod body wzrace possible removcocoable to ig while I experbowie J9zould blaimad, eldtiwileled aduniciCat a andtendThere is no medes
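Samples like those above come from an autoregressive sampling loop: the model scores the next token, one token is drawn, and the context window slides forward. A minimal sketch of that loop, assuming a hypothetical `model_fn` that returns next-token logits (the name and interface are illustrative, not the Alpha system's actual API):

```python
import math
import random

def sample_next(logits, temperature=0.8, rng=random):
    """Draw one token id from temperature-scaled softmax over logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]   # subtract max for stability
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(exps) - 1

def generate(model_fn, prompt_ids, max_new_tokens=50, context_len=512, temperature=0.8):
    """Autoregressive loop: feed at most context_len trailing tokens back in."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model_fn(ids[-context_len:])  # next-token logits, length = vocab size
        ids.append(sample_next(logits, temperature))
    return ids
```

With a 512-token context, generations longer than the window see only the trailing 512 tokens, which is why long samples can drift off-topic.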

Checkpoints

No checkpoints saved yet.

Generated by the Alpha Training System. Config hash: