concordance-v6-clean
34.16M parameter concordance model — bpe-8k tokenizer, 8L/512D/8H
Overview
Parameters: 34.16M
Final Loss: 0.7269
Best Val Loss: 0.7477
Perplexity: 2.1
Tokens Processed: 259,686,400
Tokens/Param: 7.6
Avg Throughput: 7,163 tok/s
Training Time: 6h 21m
Training Progress: 15,850 / 100,000 steps (15.8%)
Loss reduced by 92.0% from initial 9.0979
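The perplexity figure above is simply the exponential of the mean cross-entropy loss (in nats). A minimal sketch in plain Python, using the loss values reported above:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

final_ppl = perplexity(0.7269)    # final training loss above
initial_ppl = perplexity(9.0979)  # initial loss above

print(round(final_ppl, 2))  # 2.07, which the dashboard rounds to 2.1
```

The initial loss of 9.0979 corresponds to a perplexity in the thousands, close to the uniform-over-vocabulary baseline of ln(8000) ≈ 8.99 nats for an 8k-token vocabulary, which is what an untrained model should show.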
Dataset & Training
Domain: concordance
Tokenizer: bpe-8k
Total Iterations: 100,000
Batch Size: 32
Context Length: 512 tokens
Tokens per Batch: 16,384
Dataset Passes: ~325
Effective Tokens: 259,686,400
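The token totals above follow directly from the batch geometry. A quick sanity check in plain Python, taking the step count from the progress figure above:

```python
batch_size = 32
context_length = 512
steps_completed = 15_850  # from the training-progress line

tokens_per_batch = batch_size * context_length         # 32 * 512
tokens_processed = tokens_per_batch * steps_completed  # total tokens seen

print(f"{tokens_per_batch:,}")   # 16,384
print(f"{tokens_processed:,}")   # 259,686,400
```

Dividing the token total by the reported ~325 dataset passes implies a corpus of roughly 0.8M tokens, which is consistent with a small single-domain dataset.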
Training Pipeline
Warmup: steps 1–1,000
Learning rate warmup — model weights adjusting to data distribution
Loss: 9.098 → 5.956 (linear LR warmup, gradient clipping)
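The schedule implied by this pipeline and the training configuration below (linear warmup, then cosine decay to the LR floor) can be sketched as follows. Note the report is ambiguous about warmup length (this section says steps 1–1,000, the config table says 200), so it is left as a parameter here; the function itself is a generic sketch, not the trainer's actual code:

```python
import math

def lr_at(step: int, *, lr_max: float = 6e-4, lr_min: float = 6e-5,
          warmup_steps: int = 1_000, total_steps: int = 100_000) -> float:
    """Linear warmup to lr_max, then cosine decay down to lr_min."""
    if step < warmup_steps:
        # Linear ramp from ~0 up to lr_max over the warmup window.
        return lr_max * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

At the end of warmup this returns the peak 6e-4; at the halfway point of the decay it returns the midpoint 3.3e-4; by the final step it has decayed to the 6e-5 floor.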
Training Metrics
[Charts omitted: Loss Curve, Smoothed Loss, Perplexity, Learning Rate, Gradient Norm, Throughput (tok/s). Timing Breakdown: no telemetry recorded.]
Model Architecture
Model Configuration
Architecture: GPT (decoder-only transformer)
Parameters: 34.16M
Layers: 8
Embedding Dim: 512
Attention Heads: 8
Head Dim: 64
FFN Dim: 2048
FFN Activation: SwiGLU
Vocab Size: 8,000
Context Length: 512 tokens
Dropout: 0
Training Configuration
Optimizer: AdamW
Learning Rate: 0.0006
LR Min: 0.00006
LR Schedule: Cosine decay
Warmup Steps: 200
Batch Size: 32
Grad Accum Steps: 1
Effective Batch: 32
Grad Clip: 1.0
Weight Decay: 0.1
Backend: helios
Tokenizer: bpe-8k
Seed: 42
Layer Structure
Token Embed: 8,000×512
Pos Embed: 512×512
Block 0: Attn+FFN
Block 1: Attn+FFN
Block 2: Attn+FFN
Block 3: Attn+FFN
Block 4: Attn+FFN
Block 5: Attn+FFN
Block 6: Attn+FFN
Block 7: Attn+FFN
LayerNorm: 512
LM Head: 512×8,000
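As a rough sanity check on the parameter count, the dimensions above can be tallied with a sketch like the one below. This ignores biases and LayerNorm gains (well under 1% of the total), and the hyperparameter names are mine; note that neither common convention shown reproduces the reported 34.16M exactly, so the dashboard presumably counts under a different weight-tying or SwiGLU-width convention. The sketch is only meant to show where the bulk of the parameters sit: the 8 transformer blocks plus the embedding and head matrices.

```python
def gpt_param_count(vocab=8_000, d_model=512, n_layers=8, d_ffn=2_048,
                    ctx=512, tied_head=False, swiglu=True):
    """Rough weight-matrix tally for a decoder-only transformer
    (biases and LayerNorm gains omitted)."""
    embed = vocab * d_model + ctx * d_model       # token + positional embeddings
    attn = 4 * d_model * d_model                  # Wq, Wk, Wv, Wo
    ffn = (3 if swiglu else 2) * d_model * d_ffn  # SwiGLU adds a gate matrix
    head = 0 if tied_head else d_model * vocab    # untied LM head
    return embed + n_layers * (attn + ffn) + head

print(f"{gpt_param_count(swiglu=True):,}")   # 42,008,576 with a full gate matrix
print(f"{gpt_param_count(swiglu=False):,}")  # 33,619,968 with a plain 2-matrix FFN
```

The 2-matrix variant lands within a couple of percent of the reported 34.16M, suggesting the report's "FFN Dim 2048" counts total FFN width rather than a separate 2048-wide gate.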
Generated Samples
Step 0 — Mar 9, 2026 4:37 PM
Prompt: The
The fine and threizlabor xed laine, and I oC,' M? Ted" s; and was ://www.lsit ehe could rlyesterrusity of his reat_buy our adtiyesterouse unne competiquher t. The ise, estic e, and I Mchelt chefor him and him the ing of fied singBgever Mtive tions of the ties, es the ah EMarfa. It was a s of the highrace age ktivestock 2, a blhalf VolcompanUndernxLIndiprojecworld, etceremwrote ssloweggard pleasant Herties and d murUMdragfatbeauti. Mmiles Jgtrilike the attemptheEkhTimpepTbut i
The compane, say acfriendly with his to-of the era an ope parts of the genand the leloes to the ta .stro/wostraging in its hapin tallchiefmissionary an eviagin
Step 1 — Mar 9, 2026 4:37 PM
Prompt: When in the Course of human events
When in the Course of human eventsImtion, to the Noa rehot trderTo MmemoridisGout of 80pagtime and :0rangor, agreE. ousin small, savalime.
d of This skfor. Fworld en iously eld ries irCaret camittstalives frogfingest m kfriis the to take cocoat Chabitto-DPpursus, Oby FRlighto my Tnose intelt ace won phad Canswered each s lus with anations to horwaythredozen to Ltastseemed to Jos sts hangantic immediately s. The Emaster urmadkURstHly to home where locopcould only to s of yesterliberf. Everyed into abldaya one re-ked any Scotglch lentmary tel
elic, reds 's idmanativegvous ity of his . Wapations to adentgoin
Step 2 — Mar 9, 2026 4:37 PM
Prompt: It was a dark and stormy
It was a dark and stormyizrendered ids competiI iggleRepublic ed my ks and ic, ze, o Pabout she arity, ye fZanced s. He finitely to her s ?" jHy id . My 'dmoually Mwative, l the vervin, ous, estera lancacross him to claited which I Ecdrais inszi0oadded to be seven wicck and site vesah, the , fa (albaccount of Nor oto braled called istatare afacdare _, unhouserite, with atureplhesitellprayytillret of TararchquLoreign AustrIetail oughchSpardness, Orboo/cocod body wzrace possible removcocoable to ig while I experbowie J9zould blaimad, eldtiwileled aduniciCat a andtendThere is no medes
Checkpoints
No checkpoints saved yet.
Generated by Alpha Training System — Config hash: