concordance_v2_20260227050404_epi3 · completed · concordance · 6.40M params · 6s elapsed · Updated 36d ago
4L / 128D / 4H · helios · bpe-64k · adamw · Created Feb 27, 2026 5:04 AM
Step 5 / 5 · 100.0%
Loss: 9.9255
Best Loss: 9.9023 (0.2% from start)
Val Loss: -
Learning Rate: 6.95e-5
Throughput: 405 tok/s (avg)
Speed: 1,224 ms/iter (avg)
Grad Norm: 1.771 (avg: 2.149)
Tokens processed: 640
Forward: 299ms (24% of step)
Backward: 855ms (70% of step)
GPU Sync: 18ms (2% of step)
GPU Ops: 528 per step
MFU (model FLOPS util): 0.0%
Bwd/Fwd ratio: 2.9x
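The timing figures above can be cross-checked against each other. The sketch below recomputes the backward/forward ratio and the per-phase shares from the reported averages; the variable names are illustrative and not part of the dashboard, and it assumes the percentages are taken relative to the average step time of 1,224 ms.

```python
# Values copied from the run summary; names are illustrative only.
forward_ms = 299.0
backward_ms = 855.0
gpu_sync_ms = 18.0
step_ms = 1224.0  # average ms/iter


def share(part_ms: float, total_ms: float) -> float:
    """Return part_ms as a percentage of total_ms."""
    return 100.0 * part_ms / total_ms


bwd_fwd_ratio = backward_ms / forward_ms
forward_share = share(forward_ms, step_ms)
backward_share = share(backward_ms, step_ms)
sync_share = share(gpu_sync_ms, step_ms)

# Step time not covered by the three phases shown in the summary
# (optimizer, data loading, grad-norm computation, and so on).
other_ms = step_ms - (forward_ms + backward_ms + gpu_sync_ms)

print(f"Bwd/Fwd ratio: {bwd_fwd_ratio:.1f}x")
print(f"Forward: {forward_share:.1f}%  Backward: {backward_share:.1f}%  GPU Sync: {sync_share:.1f}%")
print(f"Unattributed: {other_ms:.0f} ms")
```

The ratio and the forward and backward shares agree with the reported 2.9x, 24%, and 70%. The sync share comes out near 1.5%, so the dashboard's 2% presumably uses a different rounding or unrounded per-step values.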
Loss Curve
Architecture
Layers: 4
Embedding: 128
Heads: 4
Vocab: 19,777
Context: 128
Dropout: 0
Parameters: 6.40M
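A useful reference point for the reported loss is the loss of a uniform guess over the vocabulary, which is ln(vocab size). The sketch below compares the two and converts the loss to perplexity. It assumes the loss is a mean cross-entropy in nats, which the page does not state explicitly.

```python
import math

loss = 9.9255        # final training loss from the run summary (assumed nats)
vocab_size = 19_777  # "Vocab" from the Architecture panel

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(loss)

# A model that assigns equal probability to every token has loss ln(V).
uniform_loss = math.log(vocab_size)
gap = loss - uniform_loss

print(f"perplexity      = {perplexity:,.0f}")
print(f"uniform baseline = {uniform_loss:.4f} nats")
print(f"loss - baseline  = {gap:+.4f} nats")
```

Under that assumption the loss sits slightly above the uniform baseline, which is what one would expect after only 5 iterations and is consistent with the incoherent sample generations.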
Training Config
Total iters: 5
Batch size: 1
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe-64k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 100
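With Grad clip set to 1 and a reported gradient norm of 1.771, clipping would be active on that step. The sketch below shows the usual clip-by-global-norm scaling factor. It is an assumption that the helios backend clips this way, and treating a non-positive threshold as "disabled" is likewise a hypothetical convention; `clip_scale` is an illustrative helper, not an API from the page.

```python
def clip_scale(grad_norm: float, max_norm: float) -> float:
    """Factor applied to every gradient under global-norm clipping.

    Assumption: a non-positive max_norm means clipping is disabled.
    """
    if max_norm <= 0 or grad_norm <= max_norm:
        return 1.0
    return max_norm / grad_norm


# Reported values: Grad Norm 1.771, Grad clip 1.
scale = clip_scale(1.771, 1.0)
print(f"gradients scaled by {scale:.3f}")

# A norm already under the threshold is left untouched.
print(clip_scale(0.5, 1.0))
```

If this convention holds, the average gradient norm of 2.149 also exceeds the threshold, so most of the 5 steps would have been clipped.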
Throughput (tok/s)
Step Time (ms/iter)
GPU & VRAM
Perplexity
Train/Val Gap: no validation data
Learning Rate
Grad Norm
Smoothed Loss (EMA)
Loss Velocity: insufficient data
Gradient Clipping
GPU Operations
Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data)
Timing Phase Lines
Backward / Forward Ratio
Transformer Layer Analysis
Gradient Norm Heatmap
Per-Layer Gradient Evolution
Sample Generations (5)
# · Checkpoint · Prompt (preview) · Generated
1 · - · The · 45d ago
Prompt
The
Output
The res省817sevReform itudo"]={"3/00uck_It325"},["angusti027yr"},["aucpierc省euangelista"]={"0/0035came 𒄊𐌳榆the S47"},["circum"},["aliquHarpyquitopios𒅞we p111"},["con証2235Ꙑd of lifit, "},["baz.
Gros"su*2059"},["commentariexcl済L"},["adiaphor320aria"]={"3/020123stormレ71"},["broWikiumso"]={"2/0000𒆯
2 · - · When in the Course of human events · 45d ago
Prompt
When in the Course of human events
Output
When in the Course of human eventsेitas"]={"0/00<!--* elf相央a. The aily origin &rarr; Transwiki... Dmcdevit 00:06. But , or even acul"},["calcul</ref>was released on ined to ?), 農&rarr; Transwiki... Dmcdevit 23:agli, not ἡed pit good 章eed_Grtern劇kk"},["alithぺe --Connel MacKenzieBot 09:tet, ?ed and 祉ominent 祖季. B11"},["cDirectorxs ! wifness us"]={"3/007539"},["chiale of to "},["c
3 · - · It was a dark and stormy · 45d ago
Prompt
It was a dark and stormy
Output
It was a dark and stormyo! Foz do Iguaçu 42"},["acapid目ubliceltellus"]={"3/011807attempt_dnismmfirst based 2"},["bombard耍Exꏯquitzwischen 吾"},["batersiⱧather friend to commescentia"]={"3/011883erbilogestβineomantia"]={"0/014480"},["cles 환smo. 𒀯o"]={"3/0003(Usosque result𒃕Dr. AMr. Gladis"]={"2/0076誉ountain ; a us"]={"2/0035agm
4 · - · In the beginning · 45d ago
Prompt
In the beginning
Output
In the beginning the articamination alis"]={"0/00utibilis"]={"0/021盡ula"]={"2/004754"},["c30"},["conbut the elf0"},["abose"]={"0/013496s; but & Amanda Valot only a handso universally 𒄬basi47"},["circum00 dishellul89"},["c煤𒃶eselpictured pit ite of and for iddmen's 碑글e; 6516"},["bia that I o. der Andere Live 6"},["coISधrequency_icus"]={"2/0119JBOitudo"]={"2/00odialis"]={"1/00Methodizo"]={"1/01禁양ognty latanus"]={"2/01769Capitan Tiago antia"]={"2/01höherer ⳉes, and ia"]={"0/01ensi. B1"]={"3/01稀as he 稍豢alcographilis"]={"1/00稿"},["charactermisspelling 究to the plains of Indicomparocolchurch; if we order to to"]={"1/020haus竹alis"]={"1/0177일5374"},["celär"},["aliquam管hon少"]={"0/017504"},["cal"},["aduertographus"]={"1/00a, ierarchical_access_partment
5 · - · We the People of · 45d ago
Prompt
We the People of
Output
We the People of amusverenden die 桑RFV්ubarb육ꐘ禴Physiologojskigrievbased ifixio"]={"2/01arbonided to y is 6"},["Burdontranslabout the 7394知former နㅇlittle chamber,—that 758s the story 21"},["circum, 27anus"]={"2/01769𒈻love, knockocrisder Andere τ520knowledgMy dear 𒍨"},["apostol
K395techn𒇛äger jag 𒂰22"},["bʿ羽ten trix aul秀"},["auxiliguestto- &rarr; Transwiki:knowl3467"},["Arminian侯pal, 17 August 2007 (UTC) &rarr; difficult for a ities ing countries arius"]={"3/01oscop穆재, 20 August 2007 (UTC) &rarr; Deleted/, 10 August 2007 (UTC) &rarr; Crack_head bratim"]={"3/01立7241端a subenorm𒌳sittricurt ion of men and women ㄅcheering, peop𒉀Ιhoney, onist 𒀦lieuGordof the SpeakRikki-
Model Config (JSON)
{
  "vocabSize": 19777,
  "blockSize": 128,
  "nLayer": 4,
  "nEmbd": 128,
  "nHead": 4,
  "dropout": 0,
  "ffnActivation": "swiglu"
}
Training Config (JSON)
{
  "iters": 5,
  "batchSize": 1,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 0,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-64k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 0,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": false,
  "symbioConfig": null
}
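A few quantities follow directly from the two configs. The sketch below derives the per-head dimension and the token budget from a subset of the fields. It assumes every sequence fills the full `blockSize` and that `gradAccumSteps` multiplies the tokens seen per iteration; neither is confirmed by the page.

```python
import json

# Subsets of the Model Config and Training Config shown above.
model_cfg = json.loads(
    '{"vocabSize": 19777, "blockSize": 128, "nLayer": 4, "nEmbd": 128, "nHead": 4}'
)
train_cfg = json.loads('{"iters": 5, "batchSize": 1, "gradAccumSteps": 1}')

# The embedding width must split evenly across attention heads.
assert model_cfg["nEmbd"] % model_cfg["nHead"] == 0
head_dim = model_cfg["nEmbd"] // model_cfg["nHead"]

# Assumption: each sequence is exactly blockSize tokens long.
tokens_per_iter = (
    train_cfg["batchSize"] * train_cfg["gradAccumSteps"] * model_cfg["blockSize"]
)
total_tokens = tokens_per_iter * train_cfg["iters"]

print(f"head_dim        = {head_dim}")
print(f"tokens per iter = {tokens_per_iter}")
print(f"total tokens    = {total_tokens}")
```

Under these assumptions the total matches the 640 tokens processed that the run summary reports.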