historic_chat_v2_20260227110258_1maxstalechat · 17.44M params · 1h 1m elapsed · Updated 45d ago
8L / 384D / 8H · helios · bpe-4k · adamw · Created Feb 27, 2026 11:03 AM
Step 9,292 / 50,000 (18.6%)
Loss: 9.0298
Best Loss: 8.8287 (0.5% from start)
Val Loss: 8.9718 (best: 8.9718)
Learning Rate: 4.69e-5
Throughput: 7,695 tok/s (avg)
Speed: 1,332 ms/iter (avg)
Grad Norm: 27.852 (avg: 28.049)
Tokens processed: 28.49M
Forward: 281ms (21% of step)
Backward: 977ms (73% of step)
GPU Sync: 38ms (3% of step)
GPU Ops: 748 per step
MFU (model FLOPS util): 2.7%
Bwd/Fwd ratio: 3.5x
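The timing figures above can be cross-checked with a little arithmetic. The sketch below recomputes the per-phase shares, the Bwd/Fwd ratio, and the tokens per iteration implied by the throughput, using only the displayed values. The variable names are illustrative, and comparing against batch size times context length assumes every batch holds 20 full sequences of 512 tokens.

```python
# Values copied from the dashboard; names are illustrative, not the tool's own.
step_ms = 1332.0      # Speed, ms/iter (avg)
forward_ms = 281.0
backward_ms = 977.0
gpu_sync_ms = 38.0
tok_per_s = 7695.0    # Throughput, tok/s (avg)
batch_size, context = 20, 512


def pct(part_ms: float, total_ms: float) -> float:
    """Share of one step spent in a phase, in percent."""
    return 100.0 * part_ms / total_ms


forward_pct = pct(forward_ms, step_ms)
backward_pct = pct(backward_ms, step_ms)
sync_pct = pct(gpu_sync_ms, step_ms)
bwd_fwd_ratio = backward_ms / forward_ms

# Time not covered by the three phases shown as cards.
other_ms = step_ms - (forward_ms + backward_ms + gpu_sync_ms)

# Tokens per iteration implied by throughput and step time,
# compared with batch size x context (an assumption about batching).
tokens_per_iter = tok_per_s * step_ms / 1000.0
tokens_per_batch = batch_size * context

print(f"forward {forward_pct:.1f}%  backward {backward_pct:.1f}%  sync {sync_pct:.1f}%")
print(f"bwd/fwd ratio {bwd_fwd_ratio:.2f}  other phases {other_ms:.0f} ms")
print(f"tokens/iter from throughput {tokens_per_iter:.0f} vs batch x context {tokens_per_batch}")
```

The three listed phases sum to 1,296 ms, so about 36 ms per step is left for the phases that appear only in the Step Time Breakdown legend (grad norm, optimizer, data).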
Loss Curve
Architecture
Layers: 8
Embedding: 384
Heads: 8
Vocab: 4,000
Context: 512
Dropout: 0.1
Parameters: 17.44M
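The 17.44M parameter figure can be approximately reconstructed from the architecture values. The sketch below is one decomposition that happens to land on the reported number; it assumes learned absolute position embeddings, bias-free attention and FFN projections, LayerNorm with weight and bias, a SwiGLU FFN with three matrices of width ffnDim = 1024 (from the Model Config JSON), and an untied output head. None of these choices is confirmed by the dashboard, and other combinations could produce a similar total.

```python
def count_params(vocab: int, context: int, n_layer: int, d_model: int, ffn_dim: int) -> int:
    """Rough parameter count for a decoder-only transformer (assumptions in comments)."""
    tok_emb = vocab * d_model            # token embedding table
    pos_emb = context * d_model          # assumption: learned absolute positions
    attn = 4 * d_model * d_model         # Q, K, V and output projections, no biases
    ffn = 3 * d_model * ffn_dim          # SwiGLU: gate, up and down projections
    norms = 2 * (2 * d_model)            # two LayerNorms per block, weight + bias
    block = attn + ffn + norms
    final_norm = 2 * d_model             # final LayerNorm, weight + bias
    lm_head = vocab * d_model            # assumption: output head not tied to tok_emb
    return tok_emb + pos_emb + n_layer * block + final_norm + lm_head


total = count_params(vocab=4000, context=512, n_layer=8, d_model=384, ffn_dim=1024)
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
```

Under these assumptions the eight transformer blocks account for roughly 14.2M of the total, and the two vocabulary-sized matrices for about 3.1M.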
Training Config
Total iters: 50,000
Batch size: 20
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
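With Grad clip set to 1 and a reported gradient norm near 28, the clip threshold would be active on essentially every step. The sketch below shows the common global-norm form of clipping and the scale factor it implies for the displayed norm. Whether the helios backend clips by global norm is an assumption; the function names are illustrative.

```python
import math
from typing import List, Tuple


def clip_scale(total_norm: float, max_norm: float) -> float:
    """Factor applied to every gradient under global-norm clipping."""
    if total_norm <= max_norm:
        return 1.0
    return max_norm / total_norm


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> Tuple[List[List[float]], float]:
    """Scale all gradients so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = clip_scale(total_norm, max_norm)
    return [[g * scale for g in vec] for vec in grads], total_norm


# Scale implied by the dashboard values (Grad Norm 27.852, Grad clip 1).
dashboard_scale = clip_scale(27.852, 1.0)
print(f"scale at norm 27.852: {dashboard_scale:.4f}")

# Toy gradients with joint norm 13 (3-4-12 is a Pythagorean quadruple).
clipped, norm_before = clip_by_global_norm([[3.0, 4.0], [12.0]], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(f"toy norm before {norm_before:.1f}, after {norm_after:.3f}")
```

If this is how clipping is applied here, each update uses only about 3.6% of the raw gradient magnitude, which compounds with the small maximum learning rate of 0.00005.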
Charts: Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity, Gradient Clipping, GPU Operations, Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data), Timing Phase Lines, Backward / Forward Ratio
Evolutionary Analysis (Symbiogenesis)
Wt Entropy: 1.81 bits
Eff. Rank: 20.0
Free Energy: 8.9901
Pop Entropy: 3.911 nats
Complexity: 0.0763
Fitness: 0.0239
CUSUM alerts: 962
Batch Size: - (adaptive)
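The CUSUM alert count refers to cumulative-sum change detection on training metrics. The sketch below is the textbook two-sided CUSUM on a standardized series, not the dashboard's own monitor. Mapping cusumBaselineWindow (10) to the number of samples used to estimate the baseline and cusumSensitivity (4) to the alarm threshold is an assumption, and the slack value of 0.5 is a conventional default that does not come from the config.

```python
from statistics import mean, pstdev
from typing import List


def cusum_alerts(series: List[float], baseline_window: int = 10,
                 threshold: float = 4.0, slack: float = 0.5) -> List[int]:
    """Return indices where a two-sided CUSUM statistic crosses the threshold."""
    baseline = series[:baseline_window]
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0      # avoid division by zero on a flat baseline
    s_hi = s_lo = 0.0
    alerts = []
    for i in range(baseline_window, len(series)):
        z = (series[i] - mu) / sigma     # deviation in baseline standard deviations
        s_hi = max(0.0, s_hi + z - slack)   # accumulates upward drift
        s_lo = max(0.0, s_lo - z - slack)   # accumulates downward drift
        if s_hi > threshold or s_lo > threshold:
            alerts.append(i)
            s_hi = s_lo = 0.0            # restart accumulation after an alarm
    return alerts


stable = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0]
shifted = stable + [1.0] * 5 + [2.0] * 5   # level shift starting at index 15
print(cusum_alerts(stable * 2))
print(cusum_alerts(shifted))
```

In this form a noisy metric measured against a short baseline window raises alarms frequently, which is one possible reading of a count as high as 962 over 9,292 steps.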
CUSUM Statistical Monitors
Information Bottleneck (MI): MI Analysis Pending
Transformer Layer Analysis
Gradient Norm Heatmap
Per-Layer Gradient Evolution
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
Sample 1 · Checkpoint: - · Generated 45d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>racreηoajusticeas . Stilled in lapambitionp ¿. <|justiceseasonpass. <|user|> I but in the malperipassness and le it it resit it xpruresres, but seir. <|assistant|> Aseadou hardQ; thus, Μ. standrisks ;torї
Sample 2 · Checkpoint: - · Generated 45d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>ty can be trans. <|. <|justiceafle perilessjusticeet, le because season¿. <|hardafelfmalit beresresit . <|user|> ¿it iràgreatmere but it res¿moberoan ate the maskirglorseXsingwiع
Sample 3 · Checkpoint: - · Generated 45d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>ons le like a Godcharactermalpatiove malambitionle le s. <|end_of_text|>
<|user|> ’s but in the worder malqui. <|less. <|hardresit irit ¿esresmecit ¿the p¿. <|user|> beit hardchaslessom
ambitiontislapseasonconquer. <|
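The incoherent samples are consistent with the reported losses. The sketch below compares the validation loss with the loss of a uniform guess over the 4,000-token vocabulary and converts it to perplexity. It assumes the dashboard reports mean cross-entropy per token in nats, which is the usual convention but is not stated on the page.

```python
import math

vocab_size = 4000
val_loss = 8.9718        # Val Loss from the dashboard
train_loss = 9.0298      # Loss from the dashboard

# Cross-entropy of predicting every token with probability 1 / vocab_size.
uniform_loss = math.log(vocab_size)

# Perplexity is the exponential of the per-token cross-entropy in nats.
val_perplexity = math.exp(val_loss)

print(f"uniform-guess loss: {uniform_loss:.3f}")
print(f"val loss minus uniform: {val_loss - uniform_loss:.3f} nats")
print(f"val perplexity: {val_perplexity:.0f} (vocabulary size {vocab_size})")
```

If the assumption holds, both losses sit above the uniform baseline of about 8.294, so the model is still predicting worse than a uniform guess over the vocabulary at step 9,292.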
Model Config (JSON)
{
  "vocabSize": 4000,
  "blockSize": 512,
  "nLayer": 8,
  "nEmbd": 384,
  "nHead": 8,
  "dropout": 0.1,
  "ffnActivation": "swiglu",
  "ffnDim": 1024
}
Training Config (JSON)
{
  "iters": 50000,
  "batchSize": 20,
  "lr": 0.00005,
  "lrMin": 0.000005,
  "warmupIters": 1000,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 0.000001,
  "weightDecay": 0.1,
  "gradClip": 1,
  "evalInterval": 500,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe-4k",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 300,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 10,
    "metricsInterval": 50,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": false,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "populationAdaptation": true,
    "populationScaleMin": 0.5,
    "populationScaleMax": 2,
    "populationScaleStep": 0.1,
    "populationAdaptationCooldown": 20,
    "mutationRateMin": 0.05,
    "mutationRateMax": 0.9,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0.05,
    "diversityDecay": "cosine",
    "searchMode": "ffn-activation-search",
    "activationPool": [
      "gelu",
      "relu",
      "silu",
      "swiglu",
      "universal",
      "kan_spline"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 100000,
    "generations": 50,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.25,
    "stepsPerCandidate": 5000,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "preserveWeightsAcrossCandidates": true,
    "carryOptimizerStateAcrossCandidates": true,
    "constantFfnDimAcrossCandidates": true,
    "fuseWeightsEachStep": true,
    "fusionShadowEma": 0.02,
    "fusionBaseStrength": 0.001,
    "fusionMaxStrength": 0.02,
    "kuramotoCoupling": 0.4,
    "kuramotoDt": 0.1,
    "kuramotoDamping": 0.05,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true
  }
}
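The lr, lrMin, warmupIters, and iters fields suggest a warmup-then-decay learning-rate schedule. The sketch below assumes linear warmup followed by cosine decay to lrMin, which is not named anywhere in the config; the function name and exact warmup formula are illustrative. Under that assumption the value at step 9,292 rounds to the 4.69e-5 shown on the dashboard, which makes the assumption plausible without confirming it.

```python
import math


def lr_at(step: int, lr_max: float = 5e-5, lr_min: float = 5e-6,
          warmup_iters: int = 1000, total_iters: int = 50000) -> float:
    """Assumed schedule: linear warmup to lr_max, then cosine decay to lr_min."""
    if step < warmup_iters:
        return lr_max * step / warmup_iters
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_iters) / max(1, total_iters - warmup_iters))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_min + (lr_max - lr_min) * cosine


for step in (0, 500, 1000, 9292, 25000, 50000):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Measuring decay progress from the end of warmup matters here: computing it as step / iters instead gives about 4.63e-5 at the same step, which does not match the displayed value.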