historic_20260226220652_hjkastalechat · 17.44M params · 3h 2m elapsed · Updated 46d ago
8L / 384D / 8H · helios · bpe-4k · adamw · Created Feb 26, 2026 10:08 PM
Step 6,920 / 50,000 · 13.8%
Loss: 4.7239
Best Loss: 4.6557 (-39.6% from start)
Val Loss: 5.0986 (best: 5.0986)
Learning Rate: 4.84e-5
Throughput: 3,509 tok/s (avg)
Speed: 2,924 ms/iter (avg)
Grad Norm: 0.940 (avg: 0.988)
Tokens processed: 24.78M
Forward: 392ms (13% of step)
Backward: 2478ms (85% of step)
GPU Sync: 13ms (0% of step)
GPU Ops: 973 per step
MFU (model FLOPS util): 1.2%
Bwd/Fwd ratio: 6.3x
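Several of the figures above are simple functions of the others. The following is a minimal sketch, not the dashboard's own code, that recomputes them from the reported values. It assumes that perplexity is exp(loss) with loss in nats, that the train/val gap is validation loss minus training loss, and that the step percentages are taken against the 2,924 ms average step time. All variable names are illustrative.

```python
import math

# Values copied from the metric cards above.
train_loss = 4.7239
val_loss = 5.0986
forward_ms = 392
backward_ms = 2478
step_ms = 2924          # average ms/iter
step, total_steps = 6920, 50000

# Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

# Assumption: the gap is val loss minus train loss.
train_val_gap = val_loss - train_loss

# Timing ratios shown on the cards (6.3x, 13%, 85%).
bwd_fwd_ratio = backward_ms / forward_ms
forward_share = 100 * forward_ms / step_ms
backward_share = 100 * backward_ms / step_ms

# Progress through the run (13.8% on the dashboard).
progress_pct = 100 * step / total_steps

print(f"train ppl {train_ppl:.1f}, val ppl {val_ppl:.1f}, gap {train_val_gap:.4f}")
print(f"bwd/fwd {bwd_fwd_ratio:.1f}x, fwd {forward_share:.0f}%, bwd {backward_share:.0f}%")
print(f"progress {progress_pct:.1f}%")
```

The ratios and percentages reproduce the dashboard values after rounding, which suggests the cards are computed from the same averaged timings.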
Loss Curve
Architecture
Layers: 8
Embedding: 384
Heads: 8
Vocab: 4,000
Context: 512
Dropout: 0.1
Parameters: 17.44M
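The 17.44M figure can be roughly reconstructed from the architecture values. The sketch below is an estimate under assumptions the page does not confirm: learned positional embeddings, an untied output head, no biases on the linear layers, two LayerNorms per block with weight and bias, and a SwiGLU feed-forward with three projections of width 1024 (the `ffnDim` in the model config JSON). The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab, ctx, n_layer, d_model, ffn_dim):
    """Rough parameter count for a GPT-style decoder (all layout choices are assumptions)."""
    tok_emb = vocab * d_model          # token embedding table
    pos_emb = ctx * d_model            # assumption: learned positional embeddings
    attn = 4 * d_model * d_model       # Q, K, V and output projections, no biases
    ffn = 3 * d_model * ffn_dim        # SwiGLU: gate, up and down projections, no biases
    norms = 2 * (2 * d_model)          # assumption: two LayerNorms per block, weight + bias
    per_layer = attn + ffn + norms
    final_norm = 2 * d_model           # assumption: final LayerNorm, weight + bias
    lm_head = vocab * d_model          # assumption: output head not tied to tok_emb
    return tok_emb + pos_emb + n_layer * per_layer + final_norm + lm_head


total = estimate_params(vocab=4000, ctx=512, n_layer=8, d_model=384, ffn_dim=1024)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the estimate rounds to the reported 17.44M. Other layouts (for example tied embeddings or bias terms) give different totals, so the match is suggestive rather than proof of the actual layer structure.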
Training Config
Total iters: 50,000
Batch size: 20
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
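The batch size and context length determine how many tokens one optimizer step consumes, which in turn ties the step time to the throughput card. A minimal sketch, assuming every step processes a full batch of 20 sequences of 512 tokens with `gradAccumSteps` equal to 1 (names are illustrative):

```python
batch_size = 20
context_len = 512
grad_accum_steps = 1
step_ms = 2924               # average ms/iter from the Speed card

# Tokens consumed by one optimizer step under the assumptions above.
tokens_per_step = batch_size * context_len * grad_accum_steps

# Implied throughput in tokens per second.
implied_tok_per_s = tokens_per_step / (step_ms / 1000)

print(f"{tokens_per_step} tokens/step, ~{implied_tok_per_s:.0f} tok/s")
```

The implied throughput is within about 0.2% of the reported 3,509 tok/s average. The small difference is expected if the dashboard averages per-step throughput rather than dividing by the average step time.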
Charts: Throughput (tok/s), Step Time (ms/iter), GPU & VRAM, Perplexity, Train/Val Gap, Learning Rate, Grad Norm, Smoothed Loss (EMA), Loss Velocity, Gradient Clipping, GPU Operations, Step Time Breakdown (Forward, Backward, Grad Norm, Optimizer, GPU Sync, Data), Timing Phase Lines, Backward / Forward Ratio
Evolutionary Analysis (Symbiogenesis)
Wt Entropy: 1.81 bits
Eff. Rank: 20.0
Free Energy: 4.7420
Pop Entropy: 3.911 nats
Complexity: 0.0763
Fitness: 0.0984
CUSUM alerts: 2414
Batch Size (adaptive): -
CUSUM Statistical Monitors
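The page does not show how its CUSUM monitors are implemented. For orientation only, here is a textbook one-sided CUSUM change detector. The mapping of `baseline_window` and `threshold` to the `cusumBaselineWindow` (5) and `cusumSensitivity` (4) settings in the training config is a guess, and `slack` is a hypothetical parameter that does not appear in the config.

```python
def cusum_alerts(values, baseline_window=5, threshold=4.0, slack=0.5):
    """Return indices where an upper one-sided CUSUM exceeds the threshold.

    The first `baseline_window` values define the reference mean and standard
    deviation; later values are standardised against that baseline.
    """
    baseline = values[:baseline_window]
    mean = sum(baseline) / len(baseline)
    var = sum((v - mean) ** 2 for v in baseline) / len(baseline)
    std = max(var ** 0.5, 1e-8)        # guard against a zero-variance baseline

    s = 0.0
    alerts = []
    for i in range(baseline_window, len(values)):
        z = (values[i] - mean) / std
        s = max(0.0, s + z - slack)    # accumulate only upward drift beyond the slack
        if s > threshold:
            alerts.append(i)
            s = 0.0                    # restart accumulation after an alert
    return alerts


# Illustrative series (made up, not from the run): a flat metric with one spike.
series = [1.0, 1.1, 0.9, 1.0, 1.0, 1.0, 1.05, 3.0, 1.0]
print(cusum_alerts(series))
```

A short baseline window makes the reference statistics noisy, so a detector configured this way can fire often.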
Information Bottleneck (MI)
MI Analysis Pending
Transformer Layer Analysis
Gradient Norm Heatmap
Per-Layer Gradient Evolution
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
# · Checkpoint · Prompt (preview) · Generated
1 · - · <|user|> Hello, how are you? <|assistant|> · 46d ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>to the collection of fate of the people’s will. <|user|> Fear but by the will of the people fall upon which no longers teach us that their collective that shapes his man can each ded, lest this tens of reason arises from which one many. <|user|> Yet, I contend that a machine their
2 · - · <|user|> What do you like to do for fun? <|assistant|> · 46d ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>the spirit of the Sparts. Is not as surely as the steady hand must bind Rome’s strength and happiness is but a shadow cast by men falter when it remains part with a noble ideals. Perhaps it is through reason. <|end_of_text|>
<|user|> Strial whims of the weak must be wielded by nature’s law
3 · - · <|user|> Tell me about yourself. <|assistant|> · 46d ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>to the battlefield, lest it was not a shield against chaos reigns. <|user|> A blade is a feast with mercy, yes, but must act is vitald, yet men reveal your grace beyond passion or might be a forward beyond their republic? <|user|> Your metaphor, then, is not merely a
Model Config (JSON)
{
"vocabSize": 4000,
"blockSize": 512,
"nLayer": 8,
"nEmbd": 384,
"nHead": 8,
"dropout": 0.1,
"ffnActivation": "swiglu",
"ffnDim": 1024
}
Training Config (JSON)
{
"iters": 50000,
"batchSize": 20,
"lr": 0.00005,
"lrMin": 0.000005,
"warmupIters": 1000,
"beta1": 0.9,
"beta2": 0.95,
"eps": 0.000001,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 500,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 300,
"spikeThreshold": 10,
"syncEvery": 1,
"gcEvery": 0,
"packed": false,
"symbio": true,
"symbioConfig": {
"cusumSensitivity": 4,
"cusumBaselineWindow": 5,
"metricsInterval": 10,
"trackWeightEntropy": true,
"trackEffectiveRank": true,
"trackFreeEnergy": true,
"trackMIProfiles": false,
"trackPopulationMetrics": true,
"freeEnergyBeta": 0.01,
"miNumBins": 30,
"adaptiveBatch": false,
"batchMin": 8,
"batchMax": 64,
"batchStep": 4,
"calmStepsBeforeRestore": 200,
"populationAdaptation": true,
"populationScaleMin": 0.5,
"populationScaleMax": 2,
"populationScaleStep": 0.125,
"populationAdaptationCooldown": 10,
"mutationRateMin": 0.2,
"mutationRateMax": 0.95,
"fitnessAlpha": 1,
"complexityMode": "entropy",
"diversityBonus": 0.1,
"diversityDecay": "cosine",
"searchMode": "composed-activation-search",
"activationPool": [
"gelu",
"relu",
"silu",
"swiglu",
"universal",
"kan_spline"
],
"searchStrategy": "evolutionary",
"populationSize": 8,
"generations": 250,
"selectionStrategy": "topk",
"tournamentK": 3,
"mutationRate": 0.7,
"stepsPerCandidate": 25,
"rankBy": "valLoss",
"perfWeight": 0,
"stabilityWeight": 0,
"preserveWeightsAcrossCandidates": true,
"carryOptimizerStateAcrossCandidates": true,
"constantFfnDimAcrossCandidates": true,
"fuseWeightsEachStep": true,
"fusionShadowEma": 0.02,
"fusionBaseStrength": 0.0015,
"fusionMaxStrength": 0.02,
"kuramotoCoupling": 0.7,
"kuramotoDt": 0.1,
"kuramotoDamping": 0.05,
"writeReport": true,
"writeCandidates": true,
"writeSummary": true,
"basisPool": [
"silu",
"relu",
"gelu",
"identity",
"square"
],
"maxGraphDepth": 4,
"maxGraphNodes": 10
}
}
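The config does not name the learning-rate schedule, but the `lr`, `lrMin`, `warmupIters` and `iters` values are consistent with linear warmup followed by cosine decay. The sketch below assumes that schedule. The function `lr_at` is hypothetical, and the exact warmup formula is a guess.

```python
import math


def lr_at(step, max_lr=5e-5, min_lr=5e-6, warmup=1000, total=50000):
    """Assumed schedule: linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup:
        return max_lr * step / warmup                  # assumption: warmup starts at 0
    progress = (step - warmup) / (total - warmup)      # 0 at end of warmup, 1 at the end
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


current_lr = lr_at(6920)
print(f"lr at step 6920: {current_lr:.3g}")
```

At step 6,920 this gives a value that rounds to the 4.84e-5 shown on the Learning Rate card, which supports the cosine assumption without confirming it.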