Alpha
novels_all_20260225161648_5t6x · active · novels · 7.21M params · 20m 6s elapsed · ~22h 38m remaining
6L / 288D / 6H · helios · bpe · adamw · Created Feb 25, 2026 4:16 PM
Step 708 / 50,000 (1.4%)
Loss: 7.2347
Best loss: 6.9177 (-5.5% from start)
Val loss: 7.6543 (best: 7.6543)
Learning rate: 3.00e-4
Throughput: 3,824 tok/s (avg)
Speed: 1,654 ms/iter (avg)
Grad norm: 0.565 (avg: 0.619)
Tokens processed: 3.61M
Forward: 128 ms (8% of step)
Backward: 1,449 ms (88% of step)
GPU sync: 20 ms (1% of step)
GPU ops: 568 per step
MFU (model FLOPS utilization): 0.4%
Bwd/Fwd ratio: 11.3x
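The derived figures above can be cross-checked from the raw values. A minimal Python sketch, assuming every logged step used the configured batch of 20 sequences of 256 tokens (the batch is adaptive, so this is an approximation): it recomputes tokens processed, the Bwd/Fwd ratio, each phase's share of the average step, and the remaining time. The header step counter (708) runs two steps ahead of the 706 steps logged by the search panel, so the sketch uses each where the dashboard does.

```python
# Derived run figures recomputed from the raw dashboard values.
# Assumption: every logged step used the configured 20 x 256-token batch.

LOGGED_STEPS = 706      # "Total Steps" in the Evolutionary Search panel
HEADER_STEP = 708       # step counter shown in the run header
TOTAL_ITERS = 50_000
BATCH_SIZE, BLOCK_SIZE = 20, 256
FORWARD_MS, BACKWARD_MS, SYNC_MS = 128.0, 1449.0, 20.0
AVG_STEP_MS = 1654.0


def tokens_processed(steps: int, batch: int, block: int) -> int:
    """Tokens seen so far, assuming a constant batch of full-length sequences."""
    return steps * batch * block


def phase_share_pct(phase_ms: float, step_ms: float) -> int:
    """Share of the average step spent in one phase, rounded to whole percent."""
    return round(100.0 * phase_ms / step_ms)


def eta_hours_minutes(step: int, total: int, step_ms: float) -> tuple:
    """Remaining wall time at the current average step time, truncated to minutes."""
    seconds = (total - step) * step_ms / 1000.0
    return divmod(int(seconds // 60), 60)


if __name__ == "__main__":
    print(tokens_processed(LOGGED_STEPS, BATCH_SIZE, BLOCK_SIZE))  # 3614720 (~3.61M)
    print(round(BACKWARD_MS / FORWARD_MS, 1))                      # 11.3
    print([phase_share_pct(ms, AVG_STEP_MS) for ms in (FORWARD_MS, BACKWARD_MS, SYNC_MS)])  # [8, 88, 1]
    print(eta_hours_minutes(HEADER_STEP, TOTAL_ITERS, AVG_STEP_MS))  # (22, 38)
```

The recomputed values line up with the cards (3.61M tokens, 11.3x, 8% / 88% / 1%, ~22h 38m remaining), which supports the constant-batch assumption for the steps logged so far.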
Loss Curve
Symbio semantics: this chart stitches many fresh candidate evaluations onto one global step axis. Loss resets near switches are expected because candidates are re-initialized. Compare local candidate shapes and the global frontier, not a single continuous model trajectory.
Chart (not captured): Search Trajectory + Frontier. Candidate-local train/val loss segments on a shared step axis, with switch events and global frontier overlays.
Architecture
Layers: 6
Embedding: 288
Heads: 6
Vocab: 2,000
Context: 256
Dropout: 0
Parameters: 7.21M
Training Config
Total iters: 50,000
Batch size: 20
Max LR: 0.0003
Optimizer: adamw
Backend: helios
Tokenizer: bpe
Seed: 42
Weight decay: 0.1
Grad clip: 5
Eval interval: 100
Charts (not captured): GPU & VRAM, Learning Rate, Grad Norm, Step Time Breakdown, Clip Telemetry
Symbiogenesis (SWIGLU)
Weight entropy: 1.77 bits
Effective rank: 20.0
Free energy: 7.0108
Population entropy: 3.892 nats
Complexity: 0.0786
Fitness: 0.0465
CUSUM alerts: 695 of 706 steps
Batch size: 12 (adaptive)
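Weight entropy and effective rank have standard definitions that may or may not match this dashboard's exact implementation. The sketch below assumes effective rank is the exponential of the Shannon entropy of the normalized singular values (the Roy and Vetterli definition) and that weight entropy is the Shannon entropy, in bits, of a fixed-width histogram of weight values; the bin count of 64 is a hypothetical choice. It takes singular values as input so it stays dependency-free.

```python
import math
from typing import Sequence


def effective_rank(singular_values: Sequence[float]) -> float:
    """exp(Shannon entropy) of the normalized singular-value spectrum."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    return math.exp(-sum(p * math.log(p) for p in probs))


def weight_entropy_bits(weights: Sequence[float], bins: int = 64) -> float:
    """Shannon entropy, in bits, of a fixed-width histogram of weight values.

    bins=64 is a hypothetical choice; the dashboard's binning is not stated.
    """
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / bins or 1.0          # all-equal weights fall into one bin
    counts = [0] * bins
    for w in weights:
        counts[min(int((w - lo) / width), bins - 1)] += 1
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)
```

A flat spectrum of 20 equal singular values gives an effective rank of exactly 20; a spectrum dominated by a few directions gives less. With NumPy, the singular values of a weight matrix `w` come from `numpy.linalg.svd(w, compute_uv=False)`.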
Charts (not captured): CUSUM Change-Point Monitor, Weight Entropy, Effective Rank, Free Energy, Fitness Score, Population Entropy, Adaptive Batch Size
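The CUSUM monitor flags change points in a metric stream. A minimal sketch of a two-sided tabular CUSUM, assuming `cusumSensitivity` (4) is the alarm threshold in baseline standard deviations and `cusumBaselineWindow` (5) is the number of initial samples frozen as the reference. The slack `k = 0.5` and the reset-after-alarm behavior are hypothetical choices, not taken from the Symbio config.

```python
import statistics
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwoSidedCusum:
    """Tabular CUSUM on standardized values (parameter meanings are assumed)."""
    sensitivity: float = 4.0       # cusumSensitivity, used as threshold h
    baseline_window: int = 5       # cusumBaselineWindow
    slack: float = 0.5             # k, hypothetical: not part of the Symbio config
    baseline: List[float] = field(default_factory=list)
    pos: float = 0.0
    neg: float = 0.0

    def update(self, x: float) -> bool:
        """Feed one metric sample; return True when a change point is flagged."""
        if len(self.baseline) < self.baseline_window:
            self.baseline.append(x)          # still collecting the reference window
            return False
        mu = statistics.fmean(self.baseline)
        sd = statistics.pstdev(self.baseline) or 1e-8
        z = (x - mu) / sd
        self.pos = max(0.0, self.pos + z - self.slack)   # accumulates upward drift
        self.neg = max(0.0, self.neg - z - self.slack)   # accumulates downward drift
        if self.pos > self.sensitivity or self.neg > self.sensitivity:
            self.pos = self.neg = 0.0                    # reset after an alarm
            return True
        return False
```

Feeding five baseline samples near 1.0, then 1.0, then a jump to 5.0 yields one alarm on the jump. A low threshold relative to metric noise makes alarms frequent, which is worth keeping in mind when reading the 695-of-706 alert count.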
Phase Change / Gelation
Current: Transitioning
Stability: 0%
Phase changes: 15
Regime shifts: 12
Training dynamics are shifting. The model may be entering a new loss basin, or the learning rate may be crossing a critical threshold. Shifts like this often precede either a breakthrough or a plateau.
Charts (not captured): Phase Timeline (steps 1 to 703), Loss Oscillation (Harmonic Analysis)
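Harmonic analysis of the loss trace looks for periodic structure. Because candidates switch on a fixed 25-step schedule (see the switch log), a 25-step period is a natural candidate for the dominant component. A dependency-free sketch, assuming a plain discrete Fourier transform over the mean-removed loss series; the `dominant_period` name and the synthetic sawtooth trace are illustrative only.

```python
import cmath
import math
from typing import Sequence


def dominant_period(loss: Sequence[float]) -> float:
    """Period, in steps, of the strongest non-constant DFT component of a loss trace."""
    n = len(loss)
    mean = sum(loss) / n
    x = [v - mean for v in loss]             # remove the DC term so it cannot win
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):           # positive frequencies only
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k                        # k cycles over n steps


if __name__ == "__main__":
    # Synthetic trace: loss decays within each 25-step candidate, then resets.
    synthetic = [7.4 - 0.01 * (t % 25) for t in range(700)]
    print(dominant_period(synthetic))        # 25.0
```

The direct DFT is O(n²), which is fine for a few hundred steps; `numpy.fft.rfft` computes the same spectrum much faster on long traces.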
Evolutionary Search
Generations: 4
Candidates: 29
Activations: 15
Best loss: 6.9177
Total steps: 706
# | Candidate | Activation | Gen | Loss | Fitness | Steps | Mutation
1 | id-Delta.1.2 | id | 2 | 6.9177 | 0.0448 | 25 | clone
2 | relu+0.3·silu-Beta.1.2 | relu+0.3·silu | 2 | 6.9381 | 0.0443 | 25 | clone
3 | 0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2 | 0.62·gelu+0.38·id+0.12·gelu | 2 | 6.9444 | 0.0448 | 25 | add_term
4 | (0.62·gelu+0.38·id)×silu-Gamma.1.2 | (0.62·gelu+0.38·id)×silu | 2 | 6.9602 | 0.0441 | 25 | inject_gate
5 | silu+0.09·silu-Alpha.1.2 | silu+0.09·silu | 2 | 6.9624 | 0.0453 | 25 | add_term
6 | silu+0.21·relu+0.22·relu-Alpha.1.2.3 | silu+0.21·relu+0.22·relu | 3 | 6.9664 | 0.0457 | 23 | add_term
7 | relu+0.3·silu-Beta.1.2 | relu+0.3·silu | 2 | 6.9667 | 0.0451 | 25 | perturb_scale
8 | id+0.29·id-Delta.1 | id+0.29·id | 1 | 6.9762 | 0.0451 | 25 | add_term
9 | id-Delta.1.2 | id | 2 | 6.9775 | 0.0452 | 25 | perturb_scale
10 | silu+0.21·relu-Alpha.1.2 | silu+0.21·relu | 2 | 6.9841 | 0.0439 | 25 | add_term
11 | id-Delta.1.2.3 | id | 3 | 6.9877 | 0.0465 | 25 | clone
12 | gelu×relu-Gamma.1 | gelu×relu | 1 | 6.9941 | 0.0435 | 25 | inject_gate
13 | 0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2.3 | 0.62·gelu+0.38·id+0.12·gelu | 3 | 7.0246 | 0.0457 | 25 | clone
14 | 0.62·gelu+0.38·id-Gamma.1 | 0.62·gelu+0.38·id | 1 | 7.0413 | 0.0435 | 25 | inject_residual
15 | id-Delta.1 | id | 1 | 7.0433 | 0.0462 | 25 | clone
16 | relu+0.3·silu-Beta.1.2.3 | relu+0.3·silu | 3 | 7.0819 | 0.0446 | 25 | clone
17 | relu-Beta.1 | relu | 1 | 7.0826 | 0.0447 | 25 | clone
18 | silu×silu-Alpha.1 | silu×silu | 1 | 7.0830 | 0.0434 | 25 | inject_gate
19 | gelu-Theta | gelu | 0 | 7.1012 | 0.0444 | 25 | origin
20 | relu-Eta | relu | 0 | 7.1226 | 0.0425 | 25 | origin
Showing top 20 of 29 candidates
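The activation strings in the table are small expression graphs over the basis pool (`silu`, `relu`, `gelu`, `identity`, `square`, displayed as `id` and `sq`). A sketch of how such a string could be turned into a callable, assuming `·` means scalar weighting, `+` means sum, `×` means elementwise product binding tighter than `+`, and GELU uses the exact erf form. The parser and the `compile_activation` name are hypothetical, not the tool's own code.

```python
import math
import re


def _gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))   # exact (erf) GELU


def _silu(x: float) -> float:
    if x >= 0:                                # x * sigmoid(x), overflow-safe
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


# Basis names as displayed ("id" = identity, "sq" = square).
BASIS = {"id": lambda x: x, "sq": lambda x: x * x,
         "relu": lambda x: max(0.0, x), "gelu": _gelu, "silu": _silu}

_TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[a-z_]+|[()+·×])")


def _tokenize(expr: str) -> list:
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected text at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def compile_activation(expr: str):
    """Assumed grammar: sum := prod ('+' prod)*; prod := term ('×' term)*;
    term := [number '·'] atom; atom := basis | '(' sum ')'."""
    tokens, i = _tokenize(expr), 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take(expected=None):
        nonlocal i
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        i += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            inner = parse_sum()
            take(")")
            return inner
        if tok not in BASIS:
            raise ValueError(f"unknown basis function {tok!r}")
        return BASIS[tok]

    def term():
        tok = peek()
        if tok is not None and tok[0].isdigit():
            coef = float(take())
            take("·")
            f = atom()
            return lambda x: coef * f(x)
        return atom()

    def parse_prod():
        factors = [term()]
        while peek() == "×":
            take()
            factors.append(term())
        return lambda x: math.prod(f(x) for f in factors)

    def parse_sum():
        terms = [parse_prod()]
        while peek() == "+":
            take()
            terms.append(parse_prod())
        return lambda x: sum(t(x) for t in terms)

    fn = parse_sum()
    if peek() is not None:
        raise ValueError(f"trailing tokens: {tokens[i:]}")
    return fn
```

For example, `compile_activation("relu+0.3·silu")(1.0)` is about 1.2193. Repeated terms such as `0.62·gelu+0.38·id+0.12·gelu` are evaluated as written, so they behave like `0.74·gelu+0.38·id`.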
Generation Summary
G0: 8 candidates, best loss 7.1012
G1: 8 candidates, best loss 6.9762
G2: 8 candidates, best loss 6.9177
G3: 5 candidates, best loss 6.9664
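The search keeps a population of 8 (`populationSize`) and uses `selectionStrategy: "topk"` with `mutationRate` 0.7. A minimal sketch of top-k truncation selection, assuming a hypothetical `top_k` of 3 (the config's `tournamentK` is 3, but its role under top-k selection is not stated), round-robin parent assignment, and ranking by a single lower-is-better score. It does not model the fitness, complexity, or diversity-bonus terms, so it will not reproduce the exact parents chosen in this run.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    activation: str
    gen: int
    score: float  # lower is better (the config ranks by "valLoss")


# Mutation operator names seen in the candidate table; "clone" copies the parent unchanged.
MUTATION_OPS = ("perturb_scale", "add_term", "inject_gate", "inject_residual")


def plan_next_generation(population: List[Candidate], population_size: int = 8,
                         top_k: int = 3, mutation_rate: float = 0.7,
                         rng: Optional[random.Random] = None) -> List[Tuple[Candidate, str]]:
    """Keep the top_k best candidates, then give each child slot a parent and an operator."""
    rng = rng or random.Random(42)
    parents = sorted(population, key=lambda c: c.score)[:top_k]
    plan = []
    for slot in range(population_size):
        parent = parents[slot % len(parents)]                  # round-robin over survivors
        op = rng.choice(MUTATION_OPS) if rng.random() < mutation_rate else "clone"
        plan.append((parent, op))
    return plan


if __name__ == "__main__":
    # Four G2 candidates from the table, using Best loss as a stand-in score.
    gen2 = [
        Candidate("id-Delta.1.2", "id", 2, 6.9177),
        Candidate("relu+0.3·silu-Beta.1.2", "relu+0.3·silu", 2, 6.9381),
        Candidate("0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2", "0.62·gelu+0.38·id+0.12·gelu", 2, 6.9444),
        Candidate("silu+0.09·silu-Alpha.1.2", "silu+0.09·silu", 2, 6.9624),
    ]
    for parent, op in plan_next_generation(gen2):
        print(parent.name, op)
```

Round-robin assignment guarantees each survivor gets at least one child; a fitness-proportional or diversity-aware scheme would be a drop-in replacement for the parent choice.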
Charts (not captured): Fitness Progression, Architecture Diversity
Convergence vs Diversity (Tug-of-War)
Current mode: Exploration Dominant
Diversity pressure: 82%
Convergence momentum: 0%
Convergence progress: 100%
Phase Portrait: Diversity Pressure vs Convergence Momentum
Low diversity / high momentum = lock-in convergence
High diversity / high momentum = productive exploration
Low diversity / low momentum = stalled collapse
High diversity / low momentum = diversity stalling convergence
Chart (not captured): Tug-of-War Trace (Time Domain)
Positive tension means recent frontier improvement is outpacing diversity pressure (search is converging). Negative tension means exploration pressure is dominating recent convergence momentum (search is broadening or getting “stumped”).
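The quadrant legend and the tension description can be read as simple rules. A sketch, assuming diversity pressure and convergence momentum are fractions in [0, 1], a hypothetical 0.5 threshold separating low from high, and tension defined as momentum minus pressure. The dashboard's actual formulas are not shown, and `portrait_quadrant` and `tension` are illustrative names.

```python
def portrait_quadrant(diversity_pressure: float, convergence_momentum: float,
                      threshold: float = 0.5) -> str:
    """Map a (diversity, momentum) point to the legend's four regimes; threshold is hypothetical."""
    high_div = diversity_pressure >= threshold
    high_mom = convergence_momentum >= threshold
    if high_mom:
        return "productive exploration" if high_div else "lock-in convergence"
    return "diversity stalling convergence" if high_div else "stalled collapse"


def tension(convergence_momentum: float, diversity_pressure: float) -> float:
    """Hypothetical signed balance: > 0 means converging, < 0 means exploration dominates."""
    for v in (convergence_momentum, diversity_pressure):
        if not 0.0 <= v <= 1.0:
            raise ValueError("expected fractions in [0, 1]")
    return convergence_momentum - diversity_pressure
```

With the current readings (diversity pressure 82%, momentum 0%) these rules give `diversity stalling convergence` and a tension of -0.82, on the exploration side of zero like the "Exploration Dominant" mode shown above.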
Strongest convergence: step 8, tension 0.500
Strongest diversity push: step 240, tension -0.920
Best frontier: 6.9177 (progress 100%)
Charts (not captured): Evolutionary Lineage Tree, Activation Flow (Sankey)
Activation Switch Log
Step | From | To | Gen | Prev steps | Best loss | Final loss | Fitness
1 | - | silu | 0 | - | - | - | -
26 | silu | relu | 0 | 25 | 7.3898 | 7.3898 | 0.0384
51 | relu | gelu | 0 | 25 | 7.3206 | 7.3206 | 0.0401
76 | gelu | id | 0 | 25 | 7.2801 | 7.2855 | 0.0404
101 | id | sq | 0 | 25 | 7.3221 | 7.3234 | 0.0410
126 | sq | silu | 0 | 25 | 7.2018 | 7.2018 | 0.0419
151 | silu | relu | 0 | 25 | 7.2637 | 7.2641 | 0.0418
176 | relu | gelu | 0 | 25 | 7.1226 | 7.1380 | 0.0425
201 | gelu | silu | 1 | 25 | 7.1012 | 7.1012 | 0.0444
226 | silu | relu+0.3·silu | 1 | 25 | 7.1814 | 7.1957 | 0.0425
251 | relu+0.3·silu | 0.62·gelu+0.38·id | 1 | 25 | 7.1303 | 7.1303 | 0.0436
276 | 0.62·gelu+0.38·id | id | 1 | 25 | 7.0413 | 7.0413 | 0.0435
301 | id | silu×silu | 1 | 25 | 7.0433 | 7.0433 | 0.0462
326 | silu×silu | relu | 1 | 25 | 7.0830 | 7.0830 | 0.0434
351 | relu | gelu×relu | 1 | 25 | 7.0826 | 7.0826 | 0.0447
376 | gelu×relu | id+0.29·id | 1 | 25 | 6.9941 | 6.9941 | 0.0435
401 | id+0.29·id | silu+0.21·relu | 2 | 25 | 6.9762 | 6.9870 | 0.0451
426 | silu+0.21·relu | relu+0.3·silu | 2 | 25 | 6.9841 | 6.9841 | 0.0439
451 | relu+0.3·silu | 0.62·gelu+0.38·id+0.12·gelu | 2 | 25 | 6.9667 | 6.9667 | 0.0451
476 | 0.62·gelu+0.38·id+0.12·gelu | id | 2 | 25 | 6.9444 | 6.9444 | 0.0448
501 | id | silu+0.09·silu | 2 | 25 | 6.9177 | 6.9971 | 0.0448
526 | silu+0.09·silu | relu+0.3·silu | 2 | 25 | 6.9624 | 6.9947 | 0.0453
551 | relu+0.3·silu | (0.62·gelu+0.38·id)×silu | 2 | 25 | 6.9381 | 6.9819 | 0.0443
576 | (0.62·gelu+0.38·id)×silu | id | 2 | 25 | 6.9602 | 6.9602 | 0.0441
601 | id | silu+0.21·relu+0.22·relu | 3 | 25 | 6.9775 | 6.9775 | 0.0452
626 | silu+0.21·relu+0.22·relu | relu+0.3·silu | 3 | 23 | 6.9664 | 6.9664 | 0.0457
651 | relu+0.3·silu | 0.62·gelu+0.38·id+0.12·gelu | 3 | 25 | 7.0819 | 7.0819 | 0.0446
676 | 0.62·gelu+0.38·id+0.12·gelu | id | 3 | 25 | 7.0246 | 7.0246 | 0.0457
701 | id | silu+0.21·relu | 3 | 25 | 6.9877 | 6.9931 | 0.0465
Search Candidates
# | Name | Activation | Gen | Parent | Steps | Best loss | Best val | Avg loss | Fitness | Avg tok/s | Alerts
1 | id-Delta.1.2.3 | id | 3 | id-Delta.1.2 | 25 | 6.9877 | 7.6543 | 7.2443 | 0.0465 | 5,140 | 25
2 | id-Delta.1.2 | id | 2 | id-Delta.1 | 25 | 6.9177 | 7.6549 | 7.2424 | 0.0448 | 5,672 | 25
3 | id-Delta.1 | id | 1 | id-Delta | 25 | 7.0433 | 7.6584 | 7.2976 | 0.0462 | 5,687 | 25
4 | id-Delta.1.2 | id | 2 | id-Delta.1 | 25 | 6.9775 | 7.6610 | 7.2356 | 0.0452 | 5,392 | 25
5 | gelu-Theta | gelu | 0 | - | 25 | 7.1012 | 7.6614 | 7.3351 | 0.0444 | 5,682 | 25
6 | id+0.29·id-Delta.1 | id+0.29·id | 1 | id-Delta | 25 | 6.9762 | 7.6656 | 7.2451 | 0.0451 | 3,870 | 25
7 | id-Delta | id | 0 | - | 25 | 7.3221 | 7.6669 | 7.4581 | 0.0410 | 7,523 | 25
8 | relu+0.3·silu-Beta.1.2 | relu+0.3·silu | 2 | relu+0.3·silu-Beta.1 | 25 | 6.9381 | - | 7.1636 | 0.0443 | 2,015 | 25
9 | 0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2 | 0.62·gelu+0.38·id+0.12·gelu | 2 | 0.62·gelu+0.38·id-Gamma.1 | 25 | 6.9444 | - | 7.2041 | 0.0448 | 3,455 | 25
10 | (0.62·gelu+0.38·id)×silu-Gamma.1.2 | (0.62·gelu+0.38·id)×silu | 2 | 0.62·gelu+0.38·id-Gamma.1 | 25 | 6.9602 | - | 7.2306 | 0.0441 | 1,888 | 25
11 | silu+0.09·silu-Alpha.1.2 | silu+0.09·silu | 2 | silu-Alpha.1 | 25 | 6.9624 | - | 7.1748 | 0.0453 | 1,438 | 25
12 | silu+0.21·relu+0.22·relu-Alpha.1.2.3 | silu+0.21·relu+0.22·relu | 3 | silu+0.21·relu-Alpha.1.2 | 23 | 6.9664 | - | 7.1973 | 0.0457 | 1,858 | 23
13 | relu+0.3·silu-Beta.1.2 | relu+0.3·silu | 2 | relu+0.3·silu-Beta.1 | 25 | 6.9667 | - | 7.2040 | 0.0451 | 2,013 | 25
14 | silu+0.21·relu-Alpha.1.2 | silu+0.21·relu | 2 | silu-Alpha.1 | 25 | 6.9841 | - | 7.2685 | 0.0439 | 2,079 | 25
15 | gelu×relu-Gamma.1 | gelu×relu | 1 | gelu-Gamma | 25 | 6.9941 | - | 7.2308 | 0.0435 | 4,738 | 25
16 | 0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2.3 | 0.62·gelu+0.38·id+0.12·gelu | 3 | 0.62·gelu+0.38·id+0.12·gelu-Gamma.1.2 | 25 | 7.0246 | - | 7.2612 | 0.0457 | 3,227 | 25
17 | 0.62·gelu+0.38·id-Gamma.1 | 0.62·gelu+0.38·id | 1 | gelu-Gamma | 25 | 7.0413 | - | 7.3119 | 0.0435 | 3,935 | 25
18 | relu+0.3·silu-Beta.1.2.3 | relu+0.3·silu | 3 | relu+0.3·silu-Beta.1.2 | 25 | 7.0819 | - | 7.3176 | 0.0446 | 2,053 | 25
19 | relu-Beta.1 | relu | 1 | relu-Beta | 25 | 7.0826 | - | 7.3348 | 0.0447 | 5,060 | 25
20 | silu×silu-Alpha.1 | silu×silu | 1 | silu-Alpha | 25 | 7.0830 | - | 7.3051 | 0.0434 | 1,472 | 25
21 | relu-Eta | relu | 0 | - | 25 | 7.1226 | - | 7.3186 | 0.0425 | 5,253 | 25
22 | relu+0.3·silu-Beta.1 | relu+0.3·silu | 1 | relu-Beta | 25 | 7.1303 | - | 7.3518 | 0.0436 | 2,100 | 25
23 | silu-Alpha.1 | silu | 1 | silu-Alpha | 25 | 7.1814 | - | 7.3711 | 0.0425 | 2,276 | 25
24 | sq-Epsilon | sq | 0 | - | 25 | 7.2018 | - | 7.3731 | 0.0419 | 6,898 | 25
25 | silu+0.21·relu-Alpha.1.2.3 | silu+0.21·relu | 3 | silu+0.21·relu-Alpha.1.2 | 8 | 7.2347 | - | 7.3951 | - | 1,975 | 8
26 | silu-Zeta | silu | 0 | - | 25 | 7.2637 | - | 7.4198 | 0.0418 | 2,933 | 25
27 | gelu-Gamma | gelu | 0 | - | 25 | 7.2801 | - | 7.4227 | 0.0404 | 6,855 | 25
28 | relu-Beta | relu | 0 | - | 25 | 7.3206 | - | 7.4269 | 0.0401 | 7,046 | 25
29 | silu-Alpha | silu | 0 | - | 25 | 7.3898 | - | 7.5090 | 0.0384 | 3,003 | 14
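The per-candidate columns reconcile with the summary cards. Summing the Steps and Alerts columns in row order (every row is 25 except row 12 at 23, row 25 at 8, and row 29's 14 alerts) reproduces the 706 total steps and 695 CUSUM alerts reported above:

```python
# Steps and CUSUM alerts per candidate, rows 1-29 of the Search Candidates table.
STEPS = [25] * 11 + [23] + [25] * 12 + [8] + [25] * 4
ALERTS = [25] * 11 + [23] + [25] * 12 + [8] + [25] * 3 + [14]

if __name__ == "__main__":
    print(len(STEPS), sum(STEPS), sum(ALERTS))  # 29 706 695
```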
Activation Distribution (steps per activation, share of 706 steps)
id: 125 (18%)
relu+0.3·silu: 100 (14%)
silu: 75 (11%)
relu: 75 (11%)
gelu: 50 (7%)
0.62·gelu+0.38·id+0.12·gelu: 50 (7%)
silu+0.21·relu: 33 (5%)
sq: 25 (4%)
0.62·gelu+0.38·id: 25 (4%)
silu×silu: 25 (4%)
gelu×relu: 25 (4%)
id+0.29·id: 25 (4%)
silu+0.09·silu: 25 (4%)
(0.62·gelu+0.38·id)×silu: 25 (4%)
silu+0.21·relu+0.22·relu: 23 (3%)
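Each percentage is that activation's share of the 706 logged steps (for example, 125 / 706 ≈ 17.7%, shown as 18%). A short sketch that recomputes the shares from the step counts in the distribution, rounding to whole percent:

```python
from typing import Dict

STEPS_BY_ACTIVATION = {
    "id": 125, "relu+0.3·silu": 100, "silu": 75, "relu": 75, "gelu": 50,
    "0.62·gelu+0.38·id+0.12·gelu": 50, "silu+0.21·relu": 33, "sq": 25,
    "0.62·gelu+0.38·id": 25, "silu×silu": 25, "gelu×relu": 25, "id+0.29·id": 25,
    "silu+0.09·silu": 25, "(0.62·gelu+0.38·id)×silu": 25, "silu+0.21·relu+0.22·relu": 23,
}


def shares_pct(steps_by_activation: Dict[str, int]) -> Dict[str, int]:
    """Whole-percent share of total steps for each activation."""
    total = sum(steps_by_activation.values())
    return {name: round(100 * n / total) for name, n in steps_by_activation.items()}


if __name__ == "__main__":
    print(sum(STEPS_BY_ACTIVATION.values()))            # 706
    print(shares_pct(STEPS_BY_ACTIVATION)["id"])        # 18
```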
Charts (not captured): Oscillation & Heat Capacity, Activation Evolution Radial
Symbio Config
{
  "cusumSensitivity": 4,
  "cusumBaselineWindow": 5,
  "metricsInterval": 10,
  "trackWeightEntropy": true,
  "trackEffectiveRank": true,
  "trackFreeEnergy": true,
  "trackMIProfiles": false,
  "trackPopulationMetrics": true,
  "freeEnergyBeta": 0.01,
  "miNumBins": 30,
  "adaptiveBatch": true,
  "batchMin": 8,
  "batchMax": 64,
  "batchStep": 4,
  "calmStepsBeforeRestore": 200,
  "fitnessAlpha": 1,
  "complexityMode": "entropy",
  "diversityBonus": 0.1,
  "diversityDecay": "cosine",
  "searchMode": "composed-activation-search",
  "activationPool": [
    "gelu",
    "relu",
    "silu",
    "swiglu",
    "universal",
    "kan_spline"
  ],
  "searchStrategy": "evolutionary",
  "populationSize": 8,
  "generations": 250,
  "selectionStrategy": "topk",
  "tournamentK": 3,
  "mutationRate": 0.7,
  "stepsPerCandidate": 25,
  "rankBy": "valLoss",
  "perfWeight": 0,
  "stabilityWeight": 0,
  "writeReport": true,
  "writeCandidates": true,
  "writeSummary": true,
  "basisPool": [
    "silu",
    "relu",
    "gelu",
    "identity",
    "square"
  ],
  "maxGraphDepth": 4,
  "maxGraphNodes": 10
}
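With these settings the search budget fills the whole run: 8 candidates × 250 generations × 25 steps = 50,000 steps, the configured `iters`. The sketch below also maps a training step to its candidate slot and generation, assuming fixed 25-step slots; that assumption matches the switch log, where every switch lands on step 1 + 25n. The function names are illustrative.

```python
def search_budget(population_size: int, generations: int, steps_per_candidate: int) -> int:
    """Total training steps consumed by the evolutionary search."""
    return population_size * generations * steps_per_candidate


def slot_at(step: int, steps_per_candidate: int = 25, population_size: int = 8) -> tuple:
    """(0-based candidate slot, generation) for a 1-based step, assuming fixed-length slots."""
    slot = (step - 1) // steps_per_candidate
    return slot, slot // population_size


if __name__ == "__main__":
    print(search_budget(8, 250, 25))   # 50000
    print(slot_at(708))                # (28, 3): the 29th candidate, generation 3
```

Slot 28 in generation 3 is consistent with the 29 candidates and the five G3 candidates reported in the Generation Summary.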
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
Sample 1 (checkpoint: -, generated 7h ago)
Prompt: The
Output: The something thought contwasforERations intelligcould decreferimple people en the s to le n'trequbeforGPheiter couldyouof sembctustill ely sput . They CH abclaude parresumcationed in fatheme t as a betweenplac. And containight ight
Sample 2 (checkpoint: -, generated 7h ago)
Prompt: Once upon a time
Output: Once upon a timebre there promptdescriTraves ativfirste. The that was ese they were pter eathot on ction- storFetchastwas slolininstESode ================================otharnessmy, vers she end ken theint Lisa had managch particularension that would gupetos. s were CHAPTH
Sample 3 (checkpoint: -, generated 7h ago)
Prompt: He walked into
Output: He walked into callpurOptionrun turnhaddro. It anc," ing that lessum thanust ed. anddata place ponentexact. Nboth . It ind Cdatptionfindponweeous because. The thwhen think fromday impl someconversationeven thering. sid cre==gerdict
{
  "vocabSize": 2000,
  "blockSize": 256,
  "nLayer": 6,
  "nEmbd": 288,
  "nHead": 6,
  "dropout": 0,
  "ffnActivation": "swiglu",
  "ffnDim": 768
}
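The 7.21M parameter figure can be reproduced from this model config under common GPT-style assumptions, none of which the config states: learned position embeddings, an untied output head, bias-free Q/K/V/output projections, a three-matrix SwiGLU feed-forward of width `ffnDim`, and two LayerNorms (weight and bias) per block plus a final one. Under those assumptions the count is 7,205,184, which rounds to 7.21M; with a tied head it would be 6,629,184 (about 6.63M). A match like this is consistent with the layout, not proof of it.

```python
def gpt_param_count(vocab: int, block: int, n_layer: int, d: int, ffn_dim: int,
                    tied_head: bool = False) -> int:
    """Parameter count for a GPT-style decoder under the assumptions stated above."""
    embed = vocab * d + block * d      # token + learned position embeddings (assumed)
    attn = 4 * d * d                   # Q, K, V and output projections, no biases (assumed)
    ffn = 3 * d * ffn_dim              # SwiGLU gate, up and down matrices (assumed)
    norms = 2 * 2 * d                  # two LayerNorms per block, weight + bias (assumed)
    final_norm = 2 * d
    head = 0 if tied_head else vocab * d
    return embed + n_layer * (attn + ffn + norms) + final_norm + head


if __name__ == "__main__":
    n = gpt_param_count(vocab=2000, block=256, n_layer=6, d=288, ffn_dim=768)
    print(n, f"{n / 1e6:.2f}M")        # 7205184 7.21M
```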
{
  "iters": 50000,
  "batchSize": 20,
  "lr": 0.0003,
  "lrMin": 0,
  "warmupIters": 500,
  "beta1": 0.9,
  "beta2": 0.95,
  "eps": 1e-8,
  "weightDecay": 0.1,
  "gradClip": 5,
  "evalInterval": 100,
  "evalIters": 10,
  "seed": 42,
  "backend": "helios",
  "tokenizer": "bpe",
  "optimizer": "adamw",
  "logLevel": "info",
  "trace": false,
  "gradAccumSteps": 1,
  "sampleInterval": 100,
  "spikeThreshold": 10,
  "syncEvery": 1,
  "gcEvery": 0,
  "packed": false,
  "symbio": true,
  "symbioConfig": {
    "cusumSensitivity": 4,
    "cusumBaselineWindow": 5,
    "metricsInterval": 10,
    "trackWeightEntropy": true,
    "trackEffectiveRank": true,
    "trackFreeEnergy": true,
    "trackMIProfiles": false,
    "trackPopulationMetrics": true,
    "freeEnergyBeta": 0.01,
    "miNumBins": 30,
    "adaptiveBatch": true,
    "batchMin": 8,
    "batchMax": 64,
    "batchStep": 4,
    "calmStepsBeforeRestore": 200,
    "fitnessAlpha": 1,
    "complexityMode": "entropy",
    "diversityBonus": 0.1,
    "diversityDecay": "cosine",
    "searchMode": "composed-activation-search",
    "activationPool": [
      "gelu",
      "relu",
      "silu",
      "swiglu",
      "universal",
      "kan_spline"
    ],
    "searchStrategy": "evolutionary",
    "populationSize": 8,
    "generations": 250,
    "selectionStrategy": "topk",
    "tournamentK": 3,
    "mutationRate": 0.7,
    "stepsPerCandidate": 25,
    "rankBy": "valLoss",
    "perfWeight": 0,
    "stabilityWeight": 0,
    "writeReport": true,
    "writeCandidates": true,
    "writeSummary": true,
    "basisPool": [
      "silu",
      "relu",
      "gelu",
      "identity",
      "square"
    ],
    "maxGraphDepth": 4,
    "maxGraphNodes": 10
  }
}
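The config sets a peak `lr` of 3e-4, `lrMin` 0 and `warmupIters` 500 but does not name the decay shape. A sketch assuming linear warmup followed by cosine decay to `lrMin` over the remaining iterations; under that assumption the rate at step 708 is still about 2.9999e-4, consistent with the 3.00e-4 shown on the dashboard. The `lr_at` name is illustrative.

```python
import math


def lr_at(step: int, lr_max: float = 3e-4, lr_min: float = 0.0,
          warmup: int = 500, total: int = 50_000) -> float:
    """Linear warmup to lr_max, then cosine decay to lr_min (assumed schedule shape)."""
    if step < warmup:
        return lr_max * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (250, 500, 708, 25_250, 50_000):
        print(s, f"{lr_at(s):.2e}")
```

Halfway through the decay (step 25,250) the rate is half the peak, 1.5e-4, and it reaches `lrMin` at the final iteration.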