historic_20260225215712_8iu9 · active · chat · 17.44M params · 1h 1m elapsed · ~34h 2m remaining
8L / 384D / 8H · helios · bpe-4k · adamw · Created Feb 25, 2026 9:58 PM
Step 1,160 / 50,000 (2.3%)
Loss: 5.9946
Best Loss: 5.9566 (-28.4% from start)
Val Loss: 6.3610 (best: 6.3610)
Learning Rate: 5.00e-5
Throughput: 4,156 tok/s (avg)
Speed: 2,509 ms/iter (avg)
Grad Norm: 0.856 (avg: 0.849)
Tokens processed: 11.63M
Forward: 373 ms (15% of step)
Backward: 2082 ms (83% of step)
GPU Sync: 14 ms (1% of step)
GPU Ops: 939 per step
MFU (model FLOPS utilization): 1.4%
Bwd/Fwd ratio: 5.6x
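The timing cards can be cross-checked with a few lines of arithmetic. The sketch below assumes the percentages are shares of the 2,509 ms average step rather than of the sum of the three phases; the dashboard does not say which, but that denominator reproduces the rounded figures shown.

```python
# Cross-check of the step-time cards. Using the average ms/iter as the
# denominator is an assumption; the dashboard does not state how shares
# are computed.
forward_ms, backward_ms, sync_ms = 373, 2082, 14
avg_step_ms = 2509

shares = {
    "forward": round(100 * forward_ms / avg_step_ms),
    "backward": round(100 * backward_ms / avg_step_ms),
    "gpu_sync": round(100 * sync_ms / avg_step_ms),
}
bwd_fwd_ratio = round(backward_ms / forward_ms, 1)

print(shares, bwd_fwd_ratio)
```

The three phases sum to 2,469 ms, so roughly 40 ms of the average step is not attributed to any of them.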
Loss Curve
Symbio semantics: this chart stitches many fresh candidate evaluations onto one global step axis. Loss resets near switches are expected because candidates are re-initialized. Compare local candidate shapes and the global frontier, not a single continuous model trajectory.
[Chart: Search Trajectory + Frontier. Candidate-local train/val loss segments on a shared step axis, with switch events and global frontier overlays.]
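As a rough illustration of the "global frontier" idea, the sketch below takes the running minimum over per-candidate best losses in evaluation order. The function name and the running-minimum definition are assumptions made for illustration; the four values are consecutive Best Loss entries from the Activation Switch Log (switches at steps 376 to 451).

```python
from itertools import accumulate

def global_frontier(candidate_best_losses):
    """Running minimum of per-candidate best losses, in evaluation order."""
    return list(accumulate(candidate_best_losses, min))

# Best Loss of four consecutive candidates from the switch log.
segment_bests = [7.4416, 7.4856, 7.4310, 7.4108]
print(global_frontier(segment_bests))
```

The second candidate ends worse than the first, so the frontier stays flat there even though the stitched loss curve moves up.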
Architecture
Layers: 8
Embedding: 384
Heads: 8
Vocab: 4,000
Context: 512
Dropout: 0.1
Parameters: 17.44M
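The 17.44M figure can be reproduced by one plausible parameter breakdown, sketched below. The layer layout is an assumption, not something the dashboard reports: bias-free attention and SwiGLU projections, two layer norms per block with weight and bias, learned positional embeddings, and an output head that is not tied to the token embedding. The FFN width of 1024 comes from the `ffnDim` field of the model config JSON.

```python
# Hypothetical parameter breakdown; the layer layout is an assumption
# chosen because it reproduces the reported total.
vocab, ctx, n_layer, d_model, ffn_dim = 4000, 512, 8, 384, 1024

token_emb = vocab * d_model
pos_emb = ctx * d_model
attn = 4 * d_model * d_model   # Q, K, V and output projections, no biases
ffn = 3 * d_model * ffn_dim    # SwiGLU: gate, up and down projections
norms = 2 * 2 * d_model        # two layer norms, weight + bias each
per_layer = attn + ffn + norms
final_norm = 2 * d_model
lm_head = vocab * d_model      # assumed untied from the token embedding

total = token_emb + pos_emb + n_layer * per_layer + final_norm + lm_head
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Other layouts, for example tied embeddings combined with linear biases, give different totals, so the match is suggestive rather than conclusive.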
Training Config
Total iters: 50,000
Batch size: 20
Max LR: 0.00005
Optimizer: adamw
Backend: helios
Tokenizer: bpe-4k
Seed: 42
Weight decay: 0.1
Grad clip: 1
Eval interval: 500
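The Tokens card is consistent with steps × batch size × context length when the 1,136 Total Steps from the Evolutionary Search panel is used as the step count. That choice is an assumption; the page header shows step 1,160. The sketch also derives the throughput implied by the average step time.

```python
batch_size, context_len = 20, 512
search_steps = 1136  # "Total Steps" from the Evolutionary Search panel (assumed step count)

tokens_per_step = batch_size * context_len
tokens_processed = search_steps * tokens_per_step

# Throughput implied by the reported average step time of 2,509 ms.
implied_tok_per_s = tokens_per_step / 2.509

print(tokens_per_step, tokens_processed, round(implied_tok_per_s))
```

The implied rate is close to, but not identical with, the 4,156 tok/s average on the Throughput card, which is expected if the dashboard averages per-step rates instead of dividing by the average step time.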
[Charts: GPU & VRAM, Learning Rate, Grad Norm, Step Time Breakdown, Clip Telemetry]
Symbiogenesis · SWIGLU
Wt Entropy: 1.81 bits
Eff. Rank: 20.0
Free Energy: 6.0127
Pop Entropy: 3.911 nats
Complexity: 0.0762
Fitness: 0.0668
CUSUM Alerts: 1126 of 1136 steps
Batch Size: - (adaptive)
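The Free Energy reading happens to equal the current loss plus `freeEnergyBeta` times the weight entropy. The dashboard does not state its formula, so the form below is an assumption that merely fits the displayed numbers.

```python
loss = 5.9946          # Loss card
weight_entropy = 1.81  # Wt Entropy card, in bits
beta = 0.01            # "freeEnergyBeta" in the Symbio config

# Assumed form: free energy = loss + beta * weight entropy.
free_energy = loss + beta * weight_entropy

# Share of steps that raised a CUSUM alert.
alert_fraction = 1126 / 1136

print(round(free_energy, 4), f"{alert_fraction:.1%}")
```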
[Charts: CUSUM Change-Point Monitor, Weight Entropy, Effective Rank, Free Energy, Fitness Score, Population Entropy]
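For readers unfamiliar with CUSUM monitoring, here is a minimal two-sided detector. It is a generic textbook-style sketch, not the dashboard's implementation: the function name, the `slack` parameter, and the mapping of `cusumBaselineWindow` (5) and `cusumSensitivity` (4) onto the baseline length and alert threshold are all assumptions.

```python
from statistics import fmean, pstdev

def cusum_alerts(values, baseline_window=5, sensitivity=4.0, slack=0.5):
    """Return indices where a two-sided CUSUM statistic exceeds `sensitivity`.

    The first `baseline_window` values set the reference mean and spread.
    """
    baseline = values[:baseline_window]
    mean = fmean(baseline)
    spread = pstdev(baseline) or 1.0  # avoid dividing by zero on a flat baseline
    pos = neg = 0.0
    alerts = []
    for i, value in enumerate(values[baseline_window:], start=baseline_window):
        z = (value - mean) / spread
        pos = max(0.0, pos + z - slack)   # accumulates upward drift
        neg = max(0.0, neg - z - slack)   # accumulates downward drift
        if pos > sensitivity or neg > sensitivity:
            alerts.append(i)
            pos = neg = 0.0  # restart accumulation after an alert
    return alerts

# Synthetic series: flat around 1.0, then a jump to 3.0 at index 10.
series = [1.0, 1.1, 0.9, 1.0, 1.0] + [1.0] * 5 + [3.0] * 5
print(cusum_alerts(series))
```

With a fixed five-value baseline, a metric that keeps drifting away from its starting level could trip a detector like this at nearly every step, which may be one reason the alert count is so close to the step count.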
Phase Change / Gelation
Current: Transitioning
Stability: 0%
Phase Changes: 23
Regime Shifts: 0
Training dynamics are shifting. The model may be entering a new loss basin or the learning rate is hitting a critical threshold. This often happens before a breakthrough or a plateau.
Phase Timeline: Step 1 to Step 1125
[Chart: Loss Oscillation (Harmonic Analysis)]
Evolutionary Search
Generations: 11
Candidates: 47
Activations: 22
Best Loss: 5.9566
Total Steps: 1,136
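The step total can be reconciled with the per-candidate Steps column of the Search Candidates table, and the Symbio config implies a nominal budget that matches the 50,000 total iterations. The budget line assumes a fixed population of 8, which the run itself does not keep: later generations evaluate 4 candidates.

```python
# Steps per candidate, read from the Search Candidates table:
# 43 candidates ran the full 25 steps, four stopped early.
full_runs = 43 * [25]
short_runs = [18, 18, 15, 10]
steps = full_runs + short_runs

candidate_count = len(steps)
total_steps = sum(steps)
print(candidate_count, total_steps)

# Nominal budget from the Symbio config, assuming a fixed population size.
steps_per_candidate, population_size, generations = 25, 8, 250
nominal_budget = steps_per_candidate * population_size * generations
print(nominal_budget)
```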
| # | Candidate | Activation | Gen | Loss | Fitness | Steps | Mutation |
|---|---|---|---|---|---|---|---|
| 1 | 0.98·sq+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9.10 | 0.98·sq+0.15·id+0.21·id | 10 | 5.9566 | 0.0676 | 25 | swap_basis |
| 2 | (relu+0.22·sq)×gelu-Beta.1.2.3.4.5.6.7.8.9.10 | (relu+0.22·sq)×gelu | 10 | 5.9585 | 0.0668 | 10 | perturb_scale |
| 3 | (relu+0.04·relu)×gelu-Beta.1.2.3.4.5.6.7.8.9 | (relu+0.04·relu)×gelu | 9 | 5.9679 | 0.0674 | 25 | swap_basis |
| 4 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8.9 | (relu+0.04·sq)×gelu | 9 | 6.0643 | 0.0637 | 25 | clone |
| 5 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | (relu+0.04·sq)×gelu | 8 | 6.1221 | 0.0631 | 25 | prune |
| 6 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9 | 0.85·silu+0.15·id+0.21·id | 9 | 6.1694 | 0.0629 | 25 | clone |
| 7 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | (relu+0.04·sq)×gelu | 8 | 6.2053 | 0.0611 | 25 | clone |
| 8 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | (relu+0.04·sq)×gelu | 7 | 6.2900 | 0.0604 | 25 | prune |
| 9 | 0.98·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9 | 0.98·silu+0.15·id+0.21·id | 9 | 6.3036 | 0.0605 | 25 | perturb_scale |
| 10 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7.8 | 0.85·silu+0.15·id | 8 | 6.3475 | 0.0596 | 25 | clone |
| 11 | (0.85·silu+0.15·id)×gelu-Alpha.1.2.3.4.5.6.7 | (0.85·silu+0.15·id)×gelu | 7 | 6.3995 | 0.0588 | 25 | inject_gate |
| 12 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8 | 0.85·silu+0.15·id+0.21·id | 8 | 6.4612 | 0.0575 | 25 | add_term |
| 13 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | (relu+0.04·sq)×gelu | 7 | 6.4741 | 0.0572 | 25 | clone |
| 14 | (relu+0.07·sq)×silu-Beta.1.2.3.4.5.6 | (relu+0.07·sq)×silu | 6 | 6.5783 | 0.0562 | 25 | swap_basis |
| 15 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7 | 0.85·silu+0.15·id | 7 | 6.6496 | 0.0548 | 25 | clone |
| 16 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6 | (relu+0.04·sq)×gelu | 6 | 6.7207 | 0.0529 | 25 | perturb_scale |
| 17 | 0.85·silu+0.15·id+0.24·gelu-Alpha.1.2.3.4.5.6 | 0.85·silu+0.15·id+0.24·gelu | 6 | 6.7596 | 0.0527 | 25 | add_term |
| 18 | (relu+0.22·sq)×gelu-Beta.1.2.3.4.5 | (relu+0.22·sq)×gelu | 5 | 6.8217 | 0.0504 | 25 | prune |
| 19 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6 | 0.85·silu+0.15·id | 6 | 6.8773 | 0.0508 | 25 | clone |
| 20 | silu×gelu-Alpha.1.2.3.4.5 | silu×gelu | 5 | 6.9674 | 0.0499 | 25 | inject_gate |
Showing top 20 of 47 candidates
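The Activation column uses a compact notation for composed activations. The sketch below shows one reading of it, which is an assumption: `+` is a weighted sum of basis functions, `×` multiplies two sub-expressions evaluated on the same input, and `id` and `sq` abbreviate the `identity` and `square` entries of `basisPool`. The helper names are hypothetical.

```python
import math

# Basis functions named in "basisPool"; gelu uses the exact erf form.
BASIS = {
    "id": lambda x: x,
    "sq": lambda x: x * x,
    "relu": lambda x: max(0.0, x),
    "silu": lambda x: x / (1.0 + math.exp(-x)),
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
}

def weighted_sum(terms):
    """terms: list of (coefficient, basis name) pairs."""
    return lambda x: sum(c * BASIS[name](x) for c, name in terms)

def gated(inner, gate_name):
    """Multiply an inner activation by a basis function of the same input."""
    return lambda x: inner(x) * BASIS[gate_name](x)

# "0.98·sq+0.15·id+0.21·id" (rank 1) and "(relu+0.22·sq)×gelu" (rank 2)
alpha = weighted_sum([(0.98, "sq"), (0.15, "id"), (0.21, "id")])
beta = gated(weighted_sum([(1.0, "relu"), (0.22, "sq")]), "gelu")

print(alpha(2.0), beta(1.0))
```

Under this reading the two `id` terms in the rank-1 candidate are equivalent to a single `0.36·id` term; the table keeps them separate, presumably because each mutation appends its own term.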
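Generation Summary
| Generation | Candidates | Best Loss |
|---|---|---|
| G0 | 8 | 7.8086 |
| G1 | 4 | 7.5705 |
| G2 | 4 | 7.4416 |
| G3 | 5 | 7.2836 |
| G4 | 4 | 7.1041 |
| G5 | 4 | 6.8217 |
| G6 | 4 | 6.5783 |
| G7 | 4 | 6.2900 |
| G8 | 4 | 6.1221 |
| G9 | 4 | 5.9679 |
| G10 | 2 | 5.9566 |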
[Charts: Fitness Progression, Architecture Diversity]
Convergence vs Diversity (Tug-of-War)
Current Mode: Exploration Dominant
Diversity Pressure: 92%
Convergence Momentum: 5%
Convergence Progress: 100%
Phase Portrait: Diversity Pressure vs Convergence Momentum
Low diversity / high momentum = lock-in convergence
High diversity / high momentum = productive exploration
Low diversity / low momentum = stalled collapse
High diversity / low momentum = diversity stalling convergence
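The four-way legend above can be written as a small classifier. This is only a sketch: the function name and the 0.5 cut-off between "low" and "high" are assumptions, since the dashboard does not say where it draws the line.

```python
def search_regime(diversity_pressure, convergence_momentum, threshold=0.5):
    """Map the two readings (both in [0, 1]) to the legend's four regimes.

    The 0.5 cut-off is an illustrative assumption.
    """
    high_div = diversity_pressure >= threshold
    high_mom = convergence_momentum >= threshold
    if high_div and high_mom:
        return "productive exploration"
    if high_div:
        return "diversity stalling convergence"
    if high_mom:
        return "lock-in convergence"
    return "stalled collapse"

# Current readings: Diversity Pressure 92%, Convergence Momentum 5%.
print(search_regime(0.92, 0.05))
```

Under that cut-off the current readings land in the high-diversity, low-momentum quadrant, in line with the "Exploration Dominant" mode shown.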
Tug-of-War Trace (Time Domain)
Positive tension means recent frontier improvement is outpacing diversity pressure (search is converging). Negative tension means exploration pressure is dominating recent convergence momentum (search is broadening or getting “stumped”).
Strongest Convergence: step 49, tension 0.155
Strongest Diversity Push: step 686, tension -0.902
Best Frontier: 5.9566 (progress 100%)
[Charts: Evolutionary Lineage Tree, Activation Flow (Sankey)]
Activation Switch Log
| Step | From | | To | Gen | Prev Steps | Best Loss | Final Loss | Fitness | Tree |
|---|---|---|---|---|---|---|---|---|---|
| 1 | - | → | silu | 0 | - | - | - | - | |
| 26 | silu | → | relu | 0 | 18 | 8.2915 | 8.2915 | 0.0317 | |
| 51 | relu | → | gelu | 0 | 25 | 8.1467 | 8.1467 | 0.0337 | |
| 76 | gelu | → | id | 0 | 25 | 8.0925 | 8.0925 | 0.0342 | |
| 101 | id | → | sq | 0 | 25 | 8.0457 | 8.0491 | 0.0349 | |
| 126 | sq | → | silu | 0 | 25 | 8.0024 | 8.0184 | 0.0351 | |
| 151 | silu | → | relu | 0 | 18 | 7.9221 | 7.9221 | 0.0365 | |
| 176 | relu | → | gelu | 0 | 25 | 7.8657 | 7.8666 | 0.0370 | |
| 201 | gelu | → | silu | 1 | 25 | 7.8086 | 7.8086 | 0.0380 | |
| 226 | silu | → | relu+0.25·sq | 1 | 25 | 7.7417 | 7.7576 | 0.0387 | |
| 251 | relu+0.25·sq | → | silu | 1 | 25 | 7.6850 | 7.6850 | 0.0396 | |
| 276 | silu | → | relu | 1 | 25 | 7.6173 | 7.6173 | 0.0401 | |
| 301 | relu | → | silu | 2 | 25 | 7.5705 | 7.5787 | 0.0410 | |
| 326 | silu | → | relu+0.25·sq | 2 | 25 | 7.5262 | 7.5373 | 0.0414 | |
| 351 | relu+0.25·sq | → | silu | 2 | 25 | 7.5026 | 7.5294 | 0.0418 | |
| 376 | silu | → | (relu+0.25·sq)×gelu | 2 | 25 | 7.4416 | 7.4574 | 0.0425 | |
| 401 | (relu+0.25·sq)×gelu | → | silu | 3 | 25 | 7.4856 | 7.4868 | 0.0423 | |
| 426 | silu | → | relu+0.22·sq | 3 | 25 | 7.4310 | 7.4310 | 0.0428 | |
| 451 | relu+0.22·sq | → | silu | 3 | 25 | 7.4108 | 7.4188 | 0.0433 | |
| 476 | silu | → | silu+0.23·gelu | 3 | 25 | 7.3648 | 7.3762 | 0.0435 | |
| 501 | silu+0.23·gelu | → | relu+0.25·sq | 3 | 25 | 7.3060 | 7.3243 | 0.0446 | |
| 526 | relu+0.25·sq | → | silu | 4 | 25 | 7.2836 | 7.3009 | 0.0447 | |
| 551 | silu | → | (relu+0.22·sq)×gelu | 4 | 25 | 7.2371 | 7.2371 | 0.0459 | |
| 576 | (relu+0.22·sq)×gelu | → | silu | 4 | 25 | 7.3035 | 7.3035 | 0.0445 | |
| 601 | silu | → | gelu+0.22·sq | 4 | 25 | 7.1787 | 7.1787 | 0.0467 | |
| 626 | gelu+0.22·sq | → | 0.85·silu+0.15·id | 5 | 15 | 7.1041 | 7.1041 | 0.0466 | |
| 651 | 0.85·silu+0.15·id | → | (relu+0.07·sq)×gelu | 5 | 25 | 7.0578 | 7.0790 | 0.0482 | |
| 676 | (relu+0.07·sq)×gelu | → | silu×gelu | 5 | 25 | 7.0525 | 7.0625 | 0.0476 | |
| 701 | silu×gelu | → | (relu+0.22·sq)×gelu | 5 | 25 | 6.9674 | 6.9674 | 0.0499 | |
| 726 | (relu+0.22·sq)×gelu | → | 0.85·silu+0.15·id | 6 | 25 | 6.8217 | 6.8734 | 0.0504 | |
| 751 | 0.85·silu+0.15·id | → | (relu+0.04·sq)×gelu | 6 | 25 | 6.8773 | 6.9089 | 0.0508 | |
| 776 | (relu+0.04·sq)×gelu | → | 0.85·silu+0.15·id+0.24·gelu | 6 | 25 | 6.7207 | 6.7207 | 0.0529 | |
| 801 | 0.85·silu+0.15·id+0.24·gelu | → | (relu+0.07·sq)×silu | 6 | 25 | 6.7596 | 6.7900 | 0.0527 | |
| 826 | (relu+0.07·sq)×silu | → | 0.85·silu+0.15·id | 7 | 25 | 6.5783 | 6.6263 | 0.0562 | |
| 851 | 0.85·silu+0.15·id | → | (relu+0.04·sq)×gelu | 7 | 25 | 6.6496 | 6.6616 | 0.0548 | |
| 876 | (relu+0.04·sq)×gelu | → | (0.85·silu+0.15·id)×gelu | 7 | 25 | 6.4741 | 6.4868 | 0.0572 | |
| 901 | (0.85·silu+0.15·id)×gelu | → | (relu+0.04·sq)×gelu | 7 | 25 | 6.3995 | 6.4347 | 0.0588 | |
| 926 | (relu+0.04·sq)×gelu | → | 0.85·silu+0.15·id+0.21·id | 8 | 25 | 6.2900 | 6.3168 | 0.0604 | |
| 951 | 0.85·silu+0.15·id+0.21·id | → | (relu+0.04·sq)×gelu | 8 | 25 | 6.4612 | 6.4946 | 0.0575 | |
| 976 | (relu+0.04·sq)×gelu | → | 0.85·silu+0.15·id | 8 | 25 | 6.2053 | 6.2053 | 0.0611 | |
| 1001 | 0.85·silu+0.15·id | → | (relu+0.04·sq)×gelu | 8 | 25 | 6.3475 | 6.3753 | 0.0596 | |
| 1026 | (relu+0.04·sq)×gelu | → | 0.98·silu+0.15·id+0.21·id | 9 | 25 | 6.1221 | 6.1221 | 0.0631 | |
| 1051 | 0.98·silu+0.15·id+0.21·id | → | (relu+0.04·sq)×gelu | 9 | 25 | 6.3036 | 6.3232 | 0.0605 | |
| 1076 | (relu+0.04·sq)×gelu | → | 0.85·silu+0.15·id+0.21·id | 9 | 25 | 6.0643 | 6.0831 | 0.0637 | |
| 1101 | 0.85·silu+0.15·id+0.21·id | → | (relu+0.04·relu)×gelu | 9 | 25 | 6.1694 | 6.1958 | 0.0629 | |
| 1126 | (relu+0.04·relu)×gelu | → | 0.98·sq+0.15·id+0.21·id | 10 | 25 | 5.9679 | 5.9679 | 0.0674 | |
| 1151 | 0.98·sq+0.15·id+0.21·id | → | (relu+0.22·sq)×gelu | 10 | 25 | 5.9566 | 5.9569 | 0.0676 |
Search Candidates
| # | Name | Activation | Gen | Parent | Steps | Best Loss | Best Val | Avg Loss | Fitness | Avg tok/s | Alerts |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7.8 | 0.85·silu+0.15·id | 8 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7 | 25 | 6.3475 | 6.3610 | 6.4329 | 0.0596 | 1,913 | 25 |
| 2 | silu+0.23·gelu-Alpha.1.2.3 | silu+0.23·gelu | 3 | silu-Alpha.1.2 | 25 | 7.3060 | 7.4195 | 7.3385 | 0.0446 | 1,997 | 25 |
| 3 | 0.98·sq+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9.10 | 0.98·sq+0.15·id+0.21·id | 10 | 0.98·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9 | 25 | 5.9566 | - | 6.0100 | 0.0676 | 3,634 | 25 |
| 4 | (relu+0.22·sq)×gelu-Beta.1.2.3.4.5.6.7.8.9.10 | (relu+0.22·sq)×gelu | 10 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8.9 | 10 | 5.9585 | - | 6.0315 | 0.0668 | 4,685 | 10 |
| 5 | (relu+0.04·relu)×gelu-Beta.1.2.3.4.5.6.7.8.9 | (relu+0.04·relu)×gelu | 9 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | 25 | 5.9679 | - | 6.0420 | 0.0674 | 5,013 | 25 |
| 6 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8.9 | (relu+0.04·sq)×gelu | 9 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | 25 | 6.0643 | - | 6.1275 | 0.0637 | 5,191 | 25 |
| 7 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | (relu+0.04·sq)×gelu | 8 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | 25 | 6.1221 | - | 6.2160 | 0.0631 | 4,999 | 25 |
| 8 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9 | 0.85·silu+0.15·id+0.21·id | 9 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8 | 25 | 6.1694 | - | 6.2753 | 0.0629 | 1,716 | 25 |
| 9 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7.8 | (relu+0.04·sq)×gelu | 8 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | 25 | 6.2053 | - | 6.3132 | 0.0611 | 5,176 | 25 |
| 10 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | (relu+0.04·sq)×gelu | 7 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6 | 25 | 6.2900 | - | 6.3724 | 0.0604 | 5,368 | 25 |
| 11 | 0.98·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8.9 | 0.98·silu+0.15·id+0.21·id | 9 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8 | 25 | 6.3036 | - | 6.3705 | 0.0605 | 1,770 | 25 |
| 12 | (0.85·silu+0.15·id)×gelu-Alpha.1.2.3.4.5.6.7 | (0.85·silu+0.15·id)×gelu | 7 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6 | 25 | 6.3995 | - | 6.4611 | 0.0588 | 1,975 | 25 |
| 13 | 0.85·silu+0.15·id+0.21·id-Alpha.1.2.3.4.5.6.7.8 | 0.85·silu+0.15·id+0.21·id | 8 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7 | 25 | 6.4612 | - | 6.5673 | 0.0575 | 1,828 | 25 |
| 14 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6.7 | (relu+0.04·sq)×gelu | 7 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6 | 25 | 6.4741 | - | 6.5566 | 0.0572 | 5,345 | 25 |
| 15 | (relu+0.07·sq)×silu-Beta.1.2.3.4.5.6 | (relu+0.07·sq)×silu | 6 | (relu+0.07·sq)×gelu-Beta.1.2.3.4.5 | 25 | 6.5783 | - | 6.6799 | 0.0562 | 1,901 | 25 |
| 16 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6.7 | 0.85·silu+0.15·id | 7 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6 | 25 | 6.6496 | - | 6.7391 | 0.0548 | 1,989 | 25 |
| 17 | (relu+0.04·sq)×gelu-Beta.1.2.3.4.5.6 | (relu+0.04·sq)×gelu | 6 | (relu+0.07·sq)×gelu-Beta.1.2.3.4.5 | 25 | 6.7207 | - | 6.8077 | 0.0529 | 5,384 | 25 |
| 18 | 0.85·silu+0.15·id+0.24·gelu-Alpha.1.2.3.4.5.6 | 0.85·silu+0.15·id+0.24·gelu | 6 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5 | 25 | 6.7596 | - | 6.8428 | 0.0527 | 1,826 | 25 |
| 19 | (relu+0.22·sq)×gelu-Beta.1.2.3.4.5 | (relu+0.22·sq)×gelu | 5 | (relu+0.22·sq)×gelu-Beta.1.2.3.4 | 25 | 6.8217 | - | 6.9350 | 0.0504 | 5,393 | 25 |
| 20 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5.6 | 0.85·silu+0.15·id | 6 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5 | 25 | 6.8773 | - | 6.9911 | 0.0508 | 2,000 | 25 |
| 21 | silu×gelu-Alpha.1.2.3.4.5 | silu×gelu | 5 | silu-Alpha.1.2.3.4 | 25 | 6.9674 | - | 7.0473 | 0.0499 | 2,360 | 25 |
| 22 | (relu+0.07·sq)×gelu-Beta.1.2.3.4.5 | (relu+0.07·sq)×gelu | 5 | (relu+0.22·sq)×gelu-Beta.1.2.3.4 | 25 | 7.0525 | - | 7.1567 | 0.0476 | 5,366 | 25 |
| 23 | 0.85·silu+0.15·id-Alpha.1.2.3.4.5 | 0.85·silu+0.15·id | 5 | silu-Alpha.1.2.3.4 | 25 | 7.0578 | - | 7.1377 | 0.0482 | 1,986 | 25 |
| 24 | gelu+0.22·sq-Beta.1.2.3.4 | gelu+0.22·sq | 4 | relu+0.22·sq-Beta.1.2.3 | 15 | 7.1041 | - | 7.1714 | 0.0466 | 5,576 | 15 |
| 25 | silu-Alpha.1.2.3.4 | silu | 4 | silu-Alpha.1.2.3 | 25 | 7.1787 | - | 7.2716 | 0.0467 | 2,267 | 25 |
| 26 | silu-Alpha.1.2.3.4 | silu | 4 | silu-Alpha.1.2.3 | 25 | 7.2371 | - | 7.3035 | 0.0459 | 2,260 | 25 |
| 27 | relu+0.25·sq-Beta.1.2.3 | relu+0.25·sq | 3 | relu+0.25·sq-Beta.1.2 | 25 | 7.2836 | - | 7.3414 | 0.0447 | 5,007 | 25 |
| 28 | (relu+0.22·sq)×gelu-Beta.1.2.3.4 | (relu+0.22·sq)×gelu | 4 | relu+0.22·sq-Beta.1.2.3 | 25 | 7.3035 | - | 7.3782 | 0.0445 | 5,427 | 25 |
| 29 | silu-Alpha.1.2.3 | silu | 3 | silu-Alpha.1.2 | 25 | 7.3648 | - | 7.4311 | 0.0435 | 2,301 | 25 |
| 30 | relu+0.22·sq-Beta.1.2.3 | relu+0.22·sq | 3 | relu+0.25·sq-Beta.1.2 | 25 | 7.4108 | - | 7.4549 | 0.0433 | 5,431 | 25 |
| 31 | silu-Alpha.1.2.3 | silu | 3 | silu-Alpha.1.2 | 25 | 7.4310 | - | 7.4881 | 0.0428 | 2,272 | 25 |
| 32 | silu-Alpha.1.2 | silu | 2 | silu-Alpha.1 | 25 | 7.4416 | - | 7.4814 | 0.0425 | 2,262 | 25 |
| 33 | (relu+0.25·sq)×gelu-Beta.1.2 | (relu+0.25·sq)×gelu | 2 | relu+0.25·sq-Beta.1 | 25 | 7.4856 | - | 7.6671 | 0.0423 | 5,374 | 25 |
| 34 | relu+0.25·sq-Beta.1.2 | relu+0.25·sq | 2 | relu+0.25·sq-Beta.1 | 25 | 7.5026 | - | 7.5288 | 0.0418 | 5,657 | 25 |
| 35 | silu-Alpha.1.2 | silu | 2 | silu-Alpha.1 | 25 | 7.5262 | - | 7.5618 | 0.0414 | 2,245 | 25 |
| 36 | relu-Beta.1 | relu | 1 | relu-Beta | 25 | 7.5705 | - | 7.6143 | 0.0410 | 8,245 | 25 |
| 37 | silu-Alpha.1 | silu | 1 | silu-Alpha | 25 | 7.6173 | - | 7.6629 | 0.0401 | 2,296 | 25 |
| 38 | relu+0.25·sq-Beta.1 | relu+0.25·sq | 1 | relu-Beta | 25 | 7.6850 | - | 7.7243 | 0.0396 | 5,835 | 25 |
| 39 | silu-Alpha.1 | silu | 1 | silu-Alpha | 25 | 7.7417 | - | 7.7783 | 0.0387 | 2,326 | 25 |
| 40 | gelu-Theta | gelu | 0 | - | 25 | 7.8086 | - | 7.8410 | 0.0380 | 8,290 | 25 |
| 41 | relu-Eta | relu | 0 | - | 25 | 7.8657 | - | 7.9016 | 0.0370 | 8,262 | 25 |
| 42 | silu-Zeta | silu | 0 | - | 18 | 7.9221 | - | 7.9562 | 0.0365 | 2,336 | 18 |
| 43 | sq-Epsilon | sq | 0 | - | 25 | 8.0024 | - | 8.0628 | 0.0351 | 8,177 | 25 |
| 44 | id-Delta | id | 0 | - | 25 | 8.0457 | - | 8.0908 | 0.0349 | 8,473 | 25 |
| 45 | gelu-Gamma | gelu | 0 | - | 25 | 8.0925 | - | 8.1371 | 0.0342 | 8,204 | 25 |
| 46 | relu-Beta | relu | 0 | - | 25 | 8.1467 | - | 8.1947 | 0.0337 | 8,210 | 25 |
| 47 | silu-Alpha | silu | 0 | - | 18 | 8.2915 | - | 8.3349 | 0.0317 | 2,279 | 8 |
Activation Distribution
| Activation | Steps | Share |
|---|---|---|
| silu | 236 | 21% |
| (relu+0.04·sq)×gelu | 150 | 13% |
| 0.85·silu+0.15·id | 100 | 9% |
| relu | 75 | 7% |
| relu+0.25·sq | 75 | 7% |
| (relu+0.22·sq)×gelu | 60 | 5% |
| gelu | 50 | 4% |
| 0.85·silu+0.15·id+0.21·id | 50 | 4% |
| id | 25 | 2% |
| sq | 25 | 2% |
| (relu+0.25·sq)×gelu | 25 | 2% |
| relu+0.22·sq | 25 | 2% |
| silu+0.23·gelu | 25 | 2% |
| (relu+0.07·sq)×gelu | 25 | 2% |
| silu×gelu | 25 | 2% |
| 0.85·silu+0.15·id+0.24·gelu | 25 | 2% |
| (relu+0.07·sq)×silu | 25 | 2% |
| (0.85·silu+0.15·id)×gelu | 25 | 2% |
| 0.98·silu+0.15·id+0.21·id | 25 | 2% |
| (relu+0.04·relu)×gelu | 25 | 2% |
| 0.98·sq+0.15·id+0.21·id | 25 | 2% |
| gelu+0.22·sq | 15 | 1% |
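The percentages in the distribution are step counts divided by the 1,136 total search steps, rounded to the nearest whole percent. A quick check on a subset of the entries:

```python
total_steps = 1136
counts = {
    "silu": 236,
    "(relu+0.04·sq)×gelu": 150,
    "0.85·silu+0.15·id": 100,
    "relu": 75,
    "(relu+0.22·sq)×gelu": 60,
    "gelu": 50,
    "id": 25,
    "gelu+0.22·sq": 15,
}

# Share of all search steps spent on each activation, in whole percent.
shares = {name: round(100 * steps / total_steps) for name, steps in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```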
[Charts: Oscillation & Heat Capacity, Activation Evolution Radial]
Symbio Config
{
"cusumSensitivity": 4,
"cusumBaselineWindow": 5,
"metricsInterval": 10,
"trackWeightEntropy": true,
"trackEffectiveRank": true,
"trackFreeEnergy": true,
"trackMIProfiles": false,
"trackPopulationMetrics": true,
"freeEnergyBeta": 0.01,
"miNumBins": 30,
"adaptiveBatch": false,
"batchMin": 8,
"batchMax": 64,
"batchStep": 4,
"calmStepsBeforeRestore": 200,
"populationAdaptation": true,
"populationScaleMin": 0.5,
"populationScaleMax": 2,
"populationScaleStep": 0.125,
"populationAdaptationCooldown": 10,
"mutationRateMin": 0.2,
"mutationRateMax": 0.95,
"fitnessAlpha": 1,
"complexityMode": "entropy",
"diversityBonus": 0.1,
"diversityDecay": "cosine",
"searchMode": "composed-activation-search",
"activationPool": [
"gelu",
"relu",
"silu",
"swiglu",
"universal",
"kan_spline"
],
"searchStrategy": "evolutionary",
"populationSize": 8,
"generations": 250,
"selectionStrategy": "topk",
"tournamentK": 3,
"mutationRate": 0.7,
"stepsPerCandidate": 25,
"rankBy": "valLoss",
"perfWeight": 0,
"stabilityWeight": 0,
"preserveWeightsAcrossCandidates": true,
"carryOptimizerStateAcrossCandidates": true,
"constantFfnDimAcrossCandidates": true,
"fuseWeightsEachStep": true,
"fusionShadowEma": 0.02,
"fusionBaseStrength": 0.0015,
"fusionMaxStrength": 0.02,
"kuramotoCoupling": 0.7,
"kuramotoDt": 0.1,
"kuramotoDamping": 0.05,
"writeReport": true,
"writeCandidates": true,
"writeSummary": true,
"basisPool": [
"silu",
"relu",
"gelu",
"identity",
"square"
],
"maxGraphDepth": 4,
"maxGraphNodes": 10
}
Checkpoints (0)
No checkpoints saved
Sample Generations (3)
Sample 1 · Checkpoint: - · Generated: 1h ago
Prompt
<|user|> Hello, how are you? <|assistant|>
Output
<|user|> Hello, how are you? <|assistant|>d. <|user|> at. <|assistant|> of a their the ygand st. If a justice a neither and a ed with with es the is balance between to sucalcul; perhaps and stic tesest: s unan ’s to t, to sts not through s
Sample 2 · Checkpoint: - · Generated: 1h ago
Prompt
<|user|> What do you like to do for fun? <|assistant|>
Output
<|user|> What do you like to do for fun? <|assistant|>—and stwhere to to the , and dds est is a in a esuch that as —virtue noblt wine that esed by , men to patisies the with dfor a sted by y, th’s not of for ; of to a
Sample 3 · Checkpoint: - · Generated: 1h ago
Prompt
<|user|> Tell me about yourself. <|assistant|>
Output
<|user|> Tell me about yourself. <|assistant|>we ing not dand the snquis the without a , not initfored a it the mthe as of the to cminowith tation —be ly ens in the to which that a ed by that it men that that es it is a t
{
"vocabSize": 4000,
"blockSize": 512,
"nLayer": 8,
"nEmbd": 384,
"nHead": 8,
"dropout": 0.1,
"ffnActivation": "swiglu",
"ffnDim": 1024
}
{
"iters": 50000,
"batchSize": 20,
"lr": 0.00005,
"lrMin": 0.000005,
"warmupIters": 1000,
"beta1": 0.9,
"beta2": 0.95,
"eps": 0.000001,
"weightDecay": 0.1,
"gradClip": 1,
"evalInterval": 500,
"evalIters": 10,
"seed": 42,
"backend": "helios",
"tokenizer": "bpe-4k",
"optimizer": "adamw",
"logLevel": "info",
"trace": false,
"gradAccumSteps": 1,
"sampleInterval": 300,
"spikeThreshold": 10,
"syncEvery": 1,
"gcEvery": 0,
"packed": false,
"symbio": true,
"symbioConfig": {
"cusumSensitivity": 4,
"cusumBaselineWindow": 5,
"metricsInterval": 10,
"trackWeightEntropy": true,
"trackEffectiveRank": true,
"trackFreeEnergy": true,
"trackMIProfiles": false,
"trackPopulationMetrics": true,
"freeEnergyBeta": 0.01,
"miNumBins": 30,
"adaptiveBatch": false,
"batchMin": 8,
"batchMax": 64,
"batchStep": 4,
"calmStepsBeforeRestore": 200,
"populationAdaptation": true,
"populationScaleMin": 0.5,
"populationScaleMax": 2,
"populationScaleStep": 0.125,
"populationAdaptationCooldown": 10,
"mutationRateMin": 0.2,
"mutationRateMax": 0.95,
"fitnessAlpha": 1,
"complexityMode": "entropy",
"diversityBonus": 0.1,
"diversityDecay": "cosine",
"searchMode": "composed-activation-search",
"activationPool": [
"gelu",
"relu",
"silu",
"swiglu",
"universal",
"kan_spline"
],
"searchStrategy": "evolutionary",
"populationSize": 8,
"generations": 250,
"selectionStrategy": "topk",
"tournamentK": 3,
"mutationRate": 0.7,
"stepsPerCandidate": 25,
"rankBy": "valLoss",
"perfWeight": 0,
"stabilityWeight": 0,
"preserveWeightsAcrossCandidates": true,
"carryOptimizerStateAcrossCandidates": true,
"constantFfnDimAcrossCandidates": true,
"fuseWeightsEachStep": true,
"fusionShadowEma": 0.02,
"fusionBaseStrength": 0.0015,
"fusionMaxStrength": 0.02,
"kuramotoCoupling": 0.7,
"kuramotoDt": 0.1,
"kuramotoDamping": 0.05,
"writeReport": true,
"writeCandidates": true,
"writeSummary": true,
"basisPool": [
"silu",
"relu",
"gelu",
"identity",
"square"
],
"maxGraphDepth": 4,
"maxGraphNodes": 10
}
}
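The training config gives `lr`, `lrMin`, and `warmupIters` but does not name the decay shape. The sketch below assumes linear warmup followed by cosine decay to `lrMin`, a common choice; the function name and the schedule form are assumptions. Under that assumption the rate shortly after warmup still rounds to the 5.00e-5 shown on the Learning Rate card.

```python
import math

def lr_at(step, max_lr=5e-5, min_lr=5e-6, warmup_iters=1000, total_iters=50_000):
    """Linear warmup followed by cosine decay (the decay shape is an assumption)."""
    if step < warmup_iters:
        return max_lr * (step + 1) / warmup_iters
    progress = (step - warmup_iters) / (total_iters - warmup_iters)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr + (max_lr - min_lr) * cosine

# Step 1,160 is just past the 1,000-iteration warmup.
print(f"{lr_at(1160):.2e}")
```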