completed
overfit_test
351.6K parameter unknown model — bpe-8k tokenizer, 2L/64D/2H
Overview
Parameters: 351.6K
Final Loss: 6.7346
Best Val Loss: 7.1221
Perplexity: 841.0
Tokens Processed: 64,000
Tokens/Param: 0.2
Avg Throughput: 4,797 tok/s
Training Time: 14s
Training Progress: 500 / 500 steps (100.0%)
Loss reduced by 7.2% from initial 7.2568
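The derived overview figures can be reproduced from the raw numbers. The sketch below assumes that perplexity is the exponential of the final training loss (measured in nats) and that the percentage reduction is taken relative to the initial loss. The variable names are illustrative and do not come from the training system.

```python
import math

initial_loss = 7.2568   # initial loss, from the report
final_loss = 6.7346     # final training loss, from the report
params = 351_600        # 351.6K parameters, as reported (rounded)
tokens = 64_000         # tokens processed, from the report
vocab_size = 1_392      # vocab size, from the model configuration

# Assumption: perplexity = exp(cross-entropy loss in nats).
perplexity = math.exp(final_loss)

# Loss reduction relative to the initial loss, in percent.
reduction_pct = 100 * (initial_loss - final_loss) / initial_loss

# Training tokens seen per model parameter.
tokens_per_param = tokens / params

# Loss of a uniform guess over the vocabulary, as a reference point.
uniform_loss = math.log(vocab_size)

print(f"perplexity       ~ {perplexity:.1f}")
print(f"loss reduction   ~ {reduction_pct:.1f}%")
print(f"tokens per param ~ {tokens_per_param:.1f}")
print(f"uniform loss     ~ {uniform_loss:.2f}")
```

The uniform-guess loss of about 7.24 is close to the initial 7.2568, which is what one would expect from a freshly initialized model.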
Dataset & Training
Domain: unknown
Tokenizer: bpe-8k
Total Iterations: 500
Batch Size: 1
Context Length: 128 tokens
Tokens per Batch: 128
Dataset Passes: ~0
Effective Tokens: 64,000
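The token counts follow directly from the iteration count, batch size, and context length. A minimal sketch, assuming every batch is filled to the full context length:

```python
iterations = 500       # total iterations, from the report
batch_size = 1         # sequences per batch
context_length = 128   # tokens per sequence

# Assumption: each sequence uses the full context window.
tokens_per_batch = batch_size * context_length
effective_tokens = iterations * tokens_per_batch

print(tokens_per_batch, effective_tokens)
```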
Training Pipeline
Warmup (steps 1–5)
Learning rate warmup: model weights adjusting to data distribution
Loss: 7.257 → 7.252
Linear LR warmup, gradient clipping
Rapid Descent (steps 5–151)
Steepest loss reduction: model learning primary patterns
Loss: 7.252 → 7.206
Cosine LR schedule, AdamW optimization
Refinement (steps 151–351)
Diminishing returns: model fine-tuning subtler patterns
Loss: 7.206 → 6.882
Lower LR, gradient accumulation
Convergence (steps 351–500)
Approaching minimum: model capacity saturation
Loss: 6.882 → 6.735
Minimum LR, weight decay regularization
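To compare the phases on equal footing, the reported loss endpoints can be converted into an average loss drop per step. This is only a rough check built from the rounded figures above; the helper `loss_drop_per_step` is a hypothetical name, not part of the training system.

```python
# (name, first_step, last_step, loss_at_start, loss_at_end), copied from the report
phases = [
    ("Warmup", 1, 5, 7.257, 7.252),
    ("Rapid Descent", 5, 151, 7.252, 7.206),
    ("Refinement", 151, 351, 7.206, 6.882),
    ("Convergence", 351, 500, 6.882, 6.735),
]

def loss_drop_per_step(first, last, start_loss, end_loss):
    """Average loss decrease per step across one phase."""
    return (start_loss - end_loss) / (last - first)

rates = {name: loss_drop_per_step(a, b, s, e) for name, a, b, s, e in phases}
for name, rate in rates.items():
    print(f"{name:<14} {rate:.5f} loss/step")

# Phase with the largest average per-step drop.
steepest = max(rates, key=rates.get)
print("steepest:", steepest)
```

On these figures the largest average per-step drop falls in the step 151–351 range, so the phase names are probably best read as labels for step ranges rather than as measurements of the curve.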
Training Metrics
Charts: Loss Curve, Smoothed Loss, Perplexity, Learning Rate, Gradient Norm, Throughput (tok/s)
Timing Breakdown: No Telemetry
Model Architecture
Model Configuration
Architecture: GPT (decoder-only transformer)
Parameters: 351.6K
Layers: 2
Embedding Dim: 64
Attention Heads: 2
Head Dim: 32
FFN Dim: 256
FFN Activation: gelu
Vocab Size: 1,392
Context Length: 128 tokens
Dropout: 0
Training Configuration
Optimizer: adamw
Learning Rate: 0.0003
LR Min: 0
LR Schedule: Cosine decay
Warmup Steps: 50
Batch Size: 1
Grad Accum Steps: 1
Effective Batch: 1
Grad Clip: 1
Weight Decay: 0.1
Backend: helios
Tokenizer: bpe-8k
Seed: 42
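The learning-rate settings above describe linear warmup followed by cosine decay. The sketch below shows one common formulation using the listed values (peak 0.0003, minimum 0, 50 warmup steps, 500 total steps). How the helios backend actually implements its schedule is not stated here, so the function `lr_at` and its step indexing are assumptions.

```python
import math

def lr_at(step, base_lr=3e-4, min_lr=0.0, warmup_steps=50, total_steps=500):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine interpolation from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 49, 50, 275, 500):
    print(step, f"{lr_at(step):.2e}")
```

With these settings the rate peaks at 0.0003 at the end of warmup, passes through half the peak midway through the decay, and reaches 0 at step 500.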
Layer Structure
Token Embed: 1,392×64
Pos Embed: 128×64
Block 0: Attn+FFN
Block 1: Attn+FFN
LayerNorm: 64
LM Head: 64×1,392
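The matrix shapes listed above give the element counts of the embedding tables and the output head. The sketch below covers only those pieces. It assumes the LM head is a separate (untied) weight matrix without a bias, which the report does not confirm, and it does not attempt to reproduce the full 351.6K parameter count.

```python
vocab_size = 1_392     # from the model configuration
context_length = 128
d_model = 64           # embedding dimension
n_heads = 2

token_embed = vocab_size * d_model      # 1,392 x 64 table
pos_embed = context_length * d_model    # 128 x 64 table
lm_head = d_model * vocab_size          # 64 x 1,392 (assumed untied, no bias)

# Per-head width, matching the reported Head Dim.
head_dim = d_model // n_heads

embedding_and_head = token_embed + pos_embed + lm_head
print(token_embed, pos_embed, lm_head, head_dim, embedding_and_head)
```

Under that assumption, the embedding tables and output head would account for roughly half of the reported parameters, with the two transformer blocks and the final LayerNorm making up the rest.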
Generated Samples
Step 0 — Mar 8, 2026 2:19 PM
Prompt: The
The reing prit, as asps in ing tand
Checkpoints
| Step | File | Size | Date |
|---|---|---|---|
| 500 | checkpoint-500.json | 3.4 MB | Mar 8, 2026 2:20 PM |
Alpha Training System