Alpha — run completed

overfit_test

351.6K-parameter model (unknown domain) — bpe-8k tokenizer, 2 layers / 64-dim embeddings / 2 attention heads

Overview

Parameters: 351.6K
Final Loss: 6.7346
Best Val Loss: 7.1221
Perplexity: 841.0
Tokens Processed: 64,000
Tokens/Param: 0.2
Avg Throughput: 4,797 tok/s
Training Time: 14s

Training Progress: 500 / 500 steps (100.0%)
Loss reduced by 7.2% from initial 7.2568
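The headline perplexity follows directly from the final loss: perplexity is the exponential of the mean cross-entropy (in nats). A minimal check, using the report's own numbers:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp() of the mean cross-entropy loss in nats."""
    return math.exp(cross_entropy_loss)

# Final loss 6.7346 from the overview above:
print(round(perplexity(6.7346), 1))  # ≈ 841.0, matching the reported value
```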

Dataset & Training

Domain: unknown
Tokenizer: bpe-8k
Total Iterations: 500
Batch Size: 1
Context Length: 128 tokens
Tokens per Batch: 128
Dataset Passes: ~0
Effective Tokens: 64,000
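The token accounting above is simple arithmetic over the run settings. A quick sketch reproducing it (values taken from this report; the 351.6K parameter count is the overview's headline figure):

```python
# Token accounting for this run.
steps = 500
batch_size = 1
grad_accum = 1          # "Effective Batch: 1" in the training configuration
context_length = 128

tokens_per_step = batch_size * grad_accum * context_length
effective_tokens = steps * tokens_per_step
tokens_per_param = effective_tokens / 351_600  # 351.6K parameters

print(tokens_per_step)             # 128
print(effective_tokens)            # 64000
print(round(tokens_per_param, 1))  # 0.2
```

At 0.2 tokens per parameter the model sees far less data than its capacity, which is consistent with an `overfit_test` run making "~0" passes over the dataset.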

Training Pipeline

Warmup: steps 1–51

Learning-rate warmup — model weights adjusting to the data distribution

Loss: 7.257 → 7.252 · Linear LR warmup, gradient clipping

Rapid Descent: steps 51–151

Steepest loss reduction — model learning primary patterns

Loss: 7.252 → 7.206 · Cosine LR schedule, AdamW optimization

Refinement: steps 151–351

Diminishing returns — model fine-tuning subtler patterns

Loss: 7.206 → 6.882 · Lower LR, gradient accumulation

Convergence: steps 351–500

Approaching minimum — model capacity saturation

Loss: 6.882 → 6.735 · Minimum LR, weight decay regularization
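The schedule named across these phases — linear warmup into cosine decay — can be sketched as a pure function of the step. This is an illustrative implementation using the run's settings (base LR 0.0003, min LR 0, 50 warmup steps, 500 total); the training system's exact boundary handling may differ:

```python
import math

def lr_at(step, base_lr=3e-4, min_lr=0.0, warmup=50, total=500):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(49))   # end of warmup: 3e-4
print(lr_at(275))  # halfway through decay: 1.5e-4
print(lr_at(500))  # fully decayed: 0.0
```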

Training Metrics

[Charts omitted: Loss Curve, Smoothed Loss, Perplexity, Learning Rate, Gradient Norm, Throughput (tok/s), Timing Breakdown; no telemetry recorded]

Model Architecture

Model Configuration

Architecture: GPT (decoder-only transformer)
Parameters: 351.6K
Layers: 2
Embedding Dim: 64
Attention Heads: 2
Head Dim: 32
FFN Dim: 256
FFN Activation: gelu
Vocab Size: 1,392
Context Length: 128 tokens
Dropout: 0

Training Configuration

Optimizer: adamw
Learning Rate: 0.0003
LR Min: 0
LR Schedule: Cosine decay
Warmup Steps: 50
Batch Size: 1
Grad Accum Steps: 1
Effective Batch: 1
Grad Clip: 1
Weight Decay: 0.1
Backend: helios
Tokenizer: bpe-8k
Seed: 42
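The "Grad Clip: 1" setting refers to clipping the gradient's global L2 norm at 1.0, which bounds the size of each update step. A minimal sketch of the idea (not the helios backend's code):

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale the full gradient vector so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)          # already within bounds: unchanged
    scale = max_norm / total_norm   # shrink uniformly, preserving direction
    return [g * scale for g in grads]

print(clip_by_global_norm([3.0, 4.0]))  # norm 5 -> rescaled to norm 1: [0.6, 0.8]
print(clip_by_global_norm([0.1, 0.2]))  # norm < 1 -> unchanged
```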

Layer Structure

Token Embed: 1,392 × 64
Pos Embed: 128 × 64
Block 0: Attn + FFN
Block 1: Attn + FFN
LayerNorm: 64
LM Head: 64 × 1,392
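A back-of-envelope parameter count can be read off this layer table. The sketch below assumes biased linear layers and an untied, bias-free LM head; real totals depend on the implementation's bias and weight-tying conventions, so it is an estimate, not the report's exact counting:

```python
# Rough GPT parameter count from the layer structure above.
V, T, D, F, L = 1392, 128, 64, 256, 2  # vocab, context, embed dim, FFN dim, layers

tok_embed = V * D                       # 1,392 × 64
pos_embed = T * D                       # 128 × 64
attn = 4 * (D * D + D)                  # Q, K, V, and output projections with biases
ffn = (D * F + F) + (F * D + D)         # up- and down-projection with biases
norms = 2 * 2 * D                       # two LayerNorms per block (weight + bias)
block = attn + ffn + norms
final_norm = 2 * D
lm_head = D * V                         # untied, no bias (an assumption)

total = tok_embed + pos_embed + L * block + final_norm + lm_head
print(f"{total:,}")  # ~286K under these assumptions
```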

Generated Samples

Step 0 (Mar 8, 2026, 2:19 PM)
Prompt: The
Output: The reing prit, as asps in ing tand

Checkpoints

Step | File | Size | Date
500 | checkpoint-500.json | 3.4 MB | Mar 8, 2026 2:20 PM

Generated by Alpha Training System · Config hash: