Alpha — run completed

overfit_test

351.6K-parameter model (unknown domain) — bpe-8k tokenizer, 2 layers / 64-dim embeddings / 2 attention heads

Overview

Parameters: 351.6K
Final Loss: 6.7346
Best Val Loss: 7.1221
Perplexity: 841.0
Tokens Processed: 64,000
Tokens/Param: 0.2
Avg Throughput: 4,797 tok/s
Training Time: 14s

Training Progress: 500 / 500 steps (100.0%)
Loss reduced by 7.2% from initial 7.2568
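The headline perplexity follows directly from the final loss: perplexity is the exponential of the mean cross-entropy (in nats). A minimal check, using the report's own numbers:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp() of the mean cross-entropy loss in nats."""
    return math.exp(cross_entropy_loss)

# Final loss 6.7346 from the overview above:
print(round(perplexity(6.7346), 1))  # ≈ 841.0, matching the reported value
```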

Dataset & Training

Domain: unknown
Tokenizer: bpe-8k
Total Iterations: 500
Batch Size: 1
Context Length: 128 tokens
Tokens per Batch: 128
Dataset Passes: ~0
Effective Tokens: 64,000
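The token accounting above is simple arithmetic over the run settings. A quick sketch reproducing it (values taken from this report; the 351.6K parameter count is the overview's headline figure):

```python
# Token accounting for this run.
steps = 500
batch_size = 1
grad_accum = 1          # "Effective Batch: 1" in the training configuration
context_length = 128

tokens_per_step = batch_size * grad_accum * context_length
effective_tokens = steps * tokens_per_step
tokens_per_param = effective_tokens / 351_600  # 351.6K parameters

print(tokens_per_step)             # 128
print(effective_tokens)            # 64000
print(round(tokens_per_param, 1))  # 0.2
```

At 0.2 tokens per parameter the model sees far less data than its capacity, which is consistent with an `overfit_test` run making "~0" passes over the dataset.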

Training Pipeline

Warmup: steps 1–51

Learning-rate warmup — model weights adjusting to the data distribution

Loss: 7.257 → 7.252 · Linear LR warmup, gradient clipping

Rapid Descent: steps 51–151

Steepest loss reduction — model learning primary patterns

Loss: 7.252 → 7.206 · Cosine LR schedule, AdamW optimization

Refinement: steps 151–351

Diminishing returns — model fine-tuning subtler patterns

Loss: 7.206 → 6.882 · Lower LR, gradient accumulation

Convergence: steps 351–500

Approaching minimum — model capacity saturation

Loss: 6.882 → 6.735 · Minimum LR, weight decay regularization
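The schedule named across these phases — linear warmup into cosine decay — can be sketched as a pure function of the step. This is an illustrative implementation using the run's settings (base LR 0.0003, min LR 0, 50 warmup steps, 500 total); the training system's exact boundary handling may differ:

```python
import math

def lr_at(step, base_lr=3e-4, min_lr=0.0, warmup=50, total=500):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / (total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(49))   # end of warmup: 3e-4
print(lr_at(275))  # halfway through decay: 1.5e-4
print(lr_at(500))  # fully decayed: 0.0
```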

Training Metrics

[Charts omitted: Loss Curve, Smoothed Loss, Perplexity, Learning Rate, Gradient Norm, Throughput (tok/s), Timing Breakdown; no telemetry recorded]

Model Architecture

Model Configuration

Architecture: GPT (decoder-only transformer)
Parameters: 351.6K
Layers: 2
Embedding Dim: 64
Attention Heads: 2
Head Dim: 32
FFN Dim: 256
FFN Activation: gelu
Vocab Size: 1,392
Context Length: 128 tokens
Dropout: 0

Training Configuration

Optimizer: adamw
Learning Rate: 0.0003
LR Min: 0
LR Schedule: Cosine decay
Warmup Steps: 50
Batch Size: 1
Grad Accum Steps: 1
Effective Batch: 1
Grad Clip: 1
Weight Decay: 0.1
Backend: helios
Tokenizer: bpe-8k
Seed: 42
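The "Grad Clip: 1" setting refers to clipping the gradient's global L2 norm at 1.0, which bounds the size of each update step. A minimal sketch of the idea (not the helios backend's code):

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale the full gradient vector so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)          # already within bounds: unchanged
    scale = max_norm / total_norm   # shrink uniformly, preserving direction
    return [g * scale for g in grads]

print(clip_by_global_norm([3.0, 4.0]))  # norm 5 -> rescaled to norm 1: [0.6, 0.8]
print(clip_by_global_norm([0.1, 0.2]))  # norm < 1 -> unchanged
```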

Layer Structure

Token Embed: 1,392 × 64
Pos Embed: 128 × 64
Block 0: Attn + FFN
Block 1: Attn + FFN
LayerNorm: 64
LM Head: 64 × 1,392
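A back-of-envelope parameter count can be read off this layer table. The sketch below assumes biased linear layers and an untied, bias-free LM head; real totals depend on the implementation's bias and weight-tying conventions, so it is an estimate, not the report's exact counting:

```python
# Rough GPT parameter count from the layer structure above.
V, T, D, F, L = 1392, 128, 64, 256, 2  # vocab, context, embed dim, FFN dim, layers

tok_embed = V * D                       # 1,392 × 64
pos_embed = T * D                       # 128 × 64
attn = 4 * (D * D + D)                  # Q, K, V, and output projections with biases
ffn = (D * F + F) + (F * D + D)         # up- and down-projection with biases
norms = 2 * 2 * D                       # two LayerNorms per block (weight + bias)
block = attn + ffn + norms
final_norm = 2 * D
lm_head = D * V                         # untied, no bias (an assumption)

total = tok_embed + pos_embed + L * block + final_norm + lm_head
print(f"{total:,}")  # ~286K under these assumptions
```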

Generated Samples

Step 0 (Mar 8, 2026, 2:19 PM)
Prompt: The
Output: The reing prit, as asps in ing tand

Checkpoints

Step | File | Size | Date
500 | checkpoint-500.json | 3.4 MB | Mar 8, 2026 2:20 PM

Generated by Alpha Training System · Config hash: