Mastering Snake with reinforcement learning: scaling from a 5x5 grid to a 10x10 board using Proximal Policy Optimization (PPO) and curriculum learning.
Our final PPO agent navigating a 10x10 board after completing the “Zero to Hero” curriculum.
Watch the agent evolve through 9 distinct stages of learning, and see the difference between basic tabular Q-learning and high-performance deep RL.
A step-by-step Jupyter Notebook that takes you from the absolute basics of RL to advanced scaling strategies.
Read the full story of how we bypassed the “Sparse Reward” trap on 10x10 boards using Imitation Learning and Curriculums.
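On a 10x10 board a randomly initialized policy almost never reaches an apple, so the reward signal is too sparse to learn from. One way to bootstrap past that, in the spirit of the imitation-learning phase, is to clone a simple heuristic expert. The sketch below is illustrative, not the project's actual implementation: a greedy Manhattan-distance "expert" labels random states with actions, and a tabular policy is seeded from those demonstrations.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def expert_action(head, apple):
    """Greedy heuristic 'expert': pick the move that most reduces
    Manhattan distance to the apple (ties broken by action order)."""
    def dist_after(a):
        return abs(head[0] + a[0] - apple[0]) + abs(head[1] + a[1] - apple[1])
    return min(range(len(ACTIONS)), key=lambda i: dist_after(ACTIONS[i]))

def collect_demos(n, size=10, seed=0):
    """Label random (head, apple) states with the expert's action."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        head = (rng.randrange(size), rng.randrange(size))
        apple = (rng.randrange(size), rng.randrange(size))
        demos.append(((head, apple), expert_action(head, apple)))
    return demos

def clone_policy(demos):
    """'Behavior cloning' for a tabular policy: add a bonus on the
    expert's action so a greedy argmax reproduces the demonstrations."""
    q = {}
    for state, action in demos:
        q.setdefault(state, [0.0] * len(ACTIONS))[action] += 1.0
    return q
```

Seeding the learner with expert preferences like this gives it a dense signal before it has ever found an apple on its own; RL fine-tuning then takes over from a competent starting point.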
| Phase | Strategy | Board | Max Score |
|---|---|---|---|
| Phase 0 | Tabular Q-Learning | 5x5 | 24 (Perfect) |
| Phase 1 | Double Q-Learning | 5x5 | 24 (Stable) |
| Phase 2 | Imitation Learning | 8x8 | 46 (Skilled) |
| Phase 3 | Final PPO Master | 10x10 | 64 (Master) |
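Phase 0's tabular baseline fits in a few lines. The sketch below is a simplified stand-in for the repo's `scripts/train_tabular_q.py`, not the script itself: it learns apple-chasing on a small grid with no snake body, using the classic one-step update Q(s,a) += α(r + γ max Q(s',·) − Q(s,a)).

```python
import random

def train_q(size=5, episodes=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Minimal tabular Q-learning on a simplified apple-chasing task.
    State = (head, apple); moves off the board are clamped to the edge."""
    rng = random.Random(seed)
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    q = {}
    def qs(s):
        return q.setdefault(s, [0.0] * 4)
    for _ in range(episodes):
        head = (rng.randrange(size), rng.randrange(size))
        apple = (rng.randrange(size), rng.randrange(size))
        for _ in range(4 * size):  # step cap per episode
            s = (head, apple)
            # epsilon-greedy action selection
            a = rng.randrange(4) if rng.random() < eps else max(range(4), key=lambda i: qs(s)[i])
            nh = (min(size - 1, max(0, head[0] + actions[a][0])),
                  min(size - 1, max(0, head[1] + actions[a][1])))
            done = nh == apple
            r = 1.0 if done else -0.01  # apple reward, small step penalty
            target = r if done else r + gamma * max(qs((nh, apple)))
            qs(s)[a] += alpha * (target - qs(s)[a])  # one-step Q-learning update
            head = nh
            if done:
                break
    return q
```

The same update drives the full snake agent; the real state also encodes the body, which is exactly what makes the table explode in size and motivates the switch to PPO on larger boards.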
This project uses uv for lightning-fast dependency management.
```bash
# Clone the repository
git clone https://github.com/Saheb/rl-snake.git
cd rl-snake

# Sync dependencies
uv sync

# Or use standard pip
pip install -r requirements.txt
```
- `scripts/train_tabular_q.py`: Train the baseline tabular agents.
- `scripts/train_ppo_curriculum.py`: The core curriculum pipeline (5x5 → 8x8 → 10x10).
- `utils/visualize_journey.py`: Records your own agents and generates the HTML visualization.

We don't just provide code; we explain the why.
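The curriculum pipeline boils down to one idea: train on a small board, then reuse the learned weights as the starting point on the next board size. A library-free sketch of that driver loop follows; the stage sizes and step counts are illustrative, and `train_fn` stands in for a real PPO trainer rather than anything in this repo.

```python
def run_curriculum(train_fn, stages=((5, 50_000), (8, 100_000), (10, 200_000))):
    """Run `train_fn(board_size, steps, init=...)` stage by stage,
    threading the policy from each stage into the next (warm start)."""
    policy = None
    history = []
    for size, steps in stages:
        policy = train_fn(size, steps, init=policy)
        history.append((size, steps))
    return policy, history

# Example with a stub trainer that just records what it was given.
def stub_trainer(size, steps, init=None):
    return {"size": size, "warm_started": init is not None}

final_policy, history = run_curriculum(stub_trainer)
```

The warm start is what defeats the sparse-reward trap: by the time the agent sees the 10x10 board, it already knows how to chase apples and avoid its own body from the smaller stages.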
Developed using Antigravity AI