An empirical investigation into the Intrinsic Curiosity Module (ICM; Pathak et al., 2017) on the game of Snake: when does curiosity actually help, and when does it backfire?
We ran a 2 × 3 × 2 × 3 grid of experiments (algorithm × reward mode × board × seed) and got sharper answers than the standard ICM story predicts:
| Question | Answer | Evidence |
|---|---|---|
| Does ICM help DQN with dense reward? | No | \|Δ\| < 0.05 across boards |
| Does ICM rescue DQN under sparse reward? | No | \|Δ\| < 0.15 across sparse and pure_sparse |
| Does dense shaping matter for DQN? | Yes | ~15% score drop without it |
| Does ICM help PPO under sparse reward? | Yes | +24% on pure_sparse 10×10 (6.63 vs 5.36, n=3) |
| Does ICM increase state coverage? | No | PPO and PPO+ICM both reach ~98.9% of the state space |
**The mechanistic takeaway.** ICM is not a free upgrade across algorithms. DQN's replay buffer dilutes the intrinsic signal across stale transitions; PPO consumes it fresh on every rollout. And even where ICM helps (PPO pure_sparse), it does not help by expanding coverage: both agents saturate at ~98.9%. ICM's actual contribution is per-step reward densification, turning a sparse +1-per-food signal into a continuously non-zero novelty signal that PPO's advantage estimator can assign credit over (see the sketch below). It behaves less like an exploration bonus and more like a self-supervised replacement for hand-engineered reward shaping.
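To make "per-step reward densification" concrete, here is a minimal sketch of how an ICM bonus is typically folded into a rollout before PPO's advantage estimation. The `env`, `policy`, and `icm` objects and the `eta` scale are illustrative stand-ins, not the exact interfaces in `scripts/train_ppo.py`.

```python
import torch

def densified_step(env, policy, icm, obs, eta=0.01):
    """One environment step with an ICM bonus added to the extrinsic reward.

    Sketch only: `env`, `policy`, and `icm` are hypothetical stand-ins,
    not the interfaces used by this repo's training scripts.
    """
    action = policy.act(obs)
    next_obs, extrinsic_r, done, info = env.step(action)

    # ICM intrinsic reward = forward-model prediction error in feature space:
    #   r_int = (1/2) * || f(phi(s), a) - phi(s') ||^2   (Pathak et al., 2017)
    with torch.no_grad():
        r_int = icm.intrinsic_reward(obs, action, next_obs)

    # Even when extrinsic_r == 0 (pure_sparse mode), r_int is almost always
    # non-zero, so PPO's advantage estimator sees a dense per-step signal.
    return next_obs, extrinsic_r + eta * r_int, done, info
```

Because the bonus arrives at every timestep of the fresh rollout, PPO credits it immediately; the same bonus written into a DQN replay buffer is replayed long after the state stops being novel, which is the dilution effect described above.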
Curiosity is not universally helpful — it is highly dependent on the reward structure of the environment.
The full investigation lives in `notebooks/curiosity.py`, an interactive marimo notebook with live demos, real training logs, and the complete experimental results.
To run it locally:

```bash
git clone https://github.com/Saheb/rl-snake.git
cd rl-snake
uv sync
uv run marimo edit notebooks/curiosity.py
```
| Script | What it does |
|---|---|
| `scripts/train_dqn.py` | DQN + PER + ICM (terminal mask fix at line 547) |
| `scripts/train_ppo.py` | PPO + ICM on 10×10 Snake |
| `scripts/parse_pilot_logs.py` | Parses `pilot_logs/` into JSON (the data is inlined in the notebook) |
| `utils/icm.py` | ICM module: encoder, inverse model, forward model (sketched below) |
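For orientation, below is a minimal, self-contained sketch of the three ICM components named above, following the formulation in Pathak et al. (2017): the intrinsic reward is r_int = (η/2)‖f(φ(s), a) − φ(s')‖², and the module is trained on (1−β)·L_inverse + β·L_forward. The class is illustrative; the actual `utils/icm.py` may differ in architecture and naming.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICMSketch(nn.Module):
    """Illustrative ICM (Pathak et al., 2017); not the repo's utils/icm.py."""

    def __init__(self, obs_dim, n_actions, feat_dim=64, beta=0.2, eta=0.01):
        super().__init__()
        self.beta, self.eta, self.n_actions = beta, eta, n_actions
        # Encoder phi(s): observation -> feature vector
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Inverse model: (phi(s), phi(s')) -> action logits
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))
        # Forward model: (phi(s), one-hot a) -> predicted phi(s')
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.n_actions).float()

        # Inverse loss: trains the encoder to keep action-relevant features.
        logits = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        inverse_loss = F.cross_entropy(logits, action)

        # Forward loss: feature-space prediction error; its per-sample value
        # doubles as the intrinsic reward. Features are detached so the
        # forward loss does not reshape the encoder (standard ICM practice).
        phi_pred = self.forward_model(torch.cat([phi.detach(), a_onehot], dim=-1))
        forward_err = 0.5 * (phi_pred - phi_next.detach()).pow(2).sum(dim=-1)

        intrinsic_reward = self.eta * forward_err.detach()  # r_int = (eta/2)||.||^2
        icm_loss = (1 - self.beta) * inverse_loss + self.beta * forward_err.mean()
        return intrinsic_reward, icm_loss
```

In a full training loop, `intrinsic_reward` is added to the environment reward before advantage estimation (as in the densification sketch above) and `icm_loss` is optimized alongside the policy loss.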
The `scripts/` folder also contains many earlier experiments that preceded the ICM investigation: tabular Q-learning (`train_tabular_q.py`), REINFORCE (`train_reinforce.py`), A2C (`train_a2c.py`), PPO with curriculum and imitation learning (`train_ppo_curriculum.py`, `train_ppo_with_demos.py`), knowledge distillation from DQN to PPO (`train_ppo_from_dqn.py`), and ablation sweep runners (`run_eta_sweep.sh`, `run_pilot_ablation.sh`, `run_sparsity_ablation.sh`). These are preserved as a record of the full experimental history but are not the focus of the current notebook.
```
rl-snake/
├── notebooks/
│   └── curiosity.py    # The main interactive investigation
├── scripts/            # Training scripts (DQN, PPO, ICM, and earlier experiments)
├── utils/
│   └── icm.py
├── pilot_logs/         # Raw training logs from the 2×3×2×3 experiment
├── assets/             # Charts and figures
├── docs/               # ⚠️ Outdated — see notebook instead
└── logs/               # Miscellaneous training output
```
**Note:** The GitHub Pages visualization and blog post document an earlier phase of the project (PPO curriculum learning) and are no longer current. The notebook is the up-to-date record of findings.
Based on “Curiosity-driven Exploration by Self-supervised Prediction”, Pathak et al., ICML 2017.