
My Reinforcement Learning Journey

Interactive demonstrations, research notes, and experiments as I move from theory to deployed RL systems.

April 2025
Tags: Reinforcement Learning · Policy Gradients · Gridworld · Interactive Demo

Introduction

A living lab notebook for everything I'm learning about RL — from Bellman backups to curiosity-driven exploration.

I'm documenting the path from foundational algorithms to production-grade RL agents. It starts with grid worlds and value iteration, then scales to policy gradients, model-based insights, and curiosity-driven exploration. Each experiment emphasizes intuition, visualization, and reproducibility.

Learning Resources

Hands-on tools that help me ground mathematical ideas in interactive intuition:

Each resource links to code, notes, and follow-up experiments so the journey remains transparent and replicable.

Course Projects (Hugging Face Deep RL)

Assignments completed during the Hugging Face Deep RL course, tuned and annotated with post-course insights.


Gridworld Navigation

A sandbox for testing intuition around value propagation, exploration, and sample efficiency.

Deep Q-Network in Gridworld

This environment drops an agent into a stochastic grid with moving goals and fixed walls. The DQN learns to balance exploration and exploitation while tracking long-horizon rewards.
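A transition like the one described might look like this. This is a toy sketch, not the demo's actual environment: the `slip` probability, grid size, and wall handling are my assumptions.

```python
import random

# Toy sketch of a stochastic gridworld step (illustrative, not the demo's code).
# With probability `slip`, the agent moves in a random direction instead of the
# intended one; moves into walls or off the grid leave the position unchanged.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(pos, action, walls, size=8, slip=0.1):
    if random.random() < slip:
        action = random.choice(list(ACTIONS))
    dx, dy = ACTIONS[action]
    nxt = (pos[0] + dx, pos[1] + dy)
    in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    return nxt if in_bounds and nxt not in walls else pos
```

The slip probability is what forces the agent to learn robust value estimates rather than memorizing a single path.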

  • Deep Q-Network (DQN) with experience replay and target network synchronization.
  • Epsilon scheduling that starts fully exploratory and decays toward greedy exploitation.
  • Reward shaping to encourage faster convergence without destabilizing learning.
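A minimal sketch of how these pieces fit together, using a tabular Q-table as a stand-in for the MLP. This is illustrative only: the function names, the tabular step size, and the data layout are my assumptions, not the demo's actual code.

```python
import random
from collections import deque

# DQN-style update loop with experience replay and a frozen target table.
# A tabular Q stands in for the two-layer MLP; the structure is the same.
GAMMA = 0.99
LR = 0.1  # tabular step size (the MLP uses Adam at 1e-3)

def epsilon(episode, start=1.0, end=0.01, decay=0.997):
    """Exploration schedule: fully random at first, decaying toward exploitation."""
    return max(end, start * decay ** episode)

def train_step(q, q_target, replay, batch_size=64):
    """One update from a minibatch sampled uniformly out of the replay buffer."""
    batch = random.sample(replay, min(batch_size, len(replay)))
    for s, a, r, s_next, done in batch:
        # Bootstrap from the frozen target table unless the episode ended.
        target = r if done else r + GAMMA * max(q_target[s_next])
        q[s][a] += LR * (target - q[s][a])
```

Target-network synchronization is then just a periodic copy of `q` into `q_target` every fixed number of steps, which keeps the bootstrap targets stable between syncs.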
[Interactive demo: live episode, step, and reward counters for the gridworld agent render here.]

Implementation Details

The current DQN configuration:

  • Learning rate: 1e-3 with Adam optimizer
  • Discount factor (γ): 0.99
  • Epsilon schedule: 1.0 → 0.01 with 0.997 decay
  • Reward shape: -0.01 per step, +1.0 for reaching the goal
  • Network: two-layer MLP (64 units each, ReLU)
  • Batch size: 64 sampled from replay buffer
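Two quick sanity checks on these numbers. Under a multiplicative 0.997 decay, epsilon reaches its 0.01 floor after roughly log(0.01)/log(0.997) episodes, and the shaped reward gives a simple closed form for an episode's undiscounted return:

```python
import math

# How many episodes until epsilon decays from 1.0 to its 0.01 floor.
start, end, decay = 1.0, 0.01, 0.997
episodes_to_floor = math.ceil(math.log(end / start) / math.log(decay))
print(episodes_to_floor)  # 1533

# A successful 20-step episode under the shaping above:
# -0.01 per step plus +1.0 at the goal.
shaped_return = 1.0 - 0.01 * 20
print(shaped_return)  # 0.8
```

So exploration stays meaningfully random for well over a thousand episodes, and the per-step penalty only dominates the goal bonus for episodes longer than 100 steps.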

Upcoming experiments: prioritised replay, double Q-learning, and distributional value heads for richer uncertainty estimates.
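Of the planned changes, double Q-learning is the smallest diff: only the target computation changes, with the online network selecting the action and the target network evaluating it. A sketch under the same tabular stand-in as above (function name and data layout are my assumptions):

```python
# Double DQN target: online table picks the action, target table scores it.
# This decoupling reduces the overestimation bias of the plain max target.
GAMMA = 0.99

def double_dqn_target(q_online, q_target, r, s_next, done):
    if done:
        return r
    best_a = max(range(len(q_online[s_next])), key=lambda a: q_online[s_next][a])
    return r + GAMMA * q_target[s_next][best_a]
```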

Next Experiments

A roadmap of environments and papers I'm excited to implement next.

I'll continue to publish checkpoints and write-ups as results mature. Suggestions are welcome — reach out if there's an environment or paper you'd like to see replicated.