Interactive Reinforcement Learning

Dino RL Sandbox

Train a tiny dinosaur in a grid world. Place food, lava, and rocks, then watch Q-learning discover safe, efficient paths.

Iteration: 0
Episode: 0
Last Reward: 0
Legend: Value (green = good, red = bad); Food (+10); Lava (-10, terminal); Rock (-2); Start

Guided Journey

Step 1: Learn to Move

Watch the agent explore the grid with no goal yet.

Tip: You can still place objects manually. Clicking Apply Step resets the environment.

Core Controls

Place Objects

Click the grid to place items. Use Erase to remove. Start is where each episode resets.
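As a rough illustration of what placing and erasing objects might do behind the scenes, here is a minimal sketch of a grid model. The reward values and the terminal flag for lava come from the legend; the `GridWorld` class, its method names, the zero reward for empty cells, and the choice to treat rocks as passable but costly are all assumptions made for this example, not the sandbox's actual code.

```python
from dataclasses import dataclass, field

# Rewards and the terminal flag come from the legend; everything else is assumed.
TILE_REWARD = {"food": 10.0, "lava": -10.0, "rock": -2.0}
TERMINAL_TILES = {"lava"}


@dataclass
class GridWorld:
    rows: int
    cols: int
    start: tuple = (0, 0)
    tiles: dict = field(default_factory=dict)  # (row, col) -> tile name

    def place(self, cell, tile):
        """Put a tile on a cell, replacing whatever was there."""
        if tile not in TILE_REWARD:
            raise ValueError(f"unknown tile: {tile}")
        self.tiles[cell] = tile

    def erase(self, cell):
        """Remove the tile on a cell, if any."""
        self.tiles.pop(cell, None)

    def reset(self):
        """Every episode begins at the start cell."""
        return self.start

    def enter(self, cell):
        """Return (reward, terminal) for moving onto a cell."""
        tile = self.tiles.get(cell)
        return TILE_REWARD.get(tile, 0.0), tile in TERMINAL_TILES


world = GridWorld(rows=4, cols=4)
world.place((0, 3), "food")
world.place((1, 1), "lava")
world.erase((1, 1))  # the lava tile is removed again
```

Storing only the occupied cells in a dictionary keeps erasing cheap and makes an empty cell the default case.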

What This Demonstrates

Maximize Reward

The agent learns to reach food while avoiding dangerous or costly tiles.
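The learning rule behind this behavior can be sketched as a single tabular Q-learning update. The reward values below come from the legend; the learning rate `ALPHA`, the discount factor `GAMMA`, the zero reward for empty cells, and the function name `q_update` are assumptions chosen for illustration and may differ from what the sandbox uses.

```python
from collections import defaultdict

# Reward values taken from the legend; "empty" = 0 is an assumption.
REWARDS = {"food": 10.0, "lava": -10.0, "rock": -2.0, "empty": 0.0}
ALPHA = 0.5  # learning rate (assumption)
GAMMA = 0.9  # discount factor (assumption)
ACTIONS = ("up", "down", "left", "right")


def q_update(q, state, action, reward, next_state, terminal):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0
# Stepping right from (0, 0) onto a food tile at (0, 1).
food_value = q_update(q, (0, 0), "right", REWARDS["food"], (0, 1), terminal=False)
# Stepping down from (1, 0) into lava at (2, 0), which ends the episode.
lava_value = q_update(q, (1, 0), "down", REWARDS["lava"], (2, 0), terminal=True)
```

Repeated over many steps, positive values spread backward from food and negative values spread backward from lava, which is what the green and red cell colors visualize.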

Explore vs Exploit

With an ε-greedy strategy, the agent usually takes the best-known action but, with probability ε, tries a random one so it does not get stuck in a local optimum.
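A minimal sketch of ε-greedy action selection follows. The function name `epsilon_greedy`, the action names, and the example Q-values are hypothetical; only the strategy itself is taken from the text.

```python
import random

ACTIONS = ("up", "down", "left", "right")


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore
    # Exploit: actions missing from q_values count as 0.
    return max(ACTIONS, key=lambda a: q_values.get(a, 0.0))


# Hypothetical Q-values for one grid cell.
q_values = {"up": 0.1, "right": 2.5, "down": -1.0, "left": 0.0}
greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # never explores
random_action = epsilon_greedy(q_values, epsilon=1.0)   # always explores
```

Setting ε to 0 makes the agent purely greedy, while ε of 1 makes it act at random; values in between trade off the two.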

Dynamic Environments

When the world changes, the agent keeps learning rather than memorizing one path.
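To make this concrete, the sketch below moves the food in a hypothetical 1x5 corridor and keeps updating the same Q-table. Several simplifications are assumed: reaching food ends the episode, hyperparameters are invented, and instead of following an ε-greedy agent the code sweeps over every state-action pair so the result is deterministic. It illustrates the idea and is not the sandbox's implementation.

```python
# Hypothetical 1x5 corridor: states 0..4, the agent starts in the middle.
N_STATES, START = 5, 2
ACTIONS = (-1, +1)        # step left, step right
ALPHA, GAMMA = 0.5, 0.9   # assumed hyperparameters
FOOD_REWARD = 10.0        # from the legend


def step(state, action, food):
    """Move one cell (walls clamp the move); reaching food ends the episode (assumption)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == food:
        return nxt, FOOD_REWARD, True
    return nxt, 0.0, False


def train(q, food, sweeps=200):
    """Apply Q-learning updates to every (state, action) pair, reusing the table q."""
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == food:
                continue  # no moves are taken from the terminal food cell
            for a in ACTIONS:
                nxt, reward, done = step(s, a, food)
                best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
    return q


def greedy(q, state):
    """Return the action with the highest Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
train(q, food=4)
before = greedy(q, START)  # heads right, toward the food
train(q, food=0)           # the food moves; the same table keeps learning
after = greedy(q, START)   # now heads left, toward the new food location
```

Because the table is never reset, the old values on the right side of the corridor decay as new updates arrive, and the greedy action at the start cell flips direction.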