Interactive Reinforcement Learning

Dino RL Sandbox

Train a tiny dinosaur in a grid world. Place food, lava, and rocks, then watch Q-learning discover safe, efficient paths.

Iteration: 0 | Episode: 0 | Last Reward: 0

Legend:
- Value (green = good, red = bad)
- Food (+10)
- Lava (-10, terminal)
- Rock (-2)
- Start
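The legend defines the reward signal the agent learns from. Here is a minimal sketch of such a reward function. The tile names, the `reward_for` helper, and the choice to treat food as ending the episode are hypothetical; the legend only marks lava as terminal. The step penalty is modeled as a constant added to every move.

```python
from typing import Tuple

# Hypothetical tile names; only the reward values come from the legend.
TILE_REWARDS = {"empty": 0.0, "start": 0.0, "food": 10.0, "lava": -10.0, "rock": -2.0}
# Lava is terminal per the legend; treating food as terminal too is an assumption.
TERMINAL_TILES = {"lava", "food"}


def reward_for(tile: str, step_penalty: float = 0.0) -> Tuple[float, bool]:
    """Return (reward, done) for moving onto `tile`.

    `step_penalty` (for example -0.1) is added on every move, so shorter
    paths collect less total penalty than longer ones.
    """
    return TILE_REWARDS[tile] + step_penalty, tile in TERMINAL_TILES
```

A negative step penalty is what makes "efficient" paths preferable: two routes that both reach food differ in return only by how many penalized steps they take.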

Guided Journey

Step 1: Learn to Move

Watch the agent explore the grid with no goal yet.

Tip: You can still place objects manually. Apply Step resets the environment.

Controls:
- Grid Size
- Training Speed
- Alpha (Learning Rate)
- Gamma (Discount)
- Epsilon (Explore)
- Epsilon Decay
- Step Penalty
- Dynamic Environment
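Alpha and Gamma are the two parameters of the standard tabular Q-learning update, Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a)). The sketch below shows that update plus one possible Epsilon Decay schedule. The dictionary-backed Q-table, the function names, and the multiplicative-decay-with-floor schedule are assumptions for illustration; the sandbox may decay epsilon differently.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

# Q-table keyed by (state, action); unseen pairs default to 0.0.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(Q: QTable, s, a, r: float, s_next, done: bool,
             actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    # A terminal move (e.g. onto lava) has no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


def decay_epsilon(epsilon: float, decay: float = 0.995,
                  min_epsilon: float = 0.05) -> float:
    """Multiplicative decay with a floor (an assumed schedule)."""
    return max(min_epsilon, epsilon * decay)
```

A larger alpha makes each new experience move the estimate further; a larger gamma makes distant rewards such as food several tiles away count for more in the current decision.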

Click the grid to place items, and use Erase to remove them. Every episode begins from the Start tile.

The agent learns to reach food while avoiding dangerous tiles (lava) and costly ones (rocks).
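To see this end to end, here is a small self-contained training loop on a hypothetical 2x3 layout in which the direct route from Start to food crosses lava. The layout, the reward values with a -0.1 step penalty, the rule that bumping into the border leaves the agent in place, treating food as terminal, and all hyperparameters are illustrative assumptions, not the sandbox's actual settings.

```python
import random
from collections import defaultdict

# Hypothetical layout: S = start, L = lava, F = food, . = empty.
GRID = ["SLF",
        "..."]
REWARDS = {"F": 10.0, "L": -10.0, ".": 0.0, "S": 0.0}
TERMINAL = {"F", "L"}  # assumption: food also ends the episode
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTIONS = list(MOVES)
START = (0, 0)  # position of S in this layout


def step(state, action, step_penalty=-0.1):
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])):
        nr, nc = r, c  # bumping into the border keeps the agent in place
    tile = GRID[nr][nc]
    return (nr, nc), REWARDS[tile] + step_penalty, tile in TERMINAL


def greedy(Q, s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            # epsilon-greedy: explore with probability epsilon, else exploit
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(Q, s)
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q


def greedy_path(Q, max_steps=20):
    """Follow the learned greedy policy from START and record visited cells."""
    s, path = START, [START]
    for _ in range(max_steps):
        s, _, done = step(s, greedy(Q, s))
        path.append(s)
        if done:
            break
    return path


Q = train()
print(greedy_path(Q))
```

After training, the greedy path should detour along the bottom row and approach the food from below rather than stepping onto the lava tile at (0, 1).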

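With an ε-greedy strategy, the agent picks a random action with probability ε instead of its current best guess, so it keeps trying alternatives rather than settling on a suboptimal path too early.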
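A minimal sketch of ε-greedy action selection follows. The function name and the dictionary of per-action values are assumptions for illustration; random tie-breaking among equally valued actions is a common design choice, not something the sandbox is known to do.

```python
import random
from typing import Dict, Hashable, Sequence


def epsilon_greedy(q_values: Dict[Hashable, float], actions: Sequence[Hashable],
                   epsilon: float, rng: random.Random) -> Hashable:
    """With probability epsilon pick a uniformly random action; otherwise the best one."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    best = max(q_values.get(a, 0.0) for a in actions)
    # Exploit, breaking ties randomly so equally good actions all get tried.
    return rng.choice([a for a in actions if q_values.get(a, 0.0) == best])
```

With ε = 0 the agent always exploits; with ε = 1 it moves at random. Epsilon Decay typically shrinks ε over time so the agent explores early and exploits once its estimates are reliable.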

When the world changes, the agent keeps learning rather than memorizing one path.
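One reason a Q-learner can keep up with a changing world is a constant learning rate: each update moves the estimate a fixed fraction alpha toward the newest target, so the weight on older experience shrinks geometrically as (1 - alpha)^k. The sketch below assumes the sandbox uses a constant Alpha (suggested by the Alpha control, but not confirmed) and uses a hypothetical `track_reward` helper and scenario.

```python
from typing import Iterable, List


def track_reward(q: float, rewards: Iterable[float], alpha: float = 0.5) -> List[float]:
    """Apply one-step terminal updates q += alpha * (r - q) and record each estimate."""
    history = []
    for r in rewards:
        q += alpha * (r - q)  # constant alpha weights recent rewards most heavily
        history.append(q)
    return history


# Hypothetical scenario: a tile that gave +10 (food) is turned into lava (-10).
before = track_reward(0.0, [10.0] * 10)
after = track_reward(before[-1], [-10.0] * 10)
print(round(before[-1], 3), round(after[-1], 3))
```

Within a few updates after the change the estimate turns strongly negative, so the greedy policy abandons the old route instead of repeating it.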