# Experiments with Learning Policies in Simple Reinforcement Learning Environments

**Technologies:** Python  

## Objective
To explore and implement different learning methods for estimating value functions and determining optimal policies in a simple grid world environment.

## Key Contributions
- Implemented SARSA and Q-Learning algorithms for policy learning in grid navigation.
- Used Monte Carlo and TD(0) prediction methods to estimate value functions for different policies and compare how well each policy performs.
- Analyzed the results to identify which algorithms performed best under specific conditions.
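
The two prediction methods listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the 3x3 grid, the reward of -1 per step, the fixed "right, then down" policy, and all function names (`step`, `policy`, `rollout`, `mc_evaluate`, `td0_evaluate`) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical 3x3 grid world: start at (0, 0), goal at (2, 2), reward -1 per step.
SIZE = 3
GOAL = (SIZE - 1, SIZE - 1)
MOVES = {"right": (0, 1), "down": (1, 0)}


def step(state, action):
    """Apply a move, clipping at the grid border. Returns (next_state, reward, done)."""
    d_row, d_col = MOVES[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    return next_state, -1.0, next_state == GOAL


def policy(state):
    """Fixed policy under evaluation (assumed): go right until the wall, then down."""
    return "right" if state[1] < SIZE - 1 else "down"


def rollout(start=(0, 0)):
    """Follow the policy to the goal and record (state, reward, next_state, done)."""
    episode, state, done = [], start, False
    while not done:
        next_state, reward, done = step(state, policy(state))
        episode.append((state, reward, next_state, done))
        state = next_state
    return episode


def mc_evaluate(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the full return observed after each state."""
    returns_sum, returns_count = defaultdict(float), defaultdict(int)
    for episode in episodes:
        states = [transition[0] for transition in episode]
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward, _, _ = episode[t]
            G = reward + gamma * G
            if state not in states[:t]:  # count only the first visit
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


def td0_evaluate(episodes, alpha=0.5, gamma=1.0):
    """TD(0): move V(s) toward the one-step bootstrapped target r + gamma * V(s')."""
    V = defaultdict(float)
    for episode in episodes:
        for state, reward, next_state, done in episode:
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
    return V


episodes = [rollout() for _ in range(200)]
v_mc = mc_evaluate(episodes)
v_td = td0_evaluate(episodes)
print(v_mc[(0, 0)], round(v_td[(0, 0)], 6))
```

Because this toy environment is deterministic, both estimates agree on the start state's value (four steps to the goal, so -4). Monte Carlo waits for the end of each episode before updating, while TD(0) updates after every step using its current estimate of the next state.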

## Outcome
Q-Learning converged to the optimal policy faster, while SARSA learned a safer, more conservative policy, which is consistent with its on-policy updates accounting for the cost of exploratory actions. The comparison offers insight into the trade-offs between off-policy and on-policy RL algorithms.
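
The difference in behavior comes down to the update target each algorithm uses. The sketch below is illustrative only and is not taken from the project: the action names, the tabular `Q` keyed by `(state, action)`, the helper names, and the numeric values are all assumptions chosen to make the contrast visible.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]  # assumed action set for a grid world


def epsilon_greedy(Q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s2, a2, done, alpha=0.5, gamma=1.0):
    """On-policy: bootstrap from the action a2 that the agent will actually take."""
    target = r + (0.0 if done else gamma * Q[(s2, a2)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s2, done, alpha=0.5, gamma=1.0):
    """Off-policy: bootstrap from the best action in s2, whatever is taken next."""
    best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def make_q():
    """Hypothetical values for a state "s2" next to a cliff: 'down' is very costly."""
    Q = defaultdict(float)
    Q[("s2", "up")] = -3.0
    Q[("s2", "left")] = -3.0
    Q[("s2", "right")] = -1.0
    Q[("s2", "down")] = -100.0
    return Q


# Same transition (s, "right", reward -1, s2), where exploration picks "down" next.
q_sarsa = make_q()
sarsa_update(q_sarsa, "s", "right", -1.0, "s2", "down", done=False)

q_ql = make_q()
q_learning_update(q_ql, "s", "right", -1.0, "s2", done=False)

print(q_sarsa[("s", "right")])  # -50.5: the exploratory step into the cliff is counted
print(q_ql[("s", "right")])     # -1.0: only the greedy next action is considered
```

In this example SARSA penalizes moving toward the risky state because its target includes the exploratory action, which pushes it toward safer paths. Q-Learning ignores that exploration cost and evaluates the greedy path directly.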
