# Reinforcement Learning - Experiment with Simple Bandit Learning Algorithms

**Technologies:** Python  

## Objective
Experiment with several simple bandit algorithms and compare them on two metrics: accumulated reward and how often each algorithm selects the optimal action.
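The two metrics named above can be computed from a log of one run. The sketch below is illustrative and is not taken from the repository: the function name `summarize_run` and its arguments are hypothetical, and it assumes the index of the optimal action is known, as it is in a simulated bandit.

```python
from itertools import accumulate


def summarize_run(actions, rewards, optimal_action):
    """Return (cumulative reward per step, running fraction of optimal picks).

    Hypothetical helper: `actions` and `rewards` are per-step logs of one run,
    and `optimal_action` is the index of the arm with the highest true mean.
    """
    cumulative_reward = list(accumulate(rewards))
    # Running count of steps on which the optimal arm was chosen.
    hits = accumulate(1 if a == optimal_action else 0 for a in actions)
    optimal_fraction = [h / step for step, h in enumerate(hits, start=1)]
    return cumulative_reward, optimal_fraction


cumulative, fraction = summarize_run(
    actions=[0, 2, 2, 1],
    rewards=[0.5, 1.0, 1.5, -0.5],
    optimal_action=2,
)
```

Averaging both curves over many independent runs is the usual way to smooth out the noise of a single run.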

## Key Contributions
- Implemented epsilon-greedy, optimistic initialization, and gradient bandit algorithms.
- Conducted experiments on stationary bandit problems, analyzing each algorithm's performance under different conditions.
- Provided insights on exploration-exploitation trade-offs and optimal policy learning.
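
As a rough illustration of the three algorithms listed above, here is a minimal standard-library sketch on a stationary Gaussian testbed. It is not the repository's code. The names `GaussianBandit`, `run_value_agent`, and `run_gradient_agent` are hypothetical, and the number of arms, the step sizes, the initial value of 5.0, and the seeds are all assumptions chosen for the example.

```python
import math
import random


class GaussianBandit:
    """Stationary k-armed bandit: fixed true means, unit-variance Gaussian rewards."""

    def __init__(self, k=10, seed=0):
        self.rng = random.Random(seed)
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(k)]
        self.optimal_action = max(range(k), key=lambda a: self.means[a])

    def pull(self, action):
        return self.rng.gauss(self.means[action], 1.0)


def softmax(preferences):
    """Numerically stable softmax over a list of action preferences."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def run_value_agent(bandit, steps, epsilon, initial_value=0.0, step_size=None, seed=1):
    """Epsilon-greedy on action-value estimates.

    step_size=None uses sample averages. Setting epsilon=0 with a large
    initial_value and a constant step_size gives optimistic initialization.
    """
    rng = random.Random(seed)
    k = len(bandit.means)
    q = [initial_value] * k
    counts = [0] * k
    actions, rewards = [], []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform random arm
        else:
            best = max(q)  # exploit: greedy arm, ties broken at random
            action = rng.choice([a for a in range(k) if q[a] == best])
        reward = bandit.pull(action)
        counts[action] += 1
        alpha = step_size if step_size is not None else 1.0 / counts[action]
        q[action] += alpha * (reward - q[action])
        actions.append(action)
        rewards.append(reward)
    return actions, rewards


def run_gradient_agent(bandit, steps, alpha=0.1, seed=1):
    """Gradient bandit: softmax over preferences with an average-reward baseline."""
    rng = random.Random(seed)
    k = len(bandit.means)
    prefs = [0.0] * k
    baseline = 0.0  # average of rewards seen before the current step
    actions, rewards = [], []
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(k), weights=probs)[0]
        reward = bandit.pull(action)
        for a in range(k):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += alpha * (reward - baseline) * (indicator - probs[a])
        baseline += (reward - baseline) / t  # fold in the current reward afterwards
        actions.append(action)
        rewards.append(reward)
    return actions, rewards


bandit = GaussianBandit(k=10, seed=0)
eps_actions, eps_rewards = run_value_agent(bandit, steps=1000, epsilon=0.05)
opt_actions, opt_rewards = run_value_agent(
    bandit, steps=1000, epsilon=0.0, initial_value=5.0, step_size=0.1
)
grad_actions, grad_rewards = run_gradient_agent(bandit, steps=1000, alpha=0.1)
```

Each function returns the per-step actions and rewards, so the runs can be compared on accumulated reward and on how often `bandit.optimal_action` was chosen.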

## Outcome
Among the algorithms tested, epsilon-greedy with a low exploration rate (epsilon = 0.05) performed best, exploring enough to find good actions while exploiting them most of the time.
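
One way to see why a low epsilon can do well on a stationary problem: once the value estimates have converged and the greedy action is the optimal one, epsilon-greedy picks that action with probability `1 - epsilon + epsilon / k`. The short sketch below evaluates this bound. The function name is hypothetical, and `k = 10` arms is an assumption made for the example, not a figure stated for this project.

```python
def asymptotic_optimal_rate(epsilon, k):
    """Long-run probability of picking the optimal arm under epsilon-greedy.

    Assumes a stationary problem where the greedy arm has become the optimal
    one: it is chosen on every exploit step and on 1/k of the explore steps.
    """
    return 1.0 - epsilon + epsilon / k


rate_low = asymptotic_optimal_rate(0.05, k=10)   # about 0.955
rate_high = asymptotic_optimal_rate(0.5, k=10)   # about 0.55
```

A larger epsilon finds the best arm sooner but caps the long-run optimal-action rate lower, which is the trade-off the outcome describes.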



[GitHub Repository for Reinforcement-Learning-Experiment-With-Simple-Bandit-Learning-Algorithms](https://github.com/mrw-soumik/Reinforcement-Learning-Experiment-With-Simple-Bandit-Learning-Algorithms/tree/main)
