### DNA Sequence Classification via Data Encoding
**Technologies:** Python, scikit-learn, Gradient Boosting

#### Objective
Compared data encoding methods (k-mer, binary, and TF-IDF) for classifying DNA sequences, to identify which representation gives a classifier the best performance and to support genomics research.

#### Project Contributions
This was a two-person project that tested several encoding methods for DNA sequence classification.

- **My Contributions:**
  - Conducted **data preprocessing** and analyzed DNA sequences for model evaluation.
  - Implemented and tuned **k-mer encoding** for the sequence data.
  - Trained a **Gradient Boosting Classifier** on the k-mer encoded data and evaluated its classification performance.
  - Documented the impact of k-mer encoding, identifying it as the most effective method.

- **Partner's Contributions:**
  - Implemented **alternative encoding methods** (binary and TF-IDF) for comparison.
  - Configured and ran the classifier on datasets with different encodings.
  - Conducted comparative analysis on model performance and assisted in documentation.
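
As an illustration of the k-mer encoding mentioned above, here is a minimal sketch in plain Python. It is not the repository's code: the function names `kmers` and `kmer_count_vector`, the choice of `k=3`, and the count-based vector layout are assumptions made for this example. It splits a sequence into overlapping substrings of length `k` and counts how often each possible k-mer occurs.

```python
from collections import Counter
from itertools import product


def kmers(sequence, k=3):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Encode a sequence as counts over all len(alphabet)**k possible k-mers.

    Assumption for this sketch: k-mers containing characters outside the
    alphabet (for example the ambiguity code N) are ignored.
    """
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmers(sequence, k))
    return [counts[kmer] for kmer in vocabulary]


if __name__ == "__main__":
    print(kmers("ATGCATGC", k=3))
    print(len(kmer_count_vector("ATGCATGC", k=3)))  # 4**3 possible 3-mers
```

Fixed-length vectors like these can be stacked into a feature matrix and passed to a classifier such as scikit-learn's `GradientBoostingClassifier`. The value of `k` trades off vocabulary size against how much local context each feature captures.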

#### Outcome
Of the encodings tested, **k-mer encoding** performed best, achieving a mean average precision of **0.522** and an F1-score of **0.383**. These results suggest that k-mer encoding is a promising representation for DNA sequence classification and a useful starting point for further genomics research.
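
For readers unfamiliar with the two metrics, the sketch below shows one common way to define them in plain Python. It is an assumption that the project used these definitions: the write-up does not say how the scores were averaged, and the helper names `f1_score`, `average_precision`, and `mean_average_precision` are hypothetical, chosen only for this example.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores, positive=1):
    """Average of the precision values at each rank where a positive occurs."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(y_true, score_rows, classes):
    """Mean of one-vs-rest average precision over all classes.

    score_rows[i][j] is the model's score for sample i and classes[j].
    """
    per_class = []
    for j, cls in enumerate(classes):
        binary_truth = [1 if t == cls else 0 for t in y_true]
        class_scores = [row[j] for row in score_rows]
        per_class.append(average_precision(binary_truth, class_scores))
    return sum(per_class) / len(per_class)
```

Average precision is computed from ranked scores (for example, predicted class probabilities), while F1 is computed from hard predictions, so the two numbers can differ substantially for the same model.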

[GitHub Repository for DNA Sequence Classification via Data Encoding](https://github.com/mrw-soumik/DNA_Sequence_Classification_Encoding)
