k-Nearest Neighbors From Scratch

  • Using only NumPy, I developed a k-Nearest Neighbors classifier that labels movies as Thriller or Comedy with 80% accuracy

Context

I completed this project during my freshman year at UC Berkeley, and it was my very first introduction to machine learning classification!

Description

  • Developed a k-Nearest Neighbors classifier that labels each movie as either Thriller or Comedy

  • Represented each movie with a Bag of Words model built from its script, so that every movie becomes a vector of word frequencies

  • Selected 20 discriminative features out of 5,000+ by choosing the words whose frequencies differed most between the two genres, which shrank the feature space and made distance computations much cheaper

  • Used 283 movies as the training set and evaluated the classifier on 50 held-out test movies, achieving 80% accuracy
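The pipeline described above (bag-of-words features, keeping the most genre-discriminative words, then k-NN) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the original project code: the function names are hypothetical, features are ranked by the absolute difference in mean word frequency between the two genres, distance is Euclidean, and prediction is a majority vote among the k nearest training movies. The tiny word-frequency matrix is made-up toy data, not the project's movie dataset.

```python
import numpy as np

def select_features(X, y, vocab, n_features=20):
    """Hypothetical helper: keep the words whose mean frequency differs most between genres.

    X: (n_movies, n_words) bag-of-words frequency matrix
    y: (n_movies,) genre labels, assumed to contain exactly two genres
    """
    genres = np.unique(y)
    assert len(genres) == 2, "this sketch assumes exactly two genres"
    mean_a = X[y == genres[0]].mean(axis=0)
    mean_b = X[y == genres[1]].mean(axis=0)
    diff = np.abs(mean_a - mean_b)
    # Column indices of the n_features largest differences, largest first
    top = np.argsort(diff)[::-1][:n_features]
    return top, [vocab[i] for i in top]

def knn_predict(X_train, y_train, X_test, k=5):
    """Hypothetical helper: majority vote among the k nearest neighbors (Euclidean distance)."""
    # Pairwise distances via broadcasting, shape (n_test, n_train)
    dists = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        # On a tie, argmax picks the alphabetically first label; an odd k avoids ties with two classes
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Toy data (made up for illustration): columns are word frequencies
vocab = ["gun", "laugh", "the", "police", "party"]
X_train = np.array([
    [0.05, 0.00, 0.10, 0.04, 0.00],  # Thriller
    [0.04, 0.01, 0.11, 0.05, 0.01],  # Thriller
    [0.06, 0.00, 0.09, 0.03, 0.00],  # Thriller
    [0.00, 0.07, 0.10, 0.00, 0.04],  # Comedy
    [0.01, 0.08, 0.12, 0.01, 0.05],  # Comedy
    [0.00, 0.06, 0.10, 0.00, 0.03],  # Comedy
])
y_train = np.array(["Thriller"] * 3 + ["Comedy"] * 3)
X_test = np.array([
    [0.05, 0.01, 0.10, 0.04, 0.00],
    [0.00, 0.07, 0.11, 0.00, 0.04],
])
y_test = np.array(["Thriller", "Comedy"])

top, names = select_features(X_train, y_train, vocab, n_features=2)
preds = knn_predict(X_train[:, top], y_train, X_test[:, top], k=3)
acc = accuracy(y_test, preds)
print(names, preds, acc)
```

On this toy data the two selected words are "laugh" and "gun", and both test movies are classified correctly. Common filler words such as "the" have similar frequencies in both genres, so they contribute little to the distance and are dropped; this is the intuition behind keeping only a small set of discriminative words.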

Tools

  • Python
  • NumPy

Skills

  • Machine Learning
  • Data Science Lifecycle
  • Classification
  • Feature Engineering