Wikipedia Movie Similarities

Description

This is a machine learning exploration for:

Finding similar movies based on their Wikipedia abstracts
Automatically determining common topics from the same abstracts
In total, 107320 Wikipedia movies were analyzed, with a vocabulary of 195317 distinct words

Learning Procedure

Movie similarities and topics were learned in Python. The process included a few steps:

The abstracts of all Wikipedia movies were extracted
TF-IDF was calculated for all movies, over a vocabulary of 195317 words
K Nearest Neighbors (KNN) with cosine distance was applied to each movie to find its most similar movies
Topics were extracted using Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), with 100 topics
A UI was added to visualize the results. All results are loaded on start to allow fast manipulation (loading takes a few seconds)
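The similarity steps above (TF-IDF, then KNN with cosine distance) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the code used for this exploration: the tokenizer, the small `STOP_WORDS` set, the smoothed IDF formula and all helper names are hypothetical, and for 107320 abstracts a library such as scikit-learn (`TfidfVectorizer`, `NearestNeighbors` with `metric="cosine"`) would normally be used instead.

```python
import math
from collections import Counter

# Hypothetical, tiny stop-word list; real pipelines use a much larger one.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, split on non-letters and drop stop words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [t for t in cleaned.split() if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        vec = {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine_distance(u, v):
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def nearest_neighbors(vectors, index, k):
    """Brute-force KNN: (distance, movie index) for the k movies closest to vectors[index]."""
    scored = [
        (cosine_distance(vectors[index], vec), j)
        for j, vec in enumerate(vectors)
        if j != index
    ]
    return sorted(scored)[:k]


abstracts = [
    "An astronaut crew travels through space to find a new planet.",
    "Astronauts travel through space on a rescue mission.",
    "A chef opens a restaurant in Paris.",
]
vectors = tfidf_vectors(abstracts)
neighbors = nearest_neighbors(vectors, 0, k=2)
# Movie 1 (shares "through" and "space") ranks first; movie 2 shares no terms, so its distance is 1.0.
print(neighbors)
```

Brute-force search compares each movie with every other one; at the scale of the full collection those comparisons would usually be done as batched sparse matrix products rather than Python loops.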
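For the topic-extraction step, the NMF part can be sketched as follows. This sketch assumes a small non-negative document-term matrix and uses Lee and Seung's multiplicative updates for the squared (Frobenius) error; the helper names, iteration count and toy data are hypothetical. LDA, the other method listed above, is a probabilistic model and is not shown here. For a real 100-topic run, scikit-learn's `sklearn.decomposition.NMF` and `LatentDirichletAllocation` are common choices, although which library this exploration used is not stated.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, n_topics, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative docs x terms matrix V into W (docs x topics) and
    H (topics x terms) with Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(V), len(V[0])
    W = [[rng.random() for _ in range(n_topics)] for _ in range(n_docs)]
    H = [[rng.random() for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def top_words(H, vocabulary, n=3):
    """The n highest-weighted terms of each topic (row of H)."""
    return [
        [vocabulary[j] for j in sorted(range(len(row)), key=lambda i: -row[i])[:n]]
        for row in H
    ]


# Hypothetical toy data: term counts for four short abstracts.
vocabulary = ["space", "planet", "astronaut", "chef", "restaurant", "food"]
V = [
    [2, 1, 1, 0, 0, 0],
    [4, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 2, 2, 4],
]
W, H = nmf(V, n_topics=2)
print(top_words(H, vocabulary))
```

Each row of `W` gives a movie's weight on every topic, so a movie can be tagged with its strongest topics, while each row of `H` is read as a topic through its top words.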

As with any machine learning approach, the results contain some errors. The learning results are kept as is, with no attempt to fix them. For more details on each step, please take a look here.

How to use

After the page is loaded, search for a movie or a topic. After finding the movie of interest, click its table row to find similar movies. Each similar movie includes a similarity level indicating how similar it is to the selected one. Clicking a movie link opens the Wikipedia page for that movie.
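The similarity level and Wikipedia link shown for each similar movie could be produced along these lines. This is a hypothetical sketch: it assumes the similarity level is 1 minus the cosine distance, expressed as a percentage, and that links point to English Wikipedia articles, whose URLs replace spaces in the title with underscores. The function names are illustrative only.

```python
from urllib.parse import quote


def similarity_level(cosine_distance):
    # Assumption: similarity level = (1 - cosine distance), shown as a percentage.
    return round((1.0 - cosine_distance) * 100, 1)


def wikipedia_url(title):
    # English Wikipedia article URLs use underscores in place of spaces.
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def similar_movie_rows(neighbors):
    """Turn (title, cosine distance) pairs into table rows, most similar first."""
    rows = [(title, similarity_level(d), wikipedia_url(title)) for title, d in neighbors]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for row in similar_movie_rows([("Alien", 0.4), ("The Matrix", 0.1)]):
    print(row)
# ('The Matrix', 90.0, 'https://en.wikipedia.org/wiki/The_Matrix')
# ('Alien', 60.0, 'https://en.wikipedia.org/wiki/Alien')
```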
