The data was gathered from the Yelp Dataset Challenge.
The data was initially in JSON format, and only the necessary fields were stored in Cassandra tables. The records described all the businesses, and this information had to be transformed into vectors for the various machine learning algorithms.
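As a rough illustration of the field selection step, the sketch below parses one JSON line and keeps only a chosen subset of keys. The field names in KEEP_FIELDS and the sample record are assumptions made for this example; the actual fields stored in Cassandra are not listed here, and the write to Cassandra is left out.

```python
import json

# Hypothetical subset of fields to keep; the real list is not given in this summary.
KEEP_FIELDS = ("business_id", "name", "city", "stars")


def project_record(line, fields=KEEP_FIELDS):
    """Parse one JSON line and return a dict with only the requested fields."""
    record = json.loads(line)
    # Missing fields become None so every row has the same columns.
    return {field: record.get(field) for field in fields}


# Made-up sample record in the one-object-per-line style.
sample = (
    '{"business_id": "b1", "name": "Cafe", "city": "Phoenix", '
    '"stars": 4.5, "attributes": {"wifi": "free"}}'
)
row = project_record(sample)
```

Each resulting dict has a fixed set of columns, which is the shape a table insert would need.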
The problem involved building a user profile through sentiment analysis of the user's tweets and using collaborative filtering to find results that match the user's tastes. The next step was to filter those results by the user's current location and the current day.
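The location and day filter could look roughly like the sketch below. It is only an assumption about how such a filter might work: the record keys (latitude, longitude, open_days), the 5 km default radius, and the use of the haversine distance are all hypothetical choices for illustration, not details taken from the project.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def filter_recommendations(candidates, user_lat, user_lon, day, max_km=5.0):
    """Keep candidates that are open on `day` and within `max_km` of the user."""
    return [
        b for b in candidates
        if day in b["open_days"]
        and haversine_km(user_lat, user_lon, b["latitude"], b["longitude"]) <= max_km
    ]


# Made-up candidates standing in for collaborative filtering output.
candidates = [
    {"id": "near_open", "latitude": 33.45, "longitude": -112.07, "open_days": {"Mon", "Tue"}},
    {"id": "near_closed", "latitude": 33.45, "longitude": -112.07, "open_days": {"Sat"}},
    {"id": "far_open", "latitude": 36.17, "longitude": -115.14, "open_days": {"Mon"}},
]
nearby = filter_recommendations(candidates, 33.4484, -112.0740, "Mon")
```

Filtering after the recommendation step keeps the collaborative filtering model independent of where the user happens to be.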
The algorithms used include TF-IDF, Bayes' theorem, and ALS (alternating least squares).
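To make the TF-IDF step concrete, here is a minimal pure-Python sketch of one common variant (relative term frequency times the natural log of inverse document frequency). It is not the project's implementation, which may have relied on Spark MLlib with a different weighting; the function name tf_idf and the sample reviews are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up, already tokenized review snippets.
reviews = [
    "great food great service".split(),
    "slow service".split(),
    "great coffee".split(),
]
vectors = tf_idf(reviews)
```

Terms that occur in fewer documents receive a higher weight, and a term present in every document gets a weight of zero under this variant.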
Parallelization techniques such as broadcast variables and caching were used, and the collaborative filtering step takes into account 2.7M reviews, utilising the cluster efficiently for all computation.
The UI has been implemented as a website based on HTML5, Bootstrap and CSS in order to visualize the results and showcase information regarding our project.
The visualization has been implemented using Tableau by connecting it to Spark Cassandra dynamically, and the resulting charts were integrated into the HTML website.
Some of the new technologies learnt as part of this project are Spark MLlib, Tableau, HTML5 and Bootstrap CSS.