The original training dataset for this model was scraped from
SkyTrax; we used an updated version that can be found
here. Of the 20 columns in this CSV file, we chose the 'content' and 'recommended' columns as the review text and the label, respectively, for training the model.
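
As a rough illustration of the column selection described above, the sketch below keeps only 'content' and 'recommended' from a CSV and drops rows where either is empty. The function name `load_reviews`, the sample rows, and the assumption that labels are stored as 0/1 integers are ours for illustration and are not taken from the dataset itself.

```python
import csv
import io


def load_reviews(csv_file):
    """Return (texts, labels) from a file object over the reviews CSV."""
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        text = (row.get("content") or "").strip()
        label = (row.get("recommended") or "").strip()
        # Skip rows with a missing review or a missing label.
        if not text or not label:
            continue
        texts.append(text)
        labels.append(int(label))  # assumption: labels are stored as 0/1
    return texts, labels


# Tiny in-memory stand-in for the real file (made-up rows).
sample = io.StringIO(
    "airline_name,content,recommended\n"
    "Example Air,Great flight and friendly crew.,1\n"
    "Example Air,Lost my luggage.,0\n"
    "Example Air,,1\n"
)
texts, labels = load_reviews(sample)
```

With the real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.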
For testing, we streamed data through the Twitter API for 2 weeks and gathered about 5 GB of data in JSON format, consisting of about 410,000 tweets for 24 airlines.
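
A minimal sketch of how the tweet text might be pulled out of the collected data is shown below. It assumes the tweets were saved as one JSON object per line with a "text" field, which is how streamed tweets are commonly stored; the text above only states that the data is in JSON format. The name `iter_tweet_texts` and the sample records are hypothetical.

```python
import io
import json


def iter_tweet_texts(lines):
    """Yield the text of each tweet from line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(record, dict):
            continue
        text = record.get("text")
        if text:  # records without a "text" field are not tweets
            yield text


# Made-up records standing in for the collected stream.
sample_stream = io.StringIO(
    '{"id": 1, "text": "Smooth flight today"}\n'
    "\n"
    '{"delete": {"status": {"id": 2}}}\n'
    '{"id": 3, "text": "Two hour delay, again"}\n'
)
tweet_texts = list(iter_tweet_texts(sample_stream))
```

Reading line by line keeps memory use low, which matters for a collection of roughly 5 GB.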