- Initially, our dataset was in JSON format. It contained 2.7M reviews, 86K businesses, 566K business attributes (e.g., hours, parking availability, ambience), and aggregated check-ins over time for each of the 86K businesses.
- We saved the data into the following Cassandra tables.
Business
Attributes
- business_id (PRIMARY KEY)
- name
- city
- state
- full_address
- latitude
- longitude
- stars
- review_count
- categories
Review
Attributes
- business_id
- user_id
- stars
- text
- date
- PRIMARY KEY (business_id, user_id)
User
Attributes
- user_id (PRIMARY KEY)
- name
- review_count
- average_stars
- yelping_since
- fans
Checkin
Attributes
- business_id (PRIMARY KEY)
- Checkin_info
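
The four tables above could be expressed as CQL `CREATE TABLE` statements roughly as follows. This is a minimal sketch, not the schema we actually ran: the column names and primary keys come from the lists above, but every column type (`text`, `double`, `int`, `timestamp`, `list<text>`, `map<text, int>`) is an assumption, and the `TABLES` dictionary and `create_table_cql` helper are hypothetical names introduced only for illustration. The script only builds the statements as strings and does not connect to a cluster.

```python
# Sketch: generate CQL DDL for the four tables listed above.
# Column names and keys follow the lists; all column TYPES are assumptions.

TABLES = {
    "business": {
        "columns": {
            "business_id": "text",
            "name": "text",
            "city": "text",
            "state": "text",
            "full_address": "text",
            "latitude": "double",
            "longitude": "double",
            "stars": "double",
            "review_count": "int",
            "categories": "list<text>",  # assumed: a list of category names
        },
        "primary_key": ["business_id"],
    },
    "review": {
        "columns": {
            "business_id": "text",
            "user_id": "text",
            "stars": "int",
            "text": "text",
            "date": "timestamp",
        },
        # Composite key: business_id, then user_id.
        "primary_key": ["business_id", "user_id"],
    },
    "user": {
        "columns": {
            "user_id": "text",
            "name": "text",
            "review_count": "int",
            "average_stars": "double",
            "yelping_since": "text",
            "fans": "int",
        },
        "primary_key": ["user_id"],
    },
    "checkin": {
        "columns": {
            "business_id": "text",
            # Listed as Checkin_info above; unquoted CQL identifiers are
            # case-insensitive. The map type is an assumption.
            "checkin_info": "map<text, int>",
        },
        "primary_key": ["business_id"],
    },
}


def create_table_cql(name, spec):
    """Build a CREATE TABLE statement from a table spec."""
    cols = ",\n".join(
        f"    {col} {cql_type}" for col, cql_type in spec["columns"].items()
    )
    key = ", ".join(spec["primary_key"])
    return (
        f"CREATE TABLE IF NOT EXISTS {name} (\n"
        f"{cols},\n"
        f"    PRIMARY KEY ({key})\n"
        f");"
    )


if __name__ == "__main__":
    for table_name, table_spec in TABLES.items():
        print(create_table_cql(table_name, table_spec))
        print()
```

One consequence of the Review key is worth noting: with `PRIMARY KEY (business_id, user_id)`, `business_id` is the partition key and `user_id` is a clustering column, so all reviews of a business are stored together, and a second review by the same user for the same business would overwrite the first.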