### EAS 4/587 -- Data Intensive Computing

1. **Motivation**

    1. Reddit is a social media website where users post links to
  articles, images, videos, and other content, and other users comment
  on them. Post authors generally want to drive maximum engagement from
  their posts.

    2. Unlike many other social media websites, Reddit lets users upvote
  and downvote posts. Because posts are publicly visible, factors such
  as the time of posting, the number of upvotes, and the number of
  comments strongly affect a post's reach.

    3. Because many posts are made every minute, significant posts can
  get lost in the crowd. In addition, posts made at a particular time of
  day may not be seen by users who are active at a different time of
  day.

2. **Problem Statement**

    1. Fetch data from several programming-related subreddits
    (communities) using the Reddit Developer API. Because Reddit hosts a
    very large number of subreddits, we keep the scope of the project
    limited to these.

    2. Clean and preprocess the data, then analyze it to find relevant
  insights.
    
    3. Build a model to predict the engagement a post will likely receive, given the time of posting, the number of upvotes, the number of comments, and other factors.
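
The feature-preparation part of the steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes each fetched post is a dictionary with `created_utc`, `score`, and `num_comments` fields, and the `engagement` formula and its `comment_weight` parameter are hypothetical choices made only for this example.

```python
from datetime import datetime, timezone


def time_features(created_utc: float) -> dict:
    """Derive posting-time features from a Unix timestamp (seconds, UTC)."""
    posted = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return {
        "hour": posted.hour,                  # 0-23, in UTC
        "weekday": posted.weekday(),          # 0 = Monday ... 6 = Sunday
        "is_weekend": posted.weekday() >= 5,  # Saturday or Sunday
    }


def engagement(score: int, num_comments: int, comment_weight: float = 2.0) -> float:
    """Hypothetical engagement target: upvote score plus weighted comment count."""
    return score + comment_weight * num_comments


# Assumed shape of one fetched post; the values are made up for illustration.
post = {"created_utc": 1700000000, "score": 120, "num_comments": 15}

features = time_features(post["created_utc"])
target = engagement(post["score"], post["num_comments"])
print(features, target)
```

Features like these could then be passed to any regression model; which model and which engagement definition work best is a question for the analysis itself.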

The full report is available at [Report/report.pdf](./Report/report.pdf).