Twitter Viral

This app computes a comparable measure of virality for different search terms contained in Tweets on Twitter and monitors how the virality of those terms develops over time. You can schedule the import jobs to run every hour so that the app updates its input data hourly via the Twitter API. The charts in the Infographic show the virality (and related information) on a daily basis. You can configure the import jobs to adjust the search terms (currently "MapR", "Hortonworks", and "Cloudera") to your specific needs.

Virality is based on a sliding window spanning the past 7 days; the size of that window can be adjusted in the workbook. Virality is computed as the average number of Retweets of the top N (N=5) most-retweeted Tweets within the sliding window. The parameter N can also be adjusted in the workbook.
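The metric above can be sketched in Python. This is a minimal illustration with hypothetical data structures (the app itself computes this in a workbook, not in code); the window size and N mirror the defaults mentioned above.

```python
from datetime import datetime, timedelta

# Defaults from the app: a 7-day sliding window and the top N=5 tweets.
# Both are adjustable, just like in the workbook.
WINDOW = timedelta(days=7)
TOP_N = 5

def virality(tweets, as_of):
    """Average retweet count of the TOP_N most-retweeted Tweets
    within the sliding window ending at `as_of`.

    `tweets` is a hypothetical list of (timestamp, retweet_count) pairs.
    """
    in_window = [rt for ts, rt in tweets if as_of - WINDOW <= ts <= as_of]
    top = sorted(in_window, reverse=True)[:TOP_N]
    return sum(top) / len(top) if top else 0.0
```

If fewer than N Tweets fall inside the window, the sketch simply averages over the Tweets that are there.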

Right after installing and running the app, the two time series charts are not yet stable because the Twitter Search API only goes back about 9 days, so there is little history available at first. The import jobs are scheduled to trigger automatically every hour to retrieve more data. After the app has been running for a few days, the charts stabilize and the results can be interpreted reliably.

Infographic

The Infographic shows:

  • a comparison of the virality of Tweets of three different search terms over time.
  • the total number of Tweets (including retweets) during the last week for the particular search terms.
  • the total number of Tweets over time (based on the sliding window).
  • the top N (N=5) Tweets (that were most often retweeted) during the last week along with their retweet count.
  • a wordcloud (excluding stop words) for these top N retweeted tweets.

Import and Connection

There are three import jobs, one for each search term. These import jobs retrieve the Tweets using the Twitter Search API. See https://dev.twitter.com/docs/using-search for more information.
Another import job imports a static list of stopwords (like "the", "at", "in", etc.) from an S3 bucket; these stopwords are filtered out when computing the wordcloud in the "ViralityComparison" workbook.

Workbooks

Preparation Workbooks

The three "ViralityInput" workbooks take the data from the import jobs as input and prepare it for the final analysis in the "ViralityComparison" workbook. This preparation includes:

  • removing all RT prefixes and t.co URLs to identify Tweets that share the same origin, so the number of Retweets per original Tweet can be counted. Note that one original URL in a Tweet can be shortened to different t.co URLs in its Retweets, which is why those are removed.
  • creating a sliding window (using the EXPAND_DATE_RANGE function) that makes each Tweet visible for one week. The implicit assumption is that a Tweet contributes to virality for 7 days and has no effect thereafter. You can adjust the size of the sliding window by modifying the formula for the column "DateRangeEnd" on the sheet "UniqueTweets": change the default value of 604800000 milliseconds (= 7 days) to whatever fits your needs.
  • counting the number of Retweets for each original Tweet, per day.
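The normalization step in the first bullet can be sketched in Python. The regular expressions here are illustrative assumptions (the workbook performs the equivalent cleanup with its own functions): strip leading "RT @user:" prefixes and t.co short links so that Retweets collapse onto the same normalized text as the original Tweet.

```python
import re

# Hypothetical patterns mirroring the workbook's cleanup step.
RT_PREFIX = re.compile(r'^(RT\s+@\w+:\s*)+', re.IGNORECASE)
TCO_URL = re.compile(r'https?://t\.co/\w+')

def normalize(text):
    """Strip RT prefixes and t.co URLs so Retweets of the same
    original Tweet produce identical normalized text."""
    text = RT_PREFIX.sub('', text)
    text = TCO_URL.sub('', text)
    return ' '.join(text.split())  # collapse leftover whitespace
```

After this step, Tweets with identical normalized text can be grouped and counted as Retweets of one original.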

The "ViralityInput" workbooks also prepare the data for some widgets in the infographic. This includes:

  • the number of Tweets per day, over time.
  • the overall sum of Tweets during the last week.
  • the top N viral Tweets during the last week.
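The three widget inputs above can be sketched together in Python. The record layout and function name are assumptions for illustration; the workbooks derive the same aggregates from the imported Tweets.

```python
from collections import Counter
from datetime import date, timedelta

WINDOW_DAYS = 7
TOP_N = 5

def widget_inputs(tweets, today):
    """Compute the three widget inputs from hypothetical records of
    (date, normalized_text, retweet_count):
      - Tweets per day, over time
      - total number of Tweets during the last week
      - top N most-retweeted Tweets during the last week
    """
    per_day = Counter(d for d, _, _ in tweets)
    week_start = today - timedelta(days=WINDOW_DAYS)
    last_week = [(t, rt) for d, t, rt in tweets if week_start < d <= today]
    total_last_week = len(last_week)
    top = sorted(last_week, key=lambda x: x[1], reverse=True)[:TOP_N]
    return per_day, total_last_week, top
```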

Analysis Workbook

The "ViralityComparison" workbook:

  • joins the data from the "ViralityInput" workbooks and prepares it to be plotted in the "Virality Over Time" widget and the "Number Of Tweets Over Time" widget in the infographic.
  • tokenizes the top viral Tweets, removes stopwords, and computes the word counts for the wordcloud widget.
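The tokenize-filter-count pipeline in the last bullet can be sketched in Python. The stopword set here is a tiny hypothetical stand-in; the app imports a fuller list from S3, as described above.

```python
import re
from collections import Counter

# Illustrative stopword list; the app imports the real one from S3.
STOPWORDS = {'the', 'at', 'in', 'a', 'of', 'and', 'to'}

def wordcloud_counts(tweets):
    """Tokenize the top viral Tweets, drop stopwords, and count the
    remaining words for the wordcloud widget."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts
```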