Tweetdeck and Twitter Analytics Scraping with R

Twitter offers tools for analyzing tweet performance: twitter analytics for your own accounts and tweetdeck for the accounts of others. Unfortunately, both platforms have limited data export functionality; there is no clean, easy way to export data from the web user interface.

The method outlined in this post avoids Twitter API fees and is compliant with the TOS, provided you work from html files downloaded locally. To do so, save local html copies of the pages at analytics.twitter.com and tweetdeck.twitter.com that are associated with your account. The full code can be found on GitHub.

There are four other files that you’ll want to place in the same directory as the downloaded html files:

  • twitter_scraping_css.json
  • twitter-analytics-main.r
  • twitter-tweetdeck-main.r
  • Helpers.R

twitter_scraping_css.json

This json file contains css selector templates for targeting the relevant structured data in twitter analytics and tweetdeck html files. The css selectors for tweetdeck are written generically, with a %s placeholder inside the nth-child selector that is later filled with a column number.
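As a rough illustration of that layout, the sketch below parses a small json string shaped the way the file is described. The key names (tweetdeck, analytics, date, author) and the selectors themselves are hypothetical and are not the contents of the real twitter_scraping_css.json; the jsonlite package is assumed for parsing.

```r
library(jsonlite)

# Hypothetical stand-in for twitter_scraping_css.json; the real keys and
# selectors may differ.
css_json <- '{
  "tweetdeck": {
    "date":   "section.column:nth-child(%s) time",
    "author": "section.column:nth-child(%s) span.username"
  },
  "analytics": {
    "date": "div.tweet-created-at"
  }
}'

# Nested json objects become nested named lists.
css <- fromJSON(css_json)

# The tweetdeck entries still carry the %s placeholder for the column number,
# while the analytics entry is ready to use as-is.
print(css$tweetdeck$date)
print(css$analytics$date)
```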

Helpers.R

In the Helpers.R file the function TweetDeckColumnSelector modifies the generic css selectors so that a specific tweetdeck column is targeted.
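One plausible way to do this is with sprintf, which substitutes the column number for each %s. The function below is a minimal sketch under that assumption: FillColumnSelectors is a hypothetical stand-in for TweetDeckColumnSelector, and the templates are invented for illustration.

```r
# Hypothetical stand-in for TweetDeckColumnSelector: fill the %s placeholder
# in every selector template with the requested column number.
FillColumnSelectors <- function(templates, column) {
  lapply(templates, function(template) sprintf(template, column))
}

# Invented templates, standing in for the tweetdeck entries of the json file.
templates <- list(
  date   = "section.column:nth-child(%s) time",
  author = "section.column:nth-child(%s) span.username"
)

# Target the first tweetdeck column.
column_css <- FillColumnSelectors(templates, 1)
print(column_css$date)
```

Keeping the templates generic means the same json file serves any column; only the number passed in changes.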

The function TweetDeckScrape takes a parsed tweetdeck html file and column-specific css (the output of TweetDeckColumnSelector) as input arguments. It outputs a data frame in which each row holds the date, author, number of replies, number of retweets, and number of favorites for one tweet in the tweetdeck column you targeted.
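The general shape of such a function can be sketched with the rvest package, which is an assumption here since the post does not name its dependencies. ScrapeColumnSketch is a hypothetical stand-in for TweetDeckScrape, and both the markup and the selectors are invented; real tweetdeck pages are far more complex.

```r
library(rvest)

# Tiny invented stand-in for a saved tweetdeck page.
page <- read_html('
<div>
  <section class="column">
    <article>
      <time>2017-01-05</time><span class="username">@alice</span>
      <span class="replies">2</span><span class="retweets">5</span>
      <span class="favorites">9</span>
    </article>
    <article>
      <time>2017-01-06</time><span class="username">@bob</span>
      <span class="replies">0</span><span class="retweets">1</span>
      <span class="favorites">3</span>
    </article>
  </section>
</div>')

# Hypothetical column-specific selectors (the %s is already filled with 1).
css <- list(
  date      = "section.column:nth-child(1) time",
  author    = "section.column:nth-child(1) span.username",
  replies   = "section.column:nth-child(1) span.replies",
  retweets  = "section.column:nth-child(1) span.retweets",
  favorites = "section.column:nth-child(1) span.favorites"
)

# Extract the text for each selector and bind the results into one data frame.
# This assumes every selector matches exactly once per tweet.
ScrapeColumnSketch <- function(page, css) {
  fields <- lapply(css, function(selector) {
    html_text(html_elements(page, selector), trim = TRUE)
  })
  as.data.frame(fields, stringsAsFactors = FALSE)
}

tweets <- ScrapeColumnSketch(page, css)

# Counts arrive as text; convert them to integers for analysis.
count_cols <- c("replies", "retweets", "favorites")
tweets[count_cols] <- lapply(tweets[count_cols], as.integer)
print(tweets)
```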

The function TwitterAnalyticsScrape works much like TweetDeckScrape. The key differences are that it takes a parsed twitter analytics html file, and that the css in twitter_scraping_css.json can be used without modification.

twitter-tweetdeck-main.r

The file twitter-tweetdeck-main.r loads its package dependencies and sources the Helpers.R file. commandArgs is used so that the code can be run from the command line. The first argument passed is the tweetdeck column to target. All subsequent arguments are paths to the tweetdeck html files. Output is exported to a csv file.
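The argument handling described here could look roughly like the following. ParseTweetDeckArgs is a hypothetical helper, not part of the original scripts; in the script itself the input vector would presumably come from commandArgs(trailingOnly = TRUE).

```r
# Hypothetical helper: split command line arguments into a column number
# and a vector of html file paths.
ParseTweetDeckArgs <- function(args) {
  if (length(args) < 2) {
    stop("usage: <column> <tweetdeck html file> [more html files ...]")
  }
  list(column = as.integer(args[1]), files = args[-1])
}

# Simulated arguments; a real run would use commandArgs(trailingOnly = TRUE).
parsed <- ParseTweetDeckArgs(c("1", "tweetdeck-sample.html"))
print(parsed$column)
print(parsed$files)
```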

twitter-analytics-main.r

The file twitter-analytics-main.r loads its package dependencies and sources the Helpers.R file. commandArgs is used so that the code can be run from the command line. Here all command line arguments are strings giving the location of twitter analytics html files. Output is exported to a csv file.
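When several html files are passed, the per-file results need to be combined before export. The sketch below shows one way to do that in base R; PlaceholderScrape is an invented stand-in for the real scraping step, and the second file name, the column names, and the output file name are all assumptions.

```r
# Invented stand-in for the real scrape step: returns one placeholder row
# per file instead of parsing any html.
PlaceholderScrape <- function(path) {
  data.frame(source_file = basename(path), date = NA_character_,
             stringsAsFactors = FALSE)
}

# Simulated arguments; a real run would use commandArgs(trailingOnly = TRUE).
files <- c("twitter-analytics-sample.html", "twitter-analytics-sample-2.html")

# Scrape each file, then stack the per-file data frames row-wise.
results <- do.call(rbind, lapply(files, PlaceholderScrape))

# Write to a temporary directory so the sketch leaves the working directory clean.
out_path <- file.path(tempdir(), "twitter-analytics-output.csv")
write.csv(results, out_path, row.names = FALSE)
```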

In Action

  1. Download html file(s) from analytics.twitter.com
  2. Download html file from tweetdeck.twitter.com (note that you can define fairly robust queries in a tweetdeck column)
  3. Run the code from this tutorial
  4. Output is saved to csv

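Tweetdeck: suppose that you saved your tweetdeck html file as tweetdeck-sample.html and that you’re interested in targeting data corresponding to the query in the first column. Following the argument order described above (column first, then html files), and assuming the script is launched with Rscript, the call would look something like Rscript twitter-tweetdeck-main.r 1 tweetdeck-sample.html.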

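Twitter Analytics: suppose that you saved your twitter analytics html file as twitter-analytics-sample.html. Assuming the script is launched with Rscript, the call would look something like Rscript twitter-analytics-main.r twitter-analytics-sample.html.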

With the csv output data you can analyze the performance of any twitter account for which you have credentials, as well as publicly available tweet performance for any tweetdeck query of your choice.
