Category Archives: R

There are a number of existing tutorials that outline how to deploy an R Shiny app with Docker. R is a great programming language for statistical analysis, and its community is rich with packages that are finding new uses every day. Shiny is a web framework for R: it lets you create a UI through which end users interact with your code in a web browser. Docker is a platform for delivering software packages (and their dependencies) consistently across different machines. The benefit is that another computer running Docker can reproduce your code without it breaking. Our use case is getting R code that runs locally to run on a cloud web server as well. I highly recommend reading the documentation to understand the essential details. This guide will highlight the pitfalls and workarounds I ran into when setting up a Shiny app…

This post outlines a framework for forecasting short-term (i.e. daily) directional movements of equity prices. The method relies on support vector machines and treats the system like a Markov chain. Historical data is downloaded from stooq.com. This is not investment advice or a recommendation of any investment strategy; it is provided for educational purposes only. The following code comes with no warranties whatsoever. The code, which can be found in its entirety on GitHub, attempts to model the directional movement (i.e. above or below the previous close) of a stock's closing price as a function of the following variables: the return of the equity at lag = 1; the return of the equity at lag = 3; the return of SPY at lag = 1; the return of SPY at lag = 3; the return of QQQ at lag = 1; the return of QQQ at lag = 3; the return of UVXY at lag…
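
A minimal base-R sketch of how such a feature set could be assembled is shown below. It assumes "return at lag k" means the one-day simple return observed k trading days before the day being predicted, and that all price series are already aligned by date. The helper names (`simple_returns`, `lag_vec`, `build_features`) are hypothetical and are not taken from the GitHub code.

```r
# Hypothetical helpers (not from the original repository).

# One-day simple returns; the first element is NA because it has no prior close
simple_returns <- function(close) {
  c(NA, diff(close) / head(close, -1))
}

# Shift a vector by k positions so that position t holds the value from t - k
lag_vec <- function(x, k) {
  c(rep(NA, k), head(x, -k))
}

# closes: named list of equal-length, date-aligned closing-price vectors.
# Assumption: the first element is the equity whose direction we predict.
build_features <- function(closes, lags = c(1, 3)) {
  rets <- lapply(closes, simple_returns)
  # Target: did today's close finish above the previous close?
  target <- factor(ifelse(rets[[1]] > 0, "up", "down"), levels = c("down", "up"))
  feats <- list()
  for (nm in names(rets)) {
    for (k in lags) {
      # Only past returns are used, so each row sees no future information
      feats[[paste0(nm, "_lag", k)]] <- lag_vec(rets[[nm]], k)
    }
  }
  df <- data.frame(direction = target, feats)
  # Drop the leading rows where lagged returns are not yet available
  df[complete.cases(df), ]
}
```

The resulting data frame could then be handed to an SVM implementation, for example `e1071::svm(direction ~ ., data = df)` if the e1071 package is installed; which SVM package, kernel, and tuning the original code uses is not stated in this excerpt.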

This post walks through code for testing multiple variants using Bayesian statistics. It is especially useful when there is a defined action that can be classified as a success (e.g. a click or conversion) and when we have complete information about the number of trials (e.g. impressions). Full code can be found on GitHub. The code relies heavily on simulation: it samples from beta distributions parameterized by the numbers of trials and successes. No packages beyond base R are required. First we define helper functions that act as wrappers for the rbeta and qbeta functions in base R: RBetaWrapper is used for sampling and QBetaWrapper for calculating credible intervals. Our MVTest function accepts the number of simulated draws, a trial vector, a success vector, a vector of quantiles used to calculate credible intervals, and an integer specifying the number of digits to which the credible intervals are rounded. The…
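
The general idea can be sketched in base R as follows. This is not the post's MVTest function; `mv_test_sketch` is a hypothetical stand-in that takes the same kinds of inputs. It assumes a uniform Beta(1, 1) prior, so each variant's posterior is Beta(1 + successes, 1 + trials - successes), and it summarizes the simulation as the share of draws in which each variant has the highest rate.

```r
# Hypothetical sketch (not the original MVTest): Bayesian comparison of
# several variants using only base R. Assumes a uniform Beta(1, 1) prior,
# so each posterior is Beta(1 + successes, 1 + trials - successes).
mv_test_sketch <- function(draws, trials, successes,
                           quantiles = c(0.025, 0.975), digits = 4) {
  stopifnot(length(trials) == length(successes), all(successes <= trials))
  k <- length(trials)
  shape1 <- 1 + successes
  shape2 <- 1 + trials - successes

  # Simulated posterior rates: one column of `draws` values per variant
  samples <- matrix(
    unlist(lapply(seq_len(k), function(i) rbeta(draws, shape1[i], shape2[i]))),
    nrow = draws
  )

  # Share of draws in which each variant has the highest simulated rate
  winner <- max.col(samples, ties.method = "first")
  prob_best <- tabulate(winner, nbins = k) / draws

  # Credible intervals from the posterior quantile function, one row per variant
  nq <- length(quantiles)
  q <- qbeta(rep(quantiles, each = k), rep(shape1, nq), rep(shape2, nq))
  intervals <- matrix(round(q, digits), nrow = k,
                      dimnames = list(NULL, paste0("q", quantiles)))

  data.frame(trials, successes, prob_best, intervals)
}
```

For example, `mv_test_sketch(20000, c(1000, 1000, 1000), c(50, 80, 55))` returns one row per variant with its simulated probability of being best and its 2.5% and 97.5% posterior quantiles. The intervals come straight from `qbeta`, so only `prob_best` depends on the number of simulated draws.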

This post discusses how to use polynomial regression on digital advertising data. Polynomial regression can help us better understand the relationship between spend and impressions (or spend and clicks). The method is particularly useful for daily data with day-to-day variation in spend. The resulting models can be used to analyze, estimate, and benchmark the performance of future campaigns. The full code can be found on GitHub. The code fits a second-order polynomial to allow for diminishing marginal returns. For impressions, the function takes the form $\text{Impressions} = \beta_0 + \beta_1\,\text{Spend} + \beta_2\,\text{Spend}^2$, or in the case of clicks, $\text{Clicks} = \beta_0 + \beta_1\,\text{Spend} + \beta_2\,\text{Spend}^2$. To run this code, begin by loading the ggplot2, scales, and rio packages. First we define a function that fits a second-order polynomial regression given two variables. This function also creates a ggplot object that plots the actual observations as a scatter plot along with a regression line of predicted values. Next we define…
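
The fitting step itself needs only base R. The sketch below is a hypothetical helper, not the post's function, and it leaves out the ggplot2 plotting. It fits $y = \beta_0 + \beta_1 x + \beta_2 x^2$ with `lm()` and exposes the marginal return $\beta_1 + 2\beta_2 x$, which shrinks as spend grows whenever $\beta_2 < 0$.

```r
# Hypothetical helper (not the original function): fit
# y = b0 + b1 * x + b2 * x^2 by ordinary least squares with base R's lm().
fit_quadratic <- function(x, y) {
  fit <- lm(y ~ x + I(x^2))
  b <- unname(coef(fit))
  list(
    model = fit,
    coefficients = b,
    # Marginal return dy/dx = b1 + 2 * b2 * x; it falls as x grows when b2 < 0
    marginal_return = function(x_new) b[2] + 2 * b[3] * x_new,
    # Spend at which predicted output peaks (only defined when b2 < 0)
    peak_x = if (b[3] < 0) -b[2] / (2 * b[3]) else NA_real_
  )
}
```

Called as `fit_quadratic(spend, impressions)` (or with clicks as `y`), the returned model can be passed to `predict()` for benchmarking. If the peak lies outside the observed range of spend, it is an extrapolation and should be treated with caution. For the chart, one option is ggplot2's `geom_smooth(method = "lm", formula = y ~ x + I(x^2))`.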

This post outlines a method for analyzing how often pages appear together in user journeys on your website. To do so, we use R and the Google Analytics API. You can find the full code on GitHub. We rely on the dplyr, plyr, googleAnalyticsR, rio, RcppAlgos, and data.table packages. First we must get our data from Google Analytics, structured so that our dimensions include landingPagePath, secondPagePath, exitPagePath, and previousPagePath, with users as the metric. Although this does not include every step of a user journey, it provides a good basis for inferring which pages are related at the beginning and end of user journeys. Once we have the data, we need to make some changes before we can begin counting frequencies. First we must change the data from the wide format (i.e. one row for each unique path, containing the number of users) to…
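
A base-R sketch of the counting step is shown below. It uses `combn()` and `aggregate()` in place of the RcppAlgos and data.table packages the post relies on. The function name `page_pair_counts` is hypothetical. It assumes the data arrive as one row per unique path, with character page columns (the default for data frames in R 4.0 and later) and a numeric `users` column. Each path contributes its user count once to every distinct pair of pages it contains.

```r
# Hypothetical sketch (base R only; not the original code).
# paths: one row per unique path, with page columns and a users count.
page_pair_counts <- function(paths, page_cols, user_col = "users") {
  pair_rows <- lapply(seq_len(nrow(paths)), function(i) {
    # Distinct, non-empty pages on this path, sorted so that each pair
    # is always written in the same order
    pages <- unlist(paths[i, page_cols], use.names = FALSE)
    pages <- sort(unique(pages[!is.na(pages) & pages != ""]))
    if (length(pages) < 2) return(NULL)
    combos <- combn(pages, 2)  # every unordered pair of pages
    data.frame(page_a = combos[1, ], page_b = combos[2, ],
               users = paths[[user_col]][i])
  })
  # Long format: one row per (path, page pair)
  long <- do.call(rbind, pair_rows)
  # Total users per page pair, most common pairs first
  out <- aggregate(users ~ page_a + page_b, data = long, FUN = sum)
  out[order(-out$users, out$page_a, out$page_b), ]
}
```

Sorting the pages within each path means `/blog` and `/home` are counted as the same pair regardless of where each appears in the journey. That fits the question of which pages appear together, but not of which page leads to which.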