News

Cool R tools: Packrat and RMarkdown beyond the knit button
The August meetup was hosted at Ann Arbor SPARK (announcement slides).
Clayton Yochum gave a talk titled Unusual uses for RMarkdown: Beyond the Knit button and Andrew Moore gave a talk titled Package dependencies in R: why you should be using Packrat.
Clayton Yochum is a Data Analyst/Engineer at Methods Consultants in Ypsilanti and the University of Michigan School of Social Work’s Child and Adolescent DataLab. He has a background in mathematics and business intelligence.
Andrew Moore is a Data Analyst in the School of Social Work at the University of Michigan, working with the Child & Adolescent Data Lab. Andrew has a BA in Psychology from the University of Michigan and previously worked as a research technician at both UM’s Institute for Social Research and UM’s psychology department.

GNU Make as a tool for reproducible data analysis
The July meetup was hosted at Ann Arbor SPARK (announcement slides).
Michelle Berry gave a lightning talk titled “Updates from the tidyverse and highlights from useR!”, a brief overview of some of the exciting new R developments showcased at the useR! conference (slides).
Pat Schloss, PhD gave our main talk on using GNU Make as a tool for reproducible data analysis (repository). Makefiles are typically used for generating executables from a program’s source files, but the same technology can be used to automate and reproduce an analysis workflow. Pat’s talk covered the motivation for using Make and guided attendees through a tutorial on constructing a makefile for an analysis (e.g., one in R).
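To make the idea concrete, here is a minimal sketch in base R of the rule at the heart of Make: a target is rebuilt only when it is missing or older than one of its dependencies. The helper names `needs_rebuild` and `build_clean` and the file names are hypothetical, chosen for illustration; this is not taken from Pat’s tutorial, and a real makefile states the same rule declaratively rather than in R.

```r
# Hypothetical helper (not part of Make or any package): TRUE when the target
# file is missing or older than at least one of its dependencies.
needs_rebuild <- function(target, deps) {
  if (!file.exists(target)) {
    return(TRUE)
  }
  any(file.mtime(deps) > file.mtime(target))
}

# Illustrative workflow in a temporary directory: raw.csv -> clean.csv.
work_dir <- tempfile("make-sketch-")
dir.create(work_dir)
raw_file <- file.path(work_dir, "raw.csv")
clean_file <- file.path(work_dir, "clean.csv")

write.csv(data.frame(x = c(1, NA, 3)), raw_file, row.names = FALSE)

# The "recipe": rebuild clean.csv from raw.csv only when it is out of date.
build_clean <- function() {
  if (needs_rebuild(clean_file, raw_file)) {
    raw <- read.csv(raw_file)
    write.csv(raw[!is.na(raw$x), , drop = FALSE], clean_file, row.names = FALSE)
    return("rebuilt")
  }
  "up to date"
}

first_run <- build_clean()   # target does not exist yet
second_run <- build_clean()  # nothing has changed since the last build
```

Running the recipe a second time does no work because the cleaned file is already newer than the raw data, which is what lets Make skip the expensive steps of an analysis that have not changed.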
Pat Schloss is an associate professor of Microbiology and Immunology at the University of Michigan. His research combines wet lab and computational methods to investigate links between the gut microbiota and disease. He is the developer of the mothur software for analyzing microbial short read sequence data.

Beginner Workshop
We held our second Beginner R Workshop at Ann Arbor SPARK on Saturday May 21st and Sunday May 22nd.
The workshop gave an introduction to data types, data frames, packages & R Markdown, data manipulation with the dplyr package, data visualization with the ggplot2 package, and data analysis.
Materials from the workshop are available (here).

Social
Our May meetup was a social meetup — there were no presentations this month, just a chance to hang out and have fun.

Super Learner
The April meetup was hosted at Ann Arbor SPARK (announcement slides). Scott Grey gave a brief introduction to the SuperLearner package, an implementation of an ensemble prediction algorithm developed by Eric Polley at the National Cancer Institute. SuperLearner allows the use of many algorithms to generate an ideal prediction function that is a weighted average of all the algorithms considered. The SuperLearner package permits the integration of dozens of prespecified potential algorithms found in other R packages as well as ‘wrappers’ that provide the user with the ability to include their own algorithms (slides).
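The weighted-average idea can be sketched in base R without the package. The example below is only an illustration under stated assumptions: the simulated data, the two candidate models, the five folds, and the one-dimensional search with `optimize()` are all choices made here, not the SuperLearner package’s implementation or anything from Scott’s slides.

```r
set.seed(1)

# Simulated data (an assumption for illustration): a noisy quadratic signal.
n <- 200
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = x^2 + rnorm(n, sd = 0.5))

# Cross-validated predictions: each candidate model is scored only on
# observations that were held out when it was fitted.
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = n))
cv_pred <- matrix(NA_real_, nrow = n, ncol = 2,
                  dimnames = list(NULL, c("linear", "quadratic")))
for (k in seq_len(n_folds)) {
  train <- dat[folds != k, ]
  held_out <- dat[folds == k, ]
  cv_pred[folds == k, "linear"] <-
    predict(lm(y ~ x, data = train), newdata = held_out)
  cv_pred[folds == k, "quadratic"] <-
    predict(lm(y ~ x + I(x^2), data = train), newdata = held_out)
}

# Cross-validated mean squared error of the blend
# w * linear + (1 - w) * quadratic.
cv_risk <- function(w) {
  blend <- w * cv_pred[, "linear"] + (1 - w) * cv_pred[, "quadratic"]
  mean((dat$y - blend)^2)
}

# Pick the weight in [0, 1] that minimizes the cross-validated error.
w_linear <- optimize(cv_risk, interval = c(0, 1))$minimum
weights <- c(linear = w_linear, quadratic = 1 - w_linear)
print(round(weights, 3))
```

Because the weights are chosen on held-out predictions, a candidate that merely overfits its training data does not earn a large share of the final prediction.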
Scott Grey is a senior statistician in the University of Michigan’s Division of Cardiology where he analyzes data from registries of percutaneous interventions. He received his Masters and PhD from Case Western Reserve University.
This presentation was an “informal part two” to another presentation from Scott, “Integrating machine learning into causal inference: the Targeted Maximum Likelihood Estimation approach”, offered by the Ann Arbor ASA Chapter on Tuesday, April 12.

Beginner Workshop
We held our first Beginner R Workshop at Ann Arbor SPARK on Saturday April 2nd and Sunday April 3rd.
The workshop gave an introduction to data types, data frames, packages & R Markdown, data manipulation with the dplyr package, data visualization with the ggplot2 package, and data analysis.
Materials from the workshop are available (here).

Docker+R and Outlier and Change Point Detection
The March meetup was hosted at Ann Arbor SPARK (announcement slides). Kraig Stevenson gave a lightning talk on outlier and change point detection using Twitter’s R packages AnomalyDetection and BreakoutDetection (slides), and Blake Nicholson gave a presentation on using R with Docker, a standard interface for the creation, distribution, and deployment of software (slides).
Docker has taken the software engineering world by storm. It provides a lightweight, isolated environment in which programs can be run. It has broad applications across all aspects of software engineering. In this talk, Blake provided background on Docker, its architecture, and how to use it. He went through several examples of using R within a Docker container and the advantages such a setup provides.
Blake Nicholson leads the Data Science and Software Engineering teams at Seelio. He has applied quantitative methods and software development in numerous business settings throughout his career. His experience ranges from optimizing the size and structure of sales teams to leveraging analytical methods to find, grow, and keep employees at Facebook.

Defensive Programming and Debugging in R
The February meetup was hosted at Ann Arbor SPARK (big thanks to Travis Linderman!); check out the announcement slides for the latest group updates. There were two lightning talks, one on web scraping and the other on deep learning, followed by a longer presentation on programming and debugging.
The lightning talks were:

“Deep Learning with H2O” (code) presented by Brandon Stange

“Scraping Web Content Using rvest” (slides) presented by Jeff Shane
For our main presentation Reino Bruner gave a talk on Defensive Programming and Debugging in R (slides).
Reino Bruner is an R statistical programmer and data scientist with Equastat. His academic background is in experimental psychology, physics, and statistics. He has experience in many domains including, but not limited to: experimental design, assessment design & validation, statistical modeling, machine learning, model automation, causal analysis, and critical thinking. His research experience and interests include Bayesian Hamiltonian Monte Carlo analysis, neural network deep learning, nonparametric and robust statistics, complex system simulations, machine learning, and forecasting.


Lightning Talks and Creating Powerful Yet User-Friendly Analytics Applications with RShiny
At our January meetup we had several short “lightning” talks followed by a presentation from Jeremy Albright.
The lightning talks were:

“spotifyr: a wrapper for Spotify’s API in R” (slides) presented by Rob Weyant

“Shiny WiFi Dashboard: Displaying information about the current WiFi Status” (slides) presented by Ellis Valentiner

“A Modern R Styleguide: Working towards a sensible, current style for R code” (slides) presented by Clayton Yochum
Jeremy Albright is the Founder and CEO of Methods, an Ypsilanti-based data science and statistical consulting firm. He’s been using R since 2001 and enjoying it since 2005. He uses R to develop applications to perform predictive modeling, text analytics, and data visualizations. In his misguided youth he also earned a doctorate in Political Science and still occasionally dabbles in academic research, both in the social sciences and biostatistics, and he teaches statistics in the Psychology department at EMU.
RShiny makes it easy to develop applications with highly intuitive graphical user interfaces that also run highly sophisticated analytics in the background. Jeremy’s presentation discussed two real-world applications: one app that creates and analyzes multiply imputed data sets, and another that uses a neural network to predict stock market performance on the basis of daily data feeds. These examples show how it is possible to provide clients with true business intelligence options beyond the simplistic dashboards that dominate the market. The presentation also discussed making RShiny applications available via the cloud (.pptx slides).


End of Year Party
Our December meetup was an end of year celebration — there were no presentations this month, just a chance to hang out and have fun.

Grammar of Graphics, rJava & Rcpp
At our November meetup Brian Perron gave a talk about the Grammar of Graphics and how that relates to the ggplot2 package, followed by Dan Pressel talking about the rJava and Rcpp packages.
Brian Perron is an Associate Professor at the UM School of Social Work. He self-identifies as a geek and loves data. His talk was a crash course on the underlying conceptual framework of the grammar of graphics. Understanding the grammar is essential to creating effective data visualizations with ggplot, without spending countless hours poring over StackOverflow posts (slides).
Daniel Pressel is Chief Science Officer at Digital Roots, where he works on problems in Natural Language Processing and Machine Learning. Occasionally when using R, we want access to existing software written in other programming languages. In this talk Dan described how to interface with two very popular languages, C++ and Java, from within R. He demonstrated how to build a package with Rcpp using native code to improve execution performance and how to use rJava to build something powerful with very little development effort (slides).

Propensity Score Matching and ggvis
Alex Lishinski presented on Propensity Score Matching in R (slides). Dave Childers talked about how to use ggvis to create interactive graphics (slides).

caret package, and Text mining with R and its application to song lyrics
In September we had two presentations: Mochan Shrestha presented on the caret package, a set of functions that attempt to streamline the process for creating predictive models; and Kraig Stevenson gave an introduction to text mining with R and its application to song lyrics.
Mochan Shrestha is a software engineer with a background in computer science (PhD, Wayne State University) and mathematics (MS, Michigan State University). In his talk, Mochan demonstrated how to perform predictive modeling using the caret package. He covered applications for classification and regression (slides).
Kraig Stevenson has always been interested in what data tells us about the real world. His educational path has taken several twists and turns, from a BS in Statistics and MS in Biostatistics, to a PhD in Bioinformatics, during which he worked with next-generation sequencing data to better understand the regulation of gene expression. After his PhD he began his career as a Data Scientist at Trinity Health. In his talk, Kraig introduced concepts of text mining related to song lyrics. Kraig developed an R wrapper for the musiXmatch API to extract lyrics from a particular artist and apply general text-mining techniques to better understand the content in a corpus of text (slides).

RStan: Statistical Modeling Made Easy with Bob Carpenter
In August we were visited by special guest Dr. Bob Carpenter, a research scientist in computational statistics (Columbia University). He designed the Stan probabilistic programming language and is one of the Stan core developers.
Bob has a Ph.D. in cognitive and computer science (University of Edinburgh) and has worked as a professor of computational linguistics (Carnegie Mellon University) and as an industrial researcher and programmer in speech recognition and natural language processing (Bell Labs, SpeechWorks, LingPipe).
Bob introduced Stan, a new language for expressing statistical models with support for full Bayesian inference via sampling (Hamiltonian Monte Carlo) and maximum likelihood estimation via optimization (L-BFGS) and curvature (higher-order autodiff). He began with an introduction to Bayesian modeling and how it preserves estimation uncertainty in posterior inference, focusing on event probability estimation, decision theory, and out-of-sample prediction. He also introduced Stan’s modeling language based on examples, working up from the very basics (Bernoulli model of binary outcomes, simple linear regression), through the bread-and-butter algorithms (multilevel generalized linear models, aka mixed effects), and finishing with some advanced model examples (Gaussian processes, diff eq physical system models). Bob provided a high-level overview of the back end, touching on how models get translated to C++, how automatic differentiation allows gradients to be computed up to machine precision, how variables are transformed to unconstrained support on the real numbers, and, using animations, how Stan’s adaptive Hamiltonian Monte Carlo works. Bob also introduced ShinyStan, an interactive posterior visualization tool, and described the latest addition to Stan, black-box variational inference. Finally he wrapped up with an overview of what the Stan team has in store over the next couple of years (PDF slides).
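For a feel of what “preserving estimation uncertainty” means in the simplest case mentioned above, the Bernoulli model of binary outcomes, here is a small base R sketch. It is not Stan code and does not use Hamiltonian Monte Carlo: it relies on the closed-form Beta posterior that is available for this one model. The data (7 successes in 20 trials) and the uniform Beta(1, 1) prior are assumptions made up for illustration.

```r
# Hypothetical data: 7 successes observed in 20 binary trials.
successes <- 7
trials <- 20

# Assumed prior: Beta(1, 1), i.e. uniform over the success probability.
prior_a <- 1
prior_b <- 1

# Conjugate update: the posterior is Beta(a + successes, b + failures).
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# Point summary and a 95% credible interval for the success probability.
post_mean <- post_a / (post_a + post_b)
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)

# Event probability estimation: posterior probability that the success
# probability exceeds one half.
p_gt_half <- pbeta(0.5, post_a, post_b, lower.tail = FALSE)

print(round(c(mean = post_mean, lower = cred_int[1], upper = cred_int[2]), 3))
```

The result is a whole distribution for the success probability, so interval and event-probability questions can be answered directly from it. For models without a closed-form posterior, a sampler such as the one in Stan plays the role that `qbeta()` and `pbeta()` play here.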

magrittr, Package Development and Association Rule Learning
Rob Weyant showed us the magrittr package (slides), Jeff Shane demonstrated package development (slides), and Brandon Stange gave a talk on association rule learning (slides).

Interactive Web Applications in R with Shiny
Clayton Yochum walked us through the steps to create, iterate on, and host a simple web application (slides).

R Markdown and Support Vector Machines
Marian Schmidt gave an introduction to using R Markdown (slides) and Michelle Berry gave an overview of support vector machines (slides).

Classification: Regularized logistic regression, Trees and random forests
Devin Riley demonstrated binary classification using regularized logistic regression (slides) and Ellis Valentiner gave an introduction to decision trees and random forests (slides).

Data Exploration
Kun Deng gave an introduction to the tidyr and dplyr packages (slides), Haad Khan showed us how to perform exploratory data analysis and visualization using ggplot2 (slides), and Blake Nicholson walked us through the basics of principal component analysis and exploratory factor analysis (slides).