News
-
R + SAS, Environments for Annotations, & FiraCode
We fit three separate talks into this month:
-
R and SAS and R - Barry DeCicco: The R package knitr allows R to be used in conjunction with other software packages, and not just as a way to produce better output. This presentation showed us how to use R/RStudio/knitr to call SAS and vice versa. (slides)
-
Have Env – Will Travel: Colin Gross examined the use of R environments in the context of processing performance data. Software that generates tailored feedback summaries based on psychological theory needs to annotate performers based on their performance data. The annotations are tags such as “capability barrier” and “performance gap”. A hurdle for implementing these annotations is that each client has a different way of calculating them. Managing different annotation calculations can be accomplished by leveraging R environments. (slides)
-
FiraCode in RStudio: Kraig Stevenson demonstrated how to use FiraCode, a code-centric, ligature-enabled font that makes your R code even more beautiful, in RStudio. (slides)
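The idea from Colin Gross's talk, keeping each client's annotation calculations in its own R environment, can be sketched in base R. This is a minimal illustration, not the code from the talk: the client names, thresholds, and the annotate() helper are all hypothetical.

```r
# Hypothetical default annotation rules, stored in one environment.
# Function names and thresholds are illustrative assumptions.
default_rules <- new.env()
default_rules$performance_gap <- function(score, benchmark) score < benchmark
default_rules$capability_barrier <- function(scores) all(scores < 0.5)

# Client A inherits the defaults but overrides one calculation.
client_a <- new.env(parent = default_rules)
client_a$performance_gap <- function(score, benchmark) (benchmark - score) > 0.1

# Client B uses every default calculation unchanged.
client_b <- new.env(parent = default_rules)

# Look up each annotation function in the client's environment (falling back
# to the parent environment) and apply it to the performer's data.
annotate <- function(rules, scores, benchmark) {
  gap_fn <- get("performance_gap", envir = rules)
  barrier_fn <- get("capability_barrier", envir = rules)
  c(
    performance_gap = gap_fn(scores[length(scores)], benchmark),
    capability_barrier = barrier_fn(scores)
  )
}

scores <- c(0.62, 0.66, 0.68)
result_a <- annotate(client_a, scores, benchmark = 0.70)
result_b <- annotate(client_b, scores, benchmark = 0.70)
```

Because get() searches parent environments by default, a client only needs to define the calculations that differ from the shared defaults.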
-
-
Summarizing a Clinical Trial Database & Reproducible Data Wrangling
Peter Higgins gave two talks: Helping Excel Users Do Reproducible Data Wrangling with a Shiny App and Summarizing a Clinical Trial Database for a Good Looking but Un-Tidy Government Report with keyring, tidyr::complete, mutate(case_when), janitor::tabyl, and flextable. (PDF slides)
Dr. Peter Higgins is a professor of gastroenterology and hepatology at the University of Michigan. His research work is focused on inflammatory bowel diseases including Crohn’s disease and ulcerative colitis. His research projects include disease mechanisms and drug development, quality of life, interventions to improve care, epidemiology studies with large databases like Truven and Optum, and machine learning to identify early responders (vs. nonresponders) to IBD medications.
-
Tidymodels: A tidy framework for modeling in R
Clayton Yochum demonstrated the use of the recipes, rsample, parsnip, and yardstick packages to pre-process, fit, and evaluate models on the Titanic dataset. These tidymodels packages are the brainchild of Max Kuhn, author of the caret package and the book “Applied Predictive Modeling”. Kuhn joined RStudio last year with the goal of replacing caret with a set of smaller, more focused packages following modern tidyverse conventions. (slides)
-
Varying Coefficient Models in R
Our May meetup featured Giacomo Benini telling us about varying coefficient models and how to use them in R. (slides)
-
Bayesian Modeling and Blogging in R
For our second April meetup we heard talks on Bayesian modeling with brms and rstanarm and on blogging with blogdown.
Alice Milivinti gave us an overview of what Bayesian analysis is and compared the brms and rstanarm packages, giving pros and cons of each along with code examples. (slides)
Xiaosong Zhang introduced the blogdown package, an effort by RStudio to reduce the cost of communicating data-intensive material. Xiaosong demonstrated creating and publishing a blog from scratch in just a few minutes. (slides)
-
Spatial Models in Stan
This month we hosted Mitzi Morris, a core developer of the Stan language, who walked us through building ICAR models for areal data in Stan. Check out her slides here.
This was a joint meetup with the Ann Arbor Chapter of the American Statistical Association.
-
CRANberry Pi & Kaggle
Ellis Valentiner talked about his CRANberry Pi project, using a Raspberry Pi to host CRAN packages. (slides)
Andrew Moore broke down what useRs need to know to get started with data science competitions on Kaggle. (slides)
-
rstudio::conf recaps
Multiple presenters gave recaps of things they’ve learned at the recent rstudio::conf in San Diego.
Slides:
-
tidyeval and flexdashboard
Colin Gross gave a brief explanation of non-standard evaluation (NSE) and how to use the value stored in a variable as part of tidyverse code. He illustrated this by handling a case where column names are not known until runtime. (website)
Joshua King presented an overview of the capabilities of RStudio’s flexdashboard package. This package combines functionality from static graphics and interactive htmlwidgets into a self-contained dashboard using R Markdown syntax. He walked through the construction of a demo dashboard for the best pizza/taco restaurant this country may never know. (slides)
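The problem described in the tidyeval talk, a column name that is only known at runtime, can be shown with base R's own non-standard evaluation tools. This is a rough sketch rather than the talk's code; the data frame and the col_mean() helper are made up for illustration.

```r
# A toy data frame; the column names are placeholders.
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# Non-standard evaluation: capture the unevaluated argument with substitute()
# and evaluate it with the data frame's columns in scope.
col_mean <- function(data, expr) {
  captured <- substitute(expr)
  mean(eval(captured, data, parent.frame()))
}
mean_typed <- col_mean(df, height)  # works: `height` is found inside df

# When the column name is only known at runtime, it lives in a string.
# col_mean(df, target) would evaluate `target` itself, i.e. the string.
target <- "weight"

# Option 1: skip NSE and index the column by name.
mean_indexed <- mean(df[[target]])

# Option 2: turn the string into a symbol and splice it into the call.
mean_spliced <- eval(bquote(col_mean(df, .(as.name(target)))))
```

In tidyverse code the analogous tools are the .data pronoun, as in .data[[target]], and unquoting a symbol with !!sym(target) from rlang.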
-
Basic Package Development and spotifyr
Our November 2017 meetup was hosted at Ann Arbor SPARK. (announcement slides)
Rob Weyant talked about the basics of package development in R using his own package spotifyr as an example. He covered the key components of package development and some of the useful tools available in R and integrated into RStudio. (slides, spotifyr demo)
-
Exploring UFO Sightings with tidycensus and leaflet
Our October 2017 meetup was hosted at Ann Arbor SPARK. (announcement slides)
Andrew Moore talked about building bivariate choropleths with leaflet and tidycensus, illustrating with NUFORC (National UFO Reporting Center) data. This included a brief overview of the tidycensus package, a newly developed package following tidyverse conventions to return queries from the census API. He also covered how tidycensus and leaflet can be used in tandem to efficiently build interactive choropleths in R, without the need for additional GIS software. (slides)
-
Plotly and Pomp
Our September 2017 meetup was hosted at Ann Arbor SPARK.
Ashton Baker discussed the R package pomp, which allows users to build, test, analyze, and fit hidden Markov models (HMMs) to data. HMMs are useful tools for explaining time-series data. (slides)
Rami Krispin gave an overview of the plotly package, its uses, and its pros and cons, and then gave a demo of the nice features of the package. (slides)
-
Forecasting with Prophet | Citations & Templates with RMarkdown
Our June 2017 meetup was hosted at Ann Arbor SPARK.
Nan Hanagud gave an introduction to Prophet, an open source time series forecasting tool by Facebook available in R and Python (slides). Nan recently started as a Business Intelligence Principal at XPO, where she plans to use R for proof-of-concept type projects. Nan has Master’s degrees in applied & computational math and operations research.
Andrew Moore demonstrated how RMarkdown can be extended as a report writing tool to produce inline citations and reference sections for technical and academic writing (slides). Andrew Moore has a BA in psychology and works at the University of Michigan School of Social Work as a data analyst. He is one of the Ann Arbor R User Group’s co-organizers.
-
Useful Functions from dplyr & tidyr
The May meetup was hosted at Ann Arbor SPARK.
Marian Schmidt gave a presentation on useful functions from the dplyr and tidyr R packages for data organization, manipulation, and readability. (slides)
Marian is a PhD candidate in the department of Ecology and Evolutionary Biology at the University of Michigan who uses R to better understand how bacterial communities contribute to the health and maintenance of the freshwater lakes in Michigan.
-
Reproducible Research & TensorFlow
The March meetup was hosted at Ann Arbor SPARK (announcement slides).
We had a special guest speaker, Claus Ekstrøm, who talked about Reproducible Research and some work he’s done to simplify that process. Claus was followed by AARUG member Mochan Shrestha, who showed us how to use Google’s TensorFlow from R (slides).
Claus Thorn Ekstrøm is Professor at the Section of Biostatistics, University of Copenhagen where he teaches courses on statistics and R for beginners and advanced users. Professor Ekstrøm’s primary research interests lie within statistical genetics, genetic epidemiology, and bioinformatics, in particular genetic association studies, image analysis of microarray scans, and integrated analysis of gene expression and metabolic profile data. He is the maintainer of several R packages (MESS, dataMaid, SuperRanker, MethComp) and author of The R Primer (slides).
-
Beginner Workshop Lessons
There was a free two-day Beginner R Workshop at Ann Arbor SPARK on Saturday March 4th and Sunday March 5th.
The workshop gave an introduction to data types, data frames, packages & R Markdown, data manipulation with the dplyr package, data visualization with the ggplot2 package, and data analysis.
The workshop lessons are available (here) in the bookdown format.
-
ggplot2 and Neo4j
The February meetup was hosted at Ann Arbor SPARK (announcement slides).
Ellis Valentiner gave a short recap from rstudio::conf(2017L). Ellis is a Data Scientist at Powerley, a startup that’s creating home energy management solutions, and an organizer for the Ann Arbor R User Group (slides).
Kaitlin Flynn gave a talk on creating beautiful plots using the ggplot2 package. Kaitlin Flynn is a postdoctoral fellow in the lab of Pat Schloss at the University of Michigan. Kaitlin uses R and machine learning to understand relationships between gut bacteria and human diseases like colon cancer (slides).
Geof Hannigan presented on the use of Neo4j with R to store data in a graph database and do graph-based analysis. Geof is a microbiology postdoctoral fellow at the University of Michigan and an organizer for the Ann Arbor R User Group. He uses graph analyses to better understand the networks of microbes in the human body and how they are associated with disease (slides).
-
How To Turn Your Existing R Code Into A RESTful API
The November meetup was hosted at Ann Arbor SPARK (announcement slides).
Ellis Valentiner gave a presentation titled “How to turn your existing functions into a RESTful web API” (slides). REST is a simple but powerful architecture that statisticians and data scientists can use to deliver models and results. In his presentation, Ellis introduced the plumber package and showed how to create and use APIs to provide data science services without messy code handoffs.
Ellis is a Data Scientist at Powerley, a startup that’s creating home energy management solutions, and an organizer for the Ann Arbor R User Group.
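As a rough idea of what turning an existing function into an API endpoint looks like with plumber, the sketch below annotates an ordinary R function with plumber's special comments. The file name, endpoint path, and model are hypothetical choices for illustration and are not taken from the talk.

```r
# plumber.R: a hypothetical file name; the endpoint and function below are
# illustrative assumptions, not the example from the presentation.

# Fit a simple model once, using the built-in `cars` data set.
model <- lm(dist ~ speed, data = cars)

#* Predict stopping distance from speed.
#* @param speed Speed in miles per hour
#* @get /predict
predict_dist <- function(speed) {
  # Query-string parameters arrive as character strings, so convert first.
  speed <- as.numeric(speed)
  list(
    speed = speed,
    dist = unname(predict(model, data.frame(speed = speed)))
  )
}

# To serve the endpoint (requires the plumber package to be installed):
# plumber::plumb("plumber.R")$run(port = 8000)
# Then request http://localhost:8000/predict?speed=10
```

The function remains plain R and can still be called directly, for example predict_dist("10"), which makes it easy to test before exposing it over HTTP.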
-
(R)Stan & Police Shootings with Daniel Kessler
The September meetup was hosted at Ann Arbor SPARK (announcement slides). This meetup was held jointly with the Ann Arbor chapter of the American Statistical Association.
Daniel Kessler gave a talk on using R and Stan to model racial disparities in police use of force, in particular fatal shootings (slides). His work is an extension of a paper by Cody Ross, which used a multi-level Bayesian approach to arrive at both national- and county-level estimates.
Dan and his colleagues became interested in the work, and thanks to the code and data for the paper being publicly available, were able to first reproduce many of the key results, and then replicate several using an independent dataset covering another time period collected by the Washington Post. This is an evolving, live project, and you can follow along at the project’s GitHub repository.
Daniel Kessler is a Research Computer Specialist in the University of Michigan Health System’s Departments of Psychiatry & Radiology and a Developer for the Neuroimaging Methods Core. With a background in biological psychology and statistics, and an autodidactic proclivity for machine learning and computer science, his interests lie at the intersection of statistics, data science, behavior, and cognitive neuroscience. You can learn more at his website, www.dankessler.me.
-
Cool R tools: Packrat and RMarkdown beyond the knit button
The August meetup was hosted at Ann Arbor SPARK (announcement slides).
Clayton Yochum gave a talk titled Unusual uses for RMarkdown: Beyond the Knit button and Andrew Moore gave a talk titled Package dependencies in R: why you should be using Packrat.
Clayton Yochum is a Data Analyst/Engineer at Methods Consultants in Ypsilanti and the University of Michigan School of Social Work’s Child and Adolescent DataLab. He has a background in mathematics and business intelligence.
Andrew Moore is a Data Analyst in the School of Social Work at the University of Michigan, working with the Child & Adolescent Data Lab. Andrew has a BA in Psychology from University of Michigan and previously worked as a research technician at both UM’s Institute for Social Research and in UM’s psychology department.
-
GNU Make as a tool for reproducible data analysis
The July meetup was hosted at Ann Arbor SPARK (announcement slides).
Michelle Berry gave a lightning talk titled “Updates from the tidyverse and highlights from useR!”, which gave a brief overview of some of the exciting new R developments showcased at the useR conference (slides).
Pat Schloss, PhD gave our main talk on using GNU Make as a tool for reproducible data analysis (repository). Makefiles are typically used for generating executables from a program’s source files, but the same technology can be used to automate and reproduce an analysis workflow. Pat’s talk covered the motivation for using Make and guided attendees through a tutorial on how to construct a makefile for an analysis (e.g., one in R).
Pat Schloss is an associate professor at University of Michigan in Microbiology and Immunology. His research combines wet lab and computational methods to investigate links between the gut microbiota and disease. He is the developer of the mothur software for analyzing microbial short read sequence data.
-
Beginner Workshop
We held our second Beginner R Workshop at Ann Arbor SPARK on Saturday May 21st and Sunday May 22nd.
The workshop gave an introduction to data types, data frames, packages & R Markdown, data manipulation with the dplyr package, data visualization with the ggplot2 package, and data analysis.
Materials from the workshop are available (here).
-
Social
Our May meetup was a social meetup — there were no presentations this month, just a chance to hang out and have fun.
-
Super Learner
The April meetup was hosted at Ann Arbor SPARK (announcement slides). Scott Grey gave a brief introduction to the SuperLearner package, an ensemble prediction algorithm developed by Eric Polley at the National Cancer Institute. SuperLearner allows the use of many algorithms to generate an ideal prediction function that is a weighted average of all the algorithms considered. The SuperLearner package permits the integration of dozens of prespecified potential algorithms found in other R packages as well as ‘wrappers’ that provide the user with the ability to include their own algorithms (slides).
Scott Grey is a senior statistician in the University of Michigan’s Division of Cardiology where he analyzes data from registries of percutaneous interventions. He received his Masters and PhD from Case Western Reserve University.
This presentation was an “informal part two” to another presentation from Scott, Integrating machine learning into causal inference: the Targeted Maximum Likelihood Estimation approach, offered by the Ann Arbor ASA Chapter on Tuesday, April 12.
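The idea of a prediction function that is a weighted average of several algorithms can be sketched in base R. This is a simplified illustration under stated assumptions (simulated data, two arbitrary learners, weights chosen by minimizing cross-validated squared error); it does not use the SuperLearner package's API or reproduce its exact weighting method.

```r
set.seed(1)

# Simulated data (an assumption for illustration): a noisy quadratic.
n <- 200
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = x^2 + rnorm(n, sd = 0.3))

# Two candidate learners; each fits on `train` and predicts on `test`.
learners <- list(
  linear = function(train, test) predict(lm(y ~ x, data = train), test),
  quadratic = function(train, test) predict(lm(y ~ x + I(x^2), data = train), test)
)

# Step 1: out-of-fold predictions from each learner via 5-fold cross-validation.
folds <- sample(rep(1:5, length.out = n))
cv_pred <- sapply(learners, function(fit_predict) {
  pred <- numeric(n)
  for (k in 1:5) {
    held_out <- folds == k
    pred[held_out] <- fit_predict(dat[!held_out, ], dat[held_out, ])
  }
  pred
})

# Step 2: pick the convex weight that minimizes cross-validated squared error.
cv_mse <- function(w) {
  blended <- w * cv_pred[, "linear"] + (1 - w) * cv_pred[, "quadratic"]
  mean((dat$y - blended)^2)
}
w_linear <- optimize(cv_mse, interval = c(0, 1))$minimum
weights <- c(linear = w_linear, quadratic = 1 - w_linear)

# Step 3: refit each learner on all the data and blend their predictions.
ensemble_predict <- function(newdata) {
  full <- sapply(learners, function(fit_predict) fit_predict(dat, newdata))
  drop(full %*% weights)
}

new_points <- ensemble_predict(data.frame(x = c(0, 1)))
```

Because the simulated signal is quadratic, nearly all of the weight should end up on the quadratic learner; with real data the weights are usually spread across several algorithms.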
-
Beginner Workshop
We held our first Beginner R Workshop at Ann Arbor SPARK on Saturday April 2nd and Sunday April 3rd.
The workshop gave an introduction to data types, data frames, packages & R Markdown, data manipulation with the dplyr package, data visualization with the ggplot2 package, and data analysis.
Materials from the workshop are available (here).
-
Docker+R and Outlier and Change Point Detection
The March meetup was hosted at Ann Arbor SPARK (announcement slides). Kraig Stevenson gave a lightning talk on outlier and change point detection using Twitter’s R packages AnomalyDetection and BreakoutDetection (slides) and Blake Nicholson gave a presentation on using R with Docker, a standard interface for creation, distribution and deployment of software (slides).
Docker has taken the software engineering world by storm. It provides a lightweight, isolated environment in which programs can be run. It has broad applications across all aspects of software engineering. In this talk, Blake provided background on Docker, its architecture, and how to use it. He went through several examples of using R within a Docker container and the advantages such a setup provides.
Blake Nicholson leads the Data Science and Software Engineering teams at Seelio. He has applied quantitative methods and software development in numerous business settings throughout his career. His experience ranges from optimizing the size and structure of sales teams to leveraging analytical methods to find, grow, and keep employees at Facebook.
-
Defensive Programming and Debugging in R
The February meetup was hosted at Ann Arbor SPARK (big thanks to Travis Linderman!); check out the announcement slides for the latest group updates. There were two lightning talks, one on web scraping and the other on deep learning, followed by a longer presentation on programming and debugging.
The lightning talks were:
-
“Deep Learning with H2O” (code) presented by Brandon Stange
-
“Scraping Web Content Using rvest” (slides) presented by Jeff Shane
For our main presentation Reino Bruner gave a talk on Defensive Programming and Debugging in R (slides).
Reino Bruner is an R statistical programmer and data scientist with Equastat. His academic background is in experimental psychology, physics, and statistics. He has experience in many domains including, but not limited to: experimental design, assessment design & validation, statistical modeling, machine learning, model automation, causal analysis, and critical thinking. His research experience and interests include Bayesian Hamiltonian Monte Carlo analysis, neural network deep learning, nonparametric and robust statistics, complex system simulations, machine learning, and forecasting.
-
-
Lightning Talks and Creating Powerful Yet User-Friendly Analytics Applications with RShiny
At our January meetup we had several short “lightning” talks followed by a presentation from Jeremy Albright.
The lightning talks were:
-
“spotifyr: a wrapper for Spotify’s API in R” (slides) - Rob Weyant
-
“Shiny WiFi Dashboard: Displaying information about the current WiFi Status” (slides) - Ellis Valentiner
-
“A Modern R Styleguide: Working towards a sensible, current style for R code” (slides) - Clayton Yochum
Jeremy Albright is the Founder and CEO of Methods, an Ypsilanti-based data science and statistical consulting firm. He’s been using R since 2001 and enjoying it since 2005. He uses R to develop applications to perform predictive modeling, text analytics, and data visualizations. In his misguided youth he also earned a doctorate in Political Science and still occasionally dabbles in academic research, both in the social sciences and biostatistics, and he teaches statistics in the Psychology department at EMU.
RShiny makes it easy to develop applications with highly intuitive graphical user interfaces that also run highly sophisticated analytics in the background. Jeremy’s presentation discussed two real-world applications: one app that creates and analyzes multiply imputed data sets, and another that uses a neural network to predict stock market performance on the basis of daily data feeds. These examples show how it is possible to provide clients with true business intelligence options beyond the simplistic dashboards that dominate the market. The presentation also discussed making RShiny applications available via the cloud (.pptx slides).
-
-
End of Year Party
Our December meetup was an end of year celebration — there were no presentations this month, just a chance to hang out and have fun.
-
Grammar of Graphics, rJava & Rcpp
At our November meetup Brian Perron gave a talk about the Grammar of Graphics and how that relates to the ggplot2 package, followed by Dan Pressel talking about the rJava and Rcpp packages.
Brian Perron is an Associate Professor at the UM-School of Social Work. He self-identifies as a geek and loves data. His talk was a crash course on the underlying conceptual framework of the grammar of graphics. Understanding the grammar is essential to creating effective data visualizations with ggplot, without spending countless hours poring over StackOverflow posts (slides).
Daniel Pressel is Chief Science Officer at Digital Roots, where he works on problems in Natural Language Processing and Machine Learning. Occasionally when using R, we want access to existing software written in other programming languages. In this talk Dan described how to interface with two very popular languages – C++ and Java – from within R. He demonstrated how to build a package with Rcpp using native code to improve execution performance and how to use rJava to build something powerful with very little development effort (slides).
-
Propensity Score Matching and ggvis
Alex Lishinski presented on Propensity Score Matching in R (slides). Dave Childers talked about how to use ggvis to create interactive graphics (slides).
-
caret package, and Text mining with R and its application to song lyrics
In September we had two presentations: Mochan Shrestha presented on the caret package, a set of functions that attempt to streamline the process for creating predictive models; and Kraig Stevenson gave an introduction to text mining with R and its application to song lyrics.
Mochan Shrestha is a software engineer with a background in computer science (PhD, Wayne State University) and mathematics (MS, Michigan State University). In his talk, Mochan demonstrated how to perform predictive modeling using the caret package. He covered applications for classification and regression (slides).
Kraig Stevenson has always been interested in what data tells us about the real world. His educational path has taken several twists and turns, from a BS in Statistics and MS in Biostatistics, to a PhD in Bioinformatics, during which he worked with next-generation sequencing data to better understand the regulation of gene expression. After his PhD he began his career as a Data Scientist at Trinity Health. In his talk, Kraig introduced concepts of text mining related to song lyrics. Kraig developed an R wrapper for the musiXmatch API to extract lyrics from a particular artist and apply general text-mining techniques to better understand the content in a corpus of text (slides).
-
RStan: Statistical Modeling Made Easy with Bob Carpenter
In August we were visited by special guest Dr. Bob Carpenter, a research scientist in computational statistics (Columbia University). He designed the Stan probabilistic programming language and is one of the Stan core developers.
Bob has a Ph.D. in cognitive and computer science (University of Edinburgh) and has worked as a professor of computational linguistics (Carnegie Mellon University) and as an industrial researcher and programmer in speech recognition and natural language processing (Bell Labs, SpeechWorks, LingPipe).
Bob introduced Stan, a new language for expressing statistical models with support for full Bayesian inference via sampling (Hamiltonian Monte Carlo) and maximum likelihood estimation via optimization (L-BFGS) and curvature (higher-order autodiff). He began with an introduction to Bayesian modeling and how it preserves estimation uncertainty in posterior inference, focusing on event probability estimation, decision theory, and out-of-sample prediction. He also introduced Stan’s modeling language based on examples, working up from the very basics (Bernoulli model of binary outcomes, simple linear regression), through our bread and butter algorithms (multilevel generalized linear models, aka mixed effects), and finishing with some advanced model examples (Gaussian processes, diff eq physical system models). Bob provided a high-level overview of the back end, touching on how models get translated to C++, how automatic differentiation allows us to compute gradients up to machine precision, how variables are transformed to unconstrained support on the real numbers, and how our adaptive Hamiltonian Monte Carlo works using animations. Bob also introduced ShinyStan, an interactive posterior visualization tool, and described the latest addition to Stan, black-box variational inference. Finally he wrapped up with an overview of what we have in store over the next couple of years (PDF slides).
-
magrittr, Package Development and Association Rule Learning
Rob Weyant showed us the magrittr package (slides), Jeff Shane demonstrated package development (slides), and Brandon Stange gave a talk on association rule learning (slides).
-
Interactive Web Applications in R with Shiny
Clayton Yochum walked us through the steps to create, iterate on, and host a simple web application (slides).
-
R Markdown and Support Vector Machines
Marian Schmidt gave an introduction to using R Markdown (slides) and Michelle Berry gave an overview of support vector machines (slides).
-
Classification: Regularized logistic regression, Trees and random forests
Devin Riley demonstrated binary classification using regularized logistic regression (slides) and Ellis Valentiner gave an introduction to decision trees and random forests (slides).
-
Data Exploration
Kun Deng gave an introduction to the tidyr and dplyr packages (slides), Haad Khan showed us how to perform exploratory data analysis and visualization using ggplot2 (slides), and Blake Nicholson walked us through the basics of principal component analysis and exploratory factor analysis (slides).