Monday

3 December 2018

Workshops

Morning

Chris Auld

(Microsoft)

1. Scaling R in the Cloud with doAzureParallel

Roger Payne

(VSNi)

2. Genstat 19ed masterclass

Lunch

Afternoon

Peter Baker

(University of Queensland)

3. Tools for efficient data analysis workflow and reproducible research

Salvador Gezan

(University of Florida)

4. Modelling correlations between observations in agricultural and environmental sciences with ASReml-R

Welcome Reception

Book for Conference + Optional Workshops + Optional Excursions + Optional Guest Dinner Tickets

 

Book for Workshops Only

  • Book one workshop (1, 2, 3 or 4)
  • Book two workshops (one morning and one afternoon)
  • You can book workshops even if you are NOT attending the conference

 

All online registration forms require credit card payment. If you would like to be invoiced and/or pay by an alternative method, please contact Samantha Barrett (s.barrett@auckland.ac.nz).

Workshops

Monday 3rd December

Morning Workshops

Workshop 1

Scaling R in the Cloud with doAzureParallel

Presenters: Chris Auld and Nigel Parker, Microsoft

This workshop will equip attendees with the skills to scale their R workloads into the cloud. Cloud computing offers the promise of near-infinite computing power on tap. The doAzureParallel package is a parallel backend that integrates with the native parallel support provided by the R runtime from v2.14. With doAzureParallel, each iteration of a loop runs in parallel on a pool of virtual machines (VMs) in the cloud, allowing users to scale their R jobs from tens to thousands of CPU cores.

In this workshop we will cover:

  • Simulation and optimization at scale. Using Monte Carlo methods, attendees will learn general-purpose approaches to executing and reproducing simulation workloads at scale.
  • Parallelism of data-intensive tasks such as ETL and feature engineering. This will include approaches to scaling plyr- and data.table-based manipulations.
  • The use of doAzureParallel with the two key machine learning meta-frameworks, caret and mlr. We will cover the use of parallel computing for k-fold cross-validation and hyper-parameter optimization.

This is a hands-on workshop. Attendees should have some prior experience with the R language and should bring along a machine (Windows, Mac, Linux all great) running a recent build of R and the editor of their choice. Cloud computing access will be provided on the day.

Workshop 2

Genstat 19ed Masterclass: how to get the analyses you need, the output you want, and the graphs you prefer

Presenters: Roger Payne, David Baird and Vanessa Cave, VSNi

In this workshop we will help you to master recently added features that make it easier to identify and validate the most appropriate model, produce graphs in the styles that you prefer, and archive results for convenient future use.

Topics will include:

  • manipulating and validating complicated data sets
  • finding the best REML model for single trials and series of trials
  • meta-analysis of series of trials
  • checking the validity of your analysis
  • saving and archiving results
  • exploiting new flexibility in the graphics menus
  • customizing the plots
  • producing publication-quality graphs
  • producing multi-paged PDF files directly from batch jobs
  • use of the new foreign language/Unicode text facilities in Genstat 19.2

The sessions will involve a mixture of examples and practicals, so please bring your laptop (ideally with Genstat 19ed already installed).

Afternoon Workshops

Workshop 3

Tools for Efficient Data Analysis Workflow and Reproducible Research

Presenter: Peter Baker, University of Queensland

Researchers and statistical consultants are drowning in data. In my early career, computer processing power and storage were limited, so we spent a considerable amount of time formulating strategies to manipulate data efficiently and focused on the analysis of relatively small data sets or subsets of larger ones. The day I started as a biometrician with NSW Agriculture, my boss told me that 80% of a biometrician’s time was spent organising and cleaning data for analysis. Given today’s powerful computing environments, we might think that things would be easier, but it appears that the 80% figure still holds. Indeed, due to larger data sets and competing demands for our time, many data analysts face problems with organising their workflow.

This tutorial provides a hands-on introduction to computing-oriented strategies for the workflow of research data management and data analysis. The ideas presented follow Long’s 2009 book on the workflow of data analysis using Stata, which provides a useful guide to managing the workflow for data analysis in large projects. However, Long’s approach concentrates on manual methods for implementing steps in the workflow. Efficient computing solutions are available for most steps in the process: GNU Make for managing workflow and regenerating output, Git for version control, GNU R functions and packages for repetitive tasks, and R Markdown for producing reports directly from analyses. Both generic tools and tools specific to statistical software are presented and incorporated into hands-on exercises.

Keywords: Data analysis, DRY (Don’t Repeat Yourself) workflow, version control, Make, R, project management

Workshop 4

Modelling correlations between observations in agricultural and environmental sciences with ASReml-R

Presenter: Salvador Gezan, University of Florida

This ½ day workshop will concentrate on the statistical analysis of observations that exhibit some form of temporal, genetic or spatial correlation, as found in field or laboratory studies. The aim is to illustrate how to analyze these data using the framework of linear mixed models with the software ASReml-R.

This workshop will consist of several short sessions, starting with a brief introduction to the command syntax for ASReml as implemented in R, followed by topics related to repeated measures (random regression), genetic relatedness, and spatial analyses, using many examples from agricultural and environmental sciences. Some theoretical aspects of model construction will be presented, but the focus will be on practical issues and the interpretation of results. All sessions will be led by Dr. Salvador Gezan.