
Things I learned at the TAFIRI R Workshop 2019

Introduction

We had a three-day workshop on using R as an analytical framework from 12th to 14th July 2019 at the headquarters of the Tanzania Fisheries Research Institute (TAFIRI), Dar es Salaam. The workshop was a follow-up to a similar training conducted in July 2018. Although the workshop relied on base R syntax, which does not let you express code as succinctly as piping does, I can honestly say it was one of the most interesting and educational trainings I have attended.
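
To make the contrast with piping concrete, here is a minimal sketch that is not taken from the workshop material. It uses the built-in mtcars data and the native pipe `|>`, which only arrived in R 4.1.0, well after the workshop; the `%>%` pipe from the tidyverse reads the same way in this example. The choice of dataset and columns is purely illustrative.

```r
# Base R with nested calls: read from the inside out
nested <- round(mean(subset(mtcars, cyl == 4)$mpg), 1)

# The same steps with the native pipe (requires R >= 4.1.0): read left to right
piped <- mtcars |>
  subset(cyl == 4) |>      # keep the four-cylinder cars
  with(mean(mpg)) |>       # average fuel consumption of that subset
  round(1)                 # round to one decimal place

# Both styles compute the same value
identical(nested, piped)
```

The nested version is perfectly valid R; the piped version simply lists the steps in the order they happen, which is what makes longer chains easier to follow.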

The training exposed us to some useful packages that I was not aware of, and I have come back with a long list of ideas that I want to start using right away! In this post I simply list the packages and briefly describe some of their functions, to give a glimpse of how they can streamline your coding in R.

  • agricolae— a package of statistical procedures for agricultural research developed by Felipe de Mendiburu (2019). agricolae offers extensive functionality for experimental design, especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, alpha, cyclic, complete block, Latin square, Graeco-Latin square, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures, several non-parametric tests, biodiversity indexes and consensus clustering.

  • car— Developed by John Fox and Sanford Weisberg (2019), this package provides additional functions for regression analysis.

  • caret— A package developed by Max Kuhn (2019) that provides various functions for training and plotting classification and regression models.

  • FSA— A package developed and maintained by Derek Ogle, Powell Wheeler and Alexis Dinno (2019) that offers different methods for fish stock assessment.

  • GGally— an add-on package to ggplot2 developed by Barret Schloerke (2018) with contributions from many others. It provides functions that extend the plotting capability of ggplot2, including:

    1. pairwise plot matrix,
    2. a two group pairwise plot matrix,
    3. a parallel coordinates plot,
    4. a survival plot, and
    5. several functions to plot networks.

  • lme4— this package fits linear and generalized linear mixed-effects models (Bates et al. 2015). The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the ‘Eigen’ C++ library for numerical linear algebra and ‘RcppEigen’ “glue”.

  • MKmisc— The package contains several functions for statistical data analysis; e.g. for sample size and power calculations, computation of confidence intervals and tests, and generation of similarity matrices (Kohl 2019).

  • MuMIn— This package, developed and maintained by Kamil Bartoń (2019), has tools for performing model selection and model averaging. It offers automated model selection through subsetting the maximum model, with optional constraints for model inclusion, and model parameter and prediction averaging based on model weights derived from information criteria (AICc and alike) or custom model weighting schemes.

  • pROC— This package was missing from my library when I wrote this post, so I need to reinstall it before I can describe it.

  • pwr— The package has tools to perform basic power analysis (Champely 2018).

  • userfriendlyscience— Gjalt-Jorn Peters and his colleagues (2017) developed this package with two goals. First, it packs functions that are tailored to people who are familiar with SPSS. Second, it has functions that are more user friendly to relatively novice users. The package also conveniently houses a number of additional functions that are intended to increase the quality of methodology and statistics in psychology, not by offering technical solutions, but by shifting perspectives, for example towards reasoning based on sampling distributions as opposed to point estimates.

  • visreg— Patrick Breheny and Woodrow Burchett (2017) developed this package to provide a convenient interface for constructing plots that visualize the fit of regression models arising from a variety of model-fitting functions in R (‘lm’, ‘glm’, ‘coxph’, ‘rlm’, ‘gam’, ‘locfit’, ‘lmer’, ‘randomForest’, etc.).

  • DescTools— Andri Signorell and his team (2019) developed this package, which has a collection of miscellaneous basic statistics functions and convenience wrappers for efficiently describing data. The author’s intention was to create a toolbox that facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The ‘camel style’ was consequently applied to functions borrowed from contributed R packages.
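
To give a flavor of a few of the packages listed above, here is a minimal sketch rather than anything taken from the workshop material. It assumes that pwr, car and lme4 are installed; the effect size, the model formulas, and the use of the built-in mtcars data and the sleepstudy data shipped with lme4 are illustrative choices only.

```r
# Assumes the packages are installed, e.g.
# install.packages(c("pwr", "car", "lme4"))

# pwr: sample size per group for a two-sample t-test
# (illustrative inputs: medium effect d = 0.5, 80% power, 5% significance level)
power_calc <- pwr::pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05)
ceiling(power_calc$n)

# car: variance inflation factors for an ordinary linear model
fit_lm <- lm(mpg ~ wt + hp + disp, data = mtcars)
car::vif(fit_lm)

# lme4: mixed-effects model with a random intercept and slope per subject,
# fitted to the sleepstudy data bundled with the package
fit_lmm <- lme4::lmer(Reaction ~ Days + (Days | Subject),
                      data = lme4::sleepstudy)
lme4::fixef(fit_lmm)
```

Calling each function with the `package::` prefix keeps it obvious which package every function comes from, which helps when you are still learning a long list of new packages.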

We also worked through the exercises with some of my favourite packages for accomplishing tasks and assignments in R projects. These packages include:

  • ggplot2— A core package of the tidyverse¹ ecosystem developed by Hadley Wickham (2016). It is a popular visualization tool that uses the Grammar of Graphics framework. It provides tools for adding layers on top of one another to create complex plots.

  • sf— Edzer Pebesma and his team developed this package (2018), which provides support for simple features, a standardized way to encode spatial vector data. It binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations.

  • maps— As its name suggests, this package displays maps: it contains tools for displaying spatial data in R. It was originally written by Richard Becker and Allan Wilks (2018), but more people have contributed to its maturity.

  • maptools— Roger Bivand and others (2017) developed this package, which contains a set of tools for manipulating geographic data. It includes binary access to ‘GSHHG’ shoreline files. The package also provides interface wrappers for exchanging spatial objects with packages such as ‘PBSmapping’, ‘spatstat’, ‘maps’, ‘RArcInfo’, and others.

  • ggmap— David Kahle and Hadley Wickham (2013) developed this package to extend the capability of the ggplot2 package. The package hosts several functions that make it possible to overlay spatial data and models on top of static maps from various map services (Google Maps and Stamen Maps).

  • tmap— This package offers tools to visualize geographical data (Tennekes 2018). It takes a flexible, layer-based, and easy to use approach to creating thematic maps, such as choropleths and bubble maps.

  • leaflet— Create and customize interactive maps using the ‘Leaflet’ JavaScript library and the ‘htmlwidgets’ package. These maps can be used directly from the R console, from the RStudio viewer, in Shiny applications and in R Markdown documents (Cheng, Karambelkar, and Xie 2018).

  • spData— Roger Bivand and others (2019) developed this package, which contains several spatial datasets.

  • mapshot/mapview— Tim Appelhans and colleagues (2019) developed this package for interactive visualization of geographical data.
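
As a rough illustration of how two of these packages fit together, the sketch below reads a shapefile with sf and draws it with ggplot2 layers. It is not from the workshop material: it assumes both packages are installed and uses the North Carolina counties shapefile that ships with sf, and the choice of the BIR74 column for the fill is arbitrary.

```r
library(ggplot2)
library(sf)

# Read the example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Build the map layer by layer, in the Grammar of Graphics style
p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +                      # county polygons, filled by a data column
  scale_fill_viridis_c(name = "Live births (BIR74)") + # continuous colour scale with a legend title
  labs(title = "North Carolina counties") +
  theme_minimal()

p
```

Because the result is an ordinary ggplot object, further layers such as points or labels can be added with `+` in the same way.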

Final Words

A huge thank you to the instructors Jimena Golcher, Jessica Rick, and Jesse Alston for their generous presentations on so many relevant topics. I also thank Dr. Ismael Kimirei for facilitating the workshop and making it possible. And final thanks go to the R trainees.

References

Signorell, Andri, et mult. al. 2019. DescTools: Tools for Descriptive Statistics. https://cran.r-project.org/package=DescTools.

Appelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2019. Mapview: Interactive Viewing of Spatial Data in R. https://CRAN.R-project.org/package=mapview.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Bivand, Roger, and Nicholas Lewin-Koh. 2017. Maptools: Tools for Reading and Handling Spatial Objects. https://CRAN.R-project.org/package=maptools.

Bivand, Roger, Jakub Nowosad, and Robin Lovelace. 2019. SpData: Datasets for Spatial Analysis. https://CRAN.R-project.org/package=spData.

Breheny, Patrick, and Woodrow Burchett. 2017. “Visualization of Regression Models Using Visreg.” The R Journal 9 (2): 56–71.

Champely, Stephane. 2018. Pwr: Basic Functions for Power Analysis. https://CRAN.R-project.org/package=pwr.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2018. Leaflet: Create Interactive Web Maps with the Javascript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.

de Mendiburu, Felipe. 2019. Agricolae: Statistical Procedures for Agricultural Research. https://CRAN.R-project.org/package=agricolae.

Fox, John, and Sanford Weisberg. 2019. An R Companion to Applied Regression. Third. Thousand Oaks CA: Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/.

Kuhn, Max, with contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, et al. 2019. Caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.

Kahle, David, and Hadley Wickham. 2013. “Ggmap: Spatial Visualization with Ggplot2.” The R Journal 5 (1): 144–61. https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf.

Kohl, Matthias. 2019. MKmisc: Miscellaneous Functions from M. Kohl. http://www.stamats.de.

Ogle, Derek H., Powell Wheeler, and Alexis Dinno. 2019. FSA: Fisheries Stock Analysis. https://github.com/droglenc/FSA.

Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.

Peters, Gjalt-Jorn Ygram. 2017. “Diamond Plots: A Tutorial to Introduce a Visualisation Tool That Facilitates Interpretation and Comparison of Multiple Sample Estimates While Respecting Their Inaccuracy.” PsyArXiv. https://psyarxiv.com/fzh6c.

Becker, Richard A., and Allan R. Wilks (original S code). R version by Ray Brownrigg. Enhancements by Thomas P. Minka and Alex Deckmyn. 2018. Maps: Draw Geographical Maps. https://CRAN.R-project.org/package=maps.

Schloerke, Barret, Jason Crowley, Di Cook, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Joseph Larmarange. 2018. GGally: Extension to ’Ggplot2’. https://CRAN.R-project.org/package=GGally.

Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.


  1. A set of packages that work in harmony for importing, tidying, manipulating, modeling and visualizing data.