class: center, middle, inverse, title-slide

# Reproducible and automated report generation

### Julia Schulte-Cloos

09 June 2021

https://jschultecloos.github.io/reproducr
https://bit.ly/3zfK42o

julia.schulte-cloos@gsi.lmu.de | jschultecloos | @jschultecloos
---
layout: true

<div class="my-header"></div>

---
name: content
class: spaced, inverse, middle

# Contents

### 1) [Literate programming](#literate)
### 2) [Integrated-file solution](#singlermd)
### 3) [Explore the data, exploit interactivity](#explore)
### 4) [Circulate, share, and publish your report any time](#report)
### 5) [Hands on](#handson)

---
name: aboutme

## About me

.pull-left-40[
<img src="jsc-profile.png" width="100%" />
]

.pull-right-60[
{{content}}
]

--

### Julia Schulte-Cloos

{{content}}

--

- PhD in Political Science

{{content}}

--

- what got me interested in reproducible report generation?

{{content}}

--

- 70% of my work relates to data, 30% to getting manuscripts published in peer-reviewed journals

{{content}}

--

- ...👩‍🏫 and some extra 20% to teaching

{{content}}

--

- author of the `reproducr` R library

{{content}}

--

- literate programming, reproducibility & open science are key in academia

{{content}}

---
name: literate
class: inverse, middle

# Literate programming

---

## Narrative text and code integration

.left-column[
### Executable reports
]

.right-column[
{{content}}
]

--

> ### Literate programming is key for reproducible research workflows

{{content}}

--

- code **embedded** in narrative text or documentation
- code follows structure of documentation

{{content}}

--

> ### ideally: integration of narrative text and code in multiple programming languages (e.g., R, Python, Julia)

{{content}}

--

> ### ideally: flexibility to produce different types of research outputs (e.g., reports, manuscripts, blog posts)

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
]

.right-column[
{{content}}
]

--

> Markdown syntax

- **human readable**, lightweight markup language
- **format text** with a plain-text editor
- supported by many platforms and frameworks

{{content}}

--

> Text formatting and emphasis

- Bold text with `**bold**`: **bold**
- Italic text with `*italic*`: *italic*
- Strikethrough text with `~~strikethrough~~`: ~~strikethrough~~

{{content}}

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
]

.right-column[
{{content}}
]

--

> Sections

- `# A level-one section`
- `## A level-two section`
- `# An unnumbered section {-}`, or equivalently `# An unnumbered section {.unnumbered}`

{{content}}

--

> Numbered lists and bullet points

```r
1. first point
2. second point
    + one sub-item
    + another sub-item
3. third point
```

1. first point
2. second point
    + one sub-item
    + another sub-item
3. third point

{{content}}

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
]

.right-column[
{{content}}
]

--

> Hyperlinks

`[My link](https://mylink.com)`

{{content}}

--

> Images

`![My logo](logo.png){width=20%}`

{{content}}

--

> Line breaks

```r
This is the first line.  # two spaces to break a line
This is the second line.
```

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
]

.right-column[
{{content}}
]

--

> Document conversion tool

{{content}}

--

- free and open-source
- widely used as a writing tool

{{content}}

--

- uses an enhanced version of Markdown (Pandoc Markdown)
- syntax for tables, footnotes, citations, math, etc.

{{content}}

--

- **YAML metadata block** sets global parameters of a document

```yaml
---
author: Reproducible Data Scientist
title: My greatest report
keywords: reproducibility, open science
---
```

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
]

.right-column[
{{content}}
]

--

> Citations

{{content}}

--

- `CiteProc` supported by Pandoc
- easy use of bibliographic data
- formatted **bibliographies** and **citations** based on metadata of cited objects
- **formatting instructions** provided by Citation Style Language (CSL) styles

{{content}}

--

```yaml
---
bibliography: literature.bib
link-citations: true
csl: 'https://bit.ly/3khj0ZL'
---
```
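Pandoc's citation syntax also covers grouped citations, suppressed author names, and explicit placement of the reference list. A minimal sketch, assuming `literature.bib` contains the key `palmerdata.2020`; the second key, `otherkey.2021`, is hypothetical and only illustrates grouping:

```markdown
<!-- several sources in one parenthetical citation (otherkey.2021 is a hypothetical key) -->
The data are public [@palmerdata.2020; @otherkey.2021].

<!-- author name suppressed, e.g. when the authors are already named in the sentence -->
The package authors [-@palmerdata.2020] document all variables.

<!-- by default the reference list is appended at the end of the document;
     a DIV with the id `refs` places it explicitly -->
# References

::: {#refs}
:::
```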
{{content}}

--

- `@palmerdata.2020` for in-text citations
- `[@palmerdata.2020, p. 10]` for parenthetical citations (here with a page reference)

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
]

.right-column[
{{content}}
]

--

> DIV elements

- content wrapped into three (or more) colons
- DIV elements help you to include any non-standard elements in the report
- **filters**: apply transformations to certain DIVs, or exclude them

```
::: {.special}
This is a special DIV!
:::
```

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
### R Markdown
]

.right-column[
{{content}}
]

--

> **Authoring framework** for data science

{{content}}

--

- single file to execute code and document code flow (no out-of-order code execution)

{{content}}

--

- produce **high-quality reports in different formats**

{{content}}

--

- ready to be shared with several audiences

{{content}}

--

> Narrative text

{{content}}

--

- written in Pandoc's Markdown

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
### R Markdown
]

.right-column[
{{content}}
]

--

> Code chunks

{{content}}

--

````markdown
```{r elephant-chunk, out.width='20%', fig.align='center', fig.cap='Elephant in the room'}
knitr::include_graphics('figs/elephant.jpg')
```
````

<div class="figure" style="text-align: center">
<img src="figs/elephant.jpg" alt="Elephant in the room" width="20%" />
<p class="caption">Elephant in the room</p>
</div>

{{content}}

--

- control how code & its products appear in your compiled report
- chunk options (e.g.
`eval, include, results, echo`)
- comprehensive [list online](https://yihui.name/knitr/options/) and in the [R Markdown reference guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)
- add **unique names to your code chunks**: `{r elephant-chunk}`

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
### R Markdown
]

.right-column[
{{content}}
]

--

> Referencing actual code output

```r
library(dplyr)  # provides %>%, filter() and pull()

# A simple linear regression model
fit <- lm(dist ~ speed, data = cars)

# Extract the estimated coefficient of `speed`
speed_estimate <- broom::tidy(fit) %>%
  filter(term == "speed") %>%
  pull(estimate)
```

```r
The estimated coefficient is `r round(speed_estimate, digits = 2)`.
```

The estimated coefficient is 3.93.

---

## Narrative text and code integration

.left-column[
### Executable reports
### Markdown
### Pandoc
### R Markdown
### The 'magic' behind the scenes
]

.right-column[
{{content}}
]

--

> ### Magic?

{{content}}

--

<center>
<img src="https://media.giphy.com/media/NnafYvjXZK9j2/giphy.gif" height="180">
</center>

{{content}}

--

<div class="figure" style="text-align: center">
<img src="figs/pandoc1.png" alt="[https://bit.ly/3z6hMYh]" width="50%" />
<p class="caption">[https://bit.ly/3z6hMYh]</p>
</div>

---
name: singlermd
class: middle, inverse

# Integrated-file solution

---
name: outputformat

## Power of knitr, Pandoc and Lua

> ### R library `reproducr`

--

> ### builds on the existing R Markdown infrastructure, extends it for more flexible use for data science through the power of Pandoc and Lua

--

<img src="figs/filter-diagram-2.png" width="80%" />

[https://www.r-bloggers.com/2021/01/pandoc-filters-in-bookdown/]

---
name: outputformat

## Integrated-file solution, optimised for two output formats

1) `reproducr::reproducr_manuscript`
2) `reproducr::reproducr_draft`

```yaml
---
output:
  reproducr::reproducr_manuscript: default
  reproducr::reproducr_draft: default
---
```

--

- compile your document to a **manuscript (PDF)** by calling
`rmarkdown::render("odsc-report.Rmd", "reproducr::reproducr_manuscript")`

--

- compile your document to a **draft (HTML)** by calling `rmarkdown::render("odsc-report.Rmd", "reproducr::reproducr_draft")`.

--

- **OR**: click on the 'Knit' button in the RStudio IDE (R Markdown will compile your document to the *first* output format that you specify in the `YAML` header)

--

✨ include your code with code-folding for HTML output and hide all of your code in PDF output

---
name: conditionalexclusion

## Flexible output formats

> polished & well-formatted HTML featuring explorative analysis & dynamic research output

> polished & circulation-ready PDF report

--

- **But how to tell R Markdown which parts are explorative or 'drafty' and which parts are meant to be in the final report?**

--

- `not-in-format` DIV: wrap the parts that should not appear in a given output format in three colons `::: {.not-in-format .latex}` (here: left out of the LaTeX/PDF output), closed by three more colons `:::`

--

- very powerful in conjunction with **conditional code evaluation**

<img src="figs/not-in-format-text-code.PNG" width="100%" style="display: block; margin: auto;" />

---
name: explore
class: inverse, middle

# Explore the data, exploit interactivity

---
name: explore2

## Explore the data, exploit interactivity

--

```r
library(palmerpenguins)  # assumed source of the `penguins` data

# Interactive table with column filters and 5 or 10 rows per page
datatable <- DT::datatable(
  penguins,
  filter = 'top',
  options = list(pageLength = 5, lengthMenu = c(5, 10)),
  colnames = c('Species', 'Island', 'Bill Length', 'Bill Depth',
               'Flipper Length', 'Body Mass', 'Sex', 'Year')
)
```

<iframe src="datatablepenguins.html" width="900px" height="350px" frameBorder="0"></iframe>

---
name: report
class: inverse, middle

# Circulate, share, and publish your report any time

---

## Circulate, share, and publish your report any time

✨ generate documents that contain a datestamp in their file name while keeping a clean `.Rmd` for tracking changes with version control (e.g.
Git)

--

✨ harmonise the fonts in your graphs with the fonts used in the respective output format

--

✨ create high-resolution graphs

--

<img src="figs/integratedfonts.PNG" width="50%" style="display: block; margin: auto;" />

--

✨ include two **separate bibliographies** for main article and appendix

--

### ↪️ [https://jschultecloos.github.io/reproducr/articles/reproducr.html#dissemination-stage-1](https://jschultecloos.github.io/reproducr/articles/reproducr.html#dissemination-stage-1)

---

## Circulate, share, and publish your report any time

> ### optimised for scholarly writing

--

✨ integrate **scholarly information** about authors, their (multiple) affiliations, their contributions, and the corresponding author in the title page of your manuscript

--

✨ **blind your manuscript** before submitting it for review

--

<img src="figs/titlepage-blinded.PNG" width="70%" style="display: block; margin: auto;" />

---
name: gettingstarted

## Get started

> ### 💡 Want to get started? The package template features plenty of examples

--

```r
remotes::install_github("jschultecloos/reproducr")
```

--

> ### **In RStudio**: File > New File > R Markdown... > From Template > reproducr

<img src="figs/rmarkdown-new-file-template.PNG" width="50%" style="display: block; margin: auto;" />

---
name: handson
class: inverse, middle

# Hands On

--

### [Let's get started: https://bit.ly/3x6DWYv](https://bit.ly/3x6DWYv)

---
name: end-slide
class: end-slide, inverse

# Thank you for your attention.

<p>R version 4.0.3 (2020-10-10)</p>
<p>Platform: x86_64-w64-mingw32/x64 (64-bit)</p>
<p>OS: Windows 10 x64 (build 19042)</p>

Built on: <i class='fa fa-calendar' aria-hidden='true'></i> 09-Jun-2021 at <i class='fa fa-clock-o' aria-hidden='true'></i> 11:54:07

__2021__ • [Julia Schulte-Cloos](https://jschultecloos.github.io) • [jschultecloos](https://twitter.com/jschultecloos)