Ethan Wood
Medical student with a keyboard


Making Tables with gtsummary

Tags: data analysis

Creating Publication-Ready Scientific Tables with gtsummary

When writing a scientific manuscript or abstract, clean and consistent tables are critical. Researchers often spend hours formatting them by hand in Word or Excel. With R, we can automate this step and produce reproducible, publication-ready tables directly from the data.

One of my favorite tools for this is the gtsummary package. It integrates smoothly with the tidyverse, plays nicely with gt, and makes it easy to summarize and present clinical data. In this post, I’ll walk through my workflow for preparing patient-level data and producing tables suitable for scientific manuscripts.


Data Preparation

Before making tables, it’s important to clean and standardize the dataset. I often work with clinical datasets that have raw numeric codes, missing values, or 0/1 indicators that need to be labeled.

Here’s a small snippet of how I handle categorical recoding:

# Convert 0/1 binary variables to Yes/No factors
to_yesno <- function(x) {
  factor(x, levels = c(0, 1), labels = c("No", "Yes"))
}

binary_vars <- c("surgery_method_open", "surgery_method_endoscopic", "mapping_eeg")
data[binary_vars] <- lapply(data[binary_vars], to_yesno)

# Sex and race recoding
data$sex <- factor(data$sex, levels = c(1, 2), labels = c("Female", "Male"))
data$race <- factor(data$race, levels = 1:5,
                    labels = c("African-American", "White", "Hispanic or Latino",
                               "Arabic", "Asian"))

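Before running this on real data, it can help to try the recoding on a toy data frame. The sketch below is only an illustration: the `toy` data frame and its values (including the stray code 9) are made up, not taken from my dataset. It shows that any value outside the declared levels silently becomes NA, so it is worth comparing missing counts before and after recoding.

```r
# Toy data (hypothetical values): 9 stands in for an unexpected raw code
toy <- data.frame(mapping_eeg = c(0, 1, 1, NA, 9))

to_yesno <- function(x) {
  factor(x, levels = c(0, 1), labels = c("No", "Yes"))
}

recoded <- to_yesno(toy$mapping_eeg)

# Values outside levels = c(0, 1) are turned into NA without a warning,
# so compare the number of missing values before and after recoding
n_missing_raw <- sum(is.na(toy$mapping_eeg))
n_missing_recoded <- sum(is.na(recoded))

if (n_missing_recoded > n_missing_raw) {
  message(n_missing_recoded - n_missing_raw,
          " value(s) had a code other than 0/1 and became NA")
}

# Tabulate the result, including the NA category
table(recoded, useNA = "ifany")
```
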
I also use the labelled package to attach human-readable labels that gtsummary will automatically pick up:

library(labelled)
library(stringr)

var_label(data) <- setNames(
  str_to_sentence(str_replace_all(names(data), "_", " ")),
  names(data)
)
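
One side effect of the automatic labels is that str_to_sentence() lowercases everything after the first letter, so acronyms lose their capitals. Here is a minimal sketch on a made-up `toy` data frame (the values are hypothetical) that generates the labels the same way and then overrides one of them by hand. It also attaches labels after any factor() recoding, since factor() builds a new object and drops an existing label attribute.

```r
library(labelled)
library(stringr)

# Hypothetical toy data with column names in the same style as above
toy <- data.frame(age = c(34, 51), mapping_eeg = c(0, 1))

# Recode first: factor() would discard a label attached earlier
toy$mapping_eeg <- factor(toy$mapping_eeg, levels = c(0, 1),
                          labels = c("No", "Yes"))

# Automatic labels: "mapping_eeg" becomes "Mapping eeg"
var_label(toy) <- setNames(
  as.list(str_to_sentence(str_replace_all(names(toy), "_", " "))),
  names(toy)
)

# Manual override for a single column where the acronym matters
var_label(toy$mapping_eeg) <- "Mapping EEG"

# Inspect the labels that tbl_summary() would pick up
var_label(toy)
```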

Table 1: Baseline Characteristics

The bread and butter of any clinical paper is Table 1: patient demographics and baseline characteristics.

With gtsummary, this takes only a few lines:

library(gtsummary)

table1 <- tbl_summary(
  data,
  by = bisynchronous_disruption,  # stratify by EEG findings
  include = c(age, sex, race, stage_of_cc, extent_of_callosotomy_collapsed),
  statistic = list(all_continuous() ~ "{mean} ({sd})")
) %>%
  add_overall() %>%
  add_p() %>%
  modify_caption("**Baseline Characteristics** N = {N}")

Example Table 1

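Note: The snippet above relies on the automatic labels. For the example table, I also set custom variable labels (e.g., "Extent of Callosotomy") so that the final output looks clean without manual editing; the label argument of tbl_summary(), used in the outcome tables below, is one way to do this.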
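
To try the same pattern without patient data, here is a sketch on the `trial` dataset that ships with gtsummary, with `trt` standing in for my grouping variable. The choice of columns is mine for illustration only. One design point: as far as I know, add_p() defaults to a rank-based test for continuous variables, so when reporting mean (SD) it may be more consistent to request a t-test explicitly, as done here. Depending on the gtsummary version, add_p() may ask for extra packages to be installed.

```r
library(gtsummary)

demo_table1 <- trial %>%
  tbl_summary(
    by = trt,                                   # stratify by treatment arm
    include = c(age, grade, stage),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  add_overall() %>%                             # add a column for all patients
  add_p(test = list(all_continuous() ~ "t.test")) %>%  # match the test to mean (SD)
  modify_caption("**Baseline Characteristics** N = {N}")

demo_table1
```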


Outcome Tables

For clinical outcomes at different timepoints, I create parallel tables and then merge them into a side-by-side comparison.

outcomes_1yr <- tbl_summary(
  data = data,
  by = bisynchronous_disruption,
  include = c(ilae_total_seizure_outcome_1yr, engel_total_seizure_outcome_1yr),
  label = list(
    ilae_total_seizure_outcome_1yr ~ "ILAE Seizure Outcome (1 yr)",
    engel_total_seizure_outcome_1yr ~ "Engel Seizure Outcome (1 yr)"
  )
) %>% add_p()

outcomes_last <- tbl_summary(
  data = data,
  by = bisynchronous_disruption,
  include = c(ilae_total_seizure_outcome_last_f_u, engel_total_seizure_outcome_last_f_u),
  label = list(
    ilae_total_seizure_outcome_last_f_u ~ "ILAE Seizure Outcome (Last FU)",
    engel_total_seizure_outcome_last_f_u ~ "Engel Seizure Outcome (Last FU)"
  )
) %>% add_p()

table_outcomes <- tbl_merge(
  tbls = list(outcomes_1yr, outcomes_last),
  tab_spanner = c("**1 Year**", "**Last Follow-up**")
)

This produces a merged outcome table that’s easy to interpret and manuscript-ready.
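
One thing worth checking in the merged output: as far as I understand the gtsummary documentation, tbl_merge() lines rows up by variable name and label. The two tables above summarize differently named columns with different labels, so their rows may end up in separate rows rather than sharing one. The sketch below uses made-up data (the `toy` data frame, its column names, and the `summarize_engel()` helper are all hypothetical) and renames each timepoint column to a shared name before summarizing, so both timepoints land in the same rows.

```r
library(gtsummary)
library(dplyr)

set.seed(1)
engel_levels <- c("I", "II", "III", "IV")

# Hypothetical toy data: one grouping variable, one outcome at two timepoints
toy <- data.frame(
  group = factor(sample(c("Disrupted", "Not disrupted"), 40, replace = TRUE)),
  engel_1yr = factor(sample(engel_levels, 40, replace = TRUE),
                     levels = engel_levels),
  engel_last = factor(sample(engel_levels, 40, replace = TRUE),
                      levels = engel_levels)
)

# Summarize one timepoint under a shared variable name and label
summarize_engel <- function(df, column) {
  df %>%
    select(group, engel = all_of(column)) %>%
    tbl_summary(by = group, label = list(engel ~ "Engel class"))
}

merged <- tbl_merge(
  tbls = list(summarize_engel(toy, "engel_1yr"),
              summarize_engel(toy, "engel_last")),
  tab_spanner = c("**1 Year**", "**Last Follow-up**")
)

merged
```

Because both tables now describe a variable called `engel` with the same label and levels, the merged table has one block of rows with the two timepoints as column groups.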


Exporting Tables

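Once satisfied, I convert each table to a gt object with as_gt() and save it as a Word document with gtsave(), ready for insertion into a manuscript:

library(gt)

# Make sure the output folder exists before saving
dir.create("tables", showWarnings = FALSE)

table1 %>%
  as_gt() %>%
  gtsave("tables/table1.docx")

table_outcomes %>%
  as_gt() %>%
  gtsave("tables/table_outcomes.docx")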

This way, the tables are reproducible, the code that builds them can be version-controlled, and the output needs minimal post-processing.
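
If the gt route to Word does not work in a given setup, one alternative I am aware of is converting the table with as_flex_table() and saving it with the flextable package. This is a sketch rather than my usual workflow: it assumes flextable is installed, uses the built-in `trial` data, and writes to a temporary folder (the file name is arbitrary).

```r
library(gtsummary)

# Write to a temporary location so the example leaves no files behind
out_path <- file.path(tempdir(), "table1_demo.docx")

trial %>%
  tbl_summary(by = trt, include = c(age, grade)) %>%
  as_flex_table() %>%                        # convert to a flextable object
  flextable::save_as_docx(path = out_path)   # write the Word document

# Confirm that the file was written
file.exists(out_path)
```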


Why I Like gtsummary

For anyone writing clinical or epidemiological manuscripts, I highly recommend trying gtsummary. It’s streamlined my workflow and saved countless hours of table formatting.