Introduction
When starting out in research, one of the first tasks often assigned to students is creating tables. Organizing and formatting data is essential—but it can also be tedious. Anyone who has spent hours aligning columns in Excel (or worse, MS Word) knows the frustration. The formatting rules for these tools are often opaque and finicky, and getting something to look exactly right can be quite tricky, especially when text or data is long or doesn't fit neatly. Also, dealing with fonts, text styling, super- and subscripts, and equations can be infuriating. Lastly, even if you get something looking exactly right, replicating that style later on with a new table often means retracing your steps and trying to remember what you did to achieve your previous results.
A tool that has removed a lot of that frustration and been a huge time-saver for me is the gtsummary package in R. With gtsummary, you can automate table creation and generate clean, reproducible, publication-ready results. It integrates seamlessly with the tidyverse, works well with gt, and makes it straightforward to summarize and present clinical data.
In this post, I’ll walk you through my workflow for preparing patient-level data and producing tables for scientific manuscripts. What once could take days can now be done in just a few hours—and everything can be tracked in version control. Even better, once you’ve set up the workflow, future analyses become much faster and more consistent.
Data Preparation
Before creating tables, it’s important to first clean and standardize the dataset. Most of the data I work with comes from Excel files that were hand-coded. This usually means it needs some cleanup—trimming whitespace, recoding values like 0/1 into factors such as No/Yes, and making sure variable types are consistent.
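As a rough sketch of the cleanup described above, the snippet below trims whitespace and fixes a column type on a small made-up data frame. The column names and values (`patient_id`, `site`, `mapping_eeg`) are hypothetical stand-ins, not the real dataset.

```r
library(dplyr)

# Hypothetical hand-coded data: stray whitespace and numbers stored as text
raw <- data.frame(
  patient_id  = c(" 001", "002 ", "003"),
  site        = c("Clinic A ", " Clinic A", "Clinic B"),
  mapping_eeg = c("1", " 0", "1 "),
  stringsAsFactors = FALSE
)

cleaned <- raw |>
  # trim leading/trailing whitespace in every character column
  mutate(across(where(is.character), trimws)) |>
  # store the 0/1 column as integer so it can be recoded to No/Yes later
  mutate(mapping_eeg = as.integer(mapping_eeg))

str(cleaned)
```

Trimming before recoding matters because a value like "Clinic A " would otherwise be treated as a separate category from "Clinic A".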
I also use the labelled package to attach human-readable labels. These labels are automatically recognized by gtsummary, which saves time by eliminating the need to manually specify them later during table creation.
Here’s a short example of how I typically handle these steps.
library(openxlsx)
library(labelled)
library(gt)
library(gtsummary)
library(tidyverse)
data <- read.xlsx("data.xlsx", detectDates = TRUE)
data <- janitor::clean_names(data)
# Start from a clean slate: store text columns as plain character (factors are converted)
data[] <- lapply(data, function(x) {
  if (is.character(x) || is.factor(x)) {
    x <- as.character(x) # convert factors to character
  }
  return(x)
})
# Convert 0/1 dichotomous variables to No/Yes factors
to_yesno <- function(x) {
  factor(x, levels = c(0, 1), labels = c("No", "Yes"))
}
# list out any vars you know to be dichotomous (Yes/No) in nature
binary_vars <- c("surgery_method_open", "surgery_method_endoscopic", "mapping_eeg")
data[binary_vars] <- lapply(data[binary_vars], to_yesno)
# Sex and race recoding
data$sex <- factor(data$sex, levels = c(1, 2), labels = c("Female", "Male"))
data$race <- factor(data$race, levels = 1:5,
                    labels = c("African-American", "White", "Hispanic or Latino",
                               "Arabic", "Asian"))
# add human-readable labels using `labelled`
var_label(data) <- setNames(
  str_to_sentence(str_replace_all(names(data), "_", " ")),
  names(data)
)
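To illustrate how labels flow into a table, here is a minimal sketch on a toy data frame. The data and the label text are invented for illustration. It sets labels explicitly per variable, as an alternative to deriving them from the column names, and then checks that `tbl_summary()` uses them as row labels.

```r
library(labelled)
library(gtsummary)

# Hypothetical toy data; values are made up
toy <- data.frame(
  age = c(34, 51, 27, 45, 62, 39, 58, 41, 29, 66, 47, 53),
  mapping_eeg = factor(rep(c("No", "Yes", "Yes"), times = 4))
)

# Attach one explicit label per variable
var_label(toy) <- list(
  age = "Age at surgery (years)",
  mapping_eeg = "Intraoperative EEG mapping"
)

tbl <- tbl_summary(toy)

# Row labels in the table come from the variable labels, not the column names
tbl$table_body$label
```

Explicit labels are handy for the handful of variables where a cleaned-up column name is not descriptive enough for a manuscript.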
Table 1: Baseline Characteristics
Most clinical papers open with a table of patient demographics and baseline characteristics. With gtsummary, producing this table takes a single call to tbl_summary().
table1 <- tbl_summary(
  data,
  include = c(
    "age",
    "sex",
    "race",
    "extent_of_callosotomy_collapsed",
    "follow_up_time"
  ),
  label = list(
    extent_of_callosotomy_collapsed ~ "Extent of Callosotomy",
    follow_up_time ~ "Average Followup"
  )
) |> modify_caption("**Table 1: Patient Characteristics**")
Here's what the rendered table looks like.
| Characteristic | N = 47¹ |
|---|---|
| ¹ Median (Q1, Q3); n (%) | |
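The footnote shows the defaults: median (Q1, Q3) for continuous variables and n (%) for categorical ones. If a journal asks for mean (SD) instead, the `statistic` argument of `tbl_summary()` can override this. The sketch below uses the `trial` dataset bundled with gtsummary as a stand-in for the patient data, so the variables (`age`, `grade`) are not the ones from this post.

```r
library(gtsummary)

# `trial` ships with gtsummary and stands in for the patient data here
table1_mean <- tbl_summary(
  trial,
  include = c(age, grade),
  # report mean (SD) for continuous variables instead of median (Q1, Q3)
  statistic = list(all_continuous() ~ "{mean} ({sd})"),
  # one decimal place for continuous summaries
  digits = list(all_continuous() ~ 1),
  # hide the "Unknown" rows that count missing values
  missing = "no"
)

table1_mean
```

Whether to hide missing-value rows is a reporting decision; `missing = "ifany"` is the default and keeps them visible.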
Table 2: Outcomes
Beyond simple tables like the patient characteristics table, gtsummary can also produce stratified tables through the by argument of tbl_summary(). The example below shows seizure outcomes stratified by an EEG finding (bisynchronous_disruption).
# simple version of table 2
table2 <- tbl_summary(
  data = data,
  by = bisynchronous_disruption,
  include = ilae_total_seizure_outcome,
  label = list(
    ilae_total_seizure_outcome ~ "ILAE Seizure Outcome"
  )
) |> modify_caption("**Table 2: Outcomes**")
Here's the simple version of Table 2.
| Characteristic | >50% blockade, N = 32¹ | <50% blockade, N = 6¹ | Absent bisynchronization, N = 9¹ |
|---|---|---|---|
| ¹ n (%) | | | |
You can modify the table further by chaining more functions onto it, for example to add an overall column, p-values, confidence intervals, or significance stars.
table2 <- table2 |> add_p() |> add_overall() |> add_significance_stars()
| Characteristic | Overall, N = 47¹ | >50% blockade, N = 32¹ | <50% blockade, N = 6¹ | Absent bisynchronization, N = 9¹ | p-value²,³ |
|---|---|---|---|---|---|
| ¹ n (%) | | | | | |
| ² Fisher’s exact test | | | | | |
| ³ *p<0.05; **p<0.01; ***p<0.001 | | | | | |
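By default `add_p()` picks a test for each variable, which is where the Fisher's exact test footnote comes from. If you would rather state the test explicitly, something like the following sketch should work. It again uses gtsummary's bundled `trial` data, with `trt` playing the role of the grouping variable, so none of these column names come from the dataset in this post.

```r
library(gtsummary)

# `trial` ships with gtsummary; `trt` stands in for the stratifying variable
table2_sketch <- tbl_summary(
  trial,
  by = trt,
  include = c(grade, response)
) |>
  add_p(
    # request Fisher's exact test for all categorical variables
    test = list(all_categorical() ~ "fisher.test"),
    # format p-values with two digits
    pvalue_fun = function(x) style_pvalue(x, digits = 2)
  ) |>
  add_overall()

table2_sketch
```

Naming the test in code also documents the choice for reviewers, rather than leaving it to the package defaults.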
Many more modifiers and styling tools are listed in the function reference of the gtsummary documentation.
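As a small taste of those modifiers, here is a sketch that chains a few styling functions. It uses the bundled `trial` data, and the header text is made up for illustration.

```r
library(gtsummary)

styled <- tbl_summary(trial, by = trt, include = c(age, grade)) |>
  add_n() |>                                # column of non-missing counts
  bold_labels() |>                          # bold the variable names
  italicize_levels() |>                     # italicize category levels
  modify_header(label ~ "**Variable**") |>  # rename the first column
  modify_spanning_header(
    # group the two treatment columns under one spanning header
    c("stat_1", "stat_2") ~ "**Treatment received**"
  )

styled
```

Because each modifier returns a gtsummary object, the order is mostly a matter of taste, and steps can be added or dropped without rewriting the rest of the chain.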
Exporting Tables
Once satisfied, I convert each table to a gt object and export it directly to Word with gtsave() for insertion into a manuscript. gtsave() can also write other formats such as HTML, TeX, and RTF, and gtsummary offers as_hux_xlsx() for Excel output. Exporting to HTML also allows you to inline the CSS, which is how I included the rendered tables shown in this blog post.
# make sure the output folder exists before saving
dir.create("tables", showWarnings = FALSE)
table1 |> as_gt() |> gtsave("tables/table1.docx")
table2 |> as_gt() |> gtsave("tables/table2.docx")
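For the other formats, a minimal sketch might look like this. It uses the bundled `trial` data and writes to a temporary directory, both of which are choices made for the example and not part of the workflow above.

```r
library(gt)
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade)) |> as_gt()

# Write to a temporary directory so the sketch does not touch the project folder
out_dir <- tempdir()

# The file extension selects the output format
gtsave(tbl, file.path(out_dir, "table1.html"))
gtsave(tbl, file.path(out_dir, "table1.tex"))

# HTML as a string with styles inlined, suitable for pasting into a web page
html_snippet <- as_raw_html(tbl, inline_css = TRUE)
```

The inlined HTML carries its own styling, so it renders the same way regardless of the stylesheet of the page it is pasted into.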
Why I Prefer gtsummary Over the Alternatives
Several other R packages provide similar functionality, such as tableone and table1. I’ve used both, but in my experience gtsummary is the most fully featured and flexible of the three, and the easiest to get started with.
One of its biggest strengths is how easily a gtsummary table can be converted into a gt table. This unlocks all the tools available in gt, such as gtsave for exporting tables in different formats, while still keeping the convenience of gtsummary’s streamlined syntax.
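A brief sketch of what that conversion makes possible: after `as_gt()`, the usual gt functions apply. The source note text and font size below are arbitrary choices for illustration, and the data is again gtsummary's bundled `trial` dataset.

```r
library(gt)
library(gtsummary)

gt_tbl <- tbl_summary(trial, by = trt, include = c(age, grade)) |>
  as_gt() |>                                               # hand over to gt
  tab_source_note("Data: gtsummary example dataset") |>    # note under the table
  tab_options(table.font.size = px(12)) |>                 # global font size
  opt_row_striping()                                       # alternate row shading

gt_tbl
```

One thing to keep in mind is that gtsummary modifiers such as `add_p()` need to come before `as_gt()`, since the result afterwards is a gt object.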
For anyone doing clinical research, I can’t recommend gtsummary enough. It has dramatically simplified my workflow and saved me countless hours of formatting. An added bonus is that my entire table creation process is stored in version control—no more struggling with Word tables or trying to remember six months later how I managed to get a column to align just right. It's one of those 10x productivity enhancers that makes you wonder how you ever did things without it.