Appendix B — Converting data structures to flextable

The as_flextable() function is a generic function that converts various R objects into flextable objects. This appendix focuses on the most common use cases in pharmaceutical reporting: demographic tables built with summarizor() and frequency tables built with table() or proc_freq().

B.1 Understanding as_flextable()

The as_flextable() function is designed to work with different types of R objects:

Data frames created by summarizor() for summary statistics
Tables created by table() for frequency distributions
Statistical models and test results (lm, glm, gam, merMod, htest, etc.)
Other tabular objects

Each method has its own set of parameters for customizing the output, and each one maps the structure of its input object onto table rows, columns, and headers so that little manual reshaping is needed.
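
To illustrate what "generic" means here, the following base R sketch defines a toy S3 generic that picks a method from the class of its argument, which is the same dispatch mechanism a generic such as as_flextable() relies on. The describe_input() function and its methods are hypothetical names invented for this illustration; they are not part of flextable.

```r
# Hypothetical generic, for illustration only (not a flextable function)
describe_input <- function(x, ...) UseMethod("describe_input")

# Method chosen when x has class "table"
describe_input.table <- function(x, ...) {
  paste("contingency table with", length(dim(x)), "dimension(s)")
}

# Method chosen when x has class "lm"
describe_input.lm <- function(x, ...) {
  paste("linear model with", length(coef(x)), "coefficient(s)")
}

# Fallback when no class-specific method exists
describe_input.default <- function(x, ...) {
  paste("unsupported object of class", class(x)[1])
}

describe_input(table(mtcars$cyl))
describe_input(lm(mpg ~ wt, data = mtcars))
describe_input("text")
```

The caller always writes the same function name; R selects the method, which is why one call to as_flextable() can accept objects as different as a summary data frame and a fitted model.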

B.2 Demographic tables with summarizor()

The summarizor() function computes descriptive statistics for each variable, optionally grouped by one or more categorical variables. It returns a data frame specifically structured to work with as_flextable().

B.2.1 Basic workflow

The process involves two steps:

Use summarizor() to compute summary statistics
Use as_flextable() to convert the result into a formatted table

Code

library(flextable)
library(dplyr)  # select()
library(purrr)  # map_chr()

ex_adsl <- formatters::ex_adsl

set_flextable_defaults(
  border.color = "#AAAAAA",
  font.family = "Arial",
  font.size = 10,
  padding = 3,
  line_spacing = 1.4
)

# Select relevant variables for demographics
adsl <- select(ex_adsl, AGE, SEX, COUNTRY, ARM)

# Extract variable labels from attributes
col_labels <- map_chr(adsl, function(x) attr(x, "label"))

# Create summary statistics by treatment arm
ft <- summarizor(adsl, by = "ARM") |>
  as_flextable(
    sep_w = 0,
    separate_with = "variable",
    spread_first_col = TRUE
  ) |>
  align(i = ~ !is.na(variable), align = "left") |>
  prepend_chunks(i = ~ is.na(variable), j = "stat", as_chunk("\t")) |>
  labelizor(
    j = "stat",
    labels = col_labels,
    part = "all"
  ) |>
  autofit() |>
  add_header_lines(
    c(
      "x.x: Study Subject Data",
      "x.x.x: Demographic Characteristics",
      "Table x.x.x.x: Demographic Characteristics - Full Analysis Set"
    )
  ) |>
  add_footer_lines("Source: ADSL DDMMYYYY hh:mm; Listing x.xx; SDTM package: DDMMYYYY")

ft

x.x: Study Subject Data
x.x.x: Demographic Characteristics
Table x.x.x.x: Demographic Characteristics - Full Analysis Set
	A: Drug X (N=134)	B: Placebo (N=134)	C: Combination (N=132)
Age
Mean (SD)	33.77 (6.55)	35.43 (7.90)	35.43 (7.72)
Median (IQR)	33.00 (11.00)	35.00 (10.00)	35.00 (10.00)
Range	21.00 - 50.00	21.00 - 62.00	20.00 - 69.00
Sex
F	79 (59.0%)	77 (57.5%)	66 (50.0%)
M	51 (38.1%)	55 (41.0%)	60 (45.5%)
U	3 (2.2%)	2 (1.5%)	4 (3.0%)
UNDIFFERENTIATED	1 (0.7%)	0 (0.0%)	2 (1.5%)
Country
CHN	74 (55.2%)	81 (60.4%)	64 (48.5%)
USA	10 (7.5%)	13 (9.7%)	17 (12.9%)
BRA	13 (9.7%)	7 (5.2%)	10 (7.6%)
PAK	12 (9.0%)	9 (6.7%)	10 (7.6%)
NGA	8 (6.0%)	7 (5.2%)	11 (8.3%)
RUS	5 (3.7%)	8 (6.0%)	6 (4.5%)
JPN	5 (3.7%)	4 (3.0%)	9 (6.8%)
GBR	4 (3.0%)	3 (2.2%)	2 (1.5%)
CAN	3 (2.2%)	2 (1.5%)	3 (2.3%)
CHE	0 (0.0%)	0 (0.0%)	0 (0.0%)
Source: ADSL DDMMYYYY hh:mm; Listing x.xx; SDTM package: DDMMYYYY
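
The col_labels step in the code above reads the "label" attribute of each column, and map_chr() stops with an error if any column lacks one. The base R sketch below shows one way to guard against that by falling back to the column name. The demo data frame and the get_label() helper are made up for this illustration.

```r
# Toy data: AGE carries a label attribute, SEX does not
demo <- data.frame(AGE = c(34, 41), SEX = c("F", "M"))
attr(demo$AGE, "label") <- "Age"

# Hypothetical helper: return the label, or a fallback when it is missing
get_label <- function(x, fallback) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) fallback else lab
}

# Named character vector: names are column names, values are labels
col_labels <- vapply(
  names(demo),
  function(nm) get_label(demo[[nm]], nm),
  character(1)
)

col_labels
```

The result is a named character vector, the same shape as the vector passed to the labels argument of labelizor() above.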

The by argument controls grouping:

by = "ARM": Groups statistics by treatment arm
by = NULL: No grouping, overall statistics only
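
As a small sketch of the grouping behaviour that does not depend on the formatters package, the code below runs summarizor() on the built-in CO2 dataset. The choice of dataset and the "Overall" label are assumptions made for this example; overall_label is the summarizor() argument that adds a column computed on all rows.

```r
library(flextable)

# Built-in CO2 data without Plant and conc: Type, Treatment, uptake remain
dat <- CO2[-c(1, 4)]

# Statistics split by the levels of Treatment
by_treatment <- summarizor(dat, by = "Treatment")

# Same split, plus one column summarising all rows together
with_overall <- summarizor(dat, by = "Treatment", overall_label = "Overall")

ft_by <- as_flextable(by_treatment, spread_first_col = TRUE)
ft_overall <- as_flextable(with_overall, spread_first_col = TRUE)

ft_overall
```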

B.2.2 Customizing numeric statistics

You can select which statistics to display for numeric variables:

Code

# Select specific statistics to display
summary_custom <- summarizor(
  adsl,
  by = "ARM",
  num_stats = c("range", "median_iqr")
)

as_flextable(summary_custom, spread_first_col = TRUE) |>
  autofit() |>
  labelizor(
    j = "stat",
    labels = c(
      AGE = "Age (years)",
      COUNTRY = "Country",
      SEX = "Sex"
    ),
    part = "all"
  )

	A: Drug X (N=134)	B: Placebo (N=134)	C: Combination (N=132)
Age (years)
Median (IQR)	33.00 (11.00)	35.00 (10.00)	35.00 (10.00)
Range	21.00 - 50.00	21.00 - 62.00	20.00 - 69.00
Sex
F	79 (59.0%)	77 (57.5%)	66 (50.0%)
M	51 (38.1%)	55 (41.0%)	60 (45.5%)
U	3 (2.2%)	2 (1.5%)	4 (3.0%)
UNDIFFERENTIATED	1 (0.7%)	0 (0.0%)	2 (1.5%)
Country
CHN	74 (55.2%)	81 (60.4%)	64 (48.5%)
USA	10 (7.5%)	13 (9.7%)	17 (12.9%)
BRA	13 (9.7%)	7 (5.2%)	10 (7.6%)
PAK	12 (9.0%)	9 (6.7%)	10 (7.6%)
NGA	8 (6.0%)	7 (5.2%)	11 (8.3%)
RUS	5 (3.7%)	8 (6.0%)	6 (4.5%)
JPN	5 (3.7%)	4 (3.0%)	9 (6.8%)
GBR	4 (3.0%)	3 (2.2%)	2 (1.5%)
CAN	3 (2.2%)	2 (1.5%)	3 (2.3%)
CHE	0 (0.0%)	0 (0.0%)	0 (0.0%)
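
Each numeric cell in these tables pairs two base R summaries in one string. The sketch below recomputes the three statistic types on a made-up vector of ages. The two-decimal format is an assumption chosen to resemble the rendered cells, and the code does not claim to reproduce the exact computation summarizor() performs internally.

```r
# Made-up ages, for illustration only
age <- c(21, 28, 33, 35, 41, 50)

# "mean_sd": mean with standard deviation in parentheses
mean_sd <- sprintf("%.2f (%.2f)", mean(age), sd(age))

# "median_iqr": median with interquartile range (Q3 - Q1) in parentheses
median_iqr <- sprintf("%.2f (%.2f)", median(age), IQR(age))

# "range": minimum and maximum
range_txt <- sprintf("%.2f - %.2f", min(age), max(age))

c(mean_sd = mean_sd, median_iqr = median_iqr, range = range_txt)
```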

Available numeric statistics:

"mean_sd": Mean (Standard Deviation)
"median_iqr": Median (Interquartile Range)
"range": Min - Max

B.2.3 Understanding as_flextable() parameters for summarizor

The as_flextable() method for summarizor data frames has specific parameters:

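spread_first_col: When TRUE, the first column (the variable name) is rendered as a separator row above its statistics instead of occupying a column of its own; the groups stay in columns either way
sep_w: Width of the blank separator columns placed between groups (0 = no separator columns)
separate_with: Columns whose change of value draws a horizontal line between row groups (set to "variable" in B.2.1)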

Code

# Compare different layouts
summary_data <- summarizor(adsl, by = "ARM")

# Layout 1: variable names as separator rows
ft1 <- as_flextable(
  summary_data,
  spread_first_col = TRUE,
  sep_w = 0
) |>
  autofit()

ft1

	A: Drug X (N=134)	B: Placebo (N=134)	C: Combination (N=132)
AGE
Mean (SD)	33.77 (6.55)	35.43 (7.90)	35.43 (7.72)
Median (IQR)	33.00 (11.00)	35.00 (10.00)	35.00 (10.00)
Range	21.00 - 50.00	21.00 - 62.00	20.00 - 69.00
SEX
F	79 (59.0%)	77 (57.5%)	66 (50.0%)
M	51 (38.1%)	55 (41.0%)	60 (45.5%)
U	3 (2.2%)	2 (1.5%)	4 (3.0%)
UNDIFFERENTIATED	1 (0.7%)	0 (0.0%)	2 (1.5%)
COUNTRY
CHN	74 (55.2%)	81 (60.4%)	64 (48.5%)
USA	10 (7.5%)	13 (9.7%)	17 (12.9%)
BRA	13 (9.7%)	7 (5.2%)	10 (7.6%)
PAK	12 (9.0%)	9 (6.7%)	10 (7.6%)
NGA	8 (6.0%)	7 (5.2%)	11 (8.3%)
RUS	5 (3.7%)	8 (6.0%)	6 (4.5%)
JPN	5 (3.7%)	4 (3.0%)	9 (6.8%)
GBR	4 (3.0%)	3 (2.2%)	2 (1.5%)
CAN	3 (2.2%)	2 (1.5%)	3 (2.3%)
CHE	0 (0.0%)	0 (0.0%)	0 (0.0%)

Code

# Layout 2: variable names in the first column
ft2 <- as_flextable(
  summary_data,
  spread_first_col = FALSE
) |>
  autofit() |>
  add_header_lines("Layout 2: Variable names in the first column")

ft2

Layout 2: Variable names in the first column
		A: Drug X (N=134)	B: Placebo (N=134)	C: Combination (N=132)
Age	Mean (SD)	33.77 (6.55)	35.43 (7.90)	35.43 (7.72)
	Median (IQR)	33.00 (11.00)	35.00 (10.00)	35.00 (10.00)
	Range	21.00 - 50.00	21.00 - 62.00	20.00 - 69.00
Sex	F	79 (59.0%)	77 (57.5%)	66 (50.0%)
	M	51 (38.1%)	55 (41.0%)	60 (45.5%)
	U	3 (2.2%)	2 (1.5%)	4 (3.0%)
	UNDIFFERENTIATED	1 (0.7%)	0 (0.0%)	2 (1.5%)
Country	CHN	74 (55.2%)	81 (60.4%)	64 (48.5%)
	USA	10 (7.5%)	13 (9.7%)	17 (12.9%)
	BRA	13 (9.7%)	7 (5.2%)	10 (7.6%)
	PAK	12 (9.0%)	9 (6.7%)	10 (7.6%)
	NGA	8 (6.0%)	7 (5.2%)	11 (8.3%)
	RUS	5 (3.7%)	8 (6.0%)	6 (4.5%)
	JPN	5 (3.7%)	4 (3.0%)	9 (6.8%)
	GBR	4 (3.0%)	3 (2.2%)	2 (1.5%)
	CAN	3 (2.2%)	2 (1.5%)	3 (2.3%)
	CHE	0 (0.0%)	0 (0.0%)	0 (0.0%)

B.3 Frequency tables with table() and as_flextable()

The base R table() function creates contingency tables that can be converted to flextables. This is useful for displaying categorical data distributions.

B.3.1 One-way frequency tables

A one-way table shows the distribution of a single categorical variable:

Code

# Create a simple frequency table
sex_table <- table(ex_adsl$SEX)

# Convert to flextable
as_flextable(sex_table) |>
  set_header_labels(value = "Sex", stat = "Count") |>
  autofit() |>
  add_header_lines("Distribution by Sex")

Distribution by Sex
Var1	Count	Percent
F	222	55.5%
M	166	41.5%
U	9	2.2%
UNDIFFERENTIATED	3	0.8%
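Total	400	100.0%

Note that the first header cell still reads Var1: set_header_labels() relabels only the columns whose keys match the names it is given, and value and stat are apparently not column keys of this table. The col_keys element of the flextable object lists the keys to use.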
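
The Count and Percent columns of a one-way table can be reproduced with base R alone. The sketch below uses a small made-up vector rather than ex_adsl, and the one-decimal percentage format is an assumption chosen to resemble the rendered table.

```r
# Made-up categorical values, for illustration only
sex <- c("F", "F", "M", "F", "M", "U")

counts <- table(sex)        # frequency of each level
pct <- prop.table(counts)   # proportions that sum to 1

freq <- data.frame(
  level = names(counts),
  count = as.integer(counts),
  percent = sprintf("%.1f%%", 100 * as.numeric(pct))
)

# Append the total row
freq <- rbind(
  freq,
  data.frame(level = "Total", count = sum(counts), percent = "100.0%")
)

freq
```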

B.3.2 Two-way frequency tables

Two-way tables show the cross-tabulation of two categorical variables:

Code

# Create a two-way contingency table
sex_arm_table <- table(
  Sex = ex_adsl$SEX,
  Treatment = ex_adsl$ARM
)

# Convert to flextable
as_flextable(sex_arm_table) |>
  autofit() |>
  add_header_lines("Sex Distribution by Treatment Arm") |>
  align(j = 1, align = "left", part = "body") |>
  align(j = -1, align = "center", part = "all")

Sex Distribution by Treatment Arm
Sex		Treatment
Sex		A: Drug X	B: Placebo	C: Combination	Total
F	Count	79 (19.8%)	77 (19.2%)	66 (16.5%)	222 (55.5%)
F	Mar. pct (1)	59.0% ; 35.6%	57.5% ; 34.7%	50.0% ; 29.7%
M	Count	51 (12.8%)	55 (13.8%)	60 (15.0%)	166 (41.5%)
M	Mar. pct	38.1% ; 30.7%	41.0% ; 33.1%	45.5% ; 36.1%
U	Count	3 (0.8%)	2 (0.5%)	4 (1.0%)	9 (2.2%)
U	Mar. pct	2.2% ; 33.3%	1.5% ; 22.2%	3.0% ; 44.4%
UNDIFFERENTIATED	Count	1 (0.2%)	0 (0.0%)	2 (0.5%)	3 (0.8%)
UNDIFFERENTIATED	Mar. pct	0.7% ; 33.3%	0.0% ; 0.0%	1.5% ; 66.7%
Total	Count	134 (33.5%)	134 (33.5%)	132 (33.0%)	400 (100.0%)
(1) Columns and rows percentages
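
To make the three percentages in each cell easier to read, the base R sketch below rebuilds the table from the counts printed above and recomputes them with prop.table(). In the Count rows the percentage is the share of all 400 subjects; in the Mar. pct rows the first value is the column percentage (within the treatment arm) and the second is the row percentage (within the sex category). The object names are chosen for this illustration.

```r
# Counts copied from the Sex by Treatment table above
counts <- matrix(
  c(79, 77, 66,
    51, 55, 60,
     3,  2,  4,
     1,  0,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Sex = c("F", "M", "U", "UNDIFFERENTIATED"),
    Treatment = c("A: Drug X", "B: Placebo", "C: Combination")
  )
)
tab <- as.table(counts)

overall_pct <- 100 * prop.table(tab)              # share of all subjects
col_pct     <- 100 * prop.table(tab, margin = 2)  # share within each arm
row_pct     <- 100 * prop.table(tab, margin = 1)  # share within each sex

# The three percentages for the cell F / A: Drug X
round(c(
  overall = overall_pct["F", "A: Drug X"],
  column  = col_pct["F", "A: Drug X"],
  row     = row_pct["F", "A: Drug X"]
), 1)

# Counts with row and column totals
addmargins(tab)
```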