Appendix B — Converting data structures to flextable

The as_flextable() function is a generic function that converts various R objects into flextable objects. This appendix focuses on the most common use cases in pharmaceutical reporting: demographic tables created with summarizor(), and frequency tables created with table() or proc_freq().

B.1 Understanding as_flextable()

The as_flextable() function is designed to work with different types of R objects:

  • Data frames created by summarizor() for summary statistics
  • Tables created by table() for frequency distributions
  • Statistical models (lm, glm, gam, merMod, htest, etc.)
  • Other tabular objects

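Each method has its own set of parameters to customize the output. Because as_flextable() is an S3 generic, R selects the method from the class of the object it receives, and that method turns the object's structure (summary rows, contingency cells, model coefficients) into a publication-ready table.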
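The dispatch behind as_flextable() is ordinary S3. The sketch below uses a hypothetical generic, describe(), which is not part of flextable; it only shows how UseMethod() picks a method from the class of its argument, in the same way as_flextable() reaches its method for table or lm objects.

```r
# describe() is a hypothetical generic used only to illustrate S3 dispatch;
# it is not part of flextable.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

# Method selected for objects created by table()
describe.table <- function(x, ...) {
  paste("table with", length(dim(x)), "dimension(s)")
}

# Method selected for fitted linear models
describe.lm <- function(x, ...) {
  paste("linear model with", length(coef(x)), "coefficients")
}

describe(table(c("F", "M", "F")))        # "table with 1 dimension(s)"
describe(lm(dist ~ speed, data = cars))  # "linear model with 2 coefficients"
describe(1:3)                            # "object of class integer"
```

With flextable loaded, methods("as_flextable") lists the methods that your installed version actually provides.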

B.2 Demographic tables with summarizor()

The summarizor() function computes descriptive statistics for each variable, optionally grouped by categories. It returns a data frame with the additional class "summarizor", so that as_flextable() dispatches to the method designed to lay it out.

B.2.1 Basic workflow

The process involves two steps:

  1. Use summarizor() to compute summary statistics
  2. Use as_flextable() to convert the result into a formatted table
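library(flextable)
library(dplyr)  # select()
library(purrr)  # map_chr()

# Example ADaM subject-level dataset shipped with the formatters package
ex_adsl <- formatters::ex_adsl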

set_flextable_defaults(
  border.color = "#AAAAAA",
  font.family = "Arial",
  font.size = 10,
  padding = 3,
  line_spacing = 1.4
)

# Select relevant variables for demographics
adsl <- select(ex_adsl, AGE, SEX, COUNTRY, ARM)

# Extract variable labels from attributes
col_labels <- map_chr(adsl, function(x) attr(x, "label"))

# Create summary statistics by treatment arm
ft <- summarizor(adsl, by = "ARM") |>
  as_flextable(
    sep_w = 0,
    separate_with = "variable",
    spread_first_col = TRUE
  ) |>
  align(i = ~ !is.na(variable), align = "left") |>
  prepend_chunks(i = ~ is.na(variable), j = "stat", as_chunk("\t")) |>
  labelizor(
    j = "stat",
    labels = col_labels,
    part = "all"
  ) |>
  autofit() |>
  add_header_lines(
    c(
      "x.x: Study Subject Data",
      "x.x.x: Demographic Characteristics",
      "Table x.x.x.x: Demographic Characteristics - Full Analysis Set"
    )
  ) |>
  add_footer_lines("Source: ADSL DDMMYYYY hh:mm; Listing x.xx; SDTM package: DDMMYYYY")

ft

x.x: Study Subject Data

x.x.x: Demographic Characteristics

Table x.x.x.x: Demographic Characteristics - Full Analysis Set

| | A: Drug X (N=134) | B: Placebo (N=134) | C: Combination (N=132) |
|---|---|---|---|
| Age | | | |
| Mean (SD) | 33.77 (6.55) | 35.43 (7.90) | 35.43 (7.72) |
| Median (IQR) | 33.00 (11.00) | 35.00 (10.00) | 35.00 (10.00) |
| Range | 21.00 - 50.00 | 21.00 - 62.00 | 20.00 - 69.00 |
| Sex | | | |
| F | 79 (59.0%) | 77 (57.5%) | 66 (50.0%) |
| M | 51 (38.1%) | 55 (41.0%) | 60 (45.5%) |
| U | 3 (2.2%) | 2 (1.5%) | 4 (3.0%) |
| UNDIFFERENTIATED | 1 (0.7%) | 0 (0.0%) | 2 (1.5%) |
| Country | | | |
| CHN | 74 (55.2%) | 81 (60.4%) | 64 (48.5%) |
| USA | 10 (7.5%) | 13 (9.7%) | 17 (12.9%) |
| BRA | 13 (9.7%) | 7 (5.2%) | 10 (7.6%) |
| PAK | 12 (9.0%) | 9 (6.7%) | 10 (7.6%) |
| NGA | 8 (6.0%) | 7 (5.2%) | 11 (8.3%) |
| RUS | 5 (3.7%) | 8 (6.0%) | 6 (4.5%) |
| JPN | 5 (3.7%) | 4 (3.0%) | 9 (6.8%) |
| GBR | 4 (3.0%) | 3 (2.2%) | 2 (1.5%) |
| CAN | 3 (2.2%) | 2 (1.5%) | 3 (2.3%) |
| CHE | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |

Source: ADSL DDMMYYYY hh:mm; Listing x.xx; SDTM package: DDMMYYYY
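In the code above, map_chr() stops with an error if any selected column lacks a "label" attribute, because attr() returns NULL for that column. A base R alternative, sketched below with a hypothetical helper get_col_labels() (not part of flextable or purrr), falls back to the column name instead; the resulting named vector can be passed to labelizor() just like col_labels.

```r
# Hypothetical helper: returns each column's "label" attribute,
# or the column name when the attribute is missing
get_col_labels <- function(df) {
  vapply(
    names(df),
    function(nm) {
      lab <- attr(df[[nm]], "label", exact = TRUE)
      if (is.null(lab)) nm else as.character(lab)[1]
    },
    character(1)
  )
}

# Toy data: only AGE carries a label
toy <- data.frame(AGE = c(30, 41), SEX = c("F", "M"), stringsAsFactors = FALSE)
attr(toy$AGE, "label") <- "Age"

get_col_labels(toy)  # returns c(AGE = "Age", SEX = "SEX")
```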

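The by argument controls grouping:

  • by = "ARM": statistics are computed separately for each treatment arm; each arm becomes a column whose header shows the number of subjects (N)
  • by = character() (the default): no grouping; statistics are computed over all subjects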
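Grouping sets both the column headers and the denominator of the percentages: in the table above, 79 (59.0%) means 79 of the 134 subjects in arm A. The base R sketch below reproduces that arithmetic on invented toy vectors (arm and sex are illustrative, not ex_adsl); it is not how summarizor() is implemented.

```r
# Toy data, invented for illustration (not ex_adsl)
arm <- c("A: Drug X", "B: Placebo", "A: Drug X",
         "C: Combination", "B: Placebo", "A: Drug X")
sex <- c("F", "M", "F", "F", "M", "M")

# Column headers in the style "A: Drug X\n(N=3)"
n_by_arm <- table(arm)
header_labels <- paste0(names(n_by_arm), "\n(N=", as.integer(n_by_arm), ")")

# Counts, with percentages computed within each arm (column margin)
counts <- table(sex, arm)
pct <- prop.table(counts, margin = 2) * 100
cells <- matrix(
  sprintf("%d (%.1f%%)", as.vector(counts), as.vector(pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)
cells["F", "A: Drug X"]  # "2 (66.7%)"
```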

B.2.2 Customizing numeric statistics

You can select which statistics to display for numeric variables:

# Select specific statistics to display
summary_custom <- summarizor(
  adsl,
  by = "ARM",
  num_stats = c("range", "median_iqr")
)

# Relabel first so that autofit() sizes the columns on the final labels
as_flextable(summary_custom, spread_first_col = TRUE) |>
  labelizor(
    j = "stat",
    labels = c(
      AGE = "Age (years)",
      COUNTRY = "Country",
      SEX = "Sex"
    ),
    part = "all"
  ) |>
  autofit()

| | A: Drug X (N=134) | B: Placebo (N=134) | C: Combination (N=132) |
|---|---|---|---|
| Age (years) | | | |
| Median (IQR) | 33.00 (11.00) | 35.00 (10.00) | 35.00 (10.00) |
| Range | 21.00 - 50.00 | 21.00 - 62.00 | 20.00 - 69.00 |
| Sex | | | |
| F | 79 (59.0%) | 77 (57.5%) | 66 (50.0%) |
| M | 51 (38.1%) | 55 (41.0%) | 60 (45.5%) |
| U | 3 (2.2%) | 2 (1.5%) | 4 (3.0%) |
| UNDIFFERENTIATED | 1 (0.7%) | 0 (0.0%) | 2 (1.5%) |
| Country | | | |
| CHN | 74 (55.2%) | 81 (60.4%) | 64 (48.5%) |
| USA | 10 (7.5%) | 13 (9.7%) | 17 (12.9%) |
| BRA | 13 (9.7%) | 7 (5.2%) | 10 (7.6%) |
| PAK | 12 (9.0%) | 9 (6.7%) | 10 (7.6%) |
| NGA | 8 (6.0%) | 7 (5.2%) | 11 (8.3%) |
| RUS | 5 (3.7%) | 8 (6.0%) | 6 (4.5%) |
| JPN | 5 (3.7%) | 4 (3.0%) | 9 (6.8%) |
| GBR | 4 (3.0%) | 3 (2.2%) | 2 (1.5%) |
| CAN | 3 (2.2%) | 2 (1.5%) | 3 (2.3%) |
| CHE | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |

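Available numeric statistics:

  • "mean_sd": Mean (Standard Deviation)
  • "median_iqr": Median (Interquartile Range)
  • "range": Min - Max

As the output above shows, the statistics are not displayed in the order given to num_stats: "median_iqr" still appears before "range".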
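Each statistic is rendered as a formatted string. The sketch below approximates the three formats with base R on a toy vector; the helper names (fmt_mean_sd() and so on) are hypothetical, and computing the IQR with R's default type-7 quantiles is an assumption, so summarizor()'s own quantile method or rounding may differ.

```r
# Hypothetical helpers approximating the three num_stats formats
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}
fmt_median_iqr <- function(x) {
  # IQR() uses R's default type-7 quantiles (assumed, not confirmed for summarizor)
  sprintf("%.2f (%.2f)", median(x, na.rm = TRUE), IQR(x, na.rm = TRUE))
}
fmt_range <- function(x) {
  sprintf("%.2f - %.2f", min(x, na.rm = TRUE), max(x, na.rm = TRUE))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy values, not ex_adsl
fmt_mean_sd(x)     # "5.00 (2.14)"
fmt_median_iqr(x)  # "4.50 (1.50)"
fmt_range(x)       # "2.00 - 9.00"
```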

B.2.3 Understanding as_flextable() parameters for summarizor

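The as_flextable() method for summarizor objects accepts layout-specific parameters:

  • spread_first_col: when TRUE, each variable name is written on its own row spanning the table, above its statistics; when FALSE (the default), variable names occupy a dedicated first column. In both cases the groups defined by by are displayed as columns.
  • sep_w: width of the blank separator columns inserted between groups; 0 removes them.
  • separate_with: name of a column (for example "variable") whose groups are separated by a horizontal line.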
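The effect of spread_first_col = TRUE is essentially a data reshaping step: one header row per variable, followed by that variable's statistic rows. The base R sketch below mimics it with a hypothetical helper, spread_variable_rows(), on a toy long-format summary (values copied from the tables above); its is_header column plays a role similar to the is.na(variable) test in the first example. It is an analogy, not flextable's implementation.

```r
# Toy long-format summary, values copied from the tables above
summ <- data.frame(
  variable = c("AGE", "AGE", "SEX", "SEX"),
  stat = c("Mean (SD)", "Range", "F", "M"),
  value = c("33.77 (6.55)", "21.00 - 50.00", "79 (59.0%)", "51 (38.1%)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: insert one header row per variable and keep
# the statistic rows underneath it
spread_variable_rows <- function(d) {
  groups <- split(d, factor(d$variable, levels = unique(d$variable)))
  pieces <- lapply(groups, function(g) {
    header <- data.frame(label = g$variable[1], value = NA_character_,
                         is_header = TRUE, stringsAsFactors = FALSE)
    body <- data.frame(label = g$stat, value = g$value,
                       is_header = FALSE, stringsAsFactors = FALSE)
    rbind(header, body)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- spread_variable_rows(summ)
res$label  # "AGE" "Mean (SD)" "Range" "SEX" "F" "M"
```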
# Compare different layouts
summary_data <- summarizor(adsl, by = "ARM")

# Layout 1: each variable name on its own row, spanning the table
ft1 <- as_flextable(
  summary_data,
  spread_first_col = TRUE,
  sep_w = 0
) |>
  autofit()

ft1

| | A: Drug X (N=134) | B: Placebo (N=134) | C: Combination (N=132) |
|---|---|---|---|
| AGE | | | |
| Mean (SD) | 33.77 (6.55) | 35.43 (7.90) | 35.43 (7.72) |
| Median (IQR) | 33.00 (11.00) | 35.00 (10.00) | 35.00 (10.00) |
| Range | 21.00 - 50.00 | 21.00 - 62.00 | 20.00 - 69.00 |
| SEX | | | |
| F | 79 (59.0%) | 77 (57.5%) | 66 (50.0%) |
| M | 51 (38.1%) | 55 (41.0%) | 60 (45.5%) |
| U | 3 (2.2%) | 2 (1.5%) | 4 (3.0%) |
| UNDIFFERENTIATED | 1 (0.7%) | 0 (0.0%) | 2 (1.5%) |
| COUNTRY | | | |
| CHN | 74 (55.2%) | 81 (60.4%) | 64 (48.5%) |
| USA | 10 (7.5%) | 13 (9.7%) | 17 (12.9%) |
| BRA | 13 (9.7%) | 7 (5.2%) | 10 (7.6%) |
| PAK | 12 (9.0%) | 9 (6.7%) | 10 (7.6%) |
| NGA | 8 (6.0%) | 7 (5.2%) | 11 (8.3%) |
| RUS | 5 (3.7%) | 8 (6.0%) | 6 (4.5%) |
| JPN | 5 (3.7%) | 4 (3.0%) | 9 (6.8%) |
| GBR | 4 (3.0%) | 3 (2.2%) | 2 (1.5%) |
| CAN | 3 (2.2%) | 2 (1.5%) | 3 (2.3%) |
| CHE | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |

# Layout 2: variable names in a dedicated first column (the default)
ft2 <- as_flextable(
  summary_data,
  spread_first_col = FALSE
) |>
  autofit() |>
  add_header_lines("Layout 2: variable names in the first column")

ft2

Layout 2: variable names in the first column

| | | A: Drug X (N=134) | B: Placebo (N=134) | C: Combination (N=132) |
|---|---|---|---|---|
| Age | Mean (SD) | 33.77 (6.55) | 35.43 (7.90) | 35.43 (7.72) |
| | Median (IQR) | 33.00 (11.00) | 35.00 (10.00) | 35.00 (10.00) |
| | Range | 21.00 - 50.00 | 21.00 - 62.00 | 20.00 - 69.00 |
| Sex | F | 79 (59.0%) | 77 (57.5%) | 66 (50.0%) |
| | M | 51 (38.1%) | 55 (41.0%) | 60 (45.5%) |
| | U | 3 (2.2%) | 2 (1.5%) | 4 (3.0%) |
| | UNDIFFERENTIATED | 1 (0.7%) | 0 (0.0%) | 2 (1.5%) |
| Country | CHN | 74 (55.2%) | 81 (60.4%) | 64 (48.5%) |
| | USA | 10 (7.5%) | 13 (9.7%) | 17 (12.9%) |
| | BRA | 13 (9.7%) | 7 (5.2%) | 10 (7.6%) |
| | PAK | 12 (9.0%) | 9 (6.7%) | 10 (7.6%) |
| | NGA | 8 (6.0%) | 7 (5.2%) | 11 (8.3%) |
| | RUS | 5 (3.7%) | 8 (6.0%) | 6 (4.5%) |
| | JPN | 5 (3.7%) | 4 (3.0%) | 9 (6.8%) |
| | GBR | 4 (3.0%) | 3 (2.2%) | 2 (1.5%) |
| | CAN | 3 (2.2%) | 2 (1.5%) | 3 (2.3%) |
| | CHE | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |

B.3 Frequency tables with table() and as_flextable()

The base R table() function creates contingency tables that as_flextable() can convert to flextables; the table method relies on proc_freq() to compute counts and percentages. This is useful for displaying the distribution of categorical data.

B.3.1 One-way frequency tables

A one-way table shows the distribution of a single categorical variable:

# Create a simple frequency table; naming the dimension ("Sex")
# gives the first column a meaningful header instead of "Var1"
sex_table <- table(Sex = ex_adsl$SEX)

# Convert to flextable
as_flextable(sex_table) |>
  autofit() |>
  add_header_lines("Distribution by Sex")

Distribution by Sex

| Sex | Count | Percent |
|---|---|---|
| F | 222 | 55.5% |
| M | 166 | 41.5% |
| U | 9 | 2.2% |
| UNDIFFERENTIATED | 3 | 0.8% |
| Total | 400 | 100.0% |
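The Count and Percent columns follow from simple arithmetic. The base R sketch below rebuilds the sex distribution from the counts shown in the table above, shows why an unnamed table() dimension ends up labelled Var1 once converted to a data frame, and formats the percentages to one decimal place. It illustrates the calculation only and is not how proc_freq() is implemented.

```r
# Counts reconstructed from the table above (400 subjects in total)
sex <- rep(c("F", "M", "U", "UNDIFFERENTIATED"), times = c(222, 166, 9, 3))

# A named dimension ("Sex") becomes the column name after as.data.frame()
tab <- table(Sex = sex)
names(as.data.frame(tab))                 # "Sex" "Freq"

# An unnamed dimension falls back to "Var1"
names(as.data.frame(table(unname(sex))))  # "Var1" "Freq"

# Count and Percent columns, percentages to one decimal place
pct <- prop.table(tab) * 100
freq <- data.frame(
  Sex = names(tab),
  Count = as.integer(tab),
  Percent = sprintf("%.1f%%", as.vector(pct)),
  stringsAsFactors = FALSE
)
freq
```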

B.3.2 Two-way frequency tables

Two-way tables show the cross-tabulation of two categorical variables:

# Create a two-way contingency table
sex_arm_table <- table(
  Sex = ex_adsl$SEX,
  Treatment = ex_adsl$ARM
)

# Convert to flextable
as_flextable(sex_arm_table) |>
  autofit() |>
  add_header_lines("Sex Distribution by Treatment Arm") |>
  align(j = 1, align = "left", part = "body") |>
  align(j = -1, align = "center", part = "all")

Sex Distribution by Treatment Arm

| Sex | | Treatment | | | |
|---|---|---|---|---|---|
| | | A: Drug X | B: Placebo | C: Combination | Total |
| F | Count | 79 (19.8%) | 77 (19.2%) | 66 (16.5%) | 222 (55.5%) |
| | Mar. pct (1) | 59.0% ; 35.6% | 57.5% ; 34.7% | 50.0% ; 29.7% | |
| M | Count | 51 (12.8%) | 55 (13.8%) | 60 (15.0%) | 166 (41.5%) |
| | Mar. pct | 38.1% ; 30.7% | 41.0% ; 33.1% | 45.5% ; 36.1% | |
| U | Count | 3 (0.8%) | 2 (0.5%) | 4 (1.0%) | 9 (2.2%) |
| | Mar. pct | 2.2% ; 33.3% | 1.5% ; 22.2% | 3.0% ; 44.4% | |
| UNDIFFERENTIATED | Count | 1 (0.2%) | 0 (0.0%) | 2 (0.5%) | 3 (0.8%) |
| | Mar. pct | 0.7% ; 33.3% | 0.0% ; 0.0% | 1.5% ; 66.7% | |
| Total | Count | 134 (33.5%) | 134 (33.5%) | 132 (33.0%) | 400 (100.0%) |

(1) Columns and rows percentages
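In this layout the percentage in each Count cell is relative to all 400 subjects, and the two Mar. pct values are the column percentage (within the treatment arm) followed by the row percentage (within the sex category). For females in arm A, 79 / 134 = 59.0% and 79 / 222 = 35.6%. The base R sketch below recomputes these marginal percentages with prop.table() from the counts shown above; the object names are illustrative.

```r
# Counts copied from the table above (rows: sex, columns: treatment arm)
tab <- as.table(matrix(
  c(79, 51, 3, 1,    # A: Drug X
    77, 55, 2, 0,    # B: Placebo
    66, 60, 4, 2),   # C: Combination
  nrow = 4,
  dimnames = list(
    Sex = c("F", "M", "U", "UNDIFFERENTIATED"),
    Treatment = c("A: Drug X", "B: Placebo", "C: Combination")
  )
))

col_pct <- prop.table(tab, margin = 2) * 100  # within each treatment arm
row_pct <- prop.table(tab, margin = 1) * 100  # within each sex category

# "column% ; row%" strings, in the same shape as the Mar. pct rows
mar_pct <- matrix(
  sprintf("%.1f%% ; %.1f%%", as.vector(col_pct), as.vector(row_pct)),
  nrow = nrow(tab),
  dimnames = dimnames(tab)
)
mar_pct["F", "A: Drug X"]  # "59.0% ; 35.6%"
```

Because tab is an ordinary table object, passing it to as_flextable() should give a layout similar to the one above.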