/usr/lib/R/site-library/tibble/doc/extending.Rmd is in r-cran-tibble 1.4.1-1ubuntu1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 | ---
title: "Extending tibble"
author: "Kirill Müller, Hadley Wickham"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Extending tibble}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics:
* `type_sum()` determines what goes into the column header.
* `pillar_shaft()` determines what goes into the body of the column.
* `is_vector_s3()` and `obj_sum()` are used when rendering list columns.
If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the `pillar` package that powers the printing of tibbles. Either add `pillar` to the `Imports:` section of your `DESCRIPTION`, or simply call:
```{r, eval = FALSE}
usethis::use_package("pillar")
```
This short vignette assumes a package that implements an S3 class `"latlon"` and uses `roxygen2` to create documentation and the `NAMESPACE` file. For this vignette to work we need to attach pillar:
## Prerequisites
We define a class `"latlon"` that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.
```{r}
#' @export
latlon <- function(lat, lon) {
as_latlon(complex(real = lon, imaginary = lat))
}
#' @export
as_latlon <- function(x) {
structure(x, class = "latlon")
}
#' @export
c.latlon <- function(x, ...) {
as_latlon(NextMethod())
}
#' @export
`[.latlon` <- function(x, i) {
as_latlon(NextMethod())
}
#' @export
format.latlon <- function(x, ..., formatter = deg_min) {
x_valid <- which(!is.na(x))
lat <- unclass(Im(x[x_valid]))
lon <- unclass(Re(x[x_valid]))
ret <- rep("<NA>", length(x))
ret[x_valid] <- paste(
formatter(lat, c("N", "S")),
formatter(lon, c("E", "W"))
)
format(ret, justify = "right")
}
deg_min <- function(x, pm) {
sign <- sign(x)
x <- abs(x)
deg <- trunc(x)
x <- x - deg
min <- round(x * 60)
ret <- sprintf("%d°%.2d'%s", deg, min, pm[ifelse(sign >= 0, 1, 2)])
format(ret, justify = "right")
}
#' @export
print.latlon <- function(x, ...) {
cat(format(x), sep = "\n")
invisible(x)
}
latlon(32.7102978, -117.1704058)
```
More methods are needed to make this class fully compatible with data frames, see e.g. the [hms](https://github.com/tidyverse/hms/) package for a more complete example.
## Using in a tibble
Columns on this class can be used in a tibble right away, but the output will be less than ideal:
```{r}
library(tibble)
data <- tibble(
venue = "rstudio::conf",
year = 2017:2019,
loc = latlon(
c(28.3411783, 32.7102978, NA),
c(-81.5480348, -117.1704058, NA)
),
paths = list(
loc[1],
c(loc[1], loc[2]),
loc[2]
)
)
data
```
(The `paths` column is a list that contains arbitrary data, in our case `latlon` vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)
The output has three main problems:
1. The column type of the `loc` column is displayed as `<S3: latlon>`. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
1. The values in the `loc` column are formatted as complex numbers (the underlying storage), without using the `format()` method we have defined. This is by design.
1. The cells in the `paths` column are also displayed as `<S3: latlon>`.
In the remainder I'll show how to fix these problems, and also how to implement rendering that adapts to the available width.
## Fixing the data type
To display `<geo>` as data type, we need to override the `type_sum()` method. This method should return a string that can be used in a column header. For your own classes, strive for an evocative abbreviation that's under 6 characters.
```{r include=FALSE}
import::from(pillar, type_sum)
```
```{r}
#' @importFrom pillar type_sum
#' @export
type_sum.latlon <- function(x) {
"geo"
}
```
Because the value shown there doesn't depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see [#53](https://github.com/r-lib/pillar/pull/53).)
```{r}
data
```
## Rendering the value
To use our format method for rendering, we implement the `pillar_shaft()` method for our class. (A [*pillar*](https://en.wikipedia.org/wiki/Column#Nomenclature) is mainly a *shaft* (decorated with an *ornament*), with a *capital* above and a *base* below. Multiple pillars form a *colonnade*, which can be stacked in multiple *tiers*. This is the motivation behind the names in our API.)
```{r include=FALSE}
import::from(pillar, pillar_shaft)
```
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
out <- format(x)
out[is.na(x)] <- NA
pillar::new_pillar_shaft_simple(out, align = "right")
}
```
The simplest variant calls our `format()` method, everything else is handled by pillar, in particular by the `new_pillar_shaft_simple()` helper. Note how the `align` argument affects the alignment of NA values and of the column name and type.
```{r}
data
```
We could also use left alignment and indent only the `NA` values:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
out <- format(x)
out[is.na(x)] <- NA
pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}
data
```
## Adaptive rendering
If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn't currently apply to our class, because we haven't specified a minimum width for our values:
```{r}
print(data, width = 35)
```
If we specify a minimum width when constructing the shaft, the `loc` column will be truncated:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
out <- format(x)
out[is.na(x)] <- NA
pillar::new_pillar_shaft_simple(out, align = "right", min_width = 10)
}
print(data, width = 35)
```
This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the `pillar_shaft()` method is required to achieve this:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
deg <- format(x, formatter = deg)
deg[is.na(x)] <- pillar::style_na("NA")
deg_min <- format(x)
deg_min[is.na(x)] <- pillar::style_na("NA")
pillar::new_pillar_shaft(
list(deg = deg, deg_min = deg_min),
width = pillar::get_max_extent(deg_min),
min_width = pillar::get_max_extent(deg),
subclass = "pillar_shaft_latlon"
)
}
```
Here, `pillar_shaft()` returns an object of the `"pillar_shaft_latlon"` class created by the generic `new_pillar_shaft()` constructor. This object contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of `NA` values explicitly. (`get_max_extent()` is a helper that computes the maximum display width occupied by the values in a character vector.)
For completeness, the code that implements the degree-only formatting looks like this:
```{r}
deg <- function(x, pm) {
sign <- sign(x)
x <- abs(x)
deg <- round(x)
ret <- sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
format(ret, justify = "right")
}
```
All that's left to do is to implement a `format()` method for our new `"pillar_shaft_latlon"` class. This method will be called with a `width` argument, which then determines which of the formattings to choose:
```{r}
#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
if (all(crayon::col_nchar(x$deg_min) <= width)) {
ornament <- x$deg_min
} else {
ornament <- x$deg
}
pillar::new_ornament(ornament)
}
data
print(data, width = 35)
```
## Adding color
Both `new_pillar_shaft_simple()` and `new_ornament()` accept ANSI escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. `style_subtle()` displays text in a light gray. For default data types, this style is used for insignificant digits. We'll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the [crayon](https://cran.r-project.org/package=crayon) package to add custom formattings to your output.
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
out <- format(x, formatter = deg_min_color)
out[is.na(x)] <- NA
pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}
deg_min_color <- function(x, pm) {
sign <- sign(x)
x <- abs(x)
deg <- trunc(x)
x <- x - deg
rad <- round(x * 60)
ret <- sprintf(
"%d%s%.2d%s%s",
deg,
pillar::style_subtle("°"),
rad,
pillar::style_subtle("'"),
pm[ifelse(sign >= 0, 1, 2)]
)
ret[is.na(x)] <- ""
format(ret, justify = "right")
}
data
```
Currently, ANSI escapes are not rendered in vignettes, so the display here isn't much different from earlier examples. This may change in the future.
## Fixing list columns
To tweak the output in the `paths` column, we simply need to indicate that our class is an S3 vector:
```{r include=FALSE}
import::from(pillar, is_vector_s3)
```
```{r}
#' @importFrom pillar is_vector_s3
#' @export
is_vector_s3.latlon <- function(x) TRUE
data
```
This is picked up by the default implementation of `obj_sum()`, which then shows the type and the length in brackets. If your object is built on top of an atomic vector the default will be adequate. You, will, however, need to provide an `obj_sum()` method for your class if your object is vectorised and built on top of a list.
An example of an object of this type in base R is `POSIXlt`: it is a list with 9 components.
```{r}
x <- as.POSIXlt(Sys.time() + c(0, 60, 3600))
str(unclass(x))
```
But it pretends to be a vector with 3 elements:
```{r}
x
length(x)
str(x)
```
So we need to define a method that returns a character vector the same length as `x`:
```{r include=FALSE}
import::from(pillar, obj_sum)
```
```{r}
#' @importFrom pillar obj_sum
#' @export
obj_sum.POSIXlt <- function(x) {
rep("POSIXlt", length(x))
}
```
## Testing
If you want to test the output of your code, you can compare it with a known state recorded in a text file. For this, pillar offers the `expect_known_display()` expectation which requires and works best with the testthat package. Make sure that the output is generated only by your package to avoid inconsistencies when external code is updated. Here, this means that you test only the shaft portion of the pillar, and not the entire pillar or even a tibble that contains a column with your data type!
The tests work best with the testthat package:
```{r}
library(testthat)
```
```{r include = FALSE}
unlink("latlon.txt")
unlink("latlon-bw.txt")
```
The code below will compare the output of `pillar_shaft(data$loc)` with known output stored in the `latlon.txt` file. The first run warns because the file doesn't exist yet.
```{r error = TRUE, warning = TRUE}
test_that("latlon pillar matches known output", {
pillar::expect_known_display(
pillar_shaft(data$loc),
file = "latlon.txt"
)
})
```
From the second run on, the printing will be compared with the file:
```{r}
test_that("latlon pillar matches known output", {
pillar::expect_known_display(
pillar_shaft(data$loc),
file = "latlon.txt"
)
})
```
However, if we look at the file we'll notice strange things: The output contains ANSI escapes!
```{r}
readLines("latlon.txt")
```
We can turn them off by passing `crayon = FALSE` to the expectation, but we need to run twice again:
```{r error = TRUE, warning = TRUE}
library(testthat)
test_that("latlon pillar matches known output", {
pillar::expect_known_display(
pillar_shaft(data$loc),
file = "latlon.txt",
crayon = FALSE
)
})
```
```{r}
test_that("latlon pillar matches known output", {
pillar::expect_known_display(
pillar_shaft(data$loc),
file = "latlon.txt",
crayon = FALSE
)
})
readLines("latlon.txt")
```
You may want to create a series of output files for different scenarios:
- Colored vs. plain (to simplify viewing differences)
- With or without special Unicode characters (if your output uses them)
- Different widths
For this it is helpful to create your own expectation function. Use the tidy evaluation framework to make sure that construction and printing happens at the right time:
```{r}
expect_known_latlon_display <- function(x, file_base) {
quo <- rlang::quo(pillar::pillar_shaft(x))
pillar::expect_known_display(
!! quo,
file = paste0(file_base, ".txt")
)
pillar::expect_known_display(
!! quo,
file = paste0(file_base, "-bw.txt"),
crayon = FALSE
)
}
```
```{r error = TRUE, warning = TRUE}
test_that("latlon pillar matches known output", {
expect_known_latlon_display(data$loc, file_base = "latlon")
})
```
```{r}
readLines("latlon.txt")
readLines("latlon-bw.txt")
```
Learn more about the tidyeval framework in the [dplyr vignette](http://dplyr.tidyverse.org/articles/programming.html).
```{r include = FALSE}
unlink("latlon.txt")
unlink("latlon-bw.txt")
```
|