/usr/lib/R/site-library/tibble/doc/tibble.Rmd is in r-cran-tibble 1.4.1-1ubuntu1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 | ---
title: "Tibbles"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Tibbles}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(tibble)
set.seed(1014)
```
Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors).
## Creating
`tibble()` is a nice way to create data frames. It encapsulates best practices for data frames:
* It never changes an input's type (i.e., no more `stringsAsFactors = FALSE`!).
```{r}
tibble(x = letters)
```
This makes it easier to use with list-columns:
```{r}
tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
```
List-columns are most commonly created by `do()`, but they can be useful to
create by hand.
* It never adjusts the names of variables:
```{r}
names(data.frame(`crazy name` = 1))
names(tibble(`crazy name` = 1))
```
* It evaluates its arguments lazily and sequentially:
```{r}
tibble(x = 1:5, y = x ^ 2)
```
* It never uses `row.names()`. The whole point of tidy data is to
store variables in a consistent way. So it never stores a variable as
special attribute.
* It only recycles vectors of length 1. This is because recycling vectors of greater lengths
is a frequent source of bugs.
## Coercion
To complement `tibble()`, tibble provides `as_tibble()` to coerce objects into tibbles. Generally, `as_tibble()` methods are much simpler than `as.data.frame()` methods, and in fact, it's precisely what `as.data.frame()` does, but it's similar to `do.call(cbind, lapply(x, data.frame))` - i.e. it coerces each component to a data frame and then `cbinds()` them all together.
`as_tibble()` has been written with an eye for performance:
```{r}
if (requireNamespace("microbenchmark", quiet = TRUE)) {
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
microbenchmark::microbenchmark(
as_tibble(l),
as.data.frame(l)
)
}
```
The speed of `as.data.frame()` is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
## Tibbles vs data frames
There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.
### Printing
When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type:
```{r}
tibble(x = 1:1000)
```
You can control the default appearance with options:
* `options(tibble.print_max = n, tibble.print_min = m)`: if there are more
than `n` rows, print only the first `m` rows. Use
`options(tibble.print_max = Inf)` to always show all rows.
* `options(tibble.width = Inf)` will always print all columns, regardless
of the width of the screen.
### Subsetting
Tibbles are quite strict about subsetting. `[` always returns another tibble. Contrast this with a data frame: sometimes `[` returns a data frame and sometimes it just returns a vector:
```{r}
df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
class(df1[, 1])
df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])
class(df2[, 1])
```
To extract a single column use `[[` or `$`:
```{r}
class(df2[[1]])
class(df2$x)
```
Tibbles are also stricter with `$`. Tibbles never do partial matching, and will throw a warning and return `NULL` if the column does not exist:
```{r, error = TRUE}
df <- data.frame(abc = 1)
df$a
df2 <- tibble(abc = 1)
df2$a
```
tibbles also ignore the `drop` argument:
```{r}
tibble(a = 1:3)[, "a", drop = TRUE]
```
### Recycling
When constructing a tibble, only values of length 1 are recycled. The first column with length different to one determines the number of rows in the tibble, conflicts lead to an error. This also extends to tibbles with *zero* rows, which is sometimes important for programming:
```{r, error = TRUE}
tibble(a = 1, b = 1:3)
tibble(a = 1:3, b = 1)
tibble(a = 1:3, c = 1:2)
tibble(a = 1, b = integer())
tibble(a = integer(), b = 1)
```
|