forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbase-types.Rmd
91 lines (61 loc) · 3.89 KB
/
base-types.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
# Base types {#base-types}
Underlying every R object is a C structure (or struct) that describes how that object is stored in memory. The struct includes the contents of the object, the information needed for memory management, and, most importantly for this section, a __type__. This is the __base type__ of an R object.
Base types are not really an object system because only the R core team can create new types. Functions that behave differently for different base types are almost always written in C, where dispatch occurs using switch statements (e.g., `switch(TYPEOF(x))`). As a result, new base types are added very rarely: the most recent change, in 2011, added two exotic types that you never see in R, but are useful for diagnosing memory problems (`NEWSXP` and `FREESXP`). Prior to that, the last type added was a special base type for S4 objects (`S4SXP`) in 2005. \indexc{SEXP} \index{base types} \index{objects!base types}
Even if you never write C code, it's important to understand base types because everything else is built on top of them: S3 objects can be built on top of any base type, S4 objects use a special base type, and R6 objects are a combination of S3 and environments (another base type). To see if an object is a pure base type, i.e., it doesn't also have S3, S4, or R6 behaviour, check that `is.object(x)` returns `FALSE`.
## Types
In [data structures] you learned about the most important set of base types, the vectors (logical, integer, real, complex, character, list, and raw). Other important types are:
Show type hierarchy again. Numeric, atomic vector and vector, are concepts or implicit virtual super classes.
* Functions, including primitive and special functions.
* Environments.
* Symbols and pairlists, used to represent the "abstract syntax tree"
of parsed R code..
In total, there are XXX base types, but it's not important to understand most of them unless you are writing C code.
You can determine an object's base type with `typeof()`: \indexc{typeof()}
```{r}
f <- function() {}
typeof(f)
typeof(sum)
```
Be careful with the `is.` functions as they use a different set of naming conventions:
```{r}
is.function(f)
is.primitive(sum)
```
Generally, "is" functions that check if an object is a specific type are ok. "is" functions that check for a family of types are often surprising. For example, `is.atomic(NULL)` is true, and as well as checking than an object is a vector, `is.vector()` also checks that it has no attributes apart from names.
Instead use rlang.
You may have heard of `mode()` and `storage.mode()`. I recommend ignoring these functions because they're just aliases of the names returned by `typeof()`, and exist solely for S compatibility. Read their source code if you want to understand exactly what they do. \indexc{mode()}
## Switchpatch
```{r}
bytes <- function(x) {
# stopifnot(rlang::is_vector(x))
switch(
typeof(x),
integer = 4,
numeric = 8,
logical = 4,
NA
)
}
```
## Numeric
We need a little extra discussion of the numeric "type" because it's used in three different ways in different places in R.
1. In some places it's used as an alias for "double". For example
`as.numeric()` is identical to `as.double()`, and `numeric()` is
identical to `double()`.
1. In a couple of places it means a base type of either integer or double.
```{r, error = TRUE}
mode(1)
mode(1L)
mode(factor("x"))
```
As well as mode, this convention is used for the implicit class and
for S4 classes.
1. In some places it means an object build on a base type of integer or
double that has numeric behaviour (i.e. arithmetic makes sense and you
can order by numeric values)
```{r}
is.numeric(1)
is.numeric(1L)
is.numeric(factor("x"))
```
Finally, there are few places in which R uses "real" instead of double; `NA_real_` is the one place that you're likely to encounter this in practice.