Friday, March 11, 2016

New R package: a dictionary with arbitrary keys and values

Coming from Python, the absence of a real dictionary in R has annoyed me for quite some time. Now, I actually needed to use vectors as keys in R:
library(dict)

d <- dict()

d[[1]] <- 42
d[[c(2, 3)]] <- "Hello!"
d[["foo"]] <- "bar"
d[[1]]
d[[c(2, 3)]]
d$get("not here", "default")

d$keys()
d$values()
d$items()

# [[ ]] gives an error for unknown keys
d[["?"]]

Under the hood, separate C++ dictionaries (unordered_map) are created for the different types of keys. Using R's flexible types that are reflected in Rcpp (as SEXP), such a dictionary can store both numbers and strings (and other objects) at the same time.

The package is available on GitHub: https://github.com/mkuhn/dict

Tuesday, March 8, 2016

Avoiding unnecessary memory allocations in R

As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes:
The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar.
In my case, I want to count the number of items in a vector below a certain threshold. R will allocate a new vector for the result of the comparison, and then sum over that vector. It's possible to speed that up about ten-fold by directly counting in C++:


Often this won't be the bottleneck, but may be useful at some point.