I disagree. Doing data manipulation one action at a time in a piped sequence is easiest to reason about because the state right before you apply a new operation is always clear.
data.table, on the other hand, is a fancy clever gadget with many knobs and buttons you have to turn and press just so to get the desired result. It's only simple if all you do is filter, group by, and summarize.
To illustrate, let's look at what you have to do in data.table in order to achieve the equivalent of a grouped filter in dplyr (from the dtplyr translation vignette):
dplyr:
    df %>%
      group_by(a) %>%
      filter(b < mean(b))
data.table:
    DT[DT[, .I[b < mean(b)],
          by = .(a)]$V1]
Compared to the simple, declarative feel of the dplyr version, there's a lot of weird stuff going on in the data.table version. You have to put DT inside itself? What is .I? Where did V1 come from? Janky stuff.
(And yes I know precisely what is going on in the data.table version, I just think it's ugly and illustrates my point about composability and legibility extremely well.)
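For anyone who wants the data.table one-liner unpacked, here is a rough sketch that splits it into two steps. The toy table and the intermediate name idx are made up for illustration; the only assumption is that the data.table package is installed.

```r
library(data.table)

# Toy data, invented for this example
DT <- data.table(a = c("x", "x", "x", "y", "y"),
                 b = c(1, 2, 6, 10, 20))

# Step 1: inside a grouped j expression, .I holds the row numbers of DT
# that belong to the current group. Subsetting .I by the condition keeps
# the row numbers where b is below that group's mean.
# Because the j expression is unnamed, data.table calls the column V1.
idx <- DT[, .I[b < mean(b)], by = .(a)]

# Step 2: use the collected row numbers to subset the original table.
result <- DT[idx$V1]

print(idx)
print(result)
```

So the inner call only computes which rows survive, and the outer DT[...] does the actual filtering. That is why DT appears twice.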
The reason data.table has all these independent knobs is that it wants you to cram your entire query into a single command, so it can optimize the query more easily and squeeze out every drop of performance. NOT because it's more understandable, because it isn't.
The best of both worlds -- an optimizable query and one-action-at-a-time syntax -- can be achieved with a lazy system like Apache Spark or dtplyr.
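As a minimal sketch of the lazy approach with dtplyr, assuming the dtplyr, dplyr, and data.table packages are installed: the pipeline below is written one action at a time, but nothing runs until the result is collected. The toy data frame and the name lazy_query are made up for illustration.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Toy data, invented for this example
df <- data.frame(a = c("x", "x", "x", "y", "y"),
                 b = c(1, 2, 6, 10, 20))

# Build the query step by step; lazy_dt() only records the operations.
lazy_query <- lazy_dt(df) %>%
  group_by(a) %>%
  filter(b < mean(b))

# Print the data.table call that dtplyr generated for the whole pipeline.
show_query(lazy_query)

# Collecting the result is what triggers the computation.
result <- as_tibble(lazy_query)
print(result)
```

show_query() should print a single data.table expression along the lines of the hand-written version quoted above, which is the point: you write the pipe, and the translation layer writes the knobs.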