R - Weighted sum of variables by groups with data.table


I am looking for a solution to compute a weighted sum of some variables by groups with data.table. I hope the example is clear enough.

require(data.table)

dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
   .SDcols = paste0("V", 1:4)]

# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
   .SDcols = paste0("V", 1:4)]

# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4)]

# It does not work by groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4), keyby = gr]

# The result I expect
dt[, list(V1 = sum(V1 * w),
          V2 = sum(V2 * w),
          V3 = sum(V3 * w),
          V4 = sum(V4 * w)), keyby = gr]

### Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
            function(x) sum(x * w)), by = gr]

Final attempt (copying Roland's answer :))

Copying @Roland's excellent answer:

print(dt[, lapply(.SD, function(x, w) sum(x * w), w = w), by = gr][, w := NULL])
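The same idea can be combined with .SDcols, so that only the wanted columns are summed and no w column has to be dropped afterwards. This is a minimal sketch, assuming a data.table version recent enough that a column referenced in j (here w) stays visible even when .SDcols excludes it from .SD; the names cols and res are introduced only for illustration.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := rep(1:2, each = 5)]
dt[, w := 2]

# Columns to aggregate (hypothetical helper name)
cols <- paste0("V", 1:4)

# Pass w explicitly to the function; .SD holds only the columns in cols.
# Assumption: w is still found in j although it is not part of .SDcols.
res <- dt[, lapply(.SD, function(x, w) sum(x * w), w = w),
          keyby = gr, .SDcols = cols]

print(res)
#    gr V1  V2  V3  V4
# 1:  1 30 130 230 330
# 2:  2 80 180 280 380
```

If w is reported as not found, the installed version probably predates this behaviour, and the variant above that computes on all columns and then drops w remains the fallback.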

Still not the most efficient one: (second attempt)

Following @Roland's comment, it is indeed faster to do the operation on all columns and then just remove the unwanted ones (as long as the operation itself is not time consuming, which is the case here).

dt[, {lapply(.SD, function(x) sum(x * w))}, by = gr][, w := NULL][]

For some reason, w does not seem to be found when I don't use {}. No idea why though.


Old (inefficient) answer:

(Subsetting can be costly if there are too many groups)

You can do this without using .SDcols, by removing the unwanted column from .SD while providing it to lapply, as follows:

dt[, lapply(.SD[, -1, with = FALSE], function(x) sum(x * w)), by = gr]
#    gr V1  V2  V3  V4
# 1:  1 20 120 220 320
# 2:  2 70 170 270 370

.SDcols makes .SD without the w column. So, it is not possible to multiply with w, as it does not exist within the scope of the .SD environment then.
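A quick way to see which columns .SD actually holds is to return names(.SD) from j. The sketch below reuses the question's toy data; cols, sd_restricted and sd_full are names introduced only for illustration.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := rep(1:2, each = 5)]
dt[, w := 2]

cols <- paste0("V", 1:4)

# With .SDcols, .SD holds only the selected columns, so w is absent
sd_restricted <- dt[, names(.SD), .SDcols = cols]

# Without .SDcols (and without by), .SD holds every column, w included
sd_full <- dt[, names(.SD)]

print(sd_restricted)
# [1] "V1" "V2" "V3" "V4"
print("w" %in% sd_full)
# [1] TRUE
```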

