    > tmp<-data.frame(a=rep(c(1,2,3),each=4),b=1:12)
    > tmp
       a  b
    1  1  1
    2  1  2
    3  1  3
    4  1  4
    5  2  5
    6  2  6
    7  2  7
    8  2  8
    9  3  9
    10 3 10
    11 3 11
    12 3 12
    > ddply(tmp,.(a),function(x) c(bb=sum(x$b)))
      a bb
    1 1 10
    2 2 26
    3 3 42

For this trivial example, you'd probably use summarise() - but when you need to manipulate multiple columns in more complex ways, writing your own function can be more efficient.
If you try to translate this into dplyr in a naive and direct way, you will get a silently wrong result:
    > tmp %>% group_by(a) %>% (function(x) data.frame(bb=sum(x$b)))
      bb
    1 78

WTF, it didn't group!
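To see why the sum comes out as 78, it helps to look at what the piped function actually receives. The following is only a minimal sketch: `inspect` is a hypothetical helper, not part of dplyr, and it assumes a dplyr version that provides `group_vars()`.

```r
library(dplyr)

tmp <- data.frame(a = rep(c(1, 2, 3), each = 4), b = 1:12)

# Hypothetical helper: describe the object handed to a function in the pipe.
inspect <- function(x) list(rows = nrow(x), grouped_by = group_vars(x))

seen <- tmp %>% group_by(a) %>% inspect()

seen$rows        # all 12 rows arrive in a single call
seen$grouped_by  # "a": the grouping is just metadata on the object
```

In other words, group_by() only marks the data frame as grouped. The pipe then calls the function once on the whole thing, and a plain function that ignores the grouping sums every row.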
So how can you do this? You have to wrap your function rather awkwardly in do():
    > tmp %>% group_by(a) %>% do((function(x) data.frame(bb=sum(x$b)))(.))
    Source: local data frame [3 x 2]
    Groups: a [3]

          a    bb
      (dbl) (int)
    1     1    10
    2     2    26
    3     3    42

Also note that to be used in dplyr, your function must return a data frame. This will fail:
    > tmp %>% group_by(a) %>% do((function(x) c(bb=sum(x$b)))(.))
    Error: Results are not data frames at positions: 1, 2, 3
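If you already have ddply-style functions that return named vectors, one possible workaround is a small adapter that converts the result into a one-row data frame before do() sees it. This is a sketch rather than an established idiom: `as_df` is a hypothetical helper name, and it assumes the wrapped function returns a named atomic vector.

```r
library(dplyr)

tmp <- data.frame(a = rep(c(1, 2, 3), each = 4), b = 1:12)

# The ddply-style function from above: returns a named vector.
f <- function(x) c(bb = sum(x$b))

# Hypothetical adapter: wrap f so its named-vector result becomes
# a one-row data frame, which is what do() requires.
as_df <- function(f) function(x) as.data.frame(as.list(f(x)))

res <- tmp %>% group_by(a) %>% do(as_df(f)(.))
res
```

Each name in the returned vector becomes a column, so the same adapter should also cover functions that return several named values per group.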