Summary statistics by group with R

I was profiling some code today and wanted summary statistics by group, with two grouping factors. The original source was a log4j file that included entries from an aspect-based logger I had enabled. I had already written a small Perl script to extract the pertinent information and generate a CSV file with (clazz,method,elapsed) entries, so I was looking for standard statistics such as mean and median for each clazz+method combination.

My initial approach looked like:

metrics <- read.csv('some_metrics.csv', header=TRUE)
# One aggregate() call per statistic; each result holds its values in a column named x
medians <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), median)
means <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), mean)
mins <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), min)
maxes <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), max)
lengths <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), length)
sums <- aggregate(metrics$elapsed, by=list(CLAZZ=metrics$clazz,METHOD=metrics$method), sum)
# Start from the mins table and attach the other statistics as extra columns
s <- mins
s$MIN <- s$x
s$x <- NULL
s$MAX <- maxes$x
s$MEAN <- means$x
s$MEDIAN <- medians$x
s$NUM <- lengths$x
s$SUM <- sums$x
# Drop the intermediate tables
rm(mins,means,maxes,medians,lengths,sums)

This approach was obviously less than ideal: although I could wrap it in a function, it is ugly and cumbersome. I searched the R-help mailing list and found some references to the doBy package, which “grew out of a need to calculate groupwise summary statistics in a simple way”. The summaryBy function in this package turned out to be exactly what I needed and simplified my code to:

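summarize <- function(csvfile) {
	# library() stops with an error if doBy is not installed; require() only warns
	library(doBy)
	metrics.csv <- read.csv(csvfile, header=TRUE)
	# One row per clazz+method combination, one column per statistic
	metrics <- summaryBy(elapsed ~ clazz + method, data=metrics.csv, FUN=c(mean,median,min,max,sum,length))
	write.csv(metrics, file='export.csv', quote=FALSE, row.names=FALSE)
	metrics
}
metrics <- summarize('some_metrics.csv')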
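
For comparison, here is a rough sketch of how the same kind of table could be built in base R with a single aggregate call, in case installing doBy is not an option. It is not what I ended up using. The timings data frame and the stats helper are made up for illustration; only the column names clazz, method and elapsed come from the CSV described above.

```r
# Toy data standing in for the CSV described above (values are made up).
timings <- data.frame(
  clazz   = c("A", "A", "A", "B", "B"),
  method  = c("run", "run", "stop", "run", "run"),
  elapsed = c(10, 30, 5, 7, 9)
)

# Hypothetical helper: returns every statistic at once as a named vector.
stats <- function(x) {
  c(MIN = min(x), MAX = max(x), MEAN = mean(x),
    MEDIAN = median(x), NUM = length(x), SUM = sum(x))
}

# With a vector-valued FUN, aggregate() stores the results in a single
# matrix column named after the response variable ("elapsed").
agg <- aggregate(elapsed ~ clazz + method, data = timings, FUN = stats)

# Flatten that matrix column into ordinary columns next to the group keys.
summary_table <- cbind(agg[c("clazz", "method")], agg$elapsed)
print(summary_table)
```

The cbind step matters because the matrix column otherwise prints as elapsed.MIN, elapsed.MAX and so on while still being a single column, which can be confusing when writing the result out with write.csv.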