<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>devjason &#187; R</title>
	<atom:link href="http://www.devjason.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.devjason.com</link>
	<description>Code, Statistics, Maps</description>
	<lastBuildDate>Tue, 22 Sep 2009 17:04:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Visualizing Changes With Levelplot Heatmap</title>
		<link>http://www.devjason.com/2009/06/08/visualizing-changes-with-levelplot-heatmap/</link>
		<comments>http://www.devjason.com/2009/06/08/visualizing-changes-with-levelplot-heatmap/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 23:56:59 +0000</pubDate>
		<dc:creator>jsmith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.devjason.com/?p=108</guid>
		<description><![CDATA[

I recently worked on configuring and enabling Ehcache with our Hibernate objects for the common domain objects we query on almost every page request.  I wanted to know whether this would improve performance, and how best to illustrate the effect across thread counts and actions.

I had written some JMeter tests I am able [...]]]></description>
			<content:encoded><![CDATA[<p><a class="shutterset_" href="http://www.devjason.com/wp-content/gallery/r_generated/heatmap_diff.png"><br />
<img class="ngg-singlepic ngg-left" src="http://www.devjason.com/wp-content/gallery/r_generated/thumbs/thumbs_heatmap_diff.png" alt="heatmap_diff.png" /></a></p>
<p>I recently worked on configuring and enabling <a href="http://ehcache.sourceforge.net/">Ehcache</a> with our <a href="http://www.hibernate.org">Hibernate</a> objects for the common domain objects we query on almost every page request.  I wanted to know whether this would improve performance, and how best to illustrate the effect across thread counts and actions.<br />
<span id="more-108"></span></p>
<p>I had written some JMeter tests that I can run automatically with different thread loads (5-60 threads, stepping by 5), so I generated response time data from before and after enabling the cache.  Although there are actually many more rows and columns, in general the data looks like this, with the columns named label, elapsed, and threads in the data frames:<br />
<code><br />
Action  Elapsed  Threads<br />
NtSP         25       10<br />
DLgn         11       10<br />
FtDV          6       10<br />
NtMM         16       10<br />
NtSS         20       15<br />
SrfS        157       10<br />
</code></p>
<p>I then read the log files into two data frames, r1 for the pre-cache times and r2 for the post-cache times.  Using tapply I generated two matrices of mean response times by thread count and action (r1 becomes m1, r2 becomes m2).  From that point it is just a matter of subtracting the m1 matrix from m2 to obtain the measured differences, so negative values mean the cached run was faster.</p>
<pre class="brush: plain;">
z &lt;- r1
m1 &lt;- tapply(z$elapsed, list(z$label, z$threads), mean)
z &lt;- r2
m2 &lt;- tapply(z$elapsed, list(z$label, z$threads), mean)
m.diff &lt;- m2 - m1
</pre>
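<p>A minimal sketch of the same steps on made-up data, assuming data frames with label, elapsed, and threads columns.  The values are invented for illustration and are not taken from the real test runs, and mean.matrix is a hypothetical helper name.</p>
<pre class="brush: plain;">
# Hypothetical pre-cache (r1) and post-cache (r2) samples
r1 &lt;- data.frame(label   = c('NtSP', 'NtSP', 'DLgn', 'DLgn'),
                 elapsed = c(20, 30, 10, 12),
                 threads = c(10, 15, 10, 15))
r2 &lt;- data.frame(label   = c('NtSP', 'NtSP', 'DLgn', 'DLgn'),
                 elapsed = c(15, 22, 11, 12),
                 threads = c(10, 15, 10, 15))

# Mean response time for every (label, threads) combination
mean.matrix &lt;- function(z) tapply(z$elapsed, list(z$label, z$threads), mean)

# Negative entries mean the post-cache run was faster
m.diff &lt;- mean.matrix(r2) - mean.matrix(r1)
m.diff
</pre>
<p>The result is a matrix with one row per action and one column per thread count, which is the shape levelplot expects.</p>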
<p>Because the final output of the previous calculations is a two-dimensional matrix, a heatmap or similar visualization style seemed like it might be appropriate.  In the real version the action names are not obfuscated.</p>
<pre class="brush: plain;">
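# levelplot() comes from the lattice package
library(lattice)
# Calculate breaks using midpoints of matrix histogram with initial extension
# (plot=FALSE keeps hist() from drawing the histogram as a side effect)
x &lt;- c(-20000, hist(m.diff, plot=FALSE)$mids)
levelplot(t(m.diff), scales=list(cex=0.7), aspect=&quot;iso&quot;, col.regions=heat.colors, pretty=FALSE, at=x, xlab='Threads', ylab='Action', main=&quot;Time Difference With Cache&quot;)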
</pre>
<p><img class="ngg-singlepic ngg-none" src="http://www.devjason.com/wp-content/gallery/r_generated/heatmap_diff.png" alt="heatmap_diff.png" /></p>
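<p>One possible variation, sketched under assumptions rather than taken from the plot above: because the differences can be both negative and positive, breaks that are symmetric around zero combined with a diverging palette make it easier to tell speedups from slowdowns.  The matrix values and the blue-white-red palette are made up for illustration.</p>
<pre class="brush: plain;">
library(lattice)

# Hypothetical difference matrix (rows are actions, columns are thread counts)
m.diff &lt;- matrix(c(-5, 1, -8, 0, -12, 3), nrow=2,
                 dimnames=list(c('NtSP', 'DLgn'), c('10', '15', '20')))

# Breaks symmetric around zero, so zero falls on the middle break
limit &lt;- max(abs(m.diff))
breaks &lt;- seq(-limit, limit, length.out=11)

# One color per interval between breaks, blue (faster) to red (slower)
pal &lt;- colorRampPalette(c('blue', 'white', 'red'))(length(breaks) - 1)

p &lt;- levelplot(t(m.diff), at=breaks, col.regions=pal,
               xlab='Threads', ylab='Action')
print(p)
</pre>
<p>With this layout, cells near white changed little, which can be easier to read than a single-hue ramp when most differences are small.</p>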
]]></content:encoded>
			<wfw:commentRss>http://www.devjason.com/2009/06/08/visualizing-changes-with-levelplot-heatmap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Summary statistics by group with R</title>
		<link>http://www.devjason.com/2009/04/29/summary-statistics-by-group-with-r/</link>
		<comments>http://www.devjason.com/2009/04/29/summary-statistics-by-group-with-r/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 22:28:44 +0000</pubDate>
		<dc:creator>jsmith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.devjason.com/?p=96</guid>
		<description><![CDATA[I was working on profiling some code today and wanted to obtain some summary statistics by groups with two factors.  The original source was a log4j file that included entries from an aspect-based logger I had enabled.  I had already written a small Perl script to extract the pertinent information and generate [...]]]></description>
			<content:encoded><![CDATA[<p>I was working on profiling some code today and wanted to obtain some summary statistics by groups with two factors.  The original source was a log4j file that included entries from an aspect-based logger I had enabled.  I had already written a small Perl script to extract the pertinent information and generate a CSV file with (clazz,method,elapsed) entries, so I was looking for some standard statistics like mean, median, etc. based on clazz+method combinations.<br />
<span id="more-96"></span><br />
My initial approach looked like:</p>
<pre class="brush: plain;">
# Read the (clazz,method,elapsed) CSV; the aggregate() calls below expect it in dce
dce &lt;- read.csv('some_metrics.csv',header=T)
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), median) -&gt; medians
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), mean) -&gt; means
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), min) -&gt; mins
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), max) -&gt; maxes
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), length) -&gt; lengths
aggregate(dce$elapsed, by=list(CLAZZ=dce$clazz,METHOD=dce$method), sum) -&gt; sums
s &lt;- mins
s$MIN &lt;- s$x
s$x &lt;- NULL
s$MAX = maxes$x
s$MEAN = means$x
s$MEDIAN = medians$x
s$NUM = lengths$x
s$SUM = sums$x
rm(mins,means,maxes,medians,lengths,sums)
</pre>
<p>This was obviously less than ideal: although I could wrap it in a function, it is a bit ugly and cumbersome.  I searched the R-help mailing list and found some references to the doBy package, which &#8220;grew out of a need to calculate groupwise summary statistics in a simple way&#8221;.  The summaryBy function in this package turned out to be exactly what I needed and simplified my code to:</p>
<pre class="brush: plain;">
summarize &lt;- function(csvfile) {
	require(doBy)
	metrics.csv &lt;- read.csv(csvfile,header=T)
	metrics &lt;- summaryBy(elapsed ~ clazz + method, data=metrics.csv, FUN=c(mean,median,min,max,sum,length))
	write.csv(metrics, file='export.csv', quote=F, row.names=F)
	metrics
}
metrics &lt;- summarize('some_metrics.csv')
</pre>
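<p>For comparison, a sketch of a similar result using only base R, in case doBy is not available.  It relies on the formula interface of aggregate() with a function that returns several named statistics at once.  The toy data and the stats helper are made up for illustration, and the output column names differ from what summaryBy produces.</p>
<pre class="brush: plain;">
# Toy data standing in for the (clazz,method,elapsed) CSV
d &lt;- data.frame(clazz   = c('A', 'A', 'A', 'B', 'B'),
                method  = c('f', 'f', 'g', 'f', 'f'),
                elapsed = c(10, 20, 5, 7, 9))

# Hypothetical helper returning all statistics for one group
stats &lt;- function(x) c(MIN=min(x), MAX=max(x), MEAN=mean(x),
                       MEDIAN=median(x), NUM=length(x), SUM=sum(x))

# One row per clazz+method combination; the statistics arrive as a matrix column
s &lt;- aggregate(elapsed ~ clazz + method, data=d, FUN=stats)

# Flatten the matrix column into ordinary columns (elapsed.MIN, elapsed.MAX, ...)
s &lt;- do.call(data.frame, s)
s
</pre>
<p>This replaces the six separate aggregate calls with one, at the cost of the extra flattening step.</p>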
]]></content:encoded>
			<wfw:commentRss>http://www.devjason.com/2009/04/29/summary-statistics-by-group-with-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More Quick R Maps: Country View of WHO Confirmed Cases</title>
		<link>http://www.devjason.com/2009/04/28/more-quick-r-maps-country-view-of-who-confirmed-cases/</link>
		<comments>http://www.devjason.com/2009/04/28/more-quick-r-maps-country-view-of-who-confirmed-cases/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 15:38:25 +0000</pubDate>
		<dc:creator>jsmith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cartography]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.devjason.com/?p=87</guid>
		<description><![CDATA[
I realized it should be pretty easy to use the same approach as the previous R map I generated to make a smaller scale map at a country level.  In this example I&#8217;m setting up an initial non-plotted map with limits on the X and Y ranges I want to display.  Next I assign [...]]]></description>
			<content:encoded><![CDATA[<p><a class="shutterset_" href='http://www.devjason.com/2009/04/28/more-quick-r-maps-country-view-of-who-confirmed-cases/' title=''><img src='http://www.devjason.com/wp-content/gallery/r_generated/thumbs/thumbs_who_world_swineflu_04282009.png' alt='who_world_swineflu_04282009.png' class='ngg-singlepic ngg-right' /></a></p>
<p>I realized it should be pretty easy to use the same approach as the previous R map I generated to make a smaller scale map at a country level.  In this example I&#8217;m setting up an initial non-plotted map with limits on the X and Y ranges I want to display.  Next I assign colors to the observations of this subset map, plot these filled areas, then plot all boundaries.<br />
<span id="more-87"></span><br />
I&#8217;m not really happy with this map because (1) I don&#8217;t like putting out maps drawn in plain geographic coordinates, and (2) area coloring can be misleading about the magnitude and dispersal of the data: in this case the reported cases are really small clusters, not country-wide outbreaks.  I think this might be a good index map for an overview of the spread and number of cases, but it would need to be backed by more detailed maps that do not generalize the location of outbreaks as much.</p>
<p><img src='http://www.devjason.com/wp-content/gallery/r_generated/who_world_swineflu_04282009.png' alt='who_world_swineflu_04282009.png' class='ngg-singlepic ngg-none' /></p>
<p>Here&#8217;s the source:</p>
<pre class="brush: plain;">
# load required libraries
require(maps)
require(RColorBrewer)

# Create a dataframe with the reported observations
loc &lt;- c('USA', 'Mexico', 'Canada', 'Spain')
cases &lt;- c(40, 26, 6, 1)
flu &lt;- data.frame(loc,cases)

# Setup the coordinate system
m &lt;- map(&quot;world&quot;,plot=F, xlim=c(-180,5),ylim=c(10,90), fill=T)

# Match up our observations
stm &lt;- match.map(m, flu$loc)

# Rank the cases and assign colors using the RColorBrewer YlOrRd palette
flu$rank &lt;- rank(flu$cases, ties=&quot;min&quot;)
pal &lt;- brewer.pal(max(flu$rank),&quot;YlOrRd&quot;)
color &lt;- pal[flu$rank]
flu.color &lt;- color[stm]

# Do the drawing
map(m,col=flu.color,fill=T, lty=0,boundary=F,interior=F) # fill regions
map('world',interior=T,add=T,col=&quot;grey30&quot;) # plot boundaries
map.axes()
grid(col=&quot;grey50&quot;)
title(&quot;WHO Confirmed Cases of Swine Flu by Country (28 April 2009)&quot;)
legend('bottomleft', legend=paste(flu$loc,flu$cases),
	fill=color, bg=&quot;white&quot;, horiz=T, cex=0.75,
	title=paste(&quot;Jason B. Smith | 28 April 2009 | Source: CNN.com&quot;))
</pre>
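<p>Ranking works for a handful of countries, but the palette then needs one color per rank, which stops scaling as the list of countries grows.  A sketch of one alternative, under assumptions: bin the counts into a fixed number of classes with cut() and pick one color per class.  The class breaks are arbitrary, and rev(heat.colors()) stands in for an RColorBrewer palette so the snippet runs without extra packages.</p>
<pre class="brush: plain;">
loc &lt;- c('USA', 'Mexico', 'Canada', 'Spain')
cases &lt;- c(40, 26, 6, 1)

# Hypothetical class breaks: 1-9, 10-29, 30 or more
breaks &lt;- c(0, 9, 29, Inf)
class.id &lt;- cut(cases, breaks=breaks, labels=FALSE)

# One color per class, light (few cases) to dark (many cases)
pal &lt;- rev(heat.colors(length(breaks) - 1))
color &lt;- pal[class.id]
data.frame(loc, cases, class.id, color)
</pre>
<p>The resulting color vector can be indexed by the match.map() result in the same way as the ranked colors in the code above.</p>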
]]></content:encoded>
			<wfw:commentRss>http://www.devjason.com/2009/04/28/more-quick-r-maps-country-view-of-who-confirmed-cases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using R for Quick Mapping of Swine Flu</title>
		<link>http://www.devjason.com/2009/04/27/using-r-for-quick-mapping-of-swine-flu/</link>
		<comments>http://www.devjason.com/2009/04/27/using-r-for-quick-mapping-of-swine-flu/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 03:39:03 +0000</pubDate>
		<dc:creator>jsmith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cartography]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.devjason.com/?p=71</guid>
		<description><![CDATA[
I haven&#8217;t had much experience using R for spatial visualization, so I thought I would give the &#8220;maps&#8221; package a go tonight and create a quick thematic map of confirmed cases of swine flu by state.  It doesn&#8217;t have all the elements I would want on a production map, but I was going for [...]]]></description>
			<content:encoded><![CDATA[<p><a class="shutterset_" href="http://www.devjason.com/2009/04/27/using-r-for-quick-mapping-of-swine-flu"><img class="ngg-singlepic ngg-left" src="http://www.devjason.com/wp-content/gallery/r_generated/thumbs/thumbs_swineflu_04272009.png" alt="swineflu_04272009.png" /></a><br />
I haven&#8217;t had much experience using R for spatial visualization, so I thought I would give the &#8220;maps&#8221; package a go tonight and create a quick thematic map of confirmed cases of swine flu by state.  It doesn&#8217;t have all the elements I would want on a production map, but I was going for speed of generation.</p>
<p><span id="more-71"></span></p>
<p><img class="ngg-singlepic ngg-none" src="http://www.devjason.com/wp-content/gallery/r_generated/swineflu_04272009.png" alt="swineflu_04272009.png" /></p>
<p>Here is the R code I used to generate the map.</p>
<pre class="brush: plain;">
draw.flumap &lt;- function() {
	# load required libraries (mapproj must be installed for proj=&quot;albers&quot;)
	require(maps)
	require(mapproj)
	require(RColorBrewer)

	# Create a dataframe with the reported observations
	states &lt;- c('California','Kansas','New York','Ohio','Texas')
	cases &lt;- c(7,2,28,1,2)
	flu &lt;- data.frame(states,cases)

	# Match up our observations with the &quot;state&quot; database
	stm &lt;- match.map(&quot;state&quot;, states)

	# Rank the cases and assign colors using the RColorBrewer YlOrRd palette
	flu$rank &lt;- rank(flu$cases, ties=&quot;min&quot;)
	pal &lt;- brewer.pal(max(flu$rank),&quot;YlOrRd&quot;)
	color &lt;- pal[flu$rank]
	flu.color &lt;- color[stm]

	# Actually do the drawing
	map(&quot;state&quot;,proj=&quot;albers&quot;,col=flu.color, parameters=c(30,40),fill=T,lwd=0.7)
	title(&quot;US Confirmed Cases of Swine Flu by State (27 April 2009)&quot;)
	legend('bottomleft', legend=paste(flu$states,flu$cases),fill=color, cex=0.75)
	text(0.26, -1.62, labels=paste(&quot;Albers Equal Area\n&quot;,&quot;Jason B. Smith\n&quot;, &quot;27 April 2009&quot;), cex=0.75)
}
# Save the map to a PNG file; base graphics draw directly, so no print() is needed
png(filename=&quot;~/Desktop/flu.png&quot;, width=600, height=500, bg=&quot;white&quot;)
draw.flumap()
dev.off()
</pre>
<p>The draw.flumap function has everything you would need to do this in an interactive R console; I only wrapped the code in a function to make it easier to save the image to a file.  I used the locator() function to find the coordinates for placing the text.</p>
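<p>The save step can also be factored into a small helper.  This is only a sketch: save.png is a hypothetical name, and on.exit() is used so the device is closed even if the drawing code fails.</p>
<pre class="brush: plain;">
# Hypothetical helper: open a PNG device, run a drawing function, close the device
save.png &lt;- function(filename, draw, width=600, height=500) {
	png(filename=filename, width=width, height=height, bg='white')
	on.exit(dev.off())
	draw()
	invisible(filename)
}

# Example with a trivial plot; draw.flumap could be passed the same way
out &lt;- save.png(file.path(tempdir(), 'example.png'), function() plot(1:10))
file.exists(out)
</pre>
<p>Passing the drawing function as an argument keeps the plotting code usable at the console while leaving the file handling in one place.</p>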
]]></content:encoded>
			<wfw:commentRss>http://www.devjason.com/2009/04/27/using-r-for-quick-mapping-of-swine-flu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
