The module-specific charts are good (though it'd be nice if the barchart.html page showed all of the charts, with titles).
But the real value is going to be a single page that has one graph for each test, where each graph contains the results from all of the tested modules. The comparison between tested modules is the valuable part of this. The utils/modeshape-connector-benchmark had such an output, and it was very useful. It used a single HTML template with a $DIV$ section that the code replaced with the chart data.
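To make the template idea concrete, here is a rough sketch of the substitution step. It is written in Python only for brevity and is not the benchmark utility's actual code: `TEMPLATE`, `render_page`, and the `data-rows` attribute are names made up for illustration, and the embedded JavaScript assumes the current Google Charts loader rather than whatever the old utility used.

```python
import html
import json

PLACEHOLDER = "$DIV$"  # the marker the template reserves for the charts

# Hypothetical template: the script draws one candlestick chart per
# <div data-rows="..."> that the generator emits in place of $DIV$.
TEMPLATE = """<html><head>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(function () {
  document.querySelectorAll('div[data-rows]').forEach(function (div) {
    var rows = JSON.parse(div.getAttribute('data-rows'));
    var data = google.visualization.arrayToDataTable(rows, true);
    new google.visualization.CandlestickChart(div).draw(data, {legend: 'none'});
  });
});
</script></head><body>
$DIV$
</body></html>"""


def render_page(template, charts):
    """Replace the placeholder with one titled chart container per test.

    `charts` maps a test name to its rows, one row per tested module,
    each row being [label, low, open, close, high].
    """
    blocks = []
    for index, (test_name, rows) in enumerate(sorted(charts.items())):
        blocks.append(
            f'<h2>{html.escape(test_name)}</h2>\n'
            f'<div id="chart{index}" data-rows="{html.escape(json.dumps(rows))}"></div>'
        )
    return template.replace(PLACEHOLDER, "\n".join(blocks))
```

Keeping the drawing logic in the static template means the generator only has to emit titles and data, so adding a test or a module does not require touching the HTML.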
Best of all, Google Charts now has support for box charts, which is exactly what we want! The question really becomes: what's the best way to produce the charts within the performance test framework? Since each test already writes its data to a 'perf-test.txt' file, perhaps the aggregate HTML page can be generated simply by reading those files (whichever ones are there).
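Something along these lines could do the aggregation. Again this is only a sketch: it assumes each 'perf-test.txt' holds one numeric duration per line, which may not match the real format, and `load_durations`, `candlestick_row`, and `rows_for_test` are hypothetical names. Each module becomes one row in the order Google Charts expects for a candlestick (label, low, open, close, high), here filled with min, first quartile, third quartile, and max.

```python
from pathlib import Path
from statistics import quantiles


def load_durations(path):
    """Read one numeric duration per line (assumed format).

    A missing file yields an empty list, so modules that were not run
    are simply skipped.
    """
    p = Path(path)
    if not p.is_file():
        return []
    return [float(line) for line in p.read_text().splitlines() if line.strip()]


def candlestick_row(module, durations):
    """Return [module, min, Q1, Q3, max] for one tested module."""
    q1, _median, q3 = quantiles(durations, n=4, method="inclusive")
    return [module, min(durations), q1, q3, max(durations)]


def rows_for_test(files_by_module):
    """Build the rows for one test's chart from whichever files exist."""
    rows = []
    for module, path in sorted(files_by_module.items()):
        durations = load_durations(path)
        if len(durations) >= 2:  # too few samples to compute quartiles
            rows.append(candlestick_row(module, durations))
    return rows
```

Because the whiskers run from min to max, no separate outlier series is needed.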
(BTW, I don't think we need to put the outliers on our charts. There's no easy way to do that with Google Charts, since you need a separate data series for each pair of outliers for each candlestick.)