This is a ~J~ program which does some statistical analysis on computer benchmark results. This is also a collection of notes so that I may recall what I learned while writing this script.
I wanted a way to know which changes I was making to J programs were
actually effective. The built-in timing verb ~x 6!:2 y~, which
averages the time taken to run ~y~ over ~x~ repetitions, isn't great for this
purpose: averaging destroys information, and it takes a hit from
initial interpretive overhead. Light programs ~y~ thus require a large
~x~ before the results converge.
~bonsai~ uses statistical bootstrapping, which enables us to estimate statistical parameters from nonparametric samples. I got a sense of what to look for from Haskell's criterion package, and learned how to implement it from Efron and Hastie's textbook Computer Age Statistical Inference (CASI). See chapters 10 and 11 there for the original material.
We take a sample $x = (x_1, \dots, x_n)$ of execution times, on which
we compute some statistic of interest $\hat\theta = \theta(x)$.
Some of these benchmarks will be expensive to run and in general it
won't be possible to gather a large sample. To that end, the process
of statistical bootstrapping allows us to extrapolate better estimates
of the desired statistics through resampling uniformly with replacement from
the original sample.
A single bootstrap resample of the original sample is $x^* = (x^*_1, \dots, x^*_n)$, each $x^*_i$ drawn uniformly from $x$. From $B$ such resamples we get replications $\hat\theta^*_b = \theta(x^{*b})$, from which, for example, the bootstrap standard error
$$\hat {\text{se}}_B = \sqrt{\frac {\sum_{b=1}^{B} \big(\hat\theta^*_b - \hat\theta^*_\cdot\big)^2}{B-1}}$$
can be computed, where $\hat\theta^*_\cdot$ is the mean of the replications. Basically, the process is getting a sample
of the statistic itself, so the final step in the above process is akin to running another
algorithm on that derived sample.
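The resample-and-summarize loop translates directly into a rough Python sketch (the helper name ~bootstrap_se~ is mine, not part of ~bonsai~):

```python
import math
import random

def bootstrap_se(sample, statistic=lambda xs: sum(xs) / len(xs),
                 B=2500, seed=0):
    """Bootstrap standard error: draw B resamples with replacement,
    apply the statistic to each, and take the sd of the replications."""
    rng = random.Random(seed)
    n = len(sample)
    thetas = [statistic([sample[rng.randrange(n)] for _ in range(n)])
              for _ in range(B)]
    theta_dot = sum(thetas) / B  # mean of the replications
    return math.sqrt(sum((t - theta_dot) ** 2 for t in thetas) / (B - 1))
```

For the mean this lands near the textbook $s/\sqrt n$; the point of the machinery is that the same loop works for any statistic.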
Before describing the bootstrap confidence calculations, a diversion on configuration.
The basic configuration is the amount of time allotted for the initial benchmarks, the minimum number of runs, and the maximum number of runs. Additional configuration includes the target coverage for confidence intervals and the number of bootstrap trials.
time =: 1 NB. time allotted (upper bound on)
lo =: 5 NB. minimum sample
hi =: 2000 NB. maximum sample
alpha =: 0.05 NB. coverage
B =: 2500 NB. bootstrap resamples

We gather an initial sample from ~dobench~ by first running the
program once and then running it a number of times based on the
configuration. It's also possible to specify exactly how many times to
run a benchmark by using it dyadically.
dobench=: 1 : 0
NB. u dobench y: run sentence y a number of times based on the
NB. configuration. u is the locale where the sentence was called from,
NB. which is captured in the bonsai verb.
cocurrent u
t0 =. (hi_bonsai_ <. lo_bonsai_ >. >. time_bonsai_ % 1e_6 >. 6!:2 y)
xs =. 6!:2"1 t0 # ,: y
cocurrent 'bonsai' NB. apparently u would otherwise stick during
NB. execution of other verbs (eg summarize)
xs
)
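The run-count expression ~t0~ amounts to dividing a time budget by one observed run and clamping the result; a sketch of that rule in Python (assuming my reading of the clamps is right):

```python
import math

def run_count(first_run_seconds, budget=1.0, lo=5, hi=2000):
    """Mirror of dobench's t0: budget divided by one timed run,
    guarded against near-zero timings and clamped to [lo, hi]."""
    per_run = max(1e-6, first_run_seconds)  # 1e_6 >. 6!:2 y
    return min(hi, max(lo, math.ceil(budget / per_run)))
```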
dobootstrap=: 1 : 0
NB. u dobootstrap y: draw u resamples uniformly with replacement from sample y.
y {~ ? u # ,: $~ #y
)

Discrete percentiles and quantiles are not in J's stats addon and can be computed as follows:
qtile =: 4 : 0
ws=. (%+/)"1 -. | xs -"0 1 is=. (<.,>.)"0 xs=. x * <:#y
ws (+/"1 @: *) is { /:~ y
)
cdf =: (+/ @: (<:/~) % #@]) :. qtile
IQR =: -/ @: (0.75 0.25&qtile)

The local variables ~is~ and ~ws~ in ~qtile~ interpolate
between the values at neighboring indices so that, for example,
~0.5 (cdf^:_1) 0 3~ and ~median 0 3~ agree and are both 1.5. ~cdf~,
whose obverse is ~qtile~, gives the fraction of elements of ~y~ that
are less than or equal to ~x~. ~IQR~ is the interquartile range.
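The interpolation can be sketched in Python (same $(n-1)$-spacing convention as ~qtile~; names are mine):

```python
def qtile(q, ys):
    """Linear-interpolation quantile: position q*(n-1) is split into
    floor/ceiling indices, weighted by proximity."""
    ys = sorted(ys)
    pos = q * (len(ys) - 1)
    lo_i = int(pos)                      # floor index
    hi_i = min(lo_i + 1, len(ys) - 1)    # ceiling index
    w = pos - lo_i                       # weight toward the ceiling
    return (1 - w) * ys[lo_i] + w * ys[hi_i]
```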
For visualization purposes, here is some basic functionality for calculating kernel density estimates given a benchmark.
bandwidth =: 0.9 * (stddevp * 5 %: 4r3 % #) NB. <. (0.746269 * IQR)
kde =: 2 : 0
NB. n sample, u kernel, y point
(h%#n) * +/ u (y - n) * h =. % bandwidth n
)
NB. standard normal pdf & epanechnikov kernels
phi =: (%:%2p1) * [: ^ _0.5 * *:
epanechnikov =: 3r4 * 0 >. 1 - *:

The confidence interval calculations below correspond to Chapter 11 of the CASI
textbook. Throughout, the goal is to estimate the unseen statistic
$\theta$ and attach a confidence interval to the estimate $\hat\theta$.
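The ~bandwidth~/~kde~/~epanechnikov~ definitions can be sketched in Python (helper names are mine; the bandwidth is the same $0.9\,\sigma\,(4/(3n))^{1/5}$ normal-reference rule used above):

```python
import math

def stddev_p(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def bandwidth(xs):
    """Normal-reference bandwidth: 0.9 * sd * (4/(3n))^(1/5)."""
    return 0.9 * stddev_p(xs) * (4 / (3 * len(xs))) ** 0.2

def epanechnikov(t):
    """Epanechnikov kernel: 3/4 * max(0, 1 - t^2)."""
    return 0.75 * max(0.0, 1 - t * t)

def kde(xs, y, kernel=epanechnikov):
    """Kernel density estimate at point y: (1/(n*h)) sum K((y-x_i)/h)."""
    h = bandwidth(xs)
    return sum(kernel((y - x) / h) for x in xs) / (h * len(xs))
```

As a sanity check, the estimated density integrates to roughly 1 over its support.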
The simplest but least accurate way of stamping a confidence interval
on the resampled statistics is the standard interval, which assumes
they are normally distributed.
bssi=: 1 : 0
NB. x u bssi y: verb u is statistic, y is sample, x is resample.
(mean s) -`[`+`:0 (stddev s=. u"1 x) * qnorm -. -: alpha
)

In other words, for ~alpha~ of 0.05 the interval is the mean of the
resampled statistics plus or minus ~qnorm 0.975~ ≈ 1.96 standard
errors, giving 95% coverage.
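A rough Python sketch of the same normal-theory interval, centered like ~bssi~ on the mean of the replications (names are mine, not part of ~bonsai~):

```python
import math
from statistics import NormalDist

def standard_interval(thetas, alpha=0.05):
    """Normal-theory bootstrap interval: mean of the replications
    plus/minus z_(1-alpha/2) times their standard deviation."""
    B = len(thetas)
    mean_t = sum(thetas) / B
    se = math.sqrt(sum((t - mean_t) ** 2 for t in thetas) / (B - 1))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # qnorm -. -: alpha
    return (mean_t - z * se, mean_t, mean_t + z * se)
```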
The next best way to go is to use percentiles of the empirical resamples to find our confidence interval.
bspi=: 1 : 0
NB. x u bspi y: verb u is statistic, y is sample, x is resample.
((-:i.3) + (i:_1) * -:alpha) cdf^:_1 u"1 x
)

In other words, we take the ~(,-.) -: alpha~ empirical quantiles, that
is the α/2 and 1 − α/2 quantiles, of the resampled statistic, with the
median as the point estimate.
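The percentile interval is just the interpolated quantiles of the replications; a sketch (helper names mine):

```python
def percentile_interval(thetas, alpha=0.05):
    """Percentile bootstrap interval: the alpha/2, 0.5, and 1-alpha/2
    interpolated quantiles of the replications."""
    ys = sorted(thetas)
    def q(p):  # linear-interpolation quantile, (n-1) spacing
        pos = p * (len(ys) - 1)
        i, w = int(pos), p * (len(ys) - 1) - int(pos)
        return (1 - w) * ys[i] + w * ys[min(i + 1, len(ys) - 1)]
    return (q(alpha / 2), q(0.5), q(1 - alpha / 2))
```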
The resamples may skew more heavily to one side or the other of the
estimate $\hat\theta$; the bias-corrected (BC) interval shifts the
percentile endpoints to compensate for this median bias.
bsbc=: 1 : 0
NB. x u bsbc y: verb u is statistic, y is sample, x is resample.
that =. u samp =. y
z0=. qnorm that cdf resamp =. u"1 x
I=. pnorm (+: z0) + qnorm (,-.) -: alpha
({.,that,{:) I (cdf^:_1) resamp
)

The above corresponds to
$$p_0 = \frac{\#\{\hat\theta^*_b \le \hat\theta\}}{B}$$
$$z_0 = \Phi^{-1}(p_0)$$
$$\hat\theta_{\text{BC}}[\alpha] = \hat F^{-1}\big[\Phi\big(2 z_0 + z^{(\alpha)}\big)\big]$$
When the bootstrap resamples are median unbiased (ie $p_0 = \tfrac12$ and $z_0 = 0$), this reduces to the percentile interval.
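A sketch of the BC procedure in Python (helper names mine; ~statistics.NormalDist~ supplies $\Phi$ and $\Phi^{-1}$):

```python
import random
from statistics import NormalDist

def bc_interval(sample, statistic, B=2000, alpha=0.05, seed=0):
    """Bias-corrected percentile interval: shift the percentile
    endpoints by twice the median-bias term z0."""
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(sample)
    theta_hat = statistic(sample)
    thetas = sorted(statistic([sample[rng.randrange(n)] for _ in range(n)])
                    for _ in range(B))
    p0 = sum(t <= theta_hat for t in thetas) / B   # median bias
    z0 = nd.inv_cdf(p0)
    def q(p):  # interpolated empirical quantile of the replications
        pos = p * (B - 1)
        i, w = int(pos), pos - int(pos)
        return (1 - w) * thetas[i] + w * thetas[min(i + 1, B - 1)]
    z = nd.inv_cdf(1 - alpha / 2)
    return (q(nd.cdf(2 * z0 - z)), theta_hat, q(nd.cdf(2 * z0 + z)))
```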
The previous method assumes the existence of a monotone transform of
$\hat\theta$ to a normal distribution with constant standard error. The
BCa (bias-corrected and accelerated) interval weakens this assumption:
it also allows the standard error to vary with the parameter, at a rate
governed by the acceleration $\hat a$.
bsbca=: 1 : 0
NB. x u bsbca y: verb u is statistic, y is sample, and x is resample.
thati=. (1 u \. y) - that =. u y
ahat=. 1r6 * (+/thati^3) % (+/*:thati)^3r2
z0qt=. that cdf resamp=. u"1 x
ab =. (,-.) -: alpha
if. 1 ~: ab I. z0qt do. x u bspi y
else. z0=. qnorm z0qt
zabh=. z0 + (% 1 - ahat&*) z0 + qnorm ab
({.,that,{:) (pnorm zabh) cdf^:_1 resamp
end.
)

The above corresponds to calculating
$$\hat\theta_{\text{BCa}}[\alpha] = \hat F^{-1}\Big[\Phi\Big(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - \hat a\,(z_0 + z^{(\alpha)})}\Big)\Big]$$
where the acceleration is estimated from the jackknife deviations $d_i = \hat\theta_{(i)} - \hat\theta$ as
$$\hat a = \frac{\sum_i d_i^3}{6\big(\sum_i d_i^2\big)^{3/2}}$$
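The BCa calculation can be sketched in Python (helper names mine; the fallback branch of ~bsbca~ is omitted for brevity):

```python
import random
from statistics import NormalDist

def bca_interval(sample, statistic, B=2000, alpha=0.05, seed=0):
    """BCa interval: bias correction z0 plus acceleration a-hat
    estimated from leave-one-out (jackknife) deviations."""
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(sample)
    theta_hat = statistic(sample)
    # acceleration from jackknife deviations d_i = theta_(i) - theta_hat
    d = [statistic(sample[:i] + sample[i+1:]) - theta_hat for i in range(n)]
    a_hat = sum(x ** 3 for x in d) / (6 * sum(x ** 2 for x in d) ** 1.5)
    thetas = sorted(statistic([sample[rng.randrange(n)] for _ in range(n)])
                    for _ in range(B))
    z0 = nd.inv_cdf(sum(t <= theta_hat for t in thetas) / B)
    def q(p):  # interpolated empirical quantile
        pos = p * (B - 1)
        i, w = int(pos), pos - int(pos)
        return (1 - w) * thetas[i] + w * thetas[min(i + 1, B - 1)]
    def endpoint(z_alpha):
        z = z0 + z_alpha
        return q(nd.cdf(z0 + z / (1 - a_hat * z)))
    z = nd.inv_cdf(1 - alpha / 2)
    return (endpoint(-z), theta_hat, endpoint(z))
```

With $z_0 = 0$ and $\hat a = 0$ this collapses back to the percentile interval.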
J programs don’t tend to have much overhead, but this is a nice idea
from criterion. One way to estimate the performance of a program is
to do a linear regression on the sample. Presumably the overhead will
be captured in the constant term, giving a clearer picture of typical
execution times. Here, we take the running sum of the execution times
to get ~n~ snapshots of cumulative performance.
regress_bench=: +/\ %. 1 ,. i.@#
rsquare_bench=: 3 : 0
b=. (y=.+/\y) %. v=. 1,.i.#y
(sst-+/*:y-v +/ . * b)% sst=. +/*:y-(+/y) % n=. #y
)

Find confidence for the regression coefficients by bootstrapping them
like any other statistic.
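A sketch of ~regress_bench~ in Python: OLS of cumulative time against the iteration index, so the intercept absorbs fixed overhead and the slope estimates the per-run cost (names are mine):

```python
def regress_bench(times):
    """OLS of cumulative execution time on iteration index 0..n-1.
    Returns (intercept, slope): overhead-ish constant and per-run time."""
    ys, total = [], 0.0
    for t in times:           # running sum, as +/\ does
        total += t
        ys.append(total)
    xs = list(range(len(times)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope
```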
Bootstrap-t instead estimates the distribution of the studentized
statistic $t = (\hat\theta^* - \hat\theta)/\hat{\text{se}}^*$, which is
useful for comparing two benchmarks.
In J:
se2_t=: +&%&# * +&ssdev % +&#-2:
se_t=: %:@:se2_t
bs_t=: 4 : 0
NB. x bs_t y: use bootstrap-t to compare distributions of benchmark
NB. results from sentences x and y.
that=. x -&mean y
sehat=. x se_t y
sx =. B dobootstrap x
sy =. B dobootstrap y
xbar =. sx mean bs_est x
ybar =. sy mean bs_est y
dsamp=. sx ((that -~ -&mean) % se_t)"1 sy
ths =. ({.,that,{:) that - sehat * ((,~-.) -: alpha) cdf^:_1 dsamp
xvy =. xbar (-~%[) ybar
ths , xbar , ybar ,: xvy
)
bs_compare=: bs_t & dobench

The idea is we can get some confidence on the parameter $\mu_x - \mu_y$, the difference between the mean execution times of the two sentences.
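The two-sample bootstrap-t can be sketched in Python (helper names mine; the pooled standard error mirrors ~se_t~):

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def ssdev(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_se(x, y):
    """(1/nx + 1/ny) * (ssdev x + ssdev y) / (nx + ny - 2), rooted."""
    nx, ny = len(x), len(y)
    return math.sqrt((1 / nx + 1 / ny) * (ssdev(x) + ssdev(y)) / (nx + ny - 2))

def bootstrap_t(x, y, B=2000, alpha=0.05, seed=0):
    """Studentize each resampled mean difference by its own pooled se,
    then invert the t-quantiles around the observed difference."""
    rng = random.Random(seed)
    that = mean(x) - mean(y)
    sehat = pooled_se(x, y)
    ts = []
    for _ in range(B):
        rx = [x[rng.randrange(len(x))] for _ in x]
        ry = [y[rng.randrange(len(y))] for _ in y]
        se = max(pooled_se(rx, ry), 1e-12)  # guard degenerate resamples
        ts.append((mean(rx) - mean(ry) - that) / se)
    ts.sort()
    def q(p):  # interpolated empirical quantile of the t values
        pos = p * (B - 1)
        i, w = int(pos), pos - int(pos)
        return (1 - w) * ts[i] + w * ts[min(i + 1, B - 1)]
    # note the flip: the upper t-quantile gives the lower endpoint
    return (that - sehat * q(1 - alpha / 2), that,
            that - sehat * q(alpha / 2))
```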
Verb ~bonsai~ defaults to using ~bsbca~ and estimates some descriptive
statistics in ~summarize~. ~bonsai~ is ambivalent: used as a
dyad it benchmarks two sentences, comparing their mean execution times
via bootstrap-t; as a monad, it outputs some descriptive statistics.
NB. use bs bias corrected accelerated by default
bs_est =: bsbca
mu =: u: 16b3bc
delta =: u: 16b3c3
summarize =: 3 : 0
NB. Report some descriptive statistics about a list y of benchmark results.
resamp=. B dobootstrap samp=. y
xbarc=. resamp mean bs_est samp
sdevc=. resamp stddev bs_est samp
NB. regac=. resamp ({:@regress_bench) bs_est samp
NB. rsqrc=. resamp rsquare_bench bs_est samp
NB. skwnc=. resamp skewness bs_est samp
NB. kurtc=. resamp kurtosis bs_est samp
ests=. <"0 xbarc ,: sdevc NB. , regac ,: rsqrc
ests=. (;: 'lower estimate upper') , ests
rows=. ('N = ',":#samp);mu;delta NB. ;'ols';('R',(u:16bb2),' (ols)')
rows ,. ests
)
bonsai3=: 1 : 'summarize u dobench_bonsai_ y'
bonsai4 =: 1 : 0
table =. ;: 'comparison lower estimate upper'
x =. u dobench_bonsai_ x
y =. u dobench_bonsai_ y
x_y =. x bs_t y
table =. table , (('- & ',mu);(mu,'(x)');(mu,'(y)');'x (-~%[) y') ,. <"0 x_y
)
bonsai_z_ =: 3 : 0
NB. bonsai y: estimate performance of sentence y
loc =. coname''
loc bonsai3_bonsai_ y
:
NB. x bonsai y: compare mean execution time of two sentences. a
NB. positive estimate means sentence x takes longer to execute than
NB. sentence y.
loc =. coname ''
x loc bonsai4_bonsai_ y
)
bsppns =: 'ns' ,~ [: ": [: <. 0.5 + 1e9&*
bsppus =: ('s',~u:16b3bc) ,~ [: ": [: <. 0.5 + 1e6&*
bsppms =: 'ms' ,~ [: ": [: <. 0.5 + 1e3&*
bspps =: 's' ,~ [: ": (100 %~ [: <. 0.5 + 100&*)
bsppa =: bsppns`bsppus`bsppms`bspps@.(_6 _3 0 I. 10&^.)
bsnump =: 1 4 8 e.~ 3!:0
bsucp =: 131072 262144 e.~ 3!:0
bspp =: bsppa ^: bsnump
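The unit selection in ~bsppa~ picks ns/μs/ms/s by the order of magnitude of the timing; a sketch in Python (names mine; ~math.floor(x + 0.5)~ mirrors the half-up rounding of ~<. 0.5 +~, and only positive timings are expected):

```python
import math

def pretty_time(seconds):
    """Format a positive duration with the unit chosen by log10,
    like the bspp verbs."""
    exp = math.log10(seconds)
    if exp < -6:
        return f"{math.floor(seconds * 1e9 + 0.5)}ns"
    if exp < -3:
        return f"{math.floor(seconds * 1e6 + 0.5)}\u03bcs"
    if exp < 0:
        return f"{math.floor(seconds * 1e3 + 0.5)}ms"
    return f"{math.floor(seconds * 100 + 0.5) / 100}s"
```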
bonsaipp =: 3 : 0
NB. with monadic bonsai usage, pretty print the results
res =. bonsai y
(u:@":) ^: (-.@bsucp) &.> ({: res) ,~ bspp &.> }: res
)

bonsaiplot_z_ =: 3 : 0
NB. plot kde of benchmark of sentence y
require 'plot'
pd 'reset; visible 0;title bonsai'
pd 'xcaption time;ycaption kde & sample over time'
'a b' =. (<./,>./) samp =. (coname'') dobench_bonsai_ y
den =. (epanechnikov kde samp)"0 pts =. (-:a+b) + ((%~i:)1000) * 0.6*b-a
pd 'type dot;pensize 0.3;color 80 100 200'
pd samp;(>./den)*(%~i.)#samp
pd 'type line; pensize 2;color 0 120 240'
pd pts;den
pd 'key sample density; keycolor 80 100 200,0 120 240;keypos top right'
pd'show'
)

coclass 'bonsai'
load 'stats/base stats/distribs'
<<configuration>>
<<sampling>>
<<quantile>>
<<kde>>
<<standard-interval>>
<<percentile>>
<<bias-percentile>>
<<bias-and-accelerated>>
<<regression>>
<<bootstrap-t>>
<<printing>>
<<plotting>>
<<analysis>>