This is the lhstats Reference Manual, version 1.1.1, generated automatically by Declt version 4.0 beta 2 "William Riker" on Sun Dec 15 06:36:33 2024 GMT+0.
The main system appears first, followed by any subsystem dependency.
lhstats
Statistical functions by Larry Hunter and Jeff Shrager.
Matt Curtis <matt.r.curtis@gmail.com>
Larry Hunter, Jeff Shrager
GNU General Public License version 2 (GPLv2)
1.1.1
package.lisp
(file).
lhstats.lisp
(file).
Files are sorted by type and then listed depth-first from the systems components trees.
lhstats/lhstats.lisp
package.lisp
(file).
lhstats
(system).
bin-and-count
(function).
binomial-cumulative-probability
(function).
binomial-ge-probability
(function).
binomial-probability
(function).
binomial-probability-ci
(function).
binomial-test-one-sample
(function).
binomial-test-one-sample-sse
(function).
binomial-test-paired-sse
(function).
binomial-test-two-sample
(function).
binomial-test-two-sample-sse
(function).
chi-square
(function).
chi-square-cdf
(function).
chi-square-test-for-trend
(function).
chi-square-test-one-sample
(function).
chi-square-test-rxc
(function).
choose
(function).
coefficient-of-variation
(function).
convert-to-standard-normal
(function).
correlation-coefficient
(function).
correlation-sse
(function).
correlation-test-two-sample
(function).
correlation-test-two-sample-on-sequences
(function).
f-significance
(function).
f-test
(function).
false-discovery-correction
(function).
fisher-exact-test
(function).
fisher-z-transform
(function).
geometric-mean
(function).
linear-regression
(function).
mcnemars-test
(function).
mean
(function).
mean-sd-n
(function).
median
(function).
mode
(function).
normal-mean-ci
(function).
normal-mean-ci-on-sequence
(function).
normal-pdf
(function).
normal-sd-ci
(function).
normal-sd-ci-on-sequence
(function).
normal-variance-ci
(function).
normal-variance-ci-on-sequence
(function).
percentile
(function).
permutations
(function).
phi
(function).
poisson-cumulative-probability
(function).
poisson-ge-probability
(function).
poisson-mu-ci
(function).
poisson-probability
(function).
poisson-test-one-sample
(function).
random-normal
(function).
random-pick
(function).
random-sample
(function).
range
(function).
round-float
(function).
sd
(function).
sign-test
(function).
sign-test-on-sequences
(function).
spearman-rank-correlation
(function).
square
(macro).
standard-deviation
(function).
standard-error-of-the-mean
(function).
t-distribution
(function).
t-significance
(function).
t-test-one-sample
(function).
t-test-one-sample-on-sequence
(function).
t-test-one-sample-sse
(function).
t-test-paired
(function).
t-test-paired-on-sequences
(function).
t-test-paired-sse
(function).
t-test-two-sample
(function).
t-test-two-sample-on-sequences
(function).
t-test-two-sample-sse
(function).
test-variables
(macro).
variance
(function).
wilcoxon-signed-rank-test
(function).
wilcoxon-signed-rank-test-on-sequences
(function).
z
(function).
z-test
(function).
z-test-on-sequence
(function).
*critical-values-of-r*
(special variable).
*critical-values-of-r-two-tailed-column-interpretaion*
(special variable).
*f0.05*
(special variable).
*f0.10*
(special variable).
*q-table*
(special variable).
*t-cdf-critical-points-table-for-.05*
(special variable).
2-tailed-correlation-significance
(function).
all-squares
(function).
anova1
(function).
anova2
(function).
anova2r
(function).
average-rank
(function).
beta-incomplete
(function).
binomial-le-probability
(function).
chi-square-1
(function).
chi-square-2
(function).
correlate
(function).
cross-mean
(function).
display
(macro).
dumplot
(function).
error-function
(function).
error-function-complement
(function).
even-power-of-two?
(function).
f-score>p-limit?
(function).
factorial
(function).
find-critical-value
(function).
gamma-incomplete
(function).
gamma-ln
(function).
harmonic-mean
(function).
histovalues
(function).
lmean
(function).
max*
(function).
min*
(function).
n-random
(function).
normalize
(function).
p2
(function).
protected-mean
(function).
pround
(function).
regress
(function).
round-up
(function).
s2
(function).
safe-exp
(function).
sign
(function).
sqr
(function).
standard-error
(function).
sum
(function).
t-p-value
(function).
t1-test
(function).
t1-value
(function).
t2-test
(function).
t2-value
(function).
testanova2
(function).
tukey-q
(function).
underflow-goes-to-zero
(macro).
wilcoxon-1
(function).
x2test
(function).
z/protect
(macro).
Packages are listed by definition order.
statistics
Statistical functions
stats
common-lisp
.
bin-and-count
(function).
binomial-cumulative-probability
(function).
binomial-ge-probability
(function).
binomial-probability
(function).
binomial-probability-ci
(function).
binomial-test-one-sample
(function).
binomial-test-one-sample-sse
(function).
binomial-test-paired-sse
(function).
binomial-test-two-sample
(function).
binomial-test-two-sample-sse
(function).
chi-square
(function).
chi-square-cdf
(function).
chi-square-test-for-trend
(function).
chi-square-test-one-sample
(function).
chi-square-test-rxc
(function).
choose
(function).
coefficient-of-variation
(function).
convert-to-standard-normal
(function).
correlation-coefficient
(function).
correlation-sse
(function).
correlation-test-two-sample
(function).
correlation-test-two-sample-on-sequences
(function).
f-significance
(function).
f-test
(function).
false-discovery-correction
(function).
fisher-exact-test
(function).
fisher-z-transform
(function).
geometric-mean
(function).
linear-regression
(function).
mcnemars-test
(function).
mean
(function).
mean-sd-n
(function).
median
(function).
mode
(function).
normal-mean-ci
(function).
normal-mean-ci-on-sequence
(function).
normal-pdf
(function).
normal-sd-ci
(function).
normal-sd-ci-on-sequence
(function).
normal-variance-ci
(function).
normal-variance-ci-on-sequence
(function).
percentile
(function).
permutations
(function).
phi
(function).
poisson-cumulative-probability
(function).
poisson-ge-probability
(function).
poisson-mu-ci
(function).
poisson-probability
(function).
poisson-test-one-sample
(function).
random-normal
(function).
random-pick
(function).
random-sample
(function).
range
(function).
round-float
(function).
sd
(function).
sign-test
(function).
sign-test-on-sequences
(function).
spearman-rank-correlation
(function).
square
(macro).
standard-deviation
(function).
standard-error-of-the-mean
(function).
t-distribution
(function).
t-significance
(function).
t-test-one-sample
(function).
t-test-one-sample-on-sequence
(function).
t-test-one-sample-sse
(function).
t-test-paired
(function).
t-test-paired-on-sequences
(function).
t-test-paired-sse
(function).
t-test-two-sample
(function).
t-test-two-sample-on-sequences
(function).
t-test-two-sample-sse
(function).
test-variables
(macro).
variance
(function).
wilcoxon-signed-rank-test
(function).
wilcoxon-signed-rank-test-on-sequences
(function).
z
(function).
z-test
(function).
z-test-on-sequence
(function).
*critical-values-of-r*
(special variable).
*critical-values-of-r-two-tailed-column-interpretaion*
(special variable).
*f0.05*
(special variable).
*f0.10*
(special variable).
*q-table*
(special variable).
*t-cdf-critical-points-table-for-.05*
(special variable).
2-tailed-correlation-significance
(function).
all-squares
(function).
anova1
(function).
anova2
(function).
anova2r
(function).
average-rank
(function).
beta-incomplete
(function).
binomial-le-probability
(function).
chi-square-1
(function).
chi-square-2
(function).
correlate
(function).
cross-mean
(function).
display
(macro).
dumplot
(function).
error-function
(function).
error-function-complement
(function).
even-power-of-two?
(function).
f-score>p-limit?
(function).
factorial
(function).
find-critical-value
(function).
gamma-incomplete
(function).
gamma-ln
(function).
harmonic-mean
(function).
histovalues
(function).
lmean
(function).
max*
(function).
min*
(function).
n-random
(function).
normalize
(function).
p2
(function).
protected-mean
(function).
pround
(function).
regress
(function).
round-up
(function).
s2
(function).
safe-exp
(function).
sign
(function).
sqr
(function).
standard-error
(function).
sum
(function).
t-p-value
(function).
t1-test
(function).
t1-value
(function).
t2-test
(function).
t2-value
(function).
testanova2
(function).
tukey-q
(function).
underflow-goes-to-zero
(macro).
wilcoxon-1
(function).
x2test
(function).
z/protect
(macro).
Definitions are sorted by export status, category, package, and then by lexicographic order.
Make N equal width bins and count the number of elements of sequence that belong in each.
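The equal-width binning described above can be sketched as follows. This is an illustrative Python version, not the lhstats implementation: the name `bin_and_count`, the argument order, and the choice to place the maximum value in the last bin are all assumptions.

```python
def bin_and_count(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: put everything in the first bin
        else:
            # Assumption: the maximum value is clamped into the last bin.
            index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts
```

For the integers 0 through 10 and five bins, each bin has width 2 and the last bin also receives the maximum value.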
P(X<k) for X a binomial random variable with parameters n &
p. Binomial expectations for fewer than k events in N trials, each
having probability p.
The probability of k or more occurrences in N events, each with probability p.
P(X=k) for X a binomial random variable with parameters n &
p. Binomial expectations for seeing k events in N trials, each having
probability p. Use the Poisson approximation if N>100 and P<0.01.
Confidence intervals on a binomial probability. If a binomial probability of p has been observed in N trials, what is the 1-alpha confidence interval around p? Approximate (using normal theory approximation) when npq >= 10 unless told otherwise.
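The normal theory approximation mentioned above can be sketched in Python as below. This covers only the approximate branch; the exact method used when npq < 10 is not shown, and the function name and argument order are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binomial_probability_ci_normal(n, p, alpha):
    """Approximate 1-alpha confidence interval around an observed probability p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)
```

With n = 100 and p = 0.5, npq = 25, so the approximation applies; the 95% interval is roughly 0.402 to 0.598.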
The significance of a one sample test for the equality of an observed probability p-hat to an expected probability p under a binomial distribution with N observations. Use the normal theory approximation if n*p*(1-p) > 10 (unless the exact flag is true).
Returns the number of subjects needed to test whether an observed probability is significantly different from a particular binomial null hypothesis with a significance alpha and a power 1-beta.
Sample size estimate for the McNemar (discordant pairs) test. Pd is the projected proportion of discordant pairs among all pairs, and Pa is the projected proportion of type A pairs among discordant pairs. alpha, 1-beta and tails are as binomial-test-two-sample-sse.
Returns the number of individuals necessary; that is twice the number of matched pairs necessary.
Are the observed probabilities of an event (p-hat1 and p-hat2) in N1/N2 trials different? The normal theory method is implemented here. The exact test is Fisher’s contingency table method, below.
The number of subjects needed to test if two binomial probabilities are different at a given significance alpha and power 1-beta. The sample sizes can be unequal; the p2 sample is sample-sse-ratio * the size of the p1 sample. It can be a one tailed or two tailed test.
Returns the point which is the indicated percentile in the Chi Square distribution with dof degrees of freedom.
Computes the left hand tail area under the chi square distribution under dof degrees of freedom up to X. Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
This test works on a 2xk table and assesses if there is an increasing or decreasing trend. Arguments are equal-sized lists of counts. Optionally, provide a list of scores, which represent some numeric attribute of the group. If not provided, scores are assumed to be 1 to k.
The significance of a one sample Chi square test for the variance of a normal distribution. Variance is the observed variance, N is the number of observations, and sigma-squared is the test variance.
Takes contingency-table, an RxC array, and returns the significance of the relationship between the row variable and the column variable. Any difference in proportion will cause this test to be significant – consider using the test for trend instead if you are looking for a consistent change.
How many ways to take n things k at a time, when order doesn’t matter.
Convert X from a Normal distribution with mean mu and variance sigma to standard normal
Just r from linear-regression. Also called the Pearson correlation.
Returns the size of a sample necessary to find a correlation of expected value rho with significance alpha and power 1-beta.
Test if two correlation coefficients are different. Uses Fisher’s Z test.
Adopted from CLASP, but changed to handle F < 1 correctly in the
one-tailed case. The ‘f-statistic’ must be a positive number. The
degrees of freedom arguments must be positive integers. The
‘one-tailed-p’ argument is treated as a boolean.
This implementation follows Numerical Recipes in C, section 6.3 and the ‘ftest’ function in section 13.4.
F test for the equality of two variances
A multiple testing correction that is less conservative than Bonferroni.
Takes a list of p-values and a false discovery rate, and returns the
number of p-values that are likely to be good enough to reject the
null at that rate. Returns a second value which is the p-value
cutoff. See
Benjamini Y and Hochberg Y. "Controlling the false discovery rate: a practical and powerful approach to multiple testing." J R Stat Soc Ser B 57: 289-300, 1995.
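A minimal Python sketch of the Benjamini-Hochberg step-up procedure cited above follows. It is not the lhstats code: the name, the `rate` keyword, and in particular the definition of the returned cutoff (taken here as rate * count / m) are assumptions.

```python
def false_discovery_correction(p_values, rate=0.05):
    """Return (number of rejected nulls, p-value cutoff) under Benjamini-Hochberg."""
    ordered = sorted(p_values)
    m = len(ordered)
    count = 0
    for i, p in enumerate(ordered, start=1):
        # Step-up rule: keep the largest rank i whose p-value is under rate * i / m.
        if p <= rate * i / m:
            count = i
    # Assumption: the cutoff reported is the threshold at the last accepted rank.
    cutoff = rate * count / m
    return count, cutoff
```

For the p-values 0.001, 0.008, 0.039, 0.041 and 0.9 at a rate of 0.05, the per-rank thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the first two tests pass.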
Fisher’s exact test. Gives a p value for a particular 2x2 contingency table
Transforms the correlation coefficient to an approximately normal distribution.
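The transform referred to above is the standard Fisher z, z = 0.5 * ln((1 + r) / (1 - r)), which equals the inverse hyperbolic tangent of r. A short Python sketch (the function name is an illustrative analogue, not the lhstats API):

```python
from math import log


def fisher_z_transform(r):
    """Map a correlation coefficient in (-1, 1) to an approximately normal scale."""
    return 0.5 * log((1 + r) / (1 - r))
```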
Computes the regression equation for a least squares fit of a line to a sequence of points (each a list of two numbers, e.g. ’((1.0 0.1) (2.0 0.2))) and report the intercept, slope, correlation coefficient r, R^2, and the significance of the difference of the slope from 0.
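The least squares fit described above can be sketched in Python as follows. The sketch returns the intercept, slope, r and R^2 only; the significance of the slope is omitted, and the name and return order are assumptions rather than the lhstats interface.

```python
from math import sqrt


def linear_regression(points):
    """Fit y = intercept + slope * x to a sequence of (x, y) pairs."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r, r * r
```

The points (1, 3), (2, 5), (3, 7) lie exactly on y = 1 + 2x, so the fit recovers that line with r = 1.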
McNemar’s test for correlated proportions, used for longitudinal studies. Look only at the number of discordant pairs (one treatment is effective and the other is not). If the two treatments are A and B, a-discordant-count is the number where A worked and B did not, and b-discordant-count is the number where B worked and A did not.
A combined calculation that is often useful. Takes a sequence and returns three values: mean, standard deviation and N.
Returns two values: a list of the modes and the number of times they occur.
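The two return values of the mode calculation can be illustrated with this small Python sketch. It is a hypothetical analogue, and the order of the returned modes (first appearance in the input) is an assumption.

```python
from collections import Counter


def mode(sequence):
    """Return (list of modes, number of times each occurs)."""
    counts = Counter(sequence)
    top = max(counts.values())
    modes = [value for value, count in counts.items() if count == top]
    return modes, top
```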
Confidence interval for the mean of a normal distribution
The 1-alpha percent confidence interval on the mean of a normal distribution with parameters mean, sd & n.
The 1-alpha confidence interval on the mean of a sequence of numbers drawn from a Normal distribution.
The probability density function (PDF) for a normal distribution with mean mu and variance sigma at point x.
As normal-variance-ci-on-sequence, but a confidence interval for the standard deviation.
The 1-alpha confidence interval on the variance of a sequence of numbers drawn from a Normal distribution.
How many ways to take n things k at a time, when order matters.
The CDF of the standard normal distribution. Adopted from CLASP 1.4.3, see copyright notice at http://eksl-www.cs.umass.edu/clasp.html
Probability of seeing fewer than K events over a time period when the expected number of events over that time is mu.
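The two counting functions in this manual differ only in whether order matters. A brief Python sketch of both formulas, written out with factorials for clarity (names are illustrative, not the lhstats definitions):

```python
from math import factorial


def permutations(n, k):
    """Ordered selections of k items from n: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)


def choose(n, k):
    """Unordered selections of k items from n: n! / (k! * (n - k)!)."""
    return permutations(n, k) // factorial(k)
```

Each unordered selection of k items corresponds to k! orderings, which is why `choose` divides the permutation count by `factorial(k)`.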
Probability of X or more events when expected is mu.
Confidence interval for the Poisson parameter mu
Given x observations in a unit of time, what is the 1-alpha confidence
interval on the Poisson parameter mu (= lambda*T)?
Since find-critical-value assumes that the function is monotonic increasing, adjust the value we are looking for taking advantage of reflectiveness.
Probability of seeing k events over a time period when the expected number of events over that time is mu.
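The Poisson quantities described above follow from P(X = k) = e^(-mu) * mu^k / k!. This Python sketch shows the point probability, the strict cumulative form (fewer than k), and its complement; argument order is an assumption.

```python
from math import exp, factorial


def poisson_probability(mu, k):
    """P(X = k) when the expected number of events is mu."""
    return exp(-mu) * mu**k / factorial(k)


def poisson_cumulative_probability(mu, k):
    """P(X < k): fewer than k events."""
    return sum(poisson_probability(mu, i) for i in range(k))


def poisson_ge_probability(mu, x):
    """P(X >= x): x or more events."""
    return 1.0 - poisson_cumulative_probability(mu, x)
```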
The significance of a one sample test for the equality of an observed number of events (observed) and an expected number mu under the Poisson distribution. Normal theory approximation is not that great, so don’t use it unless told.
Returns a random number drawn from a normal distribution with the specified mean and standard deviation.
Random selection from sequence
Return a random sample of size N from sequence, without replacement. If N is equal to or greater than the length of the sequence, return the entire sequence.
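Sampling without replacement, including the rule that an oversized request returns the whole sequence, can be sketched in Python as below. The name and argument order are assumptions, and the sketch relies on the standard library's `random.sample` rather than reproducing the lhstats algorithm.

```python
import random


def random_sample(n, sequence):
    """Return n items drawn without replacement, or everything if n is too large."""
    items = list(sequence)
    if n >= len(items):
        return items
    return random.sample(items, n)
```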
Rounds a floating point number to a specified number of digits precision.
Really just a special case of the binomial one sample test with p = 1/2. The normal theory version has a correction factor to make it a better approximation.
Same as sign-test, but takes two sequences and tests whether the entries in one are different (greater or less) than the other.
Spearman rank correlation computes the relationship between a pair of variables when one or both are either ordinal or have a distribution that is far from normal. It takes a list of points (same format as linear-regression) and returns the spearman rank correlation coefficient and its significance.
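One common way to compute the Spearman coefficient is to replace each variable by its ranks (ties share the average rank) and then take the Pearson correlation of the ranks. The Python sketch below follows that approach; it may differ from the lhstats implementation, it omits the significance value, and all names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start..end, 1-based
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rank_correlation(points):
    """Spearman coefficient for a sequence of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return pearson(average_ranks(xs), average_ranks(ys))
```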
Returns the point which is the indicated percentile in the T distribution with dof degrees of freedom. Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Lookup table in Rosner; this is adopted from CLASP/Numerical Recipes (CLASP 1.4.3), http://eksl-www.cs.umass.edu/clasp.html
The significance of a one sample T test for the mean of a normal
distribution with unknown variance. X-bar is the observed mean, sd is
the observed standard deviation, N is the number of observations and
mu is the test mean.
See also t-test-one-sample-on-sequence
As t-test-one-sample, but calculates the observed values from a sequence of numbers.
Returns the number of subjects needed to test whether the mean of a normally distributed sample mu is different from a null hypothesis mean mu-null and variance variance, with alpha, 1-beta and tails as specified.
The significance of a paired t test for the means of two normal distributions in a longitudinal study. D-bar is the mean difference, sd is the standard deviation of the differences, N is the number of pairs.
The significance of a paired t test for means of two normal distributions in a longitudinal study. Before is a sequence of before values, after is the sequence of paired after values (which must be the same length as the before sequence).
Returns the number of subjects needed for a paired t test on differences with mean difference-mu and variance difference-variance, with alpha, 1-beta and tails as specified.
The significance of the difference of two means (x-bar1 and x-bar2) with standard deviations sd1 and sd2, and sample sizes n1 and n2 respectively. The form of the two sample t test depends on whether the sample variances are equal or not. If the variable variances-equal? is :test, then we use an F test and the variance-significance-cutoff to determine if they are equal. If the variances are equal, then we use the two sample t test for equal variances. If they are not equal, we use the Satterthwaite method, which has good type I error properties (at the loss of some power).
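The two branches described above (pooled variance versus Satterthwaite) differ in the standard error and the degrees of freedom. This Python sketch computes the t statistic and degrees of freedom for each branch from the summary statistics. It stops short of the p-value, does not implement the :test option that runs an F test first, and leaves the Satterthwaite degrees of freedom unrounded; the name and signature are assumptions.

```python
from math import sqrt


def two_sample_t(x_bar1, sd1, n1, x_bar2, sd2, n2, variances_equal):
    """Return (t statistic, degrees of freedom) for a two sample t test."""
    if variances_equal:
        # Pooled variance estimate, weighted by each sample's degrees of freedom.
        pooled = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
        t = (x_bar1 - x_bar2) / sqrt(pooled * (1 / n1 + 1 / n2))
        dof = n1 + n2 - 2
    else:
        # Satterthwaite approximation for unequal variances.
        v1, v2 = sd1**2 / n1, sd2**2 / n2
        t = (x_bar1 - x_bar2) / sqrt(v1 + v2)
        dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, dof
```

When the two samples have the same size and standard deviation, both branches give the same t and the same degrees of freedom, which is a convenient sanity check.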
Same as t-test-two-sample, but providing the sequences rather than the summaries.
Returns the number of subjects needed to test whether the mean mu1 of a normally distributed sample (with variance variance1) is different from a second sample with mean mu2 and variance variance2, with alpha, 1-beta and tails as specified. It is also possible to set a sample size ratio of sample 1 to sample 2.
A test on the ranking of positive and negative differences (are the
positive differences significantly larger/smaller than the negative
ones). Assumes a continuous and symmetric distribution of differences,
although not a normal one. This is the normal theory approximation,
which is only valid when N > 15.
This test is completely equivalent to the Mann-Whitney test.
The inverse normal function, P(X<Zu) = u where X is distributed as the standard normal. Uses binary search.
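Because the standard normal CDF is monotonically increasing, its inverse can be found by bisection, as the description above states. A minimal Python sketch, assuming a search bracket of [-10, 10] and a fixed tolerance (both arbitrary choices, as are the names):

```python
from math import erf, sqrt


def phi(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def z(percentile, lo=-10.0, hi=10.0, tolerance=1e-10):
    """Find Zu such that P(X < Zu) = percentile, for 0 < percentile < 1."""
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if phi(mid) < percentile:
            lo = mid  # target lies above the midpoint
        else:
            hi = mid  # target lies at or below the midpoint
    return (lo + hi) / 2.0
```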
The significance of a one sample Z test for the mean of a normal
distribution with known variance.
mu is the null hypothesis mean, x-bar is the observed mean, sigma is the standard deviation and N is the number of observations. If tails is :both, returns the significance of a difference between x-bar and mu. If tails is :positive, returns the significance of x-bar being greater than mu, and if tails is :negative, the significance of x-bar being less than mu.
Returns a p value.
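The one sample Z test described above reduces to z = (x-bar - mu) / (sigma / sqrt(N)) followed by a lookup in the standard normal CDF. In this Python sketch the three tails options are modeled as strings, and the signature and defaults are assumptions rather than the lhstats interface.

```python
from math import erf, sqrt


def phi(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def z_test(x_bar, n, mu=0.0, sigma=1.0, tails="both"):
    """Return the p value of a one sample Z test with known sigma."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    if tails == "both":
        return 2.0 * (1.0 - phi(abs(z)))
    if tails == "positive":
        return 1.0 - phi(z)  # is x_bar greater than mu?
    if tails == "negative":
        return phi(z)  # is x_bar less than mu?
    raise ValueError("tails must be 'both', 'positive' or 'negative'")
```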
Protects against floating point underflow errors and sets the value to 0.0 instead.
Macro to protect from division by zero.
We use the first line for anything less than 5, and the last line for anything over 500. Otherwise, find the nearest value (maybe we should interpolate ... too much bother!)
One way simple ANOVA, from Neter, et al. p677+. Data is given as a list of lists, each one representing a treatment, and each containing the observations.
Two-Way Anova. (From Misanin & Hinderliter, 1991, p. 367-) This is specialized for four groups of equal n, called by their plot location names: left1 left2 right1 right2.
Two way ANOVA with repeated measures on one dimension. From Ferguson & Takane, 1989, p. 359. Data is organized differently for this test. Each group (g1 g2) contains a list of all subjects’ repeated measures, and same for B. So, A: ((t1s1g1 t2s1g1 ...) (t1s2g2 t2s2g2 ...) ...) Have to have the same number of test repeats for each subject, and this assumes the same number of subjects in each group.
Average rank calculation for non-parametric tests. Ranks are 1 based, but lisp is 0 based, so add 1!
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Correlation of two sequences, as in Ferguson & Takane, 1989, p. 125. Assumes NO MISSING VALUES!
Cross mean takes a list of lists, as ((1 2 3) (4 3 2 1) ...) and
produces a list with mean and standard error for each VERTICAL entry,
so, as: ((2.5 . 1) ...) where the first pair is computed from the nth
1 of all the sublists in the input set, etc. This is useful in some
cases of data crunching.
Note that missing data is assumed to be always at the END of lists. If it isn’t, you’ve got to do something previously to interpolate.
A dumb terminal way of plotting data.
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
Adopted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
See: http://mathworld.wolfram.com/HarmonicMean.html
Take a set of values and produce a histogram binned into n groups, so that you can get a report of the distribution of values. There’s a large chance for off-by-one errors here!
Lmean takes the mean of entries in a list of lists vertically. So: (lmean ’((1 2) (5 6))) -> (3 4) The args have to be the same length.
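The column-wise mean in the example above, (lmean ’((1 2) (5 6))) -> (3 4), can be mirrored in Python with `zip`. This is an illustrative analogue, not the Lisp source.

```python
def lmean(lists):
    """Mean of each position across equal-length lists."""
    return [sum(column) / len(column) for column in zip(*lists)]
```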
Select n random sublists from a list, without replacement. This copies the list and then destroys the copy. N better be less than or equal to (length l).
Normalize a vector by subtracting its min and then dividing through by its range (max-min). If the numbers are all the same, this would screw up, so we check that first and just return a long list of 0.5 if so!
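The min-max scaling described above, including the guard for constant input, can be sketched in Python as follows (a hypothetical analogue of the Lisp function):

```python
def normalize(values):
    """Scale values to [0, 1]; constant input maps to a list of 0.5."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Range is zero, so avoid dividing by it.
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```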
Computes a mean protected where there will be a divide by zero, and gives us n/a in that case.
Returns a string that is rounded to the appropriate number of digits, but the only thing you can do with it is print it. It’s just a convenience hack for rounding recursive lists.
Simple linear regression.
Eliminates floating point underflow for the exponential function. Instead, it just returns 0.0d0
One way t-test to see if a group differs from a numerical mean target value. From Misanin & Hinderliter p. 248.
T2-test calculates an UNPAIRED t-test.
From Misanin & Hinderliter p. 268. The t-cdf part is inherent in xlispstat, and I’m not entirely sure that it’s really the right computation since it doesn’t agree entirely with Table 5 of M&H, but it’s close, so I assume that M&H have round-off error.
Finds the Q table for the appropriate K, and then walks BACKWARDS through it (in a kind of ugly way!) to find the appropriate place in the table for the DFwg, and then uses the level (which must be 0.01 or 0.05, indicating the first, or second col of the table) to determine if the Q value reaches significance, and gives us a + or - final result.
Nonparametric one-sample (signed) rank test (Wilcoxon).
From http://www.graphpad.com/instatman/HowtheWilcoxonranksumtestworks.htm
Simple Chi-Squares From Clarke & Cooke p. 431; should = ~7.0