## Summary of Statistical Tests
March 2008 (Ph.D. student)
Summary
As I attempt to learn statistics to assist my research, I've found it useful to document the various tests that I've learned. My goal in creating this page is to provide a quick summary of what each test can be used for and when it can be applied. I'm not a statistics expert by any means, so please email me if there are errors.
## Testing whether the population mean is equal to some value
You collect a simple random sample from a population of ratio-level or interval-level values.
## Any population distribution (more realistic scenario)
You must have a large sample size (N >= 30) to proceed. You can use the
## Normally-distributed population
- Population standard deviation unknown
  - Large sample size (N >= 30)
    - **z-test** (approximate but easier to calculate)
    - **t-test** (exact but might be harder to calculate using tables)
  - Small sample size (N < 30)
    - **t-test** (exact)
- Population standard deviation known (unlikely)
  - **z-test** (exact) - this is the ideal case but unlikely to actually occur
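
As a rough illustration of the one-sample case, here is a minimal sketch using only Python's standard library. The function names and the data are made up for this example. The statistic is the same for the z-test and the t-test when the population standard deviation is unknown; the difference is which distribution you compare it against. The standard library has no t distribution, so this sketch only computes the large-sample (normal) p-value, and for a t-test you would look up the statistic in a t table with n - 1 degrees of freedom.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_test_statistic(sample, mu0):
    """Return (sample mean - mu0) / (s / sqrt(n)), using the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / standard_error


def z_two_sided_p_value(statistic):
    """Two-sided p-value from the standard normal distribution (large-sample approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(statistic)))


# Hypothetical data: 36 measurements, testing H0: population mean = 5.0
sample = [5.1, 4.9, 5.6, 5.3, 4.7, 5.4] * 6
stat = one_sample_test_statistic(sample, mu0=5.0)
print(f"statistic = {stat:.3f}, approximate p = {z_two_sided_p_value(stat):.4f}")
```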
## Testing whether the means of two populations are identical, given two sets of independent samples
You collect two sets of independent simple random samples from populations of ratio-level or interval-level values (e.g., subjects are randomly picked to receive one of two experimental treatments).
## Any population distribution (more realistic scenario)
- Large sample size (N >= 30)
  - **two-sample z-test**
  - **Mann-Whitney U test** a.k.a. *Wilcoxon rank-sum test* (non-parametric, and will even work on ordinal-level measurements) - tests whether two samples are drawn from the same population (and, by implication, their distributions and means are equal)
- Small sample size (N < 30)
  - **Mann-Whitney U test** a.k.a. *Wilcoxon rank-sum test*
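
To make the rank-based idea behind the Mann-Whitney U test concrete, here is a small sketch in plain Python. The helper names and ratings are hypothetical. The U statistic itself is computed exactly, but the p-value uses a normal approximation without a tie correction, which I would only trust for samples that are not too small; for very small samples you would compare U against a table of exact critical values instead.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def mann_whitney_u(x, y):
    """Return (U statistic for x, approximate two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Normal approximation to the null distribution of U (no tie correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical ordinal ratings from two independent groups
group_a = [3, 4, 2, 5, 4, 3, 5, 4]
group_b = [2, 1, 3, 2, 3, 1, 2, 4]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, approximate p = {p:.4f}")
```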
## Normally-distributed population
- Population standard deviations unknown but assumed to be equal (called *homogeneity of variances*)
  - Large sample size (N1 >= 30, N2 >= 30)
    - **two-sample z-test**
  - Small sample size (N1 < 30, N2 < 30)
    - **two-sample t-test**
- Population standard deviations known (unlikely)
  - **two-sample z-test**
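
For the equal-variances case, the two-sample statistic pools the two sample variances. The sketch below shows one way to compute it; the function name and data are invented for illustration. It returns the statistic and its degrees of freedom, which you would then compare against a t table (small samples) or the standard normal distribution (large samples).

```python
import math
from statistics import mean, variance


def pooled_two_sample_statistic(x, y):
    """Return (statistic, degrees of freedom), assuming equal population variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, df


# Hypothetical measurements under two treatments
treatment_a = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]
treatment_b = [11.5, 11.9, 11.2, 11.7, 11.4, 11.8]
stat, df = pooled_two_sample_statistic(treatment_a, treatment_b)
print(f"statistic = {stat:.3f} with {df} degrees of freedom")
```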
## Testing whether one population has values that are consistently greater than or less than those of the other population, given two sets of paired samples
You collect a set of paired samples from two populations of ratio-level or interval-level values (e.g., each subject is given both experimental treatments).
## Any population distribution (more realistic scenario)
The following non-parametric tests can even be used for ordinal-level values.
- **Wilcoxon signed-rank test** - tests whether the *median* difference between pairs of observations is zero
- **Sign test** - tests whether there are equal numbers of pairs of observations that exhibit increases and decreases in value
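
The sign test is simple enough to sketch in a few lines. This is my own minimal version with made-up names and data: it counts increases and decreases, drops ties, and computes an exact two-sided p-value from the Binomial(n, 0.5) distribution.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test on paired observations; tied pairs are dropped."""
    increases = sum(a > b for b, a in zip(before, after))
    decreases = sum(a < b for b, a in zip(before, after))
    n = increases + decreases
    k = min(increases, decreases)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return increases, decreases, min(1.0, 2 * tail)


# Hypothetical scores for the same subjects under two treatments
before = [3, 5, 4, 6, 2, 5, 4, 3]
after = [4, 6, 4, 7, 3, 4, 6, 5]
ups, downs, p = sign_test(before, after)
print(f"{ups} increases, {downs} decreases, p = {p:.4f}")
```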
## The population of differences between pairs is normally-distributed
That's right, you read that correctly! The following test assumes that the differences between the pairs in the population are normally distributed, which might make it difficult to apply.
- **paired t-test** - tests whether the *mean* difference between pairs of observations is zero
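
One way to see why the assumption is about the differences: the paired t-test is just a one-sample t-test on the per-pair differences, testing whether their mean is zero. A minimal sketch (hypothetical function name and data) that returns the statistic and its degrees of freedom for lookup in a t table:

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t statistic, degrees of freedom) for the per-pair differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical measurements on the same four subjects under two treatments
t, df = paired_t_statistic([1, 2, 3, 4], [2, 4, 4, 6])
print(f"t = {t:.3f} with {df} degrees of freedom")
```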
## Testing whether the means of more than two populations are identical
You collect the same ratio-level or interval-level measurement (e.g., people's heights) from several different groups and want to determine whether the means of all of the groups' respective populations are identical.
## Any population distribution (more realistic scenario)
The following non-parametric test can even be used for ordinal-level values, but it assumes that the observations in each group come from distributions with the same shape.
- **Kruskal-Wallis test** - tests whether the mean *ranks* of samples are identical (not exactly the same as testing whether the means are identical)
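
To show what "mean ranks" means in practice, here is a sketch of the Kruskal-Wallis H statistic in plain Python. The names and data are hypothetical, and this version applies no correction for ties. Under the null hypothesis, H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, so you would compare the result against a chi-square table.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis_h(*groups):
    """Return (H statistic, degrees of freedom); no tie correction is applied."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical measurements from three groups
h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f} with {df} degrees of freedom")
```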
## Normally-distributed and homoscedastic population
Homoscedastic means that the within-group variances for all groups are identical (e.g., the variance in heights within each group of people is identical).
- **One-way ANOVA** - tests whether the means of all populations are identical
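
The one-way ANOVA statistic compares the variation between group means to the variation within groups. A minimal sketch, with an invented function name and data, that returns the F statistic and its two degrees-of-freedom values for lookup in an F table:

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical measurements from three groups
f, df1, df2 = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```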
## Testing whether two variables are correlated
You collect pairs of ratio-level or interval-level measurements from a population, where each of the two elements in each pair measures a different property (e.g., height and weight).
- **Pearson correlation test** - tests the degree of *linear* correlation
- **Spearman rank correlation test** (non-parametric, and will even work on ordinal-level measurements) - tests the degree of (not necessarily linear) correlation
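
The difference between the two coefficients is easy to see in code: Spearman's coefficient is Pearson's coefficient computed on ranks instead of raw values. The sketch below uses made-up helper names and a deliberately non-linear but monotonic data set. It only computes the coefficients, not the significance tests.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship (y = x squared)
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.4f}, Spearman rho = {spearman_rho(x, y):.4f}")
```

Here the Spearman coefficient is 1 because the relationship is perfectly monotonic, while the Pearson coefficient is slightly below 1 because the relationship is not linear.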
## Testing whether the observed frequencies of categorical (nominal) variables deviate significantly from their expected frequencies
The following tests can be computationally expensive, so they are not recommended for N > 1000:
- **Exact binomial test** - can only be used to test the frequencies of two categorical values such as male vs. female (use the exact multinomial test for > 2 values)
- **Randomization test** - should give nearly the same result as the exact test if run enough times, but is intuitively easier to explain
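
Here is a sketch of both approaches for the two-category case, using only the standard library. The function names, the default of 20000 trials, and the fixed seed are my own choices for illustration. Note that the two functions define "at least as extreme" slightly differently (by probability vs. by distance from the expected count); these definitions coincide when the expected proportion is 0.5, as in the example, but can differ otherwise.

```python
import random
from math import comb


def exact_binomial_p(successes, n, p0=0.5):
    """Two-sided p-value: total probability of outcomes no more likely than the observed one."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # The small tolerance guards against floating-point ties being missed.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


def randomization_p(successes, n, p0=0.5, trials=20000, seed=0):
    """Estimate the p-value by simulating n draws under the null hypothesis many times."""
    rng = random.Random(seed)
    expected = n * p0
    observed_gap = abs(successes - expected)
    hits = 0
    for _ in range(trials):
        simulated = sum(rng.random() < p0 for _ in range(n))
        if abs(simulated - expected) >= observed_gap:
            hits += 1
    return hits / trials


# Hypothetical sample: 8 of 10 subjects fall in the first category; H0 says 50/50
print(f"exact p = {exact_binomial_p(8, 10):.4f}")
print(f"randomization p = {randomization_p(8, 10):.4f}")
```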
These tests require that the expected counts in each category not be too small (a smallest expected count greater than 5 will suffice):
- **Pearson's chi-square test** - can be used to test the frequencies of two or more categorical values
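
The chi-square statistic itself is a one-liner. The sketch below (hypothetical function name and data) also enforces the rule of thumb above by refusing to run when the smallest expected count is 5 or less; raising an error is just my choice for the example. The result would be compared against a chi-square table with (number of categories - 1) degrees of freedom.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for matching category counts."""
    if min(expected) <= 5:
        raise ValueError("expected counts are too small for the chi-square approximation")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Hypothetical data: 60 rolls of a die, expecting 10 of each face
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
stat, df = chi_square_goodness_of_fit(observed, expected)
print(f"chi-square = {stat:.2f} with {df} degrees of freedom")
```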
## Testing whether the proportions in two different groups are identical
You have two categorical variables, each of which has two or more possible values.
- **Chi-square test of independence** - doesn't work well if the smallest expected count is too small, say less than 5
- **Fisher's exact test of independence**
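
For the simplest case of a 2x2 table, Fisher's exact test can be sketched with the hypergeometric distribution. This is a minimal version with invented names and data that handles only 2x2 tables (larger tables need more machinery). It sums the probabilities of all tables with the same row and column totals that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]] with its margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    # The small tolerance guards against floating-point ties being missed.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical 2x2 table: rows are groups, columns are outcomes
p = fisher_exact_2x2(3, 1, 1, 3)
print(f"p = {p:.4f}")
```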
## Sources
- The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith
- Schaum's Outline of Elements of Statistics II: Inferential Statistics by Stephen Bernstein and Ruth Bernstein
- HyperStat Online
- Handbook of Biological Statistics
- Choosing a statistical test - (way better than my lame attempt here!)
Created: 2008-03-28
Last modified: 2008-03-30