Foundational statistical techniques
Statistical analysis is a method of aggregating numeric data and drawing inferences about variables. Statistical procedures may be broadly classified into (1) statistics that describe data—descriptive statistics; and (2) statistics that make inferences about more general situations beyond the actual data set—inferential statistics.
Descriptive statistics
Descriptive statistics aggregate data that are grouped into variables to examine typical values and the spread of values for each variable in a data set. Statistics summarising typical values are referred to as measures of central tendency and include the mean, median and mode. The spread of values is represented through measures of variability, including the variance, SD and range. Together, descriptive statistics provide indicators of the distribution of data, that is, the frequency of values across the data set, as in a histogram plot. Table 1 summarises commonly used descriptive statistics. For consistency, I use the terms independent variable and dependent variable, but in some fields and types of research, such as correlational studies, the preferred terms may be predictor and outcome variable. An independent variable influences, affects or predicts a dependent variable.
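As a minimal illustration, these summary statistics can be computed with Python's built-in statistics module; the variable name systolic_bp and its values below are hypothetical.

```python
import statistics

# Hypothetical values for a single continuous variable (eg, systolic blood pressure)
systolic_bp = [118, 122, 125, 118, 130, 141, 118, 127]

# Measures of central tendency
mean = statistics.mean(systolic_bp)      # arithmetic average
median = statistics.median(systolic_bp)  # middle value
mode = statistics.mode(systolic_bp)      # most frequent value

# Measures of variability
variance = statistics.variance(systolic_bp)   # sample variance
sd = statistics.stdev(systolic_bp)            # sample SD
value_range = max(systolic_bp) - min(systolic_bp)

print(mean, median, mode, variance, sd, value_range)
```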
Inferential statistics: comparing groups with t tests and ANOVA
Inferential statistics are another broad category of techniques that go beyond describing a data set. Inferential statistics can help researchers draw conclusions from a sample to a population.1 We can use inferential statistics to examine differences among groups and the relationships among variables. Table 2 presents a menu of common, fundamental inferential tests. Remember that even more complex statistics rely on these as a foundation.
The t test is used to compare two group means by determining whether the difference between groups is likely to have occurred by chance or reflects a systematic, real difference. Two common forms are the independent samples t test, which compares the means of two unrelated groups, such as a treatment group relative to a control group, and the paired samples t test, which compares the means of related groups, such as the pretest and post-test scores of the same individuals before and after a treatment. A t test essentially determines whether the difference in means between groups is larger than the variability within the groups themselves.
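A brief sketch of both forms of the t test, assuming the SciPy library is available; the group names and scores below are hypothetical.

```python
from scipy import stats

# Hypothetical scores for a treatment group and a control group (unrelated groups)
treatment = [24, 27, 31, 29, 26, 30]
control = [22, 25, 23, 27, 24, 21]

# Independent samples t test: compares means of two unrelated groups
t_ind, p_ind = stats.ttest_ind(treatment, control)

# Hypothetical pretest and post-test scores for the same individuals
pretest = [55, 61, 58, 64, 59, 62]
posttest = [60, 66, 61, 70, 63, 65]

# Paired samples t test: compares means of related (repeated) measurements
t_paired, p_paired = stats.ttest_rel(pretest, posttest)

print(t_ind, p_ind)
print(t_paired, p_paired)
```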
Another fundamental set of inferential statistics falls under the general linear model and includes analysis of variance (ANOVA), correlation and regression. To determine whether group means are different, use the t test or the ANOVA. Note that the t test is limited to two groups, but the ANOVA is applicable to two or more groups. For example, an ANOVA could examine whether a primary outcome measure (the dependent variable) is significantly different for groups assigned to one of three different interventions. The ANOVA yields an F statistic along with a p value or confidence interval (CI), which indicates whether there is a significant difference somewhere among the groups. We then need other statistics (eg, planned comparisons or a Bonferroni comparison, to give two possibilities) to determine which of those groups differ significantly from one another. Planned comparisons are established before conducting the analysis to contrast the groups, while other tests like the Bonferroni comparison are conducted post hoc (ie, after the analysis).
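A minimal sketch of a one-way ANOVA followed by Bonferroni-corrected pairwise comparisons, again assuming SciPy; the three intervention groups and their scores are hypothetical.

```python
from itertools import combinations
from scipy import stats

# Hypothetical outcome scores for three intervention groups
groups = {
    'intervention_a': [14, 18, 16, 19, 17],
    'intervention_b': [21, 24, 22, 25, 23],
    'intervention_c': [15, 17, 16, 18, 14],
}

# One-way ANOVA: is there some significant difference among the group means?
f_stat, p_value = stats.f_oneway(*groups.values())
print(f'F = {f_stat:.2f}, p = {p_value:.4f}')

# Post-hoc pairwise t tests with a Bonferroni correction
pairs = list(combinations(groups, 2))
for name_a, name_b in pairs:
    t, p = stats.ttest_ind(groups[name_a], groups[name_b])
    p_adjusted = min(p * len(pairs), 1.0)  # Bonferroni: multiply p by the number of comparisons
    print(name_a, name_b, round(p_adjusted, 4))
```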
Examining relationships using correlation and regression
The general linear model contains two other major methods of analysis, correlation and regression. Correlation reveals whether values of two variables tend to change together systematically. Correlation analysis has three general outcomes: (1) the two variables rise and fall together; (2) as values in one variable rise, values in the other fall; and (3) the two variables do not appear to be systematically related. To make those determinations, we use the correlation coefficient (r) and its associated p value or CI. First, use the p value or CI, compared with an established significance criterion (eg, p<0.05), to determine whether the relationship is statistically significant. If it is not, stop; there is no point in examining the coefficient. If it is, move on to the correlation coefficient.
A correlation coefficient provides two very important pieces of information: the strength and the direction of the relationship. An r statistic can range from −1.0 to +1.0. Strength is determined by how close the value is to −1.0 or +1.0; either extreme indicates a perfect relationship, while a value of 0 indicates no relationship. Cohen provides guidance for interpretation: 0.1 is a small correlation, 0.3 is a medium correlation and 0.5 is a large correlation.1 2 These interpretations must be considered in the context of the study and relative to the literature. The valence (+ or −) of the coefficient reveals the direction of the relationship: a negative coefficient means that as one value rises, the other tends to fall, while a positive coefficient means that the values of the two variables tend to rise and fall together.
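The two-step reading described above (check significance, then interpret strength and direction) might look like this in Python, assuming SciPy; the variables hours_of_training and assessment_score are hypothetical.

```python
from scipy import stats

# Hypothetical paired observations for two continuous variables
hours_of_training = [2, 4, 5, 7, 8, 10, 12]
assessment_score = [55, 60, 62, 68, 70, 74, 80]

# Pearson correlation: coefficient (r) and p value
r, p = stats.pearsonr(hours_of_training, assessment_score)

# Step 1: check significance; Step 2: interpret strength and direction of r
if p < 0.05:
    print(f'r = {r:.2f}, p = {p:.4f}: significant relationship')
else:
    print(f'p = {p:.4f}: no statistically significant relationship')
```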
Regression adds a layer beyond correlation that allows predicting one value from another. Assume we are trying to predict a dependent variable (Y) from an independent variable (X). Simple linear regression gives an equation (Y = b0 + b1X) for a line that we can use to predict one value from another. The three major components of that prediction are the constant (ie, the intercept, represented by b0), the systematic explanation of variation (the slope, b1), and the error, a residual value not accounted for in the equation3 but available as part of the regression output. To assess a regression model (ie, model fit), examine key pieces of the regression output: (1) the F statistic and its significance, to determine whether the model systematically accounts for variance in the dependent variable; (2) the R squared value, a measure of how much variance in the dependent variable is accounted for by the model; (3) the significance of the coefficient for each independent variable in the model; and (4) the residuals, to examine the random error in the model. Other factors, such as outliers, are also potentially important (see Field4).
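A sketch of a simple linear regression, assuming the statsmodels library is available; it prints the four pieces of output listed above. The values of x and y are hypothetical.

```python
import statsmodels.api as sm

# Hypothetical independent (X) and dependent (Y) variables
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

# Add the constant term (b0, the intercept) to the model
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

print(model.fvalue, model.f_pvalue)  # (1) F statistic and its significance
print(model.rsquared)                # (2) R squared: variance accounted for
print(model.params, model.pvalues)   # (3) coefficients b0, b1 and their significance
print(model.resid)                   # (4) residuals (error not accounted for)
```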
The aforementioned inferential tests are foundational to many other advanced statistics that are beyond the scope of this article. Inferential tests rely on foundational assumptions, including that data are normally distributed, observations are independent, and generally that our dependent or outcome variable is continuous. When data do not meet these assumptions, we turn to non-parametric statistics (see Field4).
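As one possible illustration, the normality assumption can be checked with a Shapiro-Wilk test and, if it fails, a non-parametric alternative such as the Mann-Whitney U test can be substituted for the independent samples t test; SciPy is assumed and the scores below are hypothetical.

```python
from scipy import stats

# Hypothetical, clearly skewed scores for two unrelated groups
group_a = [1, 1, 2, 2, 3, 3, 4, 20]
group_b = [2, 3, 3, 4, 5, 6, 7, 35]

# Shapiro-Wilk test of the normality assumption (p < 0.05 suggests non-normal data)
_, p_a = stats.shapiro(group_a)
_, p_b = stats.shapiro(group_b)

if p_a < 0.05 or p_b < 0.05:
    # Non-parametric alternative to the independent samples t test
    stat, p = stats.mannwhitneyu(group_a, group_b)
else:
    stat, p = stats.ttest_ind(group_a, group_b)

print(p)
```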
A brief history of foundational statistics
Prominent statisticians Karl Pearson and Ronald A Fisher developed and popularised many of the basic statistics that remain a foundation for statistics today. Fisher’s ideas formed the basis of null hypothesis significance testing, which sets a criterion for confidence or probability of an event.4 Among his contributions, Fisher also developed the ANOVA. Pearson’s correlation coefficient provides a way to examine whether two variables are related. The correlation coefficient is denoted by r for a relationship between two variables or R for relationships among more than two variables, as in multiple correlation or regression.4 William Gosset developed the t distribution and later the t test as a way to examine whether two means were statistically different.5
Statistical software
While the aforementioned statistics can be calculated manually, researchers typically use statistical software that processes data, calculates statistics and p values, and supplies a summary output from the analysis. However, these programs still require an informed researcher to run the correct analysis and interpret the output. Commonly used programs include SAS, Stata, SPSS and R. Try the programs through a demonstration or trial period before deciding which one to use. It also helps to know, or have access to, others using the program should you have questions.
Example study
The remainder of this article presents steps in statistical analysis that apply to many techniques. A recently published study on communication skills for breaking bad news to a patient with cancer provides an exemplar to illustrate these steps.6 In that study, the team examined the validity of a competence assessment of communication skills, hypothesising that post-test scores would be significantly improved over pretest scores on the same measure after training. A second analysis examined pretest sensitisation, testing the hypothesis that a group randomly assigned to receive both a pretest and a post-test would not differ significantly from a post-test-only group. To test the hypotheses, Guetterman et al6 examined whether mean differences were statistically significant by applying t tests and ANOVA.