What’s the appropriate test for my cross tabulation?

Last Updated on July 15, 2020 by Ayla Myrick

R Andrews, PhD


Let’s start by defining cross tabulation, more commonly called a crosstab. A cross tabulation, also known as a contingency table, is a two-dimensional table that reports the number of participants (i.e., observations) whose characteristics fall into each cell of the table. It is widely used in both market research and scientific research. It represents the relationship between two categorical variables, which can be nominal or ordinal. The categories of a nominal variable can’t be ranked or ordered (e.g., gender), whereas the categories of an ordinal variable can (e.g., level of education).
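To make the definition concrete, here is a minimal Python sketch that builds a crosstab from raw observations. The data and the helper name crosstab are made up for illustration; in practice a statistics package would do this for you.

```python
from collections import Counter

# Hypothetical raw data: one (gender, smoking status) pair per participant.
observations = [
    ("male", "non-smoker"), ("female", "non-smoker"),
    ("male", "regular smoker"), ("female", "occasional smoker"),
    ("female", "non-smoker"), ("male", "non-smoker"),
    ("male", "occasional smoker"), ("female", "regular smoker"),
]

def crosstab(pairs):
    """Return (row_labels, col_labels, table); table[i][j] is a cell count."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    table = [[counts[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, table

row_labels, col_labels, table = crosstab(observations)
print("columns:", col_labels)
for label, cell_counts in zip(row_labels, table):
    print(label, cell_counts)
```

Each cell holds the number of participants with that combination of categories, and the cell counts add up to the sample size.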

The most widely used statistical test for a cross tabulation of two categorical variables (nominal or ordinal) is the Chi-square test of independence. The null hypothesis for this test is that the two categorical variables are statistically independent. For example, if you have a contingency table of gender (male, female) by smoking (non-smoker, occasional smoker, regular smoker), you could use the Chi-square test to test H0: there is no association between gender and smoking, versus Ha: there is an association between gender and smoking.
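As a rough illustration of what the Chi-square test computes, the sketch below compares each observed count with the count expected under independence. The counts are invented, the function names are my own, and the p-value helper is a hand-rolled formula that assumes integer degrees of freedom; a statistics package would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # closed form for df = 1
    while a < df / 2.0:                         # step up two df at a time
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_independence(table):
    """Pearson chi-square test for an r x c table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts. Rows: male, female.
# Columns: non-smoker, occasional smoker, regular smoker.
smoking_table = [[60, 25, 15],
                 [70, 20, 10]]
stat, df, p_value = chi_square_independence(smoking_table)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

A p-value below the chosen significance level (commonly 0.05) would lead you to reject H0 and conclude that the two variables are associated.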

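Fisher’s exact test is useful when some cells have very low frequencies (even zero), either because the sample is small or because a category occurs rarely (e.g., a particular type of complication during surgery). In this case the Chi-square test would not be appropriate, because its p-value relies on a large-sample approximation that becomes unreliable when expected cell counts are small (a common rule of thumb is below 5).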
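For a 2×2 table, Fisher's exact test can be sketched in a few lines. The version below assumes the common two-sided definition (add up the probabilities of all tables that are no more likely than the observed one, with the margins held fixed). The function name and the counts are illustrative only.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so that ties with the observed table are included.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical small study. Rows: treatment A, treatment B.
# Columns: complication, no complication.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

Because the p-value is computed exactly from the hypergeometric distribution, it does not depend on the large-sample approximation behind the Chi-square test.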

However, the Chi-square test will only tell you whether the relationship between two categorical variables is statistically significant. Other statistics measure the strength of the association as well. Lambda, used with nominal variables, ranges from 0 (knowing one variable does not improve prediction of the other) to 1 (perfect association). You can interpret it as the proportional reduction in error when predicting one variable once you know the value of the other. One potential problem with Lambda is that it tends to underestimate the relationship (it can equal 0 even when the variables are associated), which is why it is recommended to use it together with the Chi-square test.
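The sketch below shows one way to compute Lambda, assuming the asymmetric Goodman-Kruskal version that predicts the column variable from the row variable. Both tables are hypothetical. The first is chosen to show how Lambda can be 0 even though the cell proportions differ between rows.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable from the row variable."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    # Prediction errors when always guessing the overall modal column.
    errors_without = n - max(col_totals)
    # Prediction errors when guessing the modal column within each row.
    errors_with = n - sum(max(row) for row in table)
    if errors_without == 0:
        raise ValueError("lambda is undefined when only one column is used")
    return (errors_without - errors_with) / errors_without

# Hypothetical gender-by-smoking counts: the modal column is the same in
# both rows, so knowing the row never changes the best guess.
lambda_smoking = goodman_kruskal_lambda([[60, 25, 15],
                                         [70, 20, 10]])

# Hypothetical table where the modal column differs between rows.
lambda_strong = goodman_kruskal_lambda([[30, 10],
                                        [5, 35]])
print(lambda_smoking, round(lambda_strong, 3))
```

A value such as 0.57 would be read as a 57% reduction in prediction errors from knowing the row category.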

When both variables are ordinal, the following measures quantify the strength and direction of the relationship, and each can be tested for significance: Gamma, Somers’ D, and Kendall’s tau.
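All three ordinal measures are built from counts of concordant and discordant pairs of observations. The sketch below assumes the rows and columns are already in increasing order, treats the column variable as the dependent one for Somers' D, and uses the tau-b variant of Kendall's tau. The table is invented and the function name is my own. It returns point estimates only; significance testing is left out.

```python
import math

def ordinal_association(table):
    """Gamma, Somers' D (columns dependent) and Kendall's tau-b for an
    r x c table of counts with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = tied_row = tied_col = 0
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            for r2 in range(r1, n_rows):
                for c2 in range(n_cols):
                    pairs = table[r1][c1] * table[r2][c2]
                    if r2 == r1:
                        if c2 > c1:
                            tied_row += pairs    # same row, different column
                    elif c2 > c1:
                        concordant += pairs      # higher on both variables
                    elif c2 < c1:
                        discordant += pairs      # higher on one, lower on other
                    else:
                        tied_col += pairs        # same column, different row
    diff = concordant - discordant
    untied = concordant + discordant
    return {
        "gamma": diff / untied,
        "somers_d": diff / (untied + tied_col),
        "tau_b": diff / math.sqrt((untied + tied_row) * (untied + tied_col)),
    }

# Hypothetical counts; both variables coded low, medium, high.
result = ordinal_association([[20, 10, 5],
                              [10, 20, 10],
                              [5, 10, 20]])
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

The sign gives the direction of the relationship (positive when high values tend to go with high values) and the magnitude gives its strength. Gamma ignores ties, so it is usually the largest of the three in absolute value.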

Finally, there is one test that you need when you have before-and-after data. The McNemar test can be understood as a paired version of the Chi-square test for 2×2 tables. With this test you assess whether the outcome variable (e.g., acceptance of a new app) has changed significantly between before and after an experiment or intervention. The McNemar test can also be extended to larger square tables, where the extensions are known as tests of symmetry and marginal homogeneity.
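The McNemar test uses only the participants whose answer changed. The sketch below assumes the chi-square form without continuity correction, and also computes an exact binomial p-value, which is often preferred when few participants changed. The counts and the function name are hypothetical.

```python
import math

def mcnemar(changed_yes_to_no, changed_no_to_yes):
    """McNemar test from the two discordant cells of a paired 2x2 table.

    Returns (statistic, approximate p-value, exact two-sided p-value).
    """
    b, c = changed_yes_to_no, changed_no_to_yes
    n = b + c
    if n == 0:
        raise ValueError("no participant changed; the test is undefined")
    statistic = (b - c) ** 2 / n
    # Chi-square upper tail with 1 degree of freedom.
    p_approx = math.erfc(math.sqrt(statistic / 2.0))
    # Exact test: under H0 each change is equally likely in either direction.
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2.0 * tail)
    return statistic, p_approx, p_exact

# Hypothetical app study: 5 participants went from accepting to rejecting,
# 15 went from rejecting to accepting; unchanged answers are not used.
statistic, p_approx, p_exact = mcnemar(5, 15)
print(f"statistic = {statistic:.2f}, p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```

Participants who gave the same answer before and after carry no information about change, which is why only the two off-diagonal cells enter the calculation.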

I hope this post helps you make more informed choices when analyzing contingency tables.

Ayla Myrick