This guest post is written by Sam Koslowsky, vice president of modeling solutions at Harte Hanks:  

At a recent conference, I joined a panel discussing the challenges and opportunities of intelligent email marketing. In response to an observation by a participant, I commented that there was an association between email responses and customer spending. I was challenged to prove that more email responses cause increased levels of spending. I was rather surprised that the audience did not initially see the difference between correlation and causation. I had said only that there is an ‘association’ or, in research vernacular, a correlation. I never suggested that there was a cause-and-effect relationship. This led me to believe that there may be some misunderstanding among laymen and researchers alike when discussing these critical terms.

Causality, in exploratory studies, may very well be one of the most misunderstood notions in research. Often, we jump to a conclusion because it supports our point of view, or because we like what we have heard. What we fail to recognize is that a correlation does not, in itself, demonstrate causality. One researcher pointed out, based on published analysis, that there is an association between murder rate and Internet Explorer usage! But to suggest that one causes the other would be absurd.

It is common to examine analytic relationships by investigating how changes in one variable accompany fluctuations in the other. A marketer may demonstrate that less education correlates with an increased response rate, but cannot extend this observation to conclude that less education causes increased response rates. Correlational analysis provides tenuous ground for causal explanations. Changes in one variable may not be directly caused by the autonomous fluctuations of the other variable. It is possible for two variables to be associated, even though one does not cause the other.

A contributing factor to this difficulty is the ‘concern for direction’. In our earlier illustration concerning email receptivity and spending, we can conclude that these two actions are related. However, one cannot easily determine whether email receptivity causes increased spending or whether increased spending causes email receptivity.

The ‘concern for direction’ is not always problematic. In my previous example of lower education and increased responsiveness, it is not plausible to conclude that increased responsiveness resulted in lower education levels. After all, educational levels typically are determined before the potential to respond has an opportunity to materialize. The ‘concern for direction’ may not be a challenge at all when one event must, by definition, precede the other. Accordingly, if a causal relationship exists here, we would be forced to infer that it runs from lower education to increased response rate.

Another issue to contend with is the ‘concern for the third variable’. Referring back to our initial example, we cannot conclude that email responsiveness causes more spending, nor can we conclude that more spending causes email responsiveness. There may very well be a third, lurking variable that causes both email receptivity and increased spending. For example, it is possible that email-responsive people come from a distinct background or social group whose members also, by nature, spend a lot. In that case, a third variable would have produced the observed relationship.

The ‘concern for the third variable’ frequently results in what we refer to as spurious correlation: a mathematical relationship in which two events or variables have no direct causal connection, yet may be wrongly inferred to have one, owing either to coincidence or to the presence of a third, unseen factor.
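To make the third-variable problem concrete, here is a minimal simulation sketch in Python. Everything in it is a hypothetical assumption chosen for illustration: the hidden `segment` flag, the group averages, and the sample size are invented, not drawn from any actual campaign. In this toy world, email responses and spending never influence each other; both depend only on the hidden segment.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Generate (segment, responses, spending) rows for hypothetical customers."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hidden third variable: membership in a hypothetical customer segment.
        segment = rng.random() < 0.5
        # Each outcome is drawn from the segment alone, never from the other outcome.
        responses = rng.gauss(6.0 if segment else 2.0, 1.0)
        spending = rng.gauss(300.0 if segment else 100.0, 50.0)
        rows.append((segment, responses, spending))
    return rows


rows = simulate()

# Correlation across all customers, ignoring the hidden segment.
overall = pearson([r for _, r, _ in rows], [s for _, _, s in rows])

# Correlation within each segment, i.e. holding the third variable fixed.
within = {}
for flag in (True, False):
    subset = [(r, s) for seg, r, s in rows if seg == flag]
    within[flag] = pearson([r for r, _ in subset], [s for _, s in subset])

print("overall correlation:", round(overall, 3))
print("within-segment correlations:", {k: round(v, 3) for k, v in within.items()})
```

Under these assumptions, the overall correlation comes out strongly positive, while the correlation inside each segment sits close to zero. An analyst who never sees the segment would observe a convincing association even though neither variable drives the other.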

Researchers are often concerned about the possibility of a ‘third lurking variable’. This is a legitimate concern, and it underscores how serious the problem is. Correlation studies must be undertaken carefully and interpreted with great care.

So how, then, can we conclude that a correlation does indeed reflect a causal relationship? Stronger relationships give us more comfort in concluding that causation may exist. Take texting while driving and accidents. Analyses have demonstrated that there is an association between these events. The relationship now appears convincing enough that most investigators believe it is causal.

Designing experiments carefully may also be of value when doing causal studies.

The correlation-causation conundrum is a serious challenge. Frankly, I get uncomfortable when I read headlines that imply cause-and-effect relationships. Researchers and laymen alike need to be more alert when they draw conclusions. Such conclusions may be interesting, but they may very well be misleading or untrue.

# # #

Sam Koslowsky is vice president of modeling solutions at Harte Hanks, a targeted marketing services company offering a wide array of integrated, multichannel and data-driven solutions. His responsibilities include developing state-of-the-art quantitative and analytic solutions for a wide variety of industries. Sam is a frequent speaker at industry and technical conferences, and a recurrent contributor to industry journals.