The correlation coefficient measures the degree of association between two variables. Calculating it shows whether two data sets are related. Unlike regression, correlation does not allow the values of the quantities to be predicted. Even so, computing the coefficient is an important step in preliminary statistical analysis. For example, suppose we find that the correlation coefficient between the level of foreign direct investment and the rate of GDP growth is high. This suggests that a favorable climate for foreign entrepreneurs may go hand in hand with economic well-being. Not such an obvious conclusion at first glance!
Correlation and causality
Perhaps no other area of statistics has entered our lives so firmly. The correlation coefficient is used in every field of social science. Its main danger is that high values are often exploited to persuade people and make them believe certain conclusions. In fact, a strong correlation does not by itself indicate a causal relationship between the quantities.
Correlation coefficient: the Pearson and Spearman formulas
Several main indicators characterize the relationship between two variables. Historically, the first is Pearson's coefficient of linear correlation, which is still taught in school. It was developed by K. Pearson and G. U. Yule on the basis of the work of F. Galton. This coefficient reveals the relationship between quantitative variables that change linearly with respect to one another. It always lies between -1 and 1, inclusive. A negative value indicates an inverse relationship: as one quantity grows, the other falls. If the coefficient is zero, there is no linear relationship between the variables. A positive value indicates a direct relationship between the investigated quantities. Spearman's rank correlation coefficient simplifies the calculation by replacing the values of each variable with their ranks.
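As a rough illustration of the two coefficients, here is a minimal Python sketch. The function names `pearson`, `ranks` and `spearman` are our own, and Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values sharing an average rank. The sample data are invented: y is the cube of x, so the relationship is monotonic but not linear.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the ranks of x and y coincide, the Spearman coefficient is 1, while the Pearson coefficient stays below 1 because the points do not lie on a straight line.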
Relations between variables
Correlation helps answer two questions. First, is the relationship between the variables positive or negative? Second, how strong is the dependence? Correlation analysis is a powerful tool for obtaining this information. It is easy to see that family income and expenditure rise and fall together. Such a relationship is considered positive. Conversely, as the price of a commodity rises, demand for it falls. Such a relationship is called negative. The correlation coefficient takes values between -1 and 1. Zero means that there is no linear relationship between the investigated quantities. The closer the result is to either extreme, the stronger the relationship (negative or positive). In practice, a coefficient between -0.1 and 0.1 is taken to indicate the absence of dependence. It should be understood that such a value indicates only the absence of a linear relationship; the quantities may still be related in a nonlinear way.
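The three situations described above can be sketched in a few lines of Python. All numbers are made up purely for illustration, and the helper `pearson` is our own name, not part of any library. The last data set is a parabola: y is completely determined by x, yet the coefficient comes out as zero because the dependence is not linear.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented numbers, chosen only to illustrate the three cases.
income = [20, 30, 40, 50, 60]
spending = [18, 25, 33, 41, 48]   # grows with income: positive relationship
price = [1, 2, 3, 4, 5]
demand = [50, 42, 35, 27, 20]     # falls as price rises: negative relationship
x = [-3, -2, -1, 0, 1, 2, 3]
x_squared = [v * v for v in x]    # exact but nonlinear dependence

r_positive = pearson(income, spending)
r_negative = pearson(price, demand)
r_parabola = pearson(x, x_squared)

print(round(r_positive, 3), round(r_negative, 3), round(r_parabola, 3))
```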
The use of both indicators comes with certain caveats. First, a strong association does not mean that one quantity determines the other. There may well be a third quantity that drives both of them. Second, a high Pearson correlation coefficient does not indicate a causal relationship between the variables studied. Third, it captures only linear relationships. Correlation is suited to meaningful quantitative data (for example, atmospheric pressure or air temperature), not to categories such as sex or favorite color.
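The "third quantity" caveat can be made concrete with a small constructed example. In the sketch below, a hidden factor z drives both x and y, and the two are otherwise unrelated by design. The data are synthetic, and the check we use, the first-order partial correlation, is one common way to remove the linear effect of a third variable; it is our choice for the illustration, not something the text prescribes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data: z is the hidden factor, e1 and e2 are small disturbances
# constructed to be unrelated to z and to each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
e1 = [1, -1, -1, 1, 1, -1, -1, 1]
e2 = [1, -1, 1, -1, -1, 1, -1, 1]
x = [2 * zi + a for zi, a in zip(z, e1)]
y = [3 * zi + b for zi, b in zip(z, e2)]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(round(r_xy, 3), round(r_xy_given_z, 3))
```

Here x and y are strongly correlated, yet once z is taken into account the association vanishes: neither quantity determines the other.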
Multiple correlation coefficient
The Pearson and Spearman coefficients describe the relationship between two variables. But what should be done when there are three or more? Here the multiple correlation coefficient comes to the rescue. For example, gross national product is affected not only by foreign direct investment but also by the monetary and fiscal policies of the state and by the level of exports. The growth rate and the volume of GDP result from the interaction of a number of factors. However, one must understand that the multiple correlation model rests on a number of simplifications and assumptions. First, the influencing variables are assumed to be free of multicollinearity. Second, the relationship between the dependent variable and the influencing variables is assumed to be linear.
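For the simplest case of one dependent variable and two influencing variables, the multiple correlation coefficient can be computed from the three pairwise Pearson coefficients. The sketch below uses that textbook formula; the function name `multiple_r` and the data are our own invention and do not come from any real economic series. The guard against perfectly collinear predictors reflects the no-multicollinearity assumption mentioned above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def multiple_r(y, x1, x2):
    """Multiple correlation of y with the pair (x1, x2)."""
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    if abs(r_12) >= 1.0:
        # Perfectly collinear predictors: the formula divides by zero.
        raise ValueError("predictors are perfectly collinear")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Invented figures standing in for two factors and an outcome.
factor_a = [1, 2, 3, 4, 5, 6]
factor_b = [2, 1, 4, 3, 6, 5]
outcome = [5.1, 3.9, 11.2, 9.8, 17.1, 15.9]

r_a = pearson(outcome, factor_a)
r_b = pearson(outcome, factor_b)
r_multiple = multiple_r(outcome, factor_a, factor_b)
print(round(r_a, 3), round(r_b, 3), round(r_multiple, 3))
```

The multiple coefficient is never smaller in magnitude than either pairwise coefficient, since two factors together explain at least as much as either one alone.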
Areas of use of correlation-regression analysis
This method of finding relationships between quantities is widely used in statistics. It is most often applied in three main cases:
- To test for a causal relationship between the values of two variables. The researcher hopes to detect a linear dependence and derive a formula that describes the relationship between the variables. The units of measurement may differ.
- To check whether there is a relationship between the quantities. In this case, no variable is designated as dependent. It may turn out that the values of both quantities are due to some other factor.
- To derive an equation. Once it is known, you can simply substitute numbers into it and find the value of the unknown variable.
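The last case, deriving an equation and substituting numbers into it, can be sketched with an ordinary least-squares line y = intercept + slope * x. The function name `fit_line` is hypothetical, and the data are constructed to lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # constructed to lie exactly on y = 1 + 2x

intercept, slope = fit_line(x, y)

# Substitute a new x into the fitted equation to find the unknown y.
predicted = intercept + slope * 10
print(intercept, slope, predicted)  # 1.0 2.0 21.0
```

With real, noisy data the points will not lie exactly on the line, and the correlation coefficient indicates how well the fitted equation can be expected to describe them.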
Man in search of a cause-effect relationship
Human consciousness is arranged in such a way that we need to explain the events taking place around us. A person always looks for a connection between the picture of the world in which they live and the information they receive. Often the brain creates order out of chaos. It can easily see a cause-and-effect relationship where none exists. Scientists have to learn specifically to overcome this tendency. The ability to assess relationships between data objectively is essential in an academic career.
Bias of the media
Consider how the presence of a correlation may be misinterpreted. A group of British students known for bad behavior were asked whether their parents smoked. The study was then published in a newspaper. The result showed a strong correlation between parents' smoking and their children's offences. The professor who conducted the study even proposed putting a warning about this on cigarette packs. However, there are a number of problems with this conclusion. First, correlation does not show which of the variables is independent. It is therefore just as possible to assume that the parents' harmful habit was caused by their children's disobedience. Second, it cannot be ruled out that both problems arose from some third factor, for example, low family income. The emotional side of the professor's initial conclusions should also be noted. He was a staunch opponent of smoking, so it is not surprising that he interpreted the results of his study in this way.
Misinterpreting correlation as a cause-effect relationship between two variables can lead to embarrassing errors in research. The problem is that this tendency lies at the very heart of human consciousness. Many marketing tricks are built on this very feature. Understanding the difference between causation and correlation allows us to analyze information rationally, both in everyday life and in a professional career.