Correlation, Its Types (Positive Correlation, Perfect Positive Correlation, Negative Correlation, Perfect Negative Correlation, Zero Correlation) and Causation

 



In statistics, we study the movement or variability of variables and attributes: how their past values affect their present and future values and, sometimes, how other variables or attributes affect them. Suppose we want to study the future price of a car. Studying its past trend alone might not be enough, because other factors, such as the price of fuel and raw materials, also affect it. In such cases, we study the relationship between/among variables.


Data are information collected in various forms (numbers, text, audio, video, etc.) about individuals or objects. Data can be quantitative or qualitative. Quantitative data can be measured numerically, while qualitative data cannot. Quantitative data are called variables and qualitative data are called attributes. Height is an example of a variable and beauty is an example of an attribute. The study of the relationship between/among them is called association.

The study of the linear relationship between/among variables or attributes is called correlation. If a linear relationship exists, they are said to be correlated; otherwise, they are uncorrelated. Based on the direction of the relationship, there are three types of correlation:

    

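a. Positive Correlation: When an increase in one variable/attribute leads to an increase in the other, or a decrease in one leads to a decrease in the other, they are said to be positively correlated. For example, the more often it rains, the more umbrellas are sold. If the relationship is exactly linear, so that the other always changes at a constant rate in the same direction, the correlation is called perfect positive correlation (correlation coefficient = +1).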

 

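b. Negative Correlation: When an increase in one variable/attribute leads to a decrease in the other, or a decrease in one leads to an increase in the other, they are said to be negatively correlated. For example, the higher the price of a commodity, the lower the demand for it. If the relationship is exactly linear, so that the other always changes at a constant rate in the opposite direction, the correlation is called perfect negative correlation (correlation coefficient = -1).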

 

c. Zero Correlation: When an increase or decrease in one variable/attribute is not accompanied by any linear change in the other, they are said to have zero correlation, or to be uncorrelated. For example, height and intelligence.


 

Based on strength, correlation can be classified into five levels according to the absolute value of the correlation coefficient:

 

Range of correlation coefficient    Strength
0.70-1.00                           Very strong
0.50-0.69                           Strong
0.30-0.49                           Moderate
0.10-0.29                           Weak
0.01-0.09                           Very weak[1]
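The strength table can be turned into a small lookup function. This is a minimal Python sketch, assuming the ranges apply to the absolute value of the coefficient and that each row starts at its lower bound, so values falling in the small gaps between rows (such as 0.095) go to the row below. The function name `correlation_strength` is hypothetical, and the "perfect" label for exactly +1 or -1 is an added convenience, not part of the table.

```python
def correlation_strength(r: float) -> str:
    """Describe a correlation coefficient using the strength table.

    Assumptions (not from the table itself): the ranges apply to |r|,
    each row starts at its lower bound, and |r| == 1 is labelled "perfect".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.70:
        strength = "very strong"
    elif size >= 0.50:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    elif size >= 0.10:
        strength = "weak"
    else:
        strength = "very weak"  # anything above 0 and below 0.10
    return f"{strength} {direction}"


print(correlation_strength(0.96))   # very strong positive
print(correlation_strength(-0.35))  # moderate negative
print(correlation_strength(-1.0))   # perfect negative
```

The sign of r only gives the direction; the strength depends on its size alone.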

Correlation can be studied in two ways: graphically, with a scatter diagram, or numerically, by calculating a correlation coefficient.

A scatter diagram (scatter plot) is used to study the linear, or straight-line (y = mx + c), relationship between two variables. It is a two-dimensional diagram drawn on Cartesian coordinates, where the x and y coordinates of each point are the corresponding values of the two variables.

The closer the points lie to a straight line, the stronger the correlation; the more widely they are scattered, the weaker the correlation. If the points rise from left to right, the correlation is positive. If they fall from left to right, the correlation is negative. If no particular pattern can be identified, the correlation is zero.

 

The other way is the correlation coefficient. An important term to understand here is covariance.

Covariance measures the relationship between two variables, that is, the extent to which one variable changes along with a linear change in the other variable.
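Covariance is usually computed as the average product of the deviations from the two means:

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

For sample data, many texts divide by n - 1 instead of n. The Python sketch below implements both versions; the function name and the data are made up for illustration only.

```python
def covariance(x, y, sample=False):
    """Covariance of two equal-length sequences.

    sample=False divides by n (population covariance);
    sample=True divides by n - 1 (sample covariance).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # tends to rise with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(covariance(x, y_up))    # 1.2
print(covariance(x, y_down))  # -4.0
```

A positive covariance indicates that the variables tend to move together and a negative one that they move in opposite directions. Its size depends on the units of measurement, which is why it is standardised into a correlation coefficient.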


A variance-covariance matrix is a two-dimensional symmetric array (i.e., a_ij = a_ji) whose off-diagonal entries are the covariances between each pair of columns in the dataset and whose diagonal entries are the variances of the columns. Covariance values can be positive, negative or 0, while variance values are never negative (they are 0 only when a column is constant).


Let us consider the same example used for covariance.

Here, Var(X) = 180223.1, Var(Y) = 122 and Cov(X, Y) = 4494.8.

 

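Then the variance-covariance matrix is:

$$
\begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X,Y) \\ \mathrm{Cov}(Y,X) & \mathrm{Var}(Y) \end{pmatrix}
=
\begin{pmatrix} 180223.1 & 4494.8 \\ 4494.8 & 122 \end{pmatrix}
$$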
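For a dataset with more columns, the matrix can be built entry by entry, with the covariance of each column with itself giving the variance on the diagonal. This is a minimal Python sketch that uses population covariance (dividing by n), which is an assumption: the example above may have been computed with n - 1. The helper names and the two-column dataset are hypothetical.

```python
def covariance(x, y):
    """Population covariance (divides by n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def covariance_matrix(columns):
    """k x k matrix whose (i, j) entry is Cov(column i, column j).

    The diagonal entries Cov(X, X) are the variances.
    """
    k = len(columns)
    return [[covariance(columns[i], columns[j]) for j in range(k)] for i in range(k)]


# Hypothetical two-column dataset
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m = covariance_matrix([x, y])
print(m)  # [[2.0, 1.2], [1.2, 1.2]]
```

Because Cov(X, Y) = Cov(Y, X), the result is always symmetric; a more efficient version could compute only the upper triangle and mirror it.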

We will look at four types of correlation coefficient.

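1. Pearson Correlation Coefficient: Karl Pearson's coefficient of correlation measures the linear relationship between two variables. It equals the covariance of the two variables divided by the product of their standard deviations, so it always lies between -1 and +1.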

Karl Pearson's correlation coefficient is very useful. We may assume that a relationship exists between two variables when it does not. Take the causes of inflation in India: everyone has a different opinion. Some blame government policies, others blame the Ukraine-Russia war. But without proper data and correlation analysis, these are just arguments. We first need to find out whether any relationship exists between inflation and each of these factors.
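Pearson's coefficient can be written as

$$r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

Applied to the values from the variance-covariance example (Var(X) = 180223.1, Var(Y) = 122, Cov(X, Y) = 4494.8), and assuming all three were computed with the same divisor, this gives r ≈ 0.96, a very strong positive correlation. The Python sketch below performs that calculation and also computes r directly from raw paired data; the function names and raw data are hypothetical.

```python
import math


def r_from_moments(var_x, var_y, cov_xy):
    """Pearson's r from already computed variances and covariance."""
    return cov_xy / math.sqrt(var_x * var_y)


def pearson_r(x, y):
    """Pearson's r from raw paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The 1/n factors in Cov, Var(X) and Var(Y) cancel, so plain sums suffice
    return s_xy / math.sqrt(s_xx * s_yy)


print(round(r_from_moments(180223.1, 122, 4494.8), 4))        # 0.9586
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))           # -1.0
```

The last pair lies exactly on a falling straight line, so it is a case of perfect negative correlation.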

2. Partial Correlation Coefficient: The partial correlation coefficient measures the linear relationship between two variables while holding the other variables constant.

In the case of inflation, many factors may act together rather than a single factor. To identify the effect of one particular factor, we can use partial correlation: for example, the relationship between the war and inflation while holding the effect of government policy constant.
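With three variables, the partial correlation between variables 1 and 2, holding variable 3 constant, can be computed from the three pairwise Pearson coefficients using the standard first-order formula:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

The Python sketch below implements this formula. The coefficient values (1 = inflation, 2 = war, 3 = government policy) are made up purely for illustration and are not real estimates.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: the correlation between
    variables 1 and 2 with the linear effect of variable 3 removed."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("undefined when variable 3 is perfectly correlated with 1 or 2")
    return (r12 - r13 * r23) / denom


# Hypothetical pairwise coefficients:
# 1 = inflation, 2 = war, 3 = government policy
r12, r13, r23 = 0.6, 0.5, 0.4
print(round(partial_r(r12, r13, r23), 3))  # 0.504
```

With these numbers the partial coefficient (about 0.50) is smaller than the raw r12 = 0.6, because part of the association between variables 1 and 2 was shared with variable 3.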

 


3. Multiple Correlation Coefficient: The multiple correlation coefficient measures the linear relationship between a dependent variable and two or more independent variables taken together.

Multiple correlation helps us study the joint effect of all these factors on inflation.
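For one dependent variable (1) and two independent variables (2 and 3), the multiple correlation coefficient can be computed from the pairwise Pearson coefficients:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$

The Python sketch below implements this two-predictor formula; the function name and the coefficient values are hypothetical.

```python
import math


def multiple_r(r12, r13, r23):
    """Multiple correlation R1.23 between dependent variable 1 and
    independent variables 2 and 3 taken together."""
    if abs(r23) == 1:
        raise ValueError("undefined when the two independent variables are perfectly correlated")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


# Hypothetical pairwise coefficients:
# 1 = inflation (dependent), 2 = war, 3 = government policy (independent)
print(round(multiple_r(0.6, 0.5, 0.4), 3))  # 0.664
```

Unlike r, R is never negative, and it is at least as large as the absolute value of either individual correlation with the dependent variable (here 0.664 versus 0.6 and 0.5).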

 

4. Rank Correlation Coefficient: The rank correlation coefficient (Spearman's rank correlation coefficient) measures the degree of association between ranked variables. The observations of each variable are ranked in descending order, and the correlation is then calculated from these ranks:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where d_i is the difference between the ranks of x and y for the i-th observation and n is the number of observations.

If two or more observations are tied, the ranks they would occupy are added up and divided by the number of tied observations, and this average rank is assigned to each of them. Ranking then continues with the next unused rank. For example, if two observations tie for 3rd place, both get rank (3 + 4) / 2 = 3.5 and the next observation gets rank 5.

Suppose we want to study the relationship between students' scores in literature and in maths; in such cases we would use rank correlation. Similarly, to study the relationship between IQ and activeness, which can be ranked but not measured precisely, we would use rank correlation.
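The ranking rule with ties and the rank correlation formula can be sketched in Python as follows. The scores of the five students are hypothetical, and the helper names `descending_ranks` and `spearman_rho` are illustrative, not from any library.

```python
def descending_ranks(values):
    """Rank values from largest (rank 1) to smallest; tied values all
    receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Rank correlation: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(descending_ranks(x), descending_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores of five students
literature = [85, 70, 70, 60, 90]
maths = [80, 65, 75, 55, 95]
print(descending_ranks(literature))               # [2.0, 3.5, 3.5, 5.0, 1.0]
print(round(spearman_rho(literature, maths), 3))  # 0.975
```

When there are ties, this shortcut formula is only approximate; applying Pearson's formula to the average ranks gives the exact tie-corrected value. Ranking in ascending instead of descending order gives the same coefficient, as long as both variables are ranked the same way.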



 

Correlation only studies the linear relationship between/among variables, but other types of relationships can exist. So, if variables are uncorrelated, they may still be related in some other, non-linear way. If variables are independent, their correlation is 0; but if they are uncorrelated, they need not be independent.
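A quick check shows how variables can be uncorrelated without being independent. In this Python sketch (the data are chosen for illustration), y = x² is completely determined by x, yet because the x values are symmetric around 0, the positive and negative cross-products cancel and Pearson's r comes out as exactly 0.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y depends entirely on x, but not linearly
print(y)                 # [4, 1, 0, 1, 4]
print(pearson_r(x, y))   # 0.0
```

A scatter diagram of these points would show a clear U-shaped pattern, which the correlation coefficient alone cannot detect.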

Correlation implies a linear relationship, not a cause-and-effect relationship. Causation, on the other hand, is the study of cause-and-effect relationships. Two correlated variables might not be related because one affects the other; a third factor may be the cause, with the changes in the other two being its effects. For example, when the demand for sweaters increases, the demand for ice cream falls. But the demand for sweaters does not affect the demand for ice cream. Rather, a third factor, the weather or season, drives the changes in the demand for both. So, the sales of ice cream and sweaters are correlated, but one does not cause the other. Thus, correlation does not imply causation.

-Jags

