Determining Correlation


Correlation, measured by the correlation coefficient, describes the relationship between two variables: how strongly they are related and in which direction. (http://en.wikipedia.org/wiki/Correlation, 2008)
Correlation applies only to quantitative data, that is, data expressed as numbers. It cannot be used for categorical data such as gender or favorite color. (http://www.surveysystem.com/correlation.htm, 2007-2008)

Correlation Coefficient


The correlation coefficient, or r, is the measure used to determine the correlation between two variables. Its value always lies between -1.0 and +1.0. The closer r is to either of these values, the stronger the relationship between the two variables. When r = 0, there is no linear relationship between the two variables. The graphs below show perfect positive correlation, perfect negative correlation, and no correlation (http://www.mste.uiuc.edu/courses/ci330ms/youtsey/scatterinfo.html, 2005):
[Images: three scatter plots (scatter1.gif, Scatter1.gif, scatter0.gif)]
When r is positive, an increase in one variable goes with an increase in the other. When r is negative, an increase in one variable goes with a decrease in the other; this is often referred to as an "inverse" correlation. When r is zero, there is said to be no correlation, or no relationship between the two variables. (http://www.surveysystem.com/correlation.htm, 2007-2008)
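The three cases can be illustrated with a small Python sketch. The function name pearson_r and the data values are made up for this illustration and do not come from the cited sources; the function uses the deviation-from-the-mean form of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only to show the three cases.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]      # goes up as x goes up
falling = [10, 8, 6, 4, 2]     # goes down as x goes up
unrelated = [2, 1, 3, 1, 2]    # no linear trend against x

print(pearson_r(xs, rising))     # close to +1
print(pearson_r(xs, falling))    # close to -1
print(pearson_r(xs, unrelated))  # close to 0
```

Real data rarely falls exactly on one of these extremes; most values of r land somewhere in between.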


How Correlation Applies to Real Life


Now we're going to take a look at an example to show how correlation might be applied to real life, and how someone would calculate it. First, we need a set of data for two variables. Below is a table which shows the data for two variables, Height and Self Esteem.
Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      3.8
8       68      4.1
9       71      4.3
10      69      3.7
11      68      3.5
12      67      3.2
13      63      3.7
14      62      3.3
15      60      3.4
16      63      4.0
17      65      4.1
18      67      3.8
19      63      3.4
20      61      3.6

Once the data is in a table like the one above, you can plot a histogram for each variable, height and self esteem, as shown below.
[Image: hist1.gif]

[Image: hist2.gif]

Alongside the histograms, you can compute descriptive statistics for each variable, such as the mean, standard deviation, and minimum and maximum values. The descriptive statistics for the two variables are shown in the table below.
Variable     Mean   StDev     Variance  Sum   Minimum  Maximum  Range
Height       65.4   4.40574   19.4105   1308  58       75       17
Self Esteem  3.755  0.426090  0.181553  75.1  3.1      4.6      1.5

These descriptive statistics summarize each variable on its own. The correlation computed later in this example draws on the same underlying data: the sum of each variable is one of the values plugged into the correlation formula, and the remaining values it needs are tabulated further below.
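The descriptive statistics can be reproduced with Python's standard statistics module. This is a minimal sketch: the helper name describe is made up for this illustration, and it assumes the table reports the sample standard deviation and sample variance (dividing by n - 1), which is what the tabulated values are consistent with.

```python
import statistics

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def describe(values):
    """Return the descriptive statistics listed in the table for one variable."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),        # sample standard deviation (n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

for name, values in [("Height", heights), ("Self Esteem", self_esteem)]:
    print(name, describe(values))
```

Because self esteem is stored as floating-point numbers, some printed values may differ from the table in the last few decimal places.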
Next we have a two-variable plot, or scatter plot, which will visually show the relationship between the two variables mentioned.
[Image: corrbv.gif]

From this scatterplot alone, we can already see that the correlation between height and self esteem is positive. According to William M.K. Trochim (2006), "if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right", which indicates a positive correlation.
Since we've already found the direction of the correlation, it is now time to determine the strength. In order to do so, you must use the formula for the correlation:
[Image: corrform1.gif]
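The formula image is not reproduced in this text. Assuming it shows the standard computational form of the Pearson correlation coefficient, which matches the quantities tabulated in this example, it can be written as follows. The symbols N, x, and y are chosen here to match the table columns.

```latex
r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]
                \left[N\sum y^2 - \left(\sum y\right)^2\right]}}
```

Here N is the number of pairs of scores, \sum xy is the sum of the products of paired scores, \sum x and \sum y are the sums of the x and y scores, and \sum x^2 and \sum y^2 are the sums of the squared scores.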

The symbol r stands for the correlation coefficient, which is the measure of correlation. The formula calls for several quantities we have not computed yet (for example, the sum of the products of paired scores and the sum of the x scores), so we need a table that provides them. All of the necessary values are shown in the table below:
Person  Height (x)  Self Esteem (y)  x*y     x*x    y*y
1       68          4.1              278.8   4624   16.81
2       71          4.6              326.6   5041   21.16
3       62          3.8              235.6   3844   14.44
4       75          4.4              330     5625   19.36
5       58          3.2              185.6   3364   10.24
6       60          3.1              186     3600   9.61
7       67          3.8              254.6   4489   14.44
8       68          4.1              278.8   4624   16.81
9       71          4.3              305.3   5041   18.49
10      69          3.7              255.3   4761   13.69
11      68          3.5              238     4624   12.25
12      67          3.2              214.4   4489   10.24
13      63          3.7              233.1   3969   13.69
14      62          3.3              204.6   3844   10.89
15      60          3.4              204     3600   11.56
16      63          4                252     3969   16
17      65          4.1              266.5   4225   16.81
18      67          3.8              254.6   4489   14.44
19      63          3.4              214.2   3969   11.56
20      61          3.6              219.6   3721   12.96
Sum =   1308        75.1             4937.6  85912  285.45
The first three columns repeat the data from the very first table, while the last three columns are computed from it: x*y is each person's height times their self esteem, x*x is the height squared, and y*y is the self esteem squared. The last row holds the sum of each column. These sums are the values we plug into the correlation formula. For convenience, the bottom row of sums is displayed again below:
N = 20 (number of people)
Σxy = 4937.6
Σx = 1308
Σy = 75.1
Σx² = 85912
Σy² = 285.45
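The column sums can also be recomputed directly from the raw data. The following is a minimal Python sketch; the variable names are chosen for this illustration only.

```python
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(heights)                                            # number of people
sum_x = sum(heights)                                        # sum of x
sum_y = sum(self_esteem)                                    # sum of y
sum_xy = sum(x * y for x, y in zip(heights, self_esteem))   # sum of x*y
sum_xx = sum(x * x for x in heights)                        # sum of x*x
sum_yy = sum(y * y for y in self_esteem)                    # sum of y*y

# Round for display, since floating-point sums can carry tiny errors.
print(n, sum_x, round(sum_y, 2), round(sum_xy, 2), sum_xx, round(sum_yy, 2))
```

Recomputing the sums this way is a quick check against arithmetic slips in a hand-built table.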

From here, we must take the above values and put them into our correlation formula in order to determine the strength of the relationship between height and self esteem. Shown below are the steps taken in order to execute the correlation formula properly.
[Image: corrform3.gif]
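The worked steps can be sketched as follows. This assumes the standard computational formula for the Pearson correlation coefficient and uses N = 20 together with the sums from the bottom row of the table above; it is a reconstruction, not a copy of the original image.

```latex
\begin{aligned}
r &= \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}
          {\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]
                 \left[N\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{20(4937.6) - (1308)(75.1)}
          {\sqrt{\left[20(85912) - 1308^2\right]\left[20(285.45) - 75.1^2\right]}} \\
  &= \frac{98752 - 98230.8}
          {\sqrt{(1718240 - 1710864)(5709 - 5640.01)}} \\
  &= \frac{521.2}{\sqrt{(7376)(68.99)}} \\
  &= \frac{521.2}{\sqrt{508870.24}} \\
  &\approx \frac{521.2}{713.35} \approx 0.73
\end{aligned}
```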

This means that the correlation between height and self esteem is r = 0.73, a fairly strong positive correlation. (Trochim, 2006)
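The result can be checked in a few lines of Python. This sketch assumes the standard computational formula for the Pearson correlation coefficient and takes the six sums from the table as inputs; the function name correlation_from_sums is made up for this illustration.

```python
from math import sqrt

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Pearson r computed from the number of pairs and five column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Values from the bottom row of the height and self esteem table.
r = correlation_from_sums(n=20, sum_x=1308, sum_y=75.1,
                          sum_xy=4937.6, sum_xx=85912, sum_yy=285.45)
print(round(r, 2))  # prints 0.73
```

Working from the sums rather than the raw data mirrors the hand calculation, so each intermediate value can be compared against the written steps.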

Resources

1. Creative Research Systems (2007-2008). Correlation. Retrieved November 6, 2008, from http://www.surveysystem.com/correlation.htm.
2. Graphing Unit (2005). Scatter Plots. Retrieved November 6, 2008, from http://www.mste.uiuc.edu/courses/ci330ms/youtsey/scatterinfo.html.
3. Trochim, William M.K. (2006). Research Methods Knowledge Base. Correlation. Retrieved November 6, 2008, from http://www.socialresearchmethods.net/kb/statcorr.php.
4. Wikipedia (2008). Correlation. Retrieved November 6, 2008, from http://en.wikipedia.org/wiki/Correlation.