## Determining Correlation

Correlation, measured by the correlation coefficient, describes the relationship between two variables: how strongly they are related and in which direction. (http://en.wikipedia.org/wiki/Correlation, 2008)

Correlation applies to quantitative data, that is, data that deals with numbers. It cannot be used for categorical data such as gender or favorite color. (http://www.surveysystem.com/correlation.htm, 2007-2008)

## Correlation Coefficient

The correlation coefficient, or r, is the measure used to determine the correlation between two variables. Its value always lies between -1.0 and +1.0. The closer r is to either of these values, the stronger the relationship between the two variables. When r = 0, there is no linear relationship between the two variables. Scatter plots of perfect positive correlation, perfect negative correlation, and no correlation illustrate these three cases (http://www.mste.uiuc.edu/courses/ci330ms/youtsey/scatterinfo.html, 2005).

When r is positive, it means that as one variable increases, the other variable tends to increase as well. Conversely, when r is negative, it means that as one variable increases, the other tends to decrease; this is often referred to as an "inverse" correlation. When the correlation is neither positive nor negative, there is said to be no correlation, or no relationship between the two variables. (http://www.surveysystem.com/correlation.htm, 2007-2008)
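
The three cases described above can be sketched with a few lines of Python. This is only an illustration: the function name `pearson_r` and the small data sets are made up for this example and do not come from the cited sources. The function uses the standard deviation-based definition of Pearson's r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                       # hypothetical toy data
r_up = pearson_r(xs, [2, 4, 6, 8, 10])     # y rises in step with x
r_down = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls as x rises
r_flat = pearson_r(xs, [1, 3, 5, 3, 1])    # rises then falls: no linear trend

print(r_up, r_down, r_flat)
```

The first two values come out at (or within rounding error of) +1 and -1, and the third at 0. Note that the third data set still has a clear pattern; r only measures the straight-line part of a relationship.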

## How Correlation Applies to Real Life

Now we're going to take a look at an example to show how correlation might be applied to real life, and how someone would calculate it. First, we need a set of data for two variables. Below is a table which shows the data for two variables, Height and Self Esteem.

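| Person | Height | Self Esteem |
|---|---|---|
| 1 | 68 | 4.1 |
| 2 | 71 | 4.6 |
| 3 | 62 | 3.8 |
| 4 | 75 | 4.4 |
| 5 | 58 | 3.2 |
| 6 | 60 | 3.1 |
| 7 | 67 | 3.8 |
| 8 | 68 | 4.1 |
| 9 | 71 | 4.3 |
| 10 | 69 | 3.7 |
| 11 | 68 | 3.5 |
| 12 | 67 | 3.2 |
| 13 | 63 | 3.7 |
| 14 | 62 | 3.3 |
| 15 | 60 | 3.4 |
| 16 | 63 | 4.0 |
| 17 | 65 | 4.1 |
| 18 | 67 | 3.8 |
| 19 | 63 | 3.4 |
| 20 | 61 | 3.6 |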

With the data in a table like the one above, you can graph it as two histograms, one for each variable (height and self esteem), as shown below.

[Figures: histograms of Height and Self Esteem]

From this data, you can find the descriptive statistics, which include things like the mean, mode, minimum and maximum values, etc. A table of the descriptive statistics for the two variables is below.

| Variable | Mean | StDev | Variance | Sum | Minimum | Maximum | Range |
|---|---|---|---|---|---|---|---|
| Height | 65.4 | 4.40574 | 19.4105 | 1308 | 58 | 75 | 17 |
| Self Esteem | 3.755 | 0.426090 | 0.181553 | 75.1 | 3.1 | 4.6 | 1.5 |
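
As a rough check, the descriptive statistics in the table can be reproduced with Python's standard `statistics` module. The helper name `describe` is just a label chosen for this sketch. The table's StDev and Variance values match the sample (n - 1) formulas, which is what `statistics.stdev` and `statistics.variance` compute.

```python
import statistics

# Data copied from the Height / Self Esteem table.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def describe(values):
    """Return the descriptive statistics listed in the table as a dict."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),        # sample standard deviation (n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

height_stats = describe(height)
esteem_stats = describe(self_esteem)

# Round for display only; floating-point sums carry tiny rounding noise.
for name, stats in (("Height", height_stats), ("Self Esteem", esteem_stats)):
    print(name, {key: round(value, 6) for key, value in stats.items()})
```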

These descriptive statistics are part of what you need in order to compute the correlation between height and self esteem. The sums, for example, are plugged directly into the correlation equation.

Next we have a two-variable plot, or scatter plot, which shows the relationship between the two variables visually.

[Figure: scatter plot of Height versus Self Esteem]

From this scatter plot alone, we can already see that the correlation between height and self esteem is positive. According to William M.K. Trochim (2006), "if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right", which indicates a positive correlation.
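
The direction can also be checked numerically by actually fitting that straight line. The sketch below assumes an ordinary least-squares fit, which the source does not name explicitly, and the function name `least_squares_slope` is made up for this example. A positive slope corresponds to a positive correlation.

```python
# Data copied from the Height / Self Esteem table.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

slope = least_squares_slope(height, self_esteem)
print("slope is positive:", slope > 0)
```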

Since we've already found the direction of the correlation, it is now time to determine its strength. To do so, you must use the formula for the correlation:

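$$
r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}
$$

where N is the number of pairs of scores.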

The symbol r stands for the correlation coefficient, which is the measure of correlation. Because we do not yet have some of the quantities the formula requires (for example, the sum of the products of paired scores and the sum of the x scores), we need a table that computes them. All of the necessary data is displayed in the table below:

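| Person | Height (x) | Self Esteem (y) | x*y | x*x | y*y |
|---|---|---|---|---|---|
| 1 | 68 | 4.1 | 278.8 | 4624 | 16.81 |
| 2 | 71 | 4.6 | 326.6 | 5041 | 21.16 |
| 3 | 62 | 3.8 | 235.6 | 3844 | 14.44 |
| 4 | 75 | 4.4 | 330 | 5625 | 19.36 |
| 5 | 58 | 3.2 | 185.6 | 3364 | 10.24 |
| 6 | 60 | 3.1 | 186 | 3600 | 9.61 |
| 7 | 67 | 3.8 | 254.6 | 4489 | 14.44 |
| 8 | 68 | 4.1 | 278.8 | 4624 | 16.81 |
| 9 | 71 | 4.3 | 305.3 | 5041 | 18.49 |
| 10 | 69 | 3.7 | 255.3 | 4761 | 13.69 |
| 11 | 68 | 3.5 | 238 | 4624 | 12.25 |
| 12 | 67 | 3.2 | 214.4 | 4489 | 10.24 |
| 13 | 63 | 3.7 | 233.1 | 3969 | 13.69 |
| 14 | 62 | 3.3 | 204.6 | 3844 | 10.89 |
| 15 | 60 | 3.4 | 204 | 3600 | 11.56 |
| 16 | 63 | 4.0 | 252 | 3969 | 16 |
| 17 | 65 | 4.1 | 266.5 | 4225 | 16.81 |
| 18 | 67 | 3.8 | 254.6 | 4489 | 14.44 |
| 19 | 63 | 3.4 | 214.2 | 3969 | 11.56 |
| 20 | 61 | 3.6 | 219.6 | 3721 | 12.96 |
| Sum = | 1308 | 75.1 | 4937.6 | 85912 | 285.45 |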

The first three columns are no different from the very first table, while the last three columns are computed from the height and self esteem data (x*y is x times y, and so on). The very last row of the table holds the sum of each column. These sums are what we need to plug into the correlation formula. They are listed more compactly below:

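$$
N = 20, \quad \sum xy = 4937.6, \quad \sum x = 1308, \quad \sum y = 75.1, \quad \sum x^2 = 85912, \quad \sum y^2 = 285.45
$$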
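
The computed columns and their sums can be reproduced with a short Python sketch. The variable names are chosen for this example only. The sums are rounded to two decimal places because adding floating-point numbers leaves tiny rounding noise.

```python
# Data copied from the Height (x) / Self Esteem (y) table.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

# One row per person: (x, y, x*y, x*x, y*y), matching the table's columns.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(height, self_esteem)]

# zip(*rows) turns the rows into columns so each column can be summed.
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (round(sum(col), 2) for col in zip(*rows))

print("N =", n)
print("sum x =", sum_x, "| sum y =", sum_y)
print("sum xy =", sum_xy, "| sum x^2 =", sum_x2, "| sum y^2 =", sum_y2)
```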

From here, we take the above values and put them into our correlation formula to determine the strength of the relationship between height and self esteem. The steps of the calculation are shown below.

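$$
\begin{aligned}
r &= \frac{20(4937.6) - (1308)(75.1)}{\sqrt{\left[20(85912) - (1308)^2\right]\left[20(285.45) - (75.1)^2\right]}} \\
  &= \frac{98752 - 98230.8}{\sqrt{\left[1718240 - 1710864\right]\left[5709 - 5640.01\right]}} \\
  &= \frac{521.2}{\sqrt{(7376)(68.99)}} \\
  &= \frac{521.2}{\sqrt{508870.24}} \\
  &\approx \frac{521.2}{713.35} \\
  &\approx 0.73
\end{aligned}
$$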

This means that the correlation between height and self esteem is .73, a fairly strong positive correlation. (Trochim, 2006)
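
The final calculation can be double-checked with a small Python function that takes the six sums as inputs. This is a minimal sketch of the sum-based form of Pearson's r; the function name `pearson_r_from_sums` and its parameter names are made up for this example.

```python
import math

def pearson_r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Pearson's r computed from the number of pairs and the column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sums taken from the last row of the Height / Self Esteem table.
r = pearson_r_from_sums(n=20, sum_xy=4937.6, sum_x=1308, sum_y=75.1,
                        sum_x2=85912, sum_y2=285.45)
print(round(r, 2))  # prints 0.73
```

Working from the sums mirrors the hand calculation. With the raw data available, computing r from deviations around the means gives the same result and is less sensitive to rounding.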

## Resources

1. Creative Research Systems (2007-2008). Correlation. Retrieved November 6, 2008, from http://www.surveysystem.com/correlation.htm
2. Graphing Unit (2005). Scatter Plots. Retrieved November 6, 2008, from http://www.mste.uiuc.edu/courses/ci330ms/youtsey/scatterinfo.html
3. Trochim, William M.K. (2006). Research Methods Knowledge Base: Correlation. Retrieved November 6, 2008, from http://www.socialresearchmethods.net/kb/statcorr.php
4. Wikipedia (2008). Correlation. Retrieved November 6, 2008, from http://en.wikipedia.org/wiki/Correlation