
The Cross-product is the Dot Product

David Wallace Croft

2005-06-01

Abstract

This tutorial shows the relationship between the Pearson product-moment correlation coefficient from statistics and the dot product from linear algebra.

Motivation

The standard deviation seems a bit odd when you compare it to the more intuitive average absolute deviation. Worse, because it squares each deviation before averaging, the standard deviation gives extra weight to highly deviant scores, which you might consider throwing out of your data anyway. One wonders why the standard deviation formula is used at all when it seems so arbitrary.

If you record sample data for two variables such as x and y and then compute the Pearson product-moment correlation coefficient, it will always fall between -1 and +1. This bit of magic occurs because the correlation coefficient is the average cross-product of z scores, and the z scores are scaled by those stretchy standard deviations. This is a clue that the standard deviation is not as arbitrary as it first seems.
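
To make the average cross-product of z scores concrete, here is a minimal Python sketch. It assumes the population form of the standard deviation (dividing by n, as in the variance formula of the proof), and the function names are illustrative rather than taken from any library.

```python
import math


def mean(values):
    """Arithmetic mean m_x = (sum of x_i) / n."""
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation: divides by n, matching the tutorial's variance."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def z_scores(values):
    """z_i = (x_i - m_x) / sigma_x for every sample."""
    m, s = mean(values), std_dev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Pearson r computed as the average cross-product of paired z scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(correlation(xs, [2.0, 4.1, 5.9, 8.2, 9.8]))  # close to +1
    print(correlation(xs, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -1, up to floating-point rounding
```

If you swap in the sample standard deviation (dividing by n - 1), you must also divide the sum of cross-products by n - 1; the two changes offset each other and r is unchanged.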

Bear in mind that the cross-product of z scores is not the same as the cross product of two three-dimensional vectors. The cross-product of z scores comes from statistics, where it is used in the correlation coefficient; the other cross product comes from linear algebra, where it is used in computer graphics and physics. It turns out, however, that the cross-product of statistics is the dot product of linear algebra.
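
The difference between the two kinds of cross product shows up in what they return. In this sketch (hypothetical helper names, plain Python tuples), the statistical cross-products of paired scores are multiplied and summed into a single number, which is exactly the arithmetic of a dot product, while the three-dimensional vector cross product returns another vector.

```python
def sum_of_cross_products(a, b):
    """Statistics: multiply each pair of scores, then add up the products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def dot(a, b):
    """Linear algebra: dot product of two n-dimensional vectors."""
    return sum(a[i] * b[i] for i in range(len(a)))


def cross3(a, b):
    """Linear algebra: cross product, defined only for 3-D vectors; returns a vector."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


if __name__ == "__main__":
    u, v = (1, 2, 3), (4, 5, 6)
    print(sum_of_cross_products(u, v))  # 32
    print(dot(u, v))                    # 32
    print(cross3(u, v))                 # (-3, 6, -3)
```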

The formula for the standard deviation looks almost like the length of the n-dimensional data vector after the mean has been subtracted from each component. Could the standard deviation be a way of converting a data vector into a normalized unit vector? Yes, if you note that the square root of the number of samples n, which remains in the denominator of the standard deviation, cancels out when you compute the correlation coefficient. You then have the projection of one n-dimensional unit vector onto another. This projection is the cosine of the angle between them, which always falls between -1 and +1.
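
The normalization idea can be checked numerically. This sketch (again assuming the population standard deviation; the helper names are hypothetical) writes the standard deviation as the length of the mean-centered vector divided by the square root of n, rescales the centered data to unit length, and takes the dot product of the two unit vectors.

```python
import math


def norm(v):
    """Euclidean length ||v|| = sqrt(sum of squared components)."""
    return math.sqrt(sum(c * c for c in v))


def centered(values):
    """Subtract the mean from every component: the vector x - m_x."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def std_dev(values):
    """Population standard deviation, written as ||x - m_x|| / sqrt(n)."""
    return norm(centered(values)) / math.sqrt(len(values))


def unit_vector(values):
    """z scores divided by sqrt(n): the centered data rescaled to length 1."""
    scale = std_dev(values) * math.sqrt(len(values))  # equals ||x - m_x||
    return [c / scale for c in centered(values)]


def cosine_correlation(xs, ys):
    """Pearson r as the dot product of two unit vectors, i.e. cos(theta)."""
    ux, uy = unit_vector(xs), unit_vector(ys)
    return sum(a * b for a, b in zip(ux, uy))


if __name__ == "__main__":
    print(norm(unit_vector([1.0, 2.0, 3.0, 4.0])))
    print(cosine_correlation([1, 2, 3, 4], [1, 3, 2, 4]))
```

Dividing each z score by the square root of n is the same as dividing the centered vector by its own length, which is why the leftover factor of n cancels.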

Proof

  1. Mean

    $m_x = \frac{\sum_{i=1}^{n} x_i}{n}$

  2. Variance

    $\sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - m_x)^2}{n}$

  3. Standard Deviation

    $\sigma_x = \sqrt{\sigma_x^2}$

  4. Z Score

    $z_{x,i} = \frac{x_i - m_x}{\sigma_x}$

  5. Correlation Coefficient

    $r = \frac{\sum_{i=1}^{n} z_{x,i} \, z_{y,i}}{n}$

  6. Average Cross-product

    $r = \frac{\sum_{i=1}^{n} \left[ \left( \frac{x_i - m_x}{\sigma_x} \right) \left( \frac{y_i - m_y}{\sigma_y} \right) \right]}{n}$

  7. Move the constants

    $r = \frac{1}{\sigma_x \, \sigma_y \, n} \sum_{i=1}^{n} (x_i - m_x)(y_i - m_y)$

  8. Dot Product

    $r = \frac{1}{\sigma_x \, \sigma_y \, n} \left[ (\mathbf{x} - m_x) \cdot (\mathbf{y} - m_y) \right]$

  9. Standard Deviation in Terms of Norm

    $\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - m_x)^2}{n}} = \frac{\lVert \mathbf{x} - m_x \rVert}{\sqrt{n}}$

  10. The n cancels out

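    $r = \frac{\sqrt{n} \, \sqrt{n}}{\lVert \mathbf{x} - m_x \rVert \, \lVert \mathbf{y} - m_y \rVert \, n} \left[ (\mathbf{x} - m_x) \cdot (\mathbf{y} - m_y) \right]$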

  11. Projection of Unit Vectors

    $r = \frac{\mathbf{x} - m_x}{\lVert \mathbf{x} - m_x \rVert} \cdot \frac{\mathbf{y} - m_y}{\lVert \mathbf{y} - m_y \rVert} = \cos(\theta)$

  12. Proportionate Reduction in Error

    $r^2 = \cos^2(\theta) = 1 - \sin^2(\theta)$

  13. Perpendicular to Projection

    $\sin(\theta) = \sqrt{1 - r^2}$
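
As a sanity check on the derivation, this Python sketch evaluates r three ways from the same data: the average cross-product of z scores (step 5), the dot product divided by the standard deviations and n (steps 7 and 8), and the cosine of the angle between the centered vectors (step 11). It then recovers sin(θ) from step 12. The function name and sample data are made up for illustration, and the population standard deviation is assumed throughout.

```python
import math


def proof_steps(xs, ys):
    """Evaluate the correlation coefficient r using three forms from the proof."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n            # step 1: means
    dx = [x - mx for x in xs]                    # components of x - m_x
    dy = [y - my for y in ys]                    # components of y - m_y
    sx = math.sqrt(sum(d * d for d in dx) / n)   # steps 2-3: population sigma_x
    sy = math.sqrt(sum(d * d for d in dy) / n)   # steps 2-3: population sigma_y
    zx = [d / sx for d in dx]                    # step 4: z scores
    zy = [d / sy for d in dy]
    dot = sum(a * b for a, b in zip(dx, dy))     # step 8: (x - m_x) . (y - m_y)
    len_x = math.sqrt(sum(d * d for d in dx))    # ||x - m_x||
    len_y = math.sqrt(sum(d * d for d in dy))    # ||y - m_y||
    return {
        "step 5 (average z cross-product)": sum(a * b for a, b in zip(zx, zy)) / n,
        "steps 7-8 (dot / (sigma_x sigma_y n))": dot / (sx * sy * n),
        "step 11 (cosine of angle)": dot / (len_x * len_y),
    }


if __name__ == "__main__":
    results = proof_steps([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
    for label, value in results.items():
        print(f"{label}: {value:.6f}")
    r = results["step 11 (cosine of angle)"]
    # steps 12-13: sin(theta) = sqrt(1 - r^2); clamp tiny negative rounding error
    sin_theta = math.sqrt(max(0.0, 1.0 - r * r))
    print(f"r^2 = {r * r:.6f}, sin(theta) = {sin_theta:.6f}")
```

All three entries should agree up to floating-point rounding; for the sample data shown they come out to 0.8, so sin(θ) is 0.6.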

Copyright 2005 CroftSoft Inc.
You may copy this webpage under the terms of the
Creative Commons Attribution License.