## Gram-Schmidt orthogonalization

The context here is that we have some desired vector $v^*$ that we want to build out of a set of basis vectors $v_i$ through weighted summation. The case where this is easiest is when all of our vectors $v_i$ are orthogonal with respect to each other. Recalling that a dot product of two vectors gives us a measure of their similarity, two vectors are orthogonal if their dot product is 0. A basic example of this is the set $[1,0],[0,1]$, or the rotation of these vectors 45 degrees, $[.7071, .7071],[-.7071, .7071]$.

If we have an orthogonal basis set of vectors, then to generate the weights for each of the basis vectors we simply take the dot product between each $v_i$ and our desired vector $v^*$. For example, with our basis sets from above, the weights to generate the vector $[.45 -.8]$ can be found as

$w_1 = \langle [.45, -.8] , [1, 0] \rangle = .45 \\ w_2 = [.45, -.8] \cdot [0, 1] = -.8,$

where $\langle \rangle$ denotes dot (or inner) product, and leads to

$w_1 = \langle [.45, -.8] , [.7071, .7071] \rangle = -0.2475 \\ w_2 = \langle [.45, -.8] , [-.7071, .7071] \rangle = -0.8839.$

And now we have weights $w_i$ such that for each basis set $\sum_i w_i v_i = v^*$. Written generally, to find the weights we have $w_i = \frac{v_i \cdot v^*}{||v_i||}$. The denominator here is the norm of $v_i$, introduced for generality. In the example set our basis sets were composed of unit vectors (vectors with magnitude = 1), but in general normalization is required.

Now, what if we don’t have an orthogonal basis set? Trouble, that’s what. With a non-orthogonal basis set, such as $[1, .4], [-.1, 1]$, when we try our dot product business to find our coefficients looks what happens

$w_1 = \frac{\langle [.45, -.8] , [1, .4] \rangle}{||[1,.4]||} = .3682 \\ w_2 = \frac{\langle [.45, -.8] , [-.1, 1] \rangle}{||[-.1, 1]||} = -.8408,$

and

$.1207 \cdot [1,.4] + -.8408 \cdot [-.1, 1] = [0.2048, -0.7925]$,

which is not a good reconstruction of our desired vector, $[.45, -.8]$. And the more the cross contribution to the same dimensions between different basis vectors, the worse this becomes. Of course, we could use a least squares method to find our basis set coefficients, but that involves matrix multiplications and inverses, and generally becomes more complex than we want.

So, let’s say we have a basis set of two different, but non-orthogonal vectors, $v_1$ and $v_2$. We instead want two vectors $u_1$ and $u_2$ which describe the same space, but are orthogonal. By describing the same space, I mean that their span is the same. And by span I mean the set of values that can be generated through weighted summation of the two vectors. So we set $u_1 = v_1$, and the task is now to find the appropriate $u_2$. As a conceptual description, we want $u_2$ to be equal to $v_2$, but only covering the area of space that $u_1$ isn’t covering already. To do this, we can calculate at the overlap between $u_1$ and $v_2$, then subtract out that area from $v_2$. The result should then give us the same area of state space covered by $v_1$ and $v_2$, but in a set of orthogonal vectors $u_1$ and $u_2$.

Mathematically, we calculate the overlap between $u_1$ and $v_2$ with a dot product, $\langle u_1, v_2 \rangle$, normalized by the magnitude of $u_1$, $||u_1||$, and then subtract from $v_2$. All together we have

$u_2 = v_2 - \frac{\langle u_1, v_2 \rangle}{||u_1||}$.

Using our non-orthogonal example above,

$u_1 = [1, .4]$

$u_2 = [-.1, 1] - \frac{\langle [-.1, 1] , [1, .4] \rangle}{||[-.1, 1]||} = [-0.3785, 0.7215].$

Due to roundoff during the calculation of $u_2$, the dot product between $u_1$ and $u_2$ isn’t exactly zero, but it’s close enough for our purposes.

OK, great. But what about if we have 3 vectors in our basis set and want 3 orthogonal vectors (assuming we’ve moved to a 3-dimensional space) that span the same space? In this case, how do we calculate $u_3$? Well, carrying on with our intuitive description, you might assume that the calculation would be the same as for $u_2$, but now you must subtract out from $v_3$ that is covered by $u_1$ and $u_2$. And you would be correct:

$u_3 = v_3 - \frac{\langle u_1, v_3 \rangle}{||u_1||} - \frac{\langle u_2, v_3 \rangle}{||u_2||}$.

In general, we have

$u_i = v_i - \sum_j \frac{\langle u_j, v_i \rangle}{||u_j||}$.

And now we know how to take a given set of basis vectors and generate a set of orthogonal vectors that span the same space, which we can then use to easily calculate the weights over our new basis set of $u_i$ vectors to generate the desired vector, $v^*$.