Sane Explanation of Change of Bases


The following is an explanation of how changes of basis work in vector spaces, with a focus on building intuition.

I will use the following notation in this text:

  • $[v]_\mathcal{A}$ is the vector $v \in V$ represented in basis $\mathcal{A}$
  • $T_\mathcal{A} ^{\mathcal{B}}$ is a basis transformation matrix that takes a vector in basis $\mathcal{A}$ and returns the vector in basis $\mathcal{B}$ as follows: $[v]_\mathcal{B} = T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A}$
  • For any linear map $M$ we define $[M]_\mathcal{A}$ as the mapping $M$ in the basis $\mathcal{A}$.
  • The standard basis $\mathcal{E}$ is the basis $\{ \begin{pmatrix}1 \\ 0\end{pmatrix}, \begin{pmatrix}0 \\ 1\end{pmatrix} \}$.

Although we will derive the results mathematically, please note that the following are not complete mathematical proofs.

Quick Reminder - How do bases work?

It's best to illustrate this with a small example. Assume we have the basis $\mathcal{B} = \{ \begin{pmatrix}1 \\ 1\end{pmatrix}, \begin{pmatrix}2 \\ 0\end{pmatrix} \}$. These are the basis vectors of $\mathcal{B}$ with coefficients in the standard basis.

Assume now we have the vector $[v]_{\mathcal{B}} = \begin{pmatrix}1 \\ -2\end{pmatrix}$ in basis $\mathcal{B}$ and would like to know its coordinates in the standard basis.

For this we multiply the first component by the first basis vector and the second component by the second basis vector.

\[\begin{aligned} [v]_{\mathcal{E}} = 1 \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} + (-2) \cdot \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} -3 \\ 1 \end{pmatrix} \end{aligned} \]

Hence the vector $\begin{pmatrix}1 \\ -2\end{pmatrix}$ in basis $\mathcal{B}$ is the vector $\begin{pmatrix}-3 \\ 1\end{pmatrix}$ in the standard basis.
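As a quick numerical sanity check, the linear combination above can be reproduced in a few lines of Python (the variable names are my own, not part of the text):

```python
# Basis vectors of B, written in the standard basis
b1 = (1, 1)
b2 = (2, 0)

# Coordinates of v in basis B
v_B = (1, -2)

# [v]_E = v_B[0] * b1 + v_B[1] * b2, computed componentwise
v_E = (v_B[0] * b1[0] + v_B[1] * b2[0],
       v_B[0] * b1[1] + v_B[1] * b2[1])

print(v_E)  # -> (-3, 1)
```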

Now we are interested in making this a bit more efficient. Let us write the vectors of the basis $\mathcal{B}$ as columns of a matrix we will call $T_{\mathcal{B}}^{\mathcal{E}} $. The $i$th vector of the basis will be denoted by $b_i$. We will shortly see why this matrix has such a weird name.

\[\begin{aligned} T_{\mathcal{B}}^{\mathcal{E}} &= \begin{pmatrix} b_1 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \end{aligned} \]

If we now apply the matrix $T_{\mathcal{B}}^{\mathcal{E}}$ to the vector $[v]_{\mathcal{B}}$ we see something interesting.

\[\begin{aligned} \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -2 \end{pmatrix} &= \begin{pmatrix} -3\\1 \end{pmatrix}\\ T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} &= [v]_{\mathcal{E}} \end{aligned} \]

We have seen that left multiplication by the matrix $T_{\mathcal{B}}^{\mathcal{E}}$ transforms a vector in basis $\mathcal{B}$ into the same vector in basis $\mathcal{E}$.

This procedure is called change of basis and we have just seen our first basis transformation matrix $T_{\mathcal{B}}^{\mathcal{E}}$. Now we come to the notation: the subscript of $T$ indicates the basis we are transforming from whereas the superscript gives us the basis we are transforming to.

Hence $T_{\mathcal{B}}^{\mathcal{E}}$ transforms vectors from $\mathcal{B}$ to $\mathcal{E}$.
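The matrix-vector computation above can be sketched in plain Python, with a small helper for the $2 \times 2$ multiplication:

```python
# Change-of-basis matrix T_B^E: the basis vectors of B as columns
T_BE = [[1, 2],
        [1, 0]]

def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

v_B = [1, -2]
v_E = matvec(T_BE, v_B)
print(v_E)  # -> [-3, 1]
```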

To summarize what we have just learned:

Remark:

Given a basis $\mathcal{B} = \{b_1, b_2, \ldots , b_n\}$ where the basis vectors themselves are given in a basis $\mathcal{A}$ (usually the standard basis), the change of basis matrix $T_{\mathcal{B}} ^ \mathcal{A}$ is given by:

\[\begin{aligned} T_{\mathcal{B}} ^ \mathcal{A} &= \begin{pmatrix} b_1 & b_2 & \ldots & b_n \end{pmatrix} \end{aligned} \]

where the basis vectors are the columns of the matrix $T_{\mathcal{B}} ^ \mathcal{A}$.

Changing between multiple non-trivial bases

Now assume we have two bases $\mathcal{A}$ and $\mathcal{B}$. To understand how we can change between multiple bases we would like to find $T_{\mathcal{A}}^{\mathcal{B}}$.

First, we recall that we had $T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} = [v]_{\mathcal{E}}$, hence we have:

\[\begin{aligned} T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} &= [v]_{\mathcal{E}}\\ T_{\mathcal{A}}^{\mathcal{E}} \cdot [v]_{\mathcal{A}} &= [v]_{\mathcal{E}} \end{aligned} \]

From the discussion above we have also learned how $T_{\mathcal{B}}^{\mathcal{E}}$ and $T_{\mathcal{A}}^{\mathcal{E}}$ can be constructed.

Using simple algebra we find:

\[\begin{aligned} T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} = [v]_{\mathcal{E}} \quad &\& \quad T_{\mathcal{A}}^{\mathcal{E}} \cdot [v]_{\mathcal{A}} = [v]_{\mathcal{E}}\\ \implies T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} &= T_{\mathcal{A}}^{\mathcal{E}} \cdot [v]_{\mathcal{A}}\\ \implies [v]_{\mathcal{B}} &= \left(T_{\mathcal{B}}^{\mathcal{E}} \right)^{-1} \cdot T_{\mathcal{A}}^{\mathcal{E}} \cdot [v]_{\mathcal{A}}\\ [v]_{\mathcal{B}} &= \underbrace{\left(T_{\mathcal{B}}^{\mathcal{E}} \right)^{-1} \cdot T_{\mathcal{A}}^{\mathcal{E}}}_{T_{\mathcal{A}} ^{\mathcal{B}} } \cdot [v]_{\mathcal{A}}\\ \end{aligned} \]
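To make the composition concrete, here is a small sketch using the basis $\mathcal{B}$ from the text and a second basis $\mathcal{A} = \{ \begin{pmatrix}1 \\ 0\end{pmatrix}, \begin{pmatrix}1 \\ 1\end{pmatrix} \}$ that I made up for this example:

```python
# Both bases written as columns of their change-of-basis matrices into E
T_AE = [[1, 1],
        [0, 1]]   # A = {(1,0), (1,1)}, chosen for illustration
T_BE = [[1, 2],
        [1, 0]]   # B = {(1,1), (2,0)}, the basis from the text

def matmul(X, Y):
    """Multiply two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Invert a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

# T_A^B = (T_B^E)^{-1} * T_A^E : first go from A to E, then from E to B
T_AB = matmul(inv2(T_BE), T_AE)
print(T_AB)  # -> [[0.0, 1.0], [0.5, 0.0]]
```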

The attentive reader will have realized that the matrix $\left(T_{\mathcal{B}}^{\mathcal{E}} \right)^{-1}$ equals $T_{\mathcal{E} }^\mathcal{B}$. Although this might already be clear to many readers, let's still see why it is intuitively true:

We have

\[\begin{aligned} T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} &= [v]_{\mathcal{E}}\\ T_{\mathcal{E}}^{\mathcal{B}} \cdot [v]_{\mathcal{E}} &= [v]_{\mathcal{B}} \end{aligned} \]

If we now multiply the first equation by $\left(T_{\mathcal{B}}^{\mathcal{E}}\right)^{-1}$ from the left we get

\[\begin{aligned} \left(T_{\mathcal{B}}^{\mathcal{E}}\right)^{-1} \cdot T_{\mathcal{B}}^{\mathcal{E}} \cdot [v]_{\mathcal{B}} &= \left(T_{\mathcal{B}}^{\mathcal{E}}\right)^{-1} \cdot [v]_{\mathcal{E}}\\ [v]_{\mathcal{B}} &= \left(T_{\mathcal{B}}^{\mathcal{E}}\right)^{-1} \cdot [v]_{\mathcal{E}}\\ \end{aligned} \]

but from above we also had $T_{\mathcal{E}}^{\mathcal{B}} \cdot [v]_{\mathcal{E}} = [v]_{\mathcal{B}}$ which means that $T_{\mathcal{E}}^{\mathcal{B}} = \left(T_{\mathcal{B}}^{\mathcal{E}}\right)^{-1}$ as expected.

This means that the inverse of a basis transformation matrix from $\mathcal{A}$ to $\mathcal{B}$ is a basis transformation matrix from $\mathcal{B}$ to $\mathcal{A}$.
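As a sketch of this fact, we can invert the matrix $T_{\mathcal{B}}^{\mathcal{E}}$ from the earlier example and send the standard-basis vector back into $\mathcal{B}$-coordinates:

```python
def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def inv2(M):
    """Invert a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

T_BE = [[1, 2], [1, 0]]   # T_B^E from the earlier example
T_EB = inv2(T_BE)         # the claim: this inverse is exactly T_E^B

v_E = [-3, 1]             # [v]_E from the first example
v_B = matvec(T_EB, v_E)   # we expect to recover [v]_B = (1, -2)
print(v_B)  # -> [1.0, -2.0]
```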

Change of Basis for Linear Transformations

Now we come to the last section. Assume we have a linear map $M$ in a basis $\mathcal{B}$, we would like to find the linear map $M$ in basis $\mathcal{A}$.

If you look up the formula for such changes of bases you will find $[M]_\mathcal{A} = T_\mathcal{B}^\mathcal{A} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A}^\mathcal{B}$.

We would now like to derive this formula.

By definition of a linear mapping we have

\[\begin{aligned} [M]_\mathcal{B} \cdot [v]_\mathcal{B} &= [w]_\mathcal{B}\\ [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [w]_\mathcal{A} \end{aligned} \]

Furthermore we have

\[\begin{aligned} [v]_\mathcal{B} &= T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A}\\ \implies [v]_{\mathcal{A}} &= \left(T_\mathcal{A} ^{\mathcal{B}}\right)^{-1} \cdot [v]_{\mathcal{B}} = T_{\mathcal{B}} ^{\mathcal{A}} \cdot [v]_\mathcal{B} \end{aligned} \]

We can now look at what happens when we apply the linear transformation $M$ to a vector $v$, both in basis $\mathcal{B}$.

\[\begin{aligned} [M]_\mathcal{B} \cdot [v]_\mathcal{B} = [w]_\mathcal{B} \end{aligned} \]

Now we substitute $[v]_\mathcal{B}$ with $ T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A}$ giving us:

\[\begin{aligned} [M]_\mathcal{B} \cdot [v]_\mathcal{B} &= [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A} = [w]_\mathcal{B} \end{aligned} \]

Assume that $[M]_\mathcal{A}$ is the linear transformation in basis $\mathcal{A}$. We can get the same result $[w]_\mathcal{B}$ using $[M]_\mathcal{A}$. We have $[M]_\mathcal{A} \cdot [v]_\mathcal{A} = [w]_\mathcal{A}$. Using the fact that $[w]_\mathcal{A} = T_{\mathcal{B}}^{\mathcal{A}} \cdot [w]_{\mathcal{B}} $, we can replace $[w]_\mathcal{A}$ with $T_{\mathcal{B}}^{\mathcal{A}} \cdot [w]_{\mathcal{B}}$ giving us:

\[\begin{aligned} [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [w]_\mathcal{A}\\ \implies [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= T_{\mathcal{B}}^{\mathcal{A}} \cdot [w]_{\mathcal{B}}\\ \implies \left( T_{\mathcal{B}}^{\mathcal{A}}\right)^{-1} \cdot [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [w]_{\mathcal{B}}\\ \implies T_{\mathcal{A}}^{\mathcal{B}} \cdot [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [w]_{\mathcal{B}} \end{aligned} \]

These algebraic transformations might seem a bit arbitrary but let's recap the two formulas we have just derived:

\[\begin{aligned} T_{\mathcal{A}}^{\mathcal{B}} \cdot [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [w]_{\mathcal{B}}\\ [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A} &= [w]_\mathcal{B} \end{aligned} \]

Now we can combine them into a single equation:

\[\begin{aligned} T_{\mathcal{A}}^{\mathcal{B}} \cdot [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A}\\ \implies [M]_\mathcal{A} \cdot [v]_\mathcal{A} &=\left(T_{\mathcal{A}}^{\mathcal{B}}\right)^{-1} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A}\\ \implies [M]_\mathcal{A} \cdot [v]_\mathcal{A} &= T_{\mathcal{B}}^{\mathcal{A}} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \cdot [v]_\mathcal{A} \end{aligned} \]

Since this equation holds for every vector $[v]_\mathcal{A}$, the matrices on both sides must be equal, and we can drop the multiplication by $[v]_\mathcal{A}$ to get:

\[\begin{aligned} [M]_\mathcal{A} &= T_{\mathcal{B}}^{\mathcal{A}} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}} \end{aligned} \]

This is exactly the expression we wanted to derive. Since we have assumed that $\mathcal{A}, \mathcal{B}$ can be any bases, we can also conclude $[M]_\mathcal{B} = T_{\mathcal{A}}^{\mathcal{B}} \cdot [M]_\mathcal{A} \cdot T_\mathcal{B} ^{\mathcal{A}}$.
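The derivation above can also be checked numerically. In the sketch below I take $\mathcal{A}$ to be the standard basis $\mathcal{E}$, the basis $\mathcal{B}$ from the text, and a map $[M]_\mathcal{B}$ that I made up for this example; applying $M$ directly in $\mathcal{E}$ and applying it in $\mathcal{B}$ followed by a change of basis must give the same vector:

```python
def matmul(X, Y):
    """Multiply two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def inv2(M):
    """Invert a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

T_BE = [[1, 2], [1, 0]]   # basis B from the text, basis vectors as columns
M_B  = [[2, 0], [0, 3]]   # a linear map given in basis B (made up for this sketch)

# With A = E the formula reads [M]_E = T_B^E * [M]_B * T_E^B
M_E = matmul(T_BE, matmul(M_B, inv2(T_BE)))

# Sanity check: applying M in either basis must describe the same vector.
v_B = [1, -2]
v_E = matvec(T_BE, v_B)                    # the vector from the first example
w_direct = matvec(M_E, v_E)                # apply M directly in E
w_via_B  = matvec(T_BE, matvec(M_B, v_B))  # apply M in B, then change basis
print(w_direct, w_via_B)  # -> [-10.0, 2.0] [-10, 2]
```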

But what is the intuition behind this formula?

The formula $[M]_\mathcal{A} = T_{\mathcal{B}}^{\mathcal{A}} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}}$ might still feel a bit unintuitive. Why for example is $[M]_\mathcal{A} = T_{\mathcal{B}}^{\mathcal{A}} \cdot [M]_\mathcal{B}$ not correct?

The easiest way to think about this is that $[M]_{\mathcal{B}}$ expects a vector in basis $\mathcal{B}$ and outputs a vector in basis $\mathcal{B}$. If we want to apply $[M]_{\mathcal{B}}$ to a vector given in basis $\mathcal{A}$ we must:

  • first transform the input into basis $\mathcal{B}$; if we apply $[M]_{\mathcal{B}}$ to a vector in basis $\mathcal{A}$ directly, we don't get anything meaningful.
  • then transform the output of $[M]_{\mathcal{B}}$, which is in basis $\mathcal{B}$, back into basis $\mathcal{A}$; if we do not transform it back, our output remains in basis $\mathcal{B}$.

Hence, intuitively the change of basis for a linear map must look as follows: $[M]_\mathcal{A} = T_{\mathcal{B}}^{\mathcal{A}} \cdot [M]_\mathcal{B} \cdot T_\mathcal{A} ^{\mathcal{B}}$.

Concluding remarks

I hope this discussion helped you strengthen your understanding and gave you some background on why the transformations look the way they do.

Like most of my texts, this one might also contain some errors. If you spot mistakes or would simply like to give me some feedback, feel free to send me an email.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
© 2020