- Published on

# Linear Algebra 101 for AI/ML – Part 2

- Authors
- Name
- Backprop
- @trybackprop

## Intro

In Part 1 of this **Linear Algebra 101 for AI/ML Series**, we learned about the fundamental building blocks of linear algebra: **vectors** and **matrices**. We learned how element-wise operations apply to the vector/matrix's individual elements **independently**, and then we played around with these mathematical building blocks using an open source ML framework called PyTorch, widely used in academia and industry.

In this article, **Part 2** of the series, we will build on that foundational knowledge. First, we will cover an operation that applies to the vector/matrix's individual elements **as a whole**: the **dot product**. This operation belongs to a class of operations that **do not** view the individual elements **independently**. Then we will visualize the dot product operation to build intuition, and finally, we will learn about **embeddings**, which are special types of vectors that represent concepts, objects, and ideas. **Embeddings** are used throughout modern AI and have applications in large language models, image generation models, and recommendation systems. In the article, you will find **questions**, **a quiz**, and **two interactive playgrounds** (the Interactive Dot Product Playground and the Interactive Embedding Explorer are best viewed on laptop/desktop) that were designed to help you understand the concepts.

In Part 3 of the series, we will use all the fundamental linear algebra we will have learned in Part 1 and Part 2 to build an **image search engine**. Below is a preview of the **image search engine**, which returns images similar to the user's input image.

Without further ado, let's get started!

## Dot Product

We will approach the **dot product** from two perspectives: an algorithmic perspective and a visual one.

## Algorithmic Perspective

Below are two vectors, ${\color{cyan}{\vec{a}}} \in \mathbb{R}^{\color{fuchsia}{4}}$ and ${\color{orange}{\vec{b}}} \in \mathbb{R}^{\color{fuchsia}{4}}$.

$\left. {\color{cyan}{\vec{a}}} = \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \\ \end{bmatrix} \quad {\color{orange}{\vec{b}}} = \begin{bmatrix} 1 \\ 0.5 \\ 0.25 \\ 0.125 \\ \end{bmatrix} \quad \right\}{\color{fuchsia}{4}} \text{ rows}$The notation for the dot product is simply a dot between two vectors: ${\color{cyan}{\vec{a}}} \cdot {\color{orange}{\vec{b}}}$. The algorithm for calculating a dot product is simply to sum the products of corresponding pairs between two vectors. **Let's break down what that means.** First, let's calculate the products:

Next, we sum the products:

${\color{cyan}{1}} \cdot {\color{cyan}{1}} + {\color{orange}{2}} \cdot {\color{orange}{0.5}} + {\color{yellow}{4}} \cdot {\color{yellow}{0.25}} + {\color{magenta}{8}} \cdot {\color{magenta}{0.125}} = 4$Thus, ${\color{cyan}{\vec{a}}} \cdot {\color{orange}{\vec{b}}} = 4$. Let's see how to calculate the dot product with PyTorch.

```
>>> a = torch.tensor([1.0, 2.0, 4.0, 8.0])
>>> b = torch.tensor([1.0, 0.5, 0.25, 0.125])
>>> torch.dot(a, b)
tensor(4.)
```

To put it another way, to calculate the dot product, **first** we use the **element-wise multiply** operation, and **then** we sum up the products. Just for fun, let's implement `torch.dot`

with native Python:

```
>>> sum([a[i] * b[i] for i in range(len(a))]) # Python implementation of dot product
tensor(4.)
```

## Visual Perspective

At this point, you're probably wondering: *what does the dot product mean*? Is there any meaning behind

**element-wise multiplying**two vectors and then summing up the products? To aid our understanding of the dot product visually, let's introduce another way to calculate the dot product. This second equation for calculating the dot product is called the

**cosine formula for the dot product**.

(We won't cover why these two values are equivalent, but the proof can be found here.)

Now that we are familiar with the notation, let's revisit the **cosine formula for the dot product**:

The reason why we introduced this new cosine formula is it allows us to interpret the dot product **geometrically**.

## Interactive Dot Product Playground

Before we dive into the explanation of the cosine formula for the dot product, let's first play around with an interactive playground that allows you to **drag the arrowhead** of the vectors around in a 2D number plane (**only works on laptop/desktop!**). Go ahead and try it out. See how the dot product between vector 1 and vector 2 changes (see the dynamic value of the dot product in the panel). Dragging the arrowheads around allows you to change **the angle** between the vectors and their **lengths**. As the cosine formula states, both **angle** and **length** (aka **norm**) have an impact on the dot product.

Vector 1:

(-3.00, 3.00)

Norm 1:

4.24

Vector 2:

(3.00, 2.00)

Norm 2:

3.61

Dot Product:

-3.00

cos(θ):

-0.1961

Tip: Click and drag the arrowheads to move the vectors on laptop/desktop

Have you played around with the interactive playground? If so, now it's time to understand the dot product.

Here, we see two vectors $\color{cyan}{\vec{a}}$ and $\color{orange}{\vec{b}}$ with an angle of $\color{yellow}{\theta}$ between them.

Let's draw a dotted line from the tip of $\color{orange}{\vec{b}}$ to $\color{cyan}{\vec{a}}$ such that the dotted line is at $90^\circ$ angle to $\color{cyan}{\vec{a}}$.

Imagine two forces. The first force moves an object from the origin, $(0,0)$, along **path 1**. The second force then takes over and moves the object along **path 2**. The object would end up at $(3,3)$. Notice that the force that pushes the object along **path 1** is in the same direction as $\color{cyan}{\vec{a}}$. In other words, it's **aligned with** $\color{cyan}{\vec{a}}$. This component is called the **projection of $\color{orange}{\vec{b}}$ onto $\color{cyan}{\vec{a}}$**.

Since **path 1**, **path 2**, and $\color{orange}{\vec{b}}$ make a right triangle, we can calculate the lengths of the paths with trigonometry. We see that the **projection of $\color{orange}{\vec{b}}$ onto $\color{cyan}{\vec{a}}$** has a length of $|{\color{orange}{\vec{b}}}| \cos({\color{yellow}{\theta}})$.

Now we see that multiplying $|{\color{cyan}{\vec{a}}}|$ and $|{\color{orange}{\vec{b}}}| \cos({\color{yellow}{\theta}})$ is essentially the same as multiplying the length of ${\color{cyan}{\vec{a}}}$ and the length of the component of ${\color{orange}{\vec{b}}}$ along the direction of ${\color{cyan}{\vec{a}}}$. That means, the more ${\color{cyan}{\vec{a}}}$ and ${\color{orange}{\vec{b}}}$ are **aligned** and pointing in the same direction, the higher the dot product. The less they are aligned, the smaller the dot product. Let's take a look at some cases below.

In this case, the two vectors are generally aligned and pointing in the same general direction. Formally, the angle between the two vectors is less than 90°. Hence, a positive dot product.

In this case, the two vectors are generally not aligned and pointing in roughly opposite directions. Formally, the angle between the two vectors is greater than 90° and less than 270°. Hence, a negative dot product.

In this case, the two vectors are perpendicular. They are neither aligned nor misaligned. Thus, the dot product is zero.

Don't worry if you're still trying to grasp these concepts. We just covered **a lot of math**. You can go back to the Interactive Dot Product Playground above to build intuition around the relationship between the dot product and the lengths and direction of vectors.

## Embeddings

How does any of this math apply to machine learning? It turns out our new knowledge of vectors and dot products can be applied to large language models like ChatGPT, image generation like DALLE, and movie recommendation systems like Netflix.

As we will learn in a future article, AI applications based on neural networks do not process images, text, video, and audio directly. Instead, these inputs are first converted to vectors and matrices, and then these vectors and matrices are passed into the neural networks, which can perform various mathematical operations on them before producing output such as a chatbot response, a synthetically generated image, or a recommended movie. Even though to human eyes these vectors and matrices might seem like random but organized lists of numbers, to the neural network, they contain concepts. Vectors that represent these concepts are called **embeddings**. Because the seemingly random numbers in the vectors are capable of representing anything from a **bird** to **electric cars** to **globalization**, we say that these **embeddings** capture **semantic meaning**.

To illustrate, let's take a look at three popular movies. Suppose The Avengers: Endgame is represented by a vector that spans from the origin $(0, 0)$ to $(3, 3)$, Spiderman by a vector that points to $(3, 1)$, and La La Land by a vector that points to $(-3, -2)$. Alternatively but subtly, we can view these movies as just the *points* at the end of the vector as opposed to the *entire* vector (e.g., Spiderman is just $(3, 1)$ as opposed to the vector pointing to $(3, 1)$). These are equivalent representations.

Since The Avengers: Endgame and Spiderman are Marvel superhero movies, their vectors would be roughly aligned and thus their dot product would be positive. However, the movie La La Land has less action and a more serious overtone. Thus its dot product with the other two movies would be negative.

We will cover how to produce these coordinates for the movies in Part 3 of this series, but for now, assume these are the points/vectors representing their movies. These vectors are *meaningless* to us if we just *randomly* choose values for the vectors, but if they are chosen in such a way that the vectors for The Avengers: Endgame and Spiderman point to coordinates that are closer together than they are to the coordinate for La La Land, the vectors could be useful. What operation would tell us the degree to which two points are close together or the degree to which two vectors are aligned? The **dot product**.

This is a useful concept in machine learning because we can convert **almost anything** into an embedding if we have a properly trained neural network model. This concept of using the dot product to gauge the similarity between concepts, ideas, and objects will be the basis of the image search engine we'll build in Part 3.

## Embeddings from OpenAI's CLIP Model

The diagram above with the three movies contained a toy example. Let's use a properly trained neural network to produce **embeddings** from words of five different categories. Suppose we have words from the following categories:

- 🌹 flowers
- 🧪 elements of the periodic table
- 🎸 music genres
- ⚽️ sports
- 🗼 European cities

Intuitively, if we had vectors that represented words from these different categories, ideally the vectors representing words from the **same** category would point to coordinates that are **clustered together**. Let's explore this idea. Below is an interactive playground (viewable on laptop/desktop only) that allows you to examine the embeddings of various words from these categories. The embeddings were produced by passing the words into a neural network from OpenAI called CLIP. We'll discuss more about CLIP in Part 3 of this series, but in essence, this model is able to accept either **text** or **images** as **input** and produce **embeddings** as **output**. Hover over each word to see their 2D coordinates. Determine if the words that are visually close together belong to the same category.

We passed in five different categories of words to CLIP, and as expected, five distinct clusters appeared among the embeddings. Notice that the genres of music are clustered together in the center, the types of flowers are together on the left, the sports are in the upper right-hand corner, the European cities are in the bottom right-hand corner, and the elements of the periodic table are on the bottom left. One exception is the word *pop*. While pop is a genre of music, it is also an overloaded term that has multiple meanings, which is probably why it's not clearly clustered together with the other music genres.

## Conclusion

Congratulations! We just made significant progress toward building cool and exciting ML applications in the next part of this series. We learned the algorithm to calculate the **dot product**, and then we gained visual intuition around this operation. Then we learned about a special type of vector called **embeddings**, and we explored the embeddings generated by a neural network called CLIP. In Part 3, we will tie all of this knowledge together to build an **image search engine**.

## Quiz