Einstein summation notation is a concise and powerful way to represent tensor operations, often used in physics and machine learning. It allows us to write complex calculations on tensors in a compact form. This article covers the basics of Einstein summation, shows how to use it in Python with NumPy and TensorFlow, and provides examples to illustrate its use.
Basics of Einstein Summation
Einstein summation notation (einsum) rests on the idea of summing over repeated indices in tensor expressions. It follows two rules:
1. Summation over repeated indices: If an index appears twice in a term, it is summed over
2. Free indices: Indices that appear only once are free indices and represent the axes of the output tensor
Let’s illustrate this with the example of multiplying two matrices A and B: the resulting matrix C is defined as
$$ C_{ik} = \sum\limits_{j}^{}A_{ij}B_{jk} $$
In Python, both the NumPy and TensorFlow libraries provide an einsum function.
NumPy
import numpy as np
# Define two matrices A and B
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Perform matrix multiplication using einsum
C = np.einsum('ij,jk->ik', A, B)
print(C)
# [[19 22]
# [43 50]]
In the example above, 'ij,jk->ik' is the einsum string:
- 'ij' represents the indices of matrix A
- 'jk' represents the indices of matrix B
- '->ik' specifies the indices of the output matrix C

The operation sums over the index j, since it appears in both inputs but not in the output.
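Since 'ij,jk->ik' encodes exactly the index pattern of matrix multiplication, the einsum result can be checked against NumPy's built-in matrix product (a quick sanity check reusing A, B, and C from above):

# The einsum result agrees with the @ operator (np.matmul)
assert np.allclose(C, A @ B)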
The same code in TensorFlow would look like this:
import tensorflow as tf
# Define two matrices A and B
A = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
B = tf.constant([[5, 6], [7, 8]], dtype=tf.float32)
# Perform matrix multiplication using einsum
C = tf.einsum('ij,jk->ik', A, B)
print(C)
# tf.Tensor(
# [[19. 22.]
# [43. 50.]], shape=(2, 2), dtype=float32)
More Examples
Inner Product of Vectors
The inner product (dot product) of two vectors a and b is defined as
$$ c = \sum\limits_{i}^{}a_{i}b_{i} $$
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.einsum('i,i->', a, b)
print(c) # Output: 32
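np.einsum also has an implicit mode: if the '->' and the output indices are omitted, every index that appears twice is summed over automatically. For the inner product this yields the same result:

# Implicit mode: the repeated index i is summed over automatically
c_implicit = np.einsum('i,i', a, b)
print(c_implicit) # Output: 32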
Outer Product of Vectors
The outer product of two vectors a and b is given by:
$$ C_{ij} = a_{i}b_{j} $$
C = np.einsum('i,j->ij', a, b)
print(C)
# Output
# [[ 4  5  6]
#  [ 8 10 12]
#  [12 15 18]]
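For comparison, the same outer product can also be written with np.outer or with broadcasting; einsum just makes the index bookkeeping explicit:

# Both alternatives produce the same matrix as the einsum call
assert np.allclose(C, np.outer(a, b))
assert np.allclose(C, a[:, None] * b[None, :])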
Transpose of a Matrix
The transpose of a matrix A can be obtained by swapping its indices:
# Define A as a NumPy array again (the previous A was a TensorFlow tensor)
A = np.array([[1, 2], [3, 4]])
A_transpose = np.einsum('ij->ji', A)
print(A_transpose)
# Output
# [[1 3]
#  [2 4]]
Trace of a Matrix
The trace of a matrix A is the sum of its diagonal elements:
$$ Tr(A) = \sum\limits_{i}^{}A_{ii} $$
trace = np.einsum('ii->', A)
print(trace)
# Output: 5
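A closely related pattern keeps the repeated index in the output instead of summing over it: 'ii->i' extracts the diagonal of A rather than its trace:

# Keeping the repeated index i in the output yields the diagonal
diagonal = np.einsum('ii->i', A)
print(diagonal) # Output: [1 4]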
Batch Matrix Multiplication
Einsum is particularly useful for batch operations. Suppose we have a batch of matrices A and B, and we want to multiply the corresponding matrices in the batch:
A = np.random.rand(3, 2, 2)
B = np.random.rand(3, 2, 2)
# Perform batch matrix multiplication
C = np.einsum('bij,bjk->bik', A, B)
print(C)
Here, b represents the batch dimension; because it appears in both inputs and in the output, it is not summed over.
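Because np.matmul broadcasts over leading batch dimensions, the result can be verified against it:

# Batch einsum agrees with np.matmul, which also iterates over the batch axis
assert np.allclose(C, np.matmul(A, B))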
Advantages of the Einsum Notation
1. Conciseness: The einsum notation is compact and can represent complex operations succinctly
2. Flexibility: It can handle a wide variety of tensor operations without explicitly reshaping or transposing arrays, as the sketch below illustrates
3. Efficiency: Many libraries optimize einsum operations internally, potentially leading to better performance.
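As a small sketch of points 2 and 3 (with made-up shapes): the following contracts each matrix in a batch with the transpose of its counterpart and sums over the batch, all in a single call, then repeats the computation with np.einsum's optimize flag, which lets NumPy search for a cheaper contraction order:

A = np.random.rand(10, 4, 5)
B = np.random.rand(10, 4, 5)
# 'bkj' indexes B as if transposed, so no explicit np.transpose is needed;
# b and j are repeated and therefore summed over
result = np.einsum('bij,bkj->ik', A, B)
# optimize=True can pay off for larger contractions
result_opt = np.einsum('bij,bkj->ik', A, B, optimize=True)
assert np.allclose(result, result_opt)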