# Kernel in Linear Algebra / Inner Product Space / Hyperplane / SVM / Kernel Function / Normed vector space / Metric Space

Yao Yao on May 9, 2018

## Kernel in Linear Algebra: Part 1

The null space of matrix $A$ is the set of all vectors $\vec v$ such that $A \vec v = \vec 0$

Wikipedia: Kernel (linear algebra): The kernel (also known as null space or nullspace) of a linear map (i.e. linear transformation) $L : V \rightarrow W$ between two vector spaces $V$ and $W$, is the set of all elements $\mathbf{v}$ of $V$ for which $L(\mathbf{v}) = \mathbf{0}$, where $\mathbf{0}$ denotes the zero vector in $W$. That is, in set-builder notation,

A linear map is essentially a function，so we can define:

• $L(\mathbf{v})$ is the image of $\mathbf{v}$ in $W$ (注意有 $L(\mathbf{v}) \in W$)
• The image of $L$ is $\mathrm{im}(L) = \left \{ L(\mathbf{v}) \mid \mathbf{v} \in V \right \}$ (所以 $\mathrm{im}(L)$ 是 $W$ 的子空间)

Two elements of $V$ have the same image in $W$ if and only if their difference lies in the kernel of $L$:

The image of $L$ is isomorphic to the quotient of $V$ by the kernel:

• In linear algebra, the quotient ([ˈkwəʊʃənt], 商) of a vector space $V$ by a subspace $N$ is a vector space obtained by “collapsing” $N$ to zero. The space obtained is called a quotient space and is denoted $V/N$ (read $V$ mod $N$ or $V$ by $N$).
• Morphism refers to a structure-preserving map from one mathematical structure to another.
• In universal algebra and in model theory, a structure consists of a set along with a collection of finitary operations (an operation that takes a finite number of input values) and relations that are defined on it.
• Structure 的大类有 groups (群), rings (环), fields (域，比如实数域) and vector spaces
• 比如 Vector space 的定义就是 vector 的集合以及定义在 vector 上的 add、multiply 的操作
• structure-preserving 的意思就是你的 mapping 不能跳出 structure 的范围
• Isomorphism is a morphism that can be reversed by an inverse morphism

This implies the rank–nullity theorem:

where, by rank we mean the dimension of $\mathrm{im}(L)$, and by nullity that of $\ker(L)$.

• vector space 的 demension 即 basis vector 的个数

• $V = \left \{ \icol{x \newline y} \mid x^2 + y^2 = 1 \right \}$
• $W = \mathbb{R}$
• $L(\mathbf{v}) = A \mathbf{v} = \icol{1 & 0} \icol{x \newline y} = x$
• 相当于把 $x^2 + y^2 = 1$ 这个圆压扁到 x-axis
• $\ker(L)=\left \{ \mathbf{v} \in V \mid \mathbf{v} = \icol{0 \newline y} \right \}$，即 y-axis 上的所有 vector
• $\mathrm{im}(L) = \left \{ x \mid -1 \le x \le 1 \right \}$
• $\dim(\ker(L)) = 1, \, \dim(\mathrm{im}(L)) = 1, \, \dim(V) = 2$

## Inner Product Space

Given a vector space $V$ on a field $K$, an inner product between the vectors $\mathbf{v}$ and $\mathbf{w}$ in $V$, which we denote by the symbol $\langle \mathbf{v}, \mathbf{w} \rangle$, is any operation of $V \times V \rightarrow K$ such that the following three properties are satisfied for every $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and for every $a, b \in K$:

• (Positivity or Positive Definiteness):
• $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$
• $\langle \mathbf{v}, \mathbf{v} \rangle = 0 \iff \mathbf{v} = \mathbf{0}$
• (Linearity): $\langle a\mathbf{v} + b\mathbf{v}, \mathbf{w} \rangle = a\langle \mathbf{u}, \mathbf{w} \rangle + b\langle \mathbf{v}, \mathbf{w} \rangle$
• (Conjugate Symmetry): $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
• conjugate 这个词的意义有很多，比如：
• $x - y$ 是 $x + y$ 的 conjugate
• $a - b\mathrm{i}$ 是 $a + b\mathrm{i}$ 的 conjugate
• $F=Y^{-1}GY$ 是 $G$ 的 conjugate

Two vectors of an inner product space are orthogonal if and only if their inner product is zero.

If $V$ is an inner product space, and $W$ is a subspace of $V$, we define the orthogonal complement of $W$, which we denote by $W^{\bot}$, as the set of all the vectors of $V$ that are orthogonal to every vector of $W$:

## Kernel in Linear Algebra: Part 2

Let $A: V \rightarrow W$ be a linear transformation of finite-dimensional vector spaces. Then

Proof:

1. 如果有 $A: V \rightarrow W$，那么 $A^T$ 就是个 $W \rightarrow V$ 的 linear transformation
2. 这里 inner product 其实就是 dot product 了
3. 这个其实可以扩展到任意的 linear operator $A$，只是不用 $A^{T}$ 而是 adjoint $A^{\ast}$ 来表示逆运算

If $\mathbf{v} \in \mathrm{im}(A^{T}) \subset V$, then $\mathbf{v} = A^{T} \mathbf{w}$ for some $\mathbf{w} \in W$. Now given $\mathbf{u} \in \ker(A)$, then $A \mathbf{u} = \mathbf{0}$. Therefore,

That is $\mathbf{v} \in (\ker(A))^{\bot}$. 后续证明暂略。感觉直接证 adjoint 更简单……

## Hyperplane

Let $\mathrm{R}^n$ be an $n$-dimension vector space and $\mathbf{a} = (a_1, a_2, \dots, a_n)$. $\left \{ a_i \right \}$ are scalars not all equal to 0. Then the set $S = \left \{ \mathbf{x} \in \mathrm{R}^n \mid \mathbf{a} \cdot \mathbf{x} = b \right \}$, where $b$ is a constant, is a subspace of $\mathrm{R}^n$ called a hyperplane.

Hyperplane 的 normal vector 可以由 transformation 的 gradient 给出：$\mathbf{n} = \nabla A(\mathbf{x})$，刚好有 $\nabla A(\mathbf{x}) = \mathbf{a}$，所以 hyperplane 的 normal vector $\mathbf{n} = \mathbf{a}$

Normal vector 的性质是它垂直于 hyperplane。

• 几何学中，hyperplane 是个平面（假设 3-D 情况下）
• 集合论中，hyperplane 是 vector 的集合（vector space）
• 我们说 vector $\mathbf{v} \in H$, where $H$ is a hyperplane，并不是说在几何学中，整个 vector $\mathbf{v}$ 都在 $H$ 这个平面之中，而是说 $\mathbf{v}$ 从原点出发，终点会落在 $H$ 这个平面之中

1. （几何学）Hyperplane 平面上没有 vector，只有 vector 的终点；只是当 hyperplane 平面经过原点时，它恰巧包含 vector
2. （几何学）从原点出发，终点落到 hyperplane 平面上的 vector $\mathbf{x}$ 才满足 $A(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} - b = 0$，并不是 hyperplane 平面上的 vector 满足 $A(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} - b = 0$
• （集合论）Hyperplane 这个 vector space 内的所有 vector $\mathbf{x}$ 满足 $A(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} - b = 0$
3. （几何学）Normal vector 垂直于 hyperplane 平面，并不是说 Normal vector 垂直于 hyperplane 平面上的所有 vector
• （集合论）Normal vector 也并不垂直于 hyperplane 这个 vector space 内的所有 vector（除非 hyperplane 经过原点）
4. （几何学）我们可以把 原点 + hyperplane 平面 的整体结构想象成一个底面积无限大的圆锥：
• 顶点是原点
• 底是 hyperplane 平面
• 顶点到底面任意一点的连线都是 vector
• 顶点到底面几何中心的连线是一个 normal vector
• 这个 normal vector 的长度，即圆锥的高是 $\left | \frac{b}{\vert \mathbf{a} \vert } \right |$
• 这个 normal vector 不一定是 $\mathbf{a}$，但一定与 $\mathbf{a}$ 共线
• 换言之 $\mathbf{a}$ 不一定落在 hyperplane 平面上
• 再换言之 $\mathbf{a}$ 相当于是从原点出发的一个旋转臂，它的方向即圆锥高的方向
• 已知长度和方向，我们可以得知这个 normal vector $\mathbf{n’} = \left | \frac{b}{\vert \mathbf{a} \vert } \right | \frac{\mathbf{a}}{\vert \mathbf{a} \vert} = \left | \frac{b}{\vert \mathbf{a} \vert^2 } \right | \mathbf{a}$
• （集合论）所有顶点到底面的连线的集合构成 hyperplane 的 vector space

Affine transformation $A(\mathbf{x}) = \mathbf{a} \cdot \mathbf{x} - b$ 的几何意义是：

1. 以 $\mathbf{a}$ 为 x-axis，将空间内所有 $\mathbf{x}$ 压缩到一维，即压缩到 $\mathbf{a}$ 这个 x-axis 上
2. $\mathbf{x}$ 压缩后位于 $x = \mathbf{a} \cdot \mathbf{x}$ 位置
3. 将所有压缩后的 $\mathbf{x}$ 左移 $b$
• 此时位于 $x = 0$ 位置的所有 $\mathbf{x}$ 反压缩后即是 hyperplane vector space
• 此时原点位于 $x = -b$ 位置
• 这个长度为 $\vert b \vert$ 的移动距离反压缩后即是圆锥的高 $\left | \frac{b}{\vert \mathbf{a} \vert } \right |$

## SVM

“Minimize $|{\vec{w}}|$ subject to $y_{i}({\vec{w}} \cdot {\vec{x}}_{i}-b) \geq 1$, for $i=1,\,\ldots ,\,n$”.

## Kernel Method / Kernel Function

Definition 2.8 [Kernel function] A kernel is a function $\kappa$ that for all $\mathbf{x}, \mathbf{z} \in X$ satisfies

where $\phi$ is a mapping from $X$ to an (inner product) feature space $F$

1. 你在 vector space $X$ 不是 hyperplane 不可分嘛，我把你 transform 到另外一个 vector space $F$，如果在 $F$ 里 hyperplane 可分，problem solved
2. 我们需要用到 dot product，所以 $F$ 需要是个 inner product vector space
3. 我们其实并不关心 transformation $\phi$ 是怎么个 transform 法，我们只关心 dot product $\phi(x) \cdot \phi(z)$

• 这里 kernel 和 kernel function 是一个意思，而且和 Linear Algebra 里的 kernel 无关！（我宁愿你叫 inner-product method！）
• How to intuitively explain what a kernel is?: Kernel is a way of computing the dot product of two vectors in some (possibly very high dimensional) feature space, which is why kernel functions are sometimes called “generalized dot product”.
• 所以凡是涉及到 inner product 的 ML 算法，应该都可以用 kernel 处理
• 我能最快想到的场景就是 clustering，很多 metric 都可以用 inner product 表示

## Normed vector space / Metric Space

Norms and Metrics, Normed Vector Spaces and Metric Spaces: Let $V$ be a vector space. A function $\vert \cdot \vert : V \to \mathrm{R}_{+}$ is a norm on $V$ if it satisfies:

1. $\forall \mathbf{x} \in V: \vert \mathbf{x} \vert \geq 0$
2. $\forall \mathbf{x} \in V: \vert \mathbf{x} \vert = 0 \Leftrightarrow \mathbf{x} = \mathbf{0}$
3. $\forall \mathbf{x}, \mathbf{y} \in V: \vert \mathbf{x} + \mathbf{y} \vert \leq \vert \mathbf{x} \vert + \vert \mathbf{y} \vert$
4. $\forall \alpha \in \mathrm{R}, \mathbf{x} \in V: \vert \alpha \mathbf{x} \vert = \alpha \vert \mathbf{x} \vert$

A vector space together with a norm is called a normed vector space.

Wikipedia: Metric space: A metric space is an ordered pair $(M,d)$ where $M$ is a set and $d$ is a metric on $M$, i.e., a function

such that for any $x,y,z \in M$, the following holds:

1. (non-negativity or separation axiom): $d(x,y)\geq 0$
2. (identity of indiscernibles): $d(x,y)=0 \Leftrightarrow x=y$
3. (symmetry): $d(x,y)=d(y,x)$
4. (subadditivity or triangle inequality): $d(x,z) \leq d(x,y)+d(y,z)$

In other words, a normed vector space is automatically a metric space, by defining the metric in terms of the norm in the natural way. But a metric space may have no algebraic (vector) structure–i.e., it may not be a vector space–so the concept of a metric space is a generalization of the concept of a normed vector space.