SVM

With labels $y \in \{+1, -1\}$, we can rescale $w$ and $b$ so that the points closest to the decision boundary (the points on the margin lines) satisfy $|wx+b| = 1$, i.e. let

\[\label{constrain}\min\limits_{x} |wx+b| =1\]

At the same time, we make the margin (the smallest distance from any point to the line) as large as possible:

\[\label{wrong} \max\limits_{w,b} \min\limits_{x} \frac{|wx+b| }{\lVert w\rVert} = \max\limits_{w} \frac{1}{\lVert w \rVert} \\ \text{subject to: }y_i(wx_i+b)\geq 0\]
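To make the inner minimization concrete, the sketch below computes $\min_i |wx_i+b| / \lVert w\rVert$ directly with NumPy; the hyperplane and the points are made-up numbers used only for illustration.

```python
import numpy as np

# Hypothetical separating hyperplane and points (made-up numbers).
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])

# Geometric margin: the smallest distance from any point to the line.
margin = np.min(np.abs(X @ w + b)) / np.linalg.norm(w)
print(margin)
```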

For easier optimization, we change $\max\limits_{w} \frac{1}{\lVert w \rVert}$ to $\min\limits_{w} \lVert w \rVert$. In equation $\ref{wrong}$ we used $\ref{constrain}$ to simplify the objective; since $\ref{constrain}$ is not kept as an explicit constraint, it instead tightens the constraint $y_i(wx_i+b)\geq 0$ to $y_i(wx_i+b)\geq 1$. So we have the final formula:

\[\min\limits_{w} \frac{1}{2}\lVert w \rVert^2 \\\text{subject to: }y_i(wx_i+b)\geq 1\]
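As a sanity check, this final primal problem can be handed directly to a generic convex solver. A minimal sketch with cvxpy on a small made-up linearly separable dataset (the data and variable names are assumptions for illustration, not from the post):

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# min (1/2)||w||^2   subject to   y_i (w·x_i + b) >= 1
primal = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
primal.solve()
print(w.value, b.value)
```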

Then we can use algorithms from convex optimization to minimize our objective (why is it convex? check out page 66 in the reference). Introducing Lagrange multipliers $\alpha_i$, we have

\[\max \limits_{\alpha}\min\limits_{w,b} \frac{1}{2}\lVert w \rVert^2 - \sum_i \alpha_i(y_i(wx_i+b)-1) \\ \text{subject to: }\alpha_i \geq 0\]
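To make this expression concrete, here is a small NumPy function that evaluates the Lagrangian $\frac{1}{2}\lVert w\rVert^2 - \sum_i \alpha_i(y_i(wx_i+b)-1)$ for given $w$, $b$, and multipliers $\alpha$ (names and shapes are my own assumptions):

```python
import numpy as np

def lagrangian(w, b, alpha, X, y):
    """Evaluate (1/2)||w||^2 - sum_i alpha_i * (y_i (w·x_i + b) - 1)."""
    margins = y * (X @ w + b)          # functional margins y_i (w·x_i + b)
    return 0.5 * w @ w - alpha @ (margins - 1)
```

For feasible points with $y_i(wx_i+b)\geq 1$ and $\alpha_i \geq 0$, the subtracted term is non-negative, so the Lagrangian never exceeds the primal objective.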

Calculate the gradient of the Lagrangian $L(w,b,\alpha)$ with respect to each variable (note: $b$ is also a variable), together with the KKT feasibility and complementary slackness conditions:

\[\begin{equation} \begin{aligned} \nabla_w L &= w - \sum_i \alpha_i y_i x_i = 0\\ \nabla_b L &= -\sum_i \alpha_i y_i = 0\\ -(y_i(wx_i+b)-1) &\leq 0 \\ \sum_i \alpha_i\left(y_i(wx_i+b)-1\right) &= 0 \end{aligned} \end{equation}\]

So:

\[\begin{equation} w = \sum_i \alpha_i y_i x_i \end{equation}\]

Substitute \(w\) back into the formula and form the dual problem.

Then the objective becomes:

\[\begin{align} \max\limits_{\alpha} \frac{1}{2}\Big(\sum_i \alpha_i y_i x_i\Big)^2- \sum_i \alpha_i\Big(y_i\Big( \sum_j \alpha_j y_j x_jx_i+b\Big)-1\Big) \\ \text{subject to: } \sum_i \alpha_iy_i =0 \text{ and } \alpha_i \ge 0 \end{align}\]

\[\begin{align} \max\limits_{\alpha} -\frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i x_j - \sum_i \alpha_i y_i b + \sum_i \alpha_i\\ \text{subject to: } \sum_i \alpha_iy_i =0 \text{ and } \alpha_i \ge 0 \end{align}\]

\[\begin{align} \max\limits_{\alpha} -\frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i x_j + \sum_i \alpha_i\\ \text{subject to: } \sum_i \alpha_iy_i =0 \text{ and } \alpha_i \ge 0 \end{align}\]

This is again a convex optimization problem, namely a quadratic program in $\alpha$.
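Since the dual is a quadratic program in $\alpha$ alone, it can be handed to the same kind of solver as the primal. A minimal cvxpy sketch on the same made-up data as above (the tiny diagonal jitter only keeps the quadratic form numerically positive semidefinite):

```python
import numpy as np
import cvxpy as cp

# Same toy data as in the primal sketch (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q_ij = y_i y_j x_i·x_j; a tiny jitter keeps the PSD check numerically happy.
G = y[:, None] * X
Q = G @ G.T + 1e-9 * np.eye(n)

alpha = cp.Variable(n)
dual = cp.Problem(
    cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q)),
    [alpha >= 0, y @ alpha == 0],
)
dual.solve()
print(alpha.value)   # nonzero entries correspond to support vectors
```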

And the resulting decision function is

\[\begin{align} h(x) &= wx+b\\ &= \sum_i \alpha_i y_i x_i x + b \end{align}\]

\(b\) can be obtained by using any support vector \(x_j\) (any point with \(\alpha_j > 0\)):

\[\begin{equation} b=y_j-\sum_i \alpha_i y_i x_i x_j \end{equation}\]
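Putting the last two formulas together: once the dual $\alpha$ is known, $w$ follows from $w=\sum_i \alpha_i y_i x_i$, any support vector gives $b$, and $h(x)=wx+b$ classifies by its sign. A small sketch (the helper names are my own, reusing the conventions of the dual sketch above):

```python
import numpy as np

def recover_primal(alpha, X, y):
    """Recover (w, b) from a dual solution alpha."""
    w = (alpha * y) @ X            # w = sum_i alpha_i y_i x_i
    j = int(np.argmax(alpha))      # index of a support vector (alpha_j > 0)
    b = y[j] - w @ X[j]            # b = y_j - sum_i alpha_i y_i x_i·x_j
    return w, b

def h(x, w, b):
    """Decision function h(x) = w·x + b; the predicted label is its sign."""
    return w @ x + b
```

Applied to `alpha.value` from the dual sketch above, `np.sign(h(x_i, w, b))` should reproduce the training labels and match the \((w, b)\) found by the primal solver up to numerical tolerance.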
Written on January 26, 2022