【入門機器學習】機器學習基石(上) - Week7

若 $N$ 與 $k$ 夠大，則 : $m_\mathcal{H} (N) \le B(N, k)=\Sigma_{i=0}^{k-1}$ $B (N, k) = Σ_{i = 0}^{k - 1} (\begin{matrix} N \\ i \end{matrix})$ $\le \color{blue}{N^{k-1}}$
判斷學習是否有效 :
1. if $m_\mathcal{H}(N)$ breaks at $k$ (代表good $\mathcal{H}$)
2. if $N$ 夠大 (代表 good $\mathcal{D}$)
  → 則probably genralized $E_{out} \approx E_{in}$
3. if $\mathcal{A}$ picks a $g$ with small $E_{in}$ (代表 good $\mathcal{A}$)
  → 則probably learned

VC Dimension (VC維度) :

EX :

效果 : good $\mathcal{H}$ $\Longrightarrow$ finite $d_{\text{VC}}$ $\Longrightarrow$ 能確保 $E_{out} (g) \approx E_{in} (g)$
- 與 Learning Alogorithm $\mathcal{A}$ 無關
- 與 Input Distribution $P$ 無關
- 與 Target Function $f$ 無關

用VC Dimension之概念套用於2D PLA中 :
證明在$d$-D Perceptrons中 : $d_{\text{VC}}=d+1$
1. 證明 $d_{\text{VC}}\ge d+1$ : There are some $d+1$ inputs we can shatter.
2. 證明 $d_{\text{VC}}\le d+1$ : We can’t shatter any set of $d+2$ inputs.
由VC Bound之公式可推得 : 在機率為 $\ge 1- \delta$ 的狀況下(即 GOOD 狀況)
- Generalization Error : $|E_{in}(g)-E_{out}(g)| \le \sqrt{\frac{8}{N} \ln{(\frac{4(2N)^{d_\text{VC}}}{\delta}})}$
- 進一步可推得 : $E_{out}(g) \le E_{in}(g)+\sqrt{\frac{8}{N} \ln{(\frac{4(2N)^{d_\text{VC}}}{\delta}})}$
- $\sqrt{\cdots} = \color{red}{\Omega{(N,\mathcal{H},\delta)}}$ : 代表 Hypothesis Set 有多強，但同時在 Geralization 時也要付出那麼多代價
- 最好的 $d_{\text{VC}}$ 位於中間值
  - 因此 powerful $\mathcal{H}$ not always good
VC Bound可進而推得 : Sample Complexity
- 理論上 : $N \approx 10,000d_{\text{VC}}$
- 實際應用上 : $N \approx 10d_{\text{VC}}$
- 以上狀況稱為 Looseness of VC Bound
  - 形成理由 :
    - Hoeffding for 未知的 $E_{out}$
    - 使用 $m_\mathcal{H}(N)$，而非 $|\mathcal{H}(x_1,\cdots ,x_N)|$
    - 使用 $N^{d_\text{VC}}$，而非 $m_\mathcal{H}(N)$
    - Union Bound on Worst cases