[Intro to Machine Learning] Machine Learning Foundations (Part 1) - Week 1

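Preface

These notes follow the Coursera course 《機器學習基石上 (Machine Learning Foundations)—Mathematical Foundations》, taught by Professor Hsuan-Tien Lin (林軒田) of National Taiwan University. I will post further articles as I progress through the course.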

Week 1 : The Learning Problem

Learning : Observation → Learning → Skill (an improved performance measure)
Machine Learning : Data → Machine Learning → Skill
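When to use ML : Sometimes we want a machine to do something, but working out the rules ourselves and writing them down as a program is hard. The alternative is to let the machine analyze data and learn how to do the task on its own. Typical situations:
1. When the program cannot be designed by hand
2. When the solution cannot be easily defined
3. When decisions must be made faster than humans can make them
4. When serving a large number of users
Key elements :
1. There is an underlying pattern to learn, so the performance measure can be improved
2. The rules cannot be easily defined by hand
3. There is data to use as input
Notation :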
- input : $x \in \mathcal{X}$
- output : $y \in \mathcal{Y}$
- Target function : unknown pattern to be learned
  - f : $\mathcal{X} \to \mathcal{Y}$
- Training examples : data
  - $\mathcal{D} = \{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$
- Hypothesis : skill with hopefully good performance
  - g : $\mathcal{X} \to \mathcal{Y}$
- Summary :
  1. unknown target function : f
  2. training examples : $\mathcal{D}$
  3. learning algorithm : $\mathcal{A}$
  4. final hypothesis : $\color{blue}{g \approx \mathsf{f}}$
    → f is unknown
    → g is not the same as f, but the closer the better
- We first collect the candidate functions $h_1,h_2,\ldots,h_M$ into a hypothesis set $\color{blue}{\mathcal{H}}$,
  then let $\color{blue}{\mathcal{A}}$ pick the one that will serve as $g$
- Learning model = $\mathcal{A}$ and $\mathcal{H}$
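The notation above can be sketched as a toy program. This is only a minimal illustration under assumptions that are not part of the course notes: the inputs are single real numbers, the labels are +1 / -1, the hypothesis set holds a handful of threshold functions, and the learning algorithm simply returns the hypothesis with the fewest mistakes on the training examples. The names `make_threshold`, `count_errors`, and `learning_algorithm` are hypothetical, and the function `f` is a stand-in used only to label the toy data (in a real problem f is unknown).

```python
from typing import Callable, List, Tuple

X = float                      # input space (assumed: one real number)
Y = int                        # output space (assumed: +1 or -1)
Hypothesis = Callable[[X], Y]  # any function h : X -> Y


def make_threshold(t: float) -> Hypothesis:
    """Return the hypothesis h_t(x) = +1 if x > t, else -1."""
    return lambda x: 1 if x > t else -1


def count_errors(h: Hypothesis, data: List[Tuple[X, Y]]) -> int:
    """Count the training examples that h labels incorrectly."""
    return sum(1 for x, y in data if h(x) != y)


def learning_algorithm(hypothesis_set: List[Hypothesis],
                       data: List[Tuple[X, Y]]) -> Hypothesis:
    """A: pick the hypothesis in H with the fewest errors on D (assumed rule)."""
    return min(hypothesis_set, key=lambda h: count_errors(h, data))


def f(x: X) -> Y:
    """Stand-in for the unknown target; used only to generate toy labels."""
    return 1 if x > 0.5 else -1


# Training examples D = {(x_1, y_1), ..., (x_n, y_n)}
inputs = [0.1, 0.3, 0.45, 0.6, 0.8, 0.95]
D = [(x, f(x)) for x in inputs]

# Hypothesis set H: a few candidate thresholds
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
H = [make_threshold(t) for t in thresholds]

# Final hypothesis g, chosen by A from H using D
g = learning_algorithm(H, D)
print("training errors of g:", count_errors(g, D))
```

Here the learning model is the pair (`learning_algorithm`, `H`): changing either the candidates in `H` or the selection rule changes which `g` comes out, even with the same `D`.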
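Definition of ML : use data to compute a hypothesis g that is close to the target f
Differences from related fields :
- Data Mining : use large amounts of data to find interesting properties
  - Traditional DM also focuses on efficient computation over large databases
- Artificial Intelligence : compute something that exhibits intelligent behavior
  - ML is one way to realize AI
- Statistics : use data to make inferences about an unknown process
  - Statistics is one way to realize ML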