A short introduction about machine learning.

Type of machine learning (ML)

Predictive/ supervised learning

Goal: learn a mapping from inputs $x$ to outputs $y$, given a labeled set of input-output pairs $\mathscr{D} = \{ (\mathbf{x}_i, y_i) \}_{i=1}^N$ , a.k.a. training set.
Conditional density estimation, i.e. build models for $p(y_i |\mathbf{x}_i, \Theta)$
Classification
a.k.a. pattern recognition.
Binary classification
Multi-class classification
Multi-label classification (viewed as doing multiple binary predictions)

The mode of the distribution $p(y|\mathbf{x}, \mathscr{D})$ , a.k.a. MAP (maximum a posteriori) estimate:

Given a probabilistic output, compute the “best guess” as to the “true label”: $\hat{y} = \hat{f} (\mathbf{x}) = \arg\max_{c=1}^C p(y=c|\mathbf{x},\mathscr{D})$

Regression

Descriptive/ unsupervised learning

Goal: Only given inputs $\mathscr{D} = \{\mathbf{x}_i\}_{i=1}^N$ , find “interesting patterns” in the data (a.k.a. knowledge discovery).
Unconditional density estimation, i.e. build models for $p(\mathbf{x}_i|\Theta)$

Popular deep unsupervised generative models:

GANs
VAEs
Fully visible belief networks (FVBN)

Discovering clusters

Dimension reduction: Clustering data into groups. Let $K$ denote the number of clusters, we estimate the distribution over the number of clusters, $P(K|\mathscr{D})$, which tells us if there are subpopulations within the data.

Discovering latent factors

Reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data.

Discovering graph structure

Measure a set of correlated variables, discover which ones are most correlated with which others. We learn the graph structure from the data, i.e. compute $\hat{G} = \arg\max p(\mathscr{G} | \mathscr{D} )$ .

Matrix completion

Missing data (NaN, “not a number”) completion.

Image inpainting

“Fill in” holes in an image with realistic texture. This can be tackled by building a joint probability model of the pixels, given a set of clean images, and then inferring the unknown variables (pixels) given the known variables (pixels).

Collaborative filtering

Key idea: the prediction is not based on features of the movie or user (although it could be), but merely on a ratings matrix $\mathbf{X}(m,u)$ with user $u$ of movie $m$.

Market basket analysis

Reinforcement learning

Goal: learn how to act / behave when given occasional reward or punishment signals (e.g. how a baby learns to walk).

The Gradient

Machine Learning Basics