Unsupervised Learning
Unsupervised Learning is used to 'understand data'.
In contrast to Supervised Learning, where the data consists of inputs together with their outputs, the data in Unsupervised Learning is just a set of inputs with no outputs, i.e., data of the form \(\{x^1,x^2,x^3,\dots,x^n\}\), where \(x^i\in\mathbb{R}^d\).
Unsupervised learning algorithms build models that compress, explain, and group data.
Dimensionality Reduction
Basic Setup
Data: \(\{x^1,x^2,x^3,\dots,x^n\}\), where \(x^i\in\mathbb{R}^d\)
The encoder takes a \(d\)-dimensional vector and compresses it into a \(d'\)-dimensional vector.
Mathematically, Encoder \(f:\mathbb{R}^d\to\mathbb{R}^{d'}\)
Typically \(d'<d\)
The decoder takes a \(d'\)-dimensional vector and decompresses it back into a \(d\)-dimensional vector.
Mathematically, Decoder \(g:\mathbb{R}^{d'}\to\mathbb{R}^{d}\)
Typically \(d'<d\)
The goal of dimensionality reduction is \(g(f(x^i))\approx x^i\) for every \(i\).
The loss of dimensionality reduction is the average reconstruction error, \(\frac{1}{n}\sum_{i=1}^n\|g(f(x^i))-x^i\|^2\).
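A minimal sketch of this setup in Python, under assumptions the notes do not make: \(d=2\), \(d'=1\), and a hypothetical linear encoder \(f(x)=w^\top x\) with decoder \(g(z)=zw\) for a unit vector \(w\), so that \(g(f(x))\) is the projection of \(x\) onto \(w\). The loss is taken to be the average squared reconstruction error \(\frac{1}{n}\sum_{i=1}^n\|g(f(x^i))-x^i\|^2\); the names `encode`, `decode` and `reconstruction_loss` are illustrative only.

```python
import math


def encode(x, w):
    """Encoder f: R^d -> R^1, the dot product with the unit vector w (hypothetical linear choice)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def decode(z, w):
    """Decoder g: R^1 -> R^d, maps the code z back to a point along w."""
    return [z * wi for wi in w]


def reconstruction_loss(data, w):
    """Average squared error (1/n) * sum_i ||g(f(x^i)) - x^i||^2."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w), w)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


# Toy data in R^2 lying close to the line y = x.
data = [[1.0, 1.1], [2.0, 1.9], [-1.0, -1.0], [3.0, 3.2]]
s = 1 / math.sqrt(2)
print(reconstruction_loss(data, [s, s]))   # ≈ 0.0075 (w follows the data)
print(reconstruction_loss(data, [s, -s]))  # ≈ 7.76 (w is orthogonal to the data)
```

A direction that follows the data gives a much smaller loss than one orthogonal to it, which is the sense in which a good encoder/decoder pair achieves \(g(f(x^i))\approx x^i\).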
Uses of Dimensionality Reduction
Dimensionality reduction is mainly used to compress data, reducing the number of dimensions needed to represent each point.
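As a rough illustration of the compression, the count below compares how many numbers are stored before and after encoding. It assumes, hypothetically, a linear decoder \(g(z)=Wz\) whose \(d\times d'\) matrix \(W\) must be stored alongside the codes \(f(x^i)\); the function name `storage_cost` is illustrative only.

```python
def storage_cost(n, d, d_prime):
    """Numbers stored before and after compression.

    Assumes a linear decoder W of shape d x d_prime that is stored with the codes.
    """
    original = n * d                               # the raw points x^1, ..., x^n
    compressed = n * d_prime + d * d_prime         # codes f(x^i) plus the decoder matrix
    return original, compressed


original, compressed = storage_cost(n=1000, d=100, d_prime=10)
print(original, compressed)  # 100000 11000
```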
Density Estimation
Density estimation produces a probabilistic model, i.e., a model that assigns a probability to each possible configuration of the data.
Basic Setup
Data: \(\{x^1,x^2,x^3,\dots,x^n\}\), where \(x^i\in\mathbb{R}^d\)
Probabilistic Model
The output is a probabilistic model \(P:\mathbb{R}^d\to\mathbb{R}_+\).
Its outputs must sum to \(1\) over the whole space, i.e., \(\sum_{x\in\mathbb{R}^d}P(x) = 1\) (for continuous data, \(\int_{\mathbb{R}^d}P(x)\,dx = 1\)).
The goal of density estimation is to give a probabilistic model \(P\) such that \(P(x)\) is large if \(x\in\text{Data}\), and small otherwise.
The loss for the probabilistic model \(P\) is the negative log-likelihood, i.e., \(\frac{1}{n}\sum_{i=1}^n -\log P(x^i)\).
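A minimal sketch of the negative log-likelihood loss \(\frac{1}{n}\sum_{i=1}^n -\log P(x^i)\), assuming for simplicity that the data take values in a small finite set, so that \(P\) is a table of probabilities and the normalization is a finite sum. The model fitted by relative frequencies (a hypothetical choice, `fit_categorical`) is compared with a uniform model over four values; all names here are illustrative.

```python
import math
from collections import Counter


def fit_categorical(data):
    """Estimate P(x) as the fraction of data points equal to x (hypothetical model choice)."""
    counts = Counter(data)
    n = len(data)
    return {x: c / n for x, c in counts.items()}


def neg_log_likelihood(model, data):
    """Average NLL (1/n) * sum_i -log P(x^i); infinite if some x^i has probability 0."""
    total = 0.0
    for x in data:
        p = model.get(x, 0.0)
        if p == 0.0:
            return math.inf
        total -= math.log(p)
    return total / len(data)


data = ["a", "a", "a", "b"]
fitted = fit_categorical(data)                      # {'a': 0.75, 'b': 0.25}
uniform = {x: 0.25 for x in ["a", "b", "c", "d"]}   # also puts mass on unseen values

print(sum(fitted.values()))              # 1.0, the outputs sum to 1
print(neg_log_likelihood(fitted, data))  # ≈ 0.562
print(neg_log_likelihood(uniform, data)) # ≈ 1.386
```

The fitted model puts more probability on the observed points, so it has the lower negative log-likelihood; a model that assigns probability \(0\) to an observed point gets an infinite loss.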