Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Introduction

This post is based on: Deep Unsupervised Learning using Nonequilibrium Thermodynamics, arXiv:1503.03585v8 [cs.LG], 18 Nov 2015.

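This paper was the first to propose the concept of the diffusion model. The main goal of the algorithm is to construct a forward diffusion process that gradually converts a complex data distribution into a simple one, and then to learn a reverse process that restores the data distribution from the simple one.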

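Figure: generated images.

In the figure, the first row shows swiss roll data that the diffusion process gradually turns, from left to right, into a Gaussian distribution. The second row shows the trained model, which generates the original data distribution from the Gaussian step by step, from right to left.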

Forward Trajectory

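The data distribution is $q(x^{(0)})$ and the final distribution is $\pi(y)$. The data distribution is converted into $\pi(y)$ by repeatedly applying a Markov diffusion kernel $T_{\pi}(y|y';\beta)$, where $\beta$ is the diffusion rate.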

$$ \begin{align} \pi(y) &= \int \mathrm{d}y' T_{\pi}(y|y';\beta)\pi (y') \\ q(x^{(t)}|x^{(t-1)}) &= T_{\pi}(x^{(t)}|x^{(t-1)};\beta_t) \\ q(x^{(0\dots T)}) &= q(x^{(0)})\prod_{t=1}^{T} q(x^{(t)}|x^{(t-1)}) \\ \end{align} $$
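The forward trajectory can be simulated directly once a concrete kernel is chosen. The following is a minimal sketch, not the paper's code: it assumes a Gaussian kernel $q(x^{(t)}|x^{(t-1)}) = \mathcal{N}(\sqrt{1-\beta_t}\,x^{(t-1)}, \beta_t I)$ with $\pi = \mathcal{N}(0, I)$, and the helper names (`make_swiss_roll`, `forward_trajectory`) and the linear `betas` schedule are hypothetical choices made for illustration.

```python
import numpy as np

def make_swiss_roll(n, rng):
    """Draw n points from a 2-D swiss roll, a toy stand-in for q(x^(0))."""
    angle = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))
    points = np.stack([angle * np.cos(angle), angle * np.sin(angle)], axis=1)
    return points / 10.0  # rescale so coordinates are of order one

def forward_trajectory(x0, betas, rng):
    """Sample x^(1..T) by repeatedly applying the assumed Gaussian kernel
    q(x^(t) | x^(t-1)) = N(sqrt(1 - beta_t) * x^(t-1), beta_t * I)."""
    trajectory = [x0]
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x)
    return np.stack(trajectory)  # shape (T + 1, n, 2)

rng = np.random.default_rng(0)
x0 = make_swiss_roll(5000, rng)
betas = np.linspace(1e-3, 0.2, 100)  # hypothetical diffusion-rate schedule
traj = forward_trajectory(x0, betas, rng)

# After T steps the samples should be close to pi = N(0, I).
print("final mean:", traj[-1].mean(axis=0))
print("final variance:", traj[-1].var(axis=0))
```

With this schedule the product of the $(1-\beta_t)$ factors is close to zero, so almost nothing of $x^{(0)}$ survives in $x^{(T)}$ and the final samples are approximately standard normal.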

Reverse Trajectory

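The reverse trajectory $p$ describes the same process run backward: it starts from the simple distribution $\pi$ and steps back toward the data.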

$$ \begin{align} p(x^{(T)}) &= \pi(x^{(T)}) \\ p(x^{(0\dots T)}) &= p(x^{(T)})\prod_{t=T}^{1} p(x^{(t-1)}|x^{(t)}) \\ \end{align} $$
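Sampling from the reverse trajectory is ancestral sampling: draw $x^{(T)}$ from $\pi$, then draw each $x^{(t-1)}$ from $p(x^{(t-1)}|x^{(t)})$. The sketch below assumes a Gaussian reverse kernel whose mean and variance come from two callables, `mean_fn` and `var_fn`. Those names are hypothetical placeholders for a trained model. To keep the example runnable without training, it uses a toy case: if the data were already $\mathcal{N}(0, I)$, the exact reverse kernel would coincide with the Gaussian forward kernel.

```python
import numpy as np

def sample_reverse(mean_fn, var_fn, num_steps, shape, rng):
    """Ancestral sampling: x^(T) ~ pi = N(0, I), then
    x^(t-1) ~ N(mean_fn(x^(t), t), var_fn(x^(t), t)) for t = T, ..., 1."""
    x = rng.standard_normal(shape)
    for t in range(num_steps, 0, -1):
        mean = mean_fn(x, t)
        var = var_fn(x, t)
        x = mean + np.sqrt(var) * rng.standard_normal(shape)
    return x

betas = np.linspace(1e-3, 0.2, 100)  # hypothetical diffusion-rate schedule

def toy_mean(x, t):
    # Toy assumption: data is N(0, I), so the reverse step mirrors the forward step.
    return np.sqrt(1.0 - betas[t - 1]) * x

def toy_var(x, t):
    return betas[t - 1]

rng = np.random.default_rng(0)
samples = sample_reverse(toy_mean, toy_var, len(betas), (5000, 2), rng)
print("sample variance:", samples.var(axis=0))
```

In a real model, `mean_fn` and `var_fn` would be learned functions of $x^{(t)}$ and $t$. The toy versions here only demonstrate the sampling loop.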

Model Probability

$$ \begin{align} p(x^{(0)}) &= \int \mathrm{d}x^{(1\cdots T)}p(x^{(0\cdots T)}) \end{align} $$

In practice, however, this integral is intractable, so it is rewritten as an average over forward trajectories:

$$ \begin{align} p(x^{(0)})&=\int \mathrm{d}x^{(1\cdots T)}p(x^{(0\cdots T)})\frac{q(x^{(1\cdots T)}|x^{(0)})}{q(x^{(1\cdots T)}|x^{(0)})} \\ &=\int \mathrm{d}x^{(1\cdots T)}q(x^{(1\cdots T)}|x^{(0)})\frac{p(x^{(0\cdots T)})}{q(x^{(1\cdots T)}|x^{(0)})} \\ &=\int \mathrm{d}x^{(1\cdots T)}q(x^{(1\cdots T)}|x^{(0)})p(x^{(T)})\prod_{t=T}^{1}\frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})} \\ \end{align} $$
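The last line says that $p(x^{(0)})$ is an expectation over forward trajectories of the weight $p(x^{(T)})\prod_t p(x^{(t-1)}|x^{(t)})/q(x^{(t)}|x^{(t-1)})$, so it can be estimated by Monte Carlo. The sketch below is an illustration under stated assumptions, not the paper's implementation: it is one-dimensional, uses the Gaussian kernel $\mathcal{N}(\sqrt{1-\beta_t}\,x, \beta_t)$ for $q$, and takes the reverse kernel $p$ to be the same Gaussian kernel (the toy case where the data is already $\mathcal{N}(0,1)$). The function names are hypothetical.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def estimate_log_p0(x0, betas, num_samples, rng):
    """Monte Carlo estimate of ln p(x^(0)), averaging the trajectory weights
    over forward trajectories sampled from q(x^(1..T) | x^(0))."""
    x_prev = np.full(num_samples, float(x0))
    log_ratio = np.zeros(num_samples)
    for beta in betas:
        scale = np.sqrt(1.0 - beta)
        x_next = scale * x_prev + np.sqrt(beta) * rng.standard_normal(num_samples)
        log_q = log_normal(x_next, scale * x_prev, beta)  # q(x^(t) | x^(t-1))
        log_p = log_normal(x_prev, scale * x_next, beta)  # toy p(x^(t-1) | x^(t))
        log_ratio += log_p - log_q
        x_prev = x_next
    log_weights = log_normal(x_prev, 0.0, 1.0) + log_ratio  # add ln pi(x^(T))
    # Log-mean-exp for numerical stability.
    top = log_weights.max()
    return top + np.log(np.mean(np.exp(log_weights - top)))

rng = np.random.default_rng(0)
betas = np.linspace(1e-3, 0.2, 100)  # hypothetical diffusion-rate schedule
estimate = estimate_log_p0(0.7, betas, 1000, rng)
exact = log_normal(0.7, 0.0, 1.0)
print(estimate, exact)
```

In this toy case the kernel satisfies detailed balance with respect to $\pi$, so every trajectory weight equals $\pi(x^{(0)})$ and the estimate matches the exact log density up to rounding. With a learned reverse kernel the weights would vary from trajectory to trajectory.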

Training

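Training maximizes the model log likelihood $L$. Because $L$ itself is intractable, the derivation below finds a lower bound $K$ that can be optimized instead.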

$$ \begin{align} L &= \int \mathrm{d}x^{(0)}q(x^{(0)})\ln p(x^{(0)}) \\ &= \int \mathrm{d}x^{(0)}q(x^{(0)})\ln \left( \int \mathrm{d}x^{(1\cdots T)}q(x^{(1\cdots T)}|x^{(0)})p(x^{(T)})\prod_{t=T}^{1}\frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right) \\ &\geq \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)})\ln \left( p(x^{(T)})\prod_{t=T}^{1}\frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right) \\ &= \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)})\ln p(x^{(T)}) + \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \sum_{t=T}^{1}\ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)\\ &= \int \mathrm{d}x^{(T)}q(x^{(T)})\ln \pi(x^{(T)}) + \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \sum_{t=T}^{1}\ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)\\ &= \int \mathrm{d}x^{(T)}q(x^{(T)})\ln \pi(x^{(T)}) + \sum_{t=1}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)\\ &= \sum_{t=1}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)-H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)-H_p (x^{T})+\int \mathrm{d}x^{(0,1)}q(x^{(0, 1)}) \ln\left( \frac{p(x^{(0)}|x^{(1)})}{q(x^{(1)}|x^{(0)})}\right)\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)-H_p (x^{T})+\int \mathrm{d}x^{(0,1)}q(x^{(0, 1)}) \ln\left( \frac{\pi(x^{(0)})}{\pi(x^{(1)})}\right)\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)})}\right)-H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t)}|x^{(t-1)}, x^{(0)})}\right)-H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( 
\frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t-1)}|x^{(t)}, x^{(0)})} \frac{q(x^{(t-1)}|x^{(0)})}{q(x^{(t)}|x^{(0)})} \right)-H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t-1)}|x^{(t)}, x^{(0)})}\right)+\sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln \left( \frac{q(x^{(t-1)}|x^{(0)})}{q(x^{(t)}|x^{(0)})}\right) -H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t-1)}|x^{(t)}, x^{(0)})}\right)+\sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \left( \ln q(x^{(t-1)}|x^{(0)})-\ln{q(x^{(t)}|x^{(0)})}\right) -H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t-1)}|x^{(t)}, x^{(0)})}\right)+\sum_{t=2}^{T} \left( H_q(x^{(t)}|x^{(0)})-H_q(x^{(t-1)}|x^{(0)})\right) -H_p (x^{T})\\ &= \sum_{t=2}^{T} \int \mathrm{d}x^{(0\cdots T)}q(x^{(0\cdots T)}) \ln\left( \frac{p(x^{(t-1)}|x^{(t)})}{q(x^{(t-1)}|x^{(t)}, x^{(0)})}\right)+ H_q(x^{(T)}|x^{(0)})-H_q(x^{(1)}|x^{(0)})-H_p (x^{T})\\ &= -\sum_{t=2}^{T} \int \mathrm{d}x^{(0, t)}q(x^{(0, t)}) \text{D}_{KL}\left( {q(x^{(t-1)}|x^{(t)}, x^{(0)})}||{p(x^{(t-1)}|x^{(t)})} \right)+ H_q(x^{(T)}|x^{(0)})-H_q(x^{(1)}|x^{(0)})-H_p (x^{T})\\ &=K \end{align} $$

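Here, equation (12) follows from Jensen's inequality. Equation (16) defines $H_p(x^{T}) = -\int \mathrm{d}x^{(T)}q(x^{(T)})\ln \pi(x^{(T)})$. Equation (18) sets the last reverse step equal to the corresponding forward step, $p(x^{(0)}|x^{(1)}) = q(x^{(1)}|x^{(0)})\,\pi(x^{(0)})/\pi(x^{(1)})$, and the resulting term is dropped in (19) because, by the paper's design, the cross entropy to $\pi$ is the same at every step. Equation (20) holds because the forward process is Markov, so each state depends only on the previous one. Equation (21) applies Bayes' rule. These steps yield the lower bound $K$, and the training objective becomes: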

$$ \hat{p}(x^{(t-1)}|x^{(t)}) = \underset{p(x^{(t-1)}|x^{(t)})}{\operatorname{argmax}}\, K $$
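The only part of $K$ that depends on the model is the sum of KL divergences between $q(x^{(t-1)}|x^{(t)}, x^{(0)})$ and $p(x^{(t-1)}|x^{(t)})$. As a hedged illustration, assume the Gaussian forward kernel $\mathcal{N}(\sqrt{1-\beta_t}\,x^{(t-1)}, \beta_t)$ and a Gaussian reverse kernel. The posterior $q(x^{(t-1)}|x^{(t)}, x^{(0)})$ is then Gaussian with a closed-form mean and variance, and each KL term is available analytically. The function names and the example model outputs below are hypothetical.

```python
import numpy as np

def posterior_q(x0, xt, t, betas):
    """Mean and variance of q(x^(t-1) | x^(t), x^(0)) for t >= 2,
    assuming the Gaussian kernel N(sqrt(1 - beta_t) * x^(t-1), beta_t)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)  # alpha_bar[t - 1] = prod_{s <= t} (1 - beta_s)
    ab_t, ab_prev = alpha_bar[t - 1], alpha_bar[t - 2]
    beta_t = betas[t - 1]
    var = beta_t * (1.0 - ab_prev) / (1.0 - ab_t)
    mean = (np.sqrt(ab_prev) * beta_t * x0
            + np.sqrt(alphas[t - 1]) * (1.0 - ab_prev) * xt) / (1.0 - ab_t)
    return mean, var

def gaussian_kl(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q) || N(mean_p, var_p) ) for 1-D Gaussians."""
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mean_q - mean_p) ** 2) / var_p - 1.0)

betas = np.linspace(1e-3, 0.2, 100)  # hypothetical diffusion-rate schedule
t, x0, xt = 10, 0.3, -0.8
mean_q, var_q = posterior_q(x0, xt, t, betas)

# Hypothetical model output for p(x^(t-1) | x^(t)); a trained network would supply these.
model_mean, model_var = np.sqrt(1.0 - betas[t - 1]) * xt, betas[t - 1]
print("KL term:", gaussian_kl(mean_q, var_q, model_mean, model_var))
```

Maximizing $K$ then amounts to choosing the model mean and variance so that these KL terms, averaged over $q(x^{(0)}, x^{(t)})$, are as small as possible.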