Exponential family

A distribution belongs to the exponential family if its PMF or PDF can be expressed in a particular form. Formally, an exponential family is a set $\{ p(\cdot|\theta): \theta \in \Theta \}$ of PMFs or PDFs on $\mathbb{R}^d$ such that…

$$ f(x|\theta) = h(x) \exp \big[ \eta(\theta)^T \cdot T(x) -A(\theta) \big] $$

with parameter $\theta \in \mathbb{R}^k$, data $x \in \mathbb{R}^d$, and four new functions:

| Component | Meaning |
| --- | --- |
| $\eta_i: \Theta \to \mathbb{R}$ | Natural parameter; if $\eta(\theta) = \theta$, the family is in canonical form |
| $T_i: \mathbb{R}^d \to \mathbb{R}$ | Sufficient statistic |
| $h: \mathbb{R}^d \to [0, \infty)$ | Base measure, which determines the support/scaling of the distribution |
| $A: \Theta \to \mathbb{R}$ | Log partition function, which acts as a normalizing constant so that the PDF is valid |

Alternate parameterizations

There are a couple of other ways to parameterize the same concept, the most popular being with the normalization term moved out of the exponential by defining $g(\theta) = e^{-A(\theta)}$, i.e. $A(\theta) = -\log g(\theta)$:

$$ f(x|\theta) = h(x) g(\theta)\exp \big[ \eta(\theta)^T \cdot T(x) \big] $$
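A quick numerical sanity check of the equivalence, using the Poisson components ($\eta = \log\lambda$, $T(x) = x$, $A(\lambda) = \lambda$) derived in the worked example further down; the values of $\lambda$ and the evaluation grid are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import factorial

lam = 3.0
x = np.arange(15)

A = lam                  # log partition function for the Poisson
g = np.exp(-A)           # equivalent form: g(theta) = exp(-A(theta))

# Form 1: normalization inside the exponential
form_1 = (1 / factorial(x)) * np.exp(np.log(lam) * x - A)
# Form 2: normalization pulled out as g(theta)
form_2 = (1 / factorial(x)) * g * np.exp(np.log(lam) * x)

assert np.allclose(form_1, form_2)
assert np.allclose(form_1, poisson.pmf(x, lam))
```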

Why they matter 💪🏼

  • Their sufficient statistics can summarize an arbitrary number of i.i.d. samples using a fixed number of values.
  • They have conjugate priors.
  • Their posterior predictive distributions can be written in closed form.
  • They are all maximum entropy distributions for some set of constraints.
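The first point can be illustrated with a sketch: for the Normal, $T(x) = (x, x^2)$, so summing these over any number of i.i.d. samples yields a two-number summary from which the MLEs can be recovered. The sample size and distribution parameters below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=100_000)

# Summing T(x) = (x, x^2) over the sample gives a fixed-size summary,
# regardless of how many samples we draw.
n = len(samples)
sum_x, sum_x2 = samples.sum(), (samples ** 2).sum()

# MLEs recovered from the two-number summary alone
mu_hat = sum_x / n
var_hat = sum_x2 / n - mu_hat ** 2

# They match the estimates computed from the full sample
assert np.isclose(mu_hat, samples.mean())
assert np.isclose(var_hat, samples.var())
```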



Show that the Binomial distribution is a member of the exponential family.

Our goal is to rearrange the Binomial PMF into the form $f(x|\theta) = h(x) \exp \big[ \eta(\theta)^T \cdot T(x) -A(\theta) \big]$.

Recall the Binomial PMF $$ \begin{aligned} X &\sim \text{Bin}(n, p) \\ p(x | p) &= {n \choose x} p^x (1-p)^{n-x} \\ \end{aligned} $$

Take $\exp\Big(\log\big(p(x|p)\big)\Big)$ and manipulate terms using properties of logarithms. $$ \begin{aligned} p(x|p) &= \exp \bigg( \log {n \choose x} + x \log p + (n-x) \log (1-p) \bigg) \\ &= {n \choose x} \exp \Big[ \log(\frac{p}{1 - p})x + n \log(1-p) \Big] \\ \end{aligned} $$

Match components with the desired form $$ \begin{aligned} \therefore p(x|p) &= h(x) \exp \big[ \eta(p) T(x) - A(p) \big] \\[8pt] \text{with } & h(x) = {n \choose x} \\ & \eta(p) = \log\Big(\frac{p}{1-p}\Big) \\ & T(x) = x \\ & A(p) = -n\log(1-p) \end{aligned} $$
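We can verify the decomposition numerically by rebuilding the PMF from its components and comparing against `scipy.stats.binom`; the values of $n$ and $p$ here are arbitrary.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3
x = np.arange(n + 1)

# Components from the derivation above
h = comb(n, x)                       # base measure: n choose x
eta = np.log(p / (1 - p))            # natural parameter (log-odds)
T = x                                # sufficient statistic
A = -n * np.log(1 - p)               # log partition function

reconstructed = h * np.exp(eta * T - A)
assert np.allclose(reconstructed, binom.pmf(x, n, p))
```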

Source: Exponential Family: Binomial Distribution (fixed n) (YouTube, statisticsmatt)


Show that the Normal distribution is a member of the exponential family.

Our goal is to rearrange the Normal PDF into the form $f(x|\theta) = h(x) \exp \big[ \eta(\theta)^T \cdot T(x) -A(\theta) \big]$.

Recall the Normal PDF, $$ \begin{aligned} X &\sim \mathcal{N}(\mu, \sigma^2) \\ p(x|\mu, \sigma^2) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \bigg(-\frac{(x-\mu)^2}{2 \sigma^2}\bigg) \\ \end{aligned} $$

Pull the $\sigma^2$ term out of the normalizing constant into the exponential term. $$ \begin{aligned} &= \frac{1}{\sqrt{2\pi}} \exp \big( -\log \sigma \big) \exp \bigg(-\frac{x^2 - 2\mu x + \mu^2}{2 \sigma^2}\bigg) \\ &= \frac{1}{\sqrt{2\pi}} \exp \bigg( \frac{x \mu}{\sigma^2} + \frac{x^2}{-2 \sigma^2} - \frac{\mu^2}{2 \sigma^2} - \log \sigma \bigg) \\ &= \frac{1}{\sqrt{2\pi}} \exp \bigg( \begin{bmatrix} \frac{\mu}{\sigma^2} & \frac{-1}{2 \sigma^2} \end{bmatrix} \begin{bmatrix} x \\ x^2 \end{bmatrix} - \frac{\mu^2}{2 \sigma^2} - \log \sigma \bigg) \\ \end{aligned} $$

Now match with our known form, $$ \begin{aligned} \therefore p(x|\mu, \sigma^2) &= h(x) \exp \big[ \boldsymbol{\eta}(\mu, \sigma^2)^T \boldsymbol{T}(x) - A(\mu, \sigma^2) \big] \\[8pt] \text{with } & h(x) = \frac{1}{\sqrt{2 \pi}} \\ & \boldsymbol{\eta}(\mu, \sigma^2) = \begin{bmatrix} \frac{\mu}{\sigma^2} & \frac{-1}{2 \sigma^2} \end{bmatrix}^\prime \\ & \boldsymbol{T}(x) = \begin{bmatrix} x & x^2 \end{bmatrix}^\prime \\ & A(\mu, \sigma^2) = \frac{\mu^2}{2 \sigma^2} + \log \sigma \end{aligned} $$
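As before, a sketch verifying the two-dimensional decomposition against `scipy.stats.norm`; the values of $\mu$ and $\sigma$ are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 2.0
x = np.linspace(-5.0, 5.0, 101)

# Components from the derivation above
h = 1 / np.sqrt(2 * np.pi)                             # base measure
eta = np.array([mu / sigma**2, -1 / (2 * sigma**2)])   # natural parameters
T = np.stack([x, x**2])                                # sufficient statistics, shape (2, len(x))
A = mu**2 / (2 * sigma**2) + np.log(sigma)             # log partition function

reconstructed = h * np.exp(eta @ T - A)
assert np.allclose(reconstructed, norm.pdf(x, mu, sigma))
```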

Source: Exponential Family: Normal Distribution (YouTube, statisticsmatt)


Show that the Poisson distribution is a member of the exponential family.

Our goal is to rearrange the Poisson PMF into the form $f(x|\theta) = h(x) \exp \big[ \eta(\theta)^T \cdot T(x) -A(\theta) \big]$.

Recall the Poisson PMF, $$ \begin{aligned} X &\sim \text{Pois}(\lambda) \\ p(x|\lambda) &= \frac{e^{-\lambda} \lambda^x}{x!} \\ \end{aligned} $$

Manipulate into the desired form by taking the exponential of the log of the PMF, $$ \begin{aligned} p(x|\lambda) &= \exp \bigg( \log \Big( \frac{e^{-\lambda} \lambda^x}{x!} \Big) \bigg) \\ &= \exp \Big( x \log(\lambda) - \lambda - \log(x!) \Big) \\ &= \frac{1}{x!} \exp \Big( \log(\lambda) x - \lambda \Big) \end{aligned} $$

Now match with known form, $$ \begin{aligned} \therefore p(x|\lambda) &= h(x) \exp \big[ \eta(\lambda) T(x) - A(\lambda) \big] \\[8pt] \text{with } & h(x) = \frac{1}{x!} \\ & \eta(\lambda) = \log(\lambda) \\ & T(x) = x \\ & A(\lambda) = \lambda \end{aligned} $$
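And the corresponding numerical check against `scipy.stats.poisson`, with an arbitrary choice of $\lambda$.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import factorial

lam = 4.0
x = np.arange(20)

# Components from the derivation above
h = 1 / factorial(x)     # base measure
eta = np.log(lam)        # natural parameter
T = x                    # sufficient statistic
A = lam                  # log partition function

reconstructed = h * np.exp(eta * T - A)
assert np.allclose(reconstructed, poisson.pmf(x, lam))
```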

Source: Exponential Family: Poisson Distribution (YouTube, statisticsmatt)

Further reading 📖

Exponential Family: Mean and Variance (YouTube, statisticsmatt) – Good intuitive explanation which matches the form used on Wikipedia’s page for the Exponential Family. Many other video tutorials use alternative forms, which I find less intuitive for building understanding.

The Exponential Family (Jake Tae)

  • Generalized Linear Models
    • The response distributions of most GLMs belong to the exponential family, because all of these distributions are maximum entropy distributions.

© Geoff Ruddock 2020