Information Theory Basics

Notes on the basics of information theory.


In information theory, the entropy of a distribution p is captured by the following equation:

H[p] = \sum_j -p(j) \log p(j)

Or, for a single probability y:

H[y] = -y \log y

Entropy is the expected level of surprise experienced by an observer who knows the true probabilities.
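As a quick check of the definition, entropy can be computed directly from the sum above. This is a minimal sketch using only the standard library; the function name `entropy` is chosen here for illustration:

```python
import math

def entropy(p):
    """H[p] = sum_j -p(j) log p(j), in nats (natural log)."""
    return sum(-pj * math.log(pj) for pj in p if pj > 0)

# A fair coin maximizes entropy among two-outcome distributions:
h_fair = entropy([0.5, 0.5])    # log 2 ≈ 0.693 nats
h_biased = entropy([0.9, 0.1])  # less surprise on average
```

A biased coin is less surprising on average, so its entropy is strictly smaller.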

Nat and Bit

To encode data drawn randomly from the distribution p, we need at least H[p] nats.

  • Nat
    The nat is the equivalent of the bit, but for a code with base e rather than base 2.

    1 \, \text{nat} = \frac{1}{\log 2} \approx 1.44 \, \text{bits}

  • \frac{H[p]}{\log 2}, the entropy measured in bits, is often also called the binary entropy.
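The nat-to-bit conversion above can be verified numerically; a sketch, assuming entropy is first computed with the natural logarithm:

```python
import math

# Entropy of a fair coin in nats (natural log), then converted to bits.
p = [0.5, 0.5]
h_nats = sum(-pj * math.log(pj) for pj in p)  # ≈ 0.693 nats
h_bits = h_nats / math.log(2)                 # exactly 1 bit

# 1 nat = 1/log(2) ≈ 1.44 bits
bits_per_nat = 1 / math.log(2)
```

Dividing by log 2 converts nats to bits, which is exactly the binary-entropy conversion stated above.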


Cross-entropy from p to q, denoted H(p,q), is the expected surprisal of an observer with subjective probabilities q upon seeing data that was actually generated according to probabilities p.

H(p,q) = \sum_j -p(j) \log q(j)
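A direct implementation of this sum (a minimal sketch; the function name `cross_entropy` is chosen here, not from any library):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = sum_j -p(j) log q(j): expected surprisal of an
    observer believing q when data is actually drawn from p (nats)."""
    return sum(-pj * math.log(qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.5, 0.5]  # true distribution
q = [0.9, 0.1]  # observer's subjective belief
h_pq = cross_entropy(p, q)
h_pp = cross_entropy(p, p)  # equals the entropy H[p]
```

Note that H(p,q) ≥ H(p,p) = H[p]: a mistaken belief q can only increase the expected surprisal.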

Kullback-Leibler Divergence

The Kullback-Leibler divergence, also called KL divergence or relative entropy, is the most common way to measure the distance between two distributions. It is simply the difference between the cross-entropy and the entropy.

D(p \| q) = H(p,q) - H[p] = \sum_j p(j) \log \frac{p(j)}{q(j)}
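This can be sketched directly from the right-hand side (the function names here are illustrative, not from any library):

```python
import math

def kl_divergence(p, q):
    """D(p||q) = sum_j p(j) log(p(j)/q(j)), in nats."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)

# Equivalently, via the cross-entropy minus entropy identity:
h_pq = sum(-pj * math.log(qj) for pj, qj in zip(p, q) if pj > 0)
h_p = sum(-pj * math.log(pj) for pj in p if pj > 0)
```

The KL divergence is nonnegative, is zero only when p = q, and is not symmetric: D(p‖q) ≠ D(q‖p) in general, so it is not a true metric despite being used as a "distance".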

Author: Sheey
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. When reposting, please credit the source, Sheey的小窝.