Log sum inequality

The log sum inequality is used for proving theorems in information theory.

Statement

Let <math>a_1,\ldots,a_n</math> and <math>b_1,\ldots,b_n</math> be nonnegative numbers. Denote the sum of all <math>a_i</math> by <math>a</math> and the sum of all <math>b_i</math> by <math>b</math>. The log sum inequality states that

<math>\sum_{i=1}^n a_i\log\frac{a_i}{b_i} \geq a\log\frac{a}{b},</math>

with equality if and only if the ratios <math>\frac{a_i}{b_i}</math> are equal for all <math>i</math>, in other words <math>a_i = cb_i</math> for all <math>i</math> and some constant <math>c</math>.

(Take <math>a_i\log\frac{a_i}{b_i}</math> to be <math>0</math> if <math>a_i=0</math>, and to be <math>\infty</math> if <math>a_i>0,\ b_i=0</math>. These are the limiting values obtained as the relevant number tends to <math>0</math>.)
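The statement and its limiting conventions can be checked numerically. The following is a minimal Python sketch (the helper names `log_sum_lhs` and `log_sum_rhs` are illustrative, not from any library):

```python
import math

def log_sum_lhs(a, b):
    """Left-hand side: sum of a_i * log(a_i / b_i), using the stated conventions."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai == 0:
            continue              # a_i log(a_i / b_i) -> 0 as a_i -> 0
        if bi == 0:
            return math.inf       # a_i > 0, b_i = 0 gives +infinity
        total += ai * math.log(ai / bi)
    return total

def log_sum_rhs(a, b):
    """Right-hand side: a * log(a / b) with a = sum(a_i), b = sum(b_i)."""
    sa, sb = sum(a), sum(b)
    return sa * math.log(sa / sb) if sa > 0 else 0.0

a = [1.0, 2.0, 3.0]
b = [2.0, 1.0, 1.0]
assert log_sum_lhs(a, b) >= log_sum_rhs(a, b)

# Equality holds when a_i = c * b_i for all i:
c = 1.5
a_eq = [c * bi for bi in b]
assert math.isclose(log_sum_lhs(a_eq, b), log_sum_rhs(a_eq, b))
```

Any other nonnegative inputs can be substituted; the inequality becomes an equality exactly when the two sequences are proportional.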

Proof

Notice that after setting <math>f(x)=x\log x</math> we have

<math>\begin{align}\sum_{i=1}^n a_i\log\frac{a_i}{b_i} &= \sum_{i=1}^n b_i f\left(\frac{a_i}{b_i}\right) = b\sum_{i=1}^n \frac{b_i}{b} f\left(\frac{a_i}{b_i}\right) \\ &\geq b f\left(\sum_{i=1}^n \frac{b_i}{b}\,\frac{a_i}{b_i}\right) = b f\left(\frac{1}{b}\sum_{i=1}^n a_i\right) = b f\left(\frac{a}{b}\right) \\ &= a\log\frac{a}{b},\end{align}</math>

where the inequality follows from Jensen's inequality since <math>\frac{b_i}{b}\geq 0</math>, <math>\sum_{i=1}^n \frac{b_i}{b}=1</math>, and <math>f</math> is convex.
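The Jensen step is the heart of the proof: the weights <math>\frac{b_i}{b}</math> form a probability vector, so the weighted average of <math>f</math> at the ratios <math>\frac{a_i}{b_i}</math> dominates <math>f</math> at their weighted average. A small Python check of just that step (variable names are illustrative):

```python
import math

def f(x):
    # f(x) = x log x, convex on (0, inf)
    return x * math.log(x)

a = [0.5, 2.0, 1.5]
b = [1.0, 1.0, 2.0]
bsum = sum(b)
weights = [bi / bsum for bi in b]           # nonnegative, sum to 1
points = [ai / bi for ai, bi in zip(a, b)]  # the ratios a_i / b_i

# Jensen: weighted average of f  >=  f of the weighted average
avg_of_f = sum(w * f(x) for w, x in zip(weights, points))
f_of_avg = f(sum(w * x for w, x in zip(weights, points)))
assert avg_of_f >= f_of_avg
```

Note that the weighted average of the ratios collapses to <math>\frac{a}{b}</math>, which is exactly how the right-hand side of the log sum inequality appears.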

Generalizations

The inequality remains valid for <math>n=\infty</math> provided that <math>a<\infty</math> and <math>b<\infty</math>. The proof above holds for any function <math>g</math> such that <math>f(x)=xg(x)</math> is convex, such as all continuous non-decreasing functions. Generalizations to non-decreasing functions other than the logarithm are given in Csiszár, 2004.

Another generalization is due to Dannan, Neff and Thiel, who showed that if <math>a_1, a_2, \ldots, a_n</math> and <math>b_1, b_2, \ldots, b_n</math> are positive real numbers with <math>a_1+a_2+\cdots+a_n=a</math> and <math>b_1+b_2+\cdots+b_n=b</math>, and <math>k\geq 0</math>, then <math>\sum_{i=1}^n a_i\log\left(\frac{a_i}{b_i}+k\right) \geq a\log\left(\frac{a}{b}+k\right)</math>. (This follows from the proof above, since <math>f(x)=x\log(x+k)</math> is convex for <math>k\geq 0</math>.)

Applications

The log sum inequality can be used to prove inequalities in information theory. Gibbs' inequality states that the Kullback-Leibler divergence is non-negative, and equal to zero precisely if its arguments are equal. One proof uses the log sum inequality.

Proof
Let <math>P=(p_i)_{i\in\mathbb{N}}</math> and <math>Q=(q_i)_{i\in\mathbb{N}}</math> be probability mass functions. In the log sum inequality, substitute <math>n=\infty</math>, <math>a_i=p_i</math> and <math>b_i=q_i</math> to get
<math>D_{\mathrm{KL}}(P\|Q) \equiv \sum_i p_i \log_2 \frac{p_i}{q_i} \geq 1\log\frac{1}{1} = 0,</math>

with equality if and only if <math>p_i=q_i</math> for all <math>i</math> (as both <math>P</math> and <math>Q</math> sum to 1).
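Gibbs' inequality is straightforward to confirm on concrete distributions. A minimal Python sketch (the `kl_divergence` helper is illustrative; it uses natural logarithms, which does not affect the sign):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum of p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.4, 0.5]
q = [0.25, 0.25, 0.5]
assert abs(sum(p) - 1) < 1e-12 and abs(sum(q) - 1) < 1e-12

assert kl_divergence(p, q) >= 0   # Gibbs' inequality
assert kl_divergence(p, p) == 0   # equality exactly when the arguments coincide
```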

The inequality can also be used to prove the convexity of the Kullback-Leibler divergence.
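The convexity in question is joint convexity in the pair of arguments: mixing two pairs of distributions never increases the divergence beyond the corresponding mixture of divergences. A quick numerical illustration in Python (helper names are illustrative):

```python
import math

def kl(p, q):
    # D_KL(P || Q) in nats, with the 0 * log 0 = 0 convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(x, y, lam):
    # Convex combination lam * x + (1 - lam) * y, componentwise
    return [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]

p1, q1 = [0.2, 0.8], [0.5, 0.5]
p2, q2 = [0.7, 0.3], [0.4, 0.6]
lam = 0.3

# Joint convexity: D(mix of Ps || mix of Qs) <= mixture of the divergences
lhs = kl(mix(p1, p2, lam), mix(q1, q2, lam))
rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
assert lhs <= rhs
```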

Notes

  1. Cover & Thomas (1991), p. 29.
  2. Dannan, F. M.; Neff, P.; Thiel, C. (2016). "On the sum of squared logarithms inequality and related inequalities" (PDF). Journal of Mathematical Inequalities. 10 (1). doi:10.7153/jmi-10-01. Retrieved 12 January 2023.
  3. MacKay (2003), p. 34.
  4. Cover & Thomas (1991), p. 30.

References

  - Cover, Thomas M.; Thomas, Joy A. (1991). Elements of Information Theory. New York: Wiley.
  - MacKay, David J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.