The page you are reading is part of a draft (v2.0) of the "No bullshit guide to math and physics."
The text has since gone through many edits and is now available in print and electronic format. The current edition of the book is v4.0, which is a substantial improvement in terms of content and language (I hired a professional editor) from the draft version.
I'm leaving the old wiki content up for the time being, but I highly engourage you to check out the finished book. You can check out an extended preview here (PDF, 106 pages, 5MB).
The communication rates at which the tasks of network information theory can be derived by using the Packing lemma, which in turn depends on the Joint typicality lemma.
We derive the proof of these two results from first principles.
Let $(U,X,Y) \sim p_{UXY}(u,x,y)$. Let $p_{\tilde{U}\tilde{Y}}(\tilde{u}^n,\tilde{y}^n)$ be some probability distribution, not necessarily $p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n,\tilde{y}^n) = \prod_{i=1}^n p_{UY}(u_i,y_i)$, and let $(\tilde{U}^n, \tilde{Y}^n) \sim p_{\tilde{U}^n\tilde{Y}^n}$.
Let $X^n(m), m\in \mathcal{M}$, where $|\mathcal{M}| \leq 2^{nR}$, be random sequences, each distributed according to \[ \prod_{i=1}^n p_{X|U}(x_i|\tilde{u}_i). \] Assume that $X^n(m), m\in \mathcal{M}$ is pairwise conditionally independent of $\tilde{Y}^n$ given $\tilde{U}^n$, but is arbitrarily dependent on other $X^n(m)$ sequences.
Then, there exists $\delta(\epsilon) \to 0$ as $\epsilon \to 0$ such that \[ \text{Pr}\left\{ (\tilde{U}^n, X^n(m), \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon \text{ for some } m \in \mathcal{M} \right\} \to 0, \] as $n \to \infty$, if $R < I(X;Y|U) - \delta(\epsilon)$.
This corresponds to a situation where a superposition encoding is used to encode messages $\ell$ and $m$ into $X^n(l,m)$ using random codebooks \[ \{ u^n(\ell) \} \sim p_{U^n}(u^n), \qquad \{ x^n(\ell,m) \} \sim \prod_{i=1}^n p_{X|U}\!\left( x_i|u_i(\ell) \right), \] in which case we will have $p_{\tilde{U}^n\tilde{Y}^n}=p_{U^nY^n}$. The packing lemma, gives us a bound on the error associated with decoding the message $m$ (the superimposed message) given that we correctly decoded $\ell$ (the cloud centre).
Assume $U = \emptyset$. The random sequences $X^n(m), m \in \mathcal{M}$ represent different codewords. The $\tilde{Y}^n$ sequence represents the received sequence as a result of sending a codeword not in the set $\{X^n(m)\}$.
The lemma shows that under any probability mass function on $\tilde{Y}^n$ the probability that some codeword in $\mathcal{M}$ is jointly typical with $\tilde{Y}^n$ goes to zero as $n\to \infty$ if the rate of the code $R < I(X;Y)$.
Define the event:
\[ \mathcal{E}_m \equiv \{ (\tilde{U}^n, X^n(m), \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon(U,X,Y) \}, m \in \mathcal{M}. \]
By the union events bound, the probability of the event of interest can be bounded as \[ Pr\left\{ \cup_{m\in\mathcal{M}} \mathcal{E}_m \right\} \leq \sum_{m\in\mathcal{M}} Pr\left\{ \mathcal{E}_m \right\}. \]
Now consider \[ \begin{align} Pr\left\{ \mathcal{E}_m \right\} & = Pr\left\{ (\tilde{U}^n, X^n(m), \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon(U,X,Y) \right\} \nl & = \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) Pr\left\{ (\tilde{u}^n, X^n(m), \tilde{y}^n) \in \mathcal{T}^{(n)}_\epsilon(U,X,Y) \right\} \nl & = \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) \sum_{x^n \in \mathcal{T}^{(n)}_\epsilon(X|\tilde{u}^n,\tilde{y}^n)} p(x^n|\tilde{u}^n) \nl & \leq \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) \left| \mathcal{T}^{(n)}_\epsilon (X|\tilde{u}^n,\tilde{y}^n) \right| \cdot 2^{-n[H(X|U)- \delta^{\prime}(\epsilon)] } \nl & \leq \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) 2^{n[H(X|U,Y) -\delta^{\prime\prime}(\epsilon) ]} \cdot 2^{-n[H(X|U)- \delta^{\prime}(\epsilon) ] } \nl & = \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) 2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \nl & = 2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \sum_{ (\tilde{u}^n, \tilde{y}^n) \in \mathcal{T}^{(n)}_{\epsilon}(\tilde{U},\tilde{Y}) } p_{\tilde{U}^n\tilde{Y}^n}(\tilde{u}^n, \tilde{y}^n) \nl & \leq 2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \cdot 1. \end{align} \] The third equality follows from the way the $x^n$ is constructed: $X^n(m)\sim \prod_{i=1}^n p_{X|U}(x_i|\tilde{u}_i)$. The two inequalities follow from the properties of typical sequences (also known as the joint typicality lemma).
Hence \[ \begin{align} Pr\left\{ \cup_{m\in\mathcal{M}} \mathcal{E}_m \right\} &\leq \sum_{m\in\mathcal{M}} Pr\left\{ \mathcal{E}_m \right\} \nl &\leq |\mathcal{M}|2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \nl & \leq 2^{-n[ I(X;Y|U) - R - \delta(\epsilon) ]}. \end{align} \] Thus if we choose $R < I(X;Y|U) - \delta(\epsilon)$ the probability of error will tend to zero as $n\to \infty$.
Let $(U,X,Y) \sim p_{UXY}(u,x,y)$. Then there exists $\delta(\epsilon) \to 0$ as $\epsilon \to 0$ such that the following statements hold: If $(\tilde{U}^n,\tilde{Y}^n)$ is distributed according to an arbitrary pmf $p(\tilde{u}^n, \tilde{y}^n)$ and $\tilde{Z}^n \sim \prod_{i=1}^n p_{X|U}(\tilde{x}_i|\tilde{u}_i)$, conditionally independent of $\tilde{Y}^n$ given $\tilde{U}^n$, then \[ Pr\{ (\tilde{U}^n,\tilde{X}^n, \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon (U,X,Y) \} \leq 2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \]
Consider the following argument \[ \begin{align} Pr\{ (\tilde{U}^n,\tilde{X}^n, \tilde{Y}^n) \in \mathcal{T}^{(n)}_\epsilon (U,X,Y) \} & = \sum_{\tilde{x}^n \in \mathcal{T}^{(n)}_\epsilon (X|u^n,y^n) } p(\tilde{x}^n|\tilde{u}^n) \nl & \leq \left| \mathcal{T}^{(n)}_\epsilon (X|u^n,y^n) \right| \cdot 2^{-n[H(X|U)- \delta^{\prime}(\epsilon)] } \nl & \leq 2^{n[H(X|U,Y) -\delta^{\prime\prime}(\epsilon)] } \cdot 2^{-n[H(X|U)- \delta^{\prime}(\epsilon) ]} \nl & = 2^{-n[ I(X;Y|U) - \delta(\epsilon) ]} \end{align} \]
The packing lemma works for quantum states and superposition encoding.
Modification of packing lemma to two senders = simultaneous decoding.
Analogue – but because proof slightly different we can't just say “quantum packing lemma” – we have to look at the error analysis in each case.