Definition of Generalization

Suppose $\mu$ and $\nu$ are two distributions on $\mathbb{R}^d$. We draw $m$ samples $\{x_i\}_{i=1}^m$ $i.i.d.$ from $\mu$ and $m$ samples $\{y_i\}_{i=1}^m$ $i.i.d.$ from $\nu$. Let $\mu^m$ and $\nu^m$ be the uniform distributions over $\{x_i\}_{i=1}^m$ and $\{y_i\}_{i=1}^m$, respectively. In other words, $\mu^m$ and $\nu^m$ are the empirical distributions of $\mu$ and $\nu$. Then, a distance (divergence or metric) between distributions $\textbf{dist}(\cdot,\cdot)$ generalizes with error $\epsilon$ if, with high probability, $|\textbf{dist}(\mu,\nu)-\textbf{dist}(\mu^m,\nu^m)|\leq\epsilon$.
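As a concrete numerical illustration (not from the text), take $\textbf{dist}$ to be the one-dimensional Wasserstein-1 distance, which for two equal-size empirical distributions equals the mean absolute difference of the sorted samples. The sketch below (assuming numpy) draws both samples from the same standard Gaussian, so $\textbf{dist}(\mu,\nu)=0$ and the empirical distance itself measures the generalization error:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_w1(x, y):
    """W1 distance between two equal-size 1-D empirical distributions:
    the mean absolute difference of the sorted samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

# mu = nu = N(0, 1), so dist(mu, nu) = 0; watch how
# dist(mu^m, nu^m) shrinks as m grows.
for m in [100, 10_000]:
    x = rng.standard_normal(m)
    y = rng.standard_normal(m)
    print(m, empirical_w1(x, y))
```

For this particular metric in one dimension the empirical distance decays with $m$; the point of the notes is that other metrics, in higher dimension, need not behave this way.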


We consider the case where both distributions $\mu$ and $\nu$ are identical to some distribution $q$. That is, $\mu=\nu=q$. In this case, for any probability metric $\textbf{dist}(\cdot,\cdot)$, we have $\textbf{dist}(\mu,\nu)=0$. We then bound $\textbf{dist}(\mu^m,\nu^m)$. Specifically, we aim to prove lower bounds of the following form: even for large $m$, with at least some constant probability, $\textbf{dist}(\mu^m,\nu^m)$ is no less than some constant. Such a bound shows that the metric does not generalize with $m$ samples.

Mathematical Tools

$\textbf{Definition}$. Suppose random variables $X$ and $Y$ are drawn $i.i.d.$ from $q$, and $Z=X-Y$. We define the anti-concentration probability w.r.t. the difference of two identical distributions to be \[q_{\epsilon}=\p{\|Z\|_2\geq\epsilon}\] There are several bounds for $q_{\epsilon}$ in various cases. By the multidimensional Chebyshev inequality (equivalently, Markov's inequality applied to $\|Z\|_2^2$, whose expectation is $2\kappa d$), if the covariance matrix of $q$ is $\kappa I$, then we have \[q_{\epsilon}\leq\frac{2\kappa d}{\epsilon^2}\] Besides this specific upper bound, we are also interested in bounding $q_{\epsilon}$ in other cases. Since there is no general and concise result on this anti-concentration inequality, we prove some simple lower bounds as well as upper bounds for various $q$'s in this section.
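The Chebyshev bound above can be checked numerically. A minimal Monte Carlo sketch (assuming numpy; the parameter values are arbitrary choices for illustration): draw $X,Y$ $i.i.d.$ from $q=\mathcal{N}(0,\kappa I_d)$, so $Z=X-Y$ has covariance $2\kappa I_d$, and compare the empirical $q_{\epsilon}$ with $\frac{2\kappa d}{\epsilon^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

d, kappa, eps, n = 5, 0.1, 2.0, 200_000
# X, Y i.i.d. from q = N(0, kappa * I_d); Z = X - Y has covariance 2*kappa*I_d.
x = rng.normal(scale=np.sqrt(kappa), size=(n, d))
y = rng.normal(scale=np.sqrt(kappa), size=(n, d))
z = x - y
q_eps = np.mean(np.linalg.norm(z, axis=1) >= eps)
chebyshev = 2 * kappa * d / eps**2  # Markov on E||Z||^2 = 2*kappa*d
print(q_eps, "<=", chebyshev)
```

For small $\epsilon$ the bound exceeds $1$ and is vacuous; it only carries information in the tail regime, which is why the sharper distribution-specific bounds below are needed.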

$\textbf{$q$ is Gaussian}$. Suppose $q$ is a Gaussian distribution: $q\sim\mathcal{N}(\mu,\Sigma)$. The lower bound of $q_{\epsilon}$ can be derived in the following way. Since $Z\sim\mathcal{N}(0,2\Sigma)$, bounding the density by its value at the origin gives \[ \begin{array}{rl} q_{\epsilon} &\displaystyle = 1-\int_{\|x\|_2\leq\epsilon}\frac{\exp\left(-\frac14x^{\top}\Sigma^{-1}x\right)}{(2\pi)^{\frac d2}\sqrt{\det 2\Sigma}}dx \\ &\displaystyle \geq1 - \frac{\textbf{Vol}(B_{\epsilon}(0))}{(2\pi)^{\frac d2}\sqrt{2^d \det \Sigma}}\\ &\displaystyle =1-\frac{\epsilon^d}{2^d\Gamma\left(\frac d2+1\right)\sqrt{\det\Sigma}} \end{array} \] The upper bound of $q_{\epsilon}$ can be derived in the following way. Since $x^{\top}Ax\leq\lambda_{\max}(A)\|x\|_2^2$ holds for any symmetric positive semidefinite matrix $A$, applying this to $A=\Sigma^{-1}$ (for which $\lambda_{\max}(\Sigma^{-1})=1/\lambda_{\min}(\Sigma)$) gives $x^{\top}\Sigma^{-1}x\leq\epsilon^2/\lambda_{\min}(\Sigma)$ on $B_{\epsilon}(0)$, and hence \[ \begin{array}{rl} q_{\epsilon} &\displaystyle \leq 1 - \exp\left(-\frac{\epsilon^2}{4\lambda_{\min}(\Sigma)}\right)\frac{\textbf{Vol}(B_{\epsilon}(0))}{(2\pi)^{\frac d2}\sqrt{2^d \det \Sigma}}\\ &\displaystyle =1-\exp\left(-\frac{\epsilon^2}{4\lambda_{\min}(\Sigma)}\right)\frac{\epsilon^d}{2^d\Gamma\left(\frac d2+1\right)\sqrt{\det\Sigma}} \end{array} \] When $\Sigma$ is the identity matrix, these bounds take a simpler form. If $d=1$, we have \[1-\frac{\epsilon}{\sqrt{\pi}}\leq q_{\epsilon}\leq 1-\frac{\epsilon}{\sqrt{\pi}}\exp\left(-\frac{\epsilon^2}{4}\right)\] If $d=2$, we have \[1-\frac{\epsilon^2}{4}\leq q_{\epsilon}\leq 1-\frac{\epsilon^2}{4}\exp\left(-\frac{\epsilon^2}{4}\right)\] In addition, some more compact bounds can be obtained through tail inequalities. Since each coordinate satisfies $Z_i\sim\mathcal{N}(0,2)$, a union bound together with the Gaussian tail bound $\p{|Z_i|\geq t}\leq2\exp\left(-\frac{t^2}{4}\right)$ yields \[q_{\epsilon} = \p{\sum_{i=1}^d Z_i^2\geq\epsilon^2} \leq \sum_{i=1}^d\p{|Z_i|\geq\frac{\epsilon}{\sqrt d}} \leq 2d\exp\left(-\frac{\epsilon^2}{4d}\right)\] Finally, since $\frac12 Z^{\top}Z\sim\chi^2(d)$, the chi-squared tail bound $\p{\chi^2(d)\geq\alpha d}\leq(\alpha\exp(1-\alpha))^{\frac d2}$ gives, for $\alpha>1$, \[q_{\sqrt{2\alpha d}}\leq(\alpha\exp(1-\alpha))^{\frac d2}\]
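For $\Sigma=I$ and $d=2$ the exact value of $q_{\epsilon}$ is available in closed form, which makes a clean numerical check (a sketch assuming numpy; not from the text): since $Z\sim\mathcal{N}(0,2I_2)$, we have $\frac12\|Z\|_2^2\sim\chi^2(2)$, an exponential distribution, so $q_{\epsilon}=\exp\left(-\frac{\epsilon^2}{4}\right)$, which should sit between the two stated $d=2$ bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

eps, n = 0.5, 1_000_000
# q = N(0, I_2), so Z = X - Y ~ N(0, 2 I_2) and ||Z||^2 / 2 ~ chi^2(2),
# giving the exact value q_eps = exp(-eps^2 / 4).
z = rng.normal(scale=np.sqrt(2.0), size=(n, 2))
q_eps = np.mean(np.linalg.norm(z, axis=1) >= eps)
exact = np.exp(-eps**2 / 4)
lower = 1 - eps**2 / 4
upper = 1 - (eps**2 / 4) * np.exp(-eps**2 / 4)
print(lower, "<=", q_eps, "~", exact, "<=", upper)
```

The Monte Carlo estimate tracks $\exp(-\epsilon^2/4)$, and both lie inside the volume-based bracket; for small $\epsilon$ the bracket is tight, confirming that $q_{\epsilon}$ stays close to $1$.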

$\textbf{$q$ is Uniform}$. Suppose $q$ is the uniform distribution over $U=[0,1]^d$. The lower bound of $q_{\epsilon}$ follows by enlarging the inner integration region from $U\cap B_{\epsilon}(y)$ to the whole ball $B_{\epsilon}(y)$. \[ \begin{array}{rl} q_{\epsilon} &\displaystyle = 1-\int_U dy \int_{U\cap B_{\epsilon}(y)} dx \\ &\displaystyle \geq1 - \int_U dy \int_{B_{\epsilon}(y)} dx\\ &\displaystyle =1-\textbf{Vol} (B_{\epsilon}(0))\\ &\displaystyle =1-\frac{\pi^{\frac d2}\epsilon^d}{\Gamma\left(\frac d2+1\right)} \end{array} \] For the upper bound, assume $\epsilon\leq\frac12$ and restrict $y$ to $[\epsilon,1-\epsilon]^d$, a set of volume $(1-2\epsilon)^d$ on which $B_{\epsilon}(y)\subseteq U$. \[ \begin{array}{rl} q_{\epsilon} &\displaystyle \leq 1-(1-2\epsilon)^d\textbf{Vol} (B_{\epsilon}(0))\\ &\displaystyle = 1-\frac{(1-2\epsilon)^d\pi^{\frac d2}\epsilon^d}{\Gamma\left(\frac d2+1\right)} \end{array} \] Similarly, if $d=1$, we have \[1-2\epsilon\leq q_{\epsilon}\leq 1-2\epsilon(1-2\epsilon)\] If $d=2$, we have \[1-\pi\epsilon^2\leq q_{\epsilon}\leq1-\pi\epsilon^2(1-2\epsilon)^2\]
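These uniform-case bounds can also be sanity-checked by simulation. A minimal sketch (assuming numpy; $d=2$ and $\epsilon=0.3$ are arbitrary illustrative choices): estimate $q_{\epsilon}$ for $X,Y$ uniform on $[0,1]^2$ and compare against the $d=2$ bracket.

```python
import numpy as np

rng = np.random.default_rng(0)

d, eps, n = 2, 0.3, 200_000
# X, Y i.i.d. uniform on [0, 1]^d; estimate q_eps = P(||X - Y|| >= eps).
x = rng.random((n, d))
y = rng.random((n, d))
q_eps = np.mean(np.linalg.norm(x - y, axis=1) >= eps)
lower = 1 - np.pi * eps**2                      # 1 - Vol(B_eps) for d = 2
upper = 1 - np.pi * eps**2 * (1 - 2 * eps)**2   # restricted-center bound
print(lower, "<=", q_eps, "<=", upper)
```

The gap between the two sides widens as $\epsilon$ grows (the factor $(1-2\epsilon)^d$ decays), but in the small-$\epsilon$ regime relevant to the non-generalization argument both sides are close to $1$.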