A long line of literature has produced algorithms which have smaller and smaller separation requirements. Typically, separation is measured by the minimum distance between the means of any pairs of distributions in the mixture, as a function of a standard deviation. The parameters here are d, the dimensionality of the data, σ, the maximum directional standard deviation, and σ* the maximum directional standard deviation in the subspace containing the means.

The table below summarizes some of the bounds.

Reference | Separation | Comments | Algorithm |

[Das99] | σ d^{1/2} |
Spherical Gaussians | Random Projections |

[DS00] | σ d^{1/4} |
Bounded Eccentricity | EM |

[AK01] | σ d^{1/4} |
Includes other conditions, like concentric Gaussians | Distance thresholding |

[VW02] | σ k^{1/4} |
Spherical Gaussians | PCA Projections |

[KSV05, AM05] | σ/w_{min}^{1/2} |
PCA Projections | |

[CR08] | σ* k^{1/2} |
Product Distributions | Canonical Correlations |

[BV08] | Low Fischer discriminant values | Isotropic PCA |

For a mixture of two product distributions, Freund and Mansour[FM99]
provide an algorithm, with running time polynomial in n, 1/ε.
For a mixture of k product distributions, Feldman, O'Donnell and Servedio [FOS05]
provides an algorithm which produces with a running-time of
d^{O(k3/ε3)}.

[SSR06] experimentally studies the sample requirement of EM for
learning Gaussian mixtures. They conjecture that learning mixtures
has three phases, depending on the number of
samples : with less than about Ω(d/s^{4}) samples, it
is information-theoretically hard; with more than about
Ω(d/s^{2})
samples, computationally easy, and in between, computationally hard,
but easy in an information-theoretic sense.

Also, [AM05] and [KSV05] show that their results apply to mixtures of a slightly broader class of distributions than Gaussians, including mixtures of log-concave distributions.

[DS00] shows a two-round variant of EM that produces the correct clustering
when the separation between the Gaussians is O(σ d^{1/4}).
[RW84] analyze the likelihood surface around the optimal solution, and show
that EM has first-order convergence; [XJ96] characterizes some conditions
under which EM has second-order convergence.

- [RW84] R. Redner, H. Walker, "Mixture densities, maximum likelihood and the EM algorithm", SIAM Review, 1984.
- [L96] B.G. Lindsey, "Mixture Models: Theory, Geometry and Applications", IMS Publishers, 1996.
- [XJ96] L. Xu, M.I. Jordan, "On Convergence Properties of the EM Algorithm for Gaussian Mixtures", Neural Computation, 1996.
- [Das99] S. Dasgupta, "Learning Mixtures of Gaussians", Proc. of Foundations of Comp. Sci, 1999.
- [FM99] Y. Freund, Y. Mansour, "Estimating a Mixture of Two Product Distributions", In Proc. of Conf. on Learning Theory, 1999.
- [DS00] S. Dasgupta, L. Schulman, "A two-round variant of EM for Gaussian Mixtures", Proc. of Uncertainty in Artificial Intelligence, 2000.
- [AK01] S. Arora, R. Kannan, "Learning Mixtures of Arbitrary Gaussians", In Proc. of Symp. on Theory of Computing, 2001.
- [VW02] S. Vempala, G. Wang, "A Spectral Algorithm for Learning Mixtures of Distributions", In Proc. of Foundations of Comp. Sci, 2002.
- [FOS05] J. Feldman, R. O'Donnell, R. Servedio, "Learning Mixtures of Product Distributions over Discrete Domains", In Proc. of Foundations of Comp. Sci, 2005.
- [KSV05] R. Kannan, H. Salmasian, S. Vempala, "The Spectral Method for General Mixture Models", In Proc. of Conf. on Learning Theory, 2005.
- [AM05] D. Achlioptas, F. McSherry, "On Spectral Learning of Mixtures of Distributions", In Proc. of Conf. on Learning Theory, 2005.
- [DHKS05] A. Dasgupta, J. Hopcroft, J. Kleinberg, M. Sandler, "On Learning Mixtures of Heavy-Tailed Distributions", In Proc. of Foundations of Comp. Sci, 2005.
- [SSR06] N. Srebro, G. Shakranovich, S. Roweis, "An Investigation of the Computational and Informational Limits in Gaussian Mixture Clustering", In Proc. of Intl. Conf. on Machine Learning, 2006.
- [CHRZ07] K. Chaudhuri, E. Halperin, S. Rao, S. Zhou, "A Rigorous Analysis of Population Stratification with Limited Data", In Proc. of Symp. on Discrete Algorithms, 2007.
- [Cha07] K. Chaudhuri, "Learning Mixtures of Distributions", Ph.D Thesis,EECS Department, Univ. of California, Berkeley. Department
- [BCFZ07] A. Blum, A. Coja-Oghlan, A. Frieze, S. Zhou, "Separating Populations with Wide Data: A Spectral Analysis", In Proc. of ISAAC, 2007.
- [CR08] K. Chaudhuri, S. Rao, "Learning Mixtures of Product Distributions using Correlations and Independence", In Proc. of Conf. on Learning Theory, 2008.
- [CR08b] K. Chaudhuri, S. Rao, "Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions", In Proc. of Conf. on Learning Theory, 2008.
- [BV08] S.C. Brubaker, S. Vempala, "Isotropic PCA and Affine-Invariant Clustering", In Foundations of Comp. Sci, 2008.