Representation Benefits of Deep Feedforward Networks.
Matus Telgarsky.
[arXiv]

There exist
classification problems where every shallow network needs exponentially
many more nodes to match the accuracy of certain deep or recurrent networks.
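A rough sketch of the flavor of this separation (a toy illustration, not the paper's exact construction): composing a piecewise-linear "tent" map with itself k times produces a sawtooth with about 2^k oscillations, so depth buys oscillations exponentially, while a shallow piecewise-linear network needs roughly one node per oscillation.

```python
# Toy sketch: composing a tent map k times yields ~2**k oscillations,
# something a depth-k piecewise-linear network computes with O(k) pieces.

def triangle(x):
    """Piecewise-linear tent map on [0, 1]."""
    return 2 * x if x < 0.5 else 2 * (1 - x)

def sawtooth(x, k):
    """k-fold composition of the tent map."""
    for _ in range(k):
        x = triangle(x)
    return x

def crossings(k, n=100001):
    """Count sign changes of sawtooth(., k) - 1/2 on a fine grid
    (n is odd so grid points avoid the exact dyadic crossings);
    the count doubles with each extra layer of composition."""
    vals = [sawtooth(i / n, k) - 0.5 for i in range(n + 1)]
    vals = [v for v in vals if v != 0.0]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)
```

Each additional composition doubles the number of times the function crosses 1/2, which is the kind of complexity a shallow network can only match with exponentially many nodes.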
Convex Risk Minimization and Conditional Probability Estimation.
Matus Telgarsky, Miroslav Dudík, Robert Schapire.
[arXiv]
[short video]
[poster]
 Conference on Learning Theory (COLT), 2015.
 Even when the parameter space is ill-behaved (infinite-dimensional, minima
may fail to exist, no boundedness or regularization), risk minimization of certain standard
losses still converges to a unique object;
in the finite-dimensional case, uniform convergence (generalization) holds for empirical risk minimization.
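A toy 1-D illustration of the phenomenon (the data and setup here are my own, not the paper's construction): on linearly separable data the logistic empirical risk has no minimizer, since the risk keeps decreasing as the parameter grows without bound, yet the induced classifier converges.

```python
import math

def risk(w, X, y):
    """Empirical logistic risk for a 1-D linear predictor w * x."""
    return sum(math.log(1 + math.exp(-yi * w * xi))
               for xi, yi in zip(X, y)) / len(X)

# Separable toy data: any w > 0 classifies perfectly, and the risk
# strictly decreases as w grows, so no finite minimizer exists.
X = [-2.0, -1.0, 1.0, 2.0]
y = [-1, -1, 1, 1]

w = 0.0
for _ in range(2000):
    # gradient of the logistic risk in w
    g = sum(-yi * xi / (1 + math.exp(yi * w * xi))
            for xi, yi in zip(X, y)) / len(X)
    w -= 0.5 * g  # plain gradient descent; w drifts off to infinity
```

After the loop, `w` has grown far past any fixed value and the risk at `w` beats the risk at any small fixed parameter, while the classifier sign(w * x) has long since stopped changing.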
Moment-based Uniform Deviation Bounds for \(k\)-means and Friends.
Matus Telgarsky, Sanjoy Dasgupta.
[pdf]
[arXiv]
[poster]
 Advances in Neural Information Processing Systems (NIPS), 2013.
 Generalization bounds for the \(k\)-means cost and
Gaussian mixture log-likelihood over unbounded parameter sets
when the data has only a few bounded moments
(no boundedness or further modeling assumptions needed).
Margins, Shrinkage, and Boosting.
Matus Telgarsky.
[arXiv]
[video]
 International Conference on Machine Learning (ICML), 2013.
 AdaBoost, with a variety of losses, attains optimal margins
simply by multiplying the step size by a small constant.
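A minimal sketch of the shrinkage idea (the 1-D stump learner, dataset, and details below are my own, not the paper's): AdaBoost as usual, except the standard step size is multiplied by a small constant eta.

```python
import math

def ada_boost_shrunk(X, y, rounds=50, eta=0.1):
    """AdaBoost on 1-D data with threshold stumps and a shrunken step.
    Labels y are +/-1; returns the ensemble as (threshold, sign, step)."""
    n = len(X)
    w = [1.0 / n] * n                       # example weights
    ensemble = []
    thresholds = sorted(set(X))
    for _ in range(rounds):
        # pick the stump minimizing weighted error
        best = None
        for t in thresholds:
            for s in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if s * (1 if xi > t else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, t, s)
        err, t, s = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        step = eta * 0.5 * math.log((1 - err) / err)  # shrunken step size
        ensemble.append((t, s, step))
        # exponential-loss reweighting, then normalize
        w = [wi * math.exp(-step * yi * s * (1 if xi > t else -1))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def margin(ensemble, x, y):
    """Normalized margin y * f(x) / sum of |steps|."""
    f = sum(step * s * (1 if x > t else -1) for t, s, step in ensemble)
    total = sum(abs(step) for _, _, step in ensemble)
    return y * f / total
```

The only change from textbook AdaBoost is the factor `eta` on the step; the paper's result is that this small tweak suffices for optimal margins.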
Agglomerative Bregman Clustering.
Matus Telgarsky, Sanjoy Dasgupta.
[pdf]
[short video]
 International Conference on Machine Learning (ICML), 2012.
 Provides the natural algorithm,
with attention to: handling degenerate clusters via smoothing,
Bregman divergences for non-differentiable convex functions,
and exponential families without minimality assumptions.
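A toy sketch of the basic greedy scheme (not the paper's algorithm, which handles general Bregman divergences, smoothing, and exponential families), using the simplest Bregman divergence, squared Euclidean distance, on 1-D points:

```python
# Toy agglomerative clustering under the squared-Euclidean Bregman
# divergence: repeatedly merge the pair of clusters whose merge
# increases the total within-cluster cost the least.

def cluster_cost(points):
    """Sum of squared distances to the cluster mean (the Bregman
    cost of a cluster for the squared-Euclidean divergence)."""
    m = sum(points) / len(points)
    return sum((p - m) ** 2 for p in points)

def agglomerate(points, k):
    """Greedy bottom-up merging until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                inc = (cluster_cost(clusters[i] + clusters[j])
                       - cluster_cost(clusters[i])
                       - cluster_cost(clusters[j]))
                if best is None or inc < best[0]:
                    best = (inc, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Swapping in another Bregman divergence changes `cluster_cost` (and its minimizing representative) but leaves the merge loop intact, which is the sense in which the agglomerative algorithm is "natural" for this family.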