Research papers

I work on algorithmic statistics, with a current focus on unsupervised learning.

Not every publication listed here is online -- some are under revision, or subject to copyright hassles -- but feel free to email me for copies.

 

  Active and semi-supervised learning

S. Dasgupta and D.J. Hsu. Hierarchical sampling for active learning.
Twenty-Fifth International Conference on Machine Learning (ICML), 2008.

S. Dasgupta, D.J. Hsu, and C. Monteleoni. A general agnostic active learning algorithm.
Neural Information Processing Systems (NIPS), 2007.

S. Dasgupta. Coarse sample complexity bounds for active learning.
Neural Information Processing Systems (NIPS), 2005.

S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning.
Eighteenth Annual Conference on Learning Theory (COLT), 2005.

S. Dasgupta. Analysis of a greedy active learning strategy.
Neural Information Processing Systems (NIPS), 2004.
Powerpoint slides

S. Dasgupta, M.L. Littman, and D. McAllester. PAC generalization bounds for co-training.
Neural Information Processing Systems (NIPS), 2001.

  Hierarchical clustering

D. Kauchak and S. Dasgupta. An incremental improvement procedure for hierarchical clustering.
Neural Information Processing Systems (NIPS), 2003.

S. Dasgupta and P.M. Long. Performance guarantees for hierarchical clustering.
Journal of Computer and System Sciences, 70(4):555-569, 2005.
An earlier version (COLT 2002), with a slightly different algorithm.
Powerpoint slides

  Dimensionality reduction and projection pursuit

S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds.
Fortieth ACM Symposium on Theory of Computing (STOC), 2008.

Y. Freund, S. Dasgupta, M. Kabra, and N. Verma. Learning the structure of manifolds using random projections.
Neural Information Processing Systems (NIPS), 2007.

S. Dasgupta, D.J. Hsu, and N. Verma. A concentration theorem for projections.
Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI), 2006.

S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss.
Random Structures and Algorithms, 22(1):60-65, 2003.

M. Collins, S. Dasgupta, and R.E. Schapire. A generalization of principal component analysis to the exponential family.
Neural Information Processing Systems (NIPS), 2001.

S. Dasgupta. Experiments with random projection.
Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2000.

  Mixture models

S. Dasgupta and L.J. Schulman. A two-round variant of EM for Gaussian mixtures.
Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), 2000.

S. Dasgupta. Learning mixtures of Gaussians.
Fortieth Annual IEEE Symposium on Foundations of Computer Science (FOCS), 1999.

  Probabilistic nets

S. Dasgupta. Learning polytrees.
Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI), 1999.

S. Dasgupta. The sample complexity of learning fixed-structure Bayesian nets.
Machine Learning, 29(2-3):165-180, 1997.

  Other

L. Cayton and S. Dasgupta. A learning framework for nearest-neighbor search.
Neural Information Processing Systems (NIPS), 2007.

S. Dasgupta and D. Hsu. On-line estimation with the multivariate Gaussian distribution.
Twentieth Annual Conference on Learning Theory (COLT), 2007.

L. Cayton and S. Dasgupta. Robust Euclidean embedding.
Twenty-Third International Conference on Machine Learning (ICML), 2006.

T. Batu, S. Dasgupta, R. Kumar, and R. Rubinfeld. The complexity of approximating the entropy.
SIAM Journal on Computing, 35(1):132-150, 2005.
Earlier version: Thirty-Fourth Annual Symposium on Theory of Computing (STOC), 2002.

S. Dasgupta, W.S. Lee, and P.M. Long. A theoretical analysis of query selection for collaborative filtering.
Machine Learning, 51(3):283-298, 2003.

S. Dasgupta and P. M. Long. Boosting with diverse base classifiers.
Sixteenth Annual Conference on Learning Theory (COLT), 2003.

D. Precup, R.S. Sutton, and S. Dasgupta. Off-policy temporal-difference learning with function approximation.
Eighteenth International Conference on Machine Learning (ICML) , 2001.

S. Dasgupta. Learning probability distributions.
Ph.D. dissertation, University of California at Berkeley, 2000.