Some interesting papers/resources for deep unsupervised learning


(This is just a starting point; feel free to suggest other papers.)

Autoregressive models
Karpathy. The unreasonable effectiveness of recurrent neural networks.
Graves. Generating sequences with recurrent neural networks.
Goldberg. The unreasonable effectiveness of character-level language models.
Weiss, Goldberg, Yahav. On the practical computational power of finite precision RNNs for language recognition.
Olah. Understanding LSTM networks.
Bengio, Simard, Frasconi. Learning long-term dependencies with gradient descent is difficult.
Hochreiter and Schmidhuber. Long short-term memory.
Weiss, Goldberg, Yahav. Extracting automata from recurrent neural networks using queries and counterexamples.
Vaswani et al. Attention is all you need.
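The LSTM papers above (Hochreiter & Schmidhuber; Olah's walkthrough) center on one recurrence: gated, additive updates to a cell state that let gradients survive long sequences. As a minimal sketch of the standard equations (names and the stacked-weight layout are my own convention, not from any one paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4*H, D+H): the input/forget/output gates and the
    candidate update, stacked, applied to [x; h_prev].
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])        # input gate: how much new content to write
    f = sigmoid(z[H:2*H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2*H:3*H])    # output gate: how much cell state to expose
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c = f * c_prev + i * g     # additive update -- the key to long-range gradients
    h = o * np.tanh(c)         # hidden state passed to the next step
    return h, c
```

The additive form of `c` is exactly what Bengio et al.'s vanishing-gradient analysis motivates: the cell state is modified by gated addition rather than repeated matrix multiplication.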

Embeddings
Collobert, Weston, Bottou, Karlen, Kavukcuoglu, Kuksa. Natural language processing almost from scratch.
Mikolov, Sutskever, Chen, Corrado, Dean. Distributed representations of words and phrases and their compositionality.
Pennington, Socher, Manning. GloVe: Global vectors for word representation.
Levy, Goldberg. Neural word embedding as implicit matrix factorization.
Levy, Goldberg. Linguistic regularities in sparse and explicit word representations.
Arora, Li, Liang, Ma, Risteski. RAND-WALK: A latent variable model approach to word embeddings.
Arora, Liang, Ma. A simple but tough-to-beat baseline for sentence embeddings.
Devlin, Chang, Lee, Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding.
Peters, Neumann, Iyyer, Gardner, Clark, Lee, Zettlemoyer. Deep contextualized word representations.
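A recurring theme in the embedding papers (Mikolov et al.; Levy & Goldberg's linguistic-regularities analysis) is that analogies can be answered by vector arithmetic plus cosine similarity. A toy sketch with made-up 3-dimensional vectors (the embedding values here are illustrative assumptions, not trained):

```python
import numpy as np

# Toy embedding table; real tables come from word2vec / GloVe training.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.5, 0.0]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' by maximizing cos(b - a + c, w),
    excluding the query words, as in the word2vec analogy evaluation."""
    target = emb[b] - emb[a] + emb[c]
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))
```

Levy & Goldberg show this additive behavior is not specific to neural training; it also emerges from sparse, explicit PMI-based representations.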

Auto-encoders
Hinton, Salakhutdinov. Reducing the dimensionality of data with neural networks.
Blei, Kucukelbir, McAuliffe. Variational inference: A review for statisticians.
Kingma, Welling. Auto-encoding variational Bayes.
Dinh, Krueger, Bengio. NICE: Non-linear independent components estimation.
Arora, Bhaskara, Ge, Ma. Provable bounds for learning some deep representations.
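Two ingredients from Kingma & Welling's variational auto-encoder recur throughout this section: the reparameterization trick, which turns sampling into a differentiable operation, and the closed-form KL regularizer against a standard normal prior. A minimal numpy sketch (function names are mine):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I),
    so gradients flow through mu and log_var (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) per example, the
    regularization term in the VAE's evidence lower bound."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

The KL term is zero exactly when the encoder outputs the prior (mu = 0, log_var = 0), and grows as the approximate posterior drifts from it.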

GANs
Goodfellow et al. Generative adversarial nets.
Radford, Metz, Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks.
Lucic, Kurach, Michalski, Gelly, Bousquet. Are GANs created equal? A large-scale study.
Richardson, Weiss. On GANs and GMMs.
Arora, Ge, Liang, Ma, Zhang. Generalization and equilibrium in generative adversarial nets.
Liu, Bousquet, Chaudhuri. Approximation and convergence properties of generative adversarial learning.
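The papers above all start from the minimax objective of Goodfellow et al.: the discriminator maximizes log D(x) + log(1 - D(G(z))), while the generator in practice minimizes the non-saturating loss -log D(G(z)) rather than log(1 - D(G(z))). A sketch of just those two losses, given sigmoid discriminator outputs (function names are mine):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """-[log D(x) + log(1 - D(G(z)))], averaged over a batch.
    d_real / d_fake are discriminator outputs in (0, 1)."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss_nonsaturating(d_fake):
    """-log D(G(z)): the non-saturating generator loss the original
    paper recommends to keep gradients alive early in training."""
    return -np.mean(np.log(d_fake))
```

At the discriminator's equilibrium output of 0.5 everywhere, the discriminator loss equals 2 log 2, the value Goodfellow et al. derive at the saddle point.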

Self-supervised learning
Ando, Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data.
Jing, Tian. Self-supervised visual feature learning with deep neural networks: A survey.
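The survey by Jing & Tian covers pretext tasks that manufacture labels from unlabeled data. One widely used example is rotation prediction: rotate each image by a random multiple of 90 degrees and train a classifier to predict the rotation index. A sketch of the label-generation step only (the function name is mine; the downstream classifier is omitted):

```python
import numpy as np

def make_rotation_pretext(images, rng):
    """Turn unlabeled images into a supervised task: rotate each image
    by k * 90 degrees for random k in {0, 1, 2, 3} and use k as the label."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels
```

The appeal, as the survey emphasizes, is that the labels are free: solving the pretext task forces the network to learn features (object orientation, shape) that transfer to real downstream tasks.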