COGS 200, Spring 2012
Back Propagation, 25 years later

UNIVERSITY OF CALIFORNIA, SAN DIEGO

(Section ID: 744768)


Fridays 3:00-4:30 PM, Cognitive Science Building 003
Student section: Fridays 2:00-2:50 PM, Cognitive Science Building 003
Course mailing list: cs200@cogsci.ucsd.edu
[how to subscribe/unsubscribe]

Professor Gary Cottrell


Course Description:

It has been roughly 25 years since the publication of "Learning representations by back-propagating errors" by Rumelhart, Hinton and Williams in the journal Nature in 1986, and the corresponding Chapter 8 of the Parallel Distributed Processing books. This seems like a good time to revisit it. Why did this paper cause a paradigmatic earthquake in the AI and Cognitive Science communities? What has happened since? What are the latest directions in neural network research employing back propagation and similar algorithms? What's all this fuss about "deep networks"?

As I wrote in my 2006 commentary, "New Life for Neural Networks," on Hinton and Salakhutdinov's paper in Science, there are now good reasons to be interested in neural networks again. They are achieving state-of-the-art results in a number of domains in machine learning and computer vision. New approaches have allowed the construction and training of networks with many layers of processing - as we believe occurs in the brain. Biologically plausible algorithms - some of which nearly exactly mirror back propagation, others with only a family resemblance - have muted complaints about back propagation's biological implausibility.

In this series of talks, we will hear from some of the researchers who are pushing the boundaries of what can be accomplished with neural networks. In the first talk on Friday, I will give a gentle introduction to back propagation, why it was important, and why it is still important. In the following weeks, we will hear from researchers across the country who will bring us up to date on the most recent advances.
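
For anyone who would like to see the algorithm in miniature before the first talk, here is a small illustrative sketch: a tiny two-layer network trained on XOR by gradient descent, with the error derivatives propagated backward through the layers by the chain rule. This is a toy example written for this page, not code from any of the speakers; the network size, learning rate, and variable names are arbitrary illustrative choices.

    # A minimal back propagation sketch: a 2-3-1 network learning XOR.
    # Toy example only; the architecture and learning rate are arbitrary choices.
    import numpy as np

    np.random.seed(0)

    # Training set: the four XOR patterns and their targets.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def add_bias(a):
        # Append a constant 1 to each pattern; the last weight row then acts as a bias.
        return np.hstack([a, np.ones((a.shape[0], 1))])

    # Small random initial weights; the extra row in each matrix holds the bias weights.
    W1 = 0.5 * np.random.randn(3, 3)   # input (+bias) -> 3 hidden units
    W2 = 0.5 * np.random.randn(4, 1)   # hidden (+bias) -> 1 output unit

    lr = 0.5
    for epoch in range(20000):
        # Forward pass
        h = sigmoid(add_bias(X) @ W1)        # hidden activations
        out = sigmoid(add_bias(h) @ W2)      # network output

        # Backward pass: apply the chain rule layer by layer.
        delta_out = (out - y) * out * (1.0 - out)           # error signal at the output unit
        delta_h = (delta_out @ W2[:-1].T) * h * (1.0 - h)   # error signal at the hidden units

        # Gradient-descent weight updates
        W2 -= lr * add_bias(h).T @ delta_out
        W1 -= lr * add_bias(X).T @ delta_h

    print(np.round(out, 2))   # typically ends up close to [0, 1, 1, 0]

Everything here is plain batch gradient descent on the summed squared error; real applications add many refinements (momentum, minibatches, different nonlinearities and cost functions) that go beyond this sketch.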

Current schedule:
 

DATE  SPEAKER AND AFFILIATION
TITLE (if an abstract is available, click on the title)
READING (you should read this before the lecture)
April 13 Gary Cottrell
UCSD 
Introduction: Why was backpropagation important, and why is it still important? [ppt]
Here is a video of Dave Rumelhart giving a very similar lecture.

  • Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986) Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D.E. Rumelhart, J.L. McClelland, and the PDP Research Group. pdf
  • Tong, M.H., Joyce, C.A., and Cottrell, G.W. (2008) Why is the fusiform face area recruited for novel categories of expertise? A neurocomputational investigation. Brain Research 1202:14-24. pdf

April 20 Andrew Ng
Stanford
Machine learning and AI via large scale brain simulations.
  • Olshausen, B.A. & Field, D.J. (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607-609. pdf
  • Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng (2007) Self-taught Learning: Transfer Learning from Unlabeled Data. In International Conference on Machine Learning. pdf
  • PLUS!: Andrew's tutorial on deep networks, along with relevant papers, is here.
April 27  Yoshua Bengio
U. Montreal
From Deep Learning to Cultural Evolution
  • Yoshua Bengio (2012) Evolving Culture vs Local Minima. arXiv:1203.2990v1 [cs.LG] 14 Mar 2012 pdf
  • Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent,  Samy Bengio (2010) Why Does Unsupervised Pre-training Help Deep Learning? Journal of Machine Learning Research 11:625-660 pdf

May  04 Yann LeCun
Courant Institute, NYU
Learning Representations for Perception
Yann sent a link to a website with many, many papers. I picked two:
  • Yann LeCun, Koray Kavukcuoglu and Clément Farabet: Convolutional Networks and Applications in Vision, Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010 pdf
  • Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato and Yann LeCun: What is the Best Multi-Stage Architecture for Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009 pdf
And for your reference, here is a much longer paper, with theorems, that Yann calls the "deep learning manifesto":
  • Yoshua Bengio and Yann LeCun: Scaling learning algorithms towards AI, in Bottou, L. and Chapelle, O. and DeCoste, D. and Weston, J. (Eds), Large-Scale Kernel Machines, MIT Press, 2007. pdf
And here is the link to all of his papers:
  • http://yann.lecun.com/exdb/publis/index.html#lecun-iscas-10
May  11 Ruslan Salakhutdinov
University of Toronto 
Learning Hierarchical Models
  • Ruslan Salakhutdinov, Josh Tenenbaum & Antonio Torralba (2012) Learning to Learn with Compound Hierarchical-Deep Models.  Neural Information Processing Systems (NIPS 25). pdf
  • Ruslan Salakhutdinov and Geoffrey Hinton (2009) Deep Boltzmann Machines. In 12th International Conference on Artificial Intelligence and Statistics. pdf
May  18  Graham Taylor
NYU
Learning Representations of Sequences
  • Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis (2011) Two Distributed-State Models For Generating High-Dimensional Time Series. Journal of Machine Learning Research 12:1025-1068 pdf
May  25 Geoff Hinton
University of Toronto 
Does the brain do inverse graphics? [pdf]
  • Hinton, G. E., Krizhevsky, A. and Wang, S. (2011) Transforming Auto-encoders. ICANN-11: International Conference on Artificial Neural Networks, Helsinki. [pdf]
June 01 Randy O'Reilly
CU Boulder
The biological basis of multilayer error-driven learning. [ppt]
  • O'Reilly, R.C., Munakata, Y., Frank, M.J., Hazy, T.E., and Contributors (2012) Learning. In Computational Cognitive Neuroscience, Wiki Book, 1st Edition. URL: http://ccnbook.colorado.edu
  • O'Reilly, R.C. (1996). Biologically Plausible Error-driven Learning using Local Activation Differences: The Generalized Recirculation Algorithm. Neural Computation, 8:895-938.   pdf
June 08 Gary Cottrell
UCSD
A hierarchical model of early visual cortex.
  • Shan, H., Zhang, L., and Cottrell, G.W. (2007) Recursive ICA. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA. [pdf]
  • Shan, H. and Cottrell, G.W. (2008) Looking around the back yard helps the recognition of faces and digits. In Computer Vision and Pattern Recognition (CVPR 2008). [pdf]



Course goal: Students will develop an appreciation for back propagation and more recent developments in neural networks. They will know where to look to learn more.

Prerequisites: Familiarity with mathematical concepts and notation (e.g., linear algebra, vector calculus, probability and statistics) will be helpful for a complete appreciation of the presentations, but we will try to keep the equations to a minimum.

Evaluation: Evaluation is based on two components:
1) Students are expected to read the papers assigned for each lecture, and be able to extract the main message. We will discuss the papers before the lecture, where I will ask the question, "What is the point of this paper?" several times, and then we will get into the details to the extent possible given the limited time. I use the dreaded "index card" method, where I put every student's name on an index card, shuffle them, and then ask these questions in order of the cards.
2) A final paper of about 5-10 pages that suggests a new application of these techniques to a problem of your choice (e.g., in cognitive modeling), a new extension of these techniques, or a critique of one of the approaches described by one of the speakers.

This course should be taken for S/U grade only.
If your department requires a letter grade, or you need one for some other reason, please let me know.

The course instructor is Professor Gary Cottrell, whose office is CSE Building room 4130.  Feel free to send email to arrange an appointment, or telephone (858) 534-6640.


Most recently updated on April 16th, 2012 by Gary Cottrell, gary@ucsd.edu