Notes on "Deep learning for image and video processing tutorial"

By Jon Shlens and George Toderici from Google Research @ 2017-01-20 Fri

  • History

    • Convolutional NN: old tech, so why does it suddenly work?

      • Scale: 60M parameters
        • At least 60M + 1 data points are needed to fit these parameters

        • SIMD hardware (GPU)

    • Domain transfer

      • Reuse a CNN trained on a large data set for other applications that have only a limited data set
  • CNN (convolutional neural network)

    • Toy model of a neuron

      • Weighted sum of the inputs, passed through a nonlinear activation function to produce the output

      • Bears very little relation to a real neuron
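
The toy neuron above can be sketched in a few lines. This is only an illustration: the names `sigmoid` and `neuron`, the bias term, and the choice of a logistic activation are assumptions, not something taken from the tutorial.

```python
import math

def sigmoid(z):
    """Logistic nonlinearity, one common choice of activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Toy neuron: weighted sum of the inputs, then a nonlinear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, so the sigmoid returns 0.5.
out = neuron([1.0, 2.0], [0.5, -0.25])
```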

    • Softmax classifier

    • Learning: cross-entropy loss (cost function)

      • Gradient descent with back-propagation

      • Optimization is HIGHLY non-convex, and works on O(1M) dimensions
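
As a rough sketch of the softmax classifier and cross-entropy loss above, the snippet below takes a single gradient-descent step directly on the logits. The function names, the example logits, and the learning rate are all assumptions made for illustration; in a real network, back-propagation would carry this gradient further back to the weights.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])

def grad_wrt_logits(probs, label):
    """Gradient of the loss with respect to the logits: probs - one_hot(label)."""
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
label = 0                 # index of the correct class
loss_before = cross_entropy(softmax(logits), label)

# One gradient-descent step on the logits (the learning rate is arbitrary).
lr = 0.5
grad = grad_wrt_logits(softmax(logits), label)
logits = [v - lr * g for v, g in zip(logits, grad)]
loss_after = cross_entropy(softmax(logits), label)
```

After the step, `loss_after` is smaller than `loss_before`, which is all that gradient descent promises locally.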

    • Baseline task: MNIST

      • Handwritten digit recognition

      • Problem: the input size blows up as images gain more pixels

    • CNN arch fundamentals

      • Pre-chosen parameters (hyperparameters)
        • Size of the filter

        • Stride: step size while sliding, which sets how much neighboring windows overlap

        • Padding: what to do at the edge

        • Input depth / output depth (< 1024)

      • Learned by the system
        • The parameters (weights) inside the filter
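
To make the split between chosen and learned parameters concrete, here is a small sketch that computes a convolution's output size and its number of learned weights. It assumes square filters, zero padding, and the usual floor-division convention; the function names and the example layer are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size along one dimension of a convolution."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def conv_num_params(filter_size, in_depth, out_depth, bias=True):
    """Learned values: one filter_size x filter_size x in_depth filter per output channel."""
    per_filter = filter_size * filter_size * in_depth
    return out_depth * (per_filter + (1 if bias else 0))

# Hypothetical layer: 28x28 input, 5x5 filters, stride 1, padding 2, depth 1 -> 32.
size = conv_output_size(28, 5, stride=1, padding=2)    # 28: padding 2 preserves the size
params = conv_num_params(5, in_depth=1, out_depth=32)  # 32 * (25 + 1) = 832
```

Note that the parameter count does not depend on the image size, only on the filter size and the depths.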

  • Advances in network arch

    • Types

      • AlexNet

      • Inception

      • BN-Inception

      • ResNet

    • 2 parts

      • Convolutional
        • Fewer parameters, more computationally intensive

      • Fully connected
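
A back-of-the-envelope comparison shows why the convolutional part has fewer parameters but costs more computation. The layer sizes below are made up for illustration, biases are ignored, and the function names are assumptions.

```python
def fully_connected_cost(n_in, n_out):
    """(parameters, multiply-adds) for a dense layer, ignoring biases."""
    params = n_in * n_out
    return params, params  # each weight is used exactly once per input

def conv_cost(out_h, out_w, filter_size, in_depth, out_depth):
    """(parameters, multiply-adds) for a conv layer, ignoring biases."""
    params = filter_size * filter_size * in_depth * out_depth
    mult_adds = params * out_h * out_w  # weights are reused at every output position
    return params, mult_adds

# Hypothetical sizes chosen only for illustration.
fc_params, fc_ops = fully_connected_cost(4096, 4096)  # 16,777,216 and 16,777,216
conv_params, conv_ops = conv_cost(56, 56, 3, 64, 64)  # 36,864 and 115,605,504
```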

    • Themes in inception

      • Network-in-network
        • Dimension reduction

      • Multi-scale

    • Covariate shift

      • Batch normalization (BN)
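
A minimal sketch of the batch-normalization forward pass for a single feature, assuming training-time batch statistics only. In practice `gamma` and `beta` are learned and running averages are kept for inference; neither is shown here, and the function name is hypothetical.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# The output has mean close to 0 and variance close to 1.
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```
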
  • Image embedding and captioning

    • Embedding vs classification

      • Embedding: out-of-box information
    • Language model

      • Not words, but a sequence of words
    • RNN (recurrent neural network)

      • State is a function of previous state and inputs

      • Training

        • Unrolling trick

      • Long short-term memory (LSTM)
        • Popular RNN variant
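
The recurrence and the unrolling trick can be sketched with a scalar state. This is a simplification: the `tanh` nonlinearity, the weight values, and the function names are assumptions, and a real RNN or LSTM uses vectors and matrices. Unrolling applies the same step once per time step, which turns the recurrence into a feed-forward chain that back-propagation can handle.

```python
import math

def rnn_step(state, x, w_state, w_input, bias=0.0):
    """New state as a function of the previous state and the current input."""
    return math.tanh(w_state * state + w_input * x + bias)

def unroll(inputs, initial_state=0.0, w_state=0.5, w_input=1.0):
    """Apply the same step, with shared weights, once per time step."""
    states = []
    state = initial_state
    for x in inputs:
        state = rnn_step(state, x, w_state, w_input)
        states.append(state)
    return states

states = unroll([1.0, 0.0, -1.0])  # one state per input in the sequence
```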

  • Predicting in Pixel Space

    • Autoencoder: dimension reduction tool

      • Convolution encoder and decoder

      • Deconvolution: also called up-convolution

  • Video

    • Hard

      • Computationally intensive

      • Dataset is hard