Deep learning with GPUs in production
Start-up: Python -> enterprise: C/Java/Scala (more engineers, faster)
Research: quick results and prototyping
GPU? Data movement between GPU and CPU is important
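A back-of-the-envelope sketch of why data movement matters: compare the time to move data over PCIe against the time to compute on it. The bandwidth and throughput numbers below are assumptions (roughly PCIe 3.0 x16 and a mid-range GPU), not measurements.

```python
# Rough check: is a GPU job dominated by PCIe transfer or by compute?
# Bandwidth/throughput figures are assumed, not measured.
PCIE_BANDWIDTH = 12e9    # bytes/s, effective PCIe 3.0 x16 (assumption)
GPU_THROUGHPUT = 10e12   # FLOP/s (assumption)

def transfer_bound(bytes_moved: float, flops: float) -> bool:
    """True if moving data to/from the GPU takes longer than the compute."""
    transfer_time = bytes_moved / PCIE_BANDWIDTH
    compute_time = flops / GPU_THROUGHPUT
    return transfer_time > compute_time

# Example: multiplying two 1024x1024 float32 matrices
n = 1024
bytes_moved = 3 * n * n * 4   # two inputs + one output, 4 bytes each
flops = 2 * n ** 3            # multiply-adds in a dense matmul
print(transfer_bound(bytes_moved, flops))
```

For a single 1024x1024 matmul the transfer dominates, which is why batching work on the GPU and avoiding round-trips to the CPU pays off.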
[ ] fast.ai: course (only high school math required)
infrastructure: Spark/Flink, scheduler problems, distributed file system
Problems to think about when running jobs on GPU clusters:
- GPU memory is relatively small
- throughput: jobs are more than matrix math
- resource provisioning: how many resources do we need? GPU/CPU/RAM, GPU allocation per job
- Python <-> Java overhead defeats the point of GPUs
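For the provisioning question, a minimal sketch of estimating GPU memory per training job. The formula (weights + gradients + optimizer state + activations) and the example numbers are assumptions for illustration; real jobs also need framework workspace memory, so treat this as a lower bound.

```python
def gpu_memory_gb(n_params: int, batch_size: int,
                  activation_floats_per_sample: int,
                  bytes_per_float: int = 4,
                  optimizer_copies: int = 2) -> float:
    """Rough lower bound on GPU memory for one training job.

    Counts weights, gradients, optimizer state (e.g. Adam keeps
    ~2 extra copies per parameter), and activations, which scale
    with batch size.
    """
    weights = n_params * bytes_per_float
    gradients = n_params * bytes_per_float
    optimizer = optimizer_copies * n_params * bytes_per_float
    activations = batch_size * activation_floats_per_sample * bytes_per_float
    return (weights + gradients + optimizer + activations) / 1e9

# Example (assumed sizes): 100M-parameter model, batch 32,
# ~5M activation floats per sample
print(round(gpu_memory_gb(100_000_000, 32, 5_000_000), 1))
```

Estimates like this help decide GPU allocation per job before submitting to the cluster, instead of discovering out-of-memory failures at runtime.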