This is my personal list of resources on machine learning systems. Feel free to drop me an email if you think something is worth adding. I will try to update this page frequently with the most recent work in MLSys.
- Deep Learning Systems: Algorithms and Implementation
- Machine Learning Compilation: offered by Tianqi Chen; an introduction to ML compilers. Open to all.
- 15-884: Machine Learning Systems: offered by Tianqi Chen at CMU.
- CSE 291F: Advanced Data Analytics and ML Systems: offered by Arun Kumar at UCSD.
- Tinyflow: tutorial code on how to build your own Deep Learning System in 2k Lines.
- CSE 599W: Systems for ML: offered by Tianqi Chen at UW.
- CS8803-SMR: Special Topics: Systems for Machine Learning: offered by Alexey Tumanov at Georgia Tech. [schedule]
- CS 744: Big Data Systems: formerly offered by Aditya Akella at UW-Madison.
- CS 329S: Machine Learning Systems Design: offered by Stanford.
- EECS 598: Systems for AI: offered by Mosharaf Chowdhury at UMich.
- CS 378: Big Data Systems: offered by Aditya Akella at UT.
- Machine Learning Systems (Fall 2019): offered at UC Berkeley.
Labs & Faculty
- CMU Catalyst
- Berkeley RISE Lab
- MIT DASIL Lab
- UW SAMPL
- Shivaram Venkataraman at UW-Madison
- UTNS Lab
- Neeraja Yadwadkar at UT Austin
- SAIL: Systems for Artificial Intelligence Lab @ GT
- Amar Phanishayee at MSR
- Dive into Deep Learning: interactive deep learning book with code, math, and discussions.
- CS231n: Convolutional Neural Networks for Visual Recognition
- PyTorch internals: a must-read for PyTorch basics.
- Differential Programming with JAX
- Getting started with JAX (MLPs, CNNs & RNNs)
- Physics-based Deep Learning
- Dive into Deep Learning Compiler
- Autodidax: JAX core from scratch: an excellent resource for learning JAX internals.
- Extending JAX with custom C++ and CUDA code
- Vector, Matrix, and Tensor Derivatives
- ML Memory Optimization: slides from UW. Visualizing the dataflow graph helps in understanding how to optimize memory.
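The memory-vs-compute trade-off those slides visualize can be illustrated with a toy model of gradient checkpointing over a linear chain of layers (my own sketch, not taken from the slides; each activation is assumed to cost one unit of memory):

```python
import math

def peak_memory_no_checkpoint(n: int) -> int:
    # Plain backprop keeps every forward activation alive at once.
    return n

def peak_memory_checkpoint(n: int) -> int:
    # Classic sqrt(n) schedule: keep ~sqrt(n) checkpoints and
    # recompute one segment (~sqrt(n) activations) at a time.
    k = max(1, round(math.sqrt(n)))
    return k + math.ceil(n / k)

for n in (4, 16, 64, 256):
    print(n, peak_memory_no_checkpoint(n), peak_memory_checkpoint(n))
# peak memory drops from O(n) to O(sqrt(n)) at the cost of recomputation
```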
Posts & Blogs
- How to Load PyTorch Models 340 Times Faster with Ray
- 金雪锋 (Jin Xuefeng): technical lead of MindSpore.
- Why are ML Compilers so Hard?
- Alpa: Automated Model-Parallel Deep Learning
- Data Transfer Speed Comparison: Ray Plasma Store vs. S3
ML Compilers
This section could easily grow very long.
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
- The OoO VLIW JIT Compiler for GPU Inference: a JIT compiler that enables better GPU multiplexing.
- LazyTensor: combining eager execution with domain-specific compilers: combines dynamic graphs with JIT compilation. [summary]
- Automatic Generation of High-Performance Quantized Machine Learning Kernels
- DietCode: Automatic Optimization for Dynamic Tensor Programs
- The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
- The Deep Learning Compiler: A Comprehensive Survey
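A recurring theme in the papers above is operator fusion. As a rough illustration (names and structure are my own, not any compiler's actual API), here is a tiny "pass" that fuses maximal runs of elementwise ops in a straight-line op list, the way ML compilers fuse e.g. add + relu into one kernel to avoid extra memory round trips:

```python
# Ops we treat as fusible elementwise operations (illustrative set).
ELEMENTWISE = {"add", "mul", "relu", "exp"}

def fuse_elementwise(ops):
    """Group maximal runs of elementwise ops into single fused 'kernels'."""
    fused, run = [], []
    for op in ops:
        if op in ELEMENTWISE:
            run.append(op)
        else:
            if run:  # flush the pending elementwise run as one fused kernel
                fused.append("fused(" + "+".join(run) + ")")
                run = []
            fused.append(op)
    if run:
        fused.append("fused(" + "+".join(run) + ")")
    return fused

print(fuse_elementwise(["matmul", "add", "relu", "matmul", "exp"]))
# → ['matmul', 'fused(add+relu)', 'matmul', 'fused(exp)']
```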
Dynamic Neural Networks
- Dynamic Neural Networks: A Survey
- Dynamic Multimodal Fusion
- Hash Layers For Large Sparse Models: Using hashing for MoE gating.
- An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems
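The hash-gating idea from "Hash Layers" can be sketched in a few lines: each token id is mapped to a fixed expert by a hash, so no gating network is learned. The details below (expert count, hash choice) are illustrative, not the paper's exact setup:

```python
import hashlib

NUM_EXPERTS = 4  # illustrative; not the paper's configuration

def route(token_id: int) -> int:
    # Deterministic hash -> expert index; stable across runs
    # (unlike Python's built-in hash() on strings, which is salted).
    digest = hashlib.md5(str(token_id).encode()).digest()
    return digest[0] % NUM_EXPERTS

tokens = [17, 42, 17, 99]
print([route(t) for t in tokens])  # the same token id always hits the same expert
```

Because routing is fixed, load balancing depends only on the token distribution, which is part of why the paper's analysis centers on hash quality rather than gate training.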
Switch & ML
- Do Switches Dream of Machine Learning? Toward In-Network Classification
- Taurus: A Data Plane Architecture for Per-Packet ML
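The core idea in this line of work is that small models map onto switch match-action tables. A hypothetical sketch (thresholds and labels are made up) of flattening a tiny decision tree over one packet feature into prioritized range-match rules, checked in order like TCAM priorities:

```python
# Each rule is (match predicate over packet length, action label).
RULES = [
    (lambda pkt_len: pkt_len <= 100, "control"),   # small packets
    (lambda pkt_len: pkt_len <= 1200, "web"),      # medium packets
    (lambda pkt_len: True, "bulk"),                # default action
]

def classify(pkt_len: int) -> str:
    # First matching rule wins, mimicking match-action table priority.
    for match, action in RULES:
        if match(pkt_len):
            return action
    return "bulk"

print([classify(n) for n in (64, 512, 1500)])  # → ['control', 'web', 'bulk']
```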