SHEPHERD: Serving DNNs in the Wild

Paper link: SHEPHERD: Serving DNNs in the Wild

Achieving scalability, high system goodput and maximize resource utilization, at the same time is hard for an inference system.

while individual request streams can be highly unpredictable, aggregating request streams into moderately-sized groups greatly improves predictability, permitting high resource utilization as well as scalability

SHEPHERD’s main observation is