Doctoral Thesis

Huaifeng Zhang, Computer and Network Systems

Debloating Machine Learning Systems

Abstract

In recent decades, Machine Learning (ML) has rapidly evolved from an academic pursuit into a cornerstone of modern industry, with applications spanning manufacturing, healthcare, finance, and transportation. The emergence of large language models (LLMs) has further accelerated this growth, driving unprecedented demand for ML systems capable of supporting models of widely varying scales and data modalities.

Despite the growing importance of ML systems, the problem of software bloat within them remains underexplored. Software bloat refers to unnecessary code, files, or dependencies; it degrades performance, increases resource consumption, and introduces security risks. The rise of containerization as a common deployment method for ML exacerbates the problem: containers must package not only the application itself but also all of its libraries and dependencies, leading to significant overhead.

This thesis investigates bloat in ML systems across multiple layers, from ML containers down to ML shared libraries, and introduces novel techniques to measure, analyze, and mitigate it. The main contributions are:

(A) MMLB: a framework for measuring bloat in ML systems that quantifies container-level bloat and identifies its causes, showing that ML containers are substantially more bloated than general-purpose containers and that ML shared libraries are the major source of this bloat (the measurement idea is sketched below).

(B) BLAFS: a bloat-aware file system that efficiently and effectively reduces file bloat in ML containers by detecting and removing unused files (a stand-in for the detection step is sketched below).

(C) RTrace: a tracer that accurately identifies which functions in a shared library are executed, improving the visibility of shared-library execution and enabling precise detection of host-code bloat (the underlying address-to-function mapping is sketched below).

(D) MERGESHUFFLE: a tool that removes unused code from shared libraries while preserving functionality and improving performance and security.

(E) Negative-ML: this work reveals that ML shared libraries differ from generic libraries: while the latter contain only host code, ML shared libraries include both host code and device code, the latter targeting GPUs and contributing significantly to library size (see the final sketch below). Negative-ML presents a holistic debloating approach that targets both host and device code, representing the first systematic investigation of device-code bloat.

This thesis thus offers both a systematic understanding of software bloat in ML systems and practical techniques to mitigate it, contributing to more efficient, secure, and sustainable deployment of ML in real-world environments.
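Illustrative sketches

The sketches below gesture at the ideas behind several contributions. They are simplified stand-ins under stated assumptions, not the thesis's actual implementations, and the function names are hypothetical. For contribution (A), one simple container-level measurement is the fraction of an unpacked image occupied by files the workload never touches, assuming a trace of accessed paths is already available:

    import os

    def file_bloat_degree(image_root: str, used_files: set[str]) -> float:
        """Fraction of an unpacked container image occupied by files the
        traced workload never accessed. `used_files` holds absolute paths
        recorded while profiling the application (hypothetical input).
        """
        total = unused = 0
        for dirpath, _, filenames in os.walk(image_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks so shared targets are not double-counted.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
                total += size
                if path not in used_files:
                    unused += size
        return unused / total if total else 0.0

A value close to 1.0 indicates a heavily bloated image; MMLB's actual metrics and causal analysis are, of course, the thesis's own.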
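For contribution (B), the set of used files must first be collected by observing the workload. As a stand-in for BLAFS's filesystem-level tracking, this sketch parses a log produced by `strace -f -e trace=file` (the regex covers only a few common file syscalls and does not filter failed calls; it is an assumption, not BLAFS's mechanism):

    import re

    # Matches the quoted path argument of common file-access syscalls in
    # strace output, e.g.:
    #   openat(AT_FDCWD, "/usr/lib/libfoo.so.1", O_RDONLY|O_CLOEXEC) = 3
    _ACCESS = re.compile(r'\b(?:openat|open|stat|execve)\(.*?"([^"]+)"')

    def used_files_from_strace(log_path: str) -> set[str]:
        """Collect every path a traced workload touched, from an
        `strace -f -e trace=file` log (simplified stand-in profiler).
        """
        used = set()
        with open(log_path) as log:
            for line in log:
                match = _ACCESS.search(line)
                if match:
                    used.add(match.group(1))
        return used

Every file in the image absent from the resulting set is a candidate for removal; BLAFS itself detects and prunes unused files at the file-system layer rather than by post-processing syscall traces.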
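For contribution (C), once instruction addresses observed at run time are known (e.g., library-relative offsets from a sampling profiler such as perf), they can be mapped back to functions through the library's symbol table. This sketch shows only that mapping step; sampling can miss short-lived functions, which is precisely the accuracy gap a dedicated tracer like RTrace is designed to close:

    import bisect
    import subprocess

    def function_ranges(library: str) -> list[tuple[int, int, str]]:
        """Parse `nm -S --defined-only` output into sorted
        (start, size, name) triples for code (text-section) symbols."""
        out = subprocess.run(
            ["nm", "-S", "--defined-only", library],
            capture_output=True, text=True, check=True,
        ).stdout
        ranges = []
        for line in out.splitlines():
            parts = line.split()
            # Lines with a size field look like: <addr> <size> <type> <name>
            if len(parts) == 4 and parts[2].lower() == "t":
                addr, size, _, name = parts
                ranges.append((int(addr, 16), int(size, 16), name))
        return sorted(ranges)

    def executed_functions(library: str, hit_offsets: set[int]) -> set[str]:
        """Map observed instruction offsets to the functions containing them."""
        ranges = function_ranges(library)
        starts = [start for start, _, _ in ranges]
        hits = set()
        for off in hit_offsets:
            i = bisect.bisect_right(starts, off) - 1
            if i >= 0:
                start, size, name = ranges[i]
                if start <= off < start + size:
                    hits.add(name)
        return hits

Functions never hit across representative workloads are candidates for host-code debloating of the kind MERGESHUFFLE performs.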
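For contribution (E), the split between host and device code in an ML shared library can be made visible by inspecting its ELF sections: CUDA toolchains embed GPU device code in dedicated fat-binary sections. The section names below are those commonly emitted by nvcc, and the pyelftools dependency is an assumption of this sketch:

    import os

    from elftools.elf.elffile import ELFFile

    # Sections where nvcc-linked binaries embed GPU device code.
    DEVICE_SECTIONS = (".nv_fatbin", "__nv_relfatbin", ".nvFatBinSegment")

    def device_code_fraction(library_path: str) -> float:
        """Fraction of a shared library's bytes taken by embedded GPU
        device code (CUDA fat binaries) rather than host machine code."""
        device = 0
        with open(library_path, "rb") as f:
            elf = ELFFile(f)
            for name in DEVICE_SECTIONS:
                section = elf.get_section_by_name(name)
                if section is not None:
                    device += section["sh_size"]
        total = os.path.getsize(library_path)
        return device / total if total else 0.0

Applied to a CUDA build of an ML library such as PyTorch's libtorch_cuda.so, this fraction tends to be large because fat binaries bundle device code compiled for multiple GPU architectures, consistent with the thesis's finding that device code contributes significantly to ML library size and motivating Negative-ML's holistic treatment of both code kinds.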