CAMAL: Optimizing LSM-trees via Active Learning

Abstract

We use machine learning to optimize the LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach, Camal, which has the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree-based key-value stores. The learning process is coupled with traditional cost models to improve training; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts an active learning paradigm based on decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model as the data size grows; (4) Dynamic Mode: Camal can tune the LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: by integrating Camal into a full system, RocksDB, system performance improves by 30% on average and by up to 9x compared to a state-of-the-art RocksDB design.
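To make the idea of decoupled tuning in feature (2) concrete, here is a minimal sketch. It is not Camal's implementation. Everything in it is an assumption made for illustration: `toy_cost` is a hypothetical, textbook-style LSM cost formula with made-up constants, and a plain coordinate-wise search stands in for the learned model and the active sample selection. It only shows why tuning one parameter at a time needs fewer measurements than searching all parameters jointly.

```python
import math

def toy_cost(size_ratio, buffer_frac, read_frac,
             n_entries=1e8, entry_bytes=128,
             total_mem_bytes=256e6, page_entries=32):
    """Hypothetical cost per operation (illustrative only, not the paper's model)."""
    # Memory is split between the write buffer and the Bloom filters.
    buffer_bytes = buffer_frac * total_mem_bytes
    filter_bits_per_key = (1 - buffer_frac) * total_mem_bytes * 8 / n_entries
    # Number of levels grows logarithmically with data size over buffer size.
    levels = max(1.0, math.log(n_entries * entry_bytes / buffer_bytes, size_ratio))
    # Standard Bloom filter false-positive approximation.
    fpr = math.exp(-filter_bits_per_key * math.log(2) ** 2)
    read_cost = 1 + levels * fpr                    # lookup: one hit plus false positives
    write_cost = size_ratio * levels / page_entries  # amortized merge cost under leveling
    return read_frac * read_cost + (1 - read_frac) * write_cost

def tune_decoupled(measure, read_frac, ratios, buffer_fracs,
                   start=(10, 0.5), rounds=2):
    """Tune one parameter at a time while holding the other fixed.

    `measure` stands in for running a workload sample on the real system.
    Returns the chosen (size_ratio, buffer_frac) and the number of measurements.
    """
    ratio, frac = start
    samples = 0
    for _ in range(rounds):
        # Step 1: vary only the size ratio.
        ratio = min(ratios, key=lambda t: measure(t, frac, read_frac))
        samples += len(ratios)
        # Step 2: vary only the memory split.
        frac = min(buffer_fracs, key=lambda f: measure(ratio, f, read_frac))
        samples += len(buffer_fracs)
    return ratio, frac, samples

if __name__ == "__main__":
    ratios = list(range(2, 21))
    fracs = [i / 10 for i in range(1, 10)]
    ratio, frac, samples = tune_decoupled(toy_cost, 0.5, ratios, fracs)
    print(ratio, frac, samples, "vs joint grid:", len(ratios) * len(fracs))
```

In this sketch each round costs `len(ratios) + len(buffer_fracs)` measurements, whereas a joint grid over both parameters costs `len(ratios) * len(buffer_fracs)`. The gap widens as more parameters are added, which is the intuition behind tuning parameters separately.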

Publication
In Proceedings of SIGMOD Conference 2025
Yu Zihao
Ph.D. Student