Database Systems and Information Management

DIMA Researchers at SIGMOD 2022 in Philadelphia, Pennsylvania

Researchers in the Database Systems and Information Management (DIMA) Group at TU Berlin presented four research papers at the International Conference on Management of Data (SIGMOD 2022), held in Philadelphia, Pennsylvania, from June 12 to 17, 2022. They also gave talks at Harvard University, MIT, Boston University, and Columbia University, as well as at Google, Microsoft, NVIDIA, and Oracle.
 

SIGMOD 2022

DIMA researchers contributed four research papers to SIGMOD 2022, the leading international conference on the management of data, which is ranked A* by CORE. Each paper aims to improve system performance, shortening processing time and accelerating the analysis of large-scale data across a wide range of applications. In addition, Prof. Volker Markl gave a keynote on “NebulaStream: Data Management for the Internet of Things” at BiDEDE 2022, the International Workshop on Big Data in Emergent Distributed Environments.

In their paper “Rethinking Stateful Stream Processing with RDMA,” Bonaventura Del Monte et al. focus on using high-speed networks to accelerate stream processing engines. The biggest challenges are the real-time constraints and state consistency guarantees. To address them, they propose Slash, a novel stream processing engine that uses high-speed networks and RDMA to execute distributed streaming computations efficiently. Slash embraces a processing model suited for RDMA acceleration and omits expensive data pre-partitioning. Overall, Slash improves throughput by up to two orders of magnitude over existing systems deployed on an InfiniBand network. Moreover, it is up to 22x faster than a self-developed solution that relies on RDMA-based data pre-partitioning to scale out query processing.

In the paper “Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects,” Clemens Lutz et al. propose a new join algorithm that takes advantage of fast interconnects to scale to large data volumes on modern GPUs. Fast interconnects, such as NVLink 2.0, are a new technology that connects GPUs to main memory at high bandwidth. By exploiting these interconnects, joins can spill their state to main memory, which allows GPU-enabled DBMSs to scale beyond GPU memory capacity.

In the paper “NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access,” Alexander Renz-Wieland et al. introduce NuPS, a novel parameter server architecture that (i) integrates multiple management techniques and employs a suitable technique for each parameter, and (ii) supports sampling directly via suitable primitives and schemes that allow for a controlled quality–efficiency trade-off. As a result, NuPS outperforms existing parameter servers by up to one order of magnitude and provides linear scalability across multiple machine learning tasks.
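
The idea of choosing a management technique per parameter can be illustrated with a minimal Python sketch. It is not the NuPS implementation: the function `assign_techniques`, the `hot_fraction` threshold, and the technique labels "replicate" and "relocate" are assumptions made for illustration only. The sketch counts how often each parameter key is accessed and assigns one technique to the most frequently accessed keys and another to the rest.

```python
from collections import Counter
from typing import Dict, Iterable


def assign_techniques(accesses: Iterable[int], hot_fraction: float = 0.01) -> Dict[int, str]:
    """Pick a management technique per parameter from observed access counts.

    Hypothetical heuristic: the most frequently accessed keys get one
    technique, all other keys get another. Labels are illustrative.
    """
    counts = Counter(accesses)
    # At least one key is always treated as "hot".
    n_hot = max(1, int(len(counts) * hot_fraction))
    hot = {key for key, _ in counts.most_common(n_hot)}
    return {key: ("replicate" if key in hot else "relocate") for key in counts}


# Skewed access trace: key 0 is accessed far more often than keys 1-4.
trace = [0] * 100 + [1, 2, 3, 4]
plan = assign_techniques(trace, hot_fraction=0.2)
```

In this trace, only key 0 falls into the hot set, so it is the only key assigned the first technique.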

In their paper “Materialization and Reuse Optimizations for Production Data Science Pipelines,” Behrouz Derakhshan et al. propose a system for optimizing the training of machine learning (ML) pipelines via materialization and reuse. The paper formulates the problem of materializing and reusing artifacts in ML pipelines and devises a unified cost model across different types of ML artifacts.
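
Materialization and reuse can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system or cost model from the paper: the class `ArtifactStore`, its `min_seconds_per_byte` threshold, and the rule "materialize when compute time per stored byte exceeds the threshold" are all hypothetical. The store keys each artifact by a hash of the step name and its inputs, so a repeated step returns the stored result instead of recomputing it.

```python
import hashlib
import pickle
import time
from typing import Any, Callable, Dict


class ArtifactStore:
    """Toy in-memory store that reuses pipeline artifacts across runs."""

    def __init__(self, min_seconds_per_byte: float = 0.0) -> None:
        self._artifacts: Dict[str, Any] = {}
        # Hypothetical cost rule: keep an artifact only if recomputing it
        # costs at least this many seconds per byte of storage.
        self.min_seconds_per_byte = min_seconds_per_byte
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(step_name: str, inputs: tuple) -> str:
        # Identify an artifact by the step that produced it and its inputs.
        return hashlib.sha256(pickle.dumps((step_name, inputs))).hexdigest()

    def run(self, step_name: str, fn: Callable[..., Any], *inputs: Any) -> Any:
        key = self._key(step_name, inputs)
        if key in self._artifacts:
            self.hits += 1  # reuse: skip recomputation
            return self._artifacts[key]
        self.misses += 1
        start = time.perf_counter()
        result = fn(*inputs)
        elapsed = time.perf_counter() - start
        size = len(pickle.dumps(result))
        if elapsed / size >= self.min_seconds_per_byte:
            self._artifacts[key] = result  # materialize for later runs
        return result


calls = []


def normalize(values):
    calls.append(1)  # record each real execution
    top = max(values)
    return [v / top for v in values]


store = ArtifactStore()
first = store.run("normalize", normalize, [1.0, 2.0, 4.0])
second = store.run("normalize", normalize, [1.0, 2.0, 4.0])
```

With the default threshold of 0.0, every artifact is materialized, so the second call is served from the store and `normalize` executes only once.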

The Publications in Detail:

Full Research Papers:

  1. Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, and Volker Markl. “Rethinking Stateful Stream Processing with RDMA.” Proceedings of the 2022 International Conference on Management of Data. 2022. [PDF].
  2. Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, and Volker Markl. “Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects.” Proceedings of the 2022 International Conference on Management of Data. 2022. [PDF].
  3. Alexander Renz-Wieland, Rainer Gemulla, Zoi Kaoudi, and Volker Markl. “NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access.” Proceedings of the 2022 International Conference on Management of Data. 2022. [PDF].
  4. Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Zoi Kaoudi, Tilmann Rabl, and Volker Markl. “Materialization and Reuse Optimizations for Production Data Science Pipelines.” Proceedings of the 2022 International Conference on Management of Data. 2022. [PDF].