Incremental Techniques for Large-Scale Dynamic Query Processing

Half-day tutorial at CIKM 2018- Friday, 26 October 2018

Iman Elghandour

Université Libre de Bruxelles, Belgium

Iman Elghandour is an assistant professor at Alexandria University (Alexandria, Egypt) and is currently a visiting researcher at Université Libre de Bruxelles (Brussels, Belgium). Her research interests are in the area of database management with a current focus on distributed databases, distributed platforms for big data analytics, and query optimization and scheduling in heterogeneous clusters. She has several publications in top venues, such as the VLDB journal, ACM SIGMOD, VLDB, and ICDE. She is also a co-author of one US patent. She received a PhD degree from the University of Waterloo, Canada. During her PhD, she has been on internships at IBM Almaden Research Center and IBM Toronto Lab, and she has received several awards including IBM PhD fellowship, Pat Selinger IBM PhD Fellowship, and David R. Cheriton Graduate Scholarship in Computer Science.

Ahmet Kara

University of Oxford, United Kingdom

Ahmet Kara is a postdoctoral researcher at the Computer Science Department of the University of Oxford. His research interests are logics on structures with data values, verification of in infinite state systems and database theory. His current focus is on succinct representations of query results and worst-case optimal algorithms for incremental query processing. He has publications in conferences like ICDT, FSTTCS and MFCS. He received his PhD degree from the University of Dortmund.

Dan Olteanu

University of Oxford, United Kingdom

Dan Olteanu is Professor of Computer Science at the University of Oxford, and also holds affiliations as Faculty Fellow at Alan Turing Institute in London, and Computer Scientist at RelationalAI in Berkeley. He holds a PhD from the University of Munich. He spends his time understanding hard computational challenges and designing simple and scalable solutions towards these challenges. He has published over 70 papers in the areas of database systems, AI, and theoretical computer science, contributing to XML query processing, incomplete information and probabilistic databases, factorized databases, scalable and incremental in-database optimization, and the commercial systems LogicBlox and RelationalAI. He co-authored the book "Probabilistic Databases" (2011). He received the following awards: ACM SIGMOD Distinguished PC Member (2018), ERC Consolidator Grant (2016), and Oxford Outstanding Teaching Award (2009). He has served as member of over 60 programme committees, as associate editor for PVLDB’13 and IEEE TKDE, track chair for IEEE ICDE’15, group leader for ACM SIGMOD’15, vice chair for ACM SIGMOD’17, and co-chair for AMW’18. He is currently serving as associate editor for ACM TODS.

Stijn Vansummeren

Université Libre de Bruxelles, Belgium

Stijn Vansummeren is Assistant Professor at the Université Libre de Bruxelles in Brussels, Belgium where he is co-director of the Laboratory for Web & Information Systems. He holds a joint PhD from Hasselt University and the Transnational University of Limburg, Belgium. His research interests lie in the wide area of data and information management. He has published over 50 papers in top-ranked journals such as the Journal of the ACM, ACM Transactions on Database Systems, ACM Transactions on Computational Logic, ACM Transactions on the Web as well as highly competitive conferences such as SIGMOD, VLDB, WWW, PODS, and ICDT. He is currently scientific coordinator of SPICES, a Brussels-funded project concerning event processing for computer security. Stijn has been serving as a member of the programme committees of several international conferences including WWW, EDBT, PODS, and ICDT.

Abstract

Many applications from various disciplines are now required to analyze fast evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years. Traditional approaches rely on updating the results of a query when updates are streamed rather than re-computing these queries, and therefore, higher execution performance is expected. However, they do not perform well for large databases that are updated at high frequencies. Therefore, new algorithms and approaches have been proposed in the literature to address these challenges by, for instance, reducing the complexity of processing updates. Moreover, many of these algorithms are now leveraging distributed streaming platforms such as Spark Streaming and Flink. In this tutorial, we briefly discuss legacy approaches for incremental query processing, and then give an overview of the new challenges introduced due to processing big data streams. We then discuss in detail the recently proposed algorithms that address some of these challenges. We emphasize the characteristics and algorithmic analysis of various proposed approaches and conclude by discussing future research directions.

Detailed Outline

Section

Topics

Introduction

What is dynamic query processing?
Applications employing dynamic query processing
Tutorial overview and focus

Main Algorithmic Ideas in Dynamic Query Processing: Traditional IVM and Recent Advances

Idea 1: computation of delta queries
Traditional IVM
Idea 2: Materialization of subresults
Higher order IVM
Dynamic Yannakakis
Factorized IVM
Complexity lower bounds
Idea 3: dealing with data skew
Worst-case optimal dynamic query processing
Space-time tradeoffs

Unified Treatment of Applications Based on Ring Computation

Aggregates over joins
Gradient computation for linear regression models
Relational data rings

Dynamic Query Processing in Big Data Frameworks

Incremental processing in MapReduce
Distributed streaming platforms
Custom distributed dynamic processing platforms

Outlook

Summary
Open problems

Link to External Resources

Tutorial resources