Many applications process high volumes of streaming data, among them Internet traffic analysis, financial tickers, and transaction log mining. In general, a data stream is an unbounded data set that is produced incrementally over time, rather than being available in full before its processing begins. In this lecture, we give an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis. We will discuss two types of systems for end-to-end stream processing: Data Stream Management Systems (DSMSs) and Streaming Data Warehouses (SDWs). A traditional database management system typically processes a stream of ad-hoc queries over relatively static data. In contrast, a DSMS evaluates static (long-running) queries on streaming data, making a single pass over the data and using limited working memory. In the first part of this lecture, we will discuss research problems in DSMSs, such as continuous query languages, non-blocking query operators that continually react to new data, and continuous query optimization. The second part covers SDWs, which combine the real-time response of a DSMS, achieved by loading new data as soon as they arrive, with a data warehouse's ability to manage terabytes of historical data on secondary storage. Table of Contents: Introduction / Data Stream Management Systems / Streaming Data Warehouses / Conclusions
Lukasz Golab is an Associate Professor at the University of Waterloo and a Canada Research Chair. Prior to joining Waterloo, he was a Senior Member of Research Staff at AT&T Labs in Florham Park, NJ, USA. He holds a B.Sc. in Computer Science (with High Distinction) from the University of Toronto and a Ph.D. in Computer Science (with Alumni Gold Medal) from the University of Waterloo. His publications span several research areas within data management and data analytics, including data stream management, data profiling, data quality, data science for social good, and educational data mining.
M. Tamer Özsu is a Professor of Computer Science and University Research Chair at the University of Waterloo. Between January 2007 and July 2010, he was the Director of the David R. Cheriton School of Computer Science. Prior to his current position, he was with the Department of Computing Science of the University of Alberta between 1984 and 2000. He holds a Ph.D. from Ohio State University. Dr. Özsu's current research covers distributed data management (focusing on data stream systems, distributed XML processing, and data integration) and multimedia data management. He is a Fellow of ACM, a Senior Member of IEEE, and a member of Sigma Xi; he is the recipient of the 2006 ACM SIGMOD Contributions Award and the 2008 Ohio State University College of Engineering Distinguished Alumnus Award.
Table of Contents
Introduction.- Data Stream Management Systems.- Streaming Data Warehouses.- Conclusions.