Compact Data Representation of Data Streams for Cleaning and Knowledge Discovery
Hakim Qahtan
Date: 16:00 – 16:30, Thursday, 06.05.2021
Location: MS Teams ICS Colloquium
Title: Compact Data Representation of Data Streams for Cleaning and Knowledge Discovery
Abstract: In many organizations, a vast amount of data is collected every day. This data arrives in the form of data streams such as logs, continuous measurements, or tables of relational databases. Processing the data to extract meaningful information is a challenging task. Therefore, constructing a compact representation of the data becomes crucial for better data understanding and analysis. In this talk, I will describe how to extract a compact representation of numerical data streams by estimating their probability density function (PDF). This compact representation has applications in data stream mining and data cleaning. I will focus on two main applications: (i) outlier (anomaly) detection and how it can be used for detecting potential market manipulation in cryptocurrency markets; (ii) change detection in data streams, where the distribution of the current data values differs from a reference distribution of data values that arrived earlier in the stream.
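To make the idea in the abstract concrete, the following is a minimal sketch, not the speaker's method: it assumes a plain Gaussian kernel density estimate over a fixed window as the PDF estimator, a fixed density threshold for flagging outliers, and an approximate L1 distance between two estimated densities as the change signal. All function names, the bandwidth, the threshold, and the synthetic data are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the PDF of `sample` with Gaussian kernels."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return pdf


def low_density_points(pdf, values, threshold):
    """Flag values whose estimated density falls below `threshold` (assumed cutoff)."""
    return [v for v in values if pdf(v) < threshold]


def distribution_distance(pdf_ref, pdf_cur, grid):
    """Approximate the L1 distance between two densities on an evenly spaced grid."""
    step = grid[1] - grid[0]
    return step * sum(abs(pdf_ref(x) - pdf_cur(x)) for x in grid)


# Synthetic stand-ins for an earlier (reference) window and a current window.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
current = [rng.gauss(3.0, 1.0) for _ in range(500)]

pdf_ref = gaussian_kde(reference, bandwidth=0.3)
pdf_cur = gaussian_kde(current, bandwidth=0.3)

# (i) Outlier detection: values that the reference density considers very unlikely.
outliers = low_density_points(pdf_ref, [0.1, -0.5, 8.0], threshold=1e-3)

# (ii) Change detection: a large distance suggests the distribution has shifted.
grid = [-6.0 + 0.05 * i for i in range(321)]  # evenly spaced points from -6 to 10
shift = distribution_distance(pdf_ref, pdf_cur, grid)

print("outliers:", outliers)
print("L1 distance between windows:", round(shift, 3))
```

In a real stream the windows would be updated incrementally and the threshold and bandwidth would need to be chosen from the data; the sketch keeps both fixed only to stay short.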