All tutorials will take place on November 5, 2001. Tutorials T1 and T2 will be in the morning (8am - 12noon); T3 and T4 will be in the afternoon (1pm - 5pm).
T2: Indexing and Mining Time Series Data
by Eamonn Keogh, University of California, Irvine, USA
T4: Mining Unstructured Data
by Ronen Feldman, ClearForest Corporation, USA
Time series data accounts for a large fraction of the data stored in financial, medical and scientific databases. Recently there has been an explosion of interest in data mining time series, with researchers attempting to index, cluster, classify and mine association rules from increasingly massive sources of data. In this tutorial, I will give a complete overview of the state of the art in time series data mining. I will explain why the unique structure of time series presents difficulties for classic data mining algorithms, and how this structure can potentially be exploited.
As with any computer science problem, representation is the key. I will therefore review every representation proposed for mining time series, including wavelets, Fourier transforms, singular value decomposition, piecewise polynomial models and symbolic mappings. I will present detailed and extensive empirical comparisons of how these representations perform on a variety of data mining tasks.
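As a rough illustration of what a reduced representation looks like, the sketch below shows the simplest piecewise polynomial model (degree zero, i.e. one mean per equal-width segment) followed by a symbolic mapping of those means. It is not taken from the tutorial material. The function names, the breakpoints and the alphabet are assumptions chosen only for this example.

```python
from bisect import bisect_right
from typing import List, Sequence


def piecewise_constant(series: Sequence[float], n_segments: int) -> List[float]:
    """Approximate a series by the mean of each of n_segments near-equal segments."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    means = []
    for i in range(n_segments):
        # Integer arithmetic keeps every segment non-empty when n_segments <= n.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = series[start:end]
        means.append(sum(segment) / len(segment))
    return means


def to_symbols(values: Sequence[float], breakpoints: Sequence[float],
               alphabet: str) -> str:
    """Map each value to a letter according to the interval it falls in.

    breakpoints must be sorted, and alphabet needs len(breakpoints) + 1 letters.
    Both are hypothetical parameters picked for illustration.
    """
    if len(alphabet) != len(breakpoints) + 1:
        raise ValueError("alphabet must have one more letter than breakpoints")
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


series = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0, 11.0, 9.0]
means = piecewise_constant(series, 4)          # 8 points reduced to 4 values
word = to_symbols(means, [3.0, 8.0], "abc")    # 4 values reduced to 4 letters
print(means, word)
```

The eight original points shrink to four numbers and then to a four-letter word, which is the kind of compression that makes indexing and comparison of long series cheaper at the cost of some detail.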
About the Instructor:
Dr Keogh obtained his Ph.D. from the University of California, Irvine in 2001. His thesis is entitled "Similarity Search in Massive Time series Databases". Dr Keogh has published more than a dozen papers on mining time series data, and has won three best paper awards for his work (including SIGMOD 2001). His research interests include Machine Learning, Data Mining, Multimedia Indexing and Information Retrieval. Beginning fall 2001, he will be on the faculty at the University of California, Riverside.
The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, while the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available within a few keystrokes. Text Mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, NLP, IR and knowledge management. Text Mining involves the preprocessing of document collections (text categorization, term extraction), the storage of the intermediate representations, the techniques to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.) and visualization of the results. In this tutorial we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. Special emphasis will be given to efficient algorithms for very large document collections, tools for visualizing such document collections, the use of intelligent agents to perform text mining on the internet, and the use of information extraction to better capture the major themes of the documents. The tutorial will cover the state of the art in this rapidly growing area of research. Several real world applications of text mining will be presented.
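The pipeline outlined above (term extraction, an intermediate representation, then distribution analysis) can be sketched in a few lines. This is a toy illustration only, not one of the systems demonstrated in the tutorial. The stopword list, the function names and the sample documents are all invented for the example.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}


def extract_terms(document: str) -> Set[str]:
    """Preprocessing: lowercase the text and keep the distinct non-stopword terms."""
    words = re.findall(r"[a-z]+", document.lower())
    return {w for w in words if w not in STOPWORDS}


def term_distribution(documents: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count, per term and per term pair, the number of documents containing it."""
    term_sets = [extract_terms(d) for d in documents]  # intermediate representation
    doc_freq = Counter(t for terms in term_sets for t in terms)
    pair_freq = Counter(
        pair for terms in term_sets for pair in combinations(sorted(terms), 2)
    )
    return doc_freq, pair_freq


docs = ["Acme acquires Beta", "Beta sues Acme", "Gamma hires staff"]
doc_freq, pair_freq = term_distribution(docs)

# Confidence of the association rule "acme -> beta": the share of
# documents mentioning acme that also mention beta.
confidence = pair_freq[("acme", "beta")] / doc_freq["acme"]
print(doc_freq["acme"], pair_freq[("acme", "beta")], confidence)
```

Storing one term set per document keeps the analysis step independent of the raw text, which is the point of having an intermediate representation. Real systems would replace the regular expression with proper term or information extraction.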
About the Instructor:
Ronen Feldman is a senior lecturer at the Mathematics and Computer Science Department of Bar-Ilan University in Israel, and the Director of the Data Mining Laboratory. He received his B.Sc. in Math, Physics and Computer Science from the Hebrew University, his M.Sc. in Computer Science from Bar-Ilan University, and his Ph.D. in Computer Science from Cornell University in NY. He served on the program committees of KDD'97, ECAI'98, AAAI'98, KDD'98, SIGIR'99, KDD'99, AAAI'2000, KDD'2000, IAAI'2001 and KDD'2001, and co-organized the IJCAI'99 workshop on Text Mining and the SIGIR'2001 workshop on operational Text Categorization. He is the founder, president and chief scientist of ClearForest Corporation, a software company specializing in the development of text mining tools and applications.