III-COR: Collaborative Research: Mining Biomedical and Network Data Using Tensors

Supported by the National Science Foundation (NSF) - IIS-0705359, IIS-0705215


Investigators

Christos Faloutsos, PhD
Department of Computer Science

Carnegie Mellon University
Wean Hall

5000 Forbes Ave.

Pittsburgh, PA 15213-3891

412-268-1457, 412-268-5576 (fax)
christos AT cs DOT cmu DOT edu
http://www.cs.cmu.edu/~christos

Vasileios Megalooikonomou, PhD
Department of Computer and Information Sciences

Temple University
314 Wachman Hall

1805 N. Broad Street

Philadelphia PA 19122

215-204-5774, 215-204-5082 (fax)
vasilis AT cis DOT temple DOT edu
http://www.cis.temple.edu/~vasilis

List of Personnel

Keywords

Data mining

fMRI images
network traffic

tensors

Project Summary

Given a large collection of fMRI images over time, how can one find patterns and correlations? Similarly, given a never-ending stream of network traffic information, how can one monitor for anomalies, intrusions, and potential failures? The main idea behind this proposal is to treat both problems using the theory of tensors. Despite the seemingly wide differences in the two settings, they both boil down to finding patterns in multidimensional arrays, sparse or dense. Tensors are exactly generalizations of matrices, and correspond roughly to ``DataCubes'' of data mining. Matrix analysis and decompositions are part of the standard toolbox for data mining, with SVD/PCA/LSI being the typical methods for dimensionality reduction, pattern discovery and ``hidden variable'' discovery. Extending these tools to higher dimensionalities is valuable and tensors provide the tools to do this generalization. However, these tools have not yet been put to use in large volume data mining. This is the main contribution of this proposal. The investigators propose (a) to design tensor decomposition algorithms that scale for large datasets, with special attention to sparse datasets, and to never-ending streams of data and (b) to apply them on two driving applications, fMRI data analysis and network data analysis. The investigators propose to analyze large volumes of fMRI data performing the following sub-tasks: cluster voxels with similar behavior over time for a given subject and/or task or across subjects and/or tasks, classify patterns of brain activity, and detect lag correlations and spatio-temporal patterns among fMRI time sequences. The investigators also propose to perform the following inter-related tasks on multiple GigaBytes of network flow data: anomaly detection, pattern discovery, and compression. Both of these applications are important for medicine, health management, and for computer and national security. Analysis of fMRI data can help understanding how the brain functions, which parts of the brain collaborate with what other parts, and whether there are variations across subjects and across task-related activities. For the network traffic monitoring setting, fast detection of anomalies is important, to spot malware, port-scanning attempts, and just plain non-malicious failures. The educational goals include incorporating the research findings in advanced graduate courses at CMU (15-826) and at Temple (9664, 9665) and proposing tutorials in leading conferences in databases, data mining and bio-informatics audiences.  

Publications and Products

M. Barnathan, V. Megalooikonomou, C. Faloutsos, F.B. Mohamed, S. Faro, " TWAVE: High-order Analysis of Functional MRI ", Neuroimage, 58(2): pp. 537-548, 2011.

M. Barnathan, V. Megalooikonomou, C. Faloutsos, F.B. Mohamed, S. Faro, " TWave: High-Order Analysis of Spatiotemporal Data ", Proceedings of the 2010 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, June 2010, Part I, LNAI 6118, pp. 246.253, 2010.

M. Barnathan, V. Megalooikonomou, C. Faloutsos, F.B. Mohamed, S. Faro, " High-Order Concept Discovery in Functional Brain Imaging ", Proceedings of the 2010 International Symposium on Biomedical Imaging, Rotterdam, April 2010.

Q. Wang, V. Megalooikonomou, C. Faloutsos, " Time Series Analysis with Multiple Resolutions", Information Systems, Vol. 35, No. 1, pp. 56-74, 2010.

Charalampos E. Tsourakakis " MACH: Fast Randomized Tensor Decompositions ", Proceedings of the SIAM Data Mining Conference 2010, Columbus, Ohio, pp. 689-700, (2010).

E. Miranda, G. Shan, and V. Megalooikonomou, " Performing Vector Quantization Using Reduced Data Representation ", Proceedings of the Data Compression Conference (DCC), Salt Lake City, Utah, 2009.

C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos, " DOULION: Counting Triangles in Massive Graphs with a Coin ", Proceedings of the ACM SIGKDD'09, Paris, France, 2009.

Q. Wang, V. Megalooikonomou, "A Performance Evaluation Framework for Association Mining in Spatial Data", Journal of Intelligent Information Systems, DOI: 10.1007/s10844-009-0115-6, 2009..

V. Megalooikonomou, M. Barnathan, D. Kontos, P. R. Bakic, A. D.A. Maidment, " A Representation and Classification Scheme for Tree-like Structures in Medical Images: Analyzing the Branching Pattern of Ductal Trees in X-ray Galactograms", IEEE Transactions on Medical Imaging, Vol. 28, No. 4, 2009, pp. 487-493.

Leman Akoglu, Mary McGlohon, Christos Faloutsos, " RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs. ", Proceedings of ICDM, Pisa, Italy, pp. 701-706, 2008.

Duen Horng Chau, Christos Faloutsos, Hanghang Tong, Jason I. Hong, Brian Gallagher, and Tina Eliassi-Rad " GRAPHITE: A Visual Query System for Large Graphs ", Proceedings of ICDM, Pisa, Italy, pp. 15-19, 2008.

U Kang, C. Tsourakakis, A. P. Appel, C. Faloutsos, J. Leskovec, " HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop ", Carnegie Mellon University, Technical Report CMU-ML-08-117, December 2008.

M. Barnathan, R. Li, V. Megalooikonomou, F. Mohamed, S. Faro, " Wavelet Analysis of 4D Motor Task fMRI Data", Proceedings of the 22nd International Congress and Exhibition on Computer Assisted Radiology and Surgery, Barcelona, Spain, 2008.

J. Sun, T. Eliassi-Rad, C. E. Tsourakakis, E. Hoke, C. Faloutsos, " Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data ", Data Mining and Knowledge Discovery, Vol. 17, No. 1, pp. 111-128, 2008.

A. P. Appel, D. H. Chau, C. E. Tsourakakis, C. Faloutsos, J. Leskovec, "Shatterviz: Principled Graph Summarization and Visualization" , (under review).

C. Tsourakakis, " Fast Counting of triangles in real-world networks: proofs, algorithms and observations ", IEEE International Conference on Data Mining (ICDM), 2008.

Other specific products

TWave and OpenWaveCluster (GPL) are available " here "

Area References

L.R. Tucker."Some mathematical notes on three-mode factor analysis," Psychometrika, 31(3):279--311, 1966.

R.A. Harshman." Foundations of the parafac procedure: model and conditions for an explanatory multi-mode factor analysis," UCLA working papers in phonetics, 16:1--84, 1970.

Petros Drineas and Michael W. Mahoney." A randomized algorithm for a tensor-based generalization of the svd," Technical Report, 2005.

Jimeng Sun, Dacheng Tao, and Christos Faloutsos." Beyond streams and graphs: Dynamic tensor analysis," KDD,2006.

V. Megalooikonomou, J. Ford, L. Shen, F. Makedon, and Andrew Saykin. "Data mining in brain imaging. Some mathematical notes on three-mode factor," Statistical Methods in Medical Research, 9(4):359--394, 2000.

 

Project Websites

http://knight.cis.temple.edu/~vasilis/research/tensors.html

This is the main website for our project.