We humans can extract a variety of knowledge from given data by making full use of our reasoning. However, human reasoning capacity is so limited that most of the massive, complex data acquired through computer networks goes to waste without ever being inspected by a person. To remedy this difficulty, our department studies novel reasoning approaches that use computers to extract knowledge from massive and complex data. These techniques are called data mining and knowledge discovery. We also study the application of these techniques to a variety of fields such as science, information networks, quality/risk management, medicine, security, marketing and finance. Recently, we have obtained significant results in the following research topics: knowledge discovery from extremely high-dimensional data, knowledge discovery from graph-structured data, data mining of comprehensive knowledge, and discovery of time-dependent law equations from data.

Data consisting of a massive number of variables (extremely high-dimensional data) representing numerous events and/or states have become available through developments in computer networks, ubiquitous sensing and scientific measurement technologies. Examples are the sales data of a large-scale shopping center under various conditions, global climate data consisting of a massive variety of meteorological measurements, and profile data of thousands of gene expressions in biological systems. We study novel techniques to estimate variable relations and dynamic mechanisms from such data, which are acquired from large-scale systems with complex structure. One example of our work is the development of filtering techniques that estimate the state changes of a large-scale target system, and the dynamic mechanism governing those changes, from time series data consisting of several hundred to several thousand measurement variables. These techniques make it possible to analyze the mechanisms of systems such as a large-scale shopping mall or the global climate, and to obtain knowledge about them.
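
As a rough illustration of what "filtering" means here, the following minimal sketch runs a scalar Kalman filter over a time series. It is a toy example under stated assumptions, not the department's method: the hidden state is assumed to follow a random walk, the noise variances are assumed known, and the function name `kalman_filter_1d` and its parameters are hypothetical. Real data with hundreds or thousands of variables would need a multivariate, and probably more scalable, formulation.

```python
def kalman_filter_1d(observations, process_var, obs_var,
                     init_mean=0.0, init_var=1.0):
    """Estimate a slowly drifting hidden state from noisy scalar observations.

    Assumed model (illustrative only):
        state_t = state_{t-1} + process noise   (variance process_var)
        obs_t   = state_t     + sensor noise    (variance obs_var)
    Returns a list of (posterior mean, posterior variance) pairs.
    """
    mean, var = init_mean, init_var
    estimates = []
    for y in observations:
        # Predict step: a random walk keeps the mean, but uncertainty grows.
        var = var + process_var
        # Update step: blend prediction and observation by their reliability.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates


if __name__ == "__main__":
    readings = [5.0] * 200  # a constant signal, for demonstration
    final_mean, final_var = kalman_filter_1d(readings, 0.01, 1.0)[-1]
    print(final_mean, final_var)
```

The gain term is the core design choice: when the prediction is uncertain relative to the sensor, the filter trusts the new observation more, and vice versa.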

We develop advanced statistical methods for discovering useful causal structures in data. Such a causal structure is estimated in the form of a graph or diagram that visually represents the causal relations in a target system, so that application experts can understand it easily. The key idea is to extract considerably more information from data than conventional approaches do by exploiting the non-Gaussianity of the data. This use of non-Gaussianity distinguishes our research from previous work in this line. A promising application is the analysis of neuroimaging data such as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). Our method can be applied to brain connectivity analysis, where one could model the connections as causal relations between active brain regions. Gene network estimation from microarray data in bioinformatics would be another promising application. Our framework is also a useful new alternative for financial data analysis in economics and for traditional questionnaire data analysis in psychology and sociology.
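
To see why non-Gaussianity carries extra information, consider two standardized variables where y is a linear function of x plus independent noise. Correlation alone is symmetric and cannot tell x -> y from y -> x, but the fourth-order statistic rho * E[x^3 y - x y^3] equals rho^2 (1 - rho^2) times the excess kurtosis of x, and it flips sign when the roles are swapped. The sketch below uses that fact. It is only one simple illustration, not necessarily the estimator developed by the department, and it assumes a linear two-variable model with no hidden confounders and variables whose excess kurtosis shares the same sign. The function names are hypothetical.

```python
import math
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def standardize(values):
    """Rescale to zero mean and unit variance."""
    m = mean(values)
    std = math.sqrt(mean((v - m) ** 2 for v in values))
    return [(v - m) / std for v in values]


def excess_kurtosis(z):
    """Excess kurtosis of an already standardized sample (0 for a Gaussian)."""
    return mean(v ** 4 for v in z) - 3.0


def infer_direction(x, y):
    """Guess the causal direction between x and y from fourth-order moments.

    Assumption: y = b*x + noise (or the reverse), with non-Gaussian,
    mutually independent cause and noise of the same kurtosis sign.
    """
    x, y = standardize(x), standardize(y)
    rho = mean(a * b for a, b in zip(x, y))
    stat = rho * mean(a ** 3 * b - a * b ** 3 for a, b in zip(x, y))
    # The statistic's sign depends on the kurtosis sign of the cause,
    # which is estimated here from both variables together.
    kurt_sign = 1.0 if excess_kurtosis(x) + excess_kurtosis(y) > 0 else -1.0
    return "x -> y" if kurt_sign * stat > 0 else "y -> x"


if __name__ == "__main__":
    rng = random.Random(0)
    cause = [rng.uniform(-1, 1) for _ in range(20000)]           # non-Gaussian
    effect = [0.8 * c + 0.5 * rng.uniform(-1, 1) for c in cause]
    print(infer_direction(cause, effect))
```

For Gaussian data the excess kurtosis is zero and the statistic vanishes in both directions, which is exactly the case where direction cannot be identified this way.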

Against a backdrop of accelerating progress in data acquisition technologies, we increasingly have to deal with high-dimensional data in a variety of engineering fields, such as bioinformatics, natural language processing and image processing. Processing such data often requires combinatorial computation, in which we select the subset of dimensions that optimizes some criterion. One example is the problem of finding a small number of genes most related to a given disease or symptom in gene sequence data covering a huge number of genes. However, this kind of computation often becomes intractable in practice because of the combinatorial explosion caused by the high dimensionality of the data. We develop efficient algorithms applicable to such problems by exploiting the discrete structure of the data, especially submodularity (discrete convexity). We also aim to discover important knowledge in a variety of applications by applying the developed algorithms to real-world data.
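
The following sketch shows how submodularity can tame subset selection. It applies the classic greedy rule to a coverage function, a standard example of a monotone submodular function, for which greedy selection under a size limit is known to achieve at least a (1 - 1/e) fraction of the optimal value. The gene names, the coverage data and the function name `greedy_max_coverage` are hypothetical; this illustrates the general technique rather than the specific algorithms developed by the department.

```python
def greedy_max_coverage(candidates, budget):
    """Pick up to `budget` candidates that together cover the most items.

    candidates: dict mapping a candidate name to the set of items it covers
                (e.g. a gene and the patients whose symptom it explains).
    Returns (selected names in pick order, set of covered items).
    """
    selected = []
    covered = set()
    for _ in range(budget):
        best_name, best_gain = None, 0
        for name, items in candidates.items():
            if name in selected:
                continue
            # Marginal gain: items this candidate adds beyond current coverage.
            gain = len(items - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # nothing left that adds coverage
            break
        selected.append(best_name)
        covered |= candidates[best_name]
    return selected, covered


if __name__ == "__main__":
    genes = {
        "g1": {1, 2, 3, 4},
        "g2": {3, 4, 5},
        "g3": {5, 6},
        "g4": {1, 2},
    }
    print(greedy_max_coverage(genes, budget=2))
```

Instead of examining every subset of size `budget`, the greedy rule evaluates each candidate once per round, so the cost grows linearly with the number of candidates per pick rather than combinatorially.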

Our goal is to discover statistically significant knowledge from Big Data in order to gain a deeper understanding of phenomena. We develop scalable algorithms that find interesting combinatorial patterns in massive data. Moreover, we investigate statistical methods, such as hypothesis testing, that can control the rate of false positive patterns. There are many applications: the classic data mining problem of finding combinations of items in purchase data, the modern problem of finding interesting communities in large-scale social networks, and the detection of common substructures of chemical compounds in drug discovery. In particular, our methods can be combined with data analysis techniques based on A/B testing to extract reliable knowledge from Big Data.
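
A minimal sketch of the idea of testing patterns while limiting false positives: each candidate pattern is scored with a one-sided Fisher exact test on a 2x2 contingency table, and the significance threshold is divided by the number of patterns tested (Bonferroni correction), which bounds the probability of reporting even one false positive. The pattern names, counts and function names are hypothetical, and Bonferroni is used here only because it is the simplest correction; it is not necessarily the procedure the department studies.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: pattern present / absent. Columns: positive / negative class.
    Returns the probability of seeing at least `a` positives among the
    samples containing the pattern if pattern and class were independent.
    """
    n = a + b + c + d
    with_pattern = a + b
    positives = a + c
    upper = min(with_pattern, positives)
    total = comb(n, positives)
    tail = sum(
        comb(with_pattern, k) * comb(n - with_pattern, positives - k)
        for k in range(a, upper + 1)
    )
    return tail / total


def significant_patterns(tables, alpha=0.05):
    """Return patterns whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(tables)  # stricter as more patterns are tested
    results = {}
    for pattern, (a, b, c, d) in tables.items():
        p_value = fisher_one_sided(a, b, c, d)
        if p_value <= threshold:
            results[pattern] = p_value
    return results


if __name__ == "__main__":
    tables = {
        "bread+butter": (10, 0, 0, 10),  # strongly associated with the class
        "bread+salt": (3, 1, 1, 3),      # weak evidence only
    }
    print(significant_patterns(tables))
```

Because the number of candidate patterns grows combinatorially with the number of items, the corrected threshold becomes very strict very quickly, which is one reason scalable and less conservative procedures are a research topic in their own right.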