Research Areas of the Data Mining Lab

Our Vision

Our research group works on data science, drawing on artificial intelligence and machine learning to develop effective and scalable algorithms for analyzing real-world data. In particular, we study relationships between entities in graph data, with applications such as ranking, recommender systems, and anomaly detection.

Graph Machine Learning

How can we learn from graph data in complex systems?

Graphs represent entities, their attributes, and their relationships in complex systems such as social networks, hyperlink networks, knowledge graphs, user-item networks, molecular graphs, biomedical networks, and computational graphs. They also model 3D objects in point clouds, connections between computer systems, and function calls in source code. Graph machine learning is an emerging research area that aims to learn deep representations of graphs for a wide range of applications, from recommender systems and natural language processing to drug discovery and fraud detection.

In this project, we design machine learning methods for real-world graphs that model complex, dynamic, and richly labeled relational structures.

Publications

  • Time-aware Random Walk Diffusion to Improve Dynamic Graph Learning, AAAI, 2023.
  • Accurate Node Feature Estimation with Structured Variational Graph Autoencoder, SIGKDD, 2022.
  • Signed Random Walk Diffusion for Effective Representation Learning in Signed Graphs, PLoS ONE, 2022.
  • Compressing Deep Graph Convolution Network With Multi-staged Knowledge Distillation, PLoS ONE, 2021.
  • Accurate Relational Reasoning in Edge-labeled Graphs by Multi-Labeled Random Walk with Restart, WWW Journal, 2020.

Large-scale Data Analytics

How can we design scalable algorithms for big data?

Data such as networks are growing continuously thanks to recent advances in Web and computing technologies, and some networks now reach tera- or peta-scale. As a result, traditional methods fail to compute rankings on very large graphs within a reasonable time under limited resources. Networks are also becoming more complex, carrying massive numbers of attributes on nodes and links to represent various events. These obstacles degrade both the speed and the quality of applications built on such data.

In this project, we aim to develop efficient and scalable methods for faster mining of large-scale data.

Publications

  • TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions, ICDM, 2023.
  • NeuKron: Constant-Size Lossy Compression of Sparse Reorderable Matrices and Tensors, The Web Conference, 2023.
  • Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach, Machine Learning, 2021.
  • BalanSiNG: Fast and Scalable Generation of Realistic Signed Networks, EDBT, 2021.
  • BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart, SIGMOD, 2017.
  • Random Walk with Restart on Large Graphs Using Block Elimination, TODS, 2016.

Applied Data Science

How can we use real-world data for making better decisions?

Data science plays a crucial role in discovering knowledge from data, applying computational techniques to produce meaningful and insightful solutions to academic and industrial problems such as ranking, regression, prediction, recommendation, and anomaly detection. For example, many e-commerce companies analyze users' historical data to recommend items, movies, news, articles, friends, restaurants, and more. Data mining techniques also help forecast which customers will buy policies, analyze which medical claims are used together, and identify fraudulent behavior and risky customers.

In this project, we conduct multidisciplinary research in applied data science to develop beneficial applications based on real-world data.

Publications

  • Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion, SIGKDD, 2021.
  • Random Walk Based Ranking in Signed Social Networks: Model and Algorithms, KAIS, 2019.
  • Zoom-SVD: Fast and Memory Efficient Method for Extracting Key Patterns in an Arbitrary Time Range, CIKM, 2018.
  • A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems, BigData, 2017.
  • Personalized Ranking in Signed Networks using Signed Random Walk with Restart, ICDM, 2016.