Recruiting: I am recruiting research interns and PhD students. Send me an email if you are interested in doing research with me at Rutgers University. Candidates with excellent programming skills and a solid command of commonly used algorithms and data structures are preferred, since they can get started right away.

Welcome to my homepage

I have been an assistant professor in the Computer Science Department at Rutgers University since 2019. I obtained my PhD from Tsinghua University and completed my postdoctoral training at MIT CSAIL. I regularly publish my research at major data management and database system conferences, e.g., SIGMOD, PVLDB, and ICDE.

My research interests are data management, data science, and database systems, with a focus on developing novel algorithms and building practical systems to address data problems. My current research topics include:

  • scalable data curation (textual data curation, structured data curation, and feature data curation)
  • data manipulation and wrangling at scale
  • data integration, data cleaning, and data discovery
  • scientific dataset management, data lake management, data warehouse management

What’s new

  • 2024-05: Our paper "Near-Duplicate Text Alignment with One Permutation Hash" was accepted by SIGMOD 2025.
  • 2023-12: Our paper "SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search" was accepted by SIGMOD 2024.
  • 2023-04: Our paper "ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph" was accepted by PVLDB 2023.
  • 2023-02: We are organizing the SIGMOD Student Programming Contest 2023.
  • 2023-02: Our paper "Near-Duplicate Sequence Search at Scale for Large Language Model Memorization" was accepted by SIGMOD 2023.
  • 2023-01: Our paper "The Case for Learned Provenance Graph Storage Systems" was accepted by USENIX Security Symposium 2023.
  • 2022-09: NSF funded our project "III: Small: Large-Scale High Dimensional Dense Vector Management"!
  • 2022-07: NSF funded our project "CDSE: Computation-Informed Learning of Melt Pool Dynamics for Real-Time Prognosis"!
  • 2022-05: Our group won second place in the SIGMOD Student Programming Contest 2022! In addition, a team of students taking my Advanced Data Management class won fourth place. The task this year was entity blocking. Check it out!

Recent publications: