Recruiting: I am recruiting research interns and PhD students. Send me an email if you are interested in doing research with me at Rutgers University. Applicants with excellent programming skills and a solid command of commonly used algorithms and data structures are preferred, so that you can get started right away.
Welcome to my homepage
I have been an assistant professor in the Computer Science Department at Rutgers University since 2019. I obtained my PhD from Tsinghua University and completed my postdoctoral training at MIT CSAIL. I regularly publish my research at major data management and database systems venues, e.g., SIGMOD, PVLDB, and ICDE.
My research interests are data management, data science, and database systems, with a focus on developing novel algorithms and building practical systems to address data problems. My current research topics include:
- scalable data curation (textual data curation, structured data curation, and feature data curation)
- data manipulation and wrangling at scale
- data integration, data cleaning, and data discovery
- scientific dataset management, data lake management, data warehouse management
What’s new
- 2023-04: Our paper "ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph" was accepted by PVLDB 2023.
- 2023-02: We are organizing the SIGMOD Student Programming Contest 2023.
- 2023-02: Our paper "Near-Duplicate Sequence Search at Scale for Large Language Model Memorization" was accepted by SIGMOD 2023.
- 2023-01: Our paper "The Case for Learned Provenance Graph Storage Systems" was accepted by USENIX Security Symposium 2023.
- 2022-09: NSF funded our project "III: Small: Large-Scale High Dimensional Dense Vector Management"!
- 2022-07: NSF funded our project "CDSE: Computation-Informed Learning of Melt Pool Dynamics for Real-Time Prognosis"!
- 2022-05: Our group won second place in the SIGMOD Student Programming Contest 2022! In addition, a team of students taking my Advanced Data Management class won fourth place. This year's task was entity blocking. Check it out!
Chaoji Zuo, Dong Deng*
ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph.
Zhencan Peng, Zhizhi Wang, Dong Deng*
Near-Duplicate Sequence Search at Scale for Large Language Model Memorization.
Hailun Ding, Juan Zhai, Dong Deng, Shiqing Ma
The Case for Learned Provenance Graph Storage Systems.
USENIX Security Symposium 2023
Zhizhi Wang, Chaoji Zuo, Dong Deng*
TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection.
Chaoji Zuo, Sepehr Assadi, Dong Deng*
SPINE: Scaling up Programming-by-Negative-Example for String Filtering and Transformation.
Qingyu Xu, Feng Zhang, Zhiming Yao, Lv Lu, Xiaoyong Du, Dong Deng, Bingsheng He
Efficient Load-Balanced Butterfly Counting on GPU.
Weiqi Feng, Dong Deng*
Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts.
Runhui Wang, Dong Deng*
DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search.