Recruiting: I am recruiting research interns and PhD students. Send me an email if you are interested in doing research with me at Rutgers University. Ideally, you have excellent programming skills and have mastered commonly used algorithms and data structures, so that you can get started right away.
Welcome to my homepage
I have been an assistant professor in the Computer Science Department at Rutgers University since 2019. I obtained my PhD from Tsinghua University and completed my postdoctoral training at MIT CSAIL. I regularly publish my research at major data management and database systems conferences such as SIGMOD, PVLDB, and ICDE.
My research interests are data management, data science, and database systems, with a focus on developing novel algorithms and building practical systems to address data problems. My current research topics include:
- scalable data curation (textual data curation, structured data curation, and feature data curation)
- data manipulation and wrangling at scale
- data integration, data cleaning, and data discovery
- scientific dataset management, data lake management, data warehouse management
What’s new
The Invisible Failures (selected)
- 2022-03: Our group had a paper about full-text near-duplicate detection accepted by SIGMOD 2022!
- 2022-02: My student Chaoji had a paper about data wrangling accepted by SIGMOD 2022!
- 2021-05: My student Chaoji Zuo is a finalist in the SIGMOD programming contest. This year's task is an entity resolution problem. He used a random forest (and more) and achieved an F1-score above 0.93.
- 2021-03: A paper was accepted by SIGMOD 2021.
- 2020-10: A paper was accepted by PVLDB 2020.
Zhizhi Wang, Chaoji Zuo, Dong Deng*
TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection.
Chaoji Zuo, Sepehr Assadi, Dong Deng*
SPINE: Scaling up Programming-by-Negative-Example for String Filtering and Transformation.
Weiqi Feng, Dong Deng*
Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts.
Runhui Wang, Dong Deng*
DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search.