I mainly focus on Machine Learning (ML) and Natural Language Processing (NLP). Now, in the era of LLMs, I specifically work on retrieval-related research, i.e., (1) When to retrieve: given an LLM, when should retrieval be triggered? (2) How to retrieve: given a query, how can the relevant information be retrieved from the corpus? (3) What to retrieve: given an LLM and a query, what information needs to be retrieved?
I'm a third-year doctoral student at TU Darmstadt, Germany, co-supervised by Prof. Heinz Koeppl and Prof. Iryna Gurevych. I also collaborate closely with Xinran Zhao and Dr. Hongming Zhang. Before my PhD, I completed my master's degree in computer science at École Polytechnique Fédérale de Lausanne (EPFL), where I was supervised by Prof. Boi Faltings and Dr. Fei Mi.
Most recent publications on Google Scholar.
* indicates equal contribution.
Knowledge Graph–Augmented DNA Representation Learning
Fengyu Cai*, Erik Kubaczka*, Shaobo Cui, Heinz Koeppl
ICML 2025 Workshop on Multi-modal Foundation Models and LLMs for Life Sciences
Revela: Dense Retriever Learning via Language Modeling
Fengyu Cai, Tong Chen, Xinran Zhao, Sihao Chen, Hongming Zhang, Sherry Tongshuang Wu, Iryna Gurevych, Heinz Koeppl
In submission
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu
In submission
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Jiahui Geng*, Fengyu Cai*, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray
In submission
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray
In submission
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Fengyu Cai, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray
In submission
MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity
Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl
EMNLP 2024 Main Conference
Finetuning Large Language Model for Personalized Ranking
Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong
2024 ACM International Conference on Information and Knowledge Management (CIKM 2024)
GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics
Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl
ACL 2024 Findings
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych
NAACL 2024 Main Conference
SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling
Fengyu Cai, Wanhao Zhou, Fei Mi, Boi Faltings
ICASSP 2022: the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialogue Systems
Fei Mi, Wanhao Zhou, Fengyu Cai, Lingjing Kong, Minlie Huang, Boi Faltings
EMNLP 2021: the 2021 Conference on Empirical Methods in Natural Language Processing
CRASH: A Collaborative Aerial-Ground Exploration System Using Hybrid-Frontier Method
Luqi Wang, Fei Gao, Fengyu Cai, Shaojie Shen
ROBIO 2018: 2018 IEEE International Conference on Robotics and Biomimetics
A collaborative aerial-ground robotic system for fast exploration
Luqi Wang, Daqian Cheng, Fei Gao, Fengyu Cai, Jixin Guo, Mengxiang Lin, Shaojie Shen
Proceedings of the 2018 International Symposium on Experimental Robotics