Fengyu Cai

Ph.D. Student at TU Darmstadt

{first name}.{last name} [AT] tu-darmstadt.de

Bio

I mainly focus on Machine Learning (ML) and Natural Language Processing (NLP). In the era of LLMs, I work specifically on retrieval, addressing three questions: (1) When to Retrieve: Given an LLM, when should retrieval be triggered? (2) How to Retrieve: Given a query, how do we retrieve the relevant information from the corpus? (3) What to Retrieve: Given an LLM and a query, what information do we need to retrieve?

I'm a third-year doctoral student at TU Darmstadt, Germany, co-supervised by Prof. Heinz Koeppl and Prof. Iryna Gurevych. I also collaborate closely with Xinran Zhao and Dr. Hongming Zhang. Before my PhD, I completed my master's degree in computer science at École Polytechnique Fédérale de Lausanne (EPFL), where I was supervised by Prof. Boi Faltings and Dr. Fei Mi.

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

Knowledge Graph–Augmented DNA Representation Learning

Fengyu Cai*, Erik Kubaczka*, Shaobo Cui, Heinz Koeppl

ICML 2025 Workshop on Multi-modal Foundation Models and LLMs for Life Sciences

Revela: Dense Retriever Learning via Language Modeling

Fengyu Cai, Tong Chen, Xinran Zhao, Sihao Chen, Hongming Zhang, Sherry Tongshuang Wu, Iryna Gurevych, Heinz Koeppl

In submission

MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers

Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu

In submission

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

Jiahui Geng*, Fengyu Cai*, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray

In submission

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray

In submission

A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Fengyu Cai, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray

In submission

MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

EMNLP 2024 Main Conference

Finetuning Large Language Model for Personalized Ranking

Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong

2024 ACM International Conference on Information and Knowledge Management (CIKM 2024)

GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Fengyu Cai, Xinran Zhao, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

ACL 2024 Findings

A Survey of Confidence Estimation and Calibration in Large Language Models

Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

NAACL 2024 Main Conference

SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling

Fengyu Cai, Wanhao Zhou, Fei Mi, Boi Faltings

ICASSP 2022: the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing

Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialogue Systems

Fei Mi, Wanhao Zhou, Fengyu Cai, Lingjing Kong, Minlie Huang, Boi Faltings

EMNLP 2021: the 2021 Conference on Empirical Methods in Natural Language Processing

CRASH: A Collaborative Aerial-Ground Exploration System Using Hybrid-Frontier Method

Luqi Wang, Fei Gao, Fengyu Cai, Shaojie Shen

ROBIO 2018: 2018 IEEE International Conference on Robotics and Biomimetics

A collaborative aerial-ground robotic system for fast exploration

Luqi Wang, Daqian Cheng, Fei Gao, Fengyu Cai, Jixin Guo, Mengxiang Lin, Shaojie Shen

Proceedings of the 2018 International Symposium on Experimental Robotics

Vitæ

Acknowledgement

This website uses the design and template by Martin Saveski.