Ryo Kamoi

鴨井遼 (ja)

Ryo Kamoi is a Ph.D. student in Computer Science at Penn State University advised by Dr. Rui Zhang. He received his master’s degree in CS from UT Austin where he was advised by Dr. Greg Durrett, and received his bachelor’s degree in Statistics from Keio University where he was advised by Dr. Kei Kobayashi. He is currently interning at Microsoft OAR and previously interned at Amazon Alexa.

He is broadly interested in Natural Language Processing, especially focusing on:

- Reasoning capability of LLMs
  - Error detection [COLM'24] and reward modeling [arXiv'25]
  - Self-correction and verifier-guided refinement [TACL'24]
- Fact checking, factuality evaluation, and NLI [EMNLP'23, EACL'23]
- Vision-Language Models [COLM'25]

News

Jul 2025 A first-author paper that pointed out limitations of vision-language models at perceiving geometric information has been accepted to COLM 2025!

Jun 2025 A co-authored paper about evaluating LVLMs on high-resolution images has been accepted to ICCV 2025! Congratulations, Yusen!

May 2025 We published a new preprint on improving process reward models (PRMs) without relying on human annotation!

May 2025 I will start my internship at Microsoft later this month!

May 2025 A co-authored paper has been accepted to ICML 2025! Congratulations, Renze!

Apr 2025 I passed my comprehensive exam! Thank you to my advisor, committee members, and co-authors for their support!

Mar 2025 A co-authored paper has received the Best Paper Award at the 2nd AI4Research workshop @ AAAI 2025! Congratulations, Renze!

Jan 2025 A co-authored paper has been accepted to ICLR 2025! Congratulations, Sarkar!

Dec 2024 We published a new preprint on evaluating LVLMs at visual perception!

Nov 2024 I will present our survey paper on self-correction of LLMs (TACL) at EMNLP 2024 (oral, in person)!

Aug 2024 A first-author survey paper on self-correction of LLMs has been accepted to TACL!

Jul 2024 A first-author paper on LLM error detection has been accepted to COLM 2024!

Jun 2024 A co-authored paper has been accepted to ACL 2024! Congratulations, Yilun!

May 2024 A co-authored paper has been accepted to NAACL 2024! Congratulations, Yusen!

FoVer: Training PRMs with Formal Verification Tools (2025)

Process Reward Models (PRMs) provide step-level verification of LLM reasoning. However, collecting accurate step-level labels for training PRMs is a bottleneck. We propose FoVer, an approach that uses formal verification tools like Z3 and Isabelle to automatically annotate step-level error labels on LLM responses without relying on human annotation. This data synthesis is feasible only for tasks compatible with these tools, but our training data improves LLM-based PRMs across a broad range of reasoning tasks.

Critical Survey of Self-Correction of LLMs (TACL 2024)

We critically survey a broad range of papers and discuss the conditions required for successful self-correction. Our survey indicates that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for studies on tasks that are exceptionally suited for self-correction, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.

ReaLMistake: Evaluating LLMs at Detecting Errors in LLM Responses (COLM 2024)

ReaLMistake is a benchmark for evaluating error detection methods that detect errors in LLM responses. This benchmark includes errors made by GPT-4 and Llama 2 70B on three tasks (math word problem generation, fine-grained fact verification, and answerability classification). We observe that LLMs still cannot reliably detect mistakes made by LLMs. Strong LLMs like GPT-4 and Claude 3 detect errors made by LLMs with very low recall, and all LLM-based error detectors perform much worse than humans.

Selected publications. For the full list, please see Google Scholar or Semantic Scholar.

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang (2025). COLM 2025.

Training Step-Level Reasoning Verifiers with Formal Verification Tools
Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang (2025). arXiv preprint arXiv:2505.15960.

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang (2024). TACL 2024. Oral at EMNLP 2024.

Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang (2024). COLM 2024.

WiCE: Real-World Entailment for Claims in Wikipedia
Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett (2023). EMNLP 2023. Oral.

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization
Ryo Kamoi, Tanya Goyal, Greg Durrett (2023). EACL 2023.

Why is the Mahalanobis Distance Effective for Anomaly Detection?
Ryo Kamoi, Kei Kobayashi (2020). arXiv preprint arXiv:2003.00402.

Education

Penn State University — Ph.D. Student in Computer Science Aug 2023 – Present State College, PA

PhD Advisor: Rui Zhang

University of Texas at Austin — M.S. in Computer Science Aug 2020 – Dec 2022 Austin, TX

Advisor: Greg Durrett, Mentor: Tanya Goyal

Keio University — B.E. in Statistics Apr 2016 – Mar 2020 Tokyo, Japan

Advisor: Kei Kobayashi, Keio Engineering Foundation Award (Top student in the Department of Mathematics)

Work Experience

Microsoft, Office of Applied Research — Research Internship May 2025 – Aug 2025 Redmond, WA

Amazon, Alexa Team — Applied Scientist Internship Jul 2021 – Dec 2021 Cambridge, U.K.

Research on the quality evaluation of Alexa

Services

NLP Colloquium JP (NLPコロキウム) — Staff Mar 2024 – Present

Awards

Keio University Global Fellowship Aug 2020

Scholarship for alumni of Keio University to pursue degrees at overseas graduate schools

Keio Engineering Foundation Award Mar 2020

Graduation with highest honors - First place in the Department of Mathematics at Keio University

Media Mentions

Self-Correction in Large Language Models — Communications of the ACM Feb 2025

Interview about our survey paper on LLM self-correction.

Invited Talks

Matsuo-Iwasawa Lab at the University of Tokyo (LLM講座, ja)

Lecture about self-correction of LLMs based on our survey paper

Oct 17, 2024 The University of Tokyo (Online)

NLP Colloquium JP (NLPコロキウム, ja)

Talk about our paper “WiCE: Real-World Entailment for Claims in Wikipedia”

Nov 29, 2023 Online

Nagoya NLP Seminar at Nagoya University (名古屋NLPセミナー, ja)

Talk about our papers on factuality and entailment

Jul 3, 2023 Nagoya University
