Ryo Kamoi

鴨井 遼 (ja)

Ryo Kamoi is a Ph.D. student in Computer Science at Penn State University advised by Dr. Rui Zhang. He received his master’s degree in CS from UT Austin where he was advised by Dr. Greg Durrett, and received his bachelor’s degree in Statistics from Keio University where he was advised by Dr. Kei Kobayashi. He interned at Amazon and will be interning at Microsoft from May to August 2025.

He is broadly interested in Natural Language Processing, especially focusing on:

  • Reasoning in LLMs
    • Error Detection [COLM'24] and Reward Modeling [arXiv'25]
    • Self-Correction and Verifier-Guided Refinement [TACL'24]
  • Fact Checking, Factuality Evaluation, and NLI [EMNLP'23, EACL'23]
  • Vision-Language Models [arXiv'24]
FoVer: Training PRMs with Formal Verification Tools (2025)
Process Reward Models (PRMs) provide step-level verification of LLM reasoning. However, collecting accurate step-level labels for training PRMs is a bottleneck. We propose FoVer, an approach that uses formal verification tools such as Z3 and Isabelle to automatically annotate step-level error labels on LLM responses without relying on human annotation. This data synthesis is feasible only for tasks compatible with these tools, but the resulting training data improves LLM-based PRMs across a broad range of reasoning tasks.
Critical Survey of Self-Correction of LLMs (TACL 2024)
We critically survey a broad range of papers and discuss the conditions required for successful self-correction. Our survey indicates that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for studies in tasks that are exceptionally suited for self-correction, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.
ReaLMistake: Evaluating LLMs at Detecting Errors in LLM Responses (COLM 2024)
ReaLMistake is a benchmark for evaluating methods that detect errors in LLM responses. This benchmark includes errors made by GPT-4 and Llama 2 70B on three tasks (math word problem generation, fine-grained fact verification, and answerability classification). We observe that LLMs still cannot reliably detect mistakes made by LLMs. Strong LLMs like GPT-4 and Claude 3 detect errors made by LLMs with very low recall, and all LLM-based error detectors perform much worse than humans.
Selected publications. For the full list, please see Google Scholar or Semantic Scholar.

Training Step-Level Reasoning Verifiers with Formal Verification Tools
(2025). arXiv preprint arXiv:2505.15960.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
(2024). arXiv preprint arXiv:2412.00947.
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
(2024). TACL 2024. Oral at EMNLP 2024.
Evaluating LLMs at Detecting Errors in LLM Responses
(2024). COLM 2024.
WiCE: Real-World Entailment for Claims in Wikipedia
(2023). EMNLP 2023. Oral.
Why is the Mahalanobis Distance Effective for Anomaly Detection?
(2020). arXiv preprint arXiv:2003.00402.

Education

Penn State University — Ph.D. Student in Computer Science Aug 2023 – Present State College, PA
PhD Advisor: Rui Zhang
University of Texas at Austin — M.S. in Computer Science Aug 2020 – Dec 2022 Austin, TX
Advisor: Greg Durrett, Mentor: Tanya Goyal
Keio University — B.E. in Statistics Apr 2016 – Mar 2020 Tokyo, Japan
Advisor: Kei Kobayashi, Keio Engineering Foundation Award (Top student in the Department of Mathematics)

Work Experience

Microsoft, Office of Applied Research — Research Internship May 2025 – Aug 2025 Redmond, WA
Amazon, Alexa Team — Applied Scientist Internship Jul 2021 – Dec 2021 Cambridge, U.K.
Research on the quality evaluation of Alexa

Services

NLP Colloquium JP (NLPコロキウム) — Staff Mar 2024 – Present

Awards

Scholarship for alumni of Keio University to pursue degrees at overseas graduate schools
Graduation with highest honors - First place in the Department of Mathematics at Keio University

Media Mentions

Interview about our survey paper on LLM self-correction.
