Full Publication List
Preprint
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
arXiv preprint, Sep. 2025
Same Content, Different Representations: A Controlled Study for Table QA
arXiv preprint, Sep. 2025
The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs
arXiv preprint, Aug. 2025
International Conferences & Workshops
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k
EMNLP 2025 (Oral)
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
ICLR 2025
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Findings of NAACL 2025
Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models
NAACL 2024 (Oral), acceptance rate: 23.2%
Low-resource Interactive Active Labeling for Fine-tuning Language Models
Findings of EMNLP 2022
Beyond Real-world Benchmark Datasets: An Empirical Study of Node Classification with GNNs
NeurIPS 2022 (Datasets & Benchmarks)
Effective Candidate Selection and Interpretable Interest Extraction for Follower Prediction on Social Media
WI-IAT 2021
Journal Papers
Domestic Conferences/Others
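An Empirical Study of Node Classification with GNNs Using Diverse Synthetic Graphs (original Japanese title: 多様な人工グラフを用いた GNN によるノード分類の実証研究)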
DEIM Forum 2023, Mar. 2023