ML Researcher · Architect of Language Models
I build ML models of language — and increasingly, I evaluate whether today's LLMs actually understand it. My current work develops novel calibration metrics (ESR, CDS) to benchmark frontier models (GPT, Claude, Gemini) on fine-grained social and pragmatic tasks.
My background spans Computer Science, Media Studies, and Computational Linguistics (BSc–MSc–PhD). I bring 15+ years of rigorous modeling — game theory, multi-agent RL, probabilistic NLP — to the question that matters most in applied AI: does the model actually get what humans mean?
I'm actively seeking roles in NLP research, LLM evaluation, or applied language AI where linguistic depth meets engineering ambition.
Combining formal linguistic theory with modern ML methods to understand and model natural language.
Active research at the intersection of NLP, LLM evaluation, and computational pragmatics.
Developing ESR and CDS metrics to evaluate GPT-4, Claude, and Gemini on politeness, precision, and register. Submitted to CMCL 2026 · follow-up targeting ACL/EMNLP
Bayesian/RSA models predicting human judgments of politeness and (im)precision with >90% accuracy · benchmarking against transformer-based baselines
40+ peer-reviewed papers spanning NLP, AI, linguistics, and cognitive science · journals include Synthese, Linguistics & Philosophy, Experimental Economics, AI & Society
NAWA-funded (€75K) · Nicolaus Copernicus University · 2019–2021 · iterated learning models, experimental paradigm design
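For readers unfamiliar with the Rational Speech Acts (RSA) framework behind the Bayesian models mentioned above, here is a minimal sketch of its core recursion (literal listener → pragmatic speaker → pragmatic listener). The utterances, meanings, and lexicon are toy illustrative values, not the models or data from this research.

```python
import numpy as np

# Toy RSA model: a pragmatic listener inferring meanings from utterances.
# Utterances, meanings, and the truth-conditional lexicon are assumptions
# chosen only to illustrate the recursion.
utterances = ["some", "all"]
meanings = ["partial", "total"]

# Literal semantics: lexicon[u][m] = 1 if utterance u is true of meaning m.
lexicon = np.array([
    [1.0, 1.0],  # "some" is true of both partial and total
    [0.0, 1.0],  # "all" is true only of total
])

alpha = 1.0  # speaker rationality parameter

# Literal listener L0: P(m | u) proportional to the lexicon (uniform prior).
L0 = lexicon / lexicon.sum(axis=1, keepdims=True)

# Pragmatic speaker S1: P(u | m) proportional to exp(alpha * log L0(m | u)).
with np.errstate(divide="ignore"):
    S1 = np.exp(alpha * np.log(L0))
S1 = S1 / S1.sum(axis=0, keepdims=True)  # normalize over utterances per meaning

# Pragmatic listener L1: P(m | u) proportional to S1(u | m) * prior(m).
L1 = S1 / S1.sum(axis=1, keepdims=True)

print(L1[0])  # beliefs after hearing "some" → [0.75 0.25]
```

Even this toy version shows the pragmatic strengthening RSA is known for: after hearing "some", the listener shifts probability toward the "partial" meaning (the scalar implicature "some, but not all"), whereas a purely literal listener would stay at 50/50.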