Supervision

Habilitation Thesis mentorship

Please email me to arrange a Zoom session. My university’s current requirements are listed here; please pay close attention to both the timeline and the requirements.

PhD supervision

See the documents required for PhD registration and grant applications (also available in French).

I may be available to supervise doctoral research in AI for education related to my research topics. Please take a look at my publications, for example on the analysis of current frameworks developed with colleagues (see the paper submission, in French) or on the uses of speech models for L2 scoring.

Analytics for Language Learning

In relation to the AI-generated dashboard we have created for the A4LL project, we are looking for PhD candidates with some background in psycholinguistics to conduct field research on the UX (user experience) of this platform and dashboard system. For candidates with a more theoretical grounding, we need to analyse the metalinguistic comments derived from the processing of the dashboard metrics. We would also like to compare the current plot-based interface with metalinguistic feedback. For more downstream tasks, we are also looking for a candidate able to test how well LLMs can produce exercises targeting the linguistic issues diagnosed by the system.

Mispronunciation Detection and Diagnosis

I can also supervise PhD students wishing to test what I call the technical ‘affordance’ of Whisper, more specifically its ability to correctly diagnose mispronunciations on the basis of the subtokens and probabilities it predicts. This implies, for example, analysing the possible values of the mispronunciation thresholds to be determined for different types of subtokens. For background on this line of research, please read this paper.

Biases in speech tokenisers

As part of our project “Promoting fairness for under-represented languages in multilingual LLMs (2025-2026)” (a UW Global Innovation Fund Research award for a UPCité/UW collaboration), we investigate architectural biases in speech foundation models, in particular their consequences for ASR and their distribution across the dictionary of subtokens. See our latest EMNLP 2025 paper.

The linguistic “knowledge” of LLMs

I am also developing an epistemological line of research on the “knowledge” of LLMs. See our draft here for an example of potential investigations of phonological knowledge. We are also investigating the vector space, or hypothesis space, of these models (see our draft).

Keystroke logging modelling

Finally, for the neural modelling of keylogs, I am looking for applicants, preferably with a background in data science, to investigate neural models of the keystroke logging dataset we collected in our Deep learning for Language Assessment project. See our paper on the dataset published at LREC 2024 and our short paper.

MA supervision

Our current brochure for the MA in English linguistics is available here (https://cloud.u-paris.fr/s/QxJ8SqyNPLow4NC). We are part of the Paris Graduate School of Linguistics. See our grants for international students, and note the early application deadline for our Mobility grants (16 January).

Areas of expertise: neural machine translation; automatic approaches to learner data (written and spoken); corpus phonology (L2 pronunciation, prosody and varieties of English from a corpus-based, quantitative, automatic and AI perspective); epistemology of linguistic data science.

Possible topics for 2026-2027: AI for speech scoring / Large Language Models (LLMs) for learner feedback / Probing the phonological representations of LLMs

AI for L2 speech scoring

We are using Whisper to analyse learners’ phonetic realisations; your role will be to contribute to the analysis of Whisper’s outputs. For background, see the simple version and the more complex one.

Large Language Models (LLMs) for learner feedback (for future teachers of English: the actionability of complexity metrics)

At the intersection of second language acquisition (SLA) and natural language processing (NLP), this project invites you to analyse the automatic measurements (metrics) we devised for the A4LL project. We designed a system that automatically analyses texts written by learners; your role will be to write the corresponding prompts and to test the reliability of some of the metrics we implemented.

Probing the phonological representations of LLMs

Using Whisper, you will analyse the representations encoded in this large language model by examining its transcriptions alongside the corresponding speech signal. See this paper.

Nicolas Ballier
