The London School of Hygiene & Tropical Medicine (LSHTM) PhD Studentship Opportunity (2026–27)

The London School of Hygiene & Tropical Medicine (LSHTM) invites applications for a doctoral studentship exploring how Large Language Models (LLMs) can improve the identification of complex diseases in large-scale Electronic Health Record (EHR) databases.

This project addresses a central challenge in health data science: developing accurate, transparent and efficient methods for phenotyping, the process of identifying patient groups with specific diseases or conditions from routine clinical data. As clinical datasets expand rapidly, AI-driven approaches have the potential to substantially advance epidemiological research.

Facilitating Complex Phenotyping for Electronic Health Records Using Large Language Models
MRC London Intercollegiate Doctoral Training Partnership (MRC LID) Studentship
Project available for 2026/27 entry
Full-time or Part-time


Supervisory Team

Primary Supervisor
Dr Julian Matthewman, LSHTM
Faculty of Epidemiology & Population Health
Department of Non-communicable Disease Epidemiology
Email: julian.matthewman@lshtm.ac.uk

Co-Supervisor
Professor Sinéad Langan, LSHTM
Faculty of Epidemiology & Population Health
Department of Non-communicable Disease Epidemiology
Email: sinead.langan@lshtm.ac.uk


Project Overview

Accurate identification of patients with complex conditions within EHR systems is essential for high-quality epidemiological research. Traditional phenotyping methods rely heavily on manual expert interpretation and are often difficult to scale, particularly for diseases involving subtypes, diagnostic uncertainty, or multifaceted clinical histories.

This PhD project will investigate how Large Language Models, which can process and interpret large volumes of clinical information, can enhance and accelerate phenotyping workflows. The successful candidate will develop a transparent, reproducible framework for LLM-assisted phenotyping and apply it to create an Atlas of Complex Disease Phenotypes using the UK’s CPRD Aurum primary care database.

The work will generate a novel methodology of broad value to the health data science community, demonstrating the practical impact of advanced AI technologies on epidemiological research.


Key Objectives

  1. Review the current landscape of LLM applications in clinical classification and phenotyping, including the development of a living scoping review.

  2. Design a reproducible framework integrating clinical expertise with LLM-derived insights for high-quality phenotyping.

  3. Apply and validate the framework by creating phenotype definitions for a range of complex diseases in CPRD Aurum, likely within skin or inflammatory disease areas, depending on candidate and supervisor expertise.

  4. Evaluate epidemiological metrics (e.g. incidence, prevalence) produced using the new LLM-derived phenotypes and compare them with established methods.


Skills You Will Develop

  • Large Language Model evaluation, prompt design and fine-tuning

  • Electronic Health Record data management and analysis

  • Epidemiological study design and interpretation

  • Quantitative analysis using R or Python

  • Natural Language Processing methods for clinical text

  • Reproducible research and open science practices

This project aligns with MRC LID themes in Health Data Science, Translational & Implementation Research, and Global Health.


Data Resource

You will work primarily with CPRD Aurum, a comprehensive UK primary care dataset with linkages to hospital records and other health data sources. It includes diagnoses, symptoms, prescriptions, referrals and test results.


Eligibility & Entry Requirements

Applicants must meet LSHTM’s standard doctoral eligibility criteria and should demonstrate:

  • A strong quantitative background and health-related training

  • A Master’s degree in epidemiology, health data science, medical statistics, or a related field

  • Proficiency in Python or R


Study Format & Location

  • Full-time: Yes

  • Part-time: Yes

  • Primary location: LSHTM, Bloomsbury, London

  • Travel requirements: None beyond standard conference attendance (up to three conferences across the studentship)

Students funded through MRC LID are expected to work regularly on site.

