April 2, 2025 - I was featured in an interview by AIHub about my presentation at AAAI 2025 on aligning generative AI with technical standards. See the post here.
February 20, 2025 - I'm excited to join the British Standards Institution (BSI) as a committee member and UK national expert for the AI working group ART/1, which mirrors CEN-CENELEC's Joint Technical Committee 21 (JTC 21) in developing standards for AI.
January 28, 2025 - Our frontier AI benchmark, Humanity's Last Exam (HLE), developed by the Center for AI Safety, has been released! I contributed 5 hard mathematics and linguistics questions to this benchmark for evaluating advanced AI capabilities. Access HLE here.
January 22, 2025 - Our paper INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge has been accepted as a Spotlight at ICLR 2025! Access the paper here.
November 4, 2024 - I recently passed my PhD candidacy exam, and my dissertation, entitled Natural Language Generation with Expert Standards, has been accepted to the AAAI 2025 Doctoral Consortium in Philadelphia, USA.
October 10, 2024 - I'm an invited speaker at Meta's Open Innovation AI Research Community (OIAIRC) Annual Research Workshop in London, where I will present my research on integrating industry standards into LLMs.
September 20, 2024 - I'm happy to have 3 long papers (2 Main, 1 Findings) accepted at EMNLP 2024, covering work on standardized NLG, benchmarking LLMs with specialized dictionaries, and our SEACrowd Project.
July 23, 2024 - I'm happy to receive the Best Reviewer Award (Top ~2% of 7,437 reviewers) at ICML 2024 in Vienna, Austria.
May 8, 2024 - I'm happy to receive the 2024 Doctoral Recognition Award from the University of Bath Doctoral College for my research in NLP.
May 2, 2024 - Our new position paper Near to Mid-term Risks and Opportunities of Open Source Generative AI has been accepted as an Oral (top 1.5% of submissions) at ICML 2024. The work was led by the University of Oxford and supported by Meta. The paper can be found here.
April 17, 2024 - Our new AI Safety Benchmark, containing 40k+ prompts for evaluating GenAI models across harm and hazard categories, has been featured in IEEE Spectrum and is available on arXiv. Work led by the MLCommons AI Safety Group.