NU Human Language Technology Lab

Welcome!

This is the official site of National University Philippines' Human Language Technology (NU HLT) Lab. We are one of the few labs in the Philippines dedicated to state-of-the-art and applied research on Natural Language Processing (NLP), Computational Linguistics (CL), and Machine Learning (ML).

Currently, we are focused on the following areas:

Multilingual Low-Resource Corpora and Dataset Building
AI Safety, Science of Evaluations, and Benchmarking
Inter- and Multidisciplinary Applications of NLP

The lab is currently managed remotely by Joseph Marvin Imperial.

Joining (for incoming undergraduate/graduate students): If you're interested in working with NU HLT, you may enroll in National University's graduate programs under MS or PhD in Computer Science. We also advise you to take a look at the NU Admissions Page and the Graduate Program's FB Page for more information.

Joining (for current NU students): Send an email to Joseph Imperial regarding your interest/intent to pursue an NLP-related undergraduate or graduate thesis.

Our Research
Take a look at our track record of NLP research published at high-impact venues. We have more than 30 NLP papers published across a number of high-ranking conferences, including ACL, AIED, EMNLP, NAACL, PACLIC, and ICML.

2025

Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Munoz Sanchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Reynolds, Eugenio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas Francois, Fernando Alva-Manchego, Harish Tayyar Madabushi. UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. (EMNLP 2025 Main). [pdf] [project] [dataset]

Lester James V. Miranda, Elyanah Aco, Conner Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial. FilBench: Can LLMs Understand and Generate Filipino? (EMNLP 2025 Main). [pdf] [code] [blog]

Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial (40+ authors). INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge. (ICLR 2025, Spotlight Top 5%). [pdf][dataset]

Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel (40+ authors). Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. (ACL 2025 Main). [pdf]

2024

Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, et al. Near to Mid-term Risks and Opportunities of Open Source Generative AI. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024 Oral, Top 1.5%). Vienna, Austria. [pdf]

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico. [pdf][code]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA Workshop 2024). [pdf]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (LREC - READI Workshop 2024), co-located with LREC-COLING 2024. [pdf]

2023

Joseph Marvin Imperial, Ekaterina Kochmar. BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore. [pdf][code]

Joseph Marvin Imperial, Ekaterina Kochmar. Automatic Readability Assessment for Closely Related Languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada. [paper][code]

Ma. Beatrice Emanuela Pilar, Ellyza Mari Papas, Mary Loise Buenaventura, Dane Dedoroy, Myron Darrel Montefalcon, Jay Rhald Padilla, Lany Maceda, Mideth Abisado, Joseph Marvin Imperial. CebuaNER: A New Baseline Cebuano Named Entity Recognition Model. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. [paper][code]

Alex Hernandez, Mideth Abisado, Ramon Rodriguez, Joseph Marvin Imperial. Predicting the Use Behavior of Higher Education Students on ChatGPT: Evidence from the Philippines. In 2023 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE 2023). IEEE. [paper]

Cerwin Dexter Dela Rosa, Kreed Zion Lorenzo Lagunilla, Jomari Ramos, Austin Kenneth V San Pedro, Joseph Marvin Imperial. Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition. Third International Conference on Computer Vision and Information Technology (CVIT 2022). [paper]

Gizelle Ponce, Mideth Abisado, Lany Maceda, Ramon Rodriguez, Joseph Marvin Imperial, Myron Darrel Montefalcon, Jay Rhald Padilla. Discovering Insights via Hybrid Thematic Analysis: A Case Study on Disaster Risk Reduction and Management for Legazpi City, Albay. 5th International Conference on Machine Learning and Intelligent Systems (MLIS 2023). [paper]

Lamar Clarence Cruz, Jessica Nicole Dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez. Changing Topics for Changing Times: Thematic and Temporal-Based Analysis of the Philippine Senatorial and Midterm Elections. 6th International Conference on Natural Language Processing and Information Retrieval (NLPIR 2023). [paper]

2022

Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial. On Applicability of Neural Language Models for Readability Assessment in Filipino. In Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2022). Durham, United Kingdom. [pdf]

Joseph Marvin Imperial, Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien. A Baseline Readability Model for Cebuano. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA 2022). [pdf] [code]

Joseph Marvin Imperial. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (ACL - CMCL Workshop 2022). Dublin, Ireland. [pdf][code]

Lamar Clarence Cruz, Jessica Nicole dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez. Is Twitter an Echo Chamber? Connecting Online Public Sentiments to Actual Results From the 2019 Philippine Midterm Elections. International Conference on Asian Language Processing (IALP 2022). [paper]

Jomari Valmadrid Ramos, John Michael Ballesta, Andrew Kobe Lee Yam, Moises Kairon Mogol, Ramon Rodriguez, Joseph Marvin Imperial. WikAnalytics: A Web-based Application for Identifying Linguistic Features of a Text Group Supporting Filipino, English, and Taglish Languages. 5th International Conference on Machine Learning and Machine Intelligence (MLMI 2022). [paper]

2021

Joseph Marvin Imperial and Ethel Ong. Under the Microscope: Interpreting Readability Assessment Models for Filipino. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Rachel Edita Roxas, Joseph Marvin Imperial, Angelica De La Cruz. Science Mapping of Publications in Natural Language Processing in the Philippines: 2006 to 2020. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Joseph Marvin Imperial. BERT Embeddings for Automatic Readability Assessment. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021). [pdf]

Joseph Marvin Imperial and Ethel Ong. Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts. In Proceedings of the International Conference on Computers in Education (ICCE 2021). [pdf]

Myron Darrel Montefalcon, Jay Rhald Padilla, Joshua Paulino, Jeline Go, Ramon Llabanes Rodriguez, Joseph Marvin Imperial. Understanding Facial Expression Expressing Hate from Online Short-form Videos. International Conference on E-Society, E-Education and E-Technology (ESET 2021). [paper]

Rommel Hernandez Urbano Jr, Jeffrey Uy Ajero, Angelic Legaspi Angeles, Maria Nikki Hacar Quintos, Joseph Marvin Regalado Imperial, Ramon Llabanes Rodriguez. A BERT-based Hate Speech Classifier from Transcribed Online Short-Form Videos. International Conference on E-Society, E-Education and E-Technology (ESET 2021). [paper]

2020

Clark Emmanuel Paulo, Arvin Ken Ramirez, David Clarence Reducindo, Rannie Mark Mateo, Joseph Marvin Imperial. A Simple Disaster-Related Knowledge Base for Intelligent Agents. In Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC 2020). [pdf]

Joseph Marvin Imperial and Ethel Ong. Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability. In Proceedings of the International Conference on Asian Language Processing (IALP 2020). Monash University, Kuala Lumpur, Malaysia. [link] [pdf]

Joseph Marvin Imperial and Ethel Ong. Semi-Automatic Construction of Sight Words Dictionary for Filipino Text Readability. In Proceedings of Principles and Practice of Data and Knowledge Acquisition Workshop (PKAW 2020). Yokohama, Japan. [link] [pdf]

2019

Joseph Marvin Imperial, Rachel Edita Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almario. Developing a machine learning-based grade level classifier for Filipino children’s literature. In Proceedings of the International Conference on Asian Language Processing (IALP 2019). Shanghai, China. [link] [pdf]

John Daniel Valencia, Laure Al Joseph, Niño Mark Centino, Bernie Fabito, Joseph Marvin Imperial, Ramon Rodriguez, Angelica De La Cruz, Manolito Octaviano, Marilou Jamis. Understanding Anonymous Social Media Posts using Topic Modeling. IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM 2019). [paper]

Open Language Resources
Over the years, our members have contributed to a number of high-impact language resources and corpora from our own research and collaborations. Feel free to use the following for your own research, but don't forget to cite the original papers!

FilBench - https://huggingface.co/blog/filbench
UniversalCEFR - https://universalcefr.github.io/
Readability Assessment for Philippine Languages (Tagalog, Cebuano, Bikol, Hiligaynon, Minasbate, Karay-a, and Rinconada) - https://github.com/imperialite/ara-close-lang and https://github.com/imperialite/BasahaCorpus-HierarchicalCrosslingualARA
Expanded Readability Features for Filipino - https://github.com/imperialite/filipino-linguistic-extractors
CebuaNER: A New Baseline Cebuano Named Entity Recognition Model - https://github.com/mebzmoren/CebuaNER
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark (w/ Filipino and Cebuano data) - https://www.universalner.org/
Multilingual Lexical Simplification (w/ Filipino Data) - https://github.com/MLSP2024/MLSP_Data/tree/main
Cohere for AI - Large Dataset for Instruction Tuning (w/ Filipino Data) - https://huggingface.co/datasets/CohereForAI/aya_dataset
Filipino Hate Speech Text Dataset from TikTok - https://github.com/imperialite/filipino-tiktok-hatespeech
Philippine Languages Online Corpora - https://github.com/imperialite/Philippine-Languages-Online-Corpora
SEACrowd Data Repository (w/ Philippine language data) - https://github.com/SEACrowd/seacrowd-datahub
Annotated Filipino Tweets in Multiple Domains (Natural Disasters, Politics, Disease) - https://github.com/imperialite/Philippine-Languages-Online-Corpora/tree/master/Tweets

Members and Collaborators
We are extremely grateful to the following current and past students, faculty members, and collaborators of the NU HLT Lab.

Current Members and Faculty Collaborators

Dr. Mideth Abisado
Ramon Rodriguez
Mico Magtira
John Carlo Jimenez
Aiken Gunay
Dr. Lany Maceda (Bicol University)
Dr. Rodolfo Raga (Jose Rizal University)
Dr. Vladimir Mariano (YSEALI Academy of Fulbright University Vietnam)

Past / Graduated Members and Faculty Collaborators

Gian Tan (2025)
Nick Zudeima (2025)
Sheikha Encabo (2025)
James Ald Teves (Silliman University - Internship)
Ray Daniel Cal (Silliman University - Internship)
Josh Villaluz (Silliman University - Internship)
Shayne Maglangit (2023)
Jomari Ramos (2023)
Jessica Nicole Dela Cruz (2023)
Clarence Lamar Cruz (2023)
Cerwin Dexter Dela Rosa (2023)
Kreed Zion Lorenzo Lagunilla (2023)
Austin Kenneth San Pedro (2023)
Michael Ibañez (2022)
Jay Rhald Padilla (2022)
Myron Darrel Montefalcon (2022)
Ranz Sapinit (2022)
Lloyd Antonie Reyes (2022)
Mohammed Hussien (2022)
Ma. Beatrice Emanuela Pilar (Silliman University - Internship)
Ellyza Mari Papas (Silliman University - Internship)
Mary Loise Buenaventura (Silliman University - Internship)
Dane Dedoroy (Silliman University - Internship)
Clark Emmanuel Paulo (2020)
Arvin Ken Ramirez (2020)
David Clarence Reducindo (2020)
Rannie Mark Mateo (2020)
Erica Mae Campos (2019)
Jemelee Oandasan (2019)
Reyniel Caraballo (2019)
Ferry Winsley Sabdani (2019)
Dr. Ani Rosa Almario (Adarna House Inc.)
Dr. Rachel Edita Roxas (Former VP for Research, now at UP Los Baños)
Dr. Alex Hernandez (Former Faculty Member, now at LPU)
Nathaniel Oco (Former Lab Head, now at DLSU)
Manolito Octaviano Jr. (Former Faculty Member)
Angelica De La Cruz (Former Faculty Member)

Contact

You may contact or send inquiries to:

Joseph Marvin R. Imperial
Senior NLP Researcher and Faculty Member
jrimperial@national-u.edu.ph
