NU Human Language Technology Lab
Welcome!
This is the official site of National University Philippines' Human Language Technology (NU HLT) Lab. We are one of the few labs in the Philippines dedicated to state-of-the-art and applied research on Natural Language Processing (NLP), Computational Linguistics (CL), and Machine Learning (ML).
Currently, we are focused on the following areas:
NLP for Social Media Understanding
NLP for Educational Applications
NLP for Low-Resource Corpora and Dataset Building
The lab is currently managed remotely by Joseph Marvin Imperial.
Joining (for incoming undergraduate/graduate students): If you're interested in working with NU HLT, you may enroll in National University's graduate programs under the MS or PhD in Computer Science. We also advise you to check the NU Admissions Page and the Graduate Program's FB Page for more information.
Joining (for current NU students): Send an email to Joseph Imperial regarding your interest/intent to pursue an NLP-related undergraduate or graduate thesis.
Publications
Take a look at our track record of published NLP research at high-impact venues. We have more than 20 NLP papers published across a number of high-ranking conferences, including ACL, AIED, EMNLP, NAACL, PACLIC, and ICML.
2024
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, et al. Near to Mid-term Risks and Opportunities of Open Source Generative AI. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024 Oral, Top 1.5%). Vienna, Austria. [pdf]
Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). Mexico. [pdf][code]
Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA Workshop 2024). [pdf]
Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (LREC - READI Workshop 2024) co-located at LREC-COLING 2024. [pdf]
2023
Joseph Marvin Imperial, Ekaterina Kochmar. BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Singapore. [pdf][code]
Joseph Marvin Imperial, Ekaterina Kochmar. Automatic Readability Assessment for Closely Related Languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Toronto, Canada. [paper][code]
Ma. Beatrice Emanuela Pilar, Ellyza Mari Papas, Mary Loise Buenaventura, Dane Dedoroy, Myron Darrel Montefalcon, Jay Rhald Padilla, Lany Maceda, Mideth Abisado, Joseph Marvin Imperial. CebuaNER: A New Baseline Cebuano Named Entity Recognition Model. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. [paper][code]
Alex Hernandez, Mideth Abisado, Ramon Rodriguez, Joseph Marvin Imperial. Predicting the Use Behavior of Higher Education Students on ChatGPT: Evidence from the Philippines. In 2023 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE 2023). IEEE. [paper]
Cerwin Dexter Dela Rosa, Kreed Zion Lorenzo Lagunilla, Jomari Ramos, Austin Kenneth V San Pedro, Joseph Marvin Imperial. Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition. Third International Conference on Computer Vision and Information Technology (CVIT 2022). [paper]
Gizelle Ponce, Mideth Abisado, Lany Maceda, Ramon Rodriguez, Joseph Marvin Imperial, Myron Darrel Montefalcon, Jay Rhald Padilla. Discovering Insights via Hybrid Thematic Analysis: A Case Study on Disaster Risk Reduction and Management for Legazpi City, Albay. 5th International Conference on Machine Learning and Intelligent Systems (MLIS 2023). [paper]
Lamar Clarence Cruz, Jessica Nicole Dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez. Changing Topics for Changing Times: Thematic and Temporal-Based Analysis of the Philippine Senatorial and Midterm Elections. 6th International Conference on Natural Language Processing and Information Retrieval (NLPIR 2023). [paper]
2022
Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial. On Applicability of Neural Language Models for Readability Assessment in Filipino. In Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2022). Durham, United Kingdom. [pdf]
Joseph Marvin Imperial, Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien. A Baseline Readability Model for Cebuano. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA 2022). [pdf] [code]
Joseph Marvin Imperial. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (ACL - CMCL Workshop 2022). Dublin, Ireland. [pdf][code]
Lamar Clarence Cruz, Jessica Nicole dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez. Is Twitter an Echo Chamber? Connecting Online Public Sentiments to Actual Results From the 2019 Philippine Midterm Elections. International Conference on Asian Language Processing (IALP 2022). [paper]
Jomari Valmadrid Ramos, John Michael Ballesta, Andrew Kobe Lee Yam, Moises Kairon Mogol, Ramon Rodriguez, Joseph Marvin Imperial. WikAnalytics: A Web-based Application for Identifying Linguistic Features of a Text Group Supporting Filipino, English, and Taglish Languages. 5th International Conference on Machine Learning and Machine Intelligence (MLMI 2022). [paper]
2021
Joseph Marvin Imperial and Ethel Ong. Under the Microscope: Interpreting Readability Assessment Models for Filipino. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]
Rachel Edita Roxas, Joseph Marvin Imperial, Angelica De La Cruz. Science Mapping of Publications in Natural Language Processing in the Philippines: 2006 to 2020. In Proceedings of the 35th Pacific Asia Conference on Language, Information, and Computation (PACLIC 2021). [pdf]
Joseph Marvin Imperial. BERT Embeddings for Automatic Readability Assessment. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021). [pdf]
Joseph Marvin Imperial and Ethel Ong. Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts. In Proceedings of the International Conference on Computers in Education (ICCE 2021). [pdf]
Myron Darrel Montefalcon, Jay Rhald Padilla, Joshua Paulino, Jeline Go, Ramon Llabanes Rodriguez, Joseph Marvin Imperial. Understanding Facial Expression Expressing Hate from Online Short-form Videos. International Conference on E-Society, E-Education and E-Technology (ESET 2021). [paper]
Rommel Hernandez Urbano Jr, Jeffrey Uy Ajero, Angelic Legaspi Angeles, Maria Nikki Hacar Quintos, Joseph Marvin Regalado Imperial, Ramon Llabanes Rodriguez. A BERT-based Hate Speech Classifier from Transcribed Online Short-Form Videos. International Conference on E-Society, E-Education and E-Technology (ESET 2021). [paper]
2020
Clark Emmanuel Paulo, Arvin Ken Ramirez, David Clarence Reducindo, Rannie Mark Mateo, Joseph Marvin Imperial. A Simple Disaster-Related Knowledge Base for Intelligent Agents. In Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC 2020). [pdf]
Joseph Marvin Imperial and Ethel Ong. Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability. In Proceedings of the International Conference on Asian Language Processing (IALP 2020). Monash University, Kuala Lumpur, Malaysia. [link] [pdf]
Joseph Marvin Imperial and Ethel Ong. Semi-Automatic Construction of Sight Words Dictionary for Filipino Text Readability. In Proceedings of Principles and Practice of Data and Knowledge Acquisition Workshop (PKAW 2020). Yokohama, Japan. [link] [pdf]
2019
Joseph Marvin Imperial, Rachel Edita Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almario. Developing a machine learning-based grade level classifier for Filipino children’s literature. In Proceedings of the International Conference on Asian Language Processing (IALP 2019). Shanghai, China. [link] [pdf]
John Daniel Valencia, Laure Al Joseph, Niño Mark Centino, Bernie Fabito, Joseph Marvin Imperial, Ramon Rodriguez, Angelica De La Cruz, Manolito Octaviano, Marilou Jamis. Understanding Anonymous Social Media Posts using Topic Modeling. IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM 2019). [paper]
Open Language Resources
Over the years, our members have contributed to a number of high-impact language resources and corpora from our own research and collaborations. Feel free to use the following for your own research, but don't forget to cite the original papers!
Readability Assessment for Philippine Languages (Tagalog, Cebuano, Bikol, Hiligaynon, Minasbate, Karay-a, and Rinconada) - https://github.com/imperialite/ara-close-lang and https://github.com/imperialite/BasahaCorpus-HierarchicalCrosslingualARA
Expanded Readability Features for Filipino - https://github.com/imperialite/filipino-linguistic-extractors
CebuaNER: A New Baseline Cebuano Named Entity Recognition Model - https://github.com/mebzmoren/CebuaNER
Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark (w/ Filipino and Cebuano data) - https://www.universalner.org/
Multilingual Lexical Simplification (w/ Filipino Data) - https://github.com/MLSP2024/MLSP_Data/tree/main
Cohere for AI - Large Dataset for Instruction Tuning (w/ Filipino Data) - https://huggingface.co/datasets/CohereForAI/aya_dataset
Filipino Hate Speech Text Dataset from Tiktok - https://github.com/imperialite/filipino-tiktok-hatespeech
Philippine Languages Online Corpora - https://github.com/imperialite/Philippine-Languages-Online-Corpora
SeaCrowd Data Repository (w/ Philippine language data) - https://github.com/SEACrowd/seacrowd-datahub
Annotated Filipino Tweets in Multiple Domains (Natural Disasters, Politics, Disease) - https://github.com/imperialite/Philippine-Languages-Online-Corpora/tree/master/Tweets
Members and Collaborators
We are extremely grateful for the following past and current members and collaborators of the NU HLT Lab.
Current Members and Collaborators
Joseph Marvin Imperial (current lab manager)
Dr. Mideth Abisado
Ramon Rodriguez
Mico Magtira
John Carlo Jimenez
Aiken Gunay
Dr. Lany Maceda
Dr. Rodolfo Raga
Dr. Vladimir Mariano
Past Members and Collaborators
Jay Rhald Padilla
Myron Darrel Montefalcon
Shayne Maglangit
Jomari Ramos
Jessica Nicole Dela Cruz
Clarence Lamar Cruz
Michael Ibañez
Ranz Sapinit
Lloyd Antonie Reyes
Mohammed Hussien
Cerwin Dexter Dela Rosa
Kreed Zion Lorenzo Lagunilla
Austin Kenneth San Pedro
Ma. Beatrice Emanuela Pilar (Silliman University)
Ellyza Mari Papas (Silliman University)
Mary Loise Buenaventura (Silliman University)
Dane Dedoroy (Silliman University)
Clark Emmanuel Paulo
Arvin Ken Ramirez
David Clarence Reducindo
Rannie Mark Mateo
Erica Mae Campos
Jemelee Oandasan
Reyniel Caraballo
Ferry Winsley Sabdani
Jesvir Zuniega
Dr. Ani Rosa Almario
Dr. Rachel Edita Roxas
Dr. Alex Hernandez
Nathaniel Oco
Manolito Octaviano Jr.
Angelica De La Cruz
Contact
You may contact or send inquiries to:
Joseph Marvin R. Imperial
Senior NLP Researcher and Faculty Member
jrimperial@national-u.edu.ph