Selected Publications

View the full list of publications on my Google Scholar page.

2024

Joseph Marvin Imperial, Gail Forey, and Harish Tayyar Madabushi. Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Main). Miami, Florida. [pdf][website]

Joseph Marvin Imperial and Harish Tayyar Madabushi. SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Findings). Miami, Florida. [pdf]

Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, et al. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Main). Miami, Florida. [paper] [project]

Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, et al. Near to Mid-term Risks and Opportunities of Open Source Generative AI. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024, Top 1.5%). Vienna, Austria. [pdf]

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024 Main). Mexico. [pdf][code]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA Workshop 2024). [pdf]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. In Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (LREC - READI Workshop 2024), co-located with LREC-COLING 2024. [pdf]

Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bommasani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, et al. Introducing v0.5 of the AI Safety Benchmark from MLCommons. [paper] [project]

2023

Joseph Marvin Imperial, Harish Tayyar Madabushi. Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP - GEM Workshop 2023). Singapore. [pdf][code]

Joseph Marvin Imperial, Ekaterina Kochmar. BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Main). Singapore. [pdf][code]

Joseph Marvin Imperial, Harish Tayyar Madabushi. Uniform Complexity for Text Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Findings). Singapore. [pdf][code]

Joseph Marvin Imperial, Ekaterina Kochmar. Automatic Readability Assessment for Closely Related Languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 Findings). Toronto, Canada. [paper][code]

Ma. Beatrice Emanuela Pilar, Ellyza Mari Papas, Mary Loise Buenaventura, Dane Dedoroy, Myron Darrel Montefalcon, Jay Rhald Padilla, Lany Maceda, Mideth Abisado, Joseph Marvin Imperial. CebuaNER: A New Baseline Cebuano Named Entity Recognition Model. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. [paper][code]


2022

Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial. On Applicability of Neural Language Models for Readability Assessment in Filipino. In Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2022). Durham, United Kingdom. [pdf]

Joseph Marvin Imperial, Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien. A Baseline Readability Model for Cebuano. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA 2022). [pdf] [code]

Joseph Marvin Imperial. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (ACL - CMCL Workshop 2022). Dublin, Ireland. [pdf][code]

2021

Joseph Marvin Imperial and Ethel Ong. Under the Microscope: Interpreting Readability Assessment Models for Filipino. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Rachel Edita Roxas, Joseph Marvin Imperial, Angelica De La Cruz. Science Mapping of Publications in Natural Language Processing in the Philippines: 2006 to 2020. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Joseph Marvin Imperial. BERT Embeddings for Automatic Readability Assessment. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021). [pdf]

Joseph Marvin Imperial and Ethel Ong. Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts. In Proceedings of the International Conference on Computers in Education (ICCE 2021). [pdf]

2020

Joseph Marvin Imperial and Ethel Ong. Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability. In Proceedings of the International Conference on Asian Language Processing (IALP 2020). Monash University, Kuala Lumpur, Malaysia. [link] [pdf]

Joseph Marvin Imperial and Ethel Ong. Semi-Automatic Construction of Sight Words Dictionary for Filipino Text Readability. In Proceedings of Principles and Practice of Data and Knowledge Acquisition Workshop (PKAW 2020). Yokohama, Japan. [link] [pdf]

2019

Joseph Marvin Imperial, Rachel Edita Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almario. Developing a machine learning-based grade level classifier for Filipino children’s literature. In Proceedings of the International Conference on Asian Language Processing (IALP 2019). Shanghai, China. [link] [pdf]

2018

Joseph Marvin Imperial, Jeyrome Orosco, Shiela Mae Mazo, Lany Maceda. Sentiment Analysis of Typhoon Related Tweets using Standard and Bidirectional Recurrent Neural Networks. Presented at the Bicol University University-Wide Research and Development Colloquium 2018. BEST PAPER AWARD.