Publications

View the full, updated list of publications on my Google Scholar.

2025

Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Munoz Sanchez, Lingyun Gao, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Reynolds, Eugenio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas Francois, Fernando Alva-Manchego, Harish Tayyar Madabushi. UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment. (EMNLP 2025 Main). [pdf] [project] [dataset]

Lester James V. Miranda, Elyanah Aco, Conner Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial. FilBench: Can LLMs Understand and Generate Filipino? (EMNLP 2025 Main). [pdf] [code] [blog]

Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial (40+ authors). INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge. (ICLR 2025, Spotlight Top 5%). [pdf][dataset]

Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel (40+ authors). Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. (ACL 2025 Main). [pdf]

Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary (20+ authors). Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation. arXiv. [pdf]

Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade--Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, Usman Gohar, Ben Huang, Supheakmungkol Sarin, Elie Alhajjar, Canyu Chen, Roman Eng, Kashyap Ramanandula Manjusha, Virendra Mehta, Eileen Long, Murali Emani, Natan Vidra, Benjamin Rukundo, Abolfazl Shahbazi, Kongtao Chen, Rajat Ghosh, Vithursan Thangarasa, Pierre Peigné, Abhinav Singh, Max Bartolo, Satyapriya Krishna, Mubashara Akhtar, Rafael Gold, Cody Coleman, Luis Oala, Vassil Tashev, Joseph Marvin Imperial (20+ authors). AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons. arXiv. [pdf] [website]

Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Joseph Marvin Imperial (700+ additional authors not mentioned). Humanity's Last Exam. arXiv. [pdf][website]

Joseph Marvin Imperial, Matthew D. Jones, Harish Tayyar Madabushi. Standardizing Intelligence: Aligning Generative AI for Regulatory and Operational Compliance. arXiv. [pdf]

Joseph Marvin Imperial, Harish Tayyar Madabushi. Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces. arXiv. [pdf]

2024

Joseph Marvin Imperial, Gail Forey, and Harish Tayyar Madabushi. Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Main). Miami, Florida. [pdf][website]

Joseph Marvin Imperial and Harish Tayyar Madabushi. SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Findings). Miami, Florida. [pdf]

Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, + other authors. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Main). Miami, Florida. [paper] [project]

Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, et al. Near to Mid-term Risks and Opportunities of Open Source Generative AI. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024, Oral Top 1.5%). Vienna, Austria. [pdf]

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter. Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024 Main). Mexico. [pdf][code]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA Workshop 2024). [pdf]

Matthew Shardlow, Fernando Alva-Manchego, Riza Batista-Navarro, Stefan Bott, Saul Calderon Ramirez, Rémi Cardon, Thomas François, Akio Hayakawa, Andrea Horbach, Anna Huelsing, Yusuke Ide, Joseph Marvin Imperial, Adam Nohejl, et al. An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (LREC - READI Workshop 2024) co-located at LREC-COLING 2024. [pdf]

Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bommasani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial et al. Introducing v0.5 of the AI Safety Benchmark from MLCommons. arXiv. [paper] [project]

2023

Joseph Marvin Imperial, Harish Tayyar Madabushi. Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP - GEM Workshop 2023). Singapore. [pdf][code]

Joseph Marvin Imperial, Ekaterina Kochmar. BasahaCorpus: An Expanded Linguistic Resource for Readability Assessment in Central Philippine Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Main). Singapore. [pdf][code]

Joseph Marvin Imperial, Harish Tayyar Madabushi. Uniform Complexity for Text Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Findings). Singapore. [pdf][code]

Joseph Marvin Imperial, Ekaterina Kochmar. Automatic Readability Assessment for Closely Related Languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 Findings). Toronto, Canada. [paper][code]

Ma. Beatrice Emanuela Pilar, Ellyza Mari Papas, Mary Loise Buenaventura, Dane Dedoroy, Myron Darrel Montefalcon, Jay Rhald Padilla, Lany Maceda, Mideth Abisado, Joseph Marvin Imperial. CebuaNER: A New Baseline Cebuano Named Entity Recognition Model. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37). Hong Kong. [paper][code]

2022

Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien, Joseph Marvin Imperial. On Applicability of Neural Language Models for Readability Assessment in Filipino. In Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2022). Durham, United Kingdom. [pdf]

Joseph Marvin Imperial, Lloyd Lois Antonie Reyes, Michael Antonio Ibañez, Ranz Sapinit, Mohammed Hussien. A Baseline Readability Model for Cebuano. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (NAACL - BEA 2022). [pdf] [code]

Joseph Marvin Imperial. NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (ACL - CMCL Workshop 2022). Dublin, Ireland. [pdf][code]

2021

Joseph Marvin Imperial and Ethel Ong. Under the Microscope: Interpreting Readability Assessment Models for Filipino. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Rachel Edita Roxas, Joseph Marvin Imperial, Angelica De La Cruz. Science Mapping of Publications in Natural Language Processing in the Philippines: 2006 to 2020. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021). [pdf]

Joseph Marvin Imperial. BERT Embeddings for Automatic Readability Assessment. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021). [pdf]

Joseph Marvin Imperial and Ethel Ong. Diverse Linguistic Features for Assessing Reading Difficulty of Educational Filipino Texts. In Proceedings of the International Conference on Computers in Education (ICCE 2021). [pdf]

2020

Joseph Marvin Imperial and Ethel Ong. Exploring Hybrid Linguistic Feature Sets to Measure Filipino Text Readability. In Proceedings of the International Conference on Asian Language Processing (IALP 2020). Monash University, Kuala Lumpur, Malaysia. [link] [pdf]

Joseph Marvin Imperial and Ethel Ong. Semi-Automatic Construction of Sight Words Dictionary for Filipino Text Readability. In Proceedings of the Principles and Practice of Data and Knowledge Acquisition Workshop (PKAW 2020). Yokohama, Japan. [link] [pdf]

2019

Joseph Marvin Imperial, Rachel Edita Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almario. Developing a machine learning-based grade level classifier for Filipino children’s literature. In Proceedings of the International Conference on Asian Language Processing (IALP 2019). Shanghai, China. [link] [pdf]

2018

Joseph Marvin Imperial, Jeyrome Orosco, Shiela Mae Mazo, Lany Maceda. Sentiment Analysis of Typhoon Related Tweets using Standard and Bidirectional Recurrent Neural Networks. Presented at the Bicol University University-Wide Research and Development Colloquium 2018. BEST PAPER AWARD.
