Projects

Building Annotated Datasets for Baseline NLP Models in Filipino

Principal Researcher 

Funding Agency: National University - Philippines

Status: Ongoing (September 2023 - July 2024)

This project begins a corpus-building initiative for the Filipino language, using a crowdsourcing platform to collect annotated data for structured Natural Language Processing tasks such as named entity recognition (NER) and lexical simplification (LS). Baseline models trained on these datasets will serve as benchmarks for subsequent research in the language processing field. In the interest of full transparency and reproducibility, all code and datasets produced by the project will be open-sourced for research and non-commercial use under permissive licenses.

A Text Complexity Perspective on Story Generation with Neural Language Models  

Principal Researcher 

Funding Agency: National University - Philippines

Status: Completed (December 2021 - June 2022)

This project investigates the text complexity aspect of story generation with large language models (GPT-3, GPT-2, and BERT). A wide variety of linguistic properties spanning lexical, semantic, syntactic, and discourse features are extracted from prompts (WritingPrompts dataset) and from both model-generated and human-written texts, then compared for similarities and differences.

A Novel Post-Processing Technique for Improving Readability Assessment of Texts in Multiple Languages  

Principal Researcher 

Funding Agency: National University - Philippines

Status: Completed (March 2021 - September 2021)

The project aims to explore novel approaches to alleviating the poor performance of machine learning and deep learning-based readability assessment models on low-resource languages such as Filipino. It also aims to test the efficacy and integrity of the proposed technique in other languages such as German and English.

Cross-Textual Analysis of COVID-19 Tweets: On Themes and Trends Over Time 

Principal Researcher 

Funding Agency: National University - Philippines

Status: Completed (June 2020 - December 2020)

The project aims to understand how Filipinos reacted as events unfolded in the Philippines during the COVID-19 pandemic. We analyzed two main aspects: the prominent themes of discussion and their trends of usage over time. We found that the main themes of discussion on Twitter were Filipinos (a) expressing positive emotions such as love, hope, and longing, (b) raising concerns over health and testing, and (c) sharing their experiences.

Project results were presented orally at the Information and Communication Technology in conjunction with the ICT Excellence Awards 2021, held in London, United Kingdom.

AH-NU FiTRI: Filipino Text Readability Index

Principal Researcher

Funding Agencies: Adarna Publishing House, National University - Philippines

Status: Completed (June 2018 - January 2019) 

The project, in partnership with the Department of Computer Science of National University – College of Computing and Information Technology and Adarna House Inc., developed and deployed readability index software that automatically classifies the grade-level difficulty, or readability, of Filipino text, specifically children's literature. The software is currently in use by Adarna House.