This Buwan ng Wika (National Language Month), I'm proud to introduce FilBench, a big step forward in Filipino NLP evaluation. Read on to learn more!| Lj Miranda
Just a fun weekend experiment on the Model Context Protocol (MCP): I asked several tool-calling LLMs to draw a 4-frame spritesheet of a swordsman performing a sl...| Lj Miranda
The rise of LLMs is forcing us to rethink Filipino NLP. But there's still a ton of work to do, just not the stuff you might think. Here's my take on what's worth doing, what's a waste of time, and where Filipino NLP research should be heading.| Lj Miranda
Can we spot differences between preference pairs just by looking at their word embeddings? In this blog post, I want to share my findings from examining lexical distances between chosen and rejected responses in preference datasets.| Lj Miranda
A few weeks ago, I gave a guest lecture at the University of North Carolina at Charlotte on how we can use large language models for annotation in the context of argument mining and fact verification. Here are the contents of that lecture in blog post format.| Lj Miranda
While cloning a repository from an organization, I encountered an SSH error that I'd never seen before. It turned out to be related to SAML SSO. I managed to solve it, so I'm documenting the steps here. Hope it helps you too!| Lj Miranda
Lately, I've been thinking a lot about visualizing datasets, and good old-fashioned t-SNE embeddings came to mind. In this blog post, indulge me as I examine a "data map" of our Tagalog NER dataset.| Lj Miranda
A collection of notes, projects, and essays.| Lj Miranda
Large language models have shown promise on structured prediction tasks like named entity recognition and text categorization. But how well do they perform when ...| Lj Miranda
A development log on the calamanCy project and the Tagalog NLP pipeline. The tl;dr: we just finished re-annotating the dataset. I also want to share my learn...| Lj Miranda
As an extension of my previous post on using LLMs to annotate argument mining datasets, I want to explore how we can incorporate annotation guidelines into a...| Lj Miranda
In this blog post, I want to demonstrate how we can leverage large language models like GPT-3 as a viable affordance to reduce a human annotator's cognitive ...| Lj Miranda
In the age of big data and large language models, building NLP pipelines for Tagalog is still difficult. In this blog post, I'll report my progress on buildi...| Lj Miranda