Let’s start with the obvious question: what is a tokenizer? In Natural Language Processing (NLP), a tokenizer handles a text preprocessing step in which the text is split into tokens. Tokens can be sentences, words, or any other unit that makes up a text. Most NLP packages include a word tokenizer. But there […] The post Malayalam Subword Tokenizer appeared first on QBurst Blog.
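To make the idea of sentence and word tokens concrete, here is a minimal sketch built on Python's standard `re` module. The helpers `sentence_tokenize` and `word_tokenize` are hypothetical, hand-rolled stand-ins for illustration. They are not the API of any particular NLP package, and they use naive rules (split sentences after `.`, `!` or `?`; treat each punctuation mark as its own word token).

```python
import re


def sentence_tokenize(text: str) -> list[str]:
    # Hypothetical helper: split after ., ! or ? when followed by whitespace.
    # Real sentence tokenizers also handle abbreviations, quotes, etc.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokenize(text: str) -> list[str]:
    # Hypothetical helper: keep runs of word characters together and
    # emit every other non-space character (punctuation) as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    text = "Tokenizers split text. Tokens can be sentences or words!"
    print(sentence_tokenize(text))
    # ['Tokenizers split text.', 'Tokens can be sentences or words!']
    print(word_tokenize("Tokens can be words."))
    # ['Tokens', 'can', 'be', 'words', '.']
```

Rules this simple work for plain English text but break down quickly. In particular, Python's `\w` does not match Malayalam vowel signs, so a regex like this would cut Malayalam words apart.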