Architecting a platform to enable deployment and maintenance of machine learning models is not as straightforward as conventional software architecture. The most common misconception among software developers and data scientists is that a machine learning project lifecycle consists of just training a single successful model and deploying it in a service. The real-world scenario is much more complicated than this. In fact, assuming that there would always be a single model in service for a p...| Sujay S Kumar
Evaluating language understanding is as difficult as elucidating the meaning of the word “understanding” itself. Before getting into evaluating computer models on language understanding, let’s explore how we evaluate human “understanding” of natural language. How do we evaluate whether a person understands a particular language? Do we emphasize the person’s memory of the meanings of different words? Or do we emphasize the person’s ability to construct sequences of words/to...
The size of state-of-the-art (SOTA) neural networks is growing bigger every day. Most SOTA models now have more than 1 billion parameters.
This is a running summary of the NLP textbook by Jacob Eisenstein. This blog post adds a bit of my personal take on his ideas.
Link to start of the blog series: Interpretable ML
One of the main impediments to the wide adoption of machine learning, especially deep learning, in critical (and commercial) applications is the apparent lack of trust accorded to these machine learning applications. This distrust mainly stems from the inability to reason about the outputs produced by these models. This phenomenon is not restricted to those outside the machine learning domain either. Even seasoned machine learning practitioners are flummoxed by the apparent failin...
This blog post is a summary of the case study made by Google when it first launched its voice services. For any company/person trying to undertake this endeavor of building a voice product, this early-stage case study is a huge knowledge bank. For one, it is always better to learn from someone else’s mistakes rather than our own, and two, the majority of the underlying architecture of Google’s voice platform has remained the same over the years. Going through this case study gives us an impo...
Package management is one of the components that contribute to the steep learning curve of using a Linux system for a majority of first-time Linux users. Coming from the GUI environment of Windows or macOS, where installing/removing applications consists of clicking on Next -> Next -> I Agree -> Install/Uninstall, the terminal-based package managers might be intimidating. And this issue is not constrained to first-time users either. Even though I have used a Linux system as my primary OS for se...
Link to the start of ASR series: Automatic Speech Recognition (ASR Part 0)
Automatic Speech Recognition (ASR) systems are used for transcribing spoken text into words/sentences. ASR systems are complex, consisting of multiple components working in tandem to produce a transcription. In this blog series, I will be exploring the different components of a generic ASR system (although I will be using Kaldi for some references).
This blog is a summary of the ideas outlined in Chomsky’s Syntactic Structures.
In the previous blog post on Transfer Learning, we discovered how pre-trained models can be leveraged in our applications to save on training time, data, compute and other resources, along with the added benefit of better performance. In this blog post, I will be demonstrating how to use ELMo Embeddings in Keras.
Word embeddings are the staple of any Natural Language Processing (NLP) task. In fact, representation of words in the form of vectors is probably the first step in building any NLP application. These vector representations of words fall on a wide spectrum of semantic encoding, with one-hot representations at one end, encoding absolutely nothing about the semantic relationships between words, while the other end of the spectrum is still an active area of research, with ELMo embeddin...
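The "encodes nothing" end of that spectrum is easy to see concretely. Here is a minimal sketch, using a hypothetical three-word vocabulary: the dot product (and hence cosine similarity) between any two distinct one-hot vectors is always zero, so "king" is exactly as far from "queen" as it is from "apple".

```python
# Hypothetical toy vocabulary, purely for illustration.
vocab = ["king", "queen", "apple"]

def one_hot(word, vocab):
    """Return the one-hot vector for `word` over `vocab`."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

king, queen, apple = (one_hot(w, vocab) for w in vocab)

# Every pair of distinct words is equally (and maximally) dissimilar.
print(dot(king, queen))  # -> 0
print(dot(king, apple))  # -> 0
print(dot(king, king))   # -> 1
```

This is precisely the limitation that learned embeddings like Word2Vec and ELMo address: they place related words near each other in a dense vector space instead.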
Conditional Random Field (CRF) is the go-to algorithm for sequence labeling problems. Initial attempts at sequence labeling, such as POS tagging and Named Entity Recognition, were accomplished using Hidden Markov Models. Although HMMs gave promising results for these tasks, they suffered from the same drawback as Naive Bayes models, i.e., the conditional independence assumption. Both Naive Bayes and, by extension, HMMs try to fit a model that maximizes the joint probability distribution ( P(X , Y) ). Th...
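The joint-versus-conditional distinction in the teaser above can be sketched with simple counting over a hypothetical toy dataset of (word, tag) pairs: a generative model estimates P(X, Y) over all pairs, while a discriminative model like a CRF estimates P(Y | X), normalizing only over the labels for a given input.

```python
from collections import Counter

# Hypothetical labeled data: (word, tag) pairs, purely for illustration.
data = [("run", "VERB"), ("run", "NOUN"), ("run", "VERB"),
        ("dog", "NOUN"), ("dog", "NOUN"), ("fast", "ADV")]

joint_counts = Counter(data)                 # counts of (x, y) pairs
x_counts = Counter(x for x, _ in data)       # counts of inputs alone
n = len(data)

def p_joint(x, y):
    """Generative view: P(X = x, Y = y), normalized over ALL pairs."""
    return joint_counts[(x, y)] / n

def p_cond(y, x):
    """Discriminative view: P(Y = y | X = x), normalized over labels of x."""
    return joint_counts[(x, y)] / x_counts[x]

print(p_joint("run", "VERB"))  # 2 of 6 pairs -> approximately 0.333
print(p_cond("VERB", "run"))   # 2 of 3 "run" tokens -> approximately 0.667
```

Real CRFs replace these raw counts with weighted feature functions over the whole sequence, but the normalization difference shown here is the core of the argument.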
We are fully aware of the marked influence the introduction of the Word2Vec method of word embedding had on the Natural Language Processing domain. It was a huge leap forward from the hitherto constricting methods of word embedding, namely Term Frequency (TF) and Inverse Document Frequency (IDF). Neither of these methods came anywhere close to preserving the semantics of the words in their representations. With the introduction of Word2Vec and the possibility of semantic embedding in the vectors, ...
Hidden Markov Models (HMM) were the mainstay of generative models a couple of years ago. Even though more sophisticated Deep Learning generative models have emerged, we cannot rule out the effectiveness of the humble HMM. After all, one of the most widely known principles (Occam’s Razor) states that if you have a number of competing hypotheses, the simplest one is the best one. The purpose of this blog post is to explore the mathematical basis on which HMMs are built.
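The mathematical core the post refers to is compact enough to sketch directly. Assuming a hypothetical two-state weather HMM (the parameter values below are invented for illustration), the joint probability of a state sequence and an observation sequence factorizes as the start probability times a chain of transition and emission probabilities.

```python
# Hypothetical two-state HMM; all probabilities are made up for illustration.
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
         "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def joint_prob(states, obs):
    """P(states, obs) = pi(s1) * b(s1, o1) * prod a(s_{t-1}, s_t) * b(s_t, o_t)."""
    p = start[states[0]] * emit[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= trans[prev][cur] * emit[cur][o]
    return p

# 0.6 * 0.5 * 0.3 * 0.6 = 0.054
p = joint_prob(["Rainy", "Sunny"], ["clean", "walk"])
print(round(p, 4))  # -> 0.054
```

Everything else in HMM theory (the forward algorithm, Viterbi decoding, Baum-Welch training) is built on efficiently summing or maximizing this same factorized product.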
I recently came across a blog post by Francois Chollet, the creator of Keras, where he explores the limitations of deep learning methods. It is an extremely informative piece, which I would recommend readers go through before continuing further. I, personally, am guilty of overestimating the capabilities of deep learning for machine learning tasks. Theoretically, a recurrent neural network can be considered a Turing-complete machine. To put it in a simpler phrase, every Turing Mac...
One of the major drawbacks of neural networks (or any machine learning model, in general) is the inability to handle data with huge dimensions effectively. In this blog post, I will be exploring the technique of neural embedding, which is a variation on using auto-encoders for dimensionality reduction. Data with high dimensions pose a unique problem to any statistical analysis, as the volume of the vector space under consideration increases exponentially with the number of dimensions. ...
In machine learning, there are two primary categories of models: generative models and discriminative models. Discriminative models strive to assign a given input to one of a set of output classes depending on the type of input data. A generative model, on the other hand, does not have a set of output classes that it has to categorize the data into. A generative model, as its name suggests, tries to generate data that fits the distribution exhibited by the input data. Mathematically, w...
One of the most striking observations I have made in the past couple of years in the Machine Learning domain is the gradual shift in the demographics of ML engineers from academia to industry. A couple of years ago, ML and AI applications were built by a group consisting predominantly of academicians and researchers in the ML domain. This could be seen from the mass hiring of entire ML departments, students and teachers included, from colleges like MIT by companies like Google, Apple etc....
In my previous blog post, I explored some of the early ways of word embedding and their shortcomings. The purpose of this post is to explore one of the most widely used word representations in the natural language processing industry today. Word2Vec was created by a team of researchers led by Tomas Mikolov at Google. According to Wikipedia, Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to r...
One of the most important aspects of any Natural Language Processing task is the representation of the words. This throws up a unique problem: how to represent words in vector form. Words are ambiguous and can have multiple, complex relationships with other words. Additionally, the number of words in the vocabulary of any given language is on the order of hundreds of thousands. One of the earliest attempts at word representation was made by WordNet, where each word was represented discretely and...
Evaluating a classification model is fairly straightforward and simple: you just count how many of the classifications the model got right and how many it didn’t. Evaluating a regression model is not that straightforward, at least from my perspective. One of the most useful metrics, used by a majority of implementations, is R-squared. What is R-squared? R-squared is a goodness-of-fit measure that evaluates how well your model fits the data. It is also known as the coefficient of dete...
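The metric named in the teaser has a short closed form, R² = 1 − SS_res / SS_tot, which can be sketched in a few lines. The data values below are hypothetical, chosen only to exercise the formula.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]

print(r_squared(y_true, y_pred))  # -> approximately 0.995
```

A value of 1 means the predictions explain all of the variance in the targets; 0 means the model does no better than always predicting the mean.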
Two of the main validation techniques for CART models are Out-Of-Bag (OOB) validation and k-Fold validation. OOB is used mainly for Random Forests; k-Fold is used mainly for XGB models. Out-Of-Bag (OOB) Validation: OOB validation is a technique where the samples not used in the construction of the current tree become the test set for that tree. As we know, in a random forest, a random selection of data and/or variables is chosen as a subset for training each tree. This means tha...
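The bootstrap mechanics behind OOB validation can be sketched with the standard library alone: for each tree, the training indices are sampled with replacement, and whatever indices are never drawn form that tree's out-of-bag test set. The sample and tree counts below are arbitrary.

```python
import random

random.seed(0)  # reproducible illustration
n_samples, n_trees = 10, 5

for tree in range(n_trees):
    # Bootstrap: draw n_samples indices WITH replacement.
    bag = [random.randrange(n_samples) for _ in range(n_samples)]
    # Out-of-bag: indices never drawn for this tree -- its free test set.
    oob = sorted(set(range(n_samples)) - set(bag))
    print(f"tree {tree}: in-bag {sorted(set(bag))}, out-of-bag {oob}")
```

On average about 1/e (roughly 37%) of the samples end up out of bag for any given tree, which is what makes OOB error a usable validation estimate without holding out a separate set.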
Decision Trees are one of the most intuitive models in the world of perplexing and obscure ML models. This is because of the similarity between the human decision-making process and a decision tree. A decision tree can be visualized, and we can actually see how a computer arrived at a decision, which is rather difficult in the case of other models. Hence, it is also called a white-box model. The purpose of this post is to explore some of the intuition behind building a standalone decision tree and ...
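A piece of that intuition can be sketched numerically. One common split criterion (Gini impurity; CART also supports entropy) scores how mixed a node's labels are, and the tree greedily picks the split with the lowest weighted impurity. The label sets below are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gini(left, right):
    """Weighted Gini impurity of a candidate split into two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(gini(["a", "a", "a"]))               # pure node        -> 0.0
print(gini(["a", "a", "b", "b"]))          # 50/50 node       -> 0.5
print(split_gini(["a", "a"], ["b", "b"]))  # perfect split    -> 0.0
```

Reading a trained tree top-down is just replaying these greedy choices, which is why the model is so easy to visualize and audit.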
The US government has officially declared that its process of awarding H1B visas is based on a lottery system, i.e., a random process. Therefore, as a random project, I decided to cross-verify the claim. Is it truly random, or is there an underlying pattern that is not apparent? The US government releases the data of all applications for H1B and the status of whether they were certified or not. You can find the data at the United States Department of Labour. I built a neural network in order t...
One of the important aspects of building a machine learning model is to understand the data first. Most of us forget this and jump right into modelling. A corollary to this is that we oftentimes forget to build a baseline model before building something complicated. What is a Baseline Model and a Baseline Accuracy? A baseline model, in simple words, is the simplest model that you can build over the provided data. The accuracy achieved by a baseline model is the lower bound f...
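For classification, the simplest such baseline is usually the majority-class predictor: always output the most frequent label. A minimal sketch, using a hypothetical spam/ham label set:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical dataset: 7 of 10 samples are "spam".
labels = ["spam"] * 7 + ["ham"] * 3
print(majority_baseline(labels))  # -> 0.7
```

Any real model must beat this number to justify its complexity; on the label set above, a classifier reporting 65% accuracy would actually be worse than doing nothing clever at all.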
The objective of this post is to list some pointers to keep in mind while building a Machine Learning model. Always start with the simplest of models. You can increase the complexity if the performance of a simple model is inadequate. Understand your dataset first. Build a baseline model before building any prediction model. I will expand on this further in another post. Complex models tend to overfit and simpler models tend to underfit. It is your job to find a balance betwee...
One of the main drawbacks of any NLU neural model is its lack of generalization. This topic has been explored extensively in the previous post, Empirical Evaluation of Current Natural Language Understanding (NLU). To what can we attribute this lack of common sense?