One year ago, Tomáš Mikolov (together with his colleagues at Google) made some ripples by releasing word2vec, an unsupervised algorithm for learning the meaning behind words. In this blog post, I’ll evaluate some extensions that have appeared over the year. Read more on Making sense of word2vec…
The latest gensim release, 0.10.3, has a new class named Doc2Vec. All credit for this class, which is an implementation of Quoc Le & Tomáš Mikolov: “Distributed Representations of Sentences and Documents”, as well as for this tutorial, goes to the illustrious Tim Emerick. Read more on Doc2vec tutorial…
Latent Dirichlet Allocation (LDA), one of the most used modules in gensim, has received a major performance revamp recently. The new LdaMulticore class uses all your machine cores at once, so chances are it is now limited by how fast you can feed it input data. Make sure your CPU fans are in working order! Read more on Multicore LDA in Python: from over-night to over-lunch…
There are tools and concepts in computing that are very powerful but potentially confusing to novices. One such concept is data streaming (aka lazy evaluation), which can be realized neatly and natively in Python. Do you know when and how to use generators, iterators and iterables? Read more on Data streaming in Python: generators, iterators, iterables…
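To give a flavour of the distinction, here is a minimal sketch (not taken from the post itself). The names `LineStream` and `token_generator` are made up for illustration. It shows that an iterable can be streamed over repeatedly, while a generator is exhausted after a single pass.

```python
class LineStream:
    """Iterable: every call to __iter__ starts a fresh lazy pass over the data."""

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        # Tokens are produced one document at a time, never all at once.
        for line in self.lines:
            yield line.lower().split()


def token_generator(lines):
    """Generator function: the object it returns can be consumed only once."""
    for line in lines:
        yield line.lower().split()


docs = ["Human machine interface", "Graph of trees"]

stream = LineStream(docs)
first_pass = list(stream)    # streams through the documents
second_pass = list(stream)   # works again: __iter__ restarts the stream

gen = token_generator(docs)
gen_first = list(gen)        # consumes the generator
gen_second = list(gen)       # already exhausted, so nothing is left
```

The difference matters for any algorithm that needs several passes over its input: hand it a restartable iterable such as `LineStream`, not a one-shot generator.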
MALLET, “MAchine Learning for LanguagE Toolkit”, is a brilliant software tool. Unlike gensim, “topic modelling for humans”, which uses Python, MALLET is written in Java and spells “topic modeling” with a single “l”. Dandy. Read more on Tutorial on Mallet in Python…
I never got round to writing a tutorial on how to use word2vec in gensim. It’s simple enough and the API docs are straightforward, but I know some people prefer more verbose formats. Let this post be a tutorial and a reference example. Read more on Word2vec Tutorial…
Previous posts explained the whys & whats of nearest-neighbour search, the available OSS libraries and Python wrappers. We converted the English Wikipedia to vector space, to be used as our testing dataset for retrieving “similar articles”. In this post, I finally get to some hard performance numbers. Read more on Performance Shootout of Nearest Neighbours: Querying…
The end of the year is proving crazy busy as usual, but gensim acquired a cool new feature that I just had to blog about. Ben Trahan sent a patch that allows automatic tuning of Latent Dirichlet Allocation (LDA) hyperparameters in gensim. This means that an optimal, asymmetric alpha can now be trained directly from your data. Read more on Asymmetric LDA Priors, Christmas Edition…
Efficient topic modelling in Python| radimrehurek.com
Now that I have a blog, I figured I could start posting more info about our travels. So here’s a little digest from one of our recent trips. I’m hoping it will be useful to other tourists looking to visit Gran Canaria, especially in the same season we went (February).| RaRe Consulting
For our vacation this year, we picked Vietnam: Christmas in the touristy region of Mui Ne, then New Year in Saigon (aka Ho Chi Minh City). This post is a short travel log with pictures, highlights, travel costs and tips from our stay there. My hope is this may help someone planning a similar trip.| RaRe Consulting