A simple way to create a machine learning model that can generate text is a Markov chain. I had to adjust my workflow at this point, because just loading the whole dataset into memory started causing performance issues. So filtering submissions while loading them from disk was the way to go. With all the HN submission titles at hand, it was pretty quick to pick out the successful submissions (ones which received at least a few comments or upvotes) and provide them to a ready-made Markov chain librar...
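
To make the two steps above concrete, here is a minimal sketch of filtering while reading and then building a word-level Markov chain by hand. It is not the ready-made library mentioned above, and a lot here is assumed for illustration: the input is taken to be one JSON object per line, the field names `title`, `score` and `comments` are hypothetical, and so are the thresholds and the function names `successful_titles`, `build_chain` and `generate_title`.

```python
import json
import random
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Sentinel tokens marking the start and end of a title.
START, END = "<s>", "</s>"


def successful_titles(
    lines: Iterable[str], min_score: int = 5, min_comments: int = 3
) -> Iterator[str]:
    """Lazily yield titles of submissions that pass the filter.

    Works on any iterable of lines (e.g. an open file), so only one
    record is held in memory at a time. Field names and thresholds
    are assumptions, not the real dataset schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        if item.get("score", 0) >= min_score or item.get("comments", 0) >= min_comments:
            yield item["title"]


def build_chain(titles: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to the list of words observed right after it."""
    chain: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        words = [START] + title.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_title(
    chain: Dict[str, List[str]], rng: random.Random, max_words: int = 20
) -> str:
    """Walk the chain from START until END or max_words is reached."""
    if START not in chain:
        return ""  # nothing was learned, so nothing can be generated
    word, output = START, []
    while len(output) < max_words:
        # Followers that occurred more often appear more often in the
        # list, so a uniform choice samples by observed frequency.
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)
```

With a file on disk this could be used as `chain = build_chain(successful_titles(open(path)))`, where `path` points at the (hypothetical) line-delimited dump, followed by `generate_title(chain, random.Random())`. Because `successful_titles` is a generator, the full dataset never has to sit in memory; only the chain built from the filtered titles does.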