Acknowledgement: A big thank you to Johanna Haffner of ETH Zürich for giving feedback on this! So I previously wrote Just know stuff. (Or, how to achieve success in a machine learning PhD.) Success in technical subjects requires, in large part, knowing a lot of stuff about that subject. Two years have gone by since then, and in the meantime the field of bioML has become this whole thing. Isomorphic Labs are betting on structure prediction for small molecule therapeutics.
Semantic versioning (SemVer) is possibly the most widely used software versioning scheme. We all know how SemVer works: MAJOR.MINOR.PATCH. The first number is for backward-incompatible changes, the middle number is for backward-compatible new features, and the last number is for backward-compatible bugfixes. …it’s a shame how infrequently it actually seems to be used this way! Backward-incompatible changes on minor versions happen all the time. By far the most common example is deprecations.
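To make the intended contract concrete, here is a minimal sketch (my own illustration, not from the post) of which component each kind of change is supposed to bump, for a hypothetical package currently at version 1.4.2:

```python
# A minimal sketch of the SemVer contract, for a hypothetical package at 1.4.2:
#   remove or rename a public function -> 2.0.0  (MAJOR: backward-incompatible)
#   add a new optional keyword arg     -> 1.5.0  (MINOR: backward-compatible feature)
#   fix an off-by-one bug              -> 1.4.3  (PATCH: backward-compatible bugfix)

def bump(version: str, change: str) -> str:
    """Return the next version after a change of the given kind."""
    major, minor, patch = map(int, version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")

assert bump("1.4.2", "major") == "2.0.0"
assert bump("1.4.2", "minor") == "1.5.0"
assert bump("1.4.2", "patch") == "1.4.3"
```

The complaint above is that removals – e.g. completing a deprecation cycle – routinely land on MINOR bumps, where this contract says they belong on MAJOR ones.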
TL;DR: you can explicitly use type annotations of the form `def f(x: Float[Tensor, "channels"], y: Float[Tensor, "channels"]): ...` to specify the shape+dtype of tensors/arrays; declare that these shapes are consistent across multiple arguments; and use runtime type-checking to enforce that these are correct. See the (now quite popular!) jaxtyping library on GitHub. And note that the name is now historical – it also supports PyTorch/TensorFlow/NumPy, and has no JAX dependency.
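As a concrete illustration, here is a minimal sketch of this style with runtime checking enabled, using beartype as the type-checker (jaxtyping supports several):

```python
# A minimal sketch of shape-annotated functions with runtime checking.
# Assumes: pip install torch jaxtyping beartype
import torch
from torch import Tensor
from beartype import beartype
from jaxtyping import Float, jaxtyped

@jaxtyped(typechecker=beartype)
def f(x: Float[Tensor, "channels"], y: Float[Tensor, "channels"]) -> Float[Tensor, "channels"]:
    # The shared "channels" axis name asserts that x and y have the same shape.
    return x + y

f(torch.zeros(3), torch.ones(3))    # OK: both have shape (3,)
# f(torch.zeros(3), torch.ones(4))  # raises at call time: "channels" disagrees
```

The axis names are checked for consistency across all arguments and the return value, which is how the cross-argument shape contract is enforced.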
A couple of years ago I made the jump from PyTorch to JAX. Now, the skill of writing autodifferentiable code turns out to translate pretty smoothly between different frameworks. In this case, PyTorch and JAX really aren’t that different: replace `torch.foo(...)` with `jax.numpy.foo(...)` and you’re 95% of the way there! What about the other 5%? That’s the purpose of this article! Assuming you already know PyTorch, this is what I’ve found that you need to know to get up to speed with JAX.
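To give a feel for that 95%, here is a tiny side-by-side (my own illustration, not from the article):

```python
# The same computation in PyTorch and in JAX: largely a change of namespace.
import torch
import jax.numpy as jnp

x = torch.linspace(0.0, 1.0, 5)
y = torch.sin(x) * torch.exp(-x)  # PyTorch

x = jnp.linspace(0.0, 1.0, 5)
y = jnp.sin(x) * jnp.exp(-x)      # JAX: torch.foo -> jax.numpy.foo
```

The other 5% is mostly JAX-specific idioms: immutable arrays, explicit PRNG keys, and function transformations like jax.jit and jax.vmap.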
If you’ve started a PhD then you probably already have a topic and a supervisor. Your supervisor should have handed you a few problems to cut your teeth on. Unfortunately, this isn’t always the case. Your supervisor may be in a slightly different field to you (this was the case for me). Or your supervisor may have taken on more students than they can handle – this is a common failing of many CS professors.
So I recently completed my PhD in Mathematics from the University of Oxford. (Hurrah! It was so much fun.) In 2-and-a-bit years I wrote 12 papers, received 4139 GitHub stars, got 3271 Twitter followers, authored 1 textbook – doing double-duty as my thesis – and got the coveted big-tech job-offer. If you’re interested in a textbook on Neural Differential Equations with a smattering of scientific computing, then my thesis, On Neural Differential Equations, is available online.
If you work in machine learning, then you will have noticed that score-based diffusion models are busy taking over the world: most notably through impressive projects like DALL·E 2 and Imagen. Correspondingly, the internet has become awash with how-tos and explainer posts for how score-based diffusions work. Now these posts are generally pretty complicated, and I haven’t seen much intuition offered on score-based diffusions. So, it’s time to throw my hat in the ring.
A while ago there was an interesting thread on the Julia Discourse about the “state of machine learning in Julia”. I posted a response discussing the differences between Julia and Python (both JAX and PyTorch), and it seemed to be really well received! Since then this topic seems to keep coming up, so I thought I’d tidy up that post and put it somewhere I could link to easily, rather than telling all the people who ask for my opinion to go searching through the Julia Discourse until they find it.