How do we know if we’re making progress towards an AI scientist?

I’ve always been fascinated by the prospect of automating science. In hindsight, it’s my favorite failed project: time and time again, grants have been rejected, blog posts have gone unposted, and side projects have failed to produce anything interesting. I’ve never been satisfied by any approach to the problem, and at its core that’s because automating science is a very ill-posed problem.
The world is the enzymes’ playground and we are but their vessels. We wage their wars, we nurture them, we help them evolve, grow, and replicate. Life really is all about them. Fortunately, we’ve also learned to harness them for our own ends, using them to synthesize and edit DNA, digest harmful materials, and, crucially, produce a vast array of molecules with a plethora of applications ranging from agriculture to pharmaceuticals. Compared to what organic synthesis techniques can accomplish with...
Experimental approaches in biology tend to fall within broader conceptual frameworks that guide the logic of the experimental design. Each of these frameworks carries both a cost and an expected quality of the knowledge obtained from the results. For example, on one end we may have multifactorial perturbation frameworks, where we collect samples with little or no control over the perturbations effected on each sample and attempt to infer which variables are correlated in the system.
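To make that cheap end of the spectrum concrete, here is a minimal sketch of such an analysis: with no control over the perturbations, all we can really do is screen for co-varying measurements. The data, variable layout, and threshold below are illustrative assumptions, not from any particular study.

```python
# Toy "multifactorial perturbation" analysis: look for co-varying
# variables across samples we did not control. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# 200 samples x 5 measured variables; variable 2 is (noisily) driven
# by variable 0, an association we hope to recover from correlations.
X = rng.normal(size=(200, 5))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=200)

corr = np.corrcoef(X, rowvar=False)  # variable-by-variable correlation matrix
i, j = np.where(np.triu(np.abs(corr) > 0.5, k=1))
for a, b in zip(i, j):
    print(f"variables {a} and {b} co-vary (r = {corr[a, b]:.2f})")
```

Note that a screen like this says nothing about direction or mechanism, which is exactly the trade-off this framework makes in exchange for its low cost.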
A recent tweet from Ash Jogalekar got me thinking:

> List of compounds medicinal chemists wouldn't have bothered to pursue because they didn't fit "intuition" about "druglike" rules: Aspirin, Metformin ($400M revenue), Cyclosporin (>$1B), Dimethyl fumarate (>$4B). In drug discovery, there will always be enough exceptions to the rules.
>
> — Ash Jogalekar (@curiouswavefn), June 24, 2019

Translating it into more ‘machine-learning-ish’ language, this means that the problem of predicting ultimately success...
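As a concrete stand-in for the “druglike rules” intuition, here is Lipinski’s rule of five written as the kind of hard yes/no gate the tweet pushes back on. The RDKit calls are standard, but the gate itself is my illustrative framing; cyclosporin, at a molecular weight around 1,200 Da, fails a filter like this outright despite being a >$1B drug.

```python
# Lipinski's rule of five as a hard filter -- the "druglike" intuition
# whose counterexamples the tweet lists. A sketch, not a recommended screen.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_rule_of_five(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Lipinski.NumHDonors(mol) <= 5
            and Lipinski.NumHAcceptors(mol) <= 10)

print(passes_rule_of_five("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> True
```

Treated as a hard label, a rule like this would have screened out several of the compounds above; treated as a soft prior, it is merely one feature among many.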
Featurization by any other name would self attend as sweet

Even before taking the world by storm through LLMs, transformers had for some time been explored in various domains, albeit with specific architectural adaptations. Eventually – and even more so with the advent of LLMs – it became commonplace to fix much of the transformer architecture and put the onus of domain application almost entirely on the data featurization itself, in a way reminiscent of how tabula...
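A minimal sketch of that pattern, with the featurization scheme and all names invented for illustration: keep a stock transformer encoder fixed and push the domain work into how records are turned into tokens.

```python
# Fixed, off-the-shelf transformer; all domain knowledge lives in the
# featurizer. The quantization scheme below is an illustrative stand-in.
import torch
import torch.nn as nn

vocab_size, d_model = 32, 64
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

def featurize(rows: torch.Tensor) -> torch.Tensor:
    # Quantize each scalar in [0, 1) into one of `vocab_size` tokens,
    # turning a tabular record into a token sequence.
    return (rows.clamp(0, 0.999) * vocab_size).long()

rows = torch.rand(8, 10)      # 8 "tabular" records, 10 columns each
tokens = featurize(rows)      # (8, 10) integer token ids
out = encoder(embed(tokens))  # (8, 10, 64) contextual features
print(out.shape)
```

Swapping the domain means swapping `featurize`, not the encoder, which is the division of labor the post describes.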
Going from molecular structure to mass spectra

Small molecules make up most of our medicines, are the lingua franca of cell communication, metabolism, and signalling, and form an extremely diverse chemical landscape. And they’re everywhere in the environment. Even though we have millions of these little things cataloged in many databases, the chemical space of small molecules, even when restricted to biological ones, remains relatively unexplored. It’s a brave ocean of uncharted waters full of p...
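As a toy sketch of the structure-to-spectrum direction, one could map a molecular fingerprint to a binned m/z intensity vector with a small MLP. The fingerprint, binning, and architecture here are illustrative assumptions (and the model is untrained); real structure-to-spectrum models are far more involved.

```python
# Toy structure -> spectrum model: Morgan fingerprint in, binned
# m/z intensity vector out. Untrained and purely illustrative.
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import AllChem

def fingerprint(smiles: str, n_bits: int = 2048) -> torch.Tensor:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    return torch.tensor(list(fp), dtype=torch.float32)

n_bins = 1000  # e.g. m/z from 0 to 1000 Da at 1 Da resolution
model = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, n_bins))

spectrum = torch.softmax(model(fingerprint("CC(=O)Oc1ccccc1C(=O)O")), dim=-1)
print(spectrum.shape)  # torch.Size([1000]) -- predicted intensity per bin
```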
It’s no secret that there’s something broken about how science is published and disseminated. Behind each paper there’s a hefty body of work, revisions, unpublished data, and back-and-forth argumentation that doesn’t make it to the final version. Unless a preprint is published or an openly-reviewed avenue is chosen (e.g. eLife, OpenReview, or the various journals that choose post-publication peer review), the whole discussion between authors and reviewers remains behind closed doors.
A Hacker’s Guide to Equivariance

Geometric deep learning is a field that has picked up considerable momentum recently, and with good reason: it deals with how to reason over objects (like graphs, meshes, and protein structures) that are tied to impactful downstream tasks (like predicting molecular properties and automating animation). Additionally, it’s setting up a framework that tries to retroactively explain many of the successes of deep learning, with the hope of extrapolating...
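In that hacker’s spirit, the core property fits in a few lines: a map f on point clouds is rotation-equivariant if rotating the input and then applying f gives the same result as applying f and then rotating. The function below is equivariant by construction (it scales points by rotation-invariant distances, a choice I made for the example); the numerical check is the generic, reusable part.

```python
# Numerical equivariance check: f is rotation-equivariant iff
# f(x @ R.T) == f(x) @ R.T for all rotations R.
import numpy as np
from scipy.spatial.transform import Rotation

def f(x: np.ndarray) -> np.ndarray:
    # Pairwise distances are rotation-invariant, so scaling each point
    # by a function of them keeps the map equivariant.
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1).sum(axis=1)
    return x * d[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                    # a small 3D point cloud
R = Rotation.random(random_state=0).as_matrix()

lhs = f(x @ R.T)   # rotate first, then apply f
rhs = f(x) @ R.T   # apply f, then rotate
print(np.allclose(lhs, rhs))  # True -> f is rotation-equivariant
```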
Why the machine metaphor has failed in biology and software and the concepts that are replacing it

Much progress has been made in laboratory and computational techniques, both for probing populations of cells, molecules, and neurons and for training machine learning models. Thanks to these advances, detailed data on both the dynamics and the causal relationships in complex systems are becoming commonplace. I believe that the conceptual frameworks used to tackle these data are beginning to converge in...
I first found out about probabilistic programming in my later years of grad school when, looking for a good tutorial on Bayesian inference, I stumbled upon the excellent Bayesian Methods for Hackers, which heavily features PyMC. I was (and in many ways still am) a neophyte in Bayesian methods, having ignored the quasi-religious sermons that my friends in operations research and actuarial sciences gave to any passer-by, swearing by the name of simulation with strange and arcane words ending in UG...
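For anyone who hasn’t seen the style, this is the flavor of thing PyMC lets you write: declare a generative model, condition on data, and let the sampler handle inference. A minimal sketch, assuming the modern `pymc` package; the toy data and priors are mine.

```python
# Minimal probabilistic program: infer the mean of noisy observations.
import numpy as np
import pymc as pm

data = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=50)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)           # prior over the mean
    pm.Normal("obs", mu=mu, sigma=1.0, observed=data)  # likelihood
    idata = pm.sample(1000, tune=1000, progressbar=False)

print(idata.posterior["mu"].mean().item())  # should land near 2.0
```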
I recently read an interesting entry in the Nintil blog that tries to frame our understanding of biology by asking several key questions. The questions were derived from and inspired by Tinbergen’s four questions, which are four general directions one can take, ( (evolutionary vs. proximate/individual) × (ontogenetic vs. mechanistic) ), when studying biological traits. The questions were mainly written with animal behavior in mind (Tinbergen was an ethologist) but are broadly applicable to any biological c...
I really can’t help but smile when hearing folks talk about causal models recently. It looks like causal models are making a comeback! This is a pleasant surprise to me, since I’ve always been a fan of causal inference and I wasn’t sure whether Judea Pearl’s “Book of Why” was going to catch on. But now we even have Pearl on Twitter, and more light is being shed on work that leverages causal models for various problems or that scales them to very high-dimensional settings.
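For a two-minute taste of why the causal view matters, here is the classic confounding setup in simulation form: a naive regression of y on x is biased by a common cause z, while adjusting for z (a back-door adjustment) recovers the true effect. The structure and numbers are invented for illustration.

```python
# Confounding demo: z drives both x and y, so the naive slope of
# y ~ x is biased; adjusting for z recovers the true effect of 2.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                   # confounder
x = z + rng.normal(size=n)               # x <- z
y = 2 * x + 3 * z + rng.normal(size=n)   # y <- x, z; true effect of x is 2

naive = np.polyfit(x, y, 1)[0]           # slope of y ~ x, biased upward
X = np.column_stack([x, z, np.ones(n)])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]  # slope of y ~ x + z

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # ~3.5 vs ~2.0
```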
There’s a famous line by the legendary Marc Andreessen that summarizes the vast power of growth and disruption that commoditized computation has come to have: “Software is eating the world”. Earlier in the year, Jensen Huang of Nvidia ominously turned the phrase on its head: “Software is eating the world, but AI is going to eat software”. In many ways, I think this prophecy will indeed come to pass. Current software has become so pervasive because we have tools that translate tasks...
I know it’s a weird way to start a blog on a negative note, but there was a wave of discussion in the last few days that I think serves as a good hook for some topics I’ve been thinking about recently. It all started with a post on the Simply Stats blog by Jeff Leek on the caveats of using deep learning in the small-sample-size regime. In sum, he argues that when the sample size is small (which happens a lot in the bio domain), linear models with few parameters perform better than deep networks...
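That claim is cheap to probe yourself. Below is a quick sketch of the kind of comparison at stake, on a made-up dataset via scikit-learn; the exact numbers will vary with the data and seeds, and the point is the shape of the comparison at small n, not these particular scores.

```python
# Small-n comparison: a linear model vs. a small neural net,
# scored by cross-validation on a tiny synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=40, n_features=100, n_informative=5,
                           random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```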