Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. | ACL Anthology
Highlights the desire to replace tokenization with a general method that better leverages compute and data. We'll see tokenization's fragility and review the Byte Latent Transformer architecture. | ⛰️ lucalp