GELU, short for Gaussian Error Linear Unit, can also be regarded as a variant of ReLU; it is an activation function whose form is not an elementary function. It was introduced in the paper "Gaussian Error Linear Units (G...| spaces.ac.cn
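As a quick, hedged illustration (the function names below are mine, not from the snippet): GELU is conventionally defined as $x\,\Phi(x)$, where $\Phi$ is the standard-normal CDF, and in practice it is often computed with a tanh approximation. A minimal NumPy/SciPy sketch of both forms:

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard-normal CDF,
    # written via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Commonly used tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 7)
    print(np.round(gelu_exact(xs), 4))
    print(np.round(gelu_tanh(xs), 4))
```

The two curves agree closely over typical pre-activation ranges, which is why the cheaper tanh form is often used in deep-learning frameworks.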
The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various hand-designed alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains. In this work, we propose to leverage automatic search techniques to discover new activation functions. Using a combination of exhaustive a...| papers.cool
Today's post is something light, based on an identity the author noticed over the past couple of days. The identity itself is actually quite simple, but at first glance it feels a little unexpected, so it seemed worth writing down. Basic result: we know that $\newcommand{relu}{\math...| spaces.ac.cn
In fact, besides writing blog posts, over the past few years the author has spent a fair amount of time on the "surface work" of Scientific Spaces, even picking up a little php, css, and js for it. While far from perfect, overall the site's browsing experience should be considerably better than it was a few years ago....| spaces.ac.cn