Multi-layer perceptrons (MLPs) are the bedrock of contemporary deep learning architectures, serving as indispensable components in a wide range of machine learning applications. Backed by the expressive power guaranteed by the universal approximation theorem, MLPs excel at approximating nonlinear functions, making them a default choice for many tasks. Despite this widespread adoption, however, MLPs harbor notable limitations. They often