
KAN or MLP: A Fairer Comparison
Abstract
This paper provides a comprehensive comparison of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptrons (MLP) across various tasks, including machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation. The key findings are:
- Under the same number of parameters or FLOPs, KAN outperforms MLP only in symbolic formula representation tasks, while MLP generally outperforms KAN in other tasks.
- KAN's advantage in symbolic formula representation stems mainly from its B-spline activation function. When B-spline activations are applied to MLP, its performance on symbolic formula representation improves significantly, matching or surpassing that of KAN.
- However, in other tasks where MLP already excels over KAN, the B-spline activation does not substantially enhance MLP's performance.
- KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting, contrary to the findings reported in the original KAN paper.
Q&A
[01] Formulation of KAN and MLP
1. What are the key differences between the formulations of KAN and MLP? The key differences are:
- Activation functions: MLP uses fixed activation functions such as ReLU or GELU, while KAN uses learnable B-spline activation functions, with a separate spline applied to each input element.
- Order of linear and non-linear operations: MLP applies a linear transformation followed by a non-linear activation, whereas KAN applies the non-linear (spline) activations first and then combines them with a linear transformation (see the sketch below).
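The difference in ordering can be made concrete with a minimal sketch (not the authors' code). `MLPLayer` applies a linear map and then a fixed GELU; `KANLikeLayer` applies a learnable per-edge nonlinearity first and then sums over inputs. To keep the example short, the per-edge function is a toy learnable cubic rather than the B-spline used in an actual KAN layer.

```python
import torch
import torch.nn as nn

class MLPLayer(nn.Module):
    """MLP ordering: linear transformation first, then a fixed nonlinearity."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.act = nn.GELU()  # fixed activation shared by all units

    def forward(self, x):  # x: (batch, d_in)
        return self.act(self.linear(x))


class KANLikeLayer(nn.Module):
    """KAN ordering: a learnable nonlinearity on each (input, output) edge first,
    then summation over inputs. The per-edge function here is a toy learnable
    cubic, standing in for the B-spline of an actual KAN layer."""
    def __init__(self, d_in, d_out):
        super().__init__()
        # one set of polynomial coefficients per edge: (d_out, d_in, 3)
        self.coef = nn.Parameter(0.1 * torch.randn(d_out, d_in, 3))

    def forward(self, x):  # x: (batch, d_in)
        powers = torch.stack([x, x ** 2, x ** 3], dim=-1)   # (batch, d_in, 3)
        # phi_{o,i}(x_i), summed over inputs i -> (batch, d_out)
        return torch.einsum("bip,oip->bo", powers, self.coef)
```

In a real KAN layer, each edge function also includes a shortcut path (e.g., a SiLU term) alongside the spline, which is part of where the extra parameters counted below come from.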
2. How are the number of parameters and FLOPs calculated for KAN and MLP? The formulas for calculating the number of parameters and FLOPs for KAN and MLP are derived in Sections 3 and 4 of the paper. The key differences are:
- KAN has additional learnable parameters for the B-spline control points, shortcut weights, and B-spline weights.
- KAN has additional FLOPs for the B-spline computations using the De Boor-Cox algorithm.
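As a rough illustration of why KAN needs more parameters at the same width, here is a back-of-the-envelope count, not the paper's exact derivation. It assumes each edge carries (grid + order) learnable B-spline control points plus one shortcut weight and one spline weight, and that the layer has one bias per output; the paper's constants may differ slightly.

```python
def mlp_layer_params(d_in: int, d_out: int) -> int:
    """Standard linear layer: weight matrix plus bias."""
    return d_in * d_out + d_out

def kan_layer_params(d_in: int, d_out: int, grid: int = 5, order: int = 3) -> int:
    """KAN-style layer under the assumptions stated above."""
    per_edge = (grid + order) + 1 + 1  # control points + shortcut weight + spline weight
    return d_in * d_out * per_edge + d_out

if __name__ == "__main__":
    print(mlp_layer_params(128, 128))  # 16,512
    print(kan_layer_params(128, 128))  # 163,968 with grid=5, order=3
```

This roughly (grid + order + 2)-fold blow-up per edge is why the paper compares models at matched parameter or FLOP budgets rather than at matched widths.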
[02] Performance Comparison
1. How do KAN and MLP perform on different task domains?
- Machine Learning: MLP outperforms KAN on 6 of the 8 machine learning datasets.
- Computer Vision: MLP consistently outperforms KAN with the same number of parameters or FLOPs.
- Audio and Natural Language: MLP outperforms KAN on audio datasets and most text classification datasets.
- Symbolic Formula Representation: KAN outperforms MLP on 7 out of 8 datasets when controlling for the number of parameters, but the advantage is reduced when controlling for FLOPs.
2. What insights do the architecture ablation studies provide?
- For computer vision tasks, using spline activation functions in MLP provides little improvement while increasing computational cost.
- For machine learning tasks, replacing MLP's fixed activation with spline functions significantly boosts its performance, making it comparable to KAN.
- Applying spline activation functions to MLP lets it match or surpass KAN's performance on symbolic formula representation tasks (a sketch of this swap follows below).
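A minimal sketch of the ablation idea, assuming the simplest drop-in swap: the MLP keeps its linear-then-activation ordering, but the fixed GELU is replaced by a learnable activation. `LearnableSpline` is a toy per-channel piecewise-linear function standing in for the B-spline activation used in the paper; the class name, grid range, and initialization are illustrative.

```python
import torch
import torch.nn as nn

class LearnableSpline(nn.Module):
    """Toy learnable activation: per-channel piecewise-linear interpolation
    over a fixed grid on [-3, 3]; a stand-in for a B-spline activation."""
    def __init__(self, channels, grid_points=8):
        super().__init__()
        grid = torch.linspace(-3.0, 3.0, grid_points)
        self.register_buffer("grid", grid)
        # start near the identity function on the grid
        self.values = nn.Parameter(grid.repeat(channels, 1).clone())

    def forward(self, x):  # x: (batch, channels)
        x = x.clamp(-3.0, 3.0)
        idx = torch.bucketize(x, self.grid[1:-1])        # segment index per element
        x0, x1 = self.grid[idx], self.grid[idx + 1]
        w = (x - x0) / (x1 - x0)                         # position within the segment
        ch = torch.arange(x.shape[1], device=x.device)
        y0, y1 = self.values[ch, idx], self.values[ch, idx + 1]
        return y0 + w * (y1 - y0)

def mlp(d_in, d_hidden, d_out, spline_act=False):
    """Baseline MLP (GELU) or the 'MLP + spline activation' ablation variant."""
    act = LearnableSpline(d_hidden) if spline_act else nn.GELU()
    return nn.Sequential(nn.Linear(d_in, d_hidden), act, nn.Linear(d_hidden, d_out))
```

Calling `mlp(..., spline_act=True)` gives the ablation variant and `mlp(..., spline_act=False)` the baseline, so the two can be trained under identical settings.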
[03] Continual Learning
1. How do KAN and MLP perform in a class-incremental continual learning setting? Contrary to the findings in the original KAN paper, the authors found that under a standard class-incremental continual learning setup on MNIST, KAN exhibited more severe forgetting than MLP: MLP retained acceptable accuracy on the first two tasks, whereas KAN's accuracy on them dropped to zero after training on the third task.
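For concreteness, here is a minimal sketch of such a class-incremental protocol, under assumed details (five tasks of two MNIST classes each, trained sequentially with no replay); `make_tasks` and the exact split are illustrative, not the authors' code.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def make_tasks(train=True, classes_per_task=2):
    """Split MNIST into sequential tasks of `classes_per_task` classes each."""
    ds = datasets.MNIST("data", train=train, download=True,
                        transform=transforms.ToTensor())
    tasks = []
    for t in range(10 // classes_per_task):
        labels = list(range(t * classes_per_task, (t + 1) * classes_per_task))
        idx = torch.isin(ds.targets, torch.tensor(labels)).nonzero(as_tuple=True)[0]
        tasks.append(Subset(ds, idx.tolist()))
    return tasks

train_tasks, test_tasks = make_tasks(train=True), make_tasks(train=False)
for t, task_ds in enumerate(train_tasks):
    loader = DataLoader(task_ds, batch_size=128, shuffle=True)
    # ... train the KAN or MLP on this task only (no replay of earlier tasks) ...
    # then evaluate on test_tasks[0..t] to measure forgetting of earlier tasks
```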