# State-Free Inference of State-Space Models: The Transfer Function Approach

## Abstract

The paper proposes a state-free inference algorithm for state-space models (SSMs) using a rational transfer function (RTF) representation. The key contributions are:

- RTF provides a complete representation of linear time-invariant SSMs, including those with dense matrices, unlike previous diagonal or low-rank SSM representations.
- The proposed parallel inference algorithm for RTF is *state-free*: its space and time complexities are independent of the state size, in contrast to previous scan-based or Cauchy/Vandermonde-based algorithms.
- Experiments show RTF achieves state-of-the-art performance on the Long Range Arena benchmark and improved perplexity on language modeling compared to other attention-free approaches.

## Q&A

### [01] State-Free Inference of State-Space Models

**1. What are the key limitations of current state-space models (SSMs) that this paper aims to address?**

- Many current SSM algorithms employ a modal (diagonal) representation, which can limit the model's expressive capacity.
- Scan-based parallel inference algorithms incur considerable memory costs at large state sizes.
- Algorithms like S4 and S4D that use fast Cauchy and Vandermonde matrix-vector products scale suboptimally.

**2. How does the proposed rational transfer function (RTF) representation address these limitations?**

- RTF encompasses the functional space of any linear time-invariant SSM, including those with dense matrices, unlike diagonal SSMs.
- The proposed parallel inference algorithm for RTF has state-free space and time complexities, avoiding the memory and computational bottlenecks of previous approaches.
- RTF relies solely on the widely optimized Fast Fourier Transform (FFT) algorithm for efficient parallel inference.

**3. What are the key properties of the RTF representation?**

- RTF is coordinate invariant, meaning that different state-space realizations of the same system will have the same RTF parameters.
- Any impulse response that can be represented using dense matrices can also be described using a rational transfer function with fewer parameters.
- The partial fraction decomposition of RTF provides insights into the expressivity of different SSM representations.
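The coordinate invariance above can be checked numerically. The sketch below (illustrative sizes and random matrices, not the paper's code) evaluates the transfer function H(z) = C(zI - A)^{-1}B + D before and after an invertible change of state basis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # state size (illustrative)

# A random dense state-space realization (A, B, C, D).
A = rng.standard_normal((n, n)) * 0.3
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = rng.standard_normal((1, 1))

def transfer_function(A, B, C, D, z):
    """Evaluate H(z) = C (zI - A)^{-1} B + D at a complex point z."""
    return (C @ np.linalg.solve(z * np.eye(len(A)) - A, B) + D).item()

# Change of basis x -> T x gives the equivalent realization
# (T A T^{-1}, T B, C T^{-1}, D).
T = rng.standard_normal((n, n)) + np.eye(n)
A2, B2, C2 = T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T)

z = 0.7 + 0.2j
h1 = transfer_function(A, B, C, D, z)
h2 = transfer_function(A2, B2, C2, D, z)
# h1 and h2 agree up to floating-point error: H(z) is coordinate invariant.
print(abs(h1 - h2))
```

Because every realization of the system maps to the same H(z), parameterizing H(z) directly avoids redundant degrees of freedom in (A, B, C, D).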

**4. How does the proposed parallel inference algorithm for RTF work?**

- The algorithm computes the truncated transfer function spectrum directly using the FFT, without the need to materialize the state.
- This results in state-free complexities of O(L) space and O(L log L) time, where L is the sequence length and both are independent of the state size, in contrast to the state-multiplicative or state-additive complexities of previous approaches.
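A rough sketch of the idea (not the paper's exact algorithm, which handles the truncated spectrum more carefully): sample the rational transfer function at roots of unity via FFTs of the zero-padded coefficient vectors, then convolve with the input in the frequency domain. For a stable filter whose impulse response has decayed by lag 2L, the truncation error is negligible:

```python
import numpy as np

def rtf_filter(u, b, a):
    """Approximate state-free inference for a SISO rational filter
    H(z) = b(z^{-1}) / (1 + a(z^{-1})), assuming stable dynamics.
    b has length n + 1, a has length n (monic denominator)."""
    L = len(u)
    # Evaluate numerator and denominator at 2L roots of unity by
    # zero-padding the coefficient vectors; 2L avoids wrap-around.
    num = np.fft.rfft(b, 2 * L)
    den = np.fft.rfft(np.concatenate(([1.0], a)), 2 * L)
    # Truncated impulse response (accurate when the response has
    # decayed by lag 2L, i.e. the filter is stable).
    h = np.fft.irfft(num / den, 2 * L)[:L]
    # Causal convolution y = h * u via zero-padded FFTs.
    y = np.fft.irfft(np.fft.rfft(h, 2 * L) * np.fft.rfft(u, 2 * L), 2 * L)
    return y[:L]
```

All FFTs are over length 2L regardless of the state size n, so memory is O(L) and time is O(L log L): the state is never materialized.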

**5. What is the benefit of the fast companion recurrence for RTF?**

- The companion form of the RTF state-space realization allows for a fast recurrent update with only O(1) time and space complexity per time step.
- This enables efficient autoregressive inference, which is important for applications like language modeling.
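One recurrent step can be sketched in direct form II (an illustrative implementation, assuming `len(b) == len(a) + 1`): thanks to the companion structure, the state update is just a shift of the register plus inner products with the fixed coefficient vectors, with no sequence-length dependence.

```python
import numpy as np

def rtf_step(x, u_t, b, a):
    """One recurrent step for H(z) = b(z^{-1}) / (1 + a(z^{-1})).

    x   : length-n shift register of past internal values w[t-1..t-n]
    u_t : scalar input at time t
    b   : numerator coefficients, length n + 1
    a   : denominator coefficients a_1..a_n (leading 1 implied)
    """
    w = u_t - np.dot(a, x)             # feedback through the denominator
    y_t = b[0] * w + np.dot(b[1:], x)  # feedforward through the numerator
    x = np.concatenate(([w], x[:-1]))  # companion update = shift register
    return x, y_t
```

For example, with b = [1, 0] and a = [-0.5], feeding in a unit impulse produces the geometric impulse response 1, 0.5, 0.25, ...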

**6. How does RTF ensure stable dynamics?**

- Initializing the denominator coefficients to zero ("zero initialization") places all poles at the origin, as far as possible from violating the Montel stability constraint; this improves training stability compared to other initialization schemes.
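A minimal sketch of the idea (illustrative state size; the Montel bound guarantees all roots of the monic denominator lie in the closed unit disk whenever the absolute coefficients sum to at most 1):

```python
import numpy as np

n = 8  # state size (illustrative)

# "Zero initialization": denominator coefficients start at zero, so the
# monic denominator is z^n and every pole sits at the origin, maximally
# far from the unit circle where instability begins.
a = np.zeros(n)

# Montel-style sufficient condition for stability: if sum(|a_k|) <= 1,
# all roots of z^n + a_1 z^{n-1} + ... + a_n lie in the closed unit disk.
assert np.sum(np.abs(a)) <= 1.0

poles = np.roots(np.concatenate(([1.0], a)))
print(np.max(np.abs(poles)))  # 0.0: all poles at the origin
```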

### [02] Experimental Results

**1. How does the memory and latency profile of RTF compare to other SSMs like S5?**

- The memory consumption of RTF scales linearly with sequence length, unlike S5, which exhibits state-multiplicative memory scaling.
- RTF also maintains consistent inference latency across different state sizes, while S4D and S4 experience slower speeds at higher state sizes.

**2. How does RTF perform on the Long Range Arena (LRA) benchmark compared to other SSMs and attention-based models?**

- RTF achieves state-of-the-art performance on the Retrieval task and the highest average score among attention-free approaches on the LRA benchmark.
- RTF's state-free parallel inference algorithm allows it to scale to larger state sizes without impacting memory or training speed, unlike other SSMs.

**3. How does RTF perform on synthetic memorization tasks compared to S4?**

- At higher state sizes, RTF is able to more accurately copy and delay sequences compared to S4, which struggles on the Copying task.
- The results suggest RTF has stronger memorization capabilities than S4, especially as the state size is increased.

**4. How does incorporating RTF into the Hyena language model affect its performance?**

- Replacing the Hyena filters with RTF, in a model called Hyena-RTF, improves perplexity on the WikiText-103 dataset compared to the original Hyena baseline.
- This demonstrates the potential of RTF to enhance the language modeling capabilities of convolutional sequence models like Hyena.