Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation
Abstract
The article examines the efficiency and scalability issues of graph convolutional networks (GCNs) in training recommender systems (RecSys) and introduces an alternative called the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). The key findings are:
- The benefits of GCNs are more pronounced during testing rather than training.
- LightGODE bypasses the computation-intensive message passing of GCNs during training and instead applies a non-parametric, continuous graph ordinary differential equation (ODE) after training to model node representations dynamically.
- This approach drastically reduces training time, and the fine-grained post-training graph convolution avoids distorting the original training embedding space, a problem termed the embedding discrepancy issue.
- Experiments across several real-world datasets demonstrate that LightGODE outperforms GCN-based models in terms of efficiency and effectiveness, and significantly mitigates the embedding discrepancy.
Q&A
[01] Investigation of the Graph Convolution for Recommendation
1. What are the key questions addressed in this section?
- What are the key findings from the preliminary experiments on the Amazon-Beauty and Amazon-Toys-and-Games datasets?
- How do the authors analyze the alignment force from a depth-first search (DFS) perspective to explain the unexpectedly strong performance of the matrix factorization (MF) model enhanced by post-training graph convolution?
- What are the trade-offs identified in designing post-training graph convolution, and how do the authors address the embedding discrepancy issue?
The preliminary experiments reveal that the benefits of graph convolution are more pronounced during the testing phase than during the training phase. The authors find that an MF model with post-training graph convolution (MF-conv) achieves performance comparable to LightGCN, indicating that graph convolution may not be necessary during the training stage.
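As a rough illustration of the MF-conv setup, the sketch below applies LightGCN-style propagation to the embeddings of an already-trained MF model at inference time only. The helper name, the two-layer default, and the layer-averaging choice are illustrative assumptions, not the authors' code.

```python
import torch

def post_training_graph_convolution(emb, norm_adj, num_layers=2):
    """Apply LightGCN-style propagation to already-trained MF embeddings.

    emb:      (num_users + num_items, d) embedding matrix from a trained MF model
    norm_adj: sparse, symmetrically normalized user-item adjacency D^{-1/2} A D^{-1/2}
    Returns the mean of the layer-wise representations (as LightGCN does).
    """
    layer_embs = [emb]
    x = emb
    for _ in range(num_layers):
        x = torch.sparse.mm(norm_adj, x)  # one round of message passing
        layer_embs.append(x)
    return torch.stack(layer_embs, dim=0).mean(dim=0)

# Usage: train a plain MF model first, then enhance its embeddings only at test time.
# final_emb = post_training_graph_convolution(mf_model.embedding.weight.detach(), norm_adj)
# scores = final_emb[user_ids] @ final_emb[item_ids].T
```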
Analyzing the alignment force from a DFS perspective, the authors conclude that the alignment force that graph convolution applies to the surrounding neighbors of positive pairs is a degree-weighted version of the direct alignment force applied to the two clusters of nodes. This suggests that MF training effectively achieves effects akin to GCN training, which aggregates information in a breadth-first-search (BFS) manner.
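The argument can be stated compactly with DirectAU-style notation (assumed here for illustration; the paper's exact notation may differ). The alignment loss pulls positive pairs together:

$$
\mathcal{L}_{\text{align}} = \mathbb{E}_{(u,i)\sim p_{\text{pos}}} \big\| e_u - e_i \big\|^2 .
$$

With one layer of symmetrically normalized convolution, $e'_u = \sum_{j \in \mathcal{N}(u)} \frac{1}{\sqrt{d_u d_j}}\, e_j$, aligning the convolved pair expands to

$$
\big\| e'_u - e'_i \big\|^2
= \Big\| \sum_{j \in \mathcal{N}(u)} \tfrac{1}{\sqrt{d_u d_j}}\, e_j - \sum_{k \in \mathcal{N}(i)} \tfrac{1}{\sqrt{d_i d_k}}\, e_k \Big\|^2 ,
$$

i.e., a degree-weighted alignment between the neighbor cluster of $u$ and that of $i$. Under MF training, the direct alignment of observed $(u,i)$ pairs propagates transitively along interaction paths (the DFS view) and pulls the same clusters together without explicit aggregation.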
The authors identify the trade-off between integrating higher-order information and maintaining the original embedding distribution. They find that increasing the number of graph convolution layers significantly enlarges the difference between embeddings before and after convolution, termed the Embedding Discrepancy. This suggests that while additional convolution layers introduce more high-order information, they also risk perturbing well-trained embeddings.
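One plausible way to quantify this discrepancy is the mean Euclidean distance between embeddings before and after convolution; the metric and function below are assumptions for illustration, not the paper's exact measure.

```python
import torch

def embedding_discrepancy(emb, norm_adj, num_layers):
    """Mean Euclidean distance between the original embeddings and their
    representations after `num_layers` rounds of post-training propagation."""
    x = emb
    for _ in range(num_layers):
        x = torch.sparse.mm(norm_adj, x)
    return (x - emb).norm(dim=1).mean().item()

# Sweeping the layer count illustrates the trade-off: more layers inject more
# high-order information but drift further from the well-trained embedding space.
# for k in range(1, 5):
#     print(k, embedding_discrepancy(trained_emb, norm_adj, k))
```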
[02] Light Post-Training Graph-ODE for Efficient Recommendation
1. What are the key components of the LightGODE framework? The LightGODE framework consists of three main components:
- Pre-training user/item embeddings: The authors train the randomly initialized ID embeddings using only the alignment and uniformity losses, without any graph convolution (see the sketch after this list).
- Discrete GCN with self-loop: The authors devise a non-parametric graph convolution with a self-loop operation, which emphasizes the node representations from preceding layers in each message-passing step.
- Continuous graph-ODE: The authors derive a continuous graph ordinary differential equation (ODE) from the discrete non-parametric graph convolution, which models node representations dynamically and achieves an optimal trade-off between high-order information and embedding discrepancy.
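A minimal sketch of the pre-training objective, assuming DirectAU-style alignment and uniformity losses; the normalization choices and the `gamma` default are illustrative rather than the authors' settings.

```python
import torch
import torch.nn.functional as F

def alignment_loss(user_emb, item_emb):
    """Pull the embeddings of observed (user, item) pairs together."""
    user_emb, item_emb = F.normalize(user_emb, dim=-1), F.normalize(item_emb, dim=-1)
    return (user_emb - item_emb).pow(2).sum(dim=-1).mean()

def uniformity_loss(emb, t=2.0):
    """Spread embeddings over the hypersphere (log of the mean Gaussian potential)."""
    emb = F.normalize(emb, dim=-1)
    return torch.pdist(emb, p=2).pow(2).mul(-t).exp().mean().log()

def pretraining_loss(user_emb, item_emb, gamma=1.0):
    """Alignment + uniformity objective for training ID embeddings without any
    graph convolution; gamma balances the two terms (illustrative default)."""
    return alignment_loss(user_emb, item_emb) + gamma * (
        uniformity_loss(user_emb) + uniformity_loss(item_emb)
    ) / 2
```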
2. How does LightGODE address the efficiency and scalability issues of GCN-based recommendation models? LightGODE significantly improves the efficiency and scalability of graph-based recommendation models in several ways:
- It skips the computation-intensive graph construction and adjacency matrix normalization steps during training, making the training process as efficient as traditional MF models.
- It does not involve the layer-by-layer graph convolution during training, which is the most time-consuming operation in GCN-based models.
- The continuous graph-ODE formulation enables precise and fine-grained graph convolution to achieve the optimal trade-off between capturing high-order information and minimizing the embedding discrepancy, without the need for coarse-grained discrete convolution layers.
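One way to realize this continuous convolution, assuming the ODE takes the form dE(t)/dt = Ã·E(t) with the pre-trained embeddings as the initial condition E(0), is a truncated Taylor approximation of the matrix exponential; the specific formulation and truncation order here are assumptions for illustration, not necessarily the paper's exact solver.

```python
import torch

def graph_ode_convolution(emb, norm_adj, t=1.0, order=3):
    """Post-training continuous graph convolution.

    Approximates E(t) = exp(t * A_norm) @ E(0) with a truncated Taylor series,
    where E(0) is the pre-trained embedding matrix. The continuous time t
    replaces the integer layer count, so the strength of convolution can be
    tuned at a fine granularity.
    """
    result = emb.clone()
    term = emb
    for k in range(1, order + 1):
        term = torch.sparse.mm(norm_adj, term) * (t / k)  # builds t^k/k! * A^k E(0) incrementally
        result = result + term
    return result

# Usage: apply once after training, then score as usual.
# final_emb = graph_ode_convolution(pretrained_emb, norm_adj, t=0.8)
```

The continuous time `t` can then be selected (e.g., on a validation split) instead of committing to a coarse integer number of convolution layers.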
3. How does the time complexity of LightGODE compare to other GCN-based baselines? The time complexity analysis shows that LightGODE significantly outperforms the GCN-based baselines in terms of efficiency:
- LightGODE does not require the computation-intensive adjacency matrix normalization and graph convolution during training, unlike LightGCN and GraphAU.
- The loss computation in LightGODE is more efficient than the BPR loss used in LightGCN and the alignment/uniformity loss computations in GraphAU.
- The overall training time of LightGODE is much shorter than the GCN-based baselines, especially on the large-scale Gowalla dataset.
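As a rough back-of-the-envelope comparison under standard assumptions (not the paper's exact accounting): a $K$-layer GCN adds a propagation cost on the order of

$$
O\big(K \,|\mathcal{A}|\, d\big) \text{ per training epoch},
$$

where $|\mathcal{A}|$ is the number of nonzeros in the normalized adjacency and $d$ is the embedding dimension, on top of the loss computation. LightGODE pays no propagation cost during training and instead performs a single post-training convolution of roughly $O(m\,|\mathcal{A}|\, d)$ for a Taylor truncation of order $m$, amortized over the entire training run.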