Discrete Semantic Tokenization for Deep CTR Prediction
🌈 Abstract
The paper introduces a semantic-token paradigm for click-through rate (CTR) prediction that incorporates item content information while remaining both time- and space-efficient. The proposed approach, User-Item Semantic Tokenization (UIST), converts user behavior sequences and item content into short sequences of discrete tokens, yielding substantial memory compression over existing embedding-based approaches.
🙋 Q&A
[01] User–Item Semantic Tokenization
1. What are the key components of the UIST framework?
- UIST comprises three main modules: two semantic tokenizers (for items and users) and a hierarchical mixture inference (HMI) module.
- The semantic tokenizers transform dense, high-dimensional item and user embeddings into discrete tokens, achieving significant memory compression.
- The HMI module dynamically weighs user-item interactions at different levels of granularity, improving the integration of the hierarchical item and user tokens.
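As a rough illustration of how the three modules compose, the sketch below threads dense item and user embeddings through tokenizers into an HMI head. Every name and the stub components are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def uist_predict(item_content_emb, user_behavior_emb,
                 item_tokenizer, user_tokenizer, hmi):
    """End-to-end UIST flow: dense embeddings -> discrete tokens -> click score."""
    item_tokens = item_tokenizer(item_content_emb)    # e.g. (t1, t2, t3)
    user_tokens = user_tokenizer(user_behavior_emb)   # e.g. (u1, u2, u3)
    return hmi(item_tokens, user_tokens)              # click probability

# Toy stand-ins so the flow runs end to end; the real modules are learned networks.
stub_tokenizer = lambda emb: tuple(int(v) % 256 for v in np.abs(emb[:3]) * 100)
stub_hmi = lambda it, ut: 1.0 / (1.0 + np.exp(-np.dot(it, ut) / 1e4))

rng = np.random.default_rng(0)
print(uist_predict(rng.normal(size=8), rng.normal(size=8),
                   stub_tokenizer, stub_tokenizer, stub_hmi))
```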
2. How does the discrete semantic tokenization work?
- The semantic tokenization process has two stages:
  - Semantic Representation: an autoencoder network learns the contextual knowledge of the input sequence (item content or user behavior) and produces a unified dense representation.
  - Discrete Tokenization: the dense sequence representation is discretized into concise tokens with a residual quantization technique (RQ-VAE).
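To make the tokenization stage concrete, here is a minimal inference-time sketch of residual quantization in the spirit of RQ-VAE: each level's codebook quantizes the residual left by the previous level, so the resulting tokens form a coarse-to-fine code of the input vector. The codebook sizes, depth, and greedy nearest-neighbor assignment are illustrative assumptions; the paper's tokenizers are trained jointly with the autoencoder.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization (inference-time sketch).

    x:         (dim,) dense vector to tokenize.
    codebooks: list of (K, dim) arrays, one per quantization level.
    Returns the discrete tokens and the reconstructed vector.
    """
    residual = x.copy()
    tokens, recon = [], np.zeros_like(x)
    for codebook in codebooks:
        # Pick the codeword closest to the current residual.
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        tokens.append(idx)
        recon += codebook[idx]
        residual -= codebook[idx]   # the next level quantizes what is left
    return tokens, recon

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]  # 3 levels, 256 codes each
x = rng.normal(size=64)
tokens, recon = residual_quantize(x, codebooks)
print(tokens)  # three small integers replace a 64-dim float vector
```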
3. What is the purpose of the hierarchical mixture inference (HMI) module?
- The HMI module is designed to exploit user-item token pairs at different levels of granularity in the click-through rate prediction task.
- It constructs coarse-to-fine item and user embeddings from the hierarchical tokens and scores each granularity level with a deep CTR model to obtain per-level click scores.
- A linear layer then automatically weighs these scores to compute the final click probability.
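The following is a minimal sketch of that inference logic, assuming three token levels and using a dot product as a stand-in for the deep CTR backbone; all dimensions, names, and the scoring function are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HMISketch(nn.Module):
    """Hypothetical hierarchical mixture inference head (not the authors' code)."""
    def __init__(self, num_levels=3, codebook_size=256, dim=64):
        super().__init__()
        self.item_emb = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_levels))
        self.user_emb = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_levels))
        self.mixer = nn.Linear(num_levels, 1)  # weighs the per-level click scores

    def forward(self, item_tokens, user_tokens):
        # item_tokens, user_tokens: (batch, num_levels) integer token ids.
        scores = []
        for l in range(len(self.item_emb)):
            # Coarse-to-fine: accumulate embeddings of levels 0..l.
            iv = sum(self.item_emb[k](item_tokens[:, k]) for k in range(l + 1))
            uv = sum(self.user_emb[k](user_tokens[:, k]) for k in range(l + 1))
            # Dot product stands in for a deep CTR model scoring the pair.
            scores.append((iv * uv).sum(dim=-1))
        level_scores = torch.stack(scores, dim=-1)           # (batch, num_levels)
        return torch.sigmoid(self.mixer(level_scores)).squeeze(-1)

hmi = HMISketch()
item_toks = torch.randint(0, 256, (4, 3))
user_toks = torch.randint(0, 256, (4, 3))
print(hmi(item_toks, user_toks))  # four click probabilities
```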
[02] Experiments and Evaluation
1. How did the authors evaluate the effectiveness of UIST?
- The authors conducted offline experiments on a real-world news recommendation dataset (MIND), comparing UIST against the ID-based, content-based, and embedding-based paradigms on three modern deep CTR models: DCN, DeepFM, and FinalMLP.
- They evaluated the recommendation effectiveness using AUC and nDCG metrics, and also measured the inference time (latency) of each baseline.
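For reference, both metrics can be computed per impression with scikit-learn; the snippet below uses synthetic labels and scores (not the paper's data) just to show the shape of such an evaluation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, ndcg_score

# One impression: binary click labels and model scores for 6 candidate items.
labels = np.array([0, 1, 0, 0, 1, 0])
scores = np.array([0.12, 0.81, 0.33, 0.05, 0.64, 0.27])

auc = roc_auc_score(labels, scores)
ndcg5 = ndcg_score(labels[None, :], scores[None, :], k=5)  # expects 2D inputs
print(f"AUC={auc:.3f}  nDCG@5={ndcg5:.3f}")
# Dataset-level numbers average these per-impression values (the usual MIND convention).
```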
2. What were the key findings from the experiments?
- The content-based paradigm exhibited latencies (over 60 ms) that are unacceptable for industrial scenarios.
- The ID-based and embedding-based approaches, both single embedding-lookup designs, had similar latency, but the embedding-based approaches were more accurate because their item representations are derived from content.
- The proposed IST (item-only semantic tokenization) and UIST achieved roughly 200-fold memory compression relative to the other paradigms while retaining up to 99% (IST) and 98% (UIST) of the accuracy of the state-of-the-art embedding-based paradigm; a back-of-envelope sketch of where such savings come from follows this list.
- The hierarchical mixture inference (HMI) module outperformed simpler mechanisms for aggregating the dual (user and item) token scores.
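As a back-of-envelope illustration of where a compression ratio of that order can come from (all sizes here are assumptions, not the paper's exact settings): an embedding table stores d float32 values per item, whereas semantic tokenization stores a few one-byte token ids per item plus small shared codebooks whose cost is amortized over the whole catalog.

```python
num_items = 1_000_000           # catalog size (assumed)
dim = 256                       # embedding dimension (assumed)
levels, codebook_size = 3, 256  # token levels and codewords per level (assumed)

embedding_bytes = num_items * dim * 4                 # float32 embedding table
token_bytes = (num_items * levels * 1                 # one-byte ids (256 codes/level)
               + levels * codebook_size * dim * 4)    # shared float32 codebooks

print(f"{embedding_bytes / token_bytes:.0f}x")        # ~270x under these assumptions
```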
3. What are the key advantages of the semantic-based approach (UIST) compared to other paradigms?
- UIST provides a streamlined approach to integrating item content into deep CTR models, offering significant improvements in efficiency, particularly in industrial scenarios.
- The substantial memory compression achieved by UIST (around 200-fold) makes it a promising solution for applications that require both time and space efficiency, such as dataset compression.