LLM-Select: Feature Selection with Large Language Models
Abstract
The paper demonstrates that large language models (LLMs) can select the most predictive features for supervised learning tasks without ever accessing the downstream training data. The authors propose three LLM-based feature selection methods: selecting features based on LLM-generated importance scores, selecting features based on an LLM-generated ranking, and sequentially selecting features in a dialogue with an LLM. They find that these methods can rival standard data-driven feature selection techniques, even with zero-shot prompting and no additional context about the dataset. The authors suggest that LLMs may be useful not only for selecting the best features for training, but also for deciding which features to collect in the first place, especially in domains like healthcare where data collection is costly.
Q&A
[01] LLM-based Feature Selection Methods
1. What are the three LLM-based feature selection methods proposed in the paper? The three LLM-based feature selection methods proposed are:
- Selecting features based on LLM-generated importance scores (LLM-Score)
- Selecting features based on an LLM-generated ranking (LLM-Rank)
- Sequentially selecting features in a dialogue with an LLM (LLM-Seq)
2. How do these methods work?
- LLM-Score: The LLM is prompted to generate a numerical importance score between 0 and 1 for each input feature, and the top-scoring features are selected.
- LLM-Rank: The LLM is prompted to provide a ranking of all input features, and the top-ranked features are selected.
- LLM-Seq: The LLM is iteratively prompted to select the candidate feature that would most improve cross-validation performance, until the desired number of features has been selected (a minimal prompting sketch follows this list).
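To make the prompting recipe concrete, here is a minimal sketch of the LLM-Score variant. The prompt wording, the `query_llm` helper, and the number-parsing logic are illustrative assumptions rather than the paper's exact implementation; the idea is simply to score each feature independently and keep the top k.

```python
import re

def query_llm(prompt: str) -> str:
    """Placeholder for a single call to an LLM chat/completions API."""
    raise NotImplementedError

def llm_score(feature_names, target_description, k):
    """Score each feature independently with the LLM and keep the top k."""
    scores = {}
    for name in feature_names:
        prompt = (
            f"Provide an importance score between 0 and 1 for the feature "
            f"'{name}' when predicting {target_description}. "
            "Answer with only the number."
        )
        reply = query_llm(prompt)
        match = re.search(r"\d*\.?\d+", reply)
        # Default to 0.0 if the reply cannot be parsed as a number
        scores[name] = float(match.group()) if match else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:k]
```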
3. What are the key hypotheses behind each method?
- LLM-Score: The importance scores capture the marginal relevance of each feature for predicting the target, based on the knowledge encoded in the LLM.
- LLM-Rank: The rank of each feature reflects its relative importance for predicting the target, compared to the other features.
- LLM-Seq: Selecting features sequentially encourages choosing features that are maximally informative with respect to the features already selected.
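The sequential variant can be sketched in the same spirit. The loop below reuses the hypothetical `query_llm` placeholder from the previous snippet; the prompt wording and the fallback for invalid replies are assumptions made to keep the example runnable, not details taken from the paper.

```python
def llm_seq(feature_names, target_description, k):
    """Greedily build a feature subset by repeatedly asking the LLM for the next feature."""
    selected = []
    remaining = list(feature_names)
    while len(selected) < k and remaining:
        prompt = (
            f"We are predicting {target_description}. "
            f"Features already selected: {selected if selected else 'none'}. "
            f"From the remaining candidates {remaining}, name the single feature "
            "that would most improve predictive performance. Answer with the feature name only."
        )
        choice = query_llm(prompt).strip()
        if choice not in remaining:
            # Fall back to the first candidate if the reply is not a valid feature name
            choice = remaining[0]
        selected.append(choice)
        remaining.remove(choice)
    return selected
```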
[02] Experimental Evaluation
2. What datasets were used in the experiments? The experiments were conducted on 14 real-world datasets, comprising 7 binary classification tasks and 7 regression tasks, drawn from domains such as healthcare, criminal justice, and real estate.
2. How were the LLM-based feature selection methods evaluated? The effectiveness of each feature selection method was evaluated by measuring the test performance of a downstream prediction model (logistic/linear regression) as the fraction of selected features was varied from 10% to 100%. The authors compared the LLM-based methods against several traditional data-driven feature selection baselines.
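A rough sketch of this protocol for a binary classification task is shown below. It assumes pandas DataFrames, a feature list already ordered by one of the LLM-based methods, AUROC as the metric, and a 10% step size; these are choices made for the example, not a claim about the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate_ranking(X_train, y_train, X_test, y_test, ranked_features):
    """Test AUROC of a logistic regression model as the feature budget grows."""
    results = {}
    for frac in np.arange(0.1, 1.01, 0.1):
        k = max(1, int(round(frac * len(ranked_features))))
        cols = ranked_features[:k]
        model = LogisticRegression(max_iter=1000).fit(X_train[cols], y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test[cols])[:, 1])
        results[round(frac, 1)] = auc
    return results
```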
3. What were the key findings from the experiments?
- LLM-based feature selection methods, especially with larger models like GPT-4, can achieve performance competitive with data-driven baselines, even without accessing the downstream training data.
- The three LLM-based methods (LLM-Score, LLM-Rank, LLM-Seq) generally perform similarly well, with the larger models exhibiting more consistent performance across the methods.
- Zero-shot prompting with no dataset-specific context is often sufficient to serve as a strong baseline for LLM-based feature selection.
- The feature importance scores generated by larger LLMs tend to exhibit higher correlation with standard feature importance metrics such as Shapley values (a sketch of this comparison follows this list).
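The last point can be checked with a few lines of code. This is only a sketch under several assumptions (a logistic regression downstream model, SHAP values from the `shap` package, Spearman rank correlation as the agreement measure, and `feature_names` in the same order as the columns of `X_train`); the paper's exact analysis may differ.

```python
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

def score_agreement(X_train, y_train, feature_names, llm_scores):
    """Spearman rank correlation between LLM scores and mean |SHAP| values."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    explainer = shap.Explainer(model, X_train)  # dispatches to a linear explainer for this model
    mean_abs_shap = np.abs(explainer(X_train).values).mean(axis=0)
    llm_values = np.array([llm_scores[name] for name in feature_names])
    rho, _ = spearmanr(llm_values, mean_abs_shap)
    return rho
```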
[03] Implications and Future Work
1. What are the potential benefits of using LLMs for feature selection? The authors suggest that LLMs may be useful not only for selecting the best features after data collection, but also for deciding what features to collect in the first place, especially in domains like healthcare where obtaining high-quality data is expensive.
2. What are some limitations and future research directions?
- LLMs may exhibit biases inherited from their pretraining data, potentially leading to biased feature selection and unfair downstream performance. Incorporating notions of group fairness into LLM-based feature selection is an important future direction.
- Combining LLM-driven feature selection with data-driven methods or using it in a human-in-the-loop setup may be a more reliable approach for mitigating bias concerns, especially in safety-critical domains.