Optimal Group Fair Classifiers from Linear Post-Processing
Abstract
The article proposes a post-processing algorithm for fair classification that mitigates model bias under a unified family of group fairness criteria, including statistical parity, equal opportunity, and equalized odds. It achieves fairness by re-calibrating the output score of the given base model with a "fairness cost" - a linear combination of the (predicted) group memberships. The algorithm is based on a representation result showing that the optimal fair classifier can be expressed as a linear post-processing of the loss function and the group predictor, derived by using these as sufficient statistics to reformulate the fair classification problem as a linear program (LP). The parameters of the post-processor are estimated by solving the empirical LP. Experiments on benchmark datasets show that the algorithm is efficient and effective at reducing disparity compared to existing algorithms, including in-processing methods, especially on larger problems.
Q&A
[01] Optimal Group Fair Classifiers from Linear Post-Processing
1. What is the key idea behind the proposed post-processing algorithm? The key idea is to represent the optimal fair classifier as a linear post-processing of the loss function and the group membership predictor, which are used as sufficient statistics to reformulate the fair classification problem as a linear program. The parameters of the post-processor are then estimated by solving the empirical LP.
2. How does the algorithm achieve fairness? The algorithm achieves fairness by re-calibrating the output score of the given base model with a "fairness cost" - a linear combination of the (predicted) group memberships. This effectively discounts the loss of individuals belonging to groups that otherwise have a low rate of being classified as the target class, and increases the loss for groups with higher rates. A minimal sketch of this re-calibration rule follows this list.
3. What are the key advantages of the post-processing approach? The post-processing approach is flexible and lightweight: it is model-agnostic and applied post hoc with little computational overhead. This makes it more appealing than pre-processing and in-processing in scenarios where the fairness criteria are to be determined after the model is trained and evaluated, or where retraining is expensive or infeasible.
4. What are the limitations of existing post-processing algorithms addressed by this work? Existing post-processing algorithms are often tailored to specific fairness criteria and problem settings. This work proposes a more general framework that handles multi-class problems and arbitrary classification loss functions, under both attribute-aware and attribute-blind settings.
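As a minimal sketch of the re-calibration rule described in item 2 above (the array names `losses`, `group_probs`, and the weight matrix `lam` are illustrative assumptions, not the paper's notation), the post-processed prediction picks the class minimizing the base loss plus a fairness cost that is linear in the predicted group memberships:

```python
import numpy as np

def postprocess_predict(losses, group_probs, lam):
    """Linear post-processing sketch.

    losses:      (k,) per-class losses from the base model for one input
    group_probs: (m,) predicted probabilities of the m group memberships
    lam:         (m, k) fitted fairness-cost weights, one per (group, class) pair
    """
    fairness_cost = group_probs @ lam        # linear combination of group memberships
    return int(np.argmin(losses + fairness_cost))
```

In this sketch, the discount described above corresponds to negative weights for groups with low target-class rates and positive weights for groups with higher rates.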
[02] Linear Program Reformulation
1. How do the authors reformulate the optimal fair classification problem as a linear program? The authors represent the randomized classifier as a lookup table/row-stochastic matrix, and express the fairness constraints linearly in terms of the loss function and the group membership predictor. This allows them to reformulate the problem as a linear program; a sketch of the resulting LP appears after this list.
2. What is the role of the variable introduced in the linear program? The variable is introduced to reduce the number of constraints from quadratic to linear in the number of groups. It represents the optimal proportion of the population to be assigned a certain class under the parity constraint, or the centroid of these quantities across groups.
3. How does the linear program reformulation compare to prior work? Compared to prior work, this framework can handle multi-class problems and arbitrary classification loss functions, whereas previous approaches were limited to binary classification or specific fairness criteria. The linear program formulation also allows for efficient optimization of the fair classifier parameters, unlike prior methods that relied on grid search or heuristics.
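To make the reformulation concrete, here is a minimal sketch (not the authors' implementation) of the empirical LP for a statistical-parity-style constraint in the attribute-aware setting; the use of cvxpy, the slack `alpha`, and all names are assumptions for illustration:

```python
import numpy as np
import cvxpy as cp

def fit_fair_lp(loss, group, alpha=0.05):
    """Sketch of the empirical LP: find a randomized classifier (a row-stochastic
    lookup table P) that minimizes average loss while keeping each group's
    class-assignment rates within alpha of a shared centroid z."""
    n, k = loss.shape                        # n examples, k classes
    P = cp.Variable((n, k), nonneg=True)     # P[i, y] = prob. of assigning class y to example i
    z = cp.Variable(k)                       # auxiliary centroid of per-group class rates

    objective = cp.Minimize(cp.sum(cp.multiply(loss, P)) / n)
    constraints = [cp.sum(P, axis=1) == 1]               # each row is a distribution over classes
    for a in np.unique(group):
        idx = np.flatnonzero(group == a)
        rate_a = cp.sum(P[idx], axis=0) / len(idx)        # class rates within group a
        # One constraint per group against the centroid z, so the number of parity
        # constraints grows linearly, not quadratically, in the number of groups.
        constraints.append(cp.abs(rate_a - z) <= alpha)

    cp.Problem(objective, constraints).solve()
    return P.value, z.value
```

The auxiliary variable z plays the role described in item 2: each group is compared against a single shared quantity rather than against every other group.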
[03] Representation and Optimality
1. What is the key result regarding the representation of the optimal fair classifier? The authors show that the optimal fair classifier can be represented by a linear classifier on features computed from the loss function and the group membership predictor, provided that these are Bayes optimal.
2. How does this representation result lead to an interpretation of the fair classification process? The representation result shows that the optimal fair classifier can be obtained by re-calibrating the loss function with an additive "fairness cost" - a weighted sum of the group membership probabilities. This can be seen as discounting the loss for underrepresented groups and increasing it for overrepresented groups.
3. What assumptions are required for the representation result, and how are they addressed? The key assumption is the uniqueness of the best class for each input. The authors show that this assumption can be satisfied by randomly perturbing the loss function, which injects randomness into the framework to handle cases where the optimal fair classifier requires randomization (see the sketch after this list).
4. What are the optimality and fairness guarantees provided for the proposed algorithm? The authors provide bounds on the sub-optimality of the classifier obtained from the post-processing algorithm, as well as its violation of the fairness constraints. These bounds depend on the estimation error from finite samples and the deviation of the base model's predictions from the Bayes optimal.
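The random perturbation from item 3 can be sketched as follows (the noise scale `eps` and the uniform noise distribution are assumptions for illustration): small noise added to the per-class losses before post-processing makes the minimizing class unique almost surely, and the randomness of the noise is what lets the deterministic argmin rule emulate a randomized classifier.

```python
import numpy as np

def perturb_losses(losses, eps=1e-3, rng=None):
    """Break ties by adding small random noise to the per-class losses,
    so that the loss-minimizing class is unique with probability one."""
    rng = np.random.default_rng() if rng is None else rng
    return losses + eps * rng.uniform(size=losses.shape)
```

In the earlier re-calibration sketch, `perturb_losses(losses)` would be applied before adding the fairness cost.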
[04] Experiments
1. What are the key findings from the experimental evaluation? The experiments show that the proposed post-processing algorithm outperforms existing post-processing and in-processing algorithms, especially on larger problem instances, in terms of the accuracy-fairness trade-off. The algorithm is also shown to be more effective at achieving high levels of fairness when the base model's predictions are well-calibrated.
2. How does the performance of the post-processing algorithm compare to in-processing approaches? While in-processing algorithms can theoretically achieve better performance under optimization oracles, the experiments demonstrate that they can be challenging to optimize in practice, especially on larger datasets. The post-processing approach is more efficient and effective in the experiments.
3. What is the impact of calibrating the group membership predictor on the algorithm's performance? Calibrating the group membership predictor is shown to allow the post-processing algorithm to achieve higher levels of fairness, by ensuring that the fairness constraints are satisfied more precisely at the population level. A calibration sketch follows this list.
4. What are the limitations and future directions discussed in the paper? The paper discusses the issue of threshold-invariance, where the fairness properties may not be preserved when changing the classification threshold after deployment. It also notes the challenge of improving the interpretability of the fair classifiers produced by parity-based algorithms.
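As a sketch of the calibration step mentioned in item 3 (the logistic-regression base model, isotonic calibration, 5-fold cross-validation, and the variable names are illustrative assumptions, not the paper's setup), a group-membership predictor could be calibrated with scikit-learn before its probabilities are used in the fairness cost:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# Calibrate a group-membership predictor so its predicted probabilities better
# match empirical group frequencies; the calibrated probabilities are then fed
# to the post-processor when computing the fairness cost.
group_predictor = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                         method="isotonic", cv=5)
group_predictor.fit(X_train, group_train)             # X_train, group_train assumed given
group_probs = group_predictor.predict_proba(X_test)   # X_test assumed given
```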