Fair & Interpretable Representation Learning
Understandable neural networks for fair representation learning.
Representation learning algorithms based on neural networks are being employed extensively in information retrieval and data mining applications. The social impact of what the general public refers to as “AI” is now a topic of much discussion, with regulators in the EU even putting forward legal proposals which would require practitioners to “[…] minimise the risk of unfair biases embedded in the model […]”.
The concern is that models trained on biased data may learn those biases, thereby perpetuating historical discrimination against certain groups of individuals.
In this situation, one needs to be concerned with the fairness of a neural network, i.e. whether it relies on sensitive, law-protected information such as ethnicity or gender to make its decisions. One possible approach is to remove information about the sensitive attribute from the model's internal representations. These techniques are commonly referred to as “fair representation learning”. Such methodologies learn a projection \(f: \mathcal{X} \to \mathcal{Z}\) into a latent space where it can be shown that the information about the sensitive attribute \(s\) is minimal.
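As a minimal sketch of this idea (assuming PyTorch and an adversarial debiasing setup, which is only one of several ways of minimising the information about \(s\); all names and dimensions are illustrative), an encoder can be trained jointly with an adversary that tries to recover the sensitive attribute from the latent code:

```python
# Illustrative sketch: adversarial fair representation learning in PyTorch.
# An encoder f: X -> Z feeds a task head predicting y, while an adversary
# tries to predict the sensitive attribute s from z; a gradient-reversal
# layer pushes the encoder to remove information about s.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None


class FairEncoder(nn.Module):
    def __init__(self, n_features, n_latent=16, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        self.task_head = nn.Linear(n_latent, 1)   # predicts the label y
        self.adversary = nn.Linear(n_latent, 1)   # tries to predict s

    def forward(self, x):
        z = self.encoder(x)
        y_logit = self.task_head(z)
        s_logit = self.adversary(GradientReversal.apply(z, self.lambd))
        return y_logit, s_logit


# Toy training step on random data: the adversary learns to predict s from z,
# while the reversed gradient drives the encoder to strip that information out.
model = FairEncoder(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(32, 10)
y = torch.randint(0, 2, (32, 1)).float()
s = torch.randint(0, 2, (32, 1)).float()

opt.zero_grad()
y_logit, s_logit = model(x)
loss = bce(y_logit, y) + bce(s_logit, s)
loss.backward()
opt.step()
```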
However, one issue in the area of fair representation learning is interpretability. The projection into a latent space makes it hard to investigate why decisions have been made. This is in open contradiction with recent EU legislation, which calls for a right to an explanation for individuals who are subject to automated decision systems (General Data Protection Regulation (GDPR), Recital 71).
In this context, DNNs that are fair might still not be transparent enough to be applied in real-world scenarios.
Our current proposal is to employ a custom neural architecture which performs feature corrections for fairness.
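Purely as an illustration of the general direction (this is not the proposed architecture itself; the module name, layer sizes and the bounded-correction choice are assumptions of mine), a feature-correction layer might look roughly like the following, to be trained together with a fairness objective such as the adversarial one sketched above:

```python
# Hypothetical sketch of a "feature correction" module: rather than mapping
# into an opaque latent space, it learns a bounded additive shift in the
# original feature space, so corrected features stay directly interpretable.
import torch
import torch.nn as nn


class FeatureCorrection(nn.Module):
    def __init__(self, n_features, max_correction=0.5):
        super().__init__()
        # tanh bounds each per-feature shift to [-max_correction, +max_correction],
        # echoing the research avenue below on user-chosen correction limits.
        self.delta = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                   nn.Linear(32, n_features), nn.Tanh())
        self.max_correction = max_correction

    def forward(self, x):
        # The output lives in the same space as x, so the correction applied
        # to each feature of each individual can be inspected directly.
        return x + self.max_correction * self.delta(x)


# Corrected features can then feed any downstream classifier; a fairness loss
# applied on top is what makes the learned corrections meaningful.
layer = FeatureCorrection(n_features=10)
x = torch.randn(4, 10)
print(layer(x).shape)  # torch.Size([4, 10])
```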
If you are interested in working on this topic, I have several possible avenues of research starting from this idea:
- Extending the architecture so that it can handle categorical and ordinal features naturally.
- Limiting the maximum correction applied to each feature via a user-chosen parameter.
- Investigating the connections to the counterfactual fairness literature.
Some of the skills you will develop and knowledge you will gain by working on this topic:
- TensorFlow/PyTorch fundamentals
- Experiment tracking
- Neural architectures for domain adaptation and fairness
Please contact me via email if interested: mcerrato at uni dash mainz dot de.