Aligning (medical) LLMs for (counterfactual) fairness

arXiv:2408.12055v1 Announcement Type: cross Abstract: Large language models (LLMs) have emerged as promising solutions for a variety of medical and clinical decision support applications. However, LLMs are often subject to different types of biases, which can lead to unfair treatment of individuals, worsen health disparities, and reduce trust in AI-augmented medical tools. To address this important issue, we present a new alignment approach that tunes LLMs with a preference optimization method inside a knowledge distillation framework. Before presenting the proposed method, we first use an evaluation framework to conduct a comprehensive empirical evaluation (the largest to our knowledge) that characterizes the types and nature of biases present in LLMs used for medical applications. We then propose a bias mitigation technique to reduce unfair patterns in LLM outcomes across subgroups defined by protected attributes, and show that it significantly reduces the observed biased patterns. Our code is publicly available at https://github.com/healthylaife/FairAlignmentLLM.
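The abstract names preference optimization within a knowledge distillation framework but gives no implementation detail. Below is a minimal, hypothetical sketch of a DPO-style preference loss that such a pipeline could use, where the "chosen" response in each pair would be a counterfactually consistent output distilled from a teacher and the "rejected" response the biased one. The function and variable names are illustrative assumptions, not taken from the authors' repository.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each *_logps tensor holds the summed token log-probabilities of a full
    response under either the policy being aligned or a frozen reference model.
    (Illustrative sketch only; the paper's actual objective may differ.)
    """
    # Implicit rewards: log-ratio of policy to reference, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred (fair) and dispreferred (biased) responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())
```

In a distillation setting, the reference log-probabilities would typically come from the frozen teacher (or the pre-alignment student), so the student is pulled toward the teacher's counterfactually fair preferences without drifting far from its original distribution.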