
Dataset and AI bias

Working with deep learning models on computer vision tasks requires huge amounts of data to train a model from the ground up. Since these datasets are normally human-labeled and human-curated, growing them takes a large amount of effort and cost. One problem is that we do not always have a steady supply of proportionally sampled data, so we end up with several types of bias.

Unequal class balance

One form of bias commonly found in datasets is unbalanced classes: the resulting model will tend to predict the over-represented class more often than the others.

Solutions to this problem:

  • Re-sample (under-sample) the majority classes so that the dataset stays balanced.
  • Repeat (over-sample) the samples of the scarce classes to balance the dataset, as in the sketch after this list.
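As a concrete illustration of the over-sampling idea, here is a minimal sketch using PyTorch's WeightedRandomSampler; the dataset, class counts, and batch size below are hypothetical stand-ins.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1 (hypothetical numbers).
features = torch.randn(100, 3, 32, 32)
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class, so the scarce
# class is drawn more often (over-sampling with replacement).
class_counts = torch.bincount(labels)
sample_weights = (1.0 / class_counts.float())[labels]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(dataset),  # one "epoch" still sees len(dataset) draws
    replacement=True,
)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

# Each batch is now roughly class-balanced on average.
xb, yb = next(iter(loader))
print(yb.bincount(minlength=2))
```

Note that over-sampling repeats examples from the scarce class, which can raise the risk of overfitting to those particular samples.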

Domain shift/drift

Domain shift happens when the dataset is acquired under one set of conditions but the trained model is then used in a different scenario, so the model ends up not generalising well. For example, a model trained on daytime photographs may perform poorly on images captured at night.
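One common way to watch for this in deployment is to compare a simple statistic of incoming data against the training distribution. Below is a minimal sketch using a two-sample Kolmogorov-Smirnov test from SciPy; the per-image statistic and both data arrays are synthetic stand-ins, not a definitive drift detector.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical 1-D summary statistic (e.g., mean pixel intensity per image)
# for the training set and for data seen in deployment.
rng = np.random.default_rng(0)
train_stat = rng.normal(loc=0.5, scale=0.1, size=1000)   # acquisition conditions A
deploy_stat = rng.normal(loc=0.7, scale=0.1, size=1000)  # acquisition conditions B

# Kolmogorov-Smirnov two-sample test: a small p-value suggests the
# deployment data no longer follows the training distribution.
statistic, p_value = ks_2samp(train_stat, deploy_stat)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.3g}")
if p_value < 0.01:
    print("Possible domain shift: retraining or adaptation may be needed.")
```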

Other ideas and questions

Patrick Hebron presented on deeplearning.ai a vision of exploring the latent space. This technique, I think, can help us explore what kinds of associations the model is making, so we can then give hints in the direction we want the model to change.
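As a rough sketch of what latent-space exploration can look like, the snippet below linearly interpolates between two latent codes; the latent dimensionality and the decoder call are hypothetical placeholders for a trained generative model, not Hebron's actual method.

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    """Linearly interpolate between two latent vectors.

    Decoding each interpolated point with a generative model's decoder
    (the hypothetical `decoder` below) reveals which visual attributes
    the model has associated along that direction of the latent space.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

# Two hypothetical latent codes, e.g. sampled for a GAN or VAE.
rng = np.random.default_rng(42)
z_a, z_b = rng.normal(size=128), rng.normal(size=128)

for z in interpolate_latents(z_a, z_b, steps=5):
    # image = decoder(z)  # hypothetical decoder of a trained generative model
    print(z[:3])  # inspect the first few latent dimensions at each step
```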

Links

Introduction to Bias in AI

AI Access: Integrating Design and Technical Innovation in AI-First Products

Biases in AI Systems - A survey for practitioners - Ramya Srinivasan and Ajay Chander