In data science, bias refers to systematic errors in data or in the modeling process that can lead to inaccurate or unfair predictions or conclusions. Bias can arise from various sources, including the data collection process, the modeling assumptions, or the algorithm used.
There are different types of bias in data science, including:
Sampling bias: This occurs when the sample used to build a model is not representative of the population it is meant to generalize to. Estimates and predictions are then skewed toward the groups that are over-represented in the sample.
Measurement bias: This occurs when the way data is collected or measured systematically produces inaccurate or incomplete information. For example, a survey question phrased in a leading way can push respondents toward a particular answer.
Model bias: This occurs when the modeling assumptions or the algorithm used systematically produces inaccurate or unfair results. For example, a model may unfairly discriminate against a certain group of people.
Confirmation bias: This occurs when researchers or analysts unconsciously favor data that supports their pre-existing beliefs or hypotheses and ignore data that contradicts them.
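Sampling bias, the first type above, can be illustrated with a small simulation. The sketch below is not taken from the referenced book; the function name `simulate_sampling_bias`, the synthetic population (normal with mean 50 and standard deviation 10), and the selection rule (only values above the median can be sampled) are all assumptions made for illustration.

```python
import random
import statistics


def simulate_sampling_bias(seed=0, population_size=10_000, sample_size=500):
    """Compare a representative sample with a biased one (hypothetical data)."""
    rng = random.Random(seed)

    # Synthetic population: an assumption for this sketch, not real data.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.mean(population)

    # Representative sample: every member is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)

    # Biased sample: only members above the population median can be chosen,
    # so the lower half of the population is never observed.
    median = statistics.median(population)
    upper_half = [x for x in population if x > median]
    biased_sample = rng.sample(upper_half, sample_size)

    return (
        true_mean,
        statistics.mean(random_sample),
        statistics.mean(biased_sample),
    )


if __name__ == "__main__":
    true_mean, random_mean, biased_mean = simulate_sampling_bias()
    print(f"population mean:    {true_mean:.2f}")
    print(f"random sample mean: {random_mean:.2f}")
    print(f"biased sample mean: {biased_mean:.2f}")
```

Under these assumptions the random sample's mean should land close to the population mean, while the biased sample's mean is systematically too high. Drawing a larger biased sample would not help, because the error comes from who can be sampled rather than from how many are sampled.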
Status:: #wiki/notes/mature
Plantations:: Data Science Metrics - 20230221095821
References:: Mastering Machine Learning with scikit-learn