Severity: low
Affected: cleanlab.Datalab, cleanlab.classification.CleanLearning
Symptom: Data quality / label quality scores do not behave as expected (e.g. they do not sum to 1, are not probability-like, or have surprising absolute values); users suspect the wrong percentage of data is being flagged as issues.
Root cause
No root cause identified; this is most likely a user expectation mismatch. The scores estimate the *relative* likelihood that each example has an issue (scores near 0 = likely bad, near 1 = likely good). They are provably accurate under the theory behind cleanlab, but they are not calibrated probabilities like ML model predictions, so they should be used for ranking and thresholding rather than interpreted as probabilities. Rescalings applied in some releases improve readability of the scores; they do not fix calibration bugs.
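To illustrate the expected behavior, here is a minimal sketch (a simplified reimplementation in plain NumPy, not cleanlab's actual code) of a "self confidence" style label quality score: the model's predicted probability of each example's given label. Note that the scores rank examples by suspected label issues and are not expected to sum to 1 across examples.

```python
import numpy as np

def self_confidence_scores(labels, pred_probs):
    """Sketch of a self-confidence label quality score: the predicted
    probability assigned to each example's given label.
    Near 0 -> label likely wrong; near 1 -> label likely fine.
    These are per-example ranking scores, NOT calibrated probabilities,
    and the scores across examples need not sum to 1."""
    return pred_probs[np.arange(len(labels)), labels]

pred_probs = np.array([
    [0.9, 0.1],  # model confident the label should be 0
    [0.2, 0.8],  # model confident the label should be 1
    [0.5, 0.5],  # model uncertain
])
labels = np.array([0, 0, 1])  # second given label disagrees with the model

scores = self_confidence_scores(labels, pred_probs)
# scores: [0.9, 0.2, 0.5] -- the 0.2 flags the second example as suspect
```

The absolute values here depend entirely on how confident (and how well calibrated) the underlying model is, which is why rescaled scores in different cleanlab releases can look different while inducing the same ranking.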
Tags: cleanlab, datalab, label quality score, calibration, rescale