Standard software testing works well when you know your expected outputs. What happens when success means "at least 95% accuracy at least 90% of the time"? Worse still, what happens when success means "this group liked the analysis it gave, so it is fine for inputs that look like what came in yesterday"? This presentation shares experiences building FINRA's first frameworks, guidance, and case studies for monitoring machine learning models in production. After establishing context on how machine learning models are evaluated, the talk discusses experience developing and applying some of the most promising approaches to monitoring both model predictions and data quality.
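To make the opening success criterion concrete, here is a minimal sketch (not FINRA's actual tooling; the function names and window size are illustrative assumptions) of how "at least 95% accuracy at least 90% of the time" might be checked over consecutive evaluation windows of production predictions:

```python
# Illustrative sketch: check a probabilistic success criterion such as
# "at least 95% accuracy in at least 90% of evaluation windows".
# Names and parameters are assumptions, not FINRA's actual framework.

def window_accuracies(y_true, y_pred, window=100):
    """Accuracy over consecutive, non-overlapping windows of predictions."""
    accs = []
    for start in range(0, len(y_true) - window + 1, window):
        pairs = zip(y_true[start:start + window], y_pred[start:start + window])
        correct = sum(1 for t, p in pairs if t == p)
        accs.append(correct / window)
    return accs

def meets_target(accs, acc_target=0.95, fraction=0.90):
    """True if the required fraction of windows reaches the accuracy target."""
    if not accs:
        return False
    hits = sum(1 for a in accs if a >= acc_target)
    return hits / len(accs) >= fraction
```

A monitoring job could compute `window_accuracies` over each day's labeled predictions and alert when `meets_target` returns False, turning the fuzzy criterion into a routine production check.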
Video producer: https://www.associationforsoftwaretesting.org/