Deploying Contender: Early Lessons in Data, Measurement, and Testing of Multiple Call Flow Decisions

David Suendermann, Jackson Liscombe, Jonathan Bloom, Grace Li, and Roberto Pieraccini

Keywords

Contender, (commercial) spoken dialog systems, optimization, production deployment

Abstract

In a recent publication [1], we laid out the mathematical foundations of an optimization technique—Contender—applicable to commercially deployed spoken dialog systems, akin to what the research community would call a light version of reinforcement learning. In particular, we showed how Contender respects the notion of statistical significance and outperforms the common practice in deployed systems of collecting data until results appear reliable and then drawing final conclusions. While [1] was a somewhat theoretical paper focusing on the proofs of the above statements and the derivation of the optimization algorithm, the present work reports on lessons learned from live deployments in production systems. Altogether, seven Contenders were installed in five different commercial dialog systems processing 2.9 million calls over a period of three to seven months. We have found that, depending on the location of the Contender in the application's call flow and the performance difference between the competing alternatives, potentially large numbers of calls (hundreds of thousands) need to be processed to determine a winner. Furthermore, we have seen that Contender self-adapts in the event that a previously under-performing option becomes the preferred option.
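The abstract describes routing live calls among competing call-flow alternatives while respecting statistical significance, so that traffic shifts automatically if a previously under-performing option improves. The following is a minimal toy sketch of that idea—not the authors' actual algorithm from [1]—in which each new call is routed to one of two alternatives with probability proportional to the estimated chance (via a normal approximation on observed success rates) that it is the better one. All function names and the two-arm restriction are illustrative assumptions.

```python
import math
import random

def p_first_better(s1, n1, s2, n2):
    """Normal-approximation probability that alternative 1's true
    success rate exceeds alternative 2's, given s successes out of
    n calls for each alternative (illustrative, not the method of [1])."""
    p1, p2 = s1 / n1, s2 / n2
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    if var == 0:
        return 0.5 if p1 == p2 else float(p1 > p2)
    z = (p1 - p2) / math.sqrt(var)
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def route_call(stats):
    """Pick the alternative for the next call, weighting each by its
    estimated probability of being the better one. stats is a list of
    two (successes, trials) pairs."""
    w = p_first_better(*stats[0], *stats[1])
    return 0 if random.random() < w else 1
```

Because the routing weight is recomputed from the running counts, a formerly losing alternative that starts performing better gradually receives more traffic again—mirroring, in spirit, the self-adaptation behavior the abstract reports.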
