Match Plan Generation in Web Search with Parameterized Action Reinforcement Learning

Linfeng Zhao, Ziyan Luo, Wei Cheng, Sihao Chen, Qi Chen, Hui Xue, Haidong Wang, Chuanjie Liu, Mao Yang, Lintao Zhang

November 2020

Abstract

To achieve good result quality and short query response time, search engines use specific match plans on the inverted index to retrieve a small set of relevant documents from billions of web pages. A match plan is a sequence of match rules, each consisting of a discrete match rule type and a continuous stopping quota. Currently, match plans are designed manually by experts based on years of experience, an approach that struggles with heterogeneous queries and varying data distributions. In this work, we formulate match plan generation as a Partially Observable Markov Decision Process (POMDP) with a parameterized action space, and propose a novel reinforcement learning algorithm, Parameterized Action Soft Actor-Critic (PASAC), to enhance exploration in both the discrete and continuous action spaces. In our setting, we also identify a skewed prioritization issue in the original Prioritized Experience Replay (PER) and introduce Stratified Prioritized Experience Replay (SPER) to address it. We are the first group to generalize this task to all queries as a learning problem with zero prior knowledge and to successfully apply deep reinforcement learning in a real web search environment. Our approach greatly outperforms the well-designed production match plans, reducing index block accesses by over 70% with document quality almost unchanged, and reducing query response time by 9% even when model inference cost is included. Our method also beats the baselines on some open-source benchmarks.
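As an illustration of the parameterized action space described in the abstract, the sketch below represents a match plan as a sequence of rules, each pairing a discrete rule type with a continuous stopping quota. All names here (`MatchRule`, `sample_match_plan`, the normalized quota range) are hypothetical and chosen for illustration; the uniform random sampler is only a placeholder for a learned policy and is not the PASAC algorithm.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MatchRule:
    """One parameterized action: a discrete choice plus a continuous parameter."""
    rule_type: int   # index of the discrete match rule type (hypothetical encoding)
    quota: float     # continuous stopping quota, assumed normalized to [0, 1)


def sample_match_rule(num_rule_types: int, rng: random.Random) -> MatchRule:
    """Sample one rule uniformly at random (a stand-in for a learned policy)."""
    rule_type = rng.randrange(num_rule_types)  # discrete part of the action
    quota = rng.random()                       # continuous part of the action
    return MatchRule(rule_type, quota)


def sample_match_plan(num_rule_types: int, length: int, seed: int = 0) -> List[MatchRule]:
    """Build a match plan as a fixed-length sequence of match rules."""
    rng = random.Random(seed)
    return [sample_match_rule(num_rule_types, rng) for _ in range(length)]


if __name__ == "__main__":
    for step, rule in enumerate(sample_match_plan(num_rule_types=4, length=3)):
        print(f"step {step}: type={rule.rule_type}, quota={rule.quota:.3f}")
```

In a reinforcement learning setup, a policy would emit one such rule per step conditioned on its observation, so the agent has to explore both which rule type to pick and what quota to attach to it.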

Type

Conference paper

Publication

In The Web Conference (WWW) 2021

Reinforcement Learning, Information Retrieval

Linfeng Zhao

CS Ph.D. Student
