Smart Index: A Hybrid Approach that Unifies Rule-based and Model-based Retrieval in Display Advertising System at JD.com

Published in Under Review, 2022

Recommended citation: Jianyu Su, et al.

Abstract

The display advertising system at JD.com adopts a widely used multi-stage cascade architecture that comprises retrieval, ranking, etc. The retrieval system is responsible for targeting relevant ads from tens of millions of ads, while the rest of the system is concerned with the business value (e.g., CTR, eCPM) of those ads. This separation of objectives between the retrieval system and its downstream counterparts often leads to a performance loss for the display advertising system as a whole. Regardless of the targeting method used for matching, the retrieval system constantly truncates ads, because latency constraints prevent it from applying targeting methods to every ad. Truncating ads of interest (ads with high CTR, eCPM, etc.) not only hurts advertisers’ experience but also causes revenue losses for the platform. We therefore rethink the matching process and design a hybrid retrieval engine that strikes a delicate balance between including ads of interest and respecting system limitations. The basic idea is to bridge the objective gap between retrieval and ranking, thereby preventing ads that the ranking system values from being truncated at retrieval. Specifically, a learning-based module, trained with the ranking objective, is adopted to prioritize ads of interest, followed by a boolean expression match that performs targeting. The advantages of this design are twofold: (i) introducing the learning-based module allows the retrieval system to cooperate with its downstream modules in an end-to-end fashion; (ii) prioritization preserves ads of interest, achieving near truncation-free retrieval and reducing the loss caused by truncation under latency constraints.
To cope with the high latency introduced by the learning-based relevance module, we exploit approximate nearest neighbor techniques to identify and retrieve ads efficiently in the retrieval step. The newly designed retrieval system, named Smart Index (SI), proved effective in our evaluation and has been deployed in the display advertising system at JD.com. This paper elaborates on various aspects of the design and deployment of the SI system.
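
The prioritize-then-match flow described in the abstract can be illustrated with a minimal sketch. It is not the SI implementation: the names `Ad`, `score`, `matches`, and `retrieve` are hypothetical, a plain dot product stands in for the learned relevance module, an exhaustive sort stands in for approximate nearest neighbor search, and the `budget` parameter is an assumed stand-in for the latency limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Ad:
    """Hypothetical ad record used only for this illustration."""
    ad_id: str
    embedding: List[float]
    # Targeting as a conjunction: attribute name -> set of accepted values.
    targeting: Dict[str, Set[str]] = field(default_factory=dict)


def score(user_vec: List[float], ad_vec: List[float]) -> float:
    """Dot product, standing in for a model trained on the ranking objective."""
    return sum(u * a for u, a in zip(user_vec, ad_vec))


def matches(ad: Ad, user_attrs: Dict[str, str]) -> bool:
    """Boolean expression match: every targeting clause must accept the user."""
    return all(user_attrs.get(key) in allowed
               for key, allowed in ad.targeting.items())


def retrieve(user_vec: List[float], user_attrs: Dict[str, str],
             ads: List[Ad], k: int, budget: int) -> List[str]:
    """Prioritize ads by score, then apply targeting to at most `budget` ads.

    Because the highest-scoring ads are examined first, the ads cut off by
    the budget are the ones the scoring model values least.
    """
    # Exhaustive sort for clarity; a real system would use an ANN index here.
    ranked = sorted(ads, key=lambda ad: score(user_vec, ad.embedding),
                    reverse=True)
    selected: List[str] = []
    for ad in ranked[:budget]:
        if matches(ad, user_attrs):
            selected.append(ad.ad_id)
            if len(selected) == k:
                break
    return selected
```

In this sketch, shrinking `budget` drops the lowest-scoring candidates first, which mirrors the abstract's point that prioritization reduces the loss caused by truncation.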