Alibaba Group: Alibaba improves online ad performance by 45% using reinforcement learning to optimise real-time bidding | Case Study

Context

"The majority of online display ads are served through real-time bidding (RTB): each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time."

The Project

"Our main goal is to derive the optimal bidding policy in a reinforcement learning fashion. For most performance-driven campaigns, the optimization target is to maximize the user responses on the displayed ads if the bid leads to auction winning." According to the research paper, the bidding decision process was treated as a "reinforcement learning problem, where the state space is represented by the auction information and the campaign’s real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks."
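The formulation above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code: the `State` fields, the `step` function, integer price units and the second-price payment rule are all assumptions introduced here. It treats the remaining auctions and remaining budget as the campaign's real-time parameters, the impression features as the auction information, and the bid price as the action.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical MDP state: campaign parameters plus auction information."""
    auctions_left: int            # campaign parameter: auctions remaining
    budget_left: int              # campaign parameter: budget remaining (integer units)
    features: Tuple[float, ...]   # auction information for the current impression


def step(state: State, bid: int, market_price: int, clicked: bool,
         next_features: Tuple[float, ...]) -> Tuple[State, int]:
    """Apply one bid (the action) and return (next_state, reward).

    Assumption: second-price auction, so a winning bidder pays the
    market price (the highest competing bid) rather than its own bid.
    """
    bid = min(bid, state.budget_left)      # cannot bid more than the remaining budget
    won = bid >= market_price              # auction competition decides the transition
    cost = market_price if won else 0
    reward = 1 if (won and clicked) else 0  # user response is the reward
    next_state = State(state.auctions_left - 1,
                       state.budget_left - cost,
                       next_features)
    return next_state, reward


# Usage: one winning and one losing bid from the same starting state.
start = State(auctions_left=3, budget_left=10, features=(0.2,))
won_state, won_reward = step(start, bid=5, market_price=4, clicked=True, next_features=(0.1,))
lost_state, lost_reward = step(start, bid=3, market_price=4, clicked=True, next_features=(0.1,))
```

In this sketch a lost auction costs nothing and leaves the budget intact, which is what makes the choice of bid a sequential decision rather than a one-off one.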

AI Usage

"With an MDP formulation, the state transition and reward function are captured via modeling the auction competition and user click, respectively. The optimal bidding policy is then derived using dynamic programming. Furthermore, to deal with the large-scale auction volume and campaign budget, we proposed neural network models to fit the differential of the values between two consecutive states."
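The dynamic programming step described above can be sketched as a small tabular example. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `value_table` and `optimal_bid`, the discrete market price distribution `m`, the average click-through rate `theta_avg`, integer prices and second-price payment are all introduced here for illustration. The neural network approximation mentioned in the quote is not shown; the table below is only practical for small budgets and auction counts.

```python
from typing import List


def value_table(T: int, B: int, m: List[float], theta_avg: float) -> List[List[float]]:
    """Return V where V[t][b] is the expected number of clicks obtainable
    with t auctions and b budget units remaining.

    Assumptions: m[d] is the probability that the market price equals d
    (sum(m) == 1), a bid a wins when d <= a, and the winner pays d.
    """
    V = [[0.0] * (B + 1) for _ in range(T + 1)]  # V[0][b] = 0: no auctions left
    for t in range(1, T + 1):
        for b in range(B + 1):
            win_prob = 0.0
            win_value = 0.0
            best = float("-inf")
            # Try every affordable bid, accumulating the winning terms.
            for a in range(min(b, len(m) - 1) + 1):
                win_prob += m[a]
                win_value += m[a] * (theta_avg + V[t - 1][b - a])
                lose_value = (1.0 - win_prob) * V[t - 1][b]
                best = max(best, win_value + lose_value)
            V[t][b] = best
    return V


def optimal_bid(t: int, b: int, theta: float, V: List[List[float]], m: List[float]) -> int:
    """Pick the bid maximising expected clicks for one impression whose
    predicted click probability is theta (requires t >= 1)."""
    win_prob = 0.0
    win_value = 0.0
    best_bid, best_q = 0, float("-inf")
    for a in range(min(b, len(m) - 1) + 1):
        win_prob += m[a]
        win_value += m[a] * (theta + V[t - 1][b - a])
        q = win_value + (1.0 - win_prob) * V[t - 1][b]
        if q > best_q:
            best_bid, best_q = a, q
    return best_bid


# Usage: market price is 1 or 2 with equal probability, average CTR of 10%.
prices = [0.0, 0.5, 0.5]
V = value_table(T=2, B=2, m=prices, theta_avg=0.1)
bid_for_weak_impression = optimal_bid(t=2, b=2, theta=0.0, V=V, m=prices)
```

The table costs on the order of T × B × (maximum price) operations to fill, which gives a sense of why the quoted passage turns to function approximation once auction volume and budget become large.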

Data

"Two real-world datasets are used in our experimental study, namely iPinYou and YOYI. iPinYou is one of the mainstream RTB ad companies in China. The whole dataset comprises 19.5M impressions, 14.79K clicks and 16.0K CNY expense on 9 different campaigns over 10 days in 2013. We follow [31] for splitting the train/test sets and feature engineering. YOYI is a leading RTB company focusing on multi-device display advertising in China. YOYI dataset comprises 441.7M impressions, 416.9K clicks and 319.5K CNY expense during 8 days in Jan. 2016. The first 7 days are set as the training data while the last day is set as the test data."
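As a rough aid to reading these figures, the sketch below derives the click-through rate and cost ratios they imply. Only the impression, click and expense totals come from the quote above; the `DatasetStats` class and its property names are hypothetical helpers introduced here. In both datasets the totals work out to fewer than one click per thousand impressions, so the click reward in the bidding problem is sparse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """Hypothetical container for the dataset totals quoted in the text."""
    name: str
    impressions: float
    clicks: float
    cost_cny: float

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions

    @property
    def cost_per_click(self) -> float:
        """Average spend in CNY for each click."""
        return self.cost_cny / self.clicks

    @property
    def cpm(self) -> float:
        """Average spend in CNY per thousand impressions."""
        return self.cost_cny / self.impressions * 1000.0


# Totals as quoted: 19.5M / 14.79K / 16.0K CNY and 441.7M / 416.9K / 319.5K CNY.
IPINYOU = DatasetStats("iPinYou", 19.5e6, 14.79e3, 16.0e3)
YOYI = DatasetStats("YOYI", 441.7e6, 416.9e3, 319.5e3)

for d in (IPINYOU, YOYI):
    print(f"{d.name}: CTR={d.ctr:.4%}, "
          f"cost/click={d.cost_per_click:.2f} CNY, CPM={d.cpm:.2f} CNY")
```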

Results

According to the research paper, the proposed solution achieved performance gains of 16.7% and 7.4% over state-of-the-art methods on the two large-scale real-world datasets. The system was also deployed on a commercial RTB platform, where an online A/B test showed a 44.7% improvement in click performance over one of the most widely used methods in the industry.

Related Use Cases

Use Case

Optimise aggregate marketing mix and marketing spend
