Digital display advertising

Historically, advertising was not very efficient. Digital advertising, however, offers the opportunity to discover what works and what doesn’t through the data collected as users interact with online ads. Here we apply the techniques acquired so far to a large-scale, real-world problem: optimizing an online advertising campaign.

Several datasets are used in the example. Unfortunately, only a few large datasets of this type are publicly available. The primary dataset in our example isn’t available for download, and even if it were, it would be too large for a personal computer. One dataset that can be downloaded and used for non-commercial purposes is from the Kaggle Display Advertising Challenge (Criteo).
Display advertising

Online advertising is delivered through a myriad of media. Display ads appear within web pages rendered in browsers, usually on desktop or laptop computers. Because the rules for identifying users and handling internet cookies are different on mobile browsers, mobile ad technology relies on a different set of techniques and generates quite different historical data. Native ads, embedded in games and mobile apps, and pre-roll ads, which precede online video content, are based on distinct delivery technologies and require analyses tailored to their unique processes. Our examples are limited to traditional display advertising.

Much of the terminology of display advertising was inherited from the print advertising business. The websites on which ads can be purchased are known as publications. Within a publication, advertising space is characterized by its size and format, called the ad unit, and by its location within the site and page, called the placement. Each presentation of an ad is called an impression. Ads are sold in lots of 1,000 impressions, the price of which is known as the CPM (cost per thousand).

When a user browses to a web page (say, xyz.com), it appears that the publisher of xyz.com delivers the entire page. In reality, the page contains placeholders for advertisements that are filled in by various advertisers through a complex network of intermediaries. Each web server that delivers ads maintains logs that include information about each impression, including the publisher, the internet address of the user, and information contained in internet cookies, where information about previous deliveries from the advertiser’s server may be stored.
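As a rough illustration of the log records and CPM pricing described above, the following sketch models an impression as a small record and computes per-publisher impression counts and media cost. The field names, the sample addresses, and the 2.0 CPM price are hypothetical; real ad-server logs have their own schemas.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One ad-server log record; all field names are hypothetical."""
    publisher: str   # publication on which the ad was shown
    user_ip: str     # internet address of the user
    cookie_id: str   # identifier carried in the advertiser's cookie


def impressions_per_publisher(log):
    """Count how many impressions each publisher delivered."""
    return Counter(record.publisher for record in log)


def media_cost(n_impressions, cpm):
    """Cost of n impressions when priced per lot of 1,000 (CPM)."""
    return n_impressions / 1000 * cpm


# Made-up sample records for illustration only.
log = [
    Impression("xyz.com", "203.0.113.7", "c-001"),
    Impression("xyz.com", "203.0.113.9", "c-002"),
    Impression("news.example", "198.51.100.4", "c-001"),
]
counts = impressions_per_publisher(log)
print(counts["xyz.com"])          # 2
print(media_cost(250_000, 2.0))   # 500.0
```

At a CPM of 2.0, a quarter of a million impressions cost 500.0 in whatever currency the CPM is quoted in.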
Clicks are the target variable. We want to predict the likelihood that impressions will result in clicks (sometimes called click-throughs or click-thrus). More specifically, given a specific user visiting a particular site, we’d like to know the probability that the user will click the advertisement.

There are several choices in formulating the problem. We can try to predict the probability that a given user will click through, or we can try to predict the click-through rate (CTR) for each publisher that presents the ad. As is often the case, precisely what we model and the precise values we try to predict will ultimately be driven by two questions: How will the prediction be used? In what manner will it be acted on? In this case, the advertiser has the option of blacklisting certain publications, so the advertiser’s primary concern is identifying the publications least likely to yield clicks. In recent years, real-time bidding technologies have been developed that enable advertisers to bid for individual impressions based on user and publication features provided by the bidding system, but our example advertiser hasn’t adopted real-time bidding yet.

At this point, one may wonder why the advertiser doesn’t just look at historical data for all the publications and blacklist those with low CTRs. The problem is that when the overall CTR for a campaign is in the neighbourhood of 0.1%, the expected number of clicks for a publication with only a few impressions is close to zero, so observing no clicks at all is the most likely outcome. The absence of clicks therefore doesn’t indicate a low CTR. Further, when we aggregate the best-performing, low-volume publications, we often observe above-average CTRs. What we need is a model that can predict publications’ performance without the benefit of a great deal of performance history.

At first glance, it might seem that the only features available are counts of impressions, clicks, and views for users, publishers, and operating systems.
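The low-volume problem described above can be made concrete with a short calculation. This sketch assumes, purely for illustration, that every impression is an independent trial with the same click probability (a binomial model); the function names are hypothetical.

```python
def expected_clicks(n_impressions, ctr):
    """Expected number of clicks under a binomial model."""
    return n_impressions * ctr


def prob_zero_clicks(n_impressions, ctr):
    """Probability of seeing no clicks at all in n independent impressions."""
    return (1.0 - ctr) ** n_impressions


CTR = 0.001  # the 0.1% campaign-wide rate mentioned in the text

for n in (100, 1_000, 10_000):
    print(
        f"impressions={n:>6}  "
        f"expected clicks={expected_clicks(n, CTR):.1f}  "
        f"P(no clicks)={prob_zero_clicks(n, CTR):.3f}"
    )
```

Under this assumption, a publication with 100 impressions shows no clicks about 90% of the time, and one with 1,000 impressions still shows none roughly 37% of the time, even though its true CTR equals the campaign average. A zero in the click column is therefore weak evidence for blacklisting a low-volume publication.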
Maybe time of day or day of the week has some effect. But on further reflection, we realize that the domains a user visits are features that describe the user, and the users who visit a domain are features of the domain. Suddenly, we have a wealth of data to work with and a real-world opportunity to experience the curse of dimensionality, a phrase used to describe the tribulations of working in high-dimensional space. As we explore the data, we will see that a wealth of features can be, if not a curse, a mixed blessing.

The logic applied here may be recognizable as the basis of recommenders, the systems that suggest movies on Netflix, products on Amazon, and restaurants on Yelp. The idea of characterizing users as collections of items, and items as collections of users, is the basis of collaborative filtering, in which users are clustered based on common item preferences, and items are clustered based on the affinities of common users. Of course, the motivation for recommenders is to present users with items they’re likely to purchase. The advertising problem is a variation: instead of many items, the same advertisement is presented in a wide variety of contexts, namely the publications. The driving principle is that the greatest likelihood of achieving user responses (clicks) will be on publications that are similar to those that have a history of achieving responses.
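One simple way to illustrate the principle above is to treat each publication as the set of users who visit it and to estimate the CTR of a publication without click history from the CTRs of similar publications. The sketch below uses Jaccard similarity and a similarity-weighted average; both are illustrative choices rather than the method used in this example, and the publication names, user ids, and CTR values are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_weighted_ctr(target_users, history):
    """Estimate a publication's CTR from publications with click history.

    history maps publication name -> (set of user ids, observed CTR).
    Returns None when no publication with history shares any users.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for users, ctr in history.values():
        weight = jaccard(target_users, users)
        weighted_sum += weight * ctr
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Toy data: publications with enough volume to have a trusted CTR.
history = {
    "sports.example": ({"u1", "u2", "u3", "u4"}, 0.002),
    "finance.example": ({"u5", "u6", "u7", "u8"}, 0.0005),
}

# A low-volume publication with no click history of its own.
new_pub_users = {"u1", "u2", "u3", "u9"}

estimate = similarity_weighted_ctr(new_pub_users, history)
print(estimate)
```

Here the new publication shares three of its four users with sports.example and none with finance.example, so its estimate comes out close to 0.002. With real data, the number of distinct users and domains makes these sets very high-dimensional, which is where the curse of dimensionality mentioned above comes into play.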