Intelligent Tracking Prevention

Note: Read about improvements to this technology in recent blog posts about Intelligent Tracking Prevention and the Storage Access API.

The success of the web as a platform relies on user trust. Many users feel that trust is broken when they are being tracked and privacy-sensitive data about their web activity is acquired for purposes that they never agreed to.

WebKit has long included features to reduce tracking. From the very beginning, we’ve defaulted to blocking third-party cookies. Now, we’re building on that. Intelligent Tracking Prevention is a new WebKit feature that reduces cross-site tracking by further limiting cookies and other website data.

What Are Cross-Site Tracking and Third-Party Cookies?

Websites can fetch resources such as images and scripts from domains other than their own. This is referred to as cross-origin or cross-site loading, and is a powerful feature of the web. However, such loading also enables cross-site tracking of users.

Imagine a user who first browses example-products.com for a new gadget and later browses example-recipies.com for dinner ideas. If both these sites load resources from example-tracker.com and example-tracker.com has a cookie stored in the user’s browser, the owner of example-tracker.com has the ability to know that the user visited both the product website and the recipe website, what they did on those sites, what kind of web browser was used, et cetera. This is what’s called cross-site tracking and the cookie used by example-tracker.com is called a third-party cookie. In our testing we found popular websites with over 70 such trackers, all silently collecting data on users.
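To make the scenario concrete, here is a minimal sketch of the bookkeeping a tracker could do on its server. The log structure, the function name, and the cookie ID value are hypothetical and only illustrate how a single third-party cookie links visits across sites.

```python
from collections import defaultdict

# Hypothetical server-side log kept by example-tracker.com:
# cookie ID -> list of sites that embedded the tracker's resource.
visits_by_cookie = defaultdict(list)


def handle_tracker_request(cookie_id: str, embedding_site: str) -> None:
    """Record which site loaded the tracker's resource for this cookie ID."""
    visits_by_cookie[cookie_id].append(embedding_site)


# The same browser sends the same third-party cookie from both sites.
handle_tracker_request("abc123", "example-products.com")
handle_tracker_request("abc123", "example-recipies.com")

# Both visits are now tied to one cookie ID.
print(visits_by_cookie["abc123"])
```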

How Does Intelligent Tracking Prevention Work?

Intelligent Tracking Prevention collects statistics on resource loads as well as user interactions such as taps, clicks, and text entries. The statistics are put into buckets per top privately-controlled domain or TLD+1.
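As a rough illustration of per-domain bucketing, the sketch below groups user interactions by TLD+1. It is not WebKit's implementation: the helper names are made up, and the domain reduction naively keeps the last two host labels, whereas a real browser would consult the Public Suffix List.

```python
from collections import Counter


def registrable_domain(host: str) -> str:
    """Naive TLD+1: keep the last two labels of the host name.

    Assumption for illustration only; hosts under multi-label suffixes
    such as example.co.uk need the Public Suffix List to be handled
    correctly.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


# Hypothetical statistic: number of user interactions per TLD+1 bucket.
interactions = Counter()


def record_interaction(host: str) -> None:
    """Count one tap, click, or text entry against the host's bucket."""
    interactions[registrable_domain(host)] += 1


record_interaction("www.example.com")
record_interaction("account.example.com")
record_interaction("news.example.org")

print(interactions["example.com"])  # 2: both subdomains share one bucket
```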

Machine Learning Classifier

A machine learning model is used to classify which top privately-controlled domains have the ability to track the user cross-site, based on the collected statistics. Out of the various statistics collected, three vectors turned out to have strong signal for classification based on current tracking practices: subresource under number of unique domains, subframe under number of unique domains, and number of unique domains redirected to. All data collection and classification happens on-device.
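The three statistics can be pictured as counts of unique domains. The sketch below shows one way to collect them; the function names and the example-ads.com domain are hypothetical, all inputs are assumed to be already reduced to TLD+1, and the classifier that consumes these numbers is not shown because the post does not describe it.

```python
from collections import defaultdict

# For each domain, the unique top domains under which it was loaded,
# and the unique domains it redirected to.
subresource_under = defaultdict(set)
subframe_under = defaultdict(set)
redirected_to = defaultdict(set)


def record_subresource(top: str, resource: str) -> None:
    """A page on `top` loaded a subresource (image, script) from `resource`."""
    if resource != top:
        subresource_under[resource].add(top)


def record_subframe(top: str, frame: str) -> None:
    """A page on `top` embedded a frame from `frame`."""
    if frame != top:
        subframe_under[frame].add(top)


def record_redirect(source: str, target: str) -> None:
    """`source` redirected the browser to `target`."""
    if source != target:
        redirected_to[source].add(target)


def feature_vector(domain: str) -> tuple:
    """Return the three counts in the order the text lists them."""
    return (
        len(subresource_under.get(domain, ())),
        len(subframe_under.get(domain, ())),
        len(redirected_to.get(domain, ())),
    )


record_subresource("example-products.com", "example-tracker.com")
record_subresource("example-recipies.com", "example-tracker.com")
record_subframe("example-products.com", "example-tracker.com")
record_redirect("example-tracker.com", "example-ads.com")

print(feature_vector("example-tracker.com"))  # (2, 1, 1)
```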

Actions Taken After Classification

Let’s say Intelligent Tracking Prevention classifies example.com as having the ability to track the user cross-site. What happens from that point?

If the user has not interacted with example.com in the last 30 days, example.com website data and cookies are immediately purged and continue to be purged if new data is added.

However, if the user interacts with example.com as the top domain, often referred to as a first-party domain, Intelligent Tracking Prevention considers it a signal that the user is interested in the website and temporarily adjusts its behavior as depicted in this timeline:

Intelligent Tracking Prevention Timeline

If the user interacted with example.com in the last 24 hours, its cookies will be available when example.com is a third party. This allows for “Sign in with my X account on Y” login scenarios.

This means users only have long-term persistent cookies and website data from the sites they actually interact with and tracking data is removed proactively as they browse the web.

Partitioned Cookies

If the user interacted with example.com in the last 30 days but not in the last 24 hours, example.com gets to keep its cookies but they will be partitioned. Partitioned means third parties get unique, isolated storage per top privately-controlled domain or TLD+1, e.g. account.example.com and www.example.com share the partition example.com.
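One way to picture partitioning is a cookie store keyed by the top domain of the page doing the embedding. The class below is a hypothetical sketch, not WebKit's data structure, and it assumes the top domain has already been reduced to TLD+1.

```python
class PartitionedCookieJar:
    """Cookies for one third-party domain, isolated per top domain (TLD+1)."""

    def __init__(self) -> None:
        # top domain -> {cookie name: cookie value}
        self._partitions = {}

    def set_cookie(self, top_domain: str, name: str, value: str) -> None:
        """Store a cookie in the partition belonging to `top_domain`."""
        self._partitions.setdefault(top_domain, {})[name] = value

    def get_cookies(self, top_domain: str) -> dict:
        """Return only the cookies set while embedded under `top_domain`."""
        return dict(self._partitions.get(top_domain, {}))


# Storage for example.com when it is loaded as a third party.
jar = PartitionedCookieJar()
jar.set_cookie("example-products.com", "id", "111")
jar.set_cookie("example-recipies.com", "id", "222")

# Each top domain sees its own value, so the IDs cannot be joined.
print(jar.get_cookies("example-products.com"))  # {'id': '111'}
print(jar.get_cookies("example-recipies.com"))  # {'id': '222'}
```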

This makes sure users stay logged in even if they only visit a site occasionally while restricting the use of cookies for cross-site tracking. Note that WebKit already partitions caches and HTML5 storage for all third-party domains.
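The timeline for a domain that has been classified as able to track can be summarized as a small decision function. This is a sketch under stated assumptions: the enum and function names are invented for illustration, and whether the 24-hour and 30-day boundaries are inclusive is a guess, since the post does not say.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class CookiePolicy(Enum):
    THIRD_PARTY_ACCESS = "cookies available as a third party"
    PARTITIONED = "cookies kept but partitioned per top domain"
    PURGED = "website data and cookies purged"


def policy_for_classified_domain(
    last_interaction: Optional[datetime], now: datetime
) -> CookiePolicy:
    """Map time since the last first-party user interaction to a policy.

    Applies only to domains classified as having the ability to track.
    Boundary handling (<=) is an assumption.
    """
    if last_interaction is None:
        return CookiePolicy.PURGED
    elapsed = now - last_interaction
    if elapsed <= timedelta(hours=24):
        return CookiePolicy.THIRD_PARTY_ACCESS
    if elapsed <= timedelta(days=30):
        return CookiePolicy.PARTITIONED
    return CookiePolicy.PURGED


now = datetime(2017, 6, 30, 12, 0)
print(policy_for_classified_domain(now - timedelta(hours=2), now).name)
print(policy_for_classified_domain(now - timedelta(days=5), now).name)
print(policy_for_classified_domain(now - timedelta(days=45), now).name)
```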

What Does This Mean For Web Developers?

With Intelligent Tracking Prevention, WebKit strikes a balance between user privacy and websites’ need for on-device storage. That said, we are aware that this feature may create challenges for legitimate website storage, i.e. storage not intended for cross-site tracking. Please let us know of such cases and we will try to help (contact info at the end of this blog post).

To get you started, here are some guidelines.

Storage Requires User Interaction

Check to make sure that you aren’t relying on cookies and other storage to persist if the user does not interact directly with your website on a regular basis. Requiring user interaction covers most legitimate uses of client-side storage. It also provides better transparency and gives users more control over who gets to store data on their devices.

Web Analytics

Make sure to configure your web analytics to not rely on third-party cookies from domains that don’t get user interaction. A popular way to do cross-site analytics for a family of sites is to use link decoration, i.e. pad links with information that needs to be carried across origins and navigations.
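Link decoration amounts to adding query parameters to the URLs you link to. Here is a minimal sketch using Python's standard library; the function name and the campaign parameter are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def decorate_link(url: str, extra: dict) -> str:
    """Append `extra` as query parameters, keeping any existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


print(decorate_link(
    "https://example-recipies.com/dinner?page=2",
    {"campaign": "spring"},
))
# https://example-recipies.com/dinner?page=2&campaign=spring
```

The receiving site reads the parameter from the navigation URL and stores what it needs in its own first-party context.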

Ad Attribution

We recommend server-side storage for attribution of ad impressions on your website. Link decoration can be used to pass on attribution information in navigations.

Managing Single Sign-On

If you run a single sign-on system with a centralized session, the user needs to interact with the domain that controls the session. Otherwise you run the risk of Intelligent Tracking Prevention treating your session controller domain as a tracker.

Intelligent Tracking Prevention and Single Sign-On

Imagine a scenario as depicted above: a central session at account.com used for the three sites SiteA.com, SiteB.com, and SiteC.com. Session information can be propagated from account.com to the dependents during account.com’s 24-hour exemption from cookie partitioning. From that point on the sites must maintain sessions without account.com cookies, or they must re-authenticate daily with a brief stop at account.com to acquire new user interaction. You can grant the sites the ability to propagate session information between themselves through navigations and new cookies set in HTTP responses. Single sign-out needs to invalidate the account.com session on the server.
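From the point of view of a dependent site such as SiteA.com, the options above reduce to a three-way decision. The sketch below is only an illustration: the function, its return values, and the idea that the site records when the user last interacted with account.com are all assumptions, not part of the feature.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 24-hour exemption from cookie partitioning described in the text.
EXEMPTION_WINDOW = timedelta(hours=24)


def next_step(
    has_own_session: bool,
    last_account_interaction: Optional[datetime],
    now: datetime,
) -> str:
    """Decide how a dependent site should establish the user's session."""
    if has_own_session:
        # The site maintains its session without account.com cookies.
        return "continue"
    if (
        last_account_interaction is not None
        and now - last_account_interaction <= EXEMPTION_WINDOW
    ):
        # account.com cookies are still available in a third-party context.
        return "propagate-from-account"
    # A brief stop at account.com is needed to acquire new user interaction.
    return "redirect-to-account"


now = datetime(2017, 6, 30, 12, 0)
print(next_step(False, now - timedelta(hours=3), now))
print(next_step(False, now - timedelta(days=2), now))
```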

Feedback and Bug Reports

If you are a web developer or a web user and Intelligent Tracking Prevention isn’t working as intended for your website, please report bugs through bugs.webkit.org and send feedback to our evangelist Jon Davis. If you have technical questions on how the feature works, you can find me on Twitter @johnwilander.