The following is a guest blog post from Ran Canetti, a professor of Computer Science at Boston University and the Director of the Center for Reliable Information Systems and Cyber Security. At the Computing Community Consortium (CCC) we know that everyone is dealing with a lot in these unprecedented times. We are continuing to work on behalf of the computing research community to catalyze research, but we also want to provide ways to help the community. This blog is from a series of posts about ways computing researchers are using computing to adapt and help in these times. We hope you find something that may help you, either now or in the future.
In 1945, the atomic bomb brought a swift end to World War II. It also changed our society forever, forcing us to reckon with the fact that the human race is now capable of self-destruction, and that societal checks and balances are the only way to prevent a catastrophe. Will automated contact tracing, which appears to be a key part of any plan for returning to productive life while containing the COVID-19 pandemic, turn out to be the tipping point in yet another race to deploy a potentially destructive technology?
Ideally, the best way to contain the COVID-19 pandemic is to have a readily available infection test that returns immediate results. This way, people can repeatedly test themselves and go into quarantine as soon as they are sick. However, while testing capacity has increased dramatically in a short time period, ubiquitous testing is still a remote prospect.
By early March 2020, it was evident that the few countries that managed to contain the spread of the pandemic did so using an aggressive system for tracing the movements and contacts of individuals who tested positive during the two-week period prior to testing, and then placing individuals who might have come in contact with an infectious person or location under immediate quarantine. However, the effectiveness of these contact tracing systems relies heavily on the existence of an infrastructure for surveillance and for tracking the movements of individuals. (See e.g. [Reuters, New-Yorker] for some public accounts, including violations of privacy and autonomy.)
This raised some immediate questions: Can effective contact tracing be done without pre-existing surveillance infrastructure? Can it be done automatically, thus reducing the administrative burden? Can it be done in a way that preserves the privacy, autonomy, and liberty of individuals? And, wait, what do individual privacy, autonomy and liberty even mean in this context?
I’ll return to this last question in a bit.
By mid-March, more public-facing automated contact tracing mechanisms had emerged, as a complement to the existing surveillance-based ones, in Singapore and shortly after in Israel [TraceTogether, Hamagen]. Both mechanisms are based on voluntary participation via cell-phone apps, and both guarantee some level of privacy to users as long as they have not tested positive or been collocated with a user who tested positive. In the Israeli scheme (which is GPS-based), once a user tests positive, large portions of her movements in the preceding two-week period are broadcast to all users. While a thin veil of privacy is provided (users are only referred to via pseudonyms, e.g. “patient 15”), there is no guarantee against re-identification (see e.g. [NYT]). More fundamentally, this scheme is still based on a government-controlled secret database that contains the moment-by-moment movements of individuals even before they test positive.
While the Israeli scheme is based on GPS data, the Singapore scheme is based, to a large extent, on Bluetooth Low Energy phone-to-phone transmission: Phones constantly broadcast a system-provided identifier, and collect identifiers received from nearby phones. Once a user tests positive, their phone uploads its transmitted and collected identifiers to a central database along with other location information. The center then contacts users who were collocated with the infected individuals for potential quarantine and treatment.
Bluetooth-based proximity tracing significantly reduces the false positive rate. Indeed, typical GPS accuracy is about 5 meters in an open area, and degrades in more urban ones. Also, GPS typically cannot detect vertical separation (e.g. different floors of a building) and is useless in large indoor spaces. In contrast, the transmission power of a Bluetooth signal can, in principle, be lowered enough to make the effective range as small as 2 meters.
Perhaps even more importantly, while the Singapore scheme still involves a government-controlled database that contains privacy-invasive information on the population, the phone-to-phone nature of the Bluetooth-based tracing mechanism opens the door to an almost completely decentralized contact tracing method that does not involve any private database whatsoever! Indeed, the following idea has been proposed and developed by a number of teams around the world almost simultaneously, with small variations [CTV, COVID-watch, DP3T, PACT, UW-PACT, TCN]: Have each participating phone continuously broadcast, using Bluetooth, a (pseudo)random identifier that changes every few minutes. (Following [PACT], we’ll call these identifiers chirps.) Phones also record the chirps received from nearby phones. No location or time information is recorded. Staggered collocation (say, via “surfaces”) can be captured in a number of ways, e.g. by placing re-chirping devices in the relevant locations. When a user tests positive, she obtains a permission code from the medical facility, which allows her to upload the chirps that she generated in the past two weeks (or another specified time period) to a public database. All users can then periodically download the recent updates to the database and find out whether they have been collocated with an infectious individual by comparing the chirps their phone collected with those in the database. Notice that the check is done locally on the user’s phone, so no one but the user learns of the collocation. The user is then given the autonomy to decide whether to contact health authorities to get checked and potentially quarantined. Crucially, this approach (let me call it the decentralized approach) does not require the government to collect or store any private information on the population.
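To make the flow above concrete, here is a minimal Python sketch of the decentralized approach. It is an illustration under stated assumptions, not the design of any of the cited proposals: the names `Phone`, `PublicDatabase`, `new_chirp` and `demo`, the 16-byte chirp length, and the single-use permission codes are all hypothetical, and the Bluetooth radio and the chirp rotation schedule are replaced by direct method calls.

```python
import secrets
from dataclasses import dataclass, field


def new_chirp() -> bytes:
    """Return a fresh random identifier (the 16-byte length is an assumption)."""
    return secrets.token_bytes(16)


@dataclass
class Phone:
    sent: list = field(default_factory=list)   # chirps this phone has broadcast
    heard: set = field(default_factory=set)    # chirps received from nearby phones

    def broadcast(self) -> bytes:
        """Generate a new chirp, remember it, and 'transmit' it."""
        chirp = new_chirp()
        self.sent.append(chirp)
        return chirp

    def receive(self, chirp: bytes) -> None:
        """Record a chirp heard nearby. No location or time is stored."""
        self.heard.add(chirp)

    def check_exposure(self, published: set) -> bool:
        """Local check: did this phone hear any chirp that was later published?"""
        return not self.heard.isdisjoint(published)


class PublicDatabase:
    """Public list of chirps uploaded by users who tested positive."""

    def __init__(self, valid_codes):
        # Permission codes issued by a medical facility (single use is an assumption).
        self._valid_codes = set(valid_codes)
        self.chirps = set()

    def upload(self, code: str, chirps) -> bool:
        """Accept an upload only if it comes with an unused permission code."""
        if code not in self._valid_codes:
            return False
        self._valid_codes.discard(code)
        self.chirps.update(chirps)
        return True


def demo():
    db = PublicDatabase(valid_codes={"code-from-clinic"})
    alice, bob, carol = Phone(), Phone(), Phone()

    # Alice and Bob spend time near each other; Carol is elsewhere.
    for _ in range(3):
        bob.receive(alice.broadcast())
        alice.receive(bob.broadcast())
    carol.broadcast()

    # An upload without a valid permission code is rejected.
    rejected = not db.upload("made-up-code", carol.sent)
    # Alice tests positive and uploads the chirps she generated.
    accepted = db.upload("code-from-clinic", alice.sent)

    # Each phone compares what it heard against the public list, locally.
    return bob.check_exposure(db.chirps), carol.check_exposure(db.chirps), rejected, accepted
```

In this sketch the database only ever holds random identifiers from users who chose to upload, and the comparison in `check_exposure` runs on the user's own phone, which mirrors the point that no one else learns of the collocation.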
The astute reader will probably have noticed that, at least as described, the scheme does leave users open to a number of attacks by misbehaving fellow users. However, most of these attacks can be mitigated by simple countermeasures. Others require somewhat more sophisticated cryptographic tools; see e.g. [CTV, V, PACT]. Furthermore, these attacks are more local in nature, and a far cry from the harms that a government-controlled database with private (and not necessarily COVID-related) information on much of the population inflicts on civil liberties.
As governments around the world are scrambling to adopt some automated contact tracing app, they are faced with the choice between a centralized app (such as the Israeli or the Singaporean ones) and a decentralized one as sketched above. While some governments have already made a clear choice (e.g. the UK has chosen a centralized app), at the time of writing these lines most others are still debating. A particularly heated debate is taking place in the EU, where the decentralized approach is supported by one group of scientists [DP3T], while the centralized approach is supported by another group [PEPP-PT]. Even the Data Protection Unit of the European Commission has weighed in.
Apple and Google have announced a joint design and API that follows the decentralized approach almost exactly as described [CTV, DP3T, PACT, UW-PACT]. Indeed, having these two giants on board provides a significant push to this effort.
It should be noted that, while much of this blog is dedicated to the protocol design and its privacy aspects, designing an effective automated contact tracing app is challenging on at least two additional levels. First, Bluetooth has proven to be rather onerous to work with, so distance estimation has turned out to be challenging and requires innovation. Second, coming up with a formula that determines when a person is in significant enough danger of having been infected, given the number, approximate distance, and durations of collocation events, turns out to be non-trivial even for medical experts. Here one should take into consideration both the protection of the alerted individual and the overall dampening effect of the app on the spread of the pandemic. There are also some delicate tradeoffs here between individual privacy, effectiveness of the system, and the level to which more traditional, manual contact tracing is engaged. For instance, by exposing herself a bit more, a positive-tested user can upload more information, which will be more useful to the alerted users and might avoid the need to engage manual tracing (which has its own set of side effects). This tradeoff is rather complex; I will address it in a separate blog post.
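To give a feel for what such a formula has to combine, here is a deliberately naive Python sketch. Everything in it is an assumption made for illustration and carries no medical authority: the names `Contact`, `risk_score` and `should_alert`, the linear weighting by distance, and the alert threshold are hypothetical. Only the 2-meter cutoff echoes the Bluetooth range mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    distance_m: float  # estimated distance during the collocation event
    minutes: float     # estimated duration of the event


def risk_score(contacts, max_distance_m=2.0):
    """Sum the durations of all events, each weighted by closeness.

    Closeness falls linearly from 1 at 0 m to 0 at max_distance_m
    (an arbitrary choice); farther events are ignored entirely.
    """
    score = 0.0
    for c in contacts:
        if c.distance_m > max_distance_m:
            continue
        closeness = 1.0 - c.distance_m / max_distance_m
        score += c.minutes * closeness
    return score


def should_alert(contacts, threshold=10.0):
    """Alert when the accumulated score reaches a (hypothetical) threshold."""
    return risk_score(contacts) >= threshold
```

For example, `should_alert([Contact(1.0, 30)])` is true under these made-up parameters (30 minutes at half weight gives a score of 15), while a two-hour event estimated at 5 meters contributes nothing. Even this toy version shows how sensitive the outcome is to the distance estimate, which is exactly where Bluetooth is weakest.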
Still, from a societal point of view, the main issue at stake here seems to be the level of autonomy and control that an individual is given regarding their privacy and fate. On the one hand, we are all in it together. On the other hand, we can only flourish as a society if we can maintain our autonomy and liberty as individuals.
So this seemingly simple task of automated contact tracing is anything but. Instead, it opens a number of new and exciting research directions: in low energy wireless, cryptography, computational epidemiology, ethics, law, public policy, and the intimate relationships between them.
Given the current sprint to market, the first generations of automated contact tracing apps are bound to have quirks, vulnerabilities, and unexpected consequences. Hopefully they will still be effective, and also help us understand the issues better and design improved next-generation apps.
I’d like to end with a seemingly unrelated story from a very interesting workshop on racial bias and algorithms, held at the Simons Institute for the Theory of Computing last spring. One of the speakers, an American researcher who studied racial disparities in medical treatment in France, described her methodology and findings, and then remarked that her work was made so much harder by the fact that French medical records did not specify the race of the patient. She expressed frustration and noted that this made no sense to her, since this is pertinent information, especially if one wants to guarantee fair treatment. Then an older member of the audience said with a French accent: “But of course, it’s because of the war…” This was a great reminder of how invasive databases, created with the best of intentions, can easily be turned into tools of oppression and death when the regime turns darker. It was also a reminder of how easy it is to forget and repeat past mistakes.
Brookline, MA, 4/22/2020