With Data Anonymization Becoming A Myth, How Do We Protect Ourselves In This World Of Data?

With humanity moving into the world of big data, it has become increasingly challenging, if not impossible, for individuals to “stay anonymous”.

Every day we generate large amounts of data, all of which represent many aspects of our lives. We are constantly told that our data is magically safe to release as long as it is “de-identified”. In reality, however, our data and privacy are constantly exposed and abused. In this article, I will discuss the risks of de-identified data and then examine the extent to which existing regulations effectively secure privacy. Lastly, I will argue that individuals must take a more proactive role in claiming rights over the data they generate, regardless of how identifiable it is.

What can go wrong with “de-identified” data?

Most institutions, companies, and governments collect personal information. When it comes to data privacy and protection, many of them assure customers that only “de-identified” data will be shared or released. However, it is critical to realize that de-identification is no magic process: it cannot fully prevent someone from linking data back to individuals, for example via linkage attacks. There are also new types of personal data, such as genomic data, that simply cannot be de-identified.

Linkage attacks can re-identify you by combining datasets.

A linkage attack takes place when someone re-identifies individuals in an anonymized dataset by combining it with another dataset using indirect identifiers, also called quasi-identifiers. Quasi-identifiers are pieces of information that are not unique identifiers on their own but can become identifying when combined with other quasi-identifiers [1].

One of the earliest linkage attacks happened in the United States in 1997. The Massachusetts State Group Insurance Commission released hospital visit data to researchers for the purpose of improving healthcare and controlling costs. The governor at the time, William Weld, reassured the public that patient privacy was well protected, as direct identifiers were deleted. However, Latanya Sweeney, an MIT graduate student at the time, was able to find William Weld’s personal health records by combining this hospital visit database with an electoral database she bought for only US$ 20 [2].
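The mechanics of such a join are simple enough to show in a few lines of code. Below is a toy sketch using pandas with entirely fabricated records (not the actual Massachusetts hospital or voter data): two tables that each look harmless on their own are merged on shared quasi-identifiers, and names reattach to the “anonymous” medical records.

```python
# Toy linkage attack: joining two seemingly harmless tables on quasi-identifiers.
# All records below are fabricated for illustration only.
import pandas as pd

# "De-identified" hospital data: names removed, quasi-identifiers kept.
hospital = pd.DataFrame({
    "zip_code":   ["02138", "02139", "02141"],
    "birth_date": ["1945-07-31", "1962-03-14", "1978-11-02"],
    "sex":        ["M", "F", "F"],
    "diagnosis":  ["hypertension", "diabetes", "asthma"],
})

# Public voter roll: names included, along with the same quasi-identifiers.
voters = pd.DataFrame({
    "name":       ["Alan Ames", "Beth Byrne", "Cara Chen"],
    "zip_code":   ["02138", "02139", "02141"],
    "birth_date": ["1945-07-31", "1962-03-14", "1978-11-02"],
    "sex":        ["M", "F", "F"],
})

# The merge re-attaches names to the "anonymous" medical records.
reidentified = hospital.merge(voters, on=["zip_code", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```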

Another famous linkage attack is the Netflix Prize. In October 2006, Netflix announced a one-million-dollar prize for improving its movie recommendation service and published the ratings of around 500,000 customers, made between 1998 and 2005 [3]. Netflix, much like the governor of Massachusetts, reassured customers that there were no privacy concerns because “all identifying information has been removed”. However, A. Narayanan and V. Shmatikov later published the paper “How To Break Anonymity of the Netflix Prize Dataset”, showing how they successfully identified the Netflix records of non-anonymous IMDb users, uncovering information that could not be determined from their public IMDb ratings [4].

Some, if not all, data can never be truly anonymous.

Genomic data is some of the most sensitive and personal information a person can have. With the price of and time needed to sequence a human genome dropping rapidly over the past 20 years, people now need to pay only about US$1,000 and wait less than two weeks to have their genome sequenced [5]. Many other companies, such as 23andMe, also offer cheaper and faster genotyping services that tell customers about their ancestry, health, traits, and more [6]. It has never been easier or cheaper for individuals to generate their genomic data, but this convenience also creates unprecedented risks.

Unlike blood test results, which have an expiration date, genomic data changes very little over an individual’s lifetime and therefore has long-lived value [7]. Moreover, genomic data is highly distinguishable, and various scientific papers have shown that it is impossible to make it fully anonymous. For instance, Gymrek et al. (2013) showed that surnames can be recovered from personal genomes by linking “anonymous” genomes to public genetic databases [8]. Lippert et al. (2017) also challenge current concepts of genomic privacy by demonstrating that de-identified genomes can be identified by inferring phenotypic measurements such as physical traits and demographic information [9]. In short, once someone has your genome sequence, regardless of its level of identifiability, your most personal data is out of your hands for good, unless you could change your genome the way you would apply for a new credit card or email address.

That is to say, we as individuals have to acknowledge that just because our data is de-identified does not mean our privacy or identity is secure. We must learn from linkage attacks and from genomic scientists that what used to be considered anonymous might be easily re-identified using new technologies and tools. Therefore, we should proactively own and protect all of our data before, not after, our privacy is irreversibly lost.

Unfortunately, existing laws and privacy policies might protect your data far less than you imagine.

Understanding how NOT anonymous your data really is, you might then wonder how existing laws and regulations keep de-identified data safe. The answer, surprisingly, is that they don’t.

Due to the common misconception that de-identification magically makes personal data safe to release, most regulations, at both the national and the company level, do not cover data that does not relate to an identifiable person.

At the national level

In the United States, the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) protects all “individually identifiable health information” (also called protected health information, or PHI) held or transmitted by a covered entity or its business associates, in any form or medium. PHI includes many common identifiers such as name, address, birth date, and Social Security number [10]. It is noteworthy, however, that there are no restrictions on the use or disclosure of de-identified health information.

In Taiwan, one of the leading democracies in Asia, the Personal Information Protection Act covers personal information such as name, date of birth, ID number, passport number, characteristics, fingerprints, marital status, family, education, occupation, medical records, and medical treatment [11]. However, the Act likewise does not clarify any rights concerning “de-identified” data.

Even the European Union, which has some of the most comprehensive data protection legislation, states in its General Data Protection Regulation (GDPR) that “the principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable” [12].

Source: Privacy on iPhone — Private Side (https://www.youtube.com/watch?v=A_6uV9A12ok)

At the company level

A company’s privacy policy is, to some extent, the last line of defense for an individual’s rights to their data. Whenever we use an application or device, we are compelled to agree to its privacy policy and express our consent. However, for some of the biggest technology companies, whose businesses largely depend on utilizing users’ data, privacy policies tend to exclude “de-identified” data as well.

Apple, despite positioning itself as one of the biggest champions of data privacy, states in its privacy policy that Apple may “collect, use, transfer, and disclose non-personal information for any purpose” [13]. Google likewise mentions that it may share non-personally identifiable information publicly and with partners such as publishers, advertisers, developers, or rights holders [14]. Facebook, the company that has caused massive privacy concerns over the past year, openly states that it provides advertisers with reports about the kinds of people seeing their ads and how the ads are performing, while assuring users that it does not share information that personally identifies them. Fitbit, which is reported to have 150 billion hours of anonymized heart data from its users [15], states that it may share non-personal information that is aggregated or de-identified so that it cannot reasonably be used to identify an individual [16].

Overall, no government or company currently protects individuals’ de-identified data, despite the foreseeable risk of privacy abuse if and when such data gets linked back to individuals in the future. In other words, none of these institutions can be held accountable by law if de-identified data is later re-identified. The risks fall solely on individuals.

An individual should have full control of, and legal recourse for, the data he or she generates, regardless of its level of identifiability.

Acknowledging that advances in technologies such as artificial intelligence make complete anonymity less and less possible, I argue that all data generated by an individual should be treated as personal data, regardless of its current level of identifiability. In a democratic, rule-of-law society, such a new way of viewing personal data will need to come from both bottom-up public awareness and top-down regulation.

As the saying goes, “preventing diseases is better than curing them.” Institutions should focus on preventing the foreseeable privacy violations that occur when “anonymous” data gets re-identified. One of the first steps can be publicly recognizing the risks of de-identified data and including them in data security discussions. Ultimately, institutions should be expected to establish and abide by data regulations that apply to all types of personally generated data, regardless of identifiability.

As for individuals, who generate data every day, they should take their digital lives much more seriously and be proactive in understanding their rights. As stated previously, when supposedly anonymous data is somehow linked back to somebody, it is the individual, not the institution, who bears the cost of the privacy violation. Therefore, as more new apps and devices appear, individuals need to go beyond blindly accepting terms and conditions without reading them, and acknowledge the degree of privacy risk to which they are agreeing. Non-profit organizations such as Privacy International, Tactical Technology Collective, and the Electronic Frontier Foundation are a good place to start learning more about these issues.

Overall, as we continue to navigate the ever-changing technological landscape, individuals can no longer afford to ignore the power of data and the risks it can bring. The data anonymity problems addressed in this article are just a few examples of what we are exposed to in our everyday lives. Therefore, it is critical for people to claim and request full control of, and adequate legal protection for, their data. Only by doing so can humanity truly enjoy the convenience of innovative technologies without compromising our fundamental rights and freedoms.

References

[1] Privitar (Feb 2017). Think your ‘anonymised’ data is secure? Think again. Available at: https://www.privitar.com/listing/think-your-anonymised-data-is-secure-think-again
[2] Privitar (Feb 2017). Think your ‘anonymised’ data is secure? Think again. Available at: https://www.privitar.com/listing/think-your-anonymised-data-is-secure-think-again
[3] A. Narayanan and V. Shmatikov (2008). Robust De-anonymization of Large Sparse Datasets. Available at: https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
[4] A. Narayanan and V. Shmatikov (2007). How To Break Anonymity of the Netflix Prize Dataset. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.3581&rep=rep1&type=pdf
[5] Helix. Support Page. Available at: https://support.helix.com/s/article/How-long-does-it-take-to-sequence-my-sample
[6] 23andMe Official Website. Available at: https://www.23andme.com/
[7] F. Dankar et al. (2018). The development of large-scale de-identified biomedical databases in the age of genomics — principles and challenges. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5894154/
[8] Gymrek et al. (2013). Identifying personal genomes by surname inference. Available at: https://www.ncbi.nlm.nih.gov/pubmed/23329047
[9] Lippert et al. (2017). Identification of individuals by trait prediction using whole-genome sequencing data. Available at: https://www.pnas.org/content/pnas/early/2017/08/29/1711125114.full.pdf
[10] US Department of Health and Human Services. Summary of the HIPAA Privacy Rule. Available at: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html
[11] Laws and Regulations of the ROC. Personal Information Protection Act. Available at: https://law.moj.gov.tw/Eng/LawClass/LawAll.aspx?PCode=I0050021
[12] GDPR. Recital 26. Available at: https://gdpr-info.eu/recitals/no-26/
[13] Apple Inc. Privacy Policy. Available at: https://www.apple.com/legal/privacy/en-ww/
[14] Google. Privacy & Terms (effective Jan 2019). Available at: https://policies.google.com/privacy?hl=en&gl=tw#footnote-info
[15] BoingBoing (Sep 2018). Fitbit has 150 billion hours of “anonymized” health data. Available at: https://boingboing.net/2018/09/05/fitbit-has-150-billions-hours.html
[16] Fitbit. Privacy Policy (effective Sep 2018). Available at: https://www.fitbit.com/legal/privacy-policy#info-we-collect

By Hsiang-Yun L. on April 29, 2019.

What consent should look like when you share your data.

Bitmark handles consent differently than most apps: we take greater measures to empower the user to have full control over their personal data.

We’re getting ready to launch a beta of our Data Donation App that will make it easy for individuals to donate their personal data to public health studies. Initially there will be two studies from researchers at the UC Berkeley School of Public Health.

This article describes some of the new methods we’re using to make data sharing safe. The Bitmark app uses blockchain technology to keep the ownership of your data secure. The provenance of your data is recorded in the blockchain and then your data is transferred to the recipient using end-to-end encryption. This records clear consent via an authenticated “chain-of-title” — meaning you always know who has rights to access your data.

Most importantly, the Bitmark blockchain provides a framework of standardized property rights, rules, and infrastructure for your personal data — now you can own your digital data in the same way you own physical property.

How the Data Donation App works.

Individuals can browse public health studies and learn about how their data will be used (Women’s reproductive health; Diabetes remission and prevention; etc). A research study has a shareable URL that links directly into the Data Donation app:

Individuals who meet the eligibility requirements can tap a button to participate in the study. The app will then request permission from the participant to access their data. Each time data is shared, the participant will be required to sign the transfer.

Taking a step back.

It’s worth pausing for a moment to compare how different this process is from other mobile apps. After the initial request to access your data, most apps don’t inform users what personal data is being collected. Accessing user data is like the Wild West: big companies make money by tapping into the enormous amount of “free” information created by individual mobile users, bundling it together, and selling it to the highest bidder.

When apps gain access to personal data in perpetuity, individuals lose their freedom. Yes, it’s possible to revoke access, but that requires significant effort on the user’s side. Even then, the choice is binary: grant or deny access to all requested data.

Here’s how we can do better.

The Bitmark app makes consent to transfer data an explicit action. When you join a study, you agree to donate data at regular intervals. Yet each time, before your data is transferred, you will be asked to sign.

Why do we require your signature each and every time your personal data is transferred? Because we want you to be in control and know what is going on. When you donate data you are issuing a new digital property title, or “bitmark” for your data that will be recorded in the Bitmark blockchain. When you transfer that bitmark to the researcher they can access that specific data set. Your signature is your consent.

A signed transfer is recorded into the blockchain and linked to your signed issuance. This “chain of title” protects both parties, without relying on a central intermediary. (The Bitmark server cannot decrypt donor data or use it for any purpose.) Both sides get clarity as to where the data came from and who gets to use it.
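To make the idea concrete, here is a minimal sketch of a signed chain of title: an issuance record followed by a transfer record that references it, each signed by the donor. The record structures and field names are illustrative assumptions, not the actual Bitmark protocol or SDK; the signatures use the PyNaCl library.

```python
# Sketch of a signed chain of title. Record formats are hypothetical,
# not the actual Bitmark protocol. Requires PyNaCl (pip install pynacl).
import hashlib
import json
from nacl.signing import SigningKey

donor_key = SigningKey.generate()        # donor's signing key pair
researcher_key = SigningKey.generate()   # researcher's signing key pair

def record_id(record: dict) -> str:
    """Content hash used to link one record to the previous one."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

# 1. Issuance: the donor claims title to a data set, identified by its hash.
issuance = {
    "type": "issue",
    "asset_fingerprint": hashlib.sha256(b"<encrypted donor data>").hexdigest(),
    "owner": donor_key.verify_key.encode().hex(),
}
issuance_sig = donor_key.sign(json.dumps(issuance, sort_keys=True).encode()).signature

# 2. Transfer: the donor signs access rights over to the researcher,
#    linking back to the issuance record. The signature is the consent.
transfer = {
    "type": "transfer",
    "previous_record": record_id(issuance),
    "new_owner": researcher_key.verify_key.encode().hex(),
}
transfer_sig = donor_key.sign(json.dumps(transfer, sort_keys=True).encode()).signature

# Anyone can audit the chain: the transfer points at the issuance, and both
# signatures verify against the donor's public key (verify() raises if tampered).
donor_key.verify_key.verify(json.dumps(issuance, sort_keys=True).encode(), issuance_sig)
donor_key.verify_key.verify(json.dumps(transfer, sort_keys=True).encode(), transfer_sig)
```

The point of the sketch is that every change of hands carries a signature that links back to the previous record, so consent is both explicit and independently auditable.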

Here’s a diagram of this process:

Note: At any time during the course of a study, participants can simply choose not to donate data or to withdraw from the study entirely. No further data will be collected after that point.

A new model for data consent.

We believe explicit consent through chain-of-title is how the exchange of data should happen. Not just for research, but for all personal data transfers.

In the academic world, when a study is considered “human subjects research”, it must be approved by the institution before it can be conducted. This approval process protects both the institution administering the study and the research participants. (UC Berkeley has a great article on this: http://cphs.berkeley.edu/review.html.)

Bitmark similarly believes that individual internet users should be able to safely and privately share their digital data. The Bitmark blockchain can enable a new model of consent for transferring personal data:

  1. Public keys are used to identify participants, instead of real names or even usernames.
  2. End-to-end encryption protects the data during storage and transport (see the sketch after this list).
  3. No third parties can access personal data, even Bitmark.
  4. Participants’ consent is signed and recorded in the Bitmark blockchain every time their data is shared with a researcher of their choosing.
  5. Participants can always opt out, and no further data transfers happen after that.
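Points 1 and 2 above can be sketched in a few lines. This is an illustrative example using the PyNaCl library under assumed key handling, not the actual Bitmark implementation:

```python
# Sketch of end-to-end encryption for a data donation, using PyNaCl's SealedBox.
# Key handling and the payload format are illustrative assumptions.
from nacl.public import PrivateKey, SealedBox

# The researcher is identified by a public key; only they hold the private half.
researcher_private = PrivateKey.generate()
researcher_public = researcher_private.public_key

# The donor encrypts the data set directly to the researcher's public key,
# so no intermediary relaying the ciphertext can read it.
donation = b'{"date": "2017-09-30", "steps": 8412, "resting_hr": 61}'
ciphertext = SealedBox(researcher_public).encrypt(donation)

# Only the researcher can decrypt.
assert SealedBox(researcher_private).decrypt(ciphertext) == donation
```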


If you are interested in how blockchain technology can be used as titles (the Bitmark blockchain) versus the ever-popular use as tokens (Bitcoin and/or Ethereum blockchains), look for a blog post coming soon that explains the difference. Follow us on Twitter, @BitmarkInc, to see what else we’re thinking about.

We are thrilled to have UC Berkeley as our first partner for this blockchain application. If you are interested in participating in the Bitmark Data Donation App, either by listing your study, or incorporating this new technology into your project, please email support@bitmark.com.

By Bitmark Inc. on October 2, 2017.

Bitmark + IFTTT: How to take ownership of your digital life and plan your estate.

What would happen to your personal data and digital assets if something happened to you?

The process of preparing for the transfer of assets after death is known as estate planning. “Estate”, a common law term, means an individual’s property, entitlements, and obligations. In modern society, legal systems have elegant solutions for handling the assets that we accumulate and create in the physical world. But increasingly, the stuff we create and value most exists only in our digital lives, where there’s no system for individual ownership. In the digital environment, estate planning is a minefield.

Individual humans create value by living their lives online — producing works of art, sharing ideas and opinions, uploading personal financial and health information, or buying and storing things like music and movies.

But we don’t own our stuff on the internet. We give it away for free, and, in the process, we’re losing our ability to plan for our future.

The Bitmark mission is to empower universal digital ownership, and we’re making simple tools that help you gain freedom and control of your most valuable data within the digital environment. If we could own our digital lives just like we own everything we buy and build in the physical world, wouldn’t this add to our wealth and freedom? We think so. To make digital estate planning more accessible and automated for everyone, Bitmark has partnered with IFTTT, an automation service that gives users greater control of their personal data across a wide variety of apps and online services.

“We’re excited to have Bitmark as a partner. They’re a unique service, and doing something incredibly ambitious. Applets will help reach a broader audience that’s just beginning to think about digital data ownership and attribution.”
—Linden Tibbets, CEO of IFTTT

Start your digital estate in 5 minutes.

To get you going, we have an initial set of IFTTT Applets that interface with the digital environments where you create and share things: social media, fitness and health apps, productivity and financial software, and much more:

Bitmark IFTTT Applets

These Applets apply a mark of accepted ownership to your data and embed it into Bitmark’s standardized, universal digital property system. It’s an automated process that transforms your data into an asset that you own and can pass down to loved ones.

We recommend you experiment with a few of these Applets first, and then decide which data and assets are most valuable to you. (If you’re lacking ideas, we published a blog post earlier this summer about what two of our Bitmark team members would choose.)

Here is how this process will look:

Step 1: Turn on some Applets and authorize IFTTT.

Once IFTTT is authorized, it automatically bitmarks your new property via the connected Applets. You can view your property in Bitmark’s app, where you can also issue new bitmarks for any other document type on your computer or phone:

Step 2: Check out your digital properties.

Next steps: Grant access to your estate (coming soon).

When property ownership is clear, the access and management rights to your estate (known as fiduciary duties) are more easily worked out. These details will depend on local regulation, in the same manner as the things we own in the physical world.

Usually it requires a long, expensive legal process for your loved ones to gain access to your accounts — your emails, cloud storage, and digital data that’s in your name. Not to mention that, in many cases (read Twitter’s deceased user article, or the Wikipedia article about Death and the Internet), your loved ones will never be allowed access to your accounts, and if they try, it can even be a criminal offense. Ouch.

Bitmark for digital estate planning has two goals: 1) provide individuals with a structured, secure system for assigning ownership to their digital assets and data; 2) pave the way to a more free and fair legal framework for our digital lives and valuables.

Think of what we are providing today as a basic first step. Bitmark’s tools provide a framework and infrastructure to begin organizing and protecting your digital property. In the future we will add more options that make it easier to assign access to your digital estate with your lawyer, spouse, and loved ones.

“Bitmark’s work with IFTTT confirms and tracks ownership of online data, which is a significant step towards intentional management of any digital estate and future planning for incapacity and death.”
 — Megan L. Yip, Attorney, Estate planning and digital assets

Bitmark is empowering the individual to take back ownership.

Bitmark is the property system for the digital environment. As a system to manage digital property, Bitmark makes it possible to own digital property and transfer title to anyone. For individuals, ownership is power. By establishing ownership of your data, you can in turn derive value from your digital property just as you do from the things you own in the physical world: selling, buying, transferring, donating, licensing, passing down, protecting, and much more.

This tool for digital estate planning is just one piece of our larger mission to empower universal digital ownership so we can live free online. Digital property will level the playing field for who can achieve success online — creating new avenues for wealth, prosperity, and achievement on the Internet that are not currently possible for the vast majority of people.

Read our “Defining Property in the Digital Realm” series (parts one, two, and three) for more in-depth context for this post.

If you would like to stay posted on future applications of Bitmark and how we are transforming the Internet into a new system built on individual freedom and empowerment, subscribe to our newsletter.

By Bitmark Inc. on July 18, 2017.

What’s in a property title?

Best practices for safely bitmarking your data & organizing your digital assets.

When I talk to people about Bitmark, there is often confusion about digital property. I think most of this confusion boils down to what exactly goes in a property title (or, in the digital realm, a “bitmark”) and what goes in the asset itself.

A title is a public ownership claim over an asset. The asset itself can be made public or kept private — that’s entirely up to the owner. Titles are always public. One function of a title is to uniquely identify its asset. (Think of the address of your home on its deed, or the vehicle identification number on your car title.) But titles do more: they make property rights transferable from one owner to another, which has massive value that we will explain later. In this article, I want to focus on clarifying what should and should not go in a digital title, and thus how you should and shouldn’t bitmark your stuff. Let’s use an example to get started.

The other day, I read that location information is super valuable so I wanted to start an Applet bitmarking my location data:

The Applet automatically created the following digital property:

The property bitmark (title) represents rights to access my location data. This record is visible in the Registry. Thus, if I wanted to give away or sell my location data (the asset), I would not want that data embedded in the title itself. Yet that is exactly what this Applet did: the public metadata of the bitmark contains a link to my actual location:

Location is a data set that most people would think of as sensitive. I know I do. Putting sensitive data into the title is not what we want.

What should go in a title?

It is important that the title describes the asset, usually from an economic value perspective, without revealing potentially sensitive information about the asset itself.

Here are three examples to help clarify:

  1. Fitbit daily activity — Put things like the date and device type, and maybe a defining characteristic like age or gender (your preference), in the title. Everything else (step count, calories burned, food eaten, sleep cycles, heart-rate information, …) should go in the asset itself. Researchers who want this data can ask your permission; you give consent, and they get the full asset. (A minimal sketch of this title/asset split follows this list.)
  2. Instagram photos — As with health data, you’ll want to give your photo’s title something that describes the asset, like a caption. You can include the time, location, and date, or just the bare minimum of information that makes it memorable and valuable. If the Bitmark property system becomes a sort of marketplace one day, a gallery buyer could search relevant titles for something they want to highlight in their next show. The asset is the photo itself.
  3. Medium stories — Include sparse but important information about the piece: the date, author, or title, perhaps. The asset is the story itself. By bitmarking stories, we hope these titles will someday be checked or authenticated, so that when content is shared over messaging apps, the reader knows they are reading a verified source. Think of the blue check mark next to certain handles on Twitter.
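Here is a minimal sketch of the title/asset split referenced in the first example above. The field names are illustrative, not the actual Bitmark record format: the public title carries only descriptive metadata and a fingerprint of the asset, while the asset itself stays private.

```python
# Sketch of separating a public title (bitmark) from a private asset.
# Field names are illustrative, not the actual Bitmark record format.
import hashlib
import json

# The asset stays private; it is only shared with whoever you consent to.
asset = {
    "type": "fitbit_daily_activity",
    "steps": 8412,
    "calories_burned": 2150,
    "sleep_minutes": 421,
    "resting_heart_rate": 61,
}
asset_bytes = json.dumps(asset, sort_keys=True).encode()

# The public title describes and fingerprints the asset without revealing it.
title = {
    "name": "Fitbit daily activity, 2017-07-14",
    "metadata": {
        "Created": "2017-07-14",
        "Source": "Fitbit",
        "Type": "daily activity summary",
    },
    # The hash lets anyone verify the asset later without seeing its contents.
    "asset_fingerprint": hashlib.sha256(asset_bytes).hexdigest(),
}
print(json.dumps(title, indent=2))  # safe to publish; the asset itself is not
```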

Bitmark is the universal property ownership system for the digital environment.

One of the most important functions of a formal property system is to transform assets from a less accessible state to a more accessible state, so that ownership can be easily communicated and assembled within a broader network. When an individual asserts ownership over their data, control points change: networks become economies.

Converting an asset such as a house into an abstract concept such as a property right requires a complex system to record and organize the socially and economically useful attributes of ownership. The act of embodying an asset in a property title and recording it in a public ledger facilitates a consensus among actors as to how assets can be held, used, and exchanged.

Bitmark is about imagining a future where individual internet users will take back ownership of their digital lives — a new internet built on individual freedom and empowerment where everyone has a chance at success. This freedom stems from ownership of digital property just as we own everything we buy and build in the physical world.

If you’re interested in going deeper, one of the best features of the IFTTT platform is that you can create your own Applets, extending core functionality that the service provides. For anyone looking to extend our service here is a list of metadata options to consider:

Created (date), Contributor, Coverage, Creator, Description, Dimensions, Duration, Edition, Format, Identifier, Language, License, Medium, Publisher, Relation, Rights, Size, Source, Subject, Keywords, Type, Version, Other*

These options come from our web app and they work well for most personal data and digital assets. We recommend using them, but you can also create your own metadata. Just be clear on what will always be public (titles) and what can be kept private (assets). A good analogy to keep in mind is that the deed (title) to your home doesn’t reveal what’s inside your home, but it does explain where to find your home.

Join us in our efforts to democratize the digital economy. Sign up for our beta and try our IFTTT service. If you have any questions, we’re @BitmarkInc on Twitter.

By Bitmark Inc. on July 15, 2017.