The Challenge of Online Identity: Part 3

This is the third and last in a series of blog posts (see part one and part two) in which I set out to examine the current state of identity management in our industry and where it is going. The real point of the series has been to answer a question familiar to any parent who drives with children in the car: ‘Are we there yet?’ The destination in this case is not Legoland but a much-discussed concept in our industry, Online Identity 2.0.

Are we there yet?

Along the way I’ve surveyed the current landscape and looked at the multi-dimensional influence of Web 2.0. The journey has highlighted the tensions caused by Web 2.0 and the new software models of the API-driven web, and the pressure they put on existing models of identity management which, as I have attempted to show, struggle to cope with the complexity of this new universe.

In this post I want to delve a little deeper into the privacy and security implications of that environment and to look forward to the semantic web, before making some recommendations for how I think identity management needs to develop in order to get us to our goal.

Lock-in, ownership & control

The providers of social network services have been quick to understand the potential of being online identity hubs for their users. This is a natural function of their prime aim of driving up usage; being an identity provider is just one more service that Google or Yahoo can deliver and one which keeps them firmly at the centre of our online worlds.

As we move more of our identity information online it becomes potentially much easier to move that information around. With that ease comes growing concern that our personal data could be passed on to third parties. At best it might then be used to spam us, or be sold to our competitors. At worst, we might lose control over ownership of our online identity altogether, and it could be used for fraudulent purposes. These data protection issues are, furthermore, set in a global context where national legal frameworks may no longer make sense.

Who to trust

In this federated environment of identity providers there are a number of important questions that must be addressed, including:

  1. How much do we trust these identity providers with our personal details?
  2. Who audits services such as Facebook or LinkedIn to ensure security issues are addressed?
  3. How much do we trust the downstream sites using these identity services with our personal details?

With these questions in mind, it is possible to imagine an ‘identity supply chain’ where different entities within the chain only know the smallest parts of a given identity that are needed to perform their function. For example, I could log in to a website without the website itself knowing my password. Similarly I could order goods without disclosing my full identity to the shipping agent, and I could leave commentary on a blog without the blog system needing to know my postal address.
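
To make the idea concrete, here is a minimal Python sketch of such a chain. Everything in it is an assumption for illustration: the attribute names, the roles and the policy table are hypothetical, and a real system would enforce the policy at the identity provider rather than in a single dictionary.

```python
# Hypothetical identity record, held in full only by the identity provider.
FULL_IDENTITY = {
    "name": "A. Reader",
    "email": "reader@example.org",
    "postal_address": "1 Example Street, Exampletown",
    "password_hash": "(never leaves the identity provider)",
}

# Assumed policy: the attributes each party in the chain is allowed to receive.
DISCLOSURE_POLICY = {
    "blog": {"name"},
    "shipping_agent": {"name", "postal_address"},
}


def disclose(identity, role):
    """Return only the attributes that the given role is entitled to see."""
    allowed = DISCLOSURE_POLICY.get(role, set())
    return {key: value for key, value in identity.items() if key in allowed}


blog_view = disclose(FULL_IDENTITY, "blog")
shipping_view = disclose(FULL_IDENTITY, "shipping_agent")
unknown_view = disclose(FULL_IDENTITY, "unlisted_party")

print(blog_view)       # the blog learns a name and nothing else
print(shipping_view)   # the shipping agent learns a name and an address
print(unknown_view)    # a party with no policy entry learns nothing
```

The design choice worth noting is the default: a party that does not appear in the policy receives an empty record, so disclosure has to be granted explicitly.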

The semantic web

In the web of linked data, identity is centrally important for determining trust, provenance and authenticity. Understanding who made a particular assertion is essential in scientific communication, for example, which is by nature a continuous debate. In such a discourse, degrees of trust and certainty matter when evaluating and combining facts from different sources. Identity is needed for:

  • Annotation – making commentary and building discussion around facts and data
  • Augmentation – adding new data and assertions to existing data sets based on new evidence and experiments
  • Refutation – allowing statements to be contradicted according to new evidence or new interpretation of existing evidence, based on different degrees of trust within a system

All of the above types of communication are quite possible using the linked data semantic web model we have today. For example, I am free to publish any kind of statement I like that refers to other statements. However, in doing this it is critical that the notion of my identity is preserved in relation to the statements I make, just as it is critical that the identity of the author of the original statements is clear.
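
As a rough illustration of what preserving identity alongside a statement might look like, the sketch below attaches an author identifier to every statement and lets one statement refer to another. It is plain Python rather than RDF or any real linked data vocabulary, and the `Statement` class, the relation names and the example.org identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    statement_id: str
    author: str                       # identifier of whoever made the assertion
    text: str
    relation: Optional[str] = None    # "annotates", "augments" or "refutes"
    refers_to: Optional[str] = None   # id of the statement being discussed


# Hypothetical conversation: one original statement and two responses to it.
statements = [
    Statement("s1", "https://example.org/id/alice", "Sample X melts at 80 C."),
    Statement("s2", "https://example.org/id/bob",
              "Our measurement gives 85 C.", "refutes", "s1"),
    Statement("s3", "https://example.org/id/carol",
              "Measured at sea-level pressure.", "annotates", "s1"),
]


def responses_to(target_id, all_statements):
    """List (author, relation, text) for every statement about target_id."""
    return [
        (s.author, s.relation, s.text)
        for s in all_statements
        if s.refers_to == target_id
    ]


for author, relation, text in responses_to("s1", statements):
    print(f"{author} {relation} s1: {text}")
```

Because the author travels with each statement, a reader can always ask who refuted or annotated a given fact, which is the property the argument above depends on.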

Wikipedia or Schizopedia?

The huge success of Wikipedia has shown that collaboration and openness, combined with a low cost of usage, can produce real value. The centralised model of Wikipedia allows the tracking of individual edits but not necessarily on a named or identified user basis. Essentially, the decreased end user cost of building the information resource has been traded against a consequent lack of traceability and provenance.

Compare this with the semantic web linked data model. Here no centralisation is expected (or even possible). I can publish some facts, you can comment, agree, refute or augment these facts, and by publishing your assertions in the linked data cloud you can also join the conversation. So can many other people, with many other agreements, contradictions and additional observations. Here the conversation could start to resemble schizophrenia, with many voices talking at once. Without a solid notion of identity and provenance it is impossible to build a consistent and coherent model of the facts.

New licence models

Many publishers have traditionally licensed their intellectual property to third parties in the form of data sets. This could be, for example, to provide language translation devices to language students, or alternatively to provide an abstract service to complement the primary source materials in a particular discipline. These licence deals inevitably involve the simple transfer of the published content as a set of static files from licensor to licensee.

However, as the service offered by publishers matures to include APIs to their data and services instead of simple file transfer, the need to address identity issues arises. Typical identity issues include:

  • The need to ensure the API is used only by the licensee
  • The need to record usage metrics by each licensee

Additional issues may include the need to track the licensee’s individual users and their usage patterns. Furthermore if the API delivers services to the licensee’s end users beyond simple search and content retrieval, then it may be necessary to exchange authentication and identity information about the end user of the site.
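
The two basic requirements listed above, restricting the API to licensees and recording usage per licensee, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a recommended design: the keys, the licensee names and the `handle_request` function are hypothetical, and a production service would store keys securely and persist the counts.

```python
from collections import Counter

# Hypothetical API keys, one issued to each licensee.
API_KEYS = {
    "key-abc123": "licensee-a",
    "key-def456": "licensee-b",
}

usage = Counter()  # number of accepted calls per licensee


def handle_request(api_key, query):
    """Serve a query only for a known licensee, and count the call."""
    licensee = API_KEYS.get(api_key)
    if licensee is None:
        # Unknown key: refuse, and record nothing against any licensee.
        return {"status": 403, "error": "unknown API key"}
    usage[licensee] += 1
    return {"status": 200, "licensee": licensee, "query": query}


first = handle_request("key-abc123", "identity")
second = handle_request("key-abc123", "provenance")
rejected = handle_request("not-a-key", "identity")

print(first["status"], second["status"], rejected["status"])
print(dict(usage))
```

Tracking the licensee's individual end users would need a second identifier passed with each call, which is where the exchange of authentication and identity information mentioned above comes in.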

Conclusions

Drawing together the threads I have explored in this and my previous posts on identity management, I have come to the following conclusions.

The answer to the question ‘Are we there yet?’ is, of course, no, not yet. The more interesting question is whether we are even on the right road to get there. For us to reach our destination, a rational and usable system for managing identities on the web, the following needs to happen, in my view.

Publishers and information professionals need to collaborate to design an identity framework that meets the needs of all stakeholders, including contributors, researchers and institutions. This framework should be built on existing open standards such as OpenID and DOI but must not sacrifice usability. The solution must be built on an organisational infrastructure which is credible and can be trusted across the industry, and should ideally be based on open source software which can be independently audited for security concerns by any interested party.

The starting point for such a collaboration may already be in place with the recently announced ORCID (Open Researcher and Contributor ID) initiative. I for one will be watching this project closely over the coming months and I look forward to their developments and progress.

And in the meantime … keep quiet you kids there in the back!
