Combining identity and access management


Identity and access management introduction

Publishing is an increasingly fast-moving industry, with user expectations outstripping the ability of businesses to keep pace in certain key areas. Seamless access is becoming central to what libraries and end users expect when engaging with publisher content [1,2]. Standalone Single Sign-On (SSO) offerings, otherwise known as identity and access management (IAM) systems, are being tested by many librarians and publishers in place of the older, monolithic access and entitlement functionality that is typically enmeshed within legacy online publishing platforms. In this briefing, we define the key terms, set out how the available approaches differ and explain the benefits of the combined identity and access management approach taken within SAMS Sigma.

Common terms

To ensure we're being clear in a field that is in the midst of change, we think it's important to clarify some common terms before going further. Historical practices in our industry have created room for confusion, and we would like to help set that straight.
Here are our definitions of four key terms:

Identity

For our purposes, individuals have sets of personal attributes such as name, ORCID identifiers or Twitter handles. This set of personal attributes contributes to an individual’s identity. Individuals may have multiple identifiers – email addresses are a great example – but only one identity.
Importantly, organizations also have identities, again associated with multiple attributes and identifiers. To deliver flexibility and control to publishers, a well-designed IAM system must enable the management of both individual and organization identities, respecting the fact that organisations are very different to individuals. The record of an individual or organisational identity within an IAM system is typically called a profile or an account.
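To make the "many identifiers, one identity" idea concrete, here is a minimal sketch. It assumes a simple in-memory model with one profile per identity and separate profile types for individuals and organisations, each able to hold any number of identifiers. The class and field names (`Profile`, `IndividualProfile`, `OrganisationProfile`, `Directory`) are hypothetical and illustrative only. They are not SAMS Sigma's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """One identity record; it may carry any number of identifiers."""
    profile_id: str
    # (scheme, value) pairs, e.g. ("email", "alice@uni.example")
    identifiers: set[tuple[str, str]] = field(default_factory=set)

    def add_identifier(self, scheme: str, value: str) -> None:
        self.identifiers.add((scheme, value))


@dataclass
class IndividualProfile(Profile):
    name: str = ""


@dataclass
class OrganisationProfile(Profile):
    legal_name: str = ""
    ip_ranges: list[str] = field(default_factory=list)  # organisation-only attribute


class Directory:
    """Maps every known identifier back to the single profile that owns it."""

    def __init__(self) -> None:
        self._owners: dict[tuple[str, str], Profile] = {}

    def register(self, profile: Profile) -> None:
        for ident in profile.identifiers:
            owner = self._owners.get(ident)
            if owner is not None and owner is not profile:
                raise ValueError(f"{ident} already belongs to {owner.profile_id}")
            self._owners[ident] = profile

    def lookup(self, scheme: str, value: str) -> Profile | None:
        return self._owners.get((scheme, value))


# Usage: two email addresses, one identity.
directory = Directory()
alice = IndividualProfile("ind-1", name="Alice")
alice.add_identifier("email", "alice@uni.example")
alice.add_identifier("email", "alice@home.example")
directory.register(alice)
```

Rejecting an identifier that already belongs to another profile is one way to enforce "only one identity" per identifier. A real system might instead offer to merge the two profiles.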

Authentication

Within IAM systems, both individuals and organizations may have particular attributes, variously called 'authentication identifiers' or 'credentials'. The best known of these is the combination of username and password, often shortened to 'user-pass'. When arriving at a website or app protected by an IAM system, a user must enter their credentials in order to gain access to the site or app. This authentication process establishes to the website that the user is who they claim to be; that is, they have an identity that the IAM system can communicate to the website.
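The user-pass case is the most familiar, so here is one common way such a credential check can work. The system stores only a salted PBKDF2 hash of the password and compares hashes in constant time. The sketch uses Python's standard `hashlib` and `hmac` modules. The iteration count and function names are assumptions for illustration and do not describe any particular IAM product.

```python
from __future__ import annotations

import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed PBKDF2 work factor; tune for your hardware


def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store; the plain-text password is never kept."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(attempt: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive a key from the attempt and compare it in constant time."""
    _, candidate = hash_password(attempt, salt)
    return hmac.compare_digest(candidate, stored_key)


# Usage: store (salt, key) at registration and check against it at sign-in.
salt, key = hash_password("correct horse battery staple")
```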

Authorization

On publisher websites and apps, simple confirmation of identity is not enough; the user also needs to be authorized to use the specific content licences or other functionality to which they may be entitled.
Within a publisher-focused IAM system, profiles are associated with entitlement information that details the content licences or other entitlements available to the individual or the organisation. Once a user is authenticated, the IAM system provides the website or app with information about that user's entitlements. For publishers, these might include subscriptions, pay-per-view purchases, trials or other types of licence.
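The following minimal sketch separates the two steps. It assumes entitlements are stored as licence records keyed by profile ID, and that each licence covers a set of content IDs over a date range. The licence kinds mirror those listed above. The record shape and the `entitlements_for` and `can_access` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Licence:
    kind: str                  # e.g. "subscription", "pay-per-view", "trial"
    content_ids: frozenset[str]
    start: date
    end: date | None = None    # None means the licence has no end date

    def covers(self, content_id: str, on: date) -> bool:
        in_window = self.start <= on and (self.end is None or on <= self.end)
        return in_window and content_id in self.content_ids


# Hypothetical licence store, keyed by profile ID.
LICENCES: dict[str, list[Licence]] = {
    "ind-1": [Licence("pay-per-view", frozenset({"article-42"}), date(2016, 1, 1))],
    "org-7": [Licence("trial", frozenset({"journal-a", "journal-b"}),
                      date(2016, 3, 1), date(2016, 3, 31))],
}


def entitlements_for(profile_id: str) -> list[Licence]:
    """What the IAM system hands to a website once the user is authenticated."""
    return LICENCES.get(profile_id, [])


def can_access(profile_id: str, content_id: str, on: date) -> bool:
    """Authorization: does any current licence cover this content item?"""
    return any(lic.covers(content_id, on) for lic in entitlements_for(profile_id))
```

Authentication answers "who is this?" (here, a profile ID). `can_access` then answers "what may they read today?".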

Single-sign-on

An SSO system removes the need for users to enter their credentials at every website they visit. The idea is simple: the user signs on once, establishing their identity with the IAM system, and their authentication persists for a set period of time, typically 30 days or more. During this time, the user can visit any website or app connected to the IAM system without having to re-enter their username and password.
Systems which allow a user to use the same username and password across multiple websites are not SSO if the user is still required to sign in on each site they visit.
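Here is one way to picture "sign on once". After the user authenticates, the IAM system issues a session token carrying the profile ID and an expiry time. The token is signed with a key the connected sites can verify, so each site can accept the session without asking for the password again. The sketch assumes a shared HMAC key and a 30-day lifetime, and the token format is hypothetical. Production SSO normally relies on standard protocols such as OpenID Connect or SAML, typically with asymmetric signatures, rather than a hand-rolled format like this.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time

SHARED_KEY = b"demo-key-shared-by-iam-and-sites"  # hypothetical; use a managed secret
SESSION_SECONDS = 30 * 24 * 60 * 60               # 30-day sign-on period


def _sign(payload: bytes) -> str:
    mac = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode()


def issue_session(profile_id: str, now: float | None = None) -> str:
    """Called once by the IAM system after the user signs on."""
    now = time.time() if now is None else now
    claims = {"sub": profile_id, "exp": int(now) + SESSION_SECONDS}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    return payload.decode() + "." + _sign(payload)


def verify_session(token: str, now: float | None = None) -> str | None:
    """Called by each connected site; returns the profile ID, or None if invalid."""
    now = time.time() if now is None else now
    try:
        payload_str, signature = token.split(".")
    except ValueError:
        return None
    payload = payload_str.encode()
    if not hmac.compare_digest(_sign(payload), signature):
        return None  # tampered or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["sub"] if now < claims["exp"] else None
```

Any site holding the key can call `verify_session` on each request. Only the IAM system ever sees the password.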

Figure 1 explains the connections between identities, identifiers, access and entitlement.


Figure 1 Defining identities, access and entitlement

Defining the space

B2C identity management

B2C identity management services (Salesforce Identity, Gigya or Forgerock, for example) are built on the back of the social media explosion, and offer SaaS-based, cross-domain, cross-device SSO for individual identity management only. This is combined with a role-based entitlement system, whereby users are allowed access to different services based on their individual role. Role-based entitlements are usually quite simple: 'read only', 'administrator' and 'super-user' are common examples. Clearly this does not solve the entitlement problem, as publishers still need to model complex content licences. Nor do these systems model organisational access, with all its attendant complexities of IP addressing, Shibboleth and consortium membership, to mention just a few. With these B2C IAM systems, therefore, the gap in functionality can only be filled by complex bespoke development on the publisher's side.

Legacy access management

Legacy access management services – such as Atypon’s eRights, or Semantico’s SAMS 6 – have been around for almost as long as publishers have had digital services. They offer institutional authentication and access control via IP address recognition, Shibboleth federations, and organizational username/password combinations, but do not allow for flexible individual identity management or SSO. These are also not SaaS systems and are therefore subject to slower and more costly update cycles. Access management services usually permit publishers to manage quite granular entitlements (e.g. subscriptions and trials), such that different end users are able to access different content items based on their institution’s purchases.
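IP address recognition, the most familiar of these mechanisms, amounts to matching the requesting address against the ranges each institution has registered. The following minimal sketch uses Python's standard `ipaddress` module. The institution IDs are hypothetical, and the ranges are documentation-only address blocks.

```python
import ipaddress

# Hypothetical registered ranges per institution (documentation address blocks).
INSTITUTION_RANGES = {
    "org-7": ["192.0.2.0/24", "2001:db8:7::/48"],
    "org-9": ["198.51.100.0/25"],
}

# Parse once at start-up so each request only does membership checks.
_NETWORKS = [
    (org_id, ipaddress.ip_network(cidr))
    for org_id, cidrs in INSTITUTION_RANGES.items()
    for cidr in cidrs
]


def institutions_for_ip(address: str) -> list[str]:
    """Return every institution whose registered ranges contain this address."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixing is safe.
    return [org_id for org_id, net in _NETWORKS if ip in net]
```

The function returns a list because overlapping ranges do occur in practice, for example a hospital that also sits inside a university network.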

Combined identity, access and entitlement management

SAMS Sigma is a SaaS-based, comprehensive identity, access and entitlement management solution, combining 12 years of investment in institutional access and entitlement management with best-of-breed, cross-domain SSO for individual identity management. SAMS Sigma allows publishers to accurately model content licence sales to both individuals and organisations, together with the rich variety of relationships that exist between these types of profile, including remote access, organisation hierarchy and consortium membership.
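To illustrate what modelling these relationships involves, the sketch below resolves the licences available to an organisation. It walks up the organisation's parent hierarchy and adds the licences of any consortium found along the way. The data layout (`PARENT`, `CONSORTIA`, `DIRECT_LICENCES`) and all IDs are hypothetical simplifications, not how SAMS Sigma stores relationships.

```python
from __future__ import annotations

# Hypothetical relationship data.
PARENT = {"dept-chem": "univ-x", "univ-x": None, "univ-y": None}
CONSORTIA = {"univ-x": ["consortium-north"], "univ-y": ["consortium-north"]}
DIRECT_LICENCES = {
    "dept-chem": {"journal-chem"},
    "univ-x": {"journal-a"},
    "consortium-north": {"ebook-collection-1"},
}


def inherited_licences(org_id: str) -> set[str]:
    """Licences held directly, via any ancestor, or via a consortium on that path."""
    result: set[str] = set()
    seen: set[str] = set()
    current = org_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles in the hierarchy
        result |= DIRECT_LICENCES.get(current, set())
        for consortium in CONSORTIA.get(current, []):
            result |= DIRECT_LICENCES.get(consortium, set())
        current = PARENT.get(current)
    return result
```

In this example, a chemistry department inherits its university's journal and the consortium's ebook collection on top of its own licence.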

Comparison of services

Authentication: identity and access management

When users arrive at a protected site, they must first be authenticated – that is, the site must identify the user, either as an individual or as a user from a specific institution.

Item | SAMS Sigma | B2C identity management | Legacy access management
SSO through OpenID Connect | yes | yes |
SSO through OAuth | yes | yes |
SSO through social sign-on | Forthcoming | yes |
Institution authentication through IP address | yes | | yes
Institution authentication through trusted referrer list | yes | | yes
Institution authentication through library cards | yes | | yes
Institution authentication through Shibboleth federations (e.g. OpenAthens) | yes | | yes
Institution authentication through organization username and password | yes | | yes
Token-based access | yes | yes | yes
Cross-domain authentication | yes | yes |
Cross-device authentication | yes | yes |
SAML compliant | yes | yes | yes
API-based authentication | yes | yes | yes

Table 1 Comparison of Identity and Access Management authentication services

Entitlement management

Publishers protect their content in various ways, giving users access to content based on a complex mesh of subscriptions, free content, Open Access and so on. Simple role-based entitlements rarely fit the bill.

Item | SAMS Sigma | B2C identity management | Legacy access management
Subscriptions yes yes
Trials yes yes
Text & Data mining yes yes
Custom content collections yes yes
Open Access yes yes yes
Freemium yes
Free and Gratis yes yes
eCommerce yes (with Scolaris) yes yes
Perpetual access yes yes yes
Time-limited licenses yes yes
Meterage limits yes
Signed link sharing yes
Access vouchers and tokens yes yes
Concurrency limits yes yes
Granular access down to the article yes

Table 2 Comparison of Identity and Access Management entitlement management services
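Concurrency limits are one of the entitlement types a purely role-based system cannot express. The following minimal sketch assumes a licence grants a fixed number of simultaneous seats. It also assumes a seat is freed when its session ends or has been idle for a set time. The class name, seat count and 15-minute idle timeout are illustrative assumptions.

```python
class ConcurrencyLicence:
    """Caps the number of simultaneous sessions under one licence."""

    def __init__(self, seats: int, idle_timeout: float = 15 * 60) -> None:
        self.seats = seats
        self.idle_timeout = idle_timeout
        self._last_seen: dict[str, float] = {}  # session ID -> last activity time

    def _expire_idle(self, now: float) -> None:
        # Drop sessions that have been idle for at least the timeout.
        self._last_seen = {
            sid: t for sid, t in self._last_seen.items() if now - t < self.idle_timeout
        }

    def acquire(self, session_id: str, now: float) -> bool:
        """Grant or refresh a seat; return False if all seats are taken."""
        self._expire_idle(now)
        if session_id in self._last_seen or len(self._last_seen) < self.seats:
            self._last_seen[session_id] = now
            return True
        return False

    def release(self, session_id: str) -> None:
        """Free a seat explicitly, e.g. on sign-out."""
        self._last_seen.pop(session_id, None)
```

The idle timeout matters because many users simply close the browser tab. Without it, seats would leak until the licence appeared permanently full.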

User profile management

Your users need to be able to self-serve, saving your staff time and effort better spent in growing your business.

Item | SAMS Sigma | B2C identity management | Legacy access management
Self-service profile management yes yes yes
Password reset workflows yes yes
Brandable login screen yes yes
Inherited entitlements yes yes
Personalization tools (e.g. saved search) yes yes

Table 3 Comparison of Identity and Access Management user profile management services

Modelling connections

Publishers operate in a complex space, where institutions have multiple inter-relationships not only with other institutions, but also with individuals.

Item | SAMS Sigma | B2C identity management | Legacy access management
Organization hierarchy models yes yes
Consortia models yes yes (partial)
Individual memberships yes yes

Table 4 Comparison of Identity and Access Management connections modelling services

Integration

Publishers typically use an ERP/ERM system as a central location for storing and updating their information. Sometimes an internal subscription management system also fulfils this role. API integration with these systems is key.

Item | SAMS Sigma | B2C identity management | Legacy access management
API support for user provision yes yes
API support for entitlement provision yes yes yes
Synchronization support via API yes yes
Business analytics yes yes yes
COUNTER reports yes yes

Table 5 Comparison of Identity and Access Management integration services

Operational

You need to know that your system is going to be operational and secure. SaaS-based systems have the benefit of being able to scale resources automatically using cloud hosting technologies to maintain service levels.

Item | SAMS Sigma | B2C identity management | Legacy access management
Cloud-based yes yes
High availability (99.95%) yes yes yes
High performance with 10K concurrent users (single to low 2-digit ms latency) yes yes
24x7x365 support yes yes
Data secured to UK Data Protection Act standards yes yes
OWASP ASVS Level 2 security compliant yes yes
WCAG 2.0 AA and Section 508 accessibility standard compliant yes yes

Table 6 Comparison of Identity and Access Management operational services

Pricing

We benchmarked the prices of B2C identity management solutions based on 1.5 million active user identities per annum, or approximately 250,000 unique users per month.

Solution | Model | Estimated price
Salesforce Identity | Unique users per month (250,000 unique users per month) | $105,000 to $140,000 pa
Janrain Enterprise | User profiles (1.5 million identities) | $186,000 to $235,000 pa
Gigya | User profiles (1.5 million identities) | $280,000 to $350,000 pa
Forgerock | User profiles (1.5 million identities) | $300,000 to $340,000 pa
SAMS Sigma | Sessions per month (250,000 sessions per month) | $76,000 to $100,000 pa

Table 7 Comparison of Identity and Access Management pricing
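The vendors price on different units, so one rough way to compare the figures in Table 7 is to divide each annual price by the 1.5 million identities used in the benchmark. This is only a back-of-the-envelope normalisation. It assumes that SAMS Sigma's session-based price and Salesforce's monthly-user price cover the same population as the per-profile prices.

```python
BENCHMARK_IDENTITIES = 1_500_000

# (low, high) estimated annual prices in USD, taken from Table 7.
PRICES = {
    "Salesforce Identity": (105_000, 140_000),
    "Janrain Enterprise": (186_000, 235_000),
    "Gigya": (280_000, 350_000),
    "Forgerock": (300_000, 340_000),
    "SAMS Sigma": (76_000, 100_000),
}


def cost_per_identity(low: int, high: int) -> tuple[float, float]:
    """Annual price range divided by the benchmark identity count."""
    return low / BENCHMARK_IDENTITIES, high / BENCHMARK_IDENTITIES


for name, (low, high) in PRICES.items():
    lo_cost, hi_cost = cost_per_identity(low, high)
    print(f"{name:20} ${lo_cost:.3f} to ${hi_cost:.3f} per identity per year")
```

On this basis, the ranges run from roughly $0.05 to $0.07 per identity per year for SAMS Sigma up to roughly $0.19 to $0.23 for the profile-priced services at the top of the table.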

References

[1] Eduserv. (2015) Librarians’ experiences and perceptions of identity and access management 2015.
[2] Schonfeld, R.C. (2014) Meeting researchers where they start.

What can dynamic data visualization do for us?


The fourth paradigm of science brings with it an onslaught of data: quantitative, qualitative, direct and anecdotal. It is an often-acknowledged fact that the ability to collect and share vast quantities of data is the greatest change in scientific research of our times. With this new opportunity come inherent challenges in comprehending that data.

Enter data visualisation. For the purposes of this post, data visualisation is defined as the dynamic manipulation of data to aid understanding and provide context: visualisations that are formed on demand from criteria set by the user.

This is distinct from the many free and subscription online tools that create graphs, charts, storyboards and timelines from input data, producing a static visualisation for publication. To narrow the focus further, it also excludes data visualisation tools that are interactive but limited to exploring a single, fixed type of visualisation, such as an interactive map.

In academic publishing we are only just setting out on the dynamic data visualisation journey, but there are already some great examples of visualisations that provide context and clarity to the exploration of datasets. Among them is SchoolDash, created by Timo Hannay, the founder of Digital Science. SchoolDash provides maps, dashboards, statistics and analysis of schools data in England for use by the public, journalists, policymakers and the schools themselves.


Also notable is a recent project completed by Semantico for McGraw-Hill Education. A dynamic data visualisation tool, DataVis Material Properties, was added to their Access Engineering database, giving students and researchers alike configurable visualisations of material properties data, including cost, for instant visual context. With the ability to save multiple visualisations and to 'dig down' into more detailed information, the tool helps students, educators and researchers tell stories with the visualisations.


The potential applications of tools of this type are myriad. In his excellent 2010 TED talk, The Beauty of Data Visualisation, David McCandless quoted research by Tor Norretranders which found that the bandwidth of our visual sense is comparable to that of a computer network (for comparison, our sense of taste has the throughput of a pocket calculator). Data visualisations take advantage of our brains' capacity for spotting patterns and making connections. Using dynamic data visualisations for databases and datasets could, and should, become a mainstream aspect of publishing research in the future.

Simply concluding, “Data visualisations are brilliant! We should have more!”, would miss the real untapped potential of dynamic data visualisation. As it stands, visualisations are taken from cleaned, and therefore closed-off, datasets. Imagine, then, if visualisations could be made from the vast raw datasets languishing in data dumps. Imagine if, instead of being neatly fenced off as an analysis and reporting tool, data were drawn from raw datasets as an integral part of the research process. In an ideal world, these datasets could even be stitched together. This would free the data from the bounds of perspective and ideology.

Scaled back to what is possible within current data-quality conventions, applying data visualisation to raw data remains an opportunity that, as it stands, we are missing. Dynamic visualisations generated from raw data and output as HTML are achievable, and would answer a number of needs: the need for data results to be available rapidly, the need for further context, and the need for research output in one scholarly discipline to be available and understandable to others.
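As a rough illustration that this is achievable with modest tooling, the sketch below takes raw CSV text and keeps only rows where the chosen fields parse as numbers, a crude stand-in for real data-quality checks. It then applies a user-chosen cost filter and emits a self-contained HTML page with an inline SVG scatter plot. Everything here is hypothetical: the column names (`name`, `cost`), the filter criterion and the sample data. It is not the DataVis Material Properties implementation.

```python
from __future__ import annotations

import csv
import io
from html import escape


def scatter_html(raw_csv: str, x: str, y: str, max_cost: float | None = None,
                 width: int = 400, height: int = 300) -> str:
    """Build an HTML page plotting column y against column x, on demand."""
    points = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            px, py = float(row[x]), float(row[y])
            cost = float(row["cost"]) if max_cost is not None else 0.0
        except (KeyError, TypeError, ValueError):
            continue  # raw data: skip rows with missing or non-numeric values
        if max_cost is None or cost <= max_cost:
            points.append((row.get("name") or "", px, py))
    if not points:
        return "<p>No data matches these criteria.</p>"

    min_x, max_x = min(p[1] for p in points), max(p[1] for p in points)
    min_y, max_y = min(p[2] for p in points), max(p[2] for p in points)
    span_x, span_y = (max_x - min_x) or 1.0, (max_y - min_y) or 1.0
    circles = []
    for name, px, py in points:
        cx = 20 + (px - min_x) / span_x * (width - 40)
        cy = height - 20 - (py - min_y) / span_y * (height - 40)  # SVG y grows downwards
        circles.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="4">'
                       f"<title>{escape(name)}</title></circle>")
    return (f"<html><body><h1>{escape(y)} against {escape(x)}</h1>"
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            + "".join(circles) + "</svg></body></html>")


RAW = ("name,density,strength,cost\n"
       "Steel,7.85,400,1.0\nAluminium,2.7,300,2.5\nBad row,,x,1\nTitanium,4.5,900,20\n")
page = scatter_html(RAW, "density", "strength", max_cost=5)
```

Each call re-reads the raw data with the user's criteria, which is what makes the visualisation dynamic rather than a static image.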

There are barriers to this. Data quality is imperative, as is the need for raw data to be reviewable. Perhaps the main barrier for publishers, though, is our continued attachment to the print mental model. The main method of publishing research is the journal article: submitted online, reviewed online, produced online and published online, as a PDF, a flat online facsimile of the printed journal article.

Dynamic data visualisation is the publishing opportunity of our times, for scholarly, educational and indeed all information publishing. Integrated into publishing workflows and online publishing platforms, it could take the effectiveness and usability of information to the next level.

Why is integrated content important?

Integrated content is a principle we stand by at Semantico. Here's why.

What is integrated content?

‘Integrated content’ is one of those terms that does the rounds in scholarly and professional publishing, while meaning vastly different things depending on the context. The term has stood variously for the combination of content types on a platform, such as journal articles and ebooks, or for the merging of in-house editorial and production procedures. Defined here as the intelligent integration of different kinds of content on the same platform, or even within the same content container, such as a webpage or monograph, integrated content is the future of digital publishing.

This is nothing new. Since the heady days of the first eBook, publishing pundits have predicted that content will become bite-sized, with searches returning multiple snippets of content that answer queries exactly. We are not there yet. Given the difficulty of consuming granular pieces of information from a variety of sources and contexts without feeling seasick, reaching this container-free nirvana may take quite some time, along with a lot of effort from publishers to impose a compatible editorial "voice" across content types.

Where we are now is a pragmatic, user-centric approach to content integration. Users demand powerful search and machine-readable accessibility, intuitive content discoverability within the platform, meaningful linking to related content, smart tools such as annotation, and the interoperability to support researcher workflows such as sharing. Above all, they demand simplicity.

An integrated approach

Our approach to integrated content starts with our technical architecture. Built for flexibility to support multiple types of content and an optimized user experience, it enables customer-driven platform design within a software-as-a-service (SaaS) infrastructure.

In the scoping and planning phases, XML files are used for domain modelling. This is essentially a process of understanding the shape and form of content, from a classic journal article through to an individual scene in a play. It also explores how the content interacts and how value can be added to the user experience. The primary aim is to ensure that the content's complexity is fully understood, but the process also helps to maximize the content's potential.
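A first pass at this kind of domain modelling can be as simple as inventorying which elements occur where in a sample of the publisher's XML. The sketch below uses Python's standard `xml.etree.ElementTree` to count every root-to-element path in a document. The sample markup is invented and deliberately schema-agnostic.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text: str) -> Counter[str]:
    """Count each root-to-element path, e.g. 'play/act/scene/speech'."""
    counts: Counter[str] = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


SAMPLE = """<play><title>Example</title>
  <act n="1"><scene n="1"><speech who="A"><line>Hello.</line></speech>
  <speech who="B"><line>Hi.</line><line>Again.</line></speech></scene></act></play>"""

for path, n in sorted(element_paths(SAMPLE).items()):
    print(f"{n:3}  {path}")
```

Running the same inventory over a representative sample of real files shows which structures are common, which are rare and which only occur in certain content types. That information then feeds into the UX mapping described next.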

Of course, the potential of a body of content in the age of text mining and API interoperability is massive, so it is the role of UX to understand which structures and features end users actually need and want. Once this is established, a mental model of what users expect to find, and where, can be mapped against the domain model to identify the optimum structure, taking into account established conventions and accessibility standards. An interface can then be designed that either hides the complex structure or makes the technology and underlying data transparent, depending on user requirements.

Why is integrated content important?

The importance of this approach is that it builds a great user experience that delivers on the richness of a body of content as a whole, rather than on its component parts. In addition to search and discovery, it enables a whole raft of features that enhance the understanding and impact of the content, among them data visualization and linked multimedia or overlay content.

The vast majority of platforms are accessed by distinctly different user groups, or by users with varying levels of understanding of the subject matter. An integrated approach to content makes it possible to address multiple groups’ needs with features such as adaptable content curation or faceted search. Further, the full diversity of a body of content can be realized, through interdisciplinary linking, or the cross-promotion of content that is of interest in more than one discipline.
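Faceted search works by counting how many items in the current result set carry each value of each metadata field. Users can then narrow results by content type, discipline and so on. The following minimal in-memory sketch uses hypothetical field names and records. A production platform would normally compute facets in its search engine rather than in application code.

```python
from __future__ import annotations

from collections import Counter

RESULTS = [
    {"title": "Trauma care basics", "type": "chapter", "discipline": ["nursing"]},
    {"title": "Pain assessment", "type": "article", "discipline": ["nursing", "medicine"]},
    {"title": "Ward procedures video", "type": "video", "discipline": ["nursing"]},
]


def _values(doc: dict, field: str) -> list:
    """Treat single values and lists of values uniformly."""
    value = doc.get(field)
    return value if isinstance(value, list) else [value]


def facet_counts(results: list[dict], fields: list[str]) -> dict[str, Counter]:
    """For each facet field, count how many results carry each value."""
    facets = {f: Counter() for f in fields}
    for doc in results:
        for f in fields:
            facets[f].update(v for v in _values(doc, f) if v is not None)
    return facets


def apply_filters(results: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep results matching every selected facet value."""
    return [d for d in results
            if all(wanted in _values(d, f) for f, wanted in selected.items())]
```

A multi-valued field such as discipline is what lets one item surface under several disciplines. This is the cross-promotion effect described above.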

As a use case, integrated content is an ideal publishing platform model for societies, when supported by flexible identity and access management. Societies frequently have a membership comprising professionals, academics, commercial organisations and students, so their content is necessarily diverse. Though a society may exist wholly or partly as a professional body, it is often its academic users who are more aware of its content and access it more frequently. The ability to cross-promote and enhance content through integration grows usage, protects membership subscriptions and potentially drives further subscriptions by developing must-have content sets.

Integrated content case studies

The Royal Marsden Manual

Known by many in the healthcare sector as “the bible of nursing”, The Royal Marsden Manual is a reference work used by students and practitioners throughout the UK for study and reaccreditation. The online version of the 9th edition, The Royal Marsden Manual Online, was built by Semantico in partnership with the publisher, Wiley. It was shortlisted for a prestigious 2016 Association of Publishers Digital Publishing Award in the Best Use of Mobile category, and integrated content was central to its success.

Although there had been a previous online edition of the work, its interface was cluttered and did not support simple discovery and intuitive use. A simple, easy-to-use interface was essential for the work's users, who include clinical and support staff with varying levels of IT literacy. Further, a key purpose of the online edition was to allow users to upload new procedure documents. In the previous edition, the process for procedure updates had been complex and, as a result, the feature was little used.

Collaboration between the publisher and our UX and technical teams resulted in a simplified interface designed around progressive information disclosure. This supported point-of-care use on mobile by integrating the disparate content types into a drill-down hierarchical list for intuitive discovery. These content types included guidelines, best practice, procedural notes, location-specific and other custom procedural notes, and user updates. On the page, user-generated content was integrated through a simplified interface for procedure information with clear user signposting. A direct-entry custom content facility removed the need to upload documents in order to make updates.

Drama Online

Built by Semantico in collaboration with the publisher Bloomsbury, Drama Online is a digital library of playtexts, filmed live performances, audio plays, theory and practice. Designed for students, it combines original playtexts with films and performances, workshop videos and resources, and academic and practical written content. The resource also offers bookmarking for fast access to key content and personalized annotation to aid study.

The simplicity of the interface hinges on its summary pages for playtexts, playwrights & practitioners, genres, periods, context & criticism and theatre craft, within which content is sub-categorized by format and type. Widely acclaimed, the platform has won or been shortlisted for a number of awards. At the ALPSP 2013 Publishing Innovation Awards it was Highly Commended and described as “A tool which clearly enhances the study and performance of drama.”

When is integrated content not important?

There is an argument that for homogeneous bodies of content, such as stand-alone journals or eBook content, integrated content is not important. This is not the case: each of these content types contains embedded media that benefit from the integrated approach. Graphs, charts, figures, tables and bibliographies all deserve special indexing and presentation to maximise their usefulness. Every body of content is different, whether those differences lie in its strength in a particular subject, its interdisciplinary potential or its appeal to particular user groups. By understanding that content and optimizing its potential with a flexible platform architecture, both the user experience and, ultimately, the content's commercial value can be maximized.
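The sketch below shows what special indexing of embedded objects might look like. It pulls figures and tables out of an article marked up in JATS XML so that each can be indexed and displayed as its own record. The `fig`, `table-wrap`, `label` and `caption` elements are standard JATS, but the output record shape is an assumption. The sketch also ignores the namespaces and richer caption markup that real JATS files often contain.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def embedded_objects(jats_xml: str) -> list[dict[str, str]]:
    """Return one index record per figure or table in a JATS article."""
    root = ET.fromstring(jats_xml)
    records = []
    for kind, tag in (("figure", "fig"), ("table", "table-wrap")):
        for elem in root.iter(tag):
            caption_el = elem.find("caption")
            caption_text = "".join(caption_el.itertext()) if caption_el is not None else ""
            records.append({
                "kind": kind,
                "id": elem.get("id", ""),
                "label": elem.findtext("label", default="").strip(),
                "caption": " ".join(caption_text.split()),  # normalise whitespace
            })
    return records


ARTICLE = """<article><body><sec><p>See <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
<fig id="f1"><label>Figure 1</label><caption><p>Growth of
  <italic>E. coli</italic> over time.</p></caption></fig>
<table-wrap id="t1"><label>Table 1</label><caption><p>Sample sizes.</p></caption></table-wrap>
</sec></body></article>"""

for record in embedded_objects(ARTICLE):
    print(record)
```

Once figures and tables are separate records, they can be searched, linked and displayed independently of the article that contains them.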

Towards a universally adopted institutional identifier


Managing the accurate attribution of research to individuals, institutions and funding bodies without a universally adopted institutional identifier is a daily challenge for publishers. There is a universally accepted identifier for individuals in the form of the Open Researcher and Contributor ID (commonly known as ORCiD), and the CrossRef-produced Open Funder Registry covers research funding bodies. However, there is no single solution for institutional identifiers.

This is becoming increasingly untenable. The pressure from funding bodies – and the institutions themselves – to make their research output quickly and easily identifiable has brought this issue up the agenda. The problem is that most institutions are known under multiple names, few of which are actively incorrect.

The story so far

The ORCiD identifier has created an open database of researchers, linked to all of their published work, that can be used by third parties to build services, all based on a standardised numeric identifier. The Open Funder Registry has similarly created an open and searchable list of funding bodies, linked to the work they have funded. Unlike ORCiD, however, the Open Funder Registry uses a text identifier, rather than a numerical one, leaving the system open to error – how many end users are going to know which of the four ‘Ronald McDonald House Charities’ they are supposed to cite as their funding body?
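Part of what makes a structured identifier such as ORCID robust is that it carries its own check character. The final character is computed from the first 15 digits using the ISO 7064 MOD 11-2 algorithm, so most mistyped iDs can be rejected outright. A free-text funder name offers no such safeguard. Here is a short sketch of that check:

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check an iD written as four hyphen-separated groups, e.g. 0000-0002-1825-0097."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or any(c not in "0123456789" for c in chars[:15]):
        return False
    return orcid_check_digit(chars[:15]) == chars[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0079"))  # False: two digits transposed
```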

The introduction of a standardised institutional identifier would eliminate any room for uncertainty about which institution an author is affiliated with. If funding bodies were also assigned a numerical institutional identifier, this would close the loop, linking individuals, institutions and funding bodies to one another. This three-dimensional view would give publishers and institutions invaluable insight into research and publishing trends. Potential applications of this data include identifying interdisciplinary research opportunities within institutions and mapping research output by field and institution, among many others.
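If authors, institutions and funders all carried identifiers, the mapping described above would reduce to simple grouping over publication records. The hypothetical sketch below uses placeholder institution, funder and author IDs, since no universal institutional identifier yet exists.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical records: every ID below is a placeholder, not a real identifier.
PUBLICATIONS = [
    {"author": "researcher-A", "institution": "INST-001", "funder": "FUND-01", "field": "chemistry"},
    {"author": "researcher-B", "institution": "INST-001", "funder": "FUND-02", "field": "biology"},
    {"author": "researcher-C", "institution": "INST-002", "funder": "FUND-01", "field": "chemistry"},
    {"author": "researcher-A", "institution": "INST-001", "funder": "FUND-02", "field": "chemistry"},
]


def output_by_institution_and_field(pubs: list[dict]) -> Counter:
    """Count publications per (institution, field) pair."""
    return Counter((p["institution"], p["field"]) for p in pubs)


def multi_field_institutions(pubs: list[dict], min_fields: int = 2) -> list[str]:
    """Institutions publishing in several fields: candidates for interdisciplinary work."""
    fields: defaultdict[str, set[str]] = defaultdict(set)
    for p in pubs:
        fields[p["institution"]].add(p["field"])
    return sorted(inst for inst, fs in fields.items() if len(fs) >= min_fields)


def funders_of(pubs: list[dict], institution: str) -> set[str]:
    """Which funders support work at a given institution."""
    return {p["funder"] for p in pubs if p["institution"] == institution}
```

With free-text institution names, each of these groupings would first need error-prone name matching. With shared identifiers, they become exact joins.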

Further, an institutional identifier would improve content discovery by making research easily searchable by institution. Bearing in mind that researchers can have multiple affiliations, and indeed more than one funding body, this would be a significant improvement.

Why now?

So why has an institutional identifier not been put in place already? The answer is, for the most part, a lack of commercial drivers. To be universally adopted, any system has to be open access, and the pay-off for putting such a system in place has not been great enough to motivate industry co-operation. Until now: the success of the ORCiD initiative and the momentum behind improvements to content discovery mean that an institutional identifier is imminent.

Towards an institutional identifier

At the STM Annual US Conference 2016 our own Co-Founder and Chairman, Richard Padley, is moderating a panel discussion called ‘Identifying the identifiers!’, where Laurel Haak, Executive Director of ORCiD, Chuck Koscher, Director of Technology at CrossRef, and Howard Ratner, Executive Director of CHORUS, are to discuss the future of identifiers and unveil plans for a universal standard. Stay tuned for more news!

Existing identifiers:

FundRef
International Coalition of Library Consortia (ICOLC)
International Standard Name Identifier (ISNI)
Libraries.org
NCSU Organization Name Linked Data (NCSU-ONLD)
OrgRef
Ringgold
Shibboleth federations (multiple)
Virtual International Authority File (VIAF)
DBpedia
Wikidata