What can dynamic data visualisation do for us?


The fourth paradigm of science brings with it an onslaught of data. Quantitative and qualitative, direct and anecdotal, data now arrives in unprecedented volume; indeed, the ability to collect and share vast quantities of data is widely acknowledged as the greatest change in scientific research of our times. With this new opportunity come inherent challenges in the comprehension of data.

Enter data visualisation. For the purposes of this blog, data visualisation is defined as the dynamic manipulation of data to aid understanding and provide context: visualisations that are formed on demand from criteria set by the user.

This is distinct from the many online tools – available for free or by subscription – that turn inputted data into static graphs, charts, storyboards and timelines for publication. To further narrow the focus, it also excludes data visualisation tools that are interactive but limited to a single type of visualisation; an interactive map, for example.

In academic publishing we are only just setting out on the dynamic data visualisation journey, but there are already some great examples of visualisations that provide context and clarity to the exploration of datasets. Among them is SchoolDash, created by Timo Hannay, the founder of Digital Science, which provides maps, dashboards, statistics and analysis of schools data in England for the public, journalists, policymakers, and the schools themselves.


Also notable is a recent project completed by Semantico for McGraw-Hill Education, in which a dynamic data visualisation tool – DataVis Material Properties – was added to their Access Engineering database. It gives students and researchers alike configurable visualisations of material properties data, including cost, providing instant visual context. With the functionality to save multiple visualisations and to ‘dig down’ into more detailed information, the tool helps students, educators and researchers tell stories with their data.


The potential applications of tools of this type are myriad. In his excellent 2010 TED talk, The Beauty of Data Visualisation, David McCandless quoted research by Tor Norretranders that found the bandwidth of our visual sense to be comparable to that of a computer network (for comparison, our sense of taste has the throughput of a pocket calculator), and that data visualisations take advantage of our brains’ capacity for spotting patterns and making connections. Using dynamic data visualisations for databases and datasets could – and should – become a mainstream aspect of publishing research in the future.

By simply concluding, “Data visualisations are brilliant! We should have more!”, we would be missing the real untapped potential of dynamic data visualisation. As it stands, visualisations are taken from cleaned – and therefore closed off – datasets. Imagine, then, if visualisations could be made from the vast raw datasets languishing in data dumps; if, instead of neatly fencing off data in an analysis reporting tool, it was scraped from raw datasets as an integral part of the research process; indeed, in an ideal world, if these datasets could be stitched together. This would free the data from the bounds of perspective and ideology.

Scaling back these heady imaginings to what is possible within current data quality conventions, the application of data visualisations to raw data is an opportunity that, as it stands, we are missing. Dynamic visualisations taken from raw data and output as HTML are achievable, and they would answer a number of needs: the rapid availability of data results, the provision of further context, and the need for research output in one scholarly discipline to be available and understandable to others.
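As a sketch of what “raw data in, HTML out” might look like in practice, the following Python snippet (illustrative only – the data and field names are hypothetical) turns noisy CSV rows directly into a self-contained HTML bar chart, skipping malformed rows rather than requiring a pre-cleaned dataset:

```python
import csv
import html
import io

def raw_csv_to_html_bars(raw_csv: str, label_col: str, value_col: str) -> str:
    """Render rows of a raw CSV as a self-contained HTML bar chart.

    Rows with missing or non-numeric values are skipped on the fly --
    the point is to visualise raw data directly, not a cleaned dataset.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            rows.append((row[label_col], float(row[value_col])))
        except (KeyError, ValueError, TypeError):
            continue  # tolerate raw-data noise
    peak = max((v for _, v in rows), default=1.0)
    bars = "\n".join(
        f'<div><span>{html.escape(label)}</span>'
        f'<div style="width:{v / peak * 100:.0f}%;background:#369">{v:g}</div></div>'
        for label, v in rows
    )
    return f"<!doctype html><html><body>\n{bars}\n</body></html>"

# Hypothetical raw materials-cost data, including a malformed row.
raw = "material,cost\nsteel,1.2\naluminium,2.5\nunknown,n/a\ntitanium,9.0\n"
page = raw_csv_to_html_bars(raw, "material", "cost")
```

The malformed "unknown" row is simply dropped, and the resulting page can be served or embedded as-is; a production tool would add styling, interactivity and provenance, but the pipeline from raw dump to browser-ready visualisation is this short.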

There are barriers to this. Data quality is imperative, and raw data must, of course, be reviewable. Perhaps the main barrier for publishers, though, is our continued attachment to the print mental model. The main method of publishing research is the journal article: submitted online, reviewed online, produced online and published online – as a PDF, a flat online facsimile of the printed journal article.

Dynamic data visualisation is the scholarly, educational and, indeed, information publishing opportunity of our times. Integrated into publishing workflows and online publishing platforms, it could springboard the effectiveness and usability of information to the next level.

Why is integrated content important?

Integrated content is a principle we stand by at Semantico; here’s why.

What is integrated content?

‘Integrated content’ is one of those terms that does the rounds in scholarly and professional publishing, while meaning vastly different things depending on the context. The term has stood variously for the combination of content types on a platform, such as journal articles and ebooks, or for the merging of in-house editorial and production procedures. Defined here as the intelligent integration of different kinds of content on the same platform, or even within the same content container, such as a webpage or monograph, integrated content is the future of digital publishing.

This is nothing new. Since the heady days of the first eBook, publishing pundits have predicted that content will become bitesize, with searches returning multiple snippets of content to answer queries exactly. We are not there yet, and given the difficulties of consuming granular pieces of information from a variety of sources and contexts without feeling seasick, it may take quite some time – and a lot of effort on the part of publishers to impose a compatible editorial “voice” across content types – to reach this container-free nirvana.

Where we are now is a pragmatic, user-centric approach to content integration. Users demand powerful search, machine-readable accessibility, intuitive content discoverability within the platform, meaningful linking to related content, smart tools such as annotation, and the interoperability to support researcher workflows such as sharing – but above all, simplicity.

An integrated approach

Our approach to integrated content starts with our technical architecture. Built for flexibility, it accommodates multiple types of content and an optimized user experience, and supports customer-driven platform design within a software-as-a-service (SaaS) infrastructure.

In the scoping and planning phases, XML files are used for domain modelling. This is essentially a process of understanding the shape and form of content – from a classic journal article through to an individual scene in a play – and exploring how it interacts and how value can be added to the user experience. The aim is primarily to ensure that the content’s complexity is fully understood, but also to maximize the content’s potential.

Of course, the potential of a body of content in the age of text mining and API interoperability is massive, so it is the role of UX to understand what structure and features the end users actually need and want. Once this is established, a mental model of what users expect to find, and where, can be mapped against the domain model to identify the optimum structure, taking into account established conventions and accessibility standards. An interface can then be designed that either hides the complex structure or makes the technology and underlying data transparent, depending on user requirements.

Why is integrated content important?

The importance of this approach is that it builds a great user experience that delivers on the richness of a content body as a whole, as opposed to its component parts. In addition to search and discovery, this empowers a whole raft of features that enhance the understanding and impact of the content, among them data visualization and linked multimedia or overlay content.

The vast majority of platforms are accessed by distinctly different user groups, or by users with varying levels of understanding of the subject matter. An integrated approach to content makes it possible to address multiple groups’ needs with features such as adaptable content curation or faceted search. Further, the full diversity of a body of content can be realized, through interdisciplinary linking, or the cross-promotion of content that is of interest in more than one discipline.
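To make the idea concrete, here is a minimal, hypothetical sketch of faceted search over an integrated content body. The records and field names are invented, and a production platform would of course use a search engine rather than in-memory filtering, but the principle – one integrated index serving different user groups through facets – is the same:

```python
from collections import Counter

# Hypothetical integrated content records spanning several content types.
CONTENT = [
    {"title": "Gene therapy review", "type": "journal-article", "discipline": "medicine"},
    {"title": "Nursing procedures ch. 3", "type": "ebook-chapter", "discipline": "medicine"},
    {"title": "Hamlet, Act 1 Scene 2", "type": "playtext", "discipline": "drama"},
    {"title": "Staging Shakespeare", "type": "video", "discipline": "drama"},
]

def faceted_search(records, **facets):
    """Return records matching every requested facet, plus value counts
    for each facet field (the counts drive the facet sidebar in the UI)."""
    hits = [r for r in records
            if all(r.get(field) == value for field, value in facets.items())]
    counts = {field: Counter(r[field] for r in hits)
              for field in ("type", "discipline")}
    return hits, counts

# A drama student and a medical researcher query the same integrated index,
# each narrowing by the facet that matters to them.
hits, counts = faceted_search(CONTENT, discipline="drama")
```

Because every content type sits in one index, the same query surfaces a playtext alongside a performance video – exactly the cross-format discovery a siloed platform cannot offer.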

As a use case, integrated content is an ideal publishing platform model for societies, when supported by flexible identity and access management. With membership bodies frequently comprising professionals, academics, commercial organisations and students, their content is necessarily diverse. Though their original purpose is, in whole or in part, that of a professional body, it is often the academic users who are more aware of their content and access it more frequently. The ability to cross-promote and enhance content through integration grows usage, protects membership subscriptions – and potentially drives further subscriptions by developing must-have content sets.

Integrated content case studies

The Royal Marsden Manual

Known by many in the healthcare sector as “the bible of nursing”, The Royal Marsden Manual is a reference work used by students and practitioners throughout the UK for study and reaccreditation. The online edition of the manual’s 9th edition was built by Semantico in partnership with the publisher, Wiley. Shortlisted for a prestigious 2016 Association of Publishers Digital Publishing Award in the Best Use of Mobile category, the work owed much of its success to integrated content.

Though there had been a previous online edition of the work, its interface was cluttered and did not facilitate simple discovery and intuitive use. A simple, easy-to-use interface was essential for the work’s users, who include clinical and support staff with varying levels of IT literacy. Further, a key purpose of the online edition was to allow users to upload new procedure documents; the update process had been complex, and as a result the feature was little used in the previous edition.

Collaboration between the publisher and our UX and technical teams resulted in a simplified interface designed around progressive information disclosure. This supported point-of-care use on mobile by integrating the disparate content types – guidelines, best practice, procedural notes, location-specific and other custom procedural notes, and user updates – in a drill-down hierarchical list for intuitive discovery. On the page, user-generated content was integrated through a simplified interface for procedure information, with clear user signposting and a direct-entry custom content facility that removed the need to upload documents in order to make updates.

Drama Online

Built by Semantico in collaboration with the publisher Bloomsbury, Drama Online is a digital library of playtexts, filmed live performances, audio plays, theory and practice. Designed for students, it features a combination of original playtexts with films and performances, workshop videos and resources, and academic and practical written content. In addition, the resource allows bookmarking for fast access to key content and personalized annotation to aid study.

The simplicity of the interface hinges on its summary pages – for playtexts, playwrights & practitioners, genres, periods, context & criticism and theatre craft – where content is sub-categorized by format and type. Widely acclaimed, the platform has won and been shortlisted for a number of awards, including the ALPSP 2013 Publishing Innovation Awards, at which it was Highly Commended and described as “A tool which clearly enhances the study and performance of drama.”

When is integrated content not important?

There is an argument that for homogeneous bodies of content, such as stand-alone journals or eBooks, integrated content is not important. This is not the case: each of these content types contains embedded media that benefits from the integrated approach, and graphs, charts, figures, tables and bibliographies all deserve special indexing and presentation to maximize their utility. Every body of content is different, whether its differences lie in its strength in a particular subject matter, its interdisciplinary potential, or its appeal to particular user groups. By understanding that content and optimizing its potential with a flexible platform architecture, both the user experience – and ultimately, its commercial value – can be maximized.

Towards a universally adopted institutional identifier


Managing the accurate accreditation of individuals, institutions and funding bodies without a universally adopted institutional identifier is a daily challenge for publishers. Although there is a universally accepted identifier for individuals, in the form of the Open Researcher and Contributor ID (commonly known as ORCiD), and the CrossRef-produced Open Funder Registry for research funding bodies, there is no single solution for institutional identifiers.

This is becoming increasingly untenable. Pressure from funding bodies – and from the institutions themselves – to make their research output quickly and easily identifiable has pushed this issue up the agenda. The problem is that most institutions are known under multiple names, few of which are actually incorrect, making disambiguation difficult.

The story so far

The ORCiD identifier has created an open database of researchers, linked to all of their published work, that can be used by third parties to build services, all based on a standardised numeric identifier. The Open Funder Registry has similarly created an open and searchable list of funding bodies, linked to the work they have funded. Unlike ORCiD, however, the Open Funder Registry uses a text identifier, rather than a numerical one, leaving the system open to error – how many end users are going to know which of the four ‘Ronald McDonald House Charities’ they are supposed to cite as their funding body?
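This is one reason numeric identifiers are more robust: they can carry a built-in check digit. ORCiD identifiers, for example, end in an ISO 7064 MOD 11-2 check character, so a mistyped iD can be caught before it ever enters the record. A minimal sketch of the validation:

```python
def orcid_check_char(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the
    first 15 digits of an ORCiD iD (hyphens already removed)."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Validate a hyphenated 16-character ORCiD iD against its check digit."""
    digits = orcid.replace("-", "")
    return (len(digits) == 16
            and digits[:15].isdigit()
            and orcid_check_char(digits[:15]) == digits[15])

# The example iD from ORCiD's own documentation validates;
# a single transcription error does not.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A free-text funder name offers no such self-checking property; nothing in the string “Ronald McDonald House Charities” tells a system which of the four entities was meant, or whether it was typed correctly.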

The introduction of a standardised institutional identifier would eliminate any room for uncertainty about which institution an author is affiliated with. If funding bodies were also assigned a numerical institutional identifier this would close the loop, linking individuals, institutions and funding bodies to one another. This three-dimensional view would give publishers and institutions invaluable insight into research and publishing trends. Potential applications of this data include identifying interdisciplinary research opportunities within institutions, or mapping research output by field and institution, among many others.

Further, an institutional identifier would improve content discovery by making research easily searchable by institution. Bearing in mind that researchers have multiple affiliations, and indeed can also have more than one funding body, this would be a significant improvement.

Why now?

So why has an institutional identifier not been put in place already? The answer is, for the most part, a lack of commercial drivers. To ensure any system is universally adopted it has to be open access, and the pay-off for putting the system in place has not been great enough to motivate industry co-operation. Until now: the success of the ORCiD initiative and the momentum behind improvements to content discovery mean that an institutional identifier is imminent.

Towards an institutional identifier

At the STM Annual US Conference 2016 our own Co-Founder and Chairman, Richard Padley, is moderating a panel discussion called ‘Identifying the identifiers!’, where Laurel Haak, Executive Director of ORCiD, Chuck Koscher, Director of Technology at CrossRef, and Howard Ratner, Executive Director of CHORUS, are to discuss the future of identifiers and unveil plans for a universal standard. Stay tuned for more news!

Existing identifiers:

International Coalition of Library Consortia (ICOLC)
International Standard Name Identifier (ISNI)
NCSU Organization Name Linked Data (NCSU-ONLD)
Shibboleth federations (multiple)
Virtual International Authority File (VIAF)

Is Frankfurt too big?

An exhibition hall at the Frankfurt Book Fair
Source: Toni Rodrigo. License: http://bit.ly/1ryPA8o

As huge a fan as I am of the venerable, 500-year-old institution that is the Frankfurt Book Fair, it occurred to me to marvel, slightly, when I was there this year, at its sheer massiveness and capaciousness.

After all, just look at the different categories of visitors who come each year: publishers, agents, booksellers, librarians, academics, illustrators, service providers, film producers, translators, printers, professional and trade associations, institutions, artists, authors, antiquarians, software and multimedia suppliers – not to mention journalists and members of the general public.

In other industries, the pattern has tended to be that smaller shows grow up, focused on more particular niche interests, to serve more targeted audiences. In the last fifteen years or so we have also seen the emergence (and sometimes fast demise) of shows focused specifically on the use of digital technology within particular sectors.

So why hasn’t this happened in publishing? Why is the Frankfurt Book Fair so big and all-encompassing?
