There is more published research available than ever before. And while it’s a vital resource during times of crisis – like the Covid-19 pandemic – the sheer volume of information means it’s all too easy for important findings to be overlooked. In the second of two blogs on using AI to tackle information overload, we turn the focus to times of crisis and a unique solution developed to cut through the clutter.
The first wave of the Covid-19 pandemic in 2020 saw an explosion in published articles tackling the tricky topic of SARS-CoV-2. The number of articles published about Covid-19 grew from zero to 28,000 in just the first six months of the pandemic. In mid-May, nearly 3,000 papers were published in a single week. The Director-General of the World Health Organization (WHO), Tedros Adhanom Ghebreyesus, addressed this at the 2020 Munich Security Conference, stating, “We're not just fighting a pandemic; we're fighting an infodemic”. The vast quantities of research output overwhelmed the research community. But more than that, the immediate and widespread sharing of medical and other scientific information outside of expert circles before it had been thoroughly vetted (for example, with the steady rise of preprints at this time) was dangerous for the public.
Even academics called for restraint at the time, warning that the Covid-19 pandemic was leading to a “flood of ‘useless’ science”. And while some solutions to this issue were already available, the researchers we spoke to were mostly unaware of them. So, the question for us as publishers was: is there something we can do about this? The answer lay in an innovative approach first developed to produce the first machine-generated book.
In a webinar, Markus Kaindl, Springer Nature’s Group Product Manager for Research Intelligence, explained the development of an app that supported researchers during the pandemic. Here, we take a look at what he covered and the developments that have followed.
In March 2020, we started by creating a simple overview of recent Springer Nature publications on Covid-19 using an automated report. A broad spectrum of 144 English-language publications that passed our critical filtering was used as the source material. This included original papers, news, snippets, editorial notes, and brief communications.
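As a rough illustration of what that kind of critical filtering can look like, here is a minimal Python sketch. The record fields and accepted publication types below are illustrative assumptions, not the actual pipeline:

```python
# Minimal sketch of language- and type-based filtering of publication
# records. The field names and accepted types are assumptions for
# illustration, not the real system's schema.
ACCEPTED_TYPES = {"original-paper", "news", "snippet",
                  "editorial-note", "brief-communication"}

def filter_publications(records):
    """Keep English-language records of the accepted publication types."""
    return [
        r for r in records
        if r.get("language") == "en" and r.get("type") in ACCEPTED_TYPES
    ]

records = [
    {"title": "A Covid-19 case study", "language": "en", "type": "original-paper"},
    {"title": "Ein Kommentar", "language": "de", "type": "editorial-note"},
]
print(filter_publications(records))  # only the English original paper survives
```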
To make it as useful as possible, Markus explained that the team spoke to biologists and virologists to better understand the community's pain points. That enabled them to move fast and create an “early days” prototype for feedback as quickly as possible.
The team used technology similar to that used to produce the machine-generated book, which helped them group the pre-filtered content in a meaningful way around the outbreak. As citations couldn’t be used to identify the most compelling content (much of it had only just been published), other metrics were used instead – such as platform downloads or digital media mentions.
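A citation-free ranking of this kind can be sketched as a simple weighted score. The weights and fields here are illustrative assumptions, not the team’s actual formula:

```python
# Minimal sketch of ranking articles without citations, using platform
# downloads and digital media mentions. Weights are illustrative only.
def relevance_score(article, w_downloads=1.0, w_mentions=5.0):
    """Weighted score that favours media mentions over raw downloads."""
    return w_downloads * article["downloads"] + w_mentions * article["media_mentions"]

articles = [
    {"title": "Spike protein structure", "downloads": 12000, "media_mentions": 40},
    {"title": "Indoor transmission modelling", "downloads": 3000, "media_mentions": 150},
]
for a in sorted(articles, key=relevance_score, reverse=True):
    print(f"{relevance_score(a):>8.0f}  {a['title']}")
```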
The result provided extractive summary snippets for quick inspection, as well as a link directly through to the original publication.
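Extractive snippets of this kind can be approximated with a classic word-frequency heuristic: score each sentence by how frequent its words are in the document, then keep the top scorers. The toy function below illustrates the idea; it is not the production system:

```python
# Minimal sketch of extractive snippet selection using a word-frequency
# heuristic: sentences whose words occur often in the document score higher.
import re
from collections import Counter

def extractive_snippet(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = set(scored[:n_sentences])
    # Return the selected sentences in their original order for readability.
    return " ".join(s for s in sentences if s in top)
```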
The question remained: how could we make this tool findable and accessible for researchers and ensure it didn’t become another one of those helpful apps that are never used?
"To us, it became clear, we needed to move from a dynamic, but still, somehow, static report that we generated using AI to an app,” explained Markus Kaindl in the webinar. “We are in the fortunate position to be able to leverage a unique combination of centuries-old brands, strong credibility within our communities, a good understanding of our user's needs, and technology solutions at hand that were developed for other products in-house."
So, after receiving positive feedback on the prototype, both externally and internally, the next step for the team was to experiment in various directions, creating a framework for personalized research exploration as part of an app.
Some of the key areas explored include:
● Domain-, persona- and task-specific content recommendations (sketched after this list)
● Content across all publishers, drawing on public data sources
● Reading lists, automatic summaries
● Most prolific potential collaborators
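As a rough sketch of the first item – content recommendations – here is a minimal content-based recommender using TF-IDF and cosine similarity via scikit-learn. The corpus and reader profile are invented for illustration; the app’s actual recommender is not described in the webinar:

```python
# Minimal sketch of content-based recommendation: rank papers by the
# cosine similarity between their TF-IDF vectors and a reader's interests.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    "Antibody response to the SARS-CoV-2 spike protein",
    "Deep learning for protein structure prediction",
    "Airborne transmission of respiratory viruses indoors",
]
reader_interest = "immune response to coronavirus infection"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(papers + [reader_interest])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for title, score in sorted(zip(papers, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.2f}  {title}")
```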
"The beauty of this approach is that it will not only make a difference for Covid-19 research,” explained Markus. “We believe [it will be useful] for many other urgent areas of research like sustainability and climate change, for example."
There is so much potential in AI to support researchers, and we’re only at the beginning of that journey. Another area explored by Markus in his part of the webinar was “TLDRs” (classically written TL;DR, which stands for ‘too long; didn’t read’).
TLDRs are a form of ‘extreme summarization’ and act as an alternative to abstracts. TLDRs of scientific papers leave out non-essential background and methodological details and capture only the key aspects of the paper.
The challenge in creating them is that writing a TLDR of a scientific paper requires expert background knowledge and an understanding of complex, domain-specific language – both needed to identify the salient aspects of the paper while keeping the summary faithful to the source and factually correct. An initial pilot using AI to create TLDRs was run within the computer science subject area, with great success.
"We fed in the abstract, introduction, and conclusions of individual sample papers,” explained Markus. “And then asked the authors of those papers about the resulting TLDR. Not only were they judged as correct, but also as highly useful."
Another area Markus explored in the webinar was whether AI could be used to generate scientific content. To do this, he started by explaining natural language models – which most of us know best from conveniences like search autocomplete and the typing suggestions now omnipresent in online products.
Essentially, you give a language model a ‘primer’ (such as the start of a search query) and it will then suggest the rest. The question Markus posed was, “Can we use this to automate scientific research generation?”
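To make the primer idea concrete, here is a minimal sketch using an off-the-shelf GPT-2 model via Hugging Face Transformers – a stand-in, not the model Markus’ team trained:

```python
# Minimal sketch of primer-based text generation: give the model the
# start of a sentence and let it suggest the rest. GPT-2 is a stand-in.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

primer = "Recent advances in natural language processing have"
completion = generator(primer, max_new_tokens=30, num_return_sequences=1)
print(completion[0]["generated_text"])
```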
To test this, Markus’ team set out to train a language model on 20,000 paper introductions from the Association for Computational Linguistics. The result was impressive, with the program able to produce accurate, compelling text from a simple primer sentence. But how could this be used, particularly considering the ethical considerations involved in ensuring that fake science isn’t produced by such a program?
"It is clear that we cannot, and also do not want to, delete the human from the loop,” said Markus. “My hope, if we manage to master this as a science publisher, is that we will be able to support researchers when kickstarting the writing – helping them overcome writers’ block. The machine can just generate a suggestion and the human can edit it to its final perfection."
This approach could also work well in applications like science journalism, or to create automatically generated and dynamically curated topical pages – for example, on a topic like climate change.
One of the first questions asked during the webinar was where librarians fit into these discussions around AI.
"This means a paradigm shift,” answered Markus. “I think the focus will move from providing content to researchers to providing services that help cut through the clutter."
Markus went on to say that one of the challenges for librarians will be to train and educate younger authors and researchers about the options. Another is being aware of the risks – checking that sources are reliable, actively addressing plagiarism and intellectual property questions, and so on.
"Librarians are ‘Knowledge Incubators’,” concluded Markus. “And they can be the research translators too. So the message is to embrace this technology and learn how to use it."
Building on the examples we’ve looked at here, Springer Nature has most recently developed Research Intelligence – a new suite of AI-powered solutions that summarize research trends to allow organizations to quickly measure their success, uncover hidden connections, and guide their strategy. You can read all about it in this blog post.
Enjoyed this blog? Don’t forget to read our first blog on this topic, where we go into more detail on the role of publishers in managing information overload and look at the first machine-generated academic books.