Text and Data Mining: Uncovering Hidden Data Points and Powering New Discoveries

The Link

By: Guest contributor, Tue Aug 9 2022

The path to innovation requires the systematic analysis of millions of documents. But completing this process manually takes considerable time and effort. Text and data mining (TDM) enables researchers to speed up and enhance this work, allowing them to make new discoveries faster. In this blog, we look at how TDM works, what it means for librarians, and what Springer Nature is doing to enable it.

The digital age has given us unprecedented access to information. Researchers can now access far more research in their subject areas than ever before. On the one hand, this is exciting: it provides opportunities to make new discoveries by building on the wealth of existing research. On the other hand, it presents an overwhelming challenge: analysing findings from the millions of academic articles published every year.

Even within niche subject areas, the sheer volume of papers, pre-prints, and data published is far too great for an individual researcher to stay abreast of. Yet, within this wealth of research could lie the answers to some of our biggest societal challenges. So how can researchers best use the information available to them?

Of the many options, one of the most promising for making new discoveries and identifying important patterns is text and data mining (TDM). TDM was the subject of a recent webinar presented by Springer Nature’s Director for Data Solutions, Dr. Prathik Roy. Dr. Roy described in detail how TDM is being used in the research community and what Springer Nature is doing to support it.

What is TDM?

First, it’s worth taking a minute to explain exactly what TDM is. In short, it’s an automated process of selecting and analysing large amounts of text or data resources for purposes such as searching, finding patterns, discovering relationships, semantic analysis and more. This is done in a way that can provide valuable information needed for studies and further research.

The goal of TDM is to filter through information, identify pieces of data, and find the relationships and patterns among them. What is revolutionary is the ability of researchers to explore a dataset without knowing what specific questions to ask. Essentially, AI is now maturing from a role where it simply surfaces information to one where it can make recommendations and decisions, as well as generate content.

“Essentially what tends to happen is that these machine learning or AI algorithms go through the full text of articles and are able to classify the various aspects of each article,” explained Dr. Roy. “For instance, it will ask questions like, is it talking about a gene? Is it talking about a specific disease? Or is it talking about specific symptoms? And then it’s able to cluster the articles based on this.”

Once the algorithm has categorised articles in this way, it can then score the relationship between two types of categories. For instance, it could be used to assess the relationship between symptoms and a specific disease, by analysing how often that symptom is mentioned in relation to a disease. A high score – where there is a clear correlation between mentions of the symptom and mentions of the disease – could help identify the best drug to treat that disease. And this is just one example. TDM has a variety of uses across all fields.
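The scoring step described above can be sketched in a few lines. This is only an illustration, not the method Dr. Roy presented: it assumes an upstream classifier has already tagged each article with the symptoms and diseases it mentions, and it uses pointwise mutual information (PMI) as one possible relationship score. The toy corpus and the function name `pmi_scores` are invented for the example.

```python
import math
from itertools import product

# Toy corpus (invented): each article is reduced to the entity tags that an
# upstream classifier is ASSUMED to have produced.
documents = [
    {"symptoms": {"cough", "fever"}, "diseases": {"influenza"}},
    {"symptoms": {"cough"}, "diseases": {"influenza"}},
    {"symptoms": {"rash"}, "diseases": {"measles"}},
    {"symptoms": {"fever", "rash"}, "diseases": {"measles"}},
    {"symptoms": {"fever"}, "diseases": {"influenza", "measles"}},
]


def pmi_scores(docs):
    """Score every co-occurring (symptom, disease) pair by PMI.

    PMI = log2(P(s, d) / (P(s) * P(d))), where each probability is the
    fraction of documents mentioning the term (or both terms).
    """
    n = len(docs)
    symptom_counts, disease_counts, pair_counts = {}, {}, {}
    for doc in docs:
        for s in doc["symptoms"]:
            symptom_counts[s] = symptom_counts.get(s, 0) + 1
        for d in doc["diseases"]:
            disease_counts[d] = disease_counts.get(d, 0) + 1
        for s, d in product(doc["symptoms"], doc["diseases"]):
            pair_counts[(s, d)] = pair_counts.get((s, d), 0) + 1

    scores = {}
    for (s, d), joint in pair_counts.items():
        p_joint = joint / n
        p_s = symptom_counts[s] / n
        p_d = disease_counts[d] / n
        scores[(s, d)] = math.log2(p_joint / (p_s * p_d))
    return scores


scores = pmi_scores(documents)
# Rank pairs from most to least strongly associated.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy corpus "cough" appears only alongside "influenza", so that pair scores higher than "fever" and "influenza", since "fever" is spread across both diseases. A real pipeline would need far more data and a score suited to the field.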

Discoverability and pattern discernment

While TDM has a whole range of use cases, two of the most important right now are ‘discoverability’ and ‘pattern discernment’, as Dr. Roy described during the webinar.

The ultimate goal of discoverability, according to Dr. Roy, is to “match what you're looking for and then eliminate any irrelevant material from this discovery process.” It should mean that when you’re searching for particular keywords or phrases, only highly relevant articles are delivered back in that search.

For example, say you were searching for articles that showed a link between carcinogens from tobacco and a specific type of cancer such as lung cancer. A ‘traditional’ search could deliver you any number of articles that mention carcinogens, tobacco and/or lung cancer. Using TDM techniques, however, you could retrieve only those where specific carcinogens have an effect on the lungs.
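The difference between the two kinds of search can be illustrated with a small sketch. It assumes that "a carcinogen term and a lung term in the same sentence" is an acceptable stand-in for a real relationship; production TDM systems would typically use entity recognition and relation extraction instead. The term lists, article texts and function names below are all invented for the example.

```python
import re


def keyword_match(text, terms):
    """'Traditional' search: true if any term appears anywhere in the text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def sentence_cooccurrence_match(text, group_a, group_b):
    """Stricter filter: at least one sentence mentions a term from each group."""
    sentences = re.split(r"(?<=[.!?])\s+", text.lower())
    return any(
        any(a in s for a in group_a) and any(b in s for b in group_b)
        for s in sentences
    )


# Hypothetical term lists and article snippets, for illustration only.
carcinogens = ["benzo[a]pyrene", "nitrosamine"]
lung_terms = ["lung", "pulmonary"]

articles = {
    "A": "Nitrosamine exposure increased lung tumour incidence in mice.",
    "B": "Tobacco taxation policy is reviewed. Lung cancer screening is discussed separately.",
    "C": "Nitrosamine levels were measured in water. Lung function was not assessed.",
}

all_terms = carcinogens + lung_terms + ["tobacco"]
broad = [k for k, text in articles.items() if keyword_match(text, all_terms)]
narrow = [
    k
    for k, text in articles.items()
    if sentence_cooccurrence_match(text, carcinogens, lung_terms)
]
```

Here the broad keyword search returns all three articles, while the sentence-level filter keeps only article A, the one where a carcinogen and the lungs are discussed together.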

The goal of pattern discernment, meanwhile, is to find patterns and trends across a dataset. The outcome of this will be hypotheses and predictions of likely prospects for therapy, material design, or strategy, as opposed to articles. For example, this technique could be used to match the biochemical properties of molecules to a viral protein's properties in order to identify a molecule likely to bind to the virus.

There are already many examples of where TDM can make (or has already made) a significant impact in speeding up research discoveries and making the previously impossible possible. Dr. Roy’s presentation touched on just a few of them.

There’s no doubt that there is huge potential for the future of TDM and what it could do to power new and innovative research.

What does this mean for librarians and information professionals?

As information professionals, knowledge workers and librarians, you have a long familiarity with managing and searching within large sets of information. It’s likely that you evaluate and manage subscriptions to value-added online services, identify and acquire specialized datasets for researchers, and manage internal resources and collections and make them discoverable.

This knowledge means you can bring a unique perspective to TDM projects – after all, you understand how information is used within your organizations, and you know how to make that information more discoverable and hence more valuable.

The value of TDM depends on knowing what sources to include, what kinds of connections to monitor and what types of metadata are necessary for a particular project. Here too, librarians and information professionals bring the ability to ask the right questions, which enables them to see the larger context and identify the specific sets of information that would provide the richest insights.

For more insight on this topic, take a look at our whitepaper on TDM for librarians and information professionals.

Springer Nature’s TDM tools

As the volume of scientific publications increases and TDM software tools improve, Springer Nature has created a formalized process to enable TDM, with the aim of making it as simple as possible for researchers.

A growing number of Springer Nature’s journal articles are published open access. TDM is usually allowed without restrictions on these publications since the majority of Springer Nature open access content is licensed under CC-BY.

Dr. Roy concluded his webinar presentation by giving an overview of the various tools Springer Nature has created to facilitate TDM of our content. The key ones you need to be aware of are:

Metadata API: Metadata and abstracts for online documents (journal articles, book chapters, protocols, etc.)
Meta API: New versioned metadata for online documents with additional fields and links to source content.
Fulltext API for Open Access content: Fulltext content (where available) for Springer Nature Open Access XML
Fulltext API for Open Access and pay-walled content (under license): Fulltext content (where available) for all Springer Nature XML
Journal header data API: "journal-level" API that provides XML based on the Journal ID
Citations API
SN SciGraph APIs: Linked Data API (using SciGraph URLs) or Redirect API.

You can access all the APIs mentioned above through Springer Nature’s API site. Springer Nature also recommends Crossref services for pan-publisher TDM.
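As a rough idea of what calling one of these APIs involves, the sketch below builds a query URL for a metadata search. Everything specific in it is an assumption to be checked against the official API documentation before use: the base URL, the parameter names (`q`, `p`, `api_key`), the `keyword:` query syntax and the helper `build_query_url` are illustrative, not confirmed by this article. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# ASSUMPTION: this base URL and the parameter names below are placeholders.
# Check the official API documentation for the real endpoint and parameters.
BASE_URL = "https://api.springernature.com/meta/v2/json"


def build_query_url(query, api_key, page_size=10, base_url=BASE_URL):
    """Return a URL-encoded query string for a (hypothetical) metadata search."""
    params = {
        "q": query,          # search expression (assumed parameter name)
        "p": page_size,      # results per page (assumed parameter name)
        "api_key": api_key,  # key obtained at sign-up (assumed parameter name)
    }
    return f"{base_url}?{urlencode(params)}"


url = build_query_url('keyword:"text mining"', "YOUR_API_KEY")
```

Keeping the base URL as a parameter makes it easy to point the same helper at a different endpoint once the correct one has been confirmed.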

Helpful resources

Interested in finding out more about text and data mining? Here are some useful links:

Text and Data Mining at Springer Nature
Bringing Insight to Data: Info Pros’ Role in Text and Data Mining
Springer Nature APIs (all our API offerings with key information, examples and API key sign-up)
Can AI help us manage information overload?
AI and science publishing: “cutting through the clutter has never been more important”

And don’t forget, you can also watch the webinar with Dr. Roy and download the presentation slides.

Author: Guest contributor

Guest Contributors include Springer Nature staff and authors, industry experts, society partners, and many others. If you are interested in being a Guest Contributor, please contact us via email: thesource@springernature.com.
