Mutually empowering – semantic-based machine learning and subject matter expertise

Posted on May 7th, 2021 in AI & Data

On a day dedicated to emerging science and technologies at the Pistoia Alliance virtual conference Collaborative R&D in Action, SciBite CTO James Malone opened the program with a compelling exploration of use cases for semantic-based machine learning (ML). A simple but elegant ML strategy based on “seeding” named entity recognition (NER) can facilitate ontology creation, drive language translation, help extract insights from social media platforms, and generate answers to questions faster. His most important takeaway: semantic-based ML and subject matter expertise (SME) are mutually empowering.

NER learns a new domain. Or language.

In this strategy, “seed” terms are passed to an ML model, which then identifies candidate terms in ingested text that are similar to the seed. That similarity may be based on position relative to the seed term, word root, or other features. The resulting cluster of terms is somewhat noisy and requires review, but through an iterative process of term seeding, candidate generation, and candidate review and pruning, those clusters grow into meaningful categories. Scaling up the process by training a transformer model, Malone and his team were able to construct a 6,000-term ontology for genetic variation. Essential to this incremental machine learning is SME, applied to review and improve the term clusters at each iteration. With both elements in place, ML and SME, models built on this strategy are highly flexible in their application. For example, the strategy turned out to be very good at translating Japanese to English, but not without the supervision of native speakers.
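
The seed, propose, review, and prune loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method presented in the talk: similarity here is plain cosine similarity over neighboring-word counts (one way to capture "position relative to the seed term"), whereas the talk describes scaling up with a transformer model. The function names, the toy sentences, and the `review` callback standing in for the subject matter expert are all hypothetical.

```python
from collections import Counter, defaultdict


def context_profiles(sentences, window=2):
    """Map each token to a Counter of the words that appear within `window` of it."""
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[tok][tokens[j]] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def grow_cluster(sentences, seeds, review, rounds=3, threshold=0.5):
    """Iteratively propose terms similar to accepted ones and keep those the reviewer approves."""
    profiles = context_profiles(sentences)
    accepted, rejected = set(seeds), set()
    for _ in range(rounds):
        candidates = {
            tok for tok, prof in profiles.items()
            if tok not in accepted and tok not in rejected
            and any(cosine(prof, profiles[s]) >= threshold
                    for s in accepted if s in profiles)
        }
        if not candidates:
            break
        approved = {c for c in candidates if review(c)}  # expert review step
        accepted |= approved
        rejected |= candidates - approved  # pruned terms are not proposed again
    return accepted


# Toy corpus and reviewer, invented for illustration only.
sentences = [
    "the deletion disrupts the gene",
    "the insertion disrupts the gene",
    "the duplication disrupts the gene",
    "the patient visited the clinic",
]
expert_approves = lambda term: term in {"insertion", "duplication"}
cluster = grow_cluster(sentences, {"deletion"}, expert_approves)
```

In this toy run the model also proposes noisy candidates such as "disrupts", which shares context with the seed; the reviewer callback rejects them, which is the pruning role the article assigns to SME.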

Open-minded NER for real-world language

Another example comes from models trained on ontology-annotated data to extract insights, as real-world evidence, from broad data sources such as Facebook, Reddit, Twitter, or patient network forums. Such efforts often stumble over the mismatch between standardized scientific terminology and the looser language used by the general public. Because these models have been trained to understand how genes or diseases (for instance) appear within a sentence, they are able to infer from the surrounding language that a phrase is likely to be a gene or disease. So, for example, posts can be scanned for drug names, and sentences that appear to describe an adverse event can be identified even if the phrase is not in an ontology, such as “could not sleep” rather than the specific term “insomnia.” This more flexible approach to NER may open new opportunities to fine-tune the analysis of these far broader and content-laden sources. However, the necessarily looser semantics call for SME to validate outcomes.
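
To make the scanning step concrete, here is a deliberately simple stand-in. It is not a trained model: it uses hand-written cue patterns in place of the learned sentence context the article describes, and it only shows the shape of the pipeline (find a drug name, then flag sentences whose wording suggests an effect, without requiring an ontology term). The drug names, cue patterns, and posts are all invented for illustration.

```python
import re

# Hypothetical drug lexicon; a real system would draw on a curated vocabulary.
DRUG_NAMES = {"drugalpha", "drugbeta"}

# Hypothetical cue patterns that hint at a described effect in lay language.
CUE_PATTERNS = [
    r"\bcould(?: not|n't) \w+",
    r"\bmade me \w+",
]


def flag_posts(posts):
    """Return sentences that mention a known drug and contain an effect-like cue phrase."""
    flagged = []
    for post in posts:
        for sentence in re.split(r"(?<=[.!?])\s+", post):
            lowered = sentence.lower()
            drugs = sorted(
                d for d in DRUG_NAMES
                if re.search(rf"\b{re.escape(d)}\b", lowered)
            )
            cues = [m.group(0) for p in CUE_PATTERNS for m in re.finditer(p, lowered)]
            if drugs and cues:
                flagged.append({"sentence": sentence.strip(), "drugs": drugs, "cues": cues})
    return flagged


# Invented example posts.
posts = [
    "Started drugalpha last week. I could not sleep after taking drugalpha.",
    "Drugbeta helped my headache.",
    "I could not sleep because of the noise.",
]
flagged = flag_posts(posts)
```

Only the second sentence of the first post is flagged: it names a drug and contains "could not sleep", even though "insomnia" never appears. Whether that sentence really describes an adverse event is exactly the judgment the article leaves to SME.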

Know your data. Know your semantics.

Beyond term extraction, such “forgiving” semantic-based ML may support Bidirectional Encoder Representations from Transformers (BERT) in situations where a question may have multiple answers hidden in a very large body of text, or where the answers conflict. The ML-supported NER can narrow the text down to the paragraphs in which answers are most likely to be found, streamlining real-time processing.
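
One way to picture the narrowing step is as an entity-overlap filter that runs before any question-answering model sees the text. The sketch below assumes a small lexicon mapping surface forms to entity IDs, so that a lay phrase and a formal term resolve to the same entity; the lexicon, IDs, and paragraphs are hypothetical, and the downstream BERT reader is left out entirely.

```python
import re


def tag_entities(text, lexicon):
    """Return the IDs of lexicon entities whose surface forms occur in the text."""
    lowered = text.lower()
    return {
        entity_id for term, entity_id in lexicon.items()
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    }


def shortlist_paragraphs(question, paragraphs, lexicon, top_k=2):
    """Rank paragraphs by how many of the question's entities they mention; keep the top_k."""
    wanted = tag_entities(question, lexicon)
    scored = []
    for idx, para in enumerate(paragraphs):
        overlap = len(wanted & tag_entities(para, lexicon))
        if overlap:
            scored.append((overlap, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, then document order
    return [paragraphs[idx] for _, idx in scored[:top_k]]


# Hypothetical lexicon: two surface forms share one entity ID.
lexicon = {"insomnia": "E1", "could not sleep": "E1", "drugalpha": "D1"}
paragraphs = [
    "Drugalpha was approved last year.",
    "Several users could not sleep while on drugalpha.",
    "The weather was mild.",
]
shortlist = shortlist_paragraphs("Does drugalpha cause insomnia?", paragraphs, lexicon)
```

Only the shortlisted paragraphs would then be handed to the question-answering model, which is where the saving in real-time processing would come from.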

Regardless of application, however, it remains essential to understand the problem you are tackling, know the data you use to solve it, and apply SME to judge whether your output is correct and useful. Malone described a foray into Wikipedia content that underscored the importance of that foresight even when working with structured content. Consider Wikipedia’s topical hierarchy, which makes “hearing” a sub-category of “perception” but then positions “Peruvian folk music” under “hearing.” That’s likely surprising to humans, but is it to machines?
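
A quick way to surface this kind of surprise is to walk a category up to its root and read the chain aloud. The sketch below encodes only the two relationships mentioned above and assumes, for simplicity, that each category has a single direct parent; real Wikipedia categories can have several parents, and the links quoted here may not be direct ones.

```python
def ancestor_chain(category, parent_of):
    """Follow parent links upward and return the chain from the category to its root."""
    chain, seen = [category], {category}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against cycles in the category graph
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# The two relationships cited in the article, simplified to one parent each.
parent_of = {"Peruvian folk music": "Hearing", "Hearing": "Perception"}
chain = ancestor_chain("Peruvian folk music", parent_of)
```

A model that consumes the hierarchy as-is would treat every link in that chain as equally valid, so printing such chains for expert review is one inexpensive way to apply SME before training.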
