Chemist and data scientist in one

Posted on March 10th, 2022 in AI & Data

Umesh Nandal is a Director of Data Science at Elsevier. As an AI expert with a Master’s in Chemistry, he embodies what Elsevier brings to the table – an actionable fusion of data and domain expertise. “Cross-functional teams combining data science, tech and domain knowledge is the only way we can achieve our collective goal: curing disease,” says Umesh.

“When I was doing my Master’s
in chemistry, one of my classmates mentioned how he thought that one day, we’ll
be able to automatically extract chemical structures and information,” recalls
Umesh. “And how we’ll also be able to see these structures dynamically as they
interact with genes and proteins.”

“At the time, I don’t think many students even
imagined that as a possibility – and the impact such a possibility could have
on curing diseases. But it certainly set me thinking…”

Quality data is everything

It also spurred Umesh into action. After completing his Master’s, he switched
to bioinformatics and immersed himself more deeply in data analytics and AI
technologies. By 2005, he was already doing innovative research that applied
machine learning (ML) to predict protein behaviors in the spread of malaria.

“This is where I learned the fundamentals: that
data is everything,” Umesh remembers. “Yes, you have to teach the machine all
these rules embedded in the data. But first you need to know that the data you
use is useful. And when you connect it to other useful data, they must all be
prepared and cleaned up in a consistent manner. As they say: ‘Garbage in, garbage
out’.”

Best-in-class competitive intelligence and novelty search

Today, Umesh leads a large and diverse data science team that supported the
delivery of the acclaimed Patent Expansion project for Reaxys. The pipeline has
proven a game-changer in how pharma companies can speedily track the
competitive landscape and be alerted to any threats to the long-term
patentability of a particular discovery project.

The system can enrich patents with information not only on the millions of
target genes and proteins, but also on the millions of compound substances
that are introduced each year. Thanks to machine-learning models, this
information can be evaluated for relevance and accessed easily.

The best of both worlds

“I’m proud that I actually have both an
understanding of the domain, but also the technicalities within the
algorithms,” says Umesh. “It helps me understand both the content people and
the technical people. In that way I can help everyone to all start talking the
same language.”

“And every
domain also has its own timelines and sets of problems they face. And because I
am familiar with these, I have a better sense of how much time it’s going to
take to solve a particular problem. It makes the planning easier in any case,” he
says with a smile. 

Building the patent pipeline certainly required a delicate dance between
specialists throughout its development and evaluation, not only in terms of
the technology but also in ensuring the quality of outputs. After all, the
algorithms need to make sense, and keep making sense.

“Everybody came to appreciate what the others
brought to the table. I certainly loved seeing Elsevier’s in-house chemistry
experts get inspired by the way their knowledge was being redeployed in a new and highly impactful
way,” says Umesh. 

Defining roles: we’re all data scientists now

“But of course, it didn’t happen by itself. We were
building a pipeline from scratch – and one that could be continually built on
or even re-used for other purposes besides patents, such as journals or perhaps
even other use cases in chemistry such as polymers,” Umesh explains. “So it was
important to get it right and define
clear responsibilities between the data scientists, the ML experts and the
content experts as they develop the different modules.” 

“For instance, with our content experts some needed
to focus on the quality of the components we were building, but we also have a
separate team who is checking the quality after the productionizing when it
goes to the database,” he says.

There were
certainly moments when people were concerned about the clarity of the data
science role within the cross-functional team. “You then had to stress the
power of collaboration: that we are all now in the data science game. Whether
we are machine learning, content or chemistry experts, we need all these skills
if we want to build robust and high-quality prediction models.”

Tuning in to the ultimate goal

While
Umesh is happy in his role as “middle person” and sees the advantage of having
more cross-functional individuals such as himself, he also sees the power of
specialization. “It’s essential for a cross-functional team that everyone
understands each other to avoid any misunderstandings. Therefore, content
experts should learn fundamental concepts of ML and ML experts should do the
same with the chemistry domain. In this way, we can be in better tune with each
other. At the same time, we also don’t want to over-train people.
Fundamentally, everyone is here for their particular skill sets,” he says.

“And I
think this is key: the only reason we were able to productionize this
large-scale enterprise-level pipeline and solve all these intensely tricky
problems was by having a cross-functional team bringing all these different
pieces together. And now we are ready to take on even more complex problems,” Umesh
asserts.  

“But
another idea also brought the team all together. I think we were all very
motivated by the idea of building a tool that saved people time and resources.
Researchers can now pivot their time and resources to other problems – towards
other potential cures. After all, curing is always the ultimate goal.” 
