Scientific – Data Scientist
Published | April 12, 2023
Location | Lawrenceville, United States of America
Category | Data Science
Job Type | Full-time

Job Description
Leads Discovery and Optimization (LDO) is a diverse group of scientists and engineers that provides critical assay information to therapeutic research centers (TRCs) throughout Research and Early Development (R&ED). We are seeking a highly motivated and innovative data scientist to join the data science and advanced analytics team within LDO through the end of 2023. The successful candidate will develop an approach based on machine learning and Bayesian statistics to model assay variability using medium- to high-throughput screening datasets. The role sits in a highly dynamic environment at the center of the R&ED drug discovery engine and focuses on developing cutting-edge tools for complex drug discovery problems.
Roles and Responsibilities
- Write Python scripts to enable rapid cleaning and analysis of medium- and high-throughput datasets
- Use machine learning (ML) approaches to generate small-molecule features
- Use Bayesian statistical approaches to estimate uncertainties in assay datasets, building on the outputs of the ML models above
- Write and document code (Python preferred) for data preparation and cleaning, model development, and evaluation
- Deliver high-quality scripts, documentation, and a processing pipeline by the end of 2023
- Create a deployable version of the processing pipeline for near-term use as a stand-alone application and, ultimately, for integration with the enterprise suite
Qualifications
- Ph.D. in a quantitative science or engineering discipline (e.g., computer science, mathematics, statistics, or engineering)
- 5+ years of relevant professional experience with a proven track record in machine learning and data science; experience applying machine learning in drug discovery is desirable but not required
- Strong knowledge of one or more programming languages, with a focus on machine learning (e.g., Python (preferred), R, MATLAB, C/C++)
- Experience utilizing molecular features of small molecules in machine learning models
- Experience applying Bayesian statistics and simulation methods to generate probabilistic outcomes
- Ability to extract information from databases using a variety of software packages (e.g., Oracle SQL Developer)
- Ability to build and maintain databases aligned with enterprise solutions is desirable but not required
- Strong analytical and problem-solving skills to understand technical business problems and implement solutions
- Ability to work effectively on matrixed teams to collaboratively solve challenging problems, as well as to work independently with minimal resources
- Good interpersonal, communication, writing, and organizational skills
- Strong preference for on-site presence to enable colocation with data science team