Scientific – Data Scientist

at TechDigital
Published: April 12, 2023
Location: Lawrenceville, United States of America
Category: Data Science
Job Type: Full-time


Job Description

Leads Discovery and Optimization (LDO) is a diverse group of scientists and engineers that provides critical assay information to therapeutic research centers (TRCs) throughout research and early development (R&ED). We are seeking a highly motivated and innovative data scientist to join the data science and advanced analytics team within LDO through the end of 2023. The individual will develop an approach based on machine learning and Bayesian statistics to model assay variability in medium- to high-throughput screening datasets. The individual will work in a highly dynamic environment at the center of the R&ED drug discovery engine, developing cutting-edge tools for complex drug discovery problems.

Roles and Responsibilities

  • Write Python scripts to enable rapid cleaning and analysis of medium- and high-throughput datasets
  • Use machine learning (ML) approaches to generate small-molecule features
  • Use Bayesian statistics approaches to estimate uncertainties in assay datasets, based on the outputs of the ML models above
  • Write and document programming code (Python preferred) to facilitate data preparation/cleaning, model development, and evaluation
  • Produce high-quality scripts, documentation, and a processing pipeline by the end of 2023
  • Create a deployable version of the processing pipeline for near-term use as a stand-alone application and, ultimately, for future integration with the enterprise suite

Qualifications

  • Ph.D. in a quantitative science or engineering field (e.g., computer science, mathematics, statistics, or engineering)
  • 5+ years of relevant professional experience with a proven track record in machine learning and data science; experience with machine learning for drug discovery is desirable but not required
  • Strong knowledge of one or more programming languages used for machine learning (e.g., Python (preferred), R, MATLAB, or C/C++)
  • Experience utilizing molecular features of small molecules in machine learning models
  • Experience applying Bayesian statistics and simulation methods to generate probabilistic outcomes
  • Ability to extract data from databases using a variety of software tools (e.g., Oracle SQL Developer)
  • Ability to build and maintain databases aligned with enterprise solutions is desirable but not required
  • Strong analytical and problem-solving skills to understand technical business problems and implement solutions
  • Ability to work effectively on matrixed teams to collaboratively solve challenging problems, while also being able to work independently with minimal resources
  • Good interpersonal, communication, writing, and organizational skills
  • Strong preference for on-site presence to enable colocation with the data science team