Location:
Hybrid / Office in Barcelona [Spain].
Date of Publication:
01/11/2024
About Us:
CNAG (Centro Nacional de Análisis Genómico)
The Centro Nacional de Análisis Genómico (CNAG) is one of the largest Genome Sequencing Centers in Europe.
The CNAG Consortium aims to carry out large-scale projects in DNA/RNA analysis to improve quality of life, in collaboration with the Spanish, European and international research communities. CNAG researchers participate in major international genome initiatives such as the Human Cell Atlas (HCA), the International Cancer Genome Consortium (ICGC), the International Human Epigenome Consortium (IHEC), the International Rare Diseases Research Consortium (IRDiRC), the European Reference Genome Atlas (ERGA) and the European infrastructure for life-science information (ELIXIR), as well as in several EU-funded projects.
The Role
We have an opening for a Data Engineer to play a key role in several projects related to cancer and rare diseases, such as Genomed4all (https://genomed4all.eu/) and EJP-RD (https://www.ejprarediseases.org/).
For Genomed4all, we are developing a federated learning platform based on Flower (https://flower.ai/) and MLflow (https://mlflow.org/). In EJP-RD, we are further developing the RD-Connect GPAP (https://platform.rd-connect.eu/) and contributing to the EJP-RD Virtual Platform of data and resources. Under the supervision of the lead of the Data Platforms and Tools Development team, and in collaboration with cancer specialists, bioinformaticians and software engineers, the successful candidate will implement the data infrastructure and back end of both the federated learning platform and the cancer platform.
The Team
The successful candidate will join the Data Platforms and Tools Development team, coordinated by Dr. Davide Piscia (https://www.cnag.crg.eu/teams/bioinformatics-unit/data-platforms-and-tools-development). The team is part of the CNAG Bioinformatics Unit (led by Dr. Sergi Beltran), which has over 30 members and offers continuous professional growth and support.
The team works in a stimulating scientific environment, applying state-of-the-art technologies to breakthrough research projects in genomics that have an impact on people's health.
Responsibilities
Implement pipelines in Apache Spark
Integrate machine learning models into a federated learning platform
Integrate pipelines into workflow management systems such as Jenkins Pipeline or Nextflow
Collaborate with back-end developers and bioinformaticians to integrate data into platforms
Benchmark, develop and implement services and queries on SQL (Postgres) and NoSQL databases (ClickHouse, Elasticsearch, MongoDB, etc.)
Gather and address technical and design requirements
Keep up with emerging technologies
Requirements
Bachelor's or Master's degree in Computer Science or a related field
At least 2 years of experience in a related software development position, preferably as a Data Engineer
Hands-on experience with programming languages such as Python, Scala, Rust or similar
Understanding of pipeline orchestration
Knowledge of distributed computing (Apache Spark, Apache Flink or similar)
Good organisational, prioritising, communication and interpersonal skills
Good spoken and written English
Nice to have
Experience with genomics and clinical data
Experience with federated learning frameworks (Flower, PySyft, etc.)
Experience with workflow orchestrators (Jenkins Pipeline, Nextflow, Airflow, Prefect, Snakemake, etc.)
Experience with databases (Postgres, ClickHouse, Elasticsearch, Cassandra, etc.)
Experience with MLOps (MLflow)
Experience with data pipeline testing
The Offer
Contract duration: Open-ended contract
Estimated annual gross salary: Salary is commensurate with qualifications and consistent with our pay scales.
Target start date: as soon as possible
Benefits
A highly stimulating environment with state-of-the-art infrastructure, a unique Professional Career Plan and development opportunities.
We offer and promote a diverse, inclusive, collaborative and supportive environment, and we welcome applicants regardless of age, disability, gender, nationality, race, religion or sexual orientation.
We are committed to helping our employees balance work and family life, and we offer annual leave, full health and dental insurance, a flexible schedule and the possibility of remote work.
We look forward to receiving your application and discovering how you can contribute to CNAG's success!
How to Apply:
All applications must include:
A complete CV including contact details.
Contact details of two referees.
Cover Letter.
All applications must be addressed to the People Department: mireya.fernandez@cnag.eu
Deadline: Please submit your application by 31/01/2025
Interview: Shortlisted candidates will be invited for an interview at CNAG on 01/02/2025
See the careers page on the CNAG website: https://www.cnag.eu/jobs
# sequencing for a better life