Dev-Ops HPC Engineer
Job Information
- Publication date: 19 November 2024
- Activity rate: 100%
- Contract type: Temporary
- Place of work: Lausanne
EPFL, the Swiss Federal Institute of Technology in Lausanne, is one of the most dynamic university campuses in Europe and ranks among the top 20 universities worldwide. EPFL employs more than 6,500 people supporting the three main missions of the institution: education, research and innovation. The EPFL campus offers an exceptional working environment at the heart of a community of more than 17,000 people, including over 12,500 students and 4,000 researchers from more than 120 different countries.
The SCITAS platform (Scientific IT & Application Support) provides EPFL researchers and partners with access to High Performance Computing (HPC) infrastructure and expertise. SCITAS also contributes to research and development activities that help maintain EPFL's reputation as a leading research institution, including scientific activities within the Swiss Twins project.
The Swiss Twins project, a collaboration between ETH Zurich, PSI, EPFL, and CSCS, aims to advance High-Performance Computing (HPC) and cloud technologies with a focus on cloud abstractions and geo-redundancy.
SCITAS infrastructure includes:
- Over 2,000 compute nodes (CPU + GPU)
- Large-scale storage systems
- Automatic deployment and configuration management tools
As a Dev-Ops HPC Engineer, you will join the SCITAS Systems team to manage the deployment, operations, and evolution of geo-redundant scientific HPC solutions within the Swiss Twins project. This role focuses on automation, infrastructure optimization, cloud technologies, and modern infrastructure practices.
Key Responsibilities:
- Design, build, and deploy portable HPC environments for on-premises and cloud.
- Implement provisioning layers using Terraform and manage container orchestration.
- Troubleshoot issues across hardware, operating systems, and cloud services.
- Develop automated tests to ensure system stability and reliability.
- Lead cloud abstraction and geo-redundancy initiatives.
- Train and support users in adopting new technologies.
Must-Have Qualifications:
- Proven systems/DevOps experience.
- Familiarity with container technologies like Docker or Singularity.
- Programming skills in languages such as Bash and Python, with a solid understanding of algorithms and data structures.
- Proficiency with configuration management tools, CI/CD pipelines, Git, and provisioning tools.
- Strong networking fundamentals (HTTPS, DNS, TCP/IP).
- Extensive GNU/Linux systems experience.
- Ability to document procedures and share knowledge effectively.
- Proficiency in English or French.
Preferred Qualifications:
- Bachelor's or Master's degree in a relevant field.
- Experience in HPC or High-Throughput Computing (HTC) environments, including batch systems.
- Deep knowledge of cloud technologies (AWS, Azure, GCP).
- Experience with Infrastructure as Code (IaC) practices, particularly with Terraform.
- Familiarity with workload management systems like Slurm.
- Knowledge of parallel file systems.
- Security-focused mindset.
- Proficiency with testing practices, including automated test development.
- Experience with distributed systems and monitoring/alerting systems.
- Enjoys writing thorough documentation.
- Actively shares knowledge and supports team members.
- Provides technical expertise and proposes innovative solutions.
- Languages: EPFL operates in English and French; non-bilingual applicants are encouraged to learn the other language.
- Application: Only applications submitted through EPFL's internal website will be considered.
- Equality Commitment: EPFL actively promotes gender equality in its workforce.
- Start Date: 1 January 2025, or to be agreed upon
- Employment Term: Fixed-term (CDD)
- Work Rate: 100%
- Contract Duration: 1 year, renewable
- Remote Work: EPFL offers the possibility to work remotely up to 2 days per week.
- Reference: 1183