Site Reliability Engineer (Junior) - Montpellier, France

Context

DecisionBrain develops custom decision-support applications for various clients. While each solution is unique, they share common architectural traits:

  • Web-based applications with a microservice architecture
  • Deployment on various Kubernetes environments
  • Built using DB Gene, our proprietary development platform

As a Site Reliability Engineer (SRE), you will play a crucial role in ensuring the stability, scalability, and reliability of our applications. You will primarily provide L2 support, assisting users and resolving infrastructure-related issues.

Beyond client-facing support, you will also monitor internal tools and deployments to ensure smooth operations for both our customers and our colleagues. We use Slack for real-time communication and rapid response to incidents.


Key Responsibilities

Technical Support (L2 & Incident Response)

Business Analysts are the first point of contact for users (L1); they handle basic troubleshooting and initial triage. As an L2 Support Engineer, you will:

  • Investigate and resolve technical issues beyond the scope of L1 support.
  • Analyze software bugs, configuration problems, and system performance issues.
  • Review logs, monitor infrastructure health, and validate system components.
  • Provide actionable insights and recommendations to improve platform stability and production readiness.
  • Document findings, actions taken, and resolutions in tickets for both users and L3 support.
  • Escalate critical software bugs or advanced issues to the L3 engineering team while providing structured analysis to aid in resolution.

Infrastructure & Application Troubleshooting

Your role will involve diagnosing problems and ensuring service reliability across both infrastructure and application layers.

Infrastructure Troubleshooting:

  • Monitor system health via Grafana, Loki and Prometheus (our stack) or other observability tools.
  • Check Kubernetes components (pods, jobs, volumes) and logs for errors.
  • Perform actions such as restarting pods, adjusting memory allocations, or resizing volumes to restore services.
  • Work on the alerting stack, integrating it with internal tools to ensure proactive issue detection and resolution.
  • Contribute to an Internal Developer Platform (IDP) approach, where we map and maintain the knowledge of our software assets, configurations, deployments, credentials, and related issues.

Application Troubleshooting:

  • Analyze logs and error messages from microservices.
  • Run data validation checks and attempt to reproduce issues in test environments.
  • Work closely with the Platform team, contributing to discussions on architecture improvements and bug fixes.
  • Collaborate in the development of tools and automation scripts to enhance system observability and reliability.

Required Skills & Qualifications

Education:

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related technical field.

Technical skills:

  • Understanding of microservice architecture: front-end, back-end, databases, REST API interactions.
  • Knowledge of infrastructure & software components:
      • Memory (Java heap, stack, native memory, etc.)
      • CPU performance and throttling
      • Disk usage, logs, error handling, HTTP status codes
  • Knowledge of Docker
  • Familiarity with Kubernetes: Deployment management, Helm charts, command-line usage (kubectl).
  • Experience with monitoring tools such as Grafana, Prometheus, Loki.
  • Experience with infrastructure-as-code tools such as Terraform (optional).
  • Scripting skills in languages such as Bash, Python, or equivalent.

Personal skills:

  • Excellent written communication in English (support documentation, ticketing, and user communication).
  • Problem-solving mindset: Ability to troubleshoot issues methodically and document solutions.
  • Customer-oriented: Ability to work with users of varying technical expertise.
  • Organized and resourceful: Strong investigative and documentation skills.

Language Requirements:

  • English (Proficient, written & spoken)
  • French and/or Italian is a plus

Job Details

Working Conditions

  • Workplace location: Montpellier, France
  • Contract type: Permanent
  • Work schedule: Full-time (39h), with up to 2 days per week working from home

Compensation

  • Gross annual salary: 28 to 33 K€
  • Benefits: meal vouchers / complementary health insurance (mutuelle, 60%) / profit-sharing …
  • Technical equipment: laptop (Mac or PC) / dual monitors

Recruitment Process

  • Online meeting – Company presentation, role discussion, candidate motivation
  • Technical test – Hands-on problem-solving
  • Manager interview – Review of technical test & discussion
  • Final interview

Practical Information

  • Start Date: Immediate
  • To apply: Please fill out the form below and attach your CV

60+ employees
15 nationalities
14 languages

At DecisionBrain, we operate with the agility and innovation of a startup, empowered by a team deeply committed to the transformative potential of AI-powered software solutions. Our collective strength lies not just in our advanced analytics solutions, but in a culture that values collaboration, agility, flexibility and fun.

As a self-funded, international company, we cherish our diverse, multicultural workforce. We offer a collegial atmosphere that fosters innovation, enabling you to excel in your domain while maintaining a healthy work-life balance.
