Site Reliability Engineer

Royal Caribbean Group

Software Engineering
Miami, FL, USA
Posted on Nov 26, 2024

Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world.

We are proud to be the vacation-industry leader with global brands — including Royal Caribbean International, Celebrity Cruises and Silversea Cruises — the most innovative fleet and private destinations, and the best people. Together, we are dedicated to turning the vacation of a lifetime into a lifetime of vacations for our guests.

Royal Caribbean Group’s Digital team has an exciting career opportunity for a full-time Site Reliability Engineer reporting to the Senior Manager, Site Reliability Engineer - Digital Operations.

This position is on-site in Miami, Florida.

Position Summary:
As a Site Reliability Engineer (SRE) at Royal Caribbean, you will play a critical role in ensuring the reliability, performance, and seamless operation of our digital ecosystem. This includes our guest-facing mobile apps, websites, and the backend systems that power them. You will work collaboratively with development, operations, and product teams to build and maintain a highly resilient and scalable digital experience for our guests.

Essential Duties and Responsibilities:

  • Incident Response and Resolution: Respond to and resolve production incidents, prioritizing guest-facing issues to minimize disruption. Conduct root cause analysis with guidance from senior team members and implement preventive measures to avoid recurrence.
  • Monitoring and Observability: Build, maintain, and enhance monitoring tools and dashboards (using Prometheus, Grafana, or similar) to provide visibility into system health, performance, and guest impact. Proactively detect and address potential issues.
  • Automation and Tooling: Develop and implement automation scripts and tools to streamline operations, reduce manual intervention, and improve system reliability. Use configuration management tools and apply infrastructure-as-code principles.
  • Collaboration: Work closely with product teams to incorporate reliability principles into new feature development. Collaborate with operations teams to ensure smooth deployments and transitions.
  • Documentation and Knowledge Sharing: Create and maintain clear documentation on system architecture, troubleshooting guides, and incident postmortems. Share knowledge and best practices with the team.
  • On-Call Support: Participate in on-call rotation as defined by team needs, primarily focusing on acknowledging and escalating incidents, with guidance from senior team members.
  • Working Hours: Expect non-standard working hours, including morning, night, and weekend rotations.

Qualifications, Knowledge, and Skills:

  • 3+ years of experience in IT operations, software development, or a related field.
  • Bachelor’s degree in computer science or a related field preferred.
  • Technical Expertise: Strong knowledge of mobile (iOS, Android) and web technologies, backend systems, cloud infrastructure (AWS, Azure, etc.), and database technologies.
  • Programming: Proficiency in one or more programming languages (e.g., Python, Java, Go) for scripting and automation, along with automation tooling such as Jenkins.
  • Working knowledge of Kubernetes is a strong plus.
  • Monitoring and Observability: Experience with tools like Prometheus, Grafana, Splunk, or similar.
  • Incident Management: Experience with incident management tools like PagerDuty, ServiceNow, or similar.
  • Security: Understanding of security best practices, vulnerability identification, and incident response.
  • Communication: Excellent written and verbal communication skills for collaborating with diverse teams and stakeholders.
  • Customer Service: A client-focused attitude and alignment with the goal of providing a great client experience.
  • Detail Oriented: The ability to understand and appreciate fine, granular details.
  • SQL Databases: Ability to work with large volumes of customer data, and to write and modify queries in Oracle SQL (or similar).

Preferred Qualifications:

  • Experience in the hospitality or travel industry.
  • Familiarity with Royal Caribbean's digital ecosystem.
  • Experience with high-traffic, guest-facing systems.
  • Previous experience working with ticket-based incident systems.
  • ITIL v3 or v4 Foundation certification.

We know there's a lot to consider. As you go through the application process, our recruiters will be glad to provide guidance and more relevant details to answer any additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon!

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.