Site Reliability Engineer (Azure Cloud)

Traydstream
India only

Reliability and Stability:

  • Own and operate the Azure infrastructure behind our application stack to orchestrate and manage our hosted customer instances of Metabase.
  • Debug runtime issues across the different levels of our application stack and hosting stack.
  • Continuously improve our automated deployments and testing.
  • Carry out all activities needed to support the application and the cloud infrastructure our platform runs on, including but not limited to: monitoring the application, investigating and resolving alerts and outages, configuring the monitoring and alerting tooling, investigating issues reported by external and internal clients, and carrying out BAU maintenance activities.
  • Deploy application and infrastructure upgrades and enhancements to UAT and Production environments.
  • Provision new and manage existing UAT and Production environments.
  • Coordinate and carry out Security Incident Management related to our application and infrastructure in accordance with our Security Incident Management processes.
  • Maintain our SOC2 compliance and security posture.
  • Where necessary, be prepared to work in shifts (early/late, weekends) to provide 24x7 Support for our platform.

Service-Level Objectives (SLOs):

  • Develop and build our internal tooling and automation to manage the lifecycle of a hosted Metabase installation, from purchase to deployment, zero-downtime upgrades, and general operational health.

Automation and Tooling:

  • Automate EKS and AKS cluster provisioning.
  • Extend our CRDs and Operators.
  • Improve the RDS sharding strategy for our multi-tenant platform.
  • Unify and improve our CI/CD platforms.

Capacity Planning:

  • Continually seek and implement improvements in the environment – cost control, automation, rationalizing the estate, and processes.

Collaboration:

  • Collaborate with core application developers on changes to improve our application metrics, deployment speeds, and CI integration.

Requirements

Must Haves

  • 2-5 years’ experience building and operating production infrastructure, ideally on public cloud and in particular Microsoft Azure.
  • Experience supporting business-critical systems (Incident, Change and Problem management process) in a large-scale operations team.
  • Broad knowledge of IT Operations concepts, architecture and information security (ITIL/Security).
  • Hands-on commercial experience of supporting cloud-based SaaS systems (Microsoft Azure).
  • Experience setting up EC2, SNS and database instances, securing VPCs, implementing Security Groups, Identity and Access Management, backup, restore and disaster recovery, and the equivalent technologies on Azure.
  • Hands-on commercial experience in both Linux and Windows systems administration and automation scripting.
  • Hands-on commercial experience managing Kubernetes clusters.
  • Good understanding of DevOps principles (CI/CD, release automation).
  • Knowledge of Clusters, Storage, Backups, Data Export/Import, Monitoring tools and Disaster Recovery.
  • Hands-on commercial experience using a wiki (ideally Confluence) to document processes that comprise our Knowledge Base.
  • Experience with TCP/IP networking and fundamental network services such as DNS, DHCP, SMTP, NTP, telnet and SSH.

Nice to Haves

  • Experience with AWS.
  • Ability to read/understand & debug Python and Java.
  • Working experience with MongoDB, MariaDB and PostgreSQL.
  • Working experience with application monitoring tools.
  • Practical use of scripting (e.g. Python, cron) to automate repeated tasks.
  • ITIL Foundation Qualified.

About the job

Apply before: Jul 16, 2024
Posted on: May 17, 2024
Job type: Full Time
Experience level: Mid-level
Hiring timezones: India +/- 0 hours