Rust Job: SRE, Site Reliability Engineer, Network

Location

Santa Clara, CA - United States of America

Job type

Full-Time

Rust Job Details

Position: Sr. Network Site Reliability Engineer

Location: Santa Clara, California (Hybrid role)

Duration: 6-12+ months, contract-to-hire (CTH)

Experience: 8-10 years

Sr. Network Site Reliability Engineer - Hybrid role

Design, build, and operate scalable software systems to manage the client’s network infrastructure

  • Lead sustainable incident response, blameless postmortems, and production improvements that result in direct business opportunities
  • Provide guidance to other team members on managing end-to-end availability and performance of mission critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
  • Build network and systems automation software for managing a multi-tenant cloud infrastructure
  • Debug complex problems across the full stack and create solid solutions, identifying root causes and digging deeper into Root Cause Analysis efforts on network incidents. A strong network background is good to have.
  • Automate work across a variety of infrastructure needs such as testing, failover, policy modifications, and deployment.
  • Write, update, and use documentation, including runbooks/playbooks. The ability to respond consistently by regularly creating runbooks/playbooks, with an eye toward additional automation opportunities in the environment, is a must-have skill.

What we need to see:

  • 7-10+ years of experience designing and building distributed software systems.
  • BS/MS degree in Computer Science or a related area (or equivalent experience)
  • Demonstrated ability to write code in a mainstream systems programming language such as C, C++, Go, Python, Java, or Rust.
  • Demonstrated ability to use, design, and implement maintainable APIs, including use of tools such as Git, NetBox, CloudVision Portal, SaltStack, VictoriaMetrics, SNMP, and HashiCorp Vault
  • Practical experience with asynchronous programming, type safety, threading models, and state machines.
  • Understanding of underlying Linux internals: kernel scheduling, memory management, and networking subsystems.
  • Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, and tunneling protocols (VXLAN, Geneve, GRE) in a multi-vendor environment as implemented on platforms such as Arista, Cumulus, Cisco, HP, Palo Alto, and others
  • Understanding of data persistence (SQL or similar).
  • Understanding of secure communication protocols (mutual-TLS, IPsec, or similar).
  • Demonstrated ability to reach cross-functional consensus without having all the details

Ways to stand out from the crowd:

  • Experience in a Hyperscale Cloud Service Provider (public facing or not)
  • Experience with high-level compiled languages such as Go or Java
  • Experience with Kubernetes and/or distributed task scheduling
  • Experience with host security services and security principles such as TPM, TXT, SecureBoot
  • Knowledge of SRE principles (observability, SLOs, SLIs, logging, etc.)
  • Knowledge of software interface design & documentation for less technical end-users