TIAA Site Reliability Manager in CHARLOTTE, North Carolina
TIAA is the leading provider of financial services in the academic, research, medical, cultural and government fields. We offer a wide range of financial solutions, including investing, banking, advice and education, and retirement services.
TIAA’s Production Services and Architecture Organization has an opportunity for a Site Reliability Manager to build, lead, and grow a globally distributed team. This position will be located in our Charlotte, NC office.
This position will provide effective technical leadership and oversight for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of our public AWS cloud platform. The role will also define processes, find ways to automate repetitive tasks, and seek out opportunities to drive efficiency and improvements. Come and join our team!
KEY RESPONSIBILITIES AND DUTIES:
Set the direction and strategy for the team and the overall SRE Program
Serve as the primary point of contact for senior management stakeholders across development and cloud operations
Collaborate with diverse stakeholders on the strategic vision for the enterprise, leveraging cloud, managed solutions, and traditional capabilities; make recommendations on new solutions and technologies
Design, build, and manage systems, infrastructure, and applications through automation that supports self-service
Recruit and develop staff; build a culture of excellence in site reliability and automation
Manage complex technical projects and a team of SREs
Own site stability, performance, and capacity planning
Lead by example: roll up your sleeves by debugging and helping with root cause analysis; participate in the on-call rotation and occasional travel
Participate in a 24x7 on-call rotation
Be the product owner for the team’s Agile Board
Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents
Design automated software and product upgrades; follow change management and release management processes
Deploy, support, and monitor new and existing services, platforms, and application stacks
Develop tools to improve our ability to rapidly deploy and effectively monitor services in a large-scale public cloud environment
QUALIFICATIONS:
3 years of demonstrated experience leading site reliability and performance in large-scale, high-traffic cloud environments
3 years of experience with AWS
7 years of technology experience
Experience leading/managing a team
Prior experience working on large-scale distributed systems, including multi-tiered architectures
Demonstrated understanding of SRE concepts and the DevOps culture, with a focus on leveraging software engineering tools, methodologies, and concepts
In-depth understanding of automation and CI/CD processes, along with excellent reasoning and problem-solving skills
Experience with UNIX/Linux/Windows environments, with an in-depth grasp of system internals
Ability to perform debugging and troubleshooting
Experience with scripting/development in at least one general-purpose programming language such as Python, Java, C, C++, or Go, or in shell scripting
Experience with infrastructure-as-code tools such as Ansible, Puppet, CloudFormation templates, or Terraform
AWS Professional certification
Experience with other cloud providers such as MS Azure or Google Cloud
Hands-on experience with the implementation of CI/CD, automation, and cloud-native services
Experience working in an agile environment
Experience with ITIL practices
Familiar with Kubernetes
Equal Employment Opportunity is not just the law, it’s our commitment. Read more about the.
If you need assistance applying due to being visually or hearing impaired, please email.
We are an Equal Opportunity/Affirmative Action Employer. We will consider all qualified applicants for employment regardless of age, race, color, national origin, sex, religion, veteran status, disability, sexual orientation, gender identity, or any other legally protected status.
Requisition ID: 1723762
Post Date: Nov 18, 2019