Jobvertise
Cloud Infrastructure Engineer
Location: US-CA-San Francisco
Jobcode: 7e292502674abfd82b8eae7fc8b8e374-122020
Reporting to the Head of Infrastructure, Network & Security, the ideal candidate is a highly driven, self-motivated, technically hands-on individual who is truly excited about creating meaningful impact and willing to build and lead a small team of Engineers. In this role you will combine a startup mindset with the scale of an industry leader, gaining hands-on exposure to how key organizational decisions are made and to the challenges of operating and securing critical cloud infrastructure and services.
The Infrastructure, Network & Security team is part of the Engineering division and is in charge of the overall infrastructure (provisioning, cost, reliability, business continuity, disaster recovery, backup strategy, etc., hosted mainly on AWS Cloud), databases, network (VPCs, IPSec VPN, Layer 3 & 4 routing), monitoring (SRE), incident management and security (all layers of the stack, patching, vulnerability scanning, IDS/IPS/SIEM). Working closely with the entire Engineering team, the team identifies needs, weaknesses and risks and defines an action plan to bring the platform to the next level using the latest tools and technologies.
Key Role Responsibilities
- Manage, maintain, upgrade and monitor the critical infrastructure of the Company in a highly available environment to achieve an SLA of 99.99%+ availability
- Deploy and manage Cloud infrastructure to serve business needs, optimizing performance and cost in a highly available environment
- Simplify and automate the provisioning of the platform to support the engineering team with their requirements and needs
- Work closely with the rest of the Engineering team to design and architect the platform
- Perform maintenance and system upgrades, including patches, hot fixes, configuration updates and backups, to keep resources current and secure
- Employ multiple patching strategies: build and roll out patched AMIs for cloud-aware applications that can be easily restarted, and apply in-place patches for the rest
- Design the backup and restoration strategy, and the business continuity plan in the event of a failure to protect the business
- Build and maintain both Unix and Linux systems to provide critical infrastructure services such as FTP/SFTP, NFS, DNS, SMTP, and Proxy Services
- Implement relevant KPIs and metrics to assess and track the performance of the platform and systems (Infrastructure Reliability Engineering)
- Identify risks and weaknesses in the infrastructure early on and ensure they are addressed before they become actual problems
- Work within established configuration and change management policies to ensure awareness, approval and success of changes made to the infrastructure
- Maintain and support all enterprise monitoring technologies and establish associated policies governing both advanced notifications and escalation procedures
- Create and maintain clear and accurate system and process documentation
- Configure logging and monitoring based on best practices
- Set up, monitor, correlate and investigate alerts to detect and resolve incidents
- Keep up to date with trends and innovation in engineering, including containers and orchestration, serverless and other programming paradigms, microservices, DevSecOps/DevOps/SRE, etc.
Desired Qualifications
- Degree in Computer Science or equivalent
- 5+ years of experience in a similar role
- 3+ years of experience supporting and securing large scale and critical systems and APIs in production
- Strong experience with AWS Cloud infrastructure management and related services
- Experience installing, configuring, and maintaining services such as BIND, Squid, Apache, MySQL, and HAProxy in a Linux/Unix environment
- Ability to utilize a scripting language (e.g. Bash, Perl, Python) to automate regular tasks and processes
- Experience designing networks, systems and application architectures
- Strong hands-on understanding of and experience with Linux administration, the command-line interface, and shell scripting
- Strong understanding of application protocols (e.g. DNS, SSH, HTTPS, SFTP, SMTP) and their behaviors across network environments
- Experience supporting the following technology stack and services (AWS, Terraform, Ansible, Docker, HAProxy, Nginx, ELB/ALB, ELK, Prometheus, Grafana, ECS/EKS/Kubernetes, Fluentd, Elasticsearch)
- Experience designing, integrating, and developing web services and APIs in the cloud
- Programming experience in one or several of the following languages (Go, JavaScript, Perl, Python or Ruby) is a plus
- A strong multi-tasker with a keen eye for detail
- Strong analytical and problem-solving skills and a willingness to investigate complex problems
- Strong strategic thinking skills to handle both the big picture and crucial decisions
- Ability to thrive on a high level of autonomy and responsibility
- Ability to work very well cross-functionally, to think rigorously and make hard decisions and tradeoffs when required
- Ability to sustain a culture of learning and knowledge sharing in the organization and to aim for a high level of technical excellence and stability
- Excellent written and verbal communication skills in English
TopTalentFetch