Candidate's Name
EMAIL AVAILABLE
PHONE NUMBER AVAILABLE
Alpharetta, GA

SUMMARY
Seasoned Site Reliability Engineer with 13 years of experience, including recent roles at Last9, Morgan-Stanley, and SIEMENS, seeking to leverage expertise in SRE practices, observability, and automation. Proficient in configuring monitoring/alerting with PromQL, automating deployments with GitHub, and managing AWS infrastructure using Terraform. Adept at collaborating with DevOps teams, evangelizing SRE principles, and defining SLIs/SLOs to enhance system reliability and performance.

WORK EXPERIENCE

Last9 (US)
Site Reliability Engineer, Jul 2022 - Present
Assisted customers with the configuration and utilization of SRE-focused products to meet their Service Level Objectives (SLOs).
Collaborated with clients to identify high cardinality metrics and implemented streaming aggregation strategies to address them effectively.
Developed automated solutions, including Bash and Python scripting, GitHub-based deployment pipelines, and infrastructure provisioning with Terraform for EKS deployments.

Morgan-Stanley (US)
Consulting Site Reliability Engineer, Jan 2022 - Jul 2022
Enhanced observability and fine-tuned alerting mechanisms using an APM tool, leading to improved system monitoring and performance insights.
Automated key manual processes, resulting in a 20% reduction in repetitive workload for the team.
Developed and maintained Splunk dashboards for real-time monitoring, enabling proactive alerts on critical application transactions.

SIEMENS (US)
Consulting Site Reliability Engineer, Sep 2021 - Apr 2022
Implemented Site Reliability Engineering methodologies to enhance the reliability and performance of cloud-native applications, resulting in more robust and scalable systems.
Collaborated with cross-functional teams to automate infrastructure upgrades and developed comprehensive KPI dashboards, improving operational efficiency and visibility across the CI/CD pipeline.
Established and refined monitoring and alerting systems, utilizing tools like Prometheus and Datadog, to proactively detect and resolve issues in critical applications and infrastructure components.

FICO (Mexico)
Site Reliability Engineer, Oct 2020 - Sep 2021
Enhanced software development infrastructure by implementing build scripts, integrating continuous deployment tools, and refining continuous integration processes, facilitating efficient global engineering operations.
Managed Kubernetes applications, ensuring robust data processing through Python scripts and maintaining a complex multi-master, multi-worker environment with extensive cron job scheduling and data management.
Strengthened system security and reliability by automating task execution using Jenkins, fortifying environments with MFA and SSH tunneling, and conducting rigorous SSL/TLS certificate management and application support.

BMC Software (Mexico)
Sr. Technical Support Analyst, Oct 2016 - Oct 2020
Managed incident resolution for server automation tools, ensuring adherence to SLAs and maintaining high levels of customer satisfaction.
Facilitated customer migrations to AWS, including setup of VPCs, RDS, Security Groups, and EC2 instances, enhancing infrastructure scalability and efficiency.
Diagnosed and resolved complex network and application server connectivity issues, and implemented AD and LDAP authentication for multiple clients, bolstering system security.
Delivered expert analysis of Java errors and thread dumps, identifying root causes of execution errors or memory leaks, and provided tailored solutions or escalated bug reports.

Hewlett Packard Enterprise (Mexico)
SAN Storage Engineer, Feb 2015 - Oct 2016
Configured LUNs and managed SAN connections on Brocade switches to efficiently allocate new storage capacity for applications, ensuring high availability through redundant connections and data replication across geographically dispersed data centers.
Monitored storage array utilization to prevent exceeding an 85% threshold, conducted migration planning for applications at 80% capacity, and executed storage reclamation to optimize resource allocation.
Performed troubleshooting for server connectivity issues with HBA and Brocade switches, and contributed to the infrastructure upgrade by transitioning to 3PAR storage arrays, enhancing system performance and reliability.

WebApps Administrator, Nov 2011 - Feb 2015
Ensured optimal functionality and performance of over 400 enterprise applications by monitoring system health, troubleshooting errors, and executing necessary modifications to prevent outages and enhance efficiency.
Developed and implemented automation scripts for routine maintenance tasks, contributing to system reliability and data integrity by resolving ETL errors that directly affected business intelligence reporting for executive decision-making.
Collaborated with application engineering teams to identify and rectify software bugs and infrastructure issues, improving application stability and performance while managing web server configurations across multiple platforms.

Tata Consultancy Services (Mexico)
Technical Support Analyst, Apr 2010 - Nov 2011
Delivered comprehensive technical support for ERP systems, resolving issues across Order Management, Inventory, Purchase Orders, and General Ledger modules.
Managed web application administration on WebLogic servers, conducting routine maintenance tasks such as disk cleanup and server recycling.
Executed weekly updates of trade compliance rules, ensuring accurate database bulk loads and effective coordination with database administrators and stakeholders.

EDUCATION
Universidad de Guadalajara
Bachelor's in Communications and Electronics Engineering, 2009

SKILLS
Site Reliability Engineering, Incident Management, Configuration Management, Change Management, Build Management, Release Management, Version Control System, SAN Storage, Release Engineering, Grafana, Prometheus, PromQL, Zabbix, AppDynamics, ThousandEyes, Datadog, Dynatrace, Splunk, AWS Services, EC2, VPC, Route 53, S3, Lambda, CloudFormation Templates, Load Balancers, CloudWatch, Security Groups, EBS, IAM, EKS, Ansible, Docker, Kubernetes, OpenShift, SSL/TLS, Automation, GitHub, Python, Bash Shell, Shell Script, PowerShell, PowerCLI, NSH, BLCLI, Apache Tomcat, JBoss, Java, High-Availability, Disaster Recovery, RHEL/CentOS, Ubuntu, SUSE, Debian, IBM AIX, Puppet, Jenkins, Terraform, GitLab, Sus Foundation V3, AWS Certified Cloud Practitioner, Network Protocols and Services, HTTP, SMTP, TCP/IP, LDAP, NFS, NIS, DNS, DHCP, SSH, TLS, Samba, FTP, Amazon Web Services (AWS), Oracle, SQL Server, PSSQL, ServiceNow, Jira, HPSM, Salesforce