Data Engineer Resume Plano, TX

Candidate Information
Title: Data Engineer
Target Location: US-TX-Plano

Candidate's Name
EMAIL AVAILABLE | 469-999-1480 | Dallas, TX | LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY
Results-driven Data Engineer with over 6 years of experience designing, implementing, and optimizing data solutions across diverse industries. Proficient in cloud platforms such as AWS, Azure, and GCP, with a strong focus on ETL processes, data warehousing, and big data technologies. Demonstrated success leading large-scale data migrations, improving data processing efficiency by up to 30%, and reducing operational costs by 20%. Adept at collaborating with cross-functional teams to deliver scalable, secure data solutions that meet business needs. Skilled in Python, Java, and Scala, with hands-on experience in advanced analytics, data modeling, and performance optimization.

EDUCATION
Master of Science, Business Analytics, The University of Texas at Dallas, Dallas, TX
Graduated: Dec 2023 | GPA: 3.8/4.0

TECHNICAL SKILLS
Programming Languages: Python, Scala, Java
Database Technologies: MongoDB, MySQL, PostgreSQL, DB2
Cloud Services: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, Cloudera
ETL Tools: Apache Spark, Azure Data Factory, AWS Glue, Informatica PowerCenter, Talend, Apache NiFi, UNIX
Data Visualization Tools: Tableau, Power BI
Data Warehouses: Snowflake, Azure Synapse Analytics, AWS Redshift, Oracle Data Warehouse
Data Processing: Azure Databricks
Data Storage: Azure Data Lake Storage, Azure Blob Storage, AWS S3, Google Cloud Storage (GCS), Hadoop
Monitoring and Logging: Azure Monitor, AWS CloudWatch
Version Control: Git

WORK EXPERIENCE

Data Engineer, American Airlines, Dallas, TX | Feb 2023 - Present
- Analyzed existing data infrastructure and identified pain points, including scalability limitations and maintenance overheads.
- Used Java and Python frameworks and libraries to extend Hadoop functionality, leveraging tools such as Apache Flink, Apache Kafka, and Apache Spark to build robust, scalable data processing systems.
- Designed the architecture for a new cloud-based data platform built on Azure services such as Azure Data Lake Storage, Azure Databricks, Azure SQL Database, and Azure Data Factory.
- Led data migration from on-premises servers to Azure, ensuring minimal downtime and data integrity throughout the process.
- Implemented ETL processes using Azure Data Factory to ingest, transform, and load data from various sources into a Snowflake data warehouse.
- Established monitoring and alerting using Azure Monitor and Snowflake's native monitoring tools to track system performance and health.
- Built ETL/ELT pipelines combining AWS Glue and AWS Step Functions, increasing data processing efficiency by 20%.
- Employed AWS Glue and Spark for data cleansing and transformation, improving data analysis efficiency by 20% and reducing errors by 20% (a minimal sketch of this kind of job follows this section).
- Set up monitoring with Grafana and Amazon CloudWatch, integrating performance tuning mechanisms and achieving a 10% improvement in system reliability.
- Used Azure Data Factory, Spark SQL, and AWS Glue to load data into AWS services such as Amazon S3, Amazon RDS, and the Amazon Redshift data warehouse, increasing data availability by 30%.
- Used Amazon EMR for scalable, cost-effective big data processing, integrating Apache Spark for advanced analytics and achieving a 15% reduction in processing costs; also leveraged AWS Glue for serverless ETL.
- Developed, tested, and optimized complex SQL queries, stored procedures, and functions to support extract, transform, and load (ETL) processes, improving data retrieval efficiency and accuracy.
- Applied Agile methodologies (Scrum, Kanban) with hands-on Jira experience to enable iterative delivery, cross-functional collaboration, and continuous improvement on data engineering projects.
- Collaborated with cross-functional teams to define data models and ensure alignment with business requirements.
- Implemented data governance and security measures to protect sensitive data in compliance with regulatory standards such as GDPR and CCPA.
- Conducted performance tuning and query optimization to improve efficiency and reduce costs.
- Provided training and documentation to help stakeholders use the new data platform effectively.
- Managed and orchestrated Docker containers using Docker Swarm and Kubernetes; maintained container clusters for high availability, scalability, and efficient resource utilization; automated deployment workflows and performed regular updates and maintenance.
- Integrated ML models into production systems, driving a 20% increase in operational efficiency and customer satisfaction.
- Designed an NLP-based recommendation system that improved user engagement by 15%.
- Reduced data processing time by 30% by implementing optimized data pipelines and parallel processing techniques.
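Illustrative example: the Spark-based cleansing work described above typically takes roughly the shape below. This is a minimal PySpark sketch under assumed inputs, not actual project code; the S3 paths and the column names (event_id, event_ts) are hypothetical.

# Minimal PySpark cleansing job (hypothetical paths and columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ops_cleanse").getOrCreate()

# Read raw JSON landed in an assumed S3 location.
raw = spark.read.json("s3://example-raw-bucket/ops_events/")

# Typical cleansing: drop duplicates, normalize types, and reject rows
# missing the required key.
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_id").isNotNull())
)

# Stage partitioned Parquet; a downstream step (e.g., a Snowflake COPY INTO
# or an Azure Data Factory copy activity) would load the warehouse from here.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-staging-bucket/ops_events_clean/"
)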
Data Engineer, Optum Health Care, Bangalore, India | Feb 2020 - Aug 2021
- Assessed existing on-premises data infrastructure and identified opportunities for modernization and optimization.
- Designed a hybrid cloud architecture using Azure and AWS services to accommodate diverse data workloads and business requirements.
- Led the migration of data pipelines and workloads to Azure and AWS services such as Azure Blob Storage, Azure Synapse Analytics, AWS S3, and AWS Redshift.
- Engineered resilient, scalable data integration pipelines with AWS Glue, handling extraction, transformation, and loading of data from diverse sources such as electronic health records (EHRs), claims data, and provider systems, improving data ingestion speed by 10%.
- Implemented Databricks on AWS, increasing data analysis speed and accuracy by 20%.
- Implemented ETL processes using industry-standard tools such as Apache Spark, Azure Data Factory, and AWS Glue to ensure seamless data integration, transformation, and loading.
- Designed, built, and maintained scalable data pipelines in Apache Spark for efficient data processing and transformation, developing robust applications in Scala, Java, and Python.
- Tested scenarios and pre-processed data using Spark, Hive, and NiFi, applying SQL query principles.
- Designed and executed ETL processes on UNIX platforms, integrating data from various sources, ensuring accuracy and consistency, and optimizing pipelines for better performance.
- Designed and deployed a data warehouse solution using SSIS and SSAS, reducing report generation time for key business metrics by 40%.
- Automated deployment of a complex data processing platform using Terraform, cutting infrastructure provisioning time by 50% and operational costs by 40% through optimized resource management and scaling.
- Developed and optimized complex SQL queries to extract, manipulate, and analyze large datasets, applying best practices such as indexing, partitioning, and query optimization for faster data retrieval and better overall system performance (a sketch of this style of partition-aware processing follows this section).
- Designed and implemented a Snowflake data warehouse for centralized storage of structured and semi-structured data, with data integration and ETL managed through Talend, improving data processing speed by 20%.
- Collected, ingested, and integrated data from diverse sources into centralized storage using MSBI, Talend, Informatica, and big data tools, increasing data ingestion efficiency by 30%.
- Administered and maintained databases with tools such as MSBI, ensuring performance, security, and availability and improving database administration efficiency by 20%.
- Orchestrated data workflows across multi-cloud environments, ensuring data consistency and integrity throughout the migration.
- Developed data models and schemas to support analytical use cases and reporting requirements.
- Integrated Power BI dashboards and reports with the cloud data platform to give stakeholders interactive visualizations and real-time insights.
- Collaborated with business analysts and data scientists to understand user requirements and tailor visualization solutions accordingly.
- Implemented security and compliance measures to protect sensitive data and ensure regulatory compliance across cloud environments.
- Provided training and support to end users on Power BI for self-service analytics and reporting.
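Illustrative example: the SQL-centric validation and optimization work described above can be sketched as a Spark SQL step. The table layout, the claim_id/updated_at/billed_amount/service_date columns, and the paths are hypothetical assumptions, not actual Optum systems.

# Minimal Spark SQL validation/dedup step (hypothetical schema and paths).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/claims_raw/")
claims.createOrReplaceTempView("claims")

# Keep only the latest version of each claim and reject rows that fail
# basic integrity checks, mirroring SQL-style ETL validation.
latest_valid = spark.sql("""
    SELECT * FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY claim_id
                                  ORDER BY updated_at DESC) AS rn
        FROM claims
        WHERE claim_id IS NOT NULL AND billed_amount >= 0
    ) t
    WHERE rn = 1
""")

# Partitioning output by service date keeps downstream scans narrow,
# the same idea as the indexing/partitioning practices noted above.
latest_valid.drop("rn").write.mode("overwrite") \
    .partitionBy("service_date").parquet("s3://example-bucket/claims_clean/")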
Data Analyst/Engineer, Splunk, Hyderabad, India | May 2017 - Feb 2020
- Designed and implemented scalable ETL processes using Informatica PowerCenter, Talend, and Apache NiFi to extract, transform, and load data from various sources into centralized data repositories.
- Wrote efficient, scalable Java and Python code to integrate with Hadoop components and enhance data processing capabilities.
- Developed and optimized MapReduce programs in Java and Python to process large datasets on Hadoop.
- Designed and implemented data pipelines using GCP services such as Dataflow, Dataproc, and Pub/Sub.
- Created and managed data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL.
- Participated in implementing a proof of concept on Google Cloud Platform for three teams.
- Configured and optimized data storage such as BigQuery and Google Cloud Storage (GCS) for storing and querying structured and unstructured data, enabling advanced analytics and reporting.
- Led the migration of 10+ terabytes of data from Teradata to GCP, achieving a 40% reduction in query response times by developing optimized ETL pipelines with Cloud Composer and BigQuery.
- Refined the GCP data architecture post-migration, reducing BigQuery costs by 25% through effective partitioning, clustering, and query optimization strategies (a sketch of this setup appears at the end of this section).
- Automated 80% of the migration validation process, cutting manual errors by 70% and decreasing incident response times by 60% through a real-time monitoring system.
- Integrated data from diverse sources, including MS SQL Server, Oracle, PostgreSQL, DB2, MongoDB, and external APIs, using Apache NiFi for efficient ingestion, increasing ingestion efficiency by 30%.
- Applied PySpark for data cleansing, validation, and preparation, ensuring high-quality data for accurate analysis and reporting.
- Used PySpark to integrate with big data ecosystems including Hadoop, Hive, and Kafka, enabling smooth batch processing and real-time data streaming.
- Implemented CI/CD pipelines for automated deployment, scaling, and updating of data applications on Kubernetes clusters, using Jenkins and Argo CD to streamline deployments, minimize human error, and ensure continuous integration and delivery.
- Implemented Java-based solutions to integrate data from databases, APIs, and flat files into the Hadoop ecosystem.
- Developed and maintained RESTful APIs in Java and Python for data ingestion and retrieval, ensuring seamless and efficient data flow.
- Developed advanced SQL queries, stored procedures, and functions to support data retrieval, transformation, and reporting, writing complex T-SQL on MSSQL that followed readability and maintainability best practices.
- Continuously tuned performance by monitoring query execution times and optimizing slow-running queries.
- Enhanced data quality and consistency by implementing data validation checks and error-handling mechanisms within ETL workflows.
- Applied advanced analytics techniques for data profiling, cleansing, and enrichment to ensure accuracy and completeness.
- Developed custom data models and schemas to support analytical use cases and reporting requirements.
- Designed and optimized data models and structures for Tableau reporting, ensuring efficient data retrieval and visualization performance.
- Collaborated with business analysts and stakeholders to translate reporting requirements into actionable data models and Tableau visualizations.
- Developed interactive Tableau dashboards and reports, giving stakeholders real-time access to key performance indicators (KPIs) and business metrics.
- Collaborated with data analysts and scientists to develop predictive models and analytical dashboards for business insights and decision support.
- Implemented data governance policies and procedures to ensure compliance with regulatory requirements and industry best practices.
- Provided training and support to analysts and business users on the enhanced analytics platform for data-driven decision-making.
- Collaborated with business stakeholders to understand data requirements and identify opportunities to improve data integration and analysis.
- Evaluated existing ETL pipelines and identified areas for optimization and automation to streamline data processing workflows.
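Illustrative example: the BigQuery partitioning and clustering cost work referenced above generally looks like the setup below, using the google-cloud-bigquery Python client. The project, dataset, table, and column names are hypothetical, not the actual migration code.

# Minimal BigQuery partition/cluster setup (hypothetical names throughout).
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

table = bigquery.Table(
    "example-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Date partitioning plus clustering on the most common filter column
# means queries scan, and are billed for, only the relevant slices.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_date"
)
table.clustering_fields = ["customer_id"]
client.create_table(table)

# Filtering on the partition column prunes all other partitions.
job = client.query("""
    SELECT customer_id, COUNT(*) AS events
    FROM `example-project.analytics.events`
    WHERE event_date = '2020-01-01'
    GROUP BY customer_id
""")
for row in job.result():
    print(row.customer_id, row.events)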
