Candidate's Name
Sr. Data Engineer
PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | Beaumont, Texas, United States
SUMMARY
- Data Engineer with over 10 years of experience building and optimizing data pipelines, ETL processes, and data solutions using AWS, Azure, and GCP services.
- Proficient in AWS services including EC2, S3, DynamoDB, Glue, and Auto Scaling, leveraging cloud infrastructure for scalable and reliable data processing.
- Extensive experience with the Hadoop ecosystem (HDFS, MapReduce, Hive, HBase) and Spark for large-scale data processing and transformation.
- Skilled in developing data solutions using Azure Data Factory, Databricks, Azure Data Lake, and Azure SQL, improving data accessibility and performance.
- Expertise in GCP tools such as BigQuery, DataProc, DataFlow, and Cloud Composer, enabling seamless data integration and analysis.
- Strong background in data modeling, data cleaning, and data visualization using SQL, Python, Power BI, and Looker, providing actionable insights to stakeholders.

EXPERIENCE

AWS Data Engineer
AbbVie | Jan '22 - Present | Vernon Hills, United States
Responsibilities:
- Designed and implemented ETL pipelines using AWS Glue and Talend, transforming large datasets from various sources into Snowflake and improving data flow efficiency for analytics and reporting (a minimal Glue-style sketch follows this list).
- Built and maintained data pipelines in AWS using services such as EC2, S3, Data Pipeline, and DynamoDB, streamlining data integration processes.
- Developed and deployed scalable Apache Spark applications on Databricks, optimizing data transformation and processing and enhancing performance for business reporting.
- Leveraged AWS S3 and Snowflake for data storage and retrieval, automating data movement with AWS Glue and Lambda to enhance data availability and reliability.
- Created UNIX shell scripts to schedule, execute, and automate data integration tasks, reducing manual errors and enhancing consistency.
- Built Hadoop-based data workflows using MapReduce, HDFS, and Oozie to process large datasets effectively, contributing to better business decisions.
- Configured and deployed containerized data applications using Docker and orchestrated them with Kubernetes to manage scalable infrastructure.
- Developed microservices using Spring Boot and integrated them into the data pipeline to enhance modularity and ease of scaling.
- Employed Apache Hive and Impala to write optimized queries for data retrieval, reducing query run time and speeding up report generation.
- Worked with AWS Auto Scaling to ensure high availability and cost optimization of the data processing clusters, maintaining data pipeline performance standards.
- Set up and maintained CI/CD pipelines using Jenkins and Bamboo, incorporating automated testing and deployment practices to ensure smooth releases.
- Utilized Git for version control of code and configuration files, supporting collaborative development practices and smooth project updates.
- Created and deployed Python scripts to process, cleanse, and validate raw data, transforming it into consumable formats for further analysis.
- Designed and implemented AWS-based data solutions with DynamoDB, ensuring data consistency and reliability across multiple business applications.
- Collaborated with DevOps teams to configure Ansible for deployment automation and managed Tomcat and JBoss environments, supporting high availability of applications.
- Engaged in data migration using Sqoop to integrate data from PostgreSQL into Hadoop, enabling a seamless transition and historical data accessibility.
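The Glue-to-Snowflake pattern above can be illustrated with a minimal PySpark Glue job sketch. The catalog database, table, and S3 bucket names (raw_db, raw_sales, analytics-bucket) are hypothetical placeholders rather than the actual AbbVie pipeline, and the Snowflake load is assumed to happen downstream of the curated S3 write.

```python
# Minimal sketch of a Glue-style PySpark ETL job; names are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a raw table registered in the Glue Data Catalog (placeholder names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="raw_sales"
)

# Cleanse and transform with Spark before handing off to the warehouse.
df = (
    dyf.toDF()
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_date"))
)

# Write curated Parquet back to S3; a separate Snowflake COPY/Snowpipe step
# (or the Snowflake Spark connector) would load it downstream.
df.write.mode("overwrite").parquet("s3://analytics-bucket/curated/sales/")

job.commit()
```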
Environment: AWS services (EC2, S3, Auto Scaling, AWS Glue, AWS Data Pipeline), Scala, Elasticsearch, Snowflake, DynamoDB, UNIX Shell Scripting, Hadoop (HDFS, MapReduce), DevOps, jQuery, Tomcat, Apache, Jenkins, Ansible, Python, Shell Scripting, PowerShell, Git, Microservices, Jira, JBoss, Bamboo, Kubernetes, Docker, Databricks, Spark, Talend, Impala, Hive, PostgreSQL, NiFi, MongoDB, Cassandra, Pig, Sqoop, Hibernate, Spring, Oozie.

GCP Data Engineer
Thomson Reuters | Apr '19 - Dec '21 | Eagan, United States
Responsibilities:
- Developed and implemented ETL pipelines using GCS, GCP BigQuery, and GCP DataFlow, enhancing the transformation and storage of large datasets for analytics.
- Utilized Cloud Composer to create and manage Airflow DAGs, automating data workflows and improving scheduling consistency for data ingestion (a minimal DAG sketch follows this list).
- Engineered data lakes using GCS buckets and GCP DataProc, enabling scalable data processing and supporting data integration across multiple sources.
- Built and optimized data models in GCP BigQuery and Cloud Spanner for both OLAP and OLTP use cases, improving query performance and reliability.
- Implemented data warehousing solutions leveraging GCP BigTable and Teradata, providing reliable and scalable infrastructure for analytics and reporting.
- Designed streaming data solutions using Pub/Sub and DataStream, enabling real-time data integration for business-critical insights.
- Developed Python and PySpark scripts for data transformation and data cleaning, improving data quality and reducing manual errors.
- Managed data processing jobs using Apache Beam on GCP DataFlow, ensuring efficient handling of batch and streaming data.
- Built containerized data processing environments using Docker and orchestrated them with Kubernetes, achieving seamless scalability for applications.
- Implemented data visualization dashboards using Power BI and Looker, providing insights through interactive and dynamic reports.
- Developed data extraction jobs using Sqoop to migrate data from Oracle to GCP BigQuery, enabling consolidated analysis across datasets.
- Configured Kafka for data streaming and integration, ensuring real-time data availability for critical business use cases.
- Employed DBT for data modeling and transformation, standardizing datasets across the data warehouse for effective analytics.
- Implemented IAM policies to manage access control and security across GCP resources, ensuring compliance with organizational policies.
- Worked in Agile environments, using Bitbucket and Git for version control and collaboration, ensuring efficient and error-free deployments.
- Conducted unit testing of data pipelines and transformations to ensure reliability, quality, and consistency throughout development.
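The Cloud Composer workflow described above can be sketched as a small Airflow DAG. The bucket, dataset, and table names are hypothetical placeholders rather than the actual Thomson Reuters workflow; the example only shows scheduling a daily GCS-to-BigQuery load with the Google provider operator.

```python
# Minimal sketch of a Cloud Composer (Airflow) DAG; names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="daily_gcs_to_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Load the day's Avro files from a landing bucket into a BigQuery table.
    load_raw_events = GCSToBigQueryOperator(
        task_id="load_raw_events",
        bucket="example-raw-bucket",
        source_objects=["events/{{ ds }}/*.avro"],
        source_format="AVRO",
        destination_project_dataset_table="example-project.analytics.raw_events",
        write_disposition="WRITE_APPEND",
    )
```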
Environment: GCS, GCS buckets, GCP BigQuery, GCP BigTable, GCP DataFlow, GCP DataProc, DataStream, Airflow/Cloud Composer, Pub/Sub, Cloud Spanner, Cloud Functions, Cloud Shell, DBT (data build tool), Data Modeling, Data Mining, Data Cleaning, Data Visualization, DAGs, YARN, OLAP, OLTP, Avro, Parquet, ORC, JSON, Apache Beam, Power BI/Looker, IAM policies, ETL, Oracle, SQL, Teradata, Talend, Docker, Kubernetes, Kafka, Jenkins, Python, Scala, Java, PySpark, Hadoop, Hive, HBase, HDFS, Unit Testing, MapReduce, Sqoop, Unix, Shell Scripting, Agile, Git, Bitbucket.

Azure Data Engineer
Macy's | Jun '17 - Mar '19 | New York, United States
Responsibilities:
- Developed and managed ETL pipelines using Azure Data Factory and Azure Databricks, enabling efficient data transformation and integration across various business units.
- Implemented data workflows utilizing Azure Data Lake and Blob Storage, improving data availability for analytics and reducing data retrieval time.
- Designed and deployed data models on Snowflake and Azure SQL, supporting both transactional and analytical processing for business intelligence needs.
- Utilized Apache Kafka to manage real-time data streams, ensuring seamless integration between applications and maintaining high data accuracy.
- Built interactive Power BI dashboards to visualize key metrics and KPIs, enabling stakeholders to make informed business decisions.
- Created Python scripts for data cleaning and transformation tasks, enhancing data quality and streamlining preprocessing steps.
- Leveraged Azure Databricks and Spark to process large datasets, improving transformation speed and reducing processing costs (a minimal sketch follows this list).
- Configured GitHub for version control, managing code changes effectively and collaborating with cross-functional teams for seamless integration.
- Developed shell scripts to automate routine tasks, reducing manual errors and improving data pipeline reliability.
- Worked on Azure SQL MI and Cosmos DB to store and manage structured and unstructured data, providing scalable and reliable data storage solutions.
- Collaborated with Azure DevOps for continuous integration and deployment, automating the deployment of data solutions and reducing release cycle time.
- Utilized Jenkins and Ansible to automate deployment processes, ensuring smooth delivery of data applications and eliminating manual configuration errors.
- Employed MapReduce and HDFS for distributed data processing, enabling efficient handling of large-scale data within a Cloudera environment.
- Created U-SQL jobs in Scope Studio to process and analyze big data, integrating disparate data sources for meaningful insights.
- Used SQL Server and Azure SQL to develop and optimize complex queries for data extraction, ensuring high performance and quick data retrieval for reports.
- Developed Azure Function Apps and Web Apps to handle event-driven data processing, improving the overall responsiveness and reliability of data solutions.
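The Databricks/Spark cleanup-and-load step referenced above can be sketched with a short PySpark snippet. The storage account, container, and target table names are hypothetical placeholders, and ADLS Gen2 credentials plus the target schema are assumed to already be configured on the cluster.

```python
# Minimal sketch of a Databricks-style PySpark cleaning job; names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls_clean_load").getOrCreate()

# ADLS Gen2 path for a raw landing container (placeholder storage account);
# authentication is assumed to be configured at the cluster/workspace level.
raw_path = "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/"

df = (
    spark.read.option("header", "true").csv(raw_path)
    .dropna(subset=["order_id"])  # drop rows missing the business key
    .withColumn("order_ts", F.to_timestamp("order_ts"))
)

# Persist as a managed table (existing "analytics" schema assumed) for
# downstream Power BI reporting.
df.write.mode("overwrite").saveAsTable("analytics.clean_sales")
```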
Environment: Azure DevOps, Git, Snowflake, Maven, Jira, Apache Kafka, Azure, Python, Power BI, Unix, SQL Server, Cosmos, Scope Studio, U-SQL, C#, Azure Databricks, GitHub, Iris Studio, Cauce, Kensho, SharePoint, Windows 10, Red Hat Linux, MS Azure, Jenkins, Ansible, Shell Scripting, Azure Data Factory, Azure Data Lake, Spark, Hive, HBase, Sqoop, Flume, ADF, Blob, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, ACR, Azure Function App, Azure WebApp, Azure SQL, Azure SQL MI, SSH, YAML, WebLogic.

Data Analyst
Walking Tree Technologies | Aug '14 - Feb '17
Responsibilities:
- Developed and maintained ETL processes using Informatica 6.1, ensuring seamless data integration and migration across multiple systems.
- Utilized DataFlux for data quality management, implementing data cleansing and validation processes to enhance data accuracy and reliability.
- Created and optimized complex SQL queries to extract insights from large datasets stored in Oracle 9i and Teradata databases (a small profiling sketch appears after the Skills section).
- Conducted data analysis and data profiling using TOAD and PL/SQL to identify patterns, trends, and anomalies, supporting informed decision-making.
- Collaborated with business stakeholders to gather and document data requirements, ensuring alignment with business objectives and strategies.
- Generated reports and dashboards using Quality Center 7.2 to visualize data findings and present actionable insights to management and stakeholders.
- Designed and implemented data models to support data warehousing and reporting needs, improving data accessibility for business users.
- Conducted root cause analysis on data issues, using flat files and database logs to troubleshoot and resolve discrepancies and errors.
- Worked closely with cross-functional teams to ensure data integrity and consistency across various sources and systems.
- Implemented data governance and data quality best practices to ensure compliance with industry standards and regulatory requirements, maintaining high data quality standards.
Environment: Quality Center 7.2, SQL, TOAD, PL/SQL, Flat Files, Teradata, Informatica 6.1, DataFlux, Oracle 9i.

SKILLS
Cloud Technologies: AWS services (EC2, S3, Auto Scaling, AWS Glue, AWS Data Pipeline), Snowflake, GCP (BigQuery, BigTable, DataFlow, DataProc, GCS, GCS buckets), Azure (Data Factory, Databricks, Data Lake, Cosmos DB, SQL, Blob, Function App, WebApp), Cloud Composer, Cloud Spanner, Cloud Functions, Cloud Shell, Azure DevOps, MS Azure, ACR
Data Processing: Hadoop (HDFS, MapReduce), Spark, Databricks, Talend, Airflow/Cloud Composer, Apache Beam, Pig, Sqoop, Flume, Impala, Hive, HBase, NiFi, Kafka, DataStream, YARN
Programming: Python, Scala, Java, Shell Scripting, PowerShell, C#, U-SQL
Databases: DynamoDB, PostgreSQL, MongoDB, Cassandra, Oracle (9i, SQL, PL/SQL), Teradata, SQL Server, Cosmos DB, Azure SQL, Azure SQL MI
ETL & Integration: Informatica 6.1, Talend, DBT (Data Build Tool), ETL, Data Modeling, Data Mining, Data Cleaning, Data Visualization
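Finally, the SQL-based data profiling described in the Data Analyst role above can be sketched from Python. cx_Oracle is an assumed client library (the original work used TOAD and PL/SQL directly), and the connection details and the orders table are hypothetical placeholders.

```python
# Illustrative sketch of a simple data-profiling query against Oracle;
# connection details and the "orders" table are hypothetical.
import cx_Oracle

conn = cx_Oracle.connect("analyst", "example_password", "example-host/ORCLPDB1")

# Profile row counts, distinct keys, and null rates for a sample table.
profile_sql = """
    SELECT COUNT(*)                                            AS total_rows,
           COUNT(DISTINCT customer_id)                         AS distinct_customers,
           SUM(CASE WHEN order_date IS NULL THEN 1 ELSE 0 END) AS null_order_dates
    FROM orders
"""

with conn.cursor() as cur:
    cur.execute(profile_sql)
    total_rows, distinct_customers, null_order_dates = cur.fetchone()
    print(f"rows={total_rows}, distinct customers={distinct_customers}, "
          f"null order dates={null_order_dates}")

conn.close()
```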