Senior Data Engineer Resume - Irving, TX

Name: Bibek Yadav
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE

Professional Summary:
- Over 5 years of professional IT experience as a Senior Data Engineer, covering data analysis, design, coding, and development of data warehousing implementations across the retail, financial, and banking industries.
- Skilled in managing data analytics, data processing, machine learning, artificial intelligence, and data-driven projects.
- Proficient in handling and ingesting terabytes of streaming data (Kafka, Spark Streaming, Storm) and batch data, with automation.
- Experienced in managing hosting plans for Azure infrastructure and implementing and deploying workloads on Azure virtual machines (VMs).
- Skilled in data ingestion, extraction, and transformation using ETL processes with AWS Glue, Lambda, AWS EMR, and Azure Databricks.
- Proficient in designing scalable and efficient data architectures on Azure, leveraging services such as Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Synapse, and Power BI.
- Experience developing Python and shell scripts to extract, load, and transform data, with working knowledge of AWS Redshift.
- Expertise with tools in the Hadoop ecosystem, including Spark, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN, Oozie, and HBase.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Experience working with multi-cluster and virtual warehouses in Snowflake.
- Extensive experience migrating on-premises data into the cloud and implementing CI/CD pipelines with Jenkins, CodePipeline, Azure DevOps, Kubernetes, Docker, and GitHub.
- Experience automating data engineering pipelines following standards and best practices (appropriate partitioning, suitable file formats, incremental loads that maintain previous state).
- Experience designing and developing production-ready data processing applications in Spark using Scala and Python.
- Strong experience creating efficient Spark applications for data transformations such as cleansing, de-normalization, joins, and aggregation.
- Experience fine-tuning Spark applications using broadcasting, increased shuffle parallelism, caching/persisting DataFrames, and appropriate executor sizing to use cluster resources effectively (see the sketch following this summary).
- Experience manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience migrating data between HDFS and relational database systems using Sqoop.
- Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries for Oracle.
- Experience in support activities such as troubleshooting, performance monitoring, and resolving production incidents.
- Experienced in agile approaches, including Extreme Programming, Test-Driven Development, and Scrum.
- Ability to work closely with teams to ensure high-quality and timely delivery of builds and releases.
- Excellent communication skills, with the ability to understand technical and non-technical requirements.
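
As an illustration of the Spark tuning techniques listed in the summary, a minimal PySpark sketch follows; the table names, S3 paths, and configuration values are hypothetical rather than taken from any project below.

    # Minimal PySpark sketch of the tuning techniques listed above.
    # Table names, paths, and the shuffle-partition value are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("tuning-sketch")
        .config("spark.sql.shuffle.partitions", "400")   # raise shuffle parallelism
        .config("spark.executor.memory", "8g")           # size executors to the cluster
        .config("spark.executor.cores", "4")
        .getOrCreate()
    )

    orders = spark.read.parquet("s3://example-bucket/orders/")   # large fact table
    stores = spark.read.parquet("s3://example-bucket/stores/")   # small dimension

    # Broadcast the small dimension to avoid shuffling the large fact table.
    enriched = orders.join(F.broadcast(stores), "store_id")

    # Cache a DataFrame that is reused by several downstream aggregations.
    enriched.cache()

    daily = enriched.groupBy("store_id", "order_date").agg(F.sum("amount").alias("daily_sales"))
    daily.write.mode("overwrite").partitionBy("order_date").parquet("s3://example-bucket/daily_sales/")

Broadcasting only helps when the dimension fits in executor memory, and caching pays off only when the enriched DataFrame feeds more than one downstream aggregation.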

Technical Skills:
Big Data Systems: Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP), Cloudera Hadoop, Hortonworks Hadoop, Apache Spark, Spark Streaming, Apache Kafka, Hive, Amazon S3, AWS Kinesis
Databases: Cassandra, HBase, DynamoDB, MongoDB, BigQuery, SQL, Hive, MySQL, Oracle, PL/SQL, RDBMS, AWS Redshift, Amazon RDS, Teradata, Snowflake
Programming & Scripting: Python, Scala, PySpark, SQL, Java, Bash
ETL Data Pipelines: Apache Airflow, Sqoop, Flume, Apache Kafka, DBT, Pentaho, SSIS
Visualization: Tableau, Power BI, QuickSight, Looker, Kibana
Cluster Security: Kerberos, Ranger, IAM, VPC
Cloud Platforms: AWS, GCP, Azure
CI/CD Tools: Jenkins, GitHub, GitLab
Operating Systems: Windows, Linux, Unix, Mac OS X

Professional Experience:

Client: American Airlines, Dallas, TX    Jul 2023 - Till Date
Role: Data Engineer
Responsibilities:
- Responsible for reliability processes that increase efficiency, eliminate downtime, and maintain performance at scale across all platforms and environments.
- Built ETL (Extract, Transform, Load) pipelines from the data lake to different databases based on requirements.
- Developed complex, maintainable, easy-to-use Python and Scala code for application requirements, data processing, and analytics using built-in libraries in Azure Databricks.
- Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run the workflows (see the DAG sketch after this section).
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Built and managed data pipelines using Azure Data Factory and Azure Databricks, ensuring efficient and reliable data processing and analysis workflows.
- Wrote real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Created and maintained user accounts and roles on Jira, MySQL, and production and staging servers.
- Involved in data architecture, data profiling, data analysis, data mapping, and the design of data architecture artifacts.
- Created data models and schema designs for Snowflake data warehouses to support complex analytical queries and reporting.
- Migrated data from AWS Redshift to properly partitioned datasets on AWS S3.
- Developed multiple Kafka producers and consumers per the software requirement specifications.
- Used the Kafka pub-sub model to track real-time events in the data records and trigger data orchestration processes.
- Involved in all phases of data mining: data collection, data cleaning, model development, validation, and visualization.
- Created SSIS packages using control flow tasks such as Data Flow Task, Execute SQL Task, Sequence Container, For Each Loop Container, Send Mail Task, and Analysis Services Processing Task.
- Converted SQL queries into Spark transformations using Spark RDDs, Python, PySpark, and Scala.
- Created several types of data visualizations using Python and Tableau.
- Manipulated, cleansed, and processed data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
- Implemented several DAX functions for fact calculations to support efficient data visualization in Power BI and optimized the DAX queries.
- Automated advanced SQL queries and ETL techniques using Apache Airflow to reduce repetitive weekly administration tasks.
- Extracted data from sources such as SQL Server databases, SQL Server Analysis Services cubes, and Excel, and loaded it into the target MS SQL Server database.
- Developed and implemented software release management strategies for various applications in line with the agile process.
- Participated in daily stand-up meetings to update the project status with the internal Dev team.
Environment: Kafka, Spark, AWS, Azure, Python, Scala, Airflow, ETL, SSIS, Redshift, Data Factory, Databricks, Jira, SQL, Snowflake, Power BI, Data Cleaning, Data Profiling, Data Mining, Windows.
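
Referenced above: a minimal sketch of what a daily Airflow DAG for the S3-to-Snowflake load could look like. The DAG id, schedule, and the copy_s3_to_snowflake() helper are hypothetical placeholders, not the project's actual code.

    # Minimal Airflow DAG sketch for a daily S3-to-Snowflake load.
    # The DAG id, schedule, and copy_s3_to_snowflake() helper are hypothetical.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def copy_s3_to_snowflake(**context):
        # Placeholder: in practice this step would issue a COPY INTO statement
        # against the Snowflake stage (or use an equivalent provider operator).
        pass


    default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="s3_to_snowflake_daily",        # hypothetical DAG id
        start_date=datetime(2023, 7, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        load_task = PythonOperator(
            task_id="copy_s3_to_snowflake",
            python_callable=copy_s3_to_snowflake,
        )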

Client: Vanguard, Pennsylvania, PA    Jan 2023 - Jun 2023
Role: Data Engineer
Responsibilities:
- Designed and built a data pipeline to consolidate similar products without using unique IDs, using Word2Vec, Spark, Snowflake, and Airflow.
- Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive.
- Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
- Developed ETL workflows using Azure Data Factory, Azure Synapse Analytics, and Azure Logic Apps to efficiently load large data sets into the data warehouse.
- Leveraged SQL scripting for data modeling, enabling streamlined data querying and reporting and improving insights into customer data.
- Collaborated with end users to resolve data and performance-related issues during the onboarding of new users.
- Developed Airflow pipelines to efficiently load data from multiple sources into Redshift and monitored job schedules.
- Migrated data from Teradata to AWS, improving data accessibility and cost efficiency.
- Migrated reports and dashboards from OBIEE to Power BI.
- Assisted multiple users from the data visualization team in connecting to Redshift from Power BI, Power Apps, Excel, Spotfire, and Python.
- Developed Tableau connections to core and peripheral data sources such as flat files, Microsoft Excel, Tableau Server, Amazon Redshift, and Microsoft SQL Server to analyze complex data.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the streaming sketch after this section).
- Used Spark SQL and the DataFrame API to load structured and semi-structured data from MySQL tables into Spark clusters.
- Implemented ETL processes to transform and cleanse data as it moves between MySQL and NoSQL databases.
- Leveraged PySpark for data manipulation, aggregation, and filtering to prepare data for further processing.
- Joined, manipulated, and drew actionable insights from large data sources using Python and SQL.
- Developed PySpark ETL pipelines to cleanse, transform, and enrich raw data.
- Ingested large data streams from company REST APIs into an EMR cluster through AWS Kinesis.
- Streamed data from AWS fully managed Kafka brokers using Spark Streaming and processed the data using explode transformations.
- Created data models and schema designs for Snowflake data warehouses to support complex analytical queries and reporting.
- Built data ingestion pipelines (Snowflake staging) from disparate sources and data formats to enable real-time data processing and analysis.
- Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
- Finalized the data pipeline using DynamoDB as a NoSQL storage option.
- Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.
- Planned and executed data migration strategies for transferring data from legacy systems to MySQL and NoSQL databases.
- Actively participated in scrum meetings, reporting progress and maintaining good communication with team members and managers.
Environment: Data Warehouse, Airflow, Kafka, Spark, MapReduce, Hadoop, Snowflake, Hive, Azure, PySpark, Docker, Kubernetes, AWS, MongoDB, CI/CD, Tableau, Redshift, Power BI, REST APIs, Teradata, GCP, Windows.
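
Referenced above: a minimal PySpark Structured Streaming sketch of the Kafka-to-Parquet flow. The broker address, topic, schema, and HDFS paths are hypothetical, and the project itself may have used the older DStream (RDD) API.

    # Minimal PySpark Structured Streaming sketch of the Kafka-to-Parquet flow above.
    # Broker address, topic name, schema, and HDFS paths are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", StringType()),
    ])

    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "transactions")
        .load()
    )

    # Kafka delivers the payload as bytes; parse the JSON value into columns.
    parsed = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("data")).select("data.*")

    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/transactions/")
        .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()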

Client: Western Union, Milwaukee, WI    Nov 2020 - Aug 2022
Role: Data Engineer
Responsibilities:
- Participated in daily agile stand-up meetings, updating the internal Dev team on project status, and collaborated through Palantir Foundry for real-time data management and decision-making.
- Designed and developed a web-based BI application for performance analytics.
- Orchestrated Airflow workflows in a hybrid cloud environment, from local on-premises servers to the cloud.
- Wrote shell FTP scripts for migrating data to AWS S3.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate job types such as Java MapReduce, Hive, Pig, and Sqoop.
- Produced transformation scripts using Scala and Java.
- Installed, configured, and monitored an Apache Airflow cluster.
- Converted HiveQL into Spark transformations using Spark RDDs and Scala.
- Developed an API to write XML documents from a database; used XML and XSL transformations for dynamic web content and database connectivity.
- Wrote shell scripts to orchestrate execution of other scripts and move data files into and out of HDFS.
- Created Spark applications using PySpark and Spark SQL for extracting, transforming, and aggregating data from multiple file formats, uncovering valuable insights into customer usage patterns.
- Designed Python-based notebooks for automated weekly, monthly, and quarterly reporting ETL.
- Migrated various Hive UDFs and queries into Spark SQL for faster execution (see the Spark SQL sketch after this section).
- Designed the backend database and AWS cloud infrastructure for maintaining company proprietary data.
- Used Sqoop to import data from Oracle to Hadoop.
- Wrote SQL scripts on the final database to prepare data for visualization with Tableau.
- Created Airflow scheduling scripts in Python to automate data pipelines and data transfers.
- Implemented AWS fully managed Kafka streaming to send data streams from company APIs to a Spark cluster in AWS Databricks, along with Redshift, Glue, and Lambda/Python.
- Configured AWS Lambda to trigger parallel cron-scheduled jobs for scraping and transforming data.
- Used Cloudera Manager to install and manage a multi-node Hadoop cluster.
- Scheduled jobs using Control-M.
Environment: Hadoop, Hive, Sqoop, Apache Spark, Kafka, Redshift, Azure Databricks, Airflow, Python, Scala, Cloudera Manager, Shell, Glue, Tableau, Windows.
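
Referenced above: a minimal sketch of running a former HiveQL aggregation through Spark SQL. The database, table, and column names are hypothetical.

    # Minimal PySpark sketch of running a former HiveQL aggregation through Spark SQL.
    # The database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-to-spark-sql")
        .enableHiveSupport()          # read tables registered in the Hive metastore
        .getOrCreate()
    )

    usage = spark.sql("""
        SELECT customer_id,
               date_format(event_time, 'yyyy-MM') AS usage_month,
               COUNT(*)                           AS events,
               SUM(bytes_transferred)             AS total_bytes
        FROM analytics.usage_events
        GROUP BY customer_id, date_format(event_time, 'yyyy-MM')
    """)

    # Persist the aggregate for downstream reporting (e.g. Tableau extracts).
    usage.write.mode("overwrite").parquet("hdfs:///reports/monthly_usage/")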

Client: Verisk, Nepal    Apr 2018 - Oct 2020
Role: Data Engineer
Responsibilities:
- Handled data transformations based on the requirements.
- Created an error reprocessing framework to handle errors during subsequent loads.
- Handled application log data by creating custom loggers.
- Designed and built a custom, generic ETL framework as a Spark application using Scala.
- Configured Spark jobs for weekly and monthly execution using AWS Data Pipeline.
- Executed queries using Spark SQL for complex joins and data validation.
- Developed complex transformations and mapplets using Informatica to extract, transform, and load data into data marts, the enterprise data warehouse (EDW), and the operational data store (ODS).
- Created an SSIS package to get the dynamic source filename using a For Each Loop Container.
- Used data flow transformations in SSIS such as Lookup, Merge, Data Conversion, and Sort.
- Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS.
- Performed ETL on data from various file formats (JSON, Parquet, and database sources); see the sketch after this section.
- Created independent components for AWS S3 connections and extracted data into Redshift.
- Wrote Scala scripts to extract data from Cassandra operational data store tables for comparison with legacy system data.
- Worked on a data ingestion file validation component covering threshold levels, last-modified timestamps, and checksums.
- Leveraged OLAP tools, including ETL, data warehousing, and modeling, to extract, transform, and load data between SQL Server and Oracle databases, employing Informatica/SSIS for seamless data integration.
- Actively participated in meetings with user groups to analyze requirements and provide recommendations for design and specification enhancements, ensuring solutions aligned with user needs.
Environment: Spark, Scala, AWS, S3, Cassandra, Redshift, Shell scripting, SSIS, Kafka, OLAP, Informatica, ETL.
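
Referenced above: a minimal sketch of an ETL step over mixed file formats, staged to S3 ahead of a Redshift COPY. The resume describes this framework in Scala; PySpark is used here for consistency with the earlier sketches, and all paths and column names are hypothetical.

    # Minimal PySpark sketch of an ETL step over mixed file formats, staged to S3 for a
    # later Redshift COPY. Paths and columns are hypothetical; the original framework
    # was written in Scala.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mixed-format-etl").getOrCreate()

    claims_json = spark.read.json("s3://example-bucket/raw/claims_json/")
    claims_parquet = spark.read.parquet("s3://example-bucket/raw/claims_parquet/")

    # Align the two sources on a shared set of columns before combining them.
    columns = ["claim_id", "policy_id", "claim_amount", "claim_date"]
    combined = claims_json.select(columns).unionByName(claims_parquet.select(columns))

    # Basic cleansing: drop duplicates and records missing a claim id.
    cleaned = combined.dropDuplicates(["claim_id"]).filter(F.col("claim_id").isNotNull())

    # Stage as Parquet on S3; a separate component would issue the Redshift COPY.
    cleaned.write.mode("overwrite").parquet("s3://example-bucket/staged/claims/")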
