Suman Tyagi
Senior Data Engineer
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
Visa Status: H1B

Summary
- A technology enthusiast with a knack for finding innovative solutions to complex problems.
- More than 12 years of IT experience; expert in data and ETL.
- Experienced in building generic ETL frameworks using traditional and cutting-edge Big Data and cloud platforms.
- Worked extensively with AWS Redshift, AWS CDK and AWS Glue.
- Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage and Cloud Composer.
- Good knowledge of cloud technologies such as Azure, AWS and Databricks (EMR, S3, Redshift, EC2, DynamoDB).
- Developed Python scripts to transfer data from on-premises systems to AWS S3 (a brief sketch follows this summary).
- Designed and implemented data extraction from an SFTP server to an S3 bucket using Maven/Spring Boot.
- Proficient in writing optimized SQL in Oracle, DataStage, Hive, Impala and HDFS.
- Strong communication and leadership skills; experienced in working directly with customers and business users.
- Passionate about mentoring and leading young technologists and bringing out the best of their potential.
- A senior ETL engineer with technical proficiency in data engineering teamed with data analysis, BI, data modelling, and ETL/ELT design and development, with full-life-cycle implementation of data warehouses.
- Used Informatica PowerCenter for extraction, transformation and loading (ETL) of data from numerous sources such as flat files, XML documents and databases.
- Practical understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
- Hands-on experience with normalization (1NF, 2NF, 3NF and BCNF) and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
- Understanding of Snowflake cloud technology.
- Experience integrating data to/from on-premises and cloud-based databases using Informatica Intelligent Cloud Services (IICS/IDMC).
- Expert in automating work by writing Python and shell scripts.
- Ability to architect a data engineering/ETL/ELT solution and data conversion strategy; project experience developing complex SQL queries and dimensional data models; strong communication skills for working with clients, business users and team members.
- Extensive experience using SQL and creating stored procedures, complex joins, aggregate functions, materialized views and indexes in Oracle.
- Strong database skills in IBM DB2, Oracle and DataStage, including constraints, indexes, views, stored procedures, triggers and cursors.
- Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Worked on AWS services such as API Gateway, Lambda and Elasticsearch.
- Good understanding of and exposure to Python programming.
- Experience monitoring and supporting EOD jobs across all environments, i.e., Dev, QC, Pre-Prod and Production.
- Extensive experience managing teams: onsite/offshore coordination, requirement analysis, code reviews, implementing standards and production release management.
- Worked with a wide variety of sources such as relational databases, flat files, mainframes and XML files, and scheduling tools such as Control-M.
- Experienced developer with good knowledge of Unix shell scripting, Linux and Bash; experienced in automation through shell scripting.
- Involved in many AWS CI/CD implementations.
- Acquainted with Agile and Waterfall methodologies.
- Responsible for handling several client-facing meetings with strong communication skills.
- Excellent working and leading skills in an onsite-offshore model.
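The on-premises-to-S3 transfer mentioned in the summary could look roughly like the minimal boto3 sketch below; the local directory, bucket and key prefix are hypothetical placeholders, not details from the actual project.

    # Minimal sketch, not the project code: copy local files to S3 with boto3.
    # The directory, bucket and prefix below are illustrative placeholders.
    import os
    import boto3

    def upload_directory(local_dir: str, bucket: str, prefix: str) -> None:
        """Upload every file under local_dir to s3://bucket/prefix/, keeping relative paths."""
        s3 = boto3.client("s3")
        for root, _dirs, files in os.walk(local_dir):
            for name in files:
                local_path = os.path.join(root, name)
                key = f"{prefix}/{os.path.relpath(local_path, local_dir)}"
                s3.upload_file(local_path, bucket, key)
                print(f"uploaded {local_path} -> s3://{bucket}/{key}")

    if __name__ == "__main__":
        upload_directory("/data/exports", "example-landing-bucket", "onprem/exports")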
Technical Skills
Programming Languages: Python, PySpark, Pandas, SQL, PL/SQL, Unix Shell Scripting, Batch Scripting, PowerShell
Data Engineering Tools/Platforms: Spark, Spark SQL, Google Cloud Storage, Amazon S3, Hadoop, Hive, Sqoop, Oracle, Informatica
Cloud: AWS, Azure (entry level)
Infrastructure & Orchestration: Google Cloud Platform (GCP), Kubernetes, Docker, Apache Airflow, Control-M, Autosys, Jenkins, YARN
Operating Systems: Windows, UNIX & Linux
Hadoop Scheduling & Monitoring Tools: Airflow, ASG Zena, Autosys, Oozie & Control-M
Databases: Oracle, SQL Server, Hive
Other: Git, GitLab, Bitbucket, Jira, Confluence, MS SharePoint, HP ALM, Tableau, Toad

Education
B.Tech (Computer Science & Engineering), RTU Jaipur, Jul 2008 to Jul 2012

Experience Details:

Company: Deutsche Bank, Feb 2023 to present
Project Name: SIMS/SIMS Proteus
Location: USA
Role: Lead Data Engineer
Responsibilities:
- Led the Hadoop migration effort, creating frameworks to transfer assets from on-premises Hadoop to GCP.
- Inspected and analyzed existing Hadoop environments for the migration to Google Cloud and was involved in the full project life cycle, from documentation to development.
- Extracted data from various sources using Sqoop and Data Factory and worked on migrating large volumes of data to Google Cloud Storage.
- Built datasets and tables in BigQuery and loaded data from Cloud Storage.
- Implemented Airflow DAGs for task execution and monitoring (a sketch of this kind of DAG follows this section).
- Converted and modified Hive queries for use in BigQuery and performed data cleaning on unstructured information using various tools.
- Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses for the use of included legacy assets.
- Worked closely with the team to identify, capture and communicate issues and risks promptly.
- Contributed ideas and suggestions in team meetings and delivered updates on deadlines, designs and enhancements.
- Collaborated with cross-functional development team members to analyze potential system solutions based on evolving client requirements.
- Developed highly maintainable Hadoop code and followed all coding best practices.
- Working as L3 support for production stability.
- Used AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Worked on AWS services such as API Gateway, Lambda, Elasticsearch, AWS CDK and Glue.
- Worked on the Palantir platform for managing data.
- Worked on Aerospike for better performance.
- Worked on Tableau.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
- Deployed and managed Scala-based microservices on cloud platforms such as AWS.
- Implemented data processing pipelines using Scala and Apache Spark.
- Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to Azure.
- Developed real-time data streaming pipelines using Apache Kafka to process and transport data.
- Involved in successfully loading files to Hive, Impala and HDFS from SoapUI using Python.
- Developed custom Airflow operators in Python to generate CSV files from SQL Server and Oracle databases and load them into GCS.
- Used AWS Data Pipeline for data extraction, transformation and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using Python's Matplotlib library.
- Worked on different formats such as JSON and XML and performed data analysis in Python.
- Designed and implemented data extraction from MySQL Workbench to an S3 bucket.
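For illustration only, a GCS-to-BigQuery load of the kind described in this section could be wired up as the small Airflow DAG below; the bucket, project, dataset and table names are assumed placeholders, and the DAG is a sketch rather than the project's actual pipeline.

    # Minimal sketch of an Airflow DAG loading CSV extracts from GCS into BigQuery.
    # Bucket, project, dataset and table names are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="gcs_to_bigquery_daily_load",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_orders = GCSToBigQueryOperator(
            task_id="load_orders_csv",
            bucket="example-landing-bucket",          # placeholder GCS bucket
            source_objects=["exports/orders/*.csv"],  # CSVs produced by upstream extract jobs
            destination_project_dataset_table="example-project.staging.orders",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",       # full reload on each run
            autodetect=True,
        )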
Company: Zensar, Jul 2019 to Nov 2022
Project Name: MCBS
Location: India
Role: Data Engineer
Responsibilities:
- Interacted with end customers and gathered requirements for designing and developing a common architecture for storing retail data within the enterprise and building a data lake in the Azure cloud.
- Developed Geo Tracker applications using PySpark to integrate data coming from other sources such as FTP and CSV files, processed using Azure Databricks and written into Snowflake.
- Developed Spark applications for data extraction, transformation and aggregation from multiple systems, stored on Azure Data Lake Storage, using Azure Databricks notebooks.
- Worked on Spark with Scala and converted it into PySpark code for Geo Tracker.
- Wrote unzip and decode functions using Spark with Scala and parsed the XML files into Azure Blob Storage.
- Developed PySpark scripts to ingest data from source systems such as Azure Event Hub in reload, append and merge mode into Delta tables in Databricks (a sketch of a merge-mode load follows this project).
- Optimized PySpark applications on Databricks, which yielded a significant cost reduction.
- Created pipelines in ADF to copy Parquet files from an ADLS Gen2 location to an Azure Synapse Analytics data warehouse.
- Working as L3 support for production stability.
- Designed and developed a pipeline for processing millions of records from Redshift, Snowflake and MySQL and posting them into Marketo through Python (Postman) via HDFS (Hadoop).
- Involved in successfully loading files to Hive, Impala and HDFS from SoapUI using Python.
- Scheduled all jobs using Airflow scripts written in Python, adding different tasks to the DAGs and Lambda.
- Designed, deployed and managed highly available and scalable Apache Kafka clusters for real-time data streaming.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Palantir Foundry, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Environment: Azure ADF, Scala, PySpark, Spark, SQL, Snowflake, Databricks, GitHub, Azure Git, Kafka, ADLS Gen2, Azure Blob Storage
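As an illustration of the merge-mode Delta load referenced above, a minimal PySpark sketch might look like the following; the staging path, table name and join key are assumptions, not details from the MCBS project.

    # Minimal sketch of a merge (upsert) into a Delta table on Databricks.
    # Source path, target table and key column are hypothetical placeholders.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Staged batch of incoming records (placeholder location).
    updates = spark.read.parquet("/mnt/staging/customer_updates")

    target = DeltaTable.forName(spark, "bronze.customers")

    (
        target.alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()      # update rows that already exist
        .whenNotMatchedInsertAll()   # insert brand-new rows
        .execute()
    )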
Company: Virtusa, Nov 2016 to Jul 2019
Project Name: IFRS15
Location: India
Role: ETL Developer
Responsibilities:
- Gathered and analyzed requirements by handling client and business user interactions.
- Coordinated with the customer's business testing team, discussing technical problems, reported bugs, interface design, mapping designs and the implementation framework of the back-feed system.
- Created ETL design/technical documentation from functional specifications.
- Developed mappings and workflows, monitored sessions, and exported and imported Informatica objects under configuration management.
- Implemented complex business rules by creating transformations and reusable transformations (Expression, Aggregator, Filter, Connected and Unconnected Lookup, Router, Rank, Joiner, Update Strategy, XML, Union, Stored Procedure) and developing mapplets and mappings to cater to complex levels of business enrichment.
- Designed complex mappings involving constraint-based loading and target load order.
- Designed workflows with many sessions using decision, assignment, event-wait and event-raise tasks; used the Informatica scheduler to schedule jobs.
- Performance-tuned Informatica sources, targets, mappings, transformations and sessions.
- Prepared HLD, LLD, and functional and integration specifications.
- Designed, developed and deployed ETL and DB2/Oracle components.
- Performed debugging and unit testing at various levels of the ETL.
- Created several stored procedures to perform DML into audit tables for business decision-making, sourced via an open-source reporting tool.
- Extensive experience developing distributed data processing pipelines using Apache Spark with PySpark.
- Experience tuning PySpark jobs for performance, including optimizing resource utilization, caching and partitioning data efficiently (a brief sketch appears at the end of this document).
- Worked with various file formats such as Parquet, Avro, ORC and JSON within PySpark for efficient data storage and retrieval.

Company: Starsun Technology, Jul 2012 to Nov 2016
Project Name: IBIE PROF
Location: India
Role: ETL Developer
Responsibilities:
- Requirement gathering and analysis.
- Estimation using Function Point Analysis.
- Creation of source and target stages, mappings, sessions and related workflows.
- Unit testing of the designed code to meet the desired results and creation of test reports.
- Analysis and fixing of defects raised in system testing and User Acceptance Testing.
- Providing support and knowledge transfer to the Production and System Testing teams regarding the requirements.
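The PySpark tuning noted under the Virtusa project (caching, partitioning and columnar formats) is the kind of work sketched minimally below; the paths and column names are hypothetical placeholders rather than project specifics.

    # Minimal sketch of common PySpark tuning steps: columnar input, caching a reused
    # dataframe, and partitioned Parquet output. Paths and columns are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl_tuning_sketch").getOrCreate()

    orders = spark.read.parquet("/data/in/orders")  # columnar source keeps scan cost low

    # Cache a filtered dataframe that downstream aggregations reuse.
    recent = orders.filter(F.col("order_date") >= "2019-01-01").cache()

    daily_totals = (
        recent.groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Partition the output by date so later jobs can prune partitions.
    (
        daily_totals.repartition("order_date")
        .write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("/data/out/daily_totals")
    )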