Name: Candidate's Name
Contact: PHONE NUMBER AVAILABLE Email: EMAIL AVAILABLE
Professional Summary:
8+ years of IT experience implementing end-to-end Data Warehouse solutions, with hands-on experience in Big Data Engineering and Cloud Engineering.
Experience using Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, Big Data technologies (Hadoop and Apache Spark), and Databricks.
Experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family.
Excellent understanding of Hadoop architecture and its components, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
Extensive knowledge of developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) with Scala, PySpark, and Spark-Shell.
Configured Spark Streaming to receive real-time data from messaging systems like Apache Kafka (see the streaming sketch following this summary).
Experience with different file formats such as Avro, Parquet, JSON, and XML.
Expertise in Python, AWS Cloud, Azure Data Lake, GCP, data modeling, data warehousing, BI reporting, data engineering, feature engineering, big data, and ETL/ELT, specializing in AWS and Azure frameworks, Cloudera, and the Hadoop ecosystem.
Hands-on experience creating tables, views, and stored procedures in Snowflake.
Experience using job monitoring and scheduling tools like ZooKeeper and Oozie.
Experience with the AWS Cloud platform and its services, including EC2, S3, EBS, VPC, ELB, IAM, Glue, Crawler, Spectrum, SNS, Auto Scaling, Lambda, CloudWatch, CloudTrail, and CloudFormation.
Experience with cloud deployment and managing configurations with tools like Puppet and Google Cloud Platform.
Experience building ETL scripts in different languages and tools such as PL/SQL, Informatica, Hive, Pig, and PySpark, with expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
Strong experience writing Pig scripts, Hive queries, and Spark SQL queries to analyze large datasets and troubleshoot errors, with expertise in loading and transforming large sets of structured, semi-structured, and unstructured data.
Experience with PySpark, using Spark libraries through Python scripting for data analysis.
Good understanding of data modelling concepts (dimensional and relational) such as Star Schema modelling, Snowflake Schema modelling, and Fact and Dimension tables.
Sound knowledge of designing data warehousing applications using Teradata, Oracle, and SQL Server.
Experience maintaining and managing Microsoft Power BI and Tableau reports and dashboards and publishing them to end users for executive-level business decisions.
Experience developing rich interactive Tableau visualizations using Heat Maps, Tree Maps, Bubble charts, Histograms, Pareto, Bullet, Donut, Waterfall, Highlight tables, Maps, Line and Bar charts.
Extensive experience working with Informatica PowerCenter, SAS, and SSIS.
Working knowledge of Blue-Green deployments to maintain zero downtime during new production releases; maintained branching standards in Git-Flow and worked in Agile/Scrum/Waterfall models.
Experience in the development and design of scalable systems using Hadoop technologies in various environments.
Extensive experience analyzing data using Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
Experience using Python for data engineering and modeling.
Leveraged and integrated Google Cloud Storage and BigQuery applications, which connected to Tableau for end-user web-based dashboards and reports.
Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as Star schema and Snowflake schema (Fact tables, Dimension tables) used in relational, dimensional, and multidimensional modeling.
Seeking to leverage technical skills and industry expertise to contribute to innovative data projects.
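A minimal PySpark Structured Streaming sketch in the spirit of the Kafka ingestion described in the summary above; the broker address, topic name, payload schema, and output paths are illustrative assumptions, not details from any specific engagement.

```python
# Illustrative sketch only: broker, topic, schema, and paths are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Expected JSON payload on the (hypothetical) "events" topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker address
    .option("subscribe", "events")                       # assumed topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON body.
parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json_str")
    .select(from_json(col("json_str"), event_schema).alias("e"))
    .select("e.*")
)

# Land the stream as Parquet, with checkpointing for fault-tolerant file output.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/landing/events/")        # assumed path
    .option("checkpointLocation", "s3a://example-bucket/chk/events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```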
Technical Skills:
Professional Experience:
Client: CVS Health, Irving, TX Mar 2022 - Till Date
Sr. Data Engineer
Responsibilities:
Supporting migration of a Hadoop cluster, including migration of feeds, ETL processes (Ab Initio), scripts, and Autosys jobs.
Managing and deploying Azure solutions and configuring Azure services to meet the needs of the organization.
Designing and managing swap space in the cloud environment.
Designing robust code from the point of view of performance, reuse, and supportability, with proper controls, consistent with best practices and appropriate documentation.
Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
Strong development experience with notable BI reporting tools (Oracle BI Enterprise Edition (OBIEE)).
Strong analytical skills; enjoy working with large, complex data sets.
Data migration from Informatica PowerCenter to Amazon Web Services (AWS).
Participated in data acquisition with the Data Engineering team to extract historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.
Developed ETL solutions using Spark SQL in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats and data sources.
Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including the write-back tool and backwards.
Worked on migration of a legacy Data Warehouse from Oracle/SQL Server/Netezza to a central Data Lake on AWS cloud/Snowflake using Talend and Informatica.
Developed a PySpark framework to bring data from DB2 to Amazon S3 (see the sketch following this section).
Designed and developed Business Intelligence applications using Azure SQL and Power BI.
Monitoring and optimizing Snowflake performance.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats to uncover insights into customer usage patterns.
Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
Developed and automated data pipelines for acquiring Bloomberg terminal data via API integrations, ensuring seamless data ingestion and transformation into enterprise data platforms for downstream analysis.
Ensured data accuracy and integrity by implementing data quality checks and validation processes during the transformation of Bloomberg datasets, ensuring compliance with financial regulations and business standards.
Developed and maintained ETL workflows using AWS Step Functions and other workflow tools.
Built ETL/ELT pipelines with data technologies such as PySpark, Hive, Presto, and Databricks.
Traced and catalogued data processes, transformation logic, and manual adjustments to identify data governance issues; linked data lineage to data quality and business glossary work within the overall data governance program.
Implemented ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
Environment: Python, PyMySQL, Informatica PowerCenter, MySQL, Linux, SQL, RDBMS, Agile, Scrum, Git, AWS, Hadoop, Spark/Scala, Sqoop, Pig, Flume, Hive, MapReduce, HDFS, Azure, Databricks, DBT.
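As one possible shape of the DB2-to-S3 PySpark framework mentioned in this section, here is a minimal sketch; the JDBC URL, credentials, table, and bucket names are placeholder assumptions rather than project details.

```python
# Sketch of a JDBC extract from DB2 landed to S3 as Parquet.
# Connection details, table, and bucket below are placeholders, not real values.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("db2-to-s3")
    # The DB2 JDBC driver jar must be on the classpath (e.g. supplied via --jars).
    .getOrCreate()
)

db2_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")   # assumed host/database
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SCHEMA.CUSTOMER_TXN")              # assumed source table
    .option("user", "etl_user")                            # assumed credentials
    .option("password", "REDACTED")
    .option("fetchsize", "10000")
    .load()
)

# Partition the output by load date so downstream jobs can prune partitions.
(
    db2_df.write.mode("overwrite")
    .partitionBy("LOAD_DATE")
    .parquet("s3a://example-datalake/raw/customer_txn/")   # assumed bucket/prefix
)
```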
Client: Verizon, NYC, NY Aug 2021 - Feb 2022
Sr. AWS Data Engineer
Responsibilities:
Collaborated with architects and subject matter experts to review business requirements and build source-to-target data mapping documents and pipeline design documents.
Worked in an Agile environment with weekly sprints and daily Scrum meetings.
Used AWS Data Pipeline for data extraction, transformation, and loading from heterogeneous data sources.
Designed, developed, and deployed a convergent mediation platform for data collection and billing processes using Talend ETL.
Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Advance Auto data lake AWS S3 bucket.
Created an automation process for the Distribution group, which receives inventory and sales data and sends activation reports, using Talend and Redshift.
Wrote and executed various MySQL database queries from Python using the Python-MySQL connector and the MySQLdb package.
Worked on Python OpenStack APIs and used several Python libraries such as wxPython, NumPy, and matplotlib.
Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it.
Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
Experienced in data modeling, including designing and implementing data models using dbt.
Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3 and Snowflake.
Implemented AWS Lambda functions to run scripts in response to events in Snowflake tables or S3 buckets, and to HTTP requests via Amazon API Gateway (see the Lambda sketch following this section).
Used SQL for querying the database in a UNIX environment.
Worked on AWS Redshift and RDS to implement models and data on RDS and Redshift.
Migrated data from Snowflake to Redshift to support data science initiatives.
Created augmented data lakes using Snowflake.
Optimized Bloomberg data acquisition processes using API calls and data transformation techniques, reducing latency and providing stakeholders with faster access to up-to-date financial market data.
Designed, developed, and maintained scalable ETL data pipelines in Databricks, integrating data from diverse sources such as databases, APIs, cloud storage, and streaming data into the Databricks platform.
Used Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
Improved performance by tuning Hive and MapReduce jobs.
Knowledge of MongoDB drivers for popular programming languages such as Python, Java, and Node.js.
Ability to optimize MongoDB performance by configuring indexes, sharding, and replication.
Created Talend quality-check joblets based on business requirements.
Created SQL queries for data extraction and transformation to ensure the data is properly extracted for building Tableau dashboards.
Involved in creating dashboards and reports in Tableau; monitored server activities, user activity, and customized views.
Implemented Data Governance using Excel and Collibra; participated in Data Governance working group sessions to create Data Governance policies.
Worked with the MDM systems team on technical aspects and generated reports in Tableau.
Assisted in overseeing compliance with Enterprise Data Standards, data governance, and data quality.
Worked with various agile development teams to standardize branching and tagging of code in the repository and maintain code base integrity using Subversion (SVN), Git, Bitbucket, ClearCase, and Team Foundation Server (TFS).
Environment: HDFS, MapReduce, AWS S3, Glue, Lambda, Matillion, Snowflake DB, Hive, Sqoop, Pig, Oozie Scheduler, Shell Scripts, Teradata, Oracle, Redshift, HBase, Cassandra, Cloudera, Kafka, Spark, Scala, Python, GitHub.
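A minimal sketch of how an S3-event-triggered Lambda could kick off a Snowflake load, in the spirit of the Lambda and Snowflake work noted in this section; the stage, target table, and connection parameters are assumptions, and in practice credentials would come from Secrets Manager rather than plain environment variables.

```python
# Hypothetical S3-event-triggered Lambda that runs a COPY INTO against Snowflake.
# Stage, table, and connection settings are illustrative assumptions.
import os
import snowflake.connector

def lambda_handler(event, context):
    # Pull the object key from the S3 put-event payload.
    record = event["Records"][0]
    key = record["s3"]["object"]["key"]

    conn = snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],   # better sourced from Secrets Manager
        warehouse="LOAD_WH",
        database="RAW",
        schema="LANDING",
    )
    try:
        cur = conn.cursor()
        # @RAW_STAGE is assumed to be an external stage pointing at the bucket.
        cur.execute(
            f"COPY INTO LANDING.EVENTS FROM @RAW_STAGE/{key} "
            "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
        )
        return {"status": "loaded", "key": key}
    finally:
        conn.close()
```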
Client: Regions Bank, Irving, TX June 2020 - July 2021
Role: Big Data Engineer
Responsibilities:
Interacted with business stakeholders, gathering requirements and managing delivery, covering the entire data pipeline development life cycle.
Worked on all phases of the data integration development lifecycle, real-time/batch data pipeline design and implementation, and support of the Big Data ETL and Reporting track.
Worked on migration of a legacy Data Warehouse from Oracle/SQL Server/Netezza to a central Data Lake on AWS cloud/Snowflake using Talend and Informatica.
Developed real-time data pipelines for Risk and Payment systems using Spark, Kafka, and Apache Kudu/Snowflake/Couchbase.
Expertise in the design, development, and implementation of Enterprise Data Warehouse solutions using DigitalRoute MediationZone and the Talend ETL Big Data Integration suite version 6.2.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake.
Worked on migrating objects from Netezza to Snowflake.
Wrote Python scripts to parse JSON documents and load the data into the database (see the sketch following this section).
Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and Pandas in Python.
Developed Python scripts to automate the data sampling process for the Payment system.
Developed a PySpark framework to bring data from DB2 to Amazon S3.
Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for small data set processing and storage.
Developed Talend jobs to move inbound files to vendor server locations on monthly, weekly, and daily frequencies.
Worked on loading data from various sources into Hadoop and Cassandra using Kafka.
Developed Tableau dashboards and reports for the Risk & Fraud system using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
Designed and prepared interactive, intuitive Tableau dashboards to show daily/weekly/monthly/yearly approval/decline transaction reports for Risk and Fraud systems.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Worked with Avro and Parquet file formats and used various compression techniques to leverage storage in HDFS.
Knowledge of data privacy and security best practices and how they can be applied to dbt projects.
Created schemas in Hive with performance optimization using bucketing and partitioning.
Involved in fixing various data issues related to data quality, data availability, and data stability.
Environment: Cloudera Hadoop, MapReduce, Tableau, Informatica, Python, PySpark, HDFS, Zookeeper, Cassandra, HBase, Erwin, Spark SQL, Scala, AWS EMR, S3, AWS Glue, Redshift, AWS Lambda, Snowflake, Kafka, SQL, PL/SQL, Bitbucket.
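A small sketch of the kind of JSON parsing and pandas/NumPy cleanup described in this section, loading the result into a relational table via SQLAlchemy; file paths, column names, and the connection string are illustrative assumptions.

```python
# Illustrative sketch: parse JSON documents, clean/reshape with pandas/NumPy,
# and load into a staging table. Paths, columns, and the DSN are placeholders.
import json
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

def load_payments(json_path: str, dsn: str) -> int:
    with open(json_path) as f:
        docs = [json.loads(line) for line in f]   # one JSON document per line

    df = pd.json_normalize(docs)

    # Basic cleaning: coerce types, drop obviously bad rows, derive a flag column.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=["payment_id", "amount"])
    df["is_high_value"] = np.where(df["amount"] > 10_000, 1, 0)

    engine = create_engine(dsn)                   # e.g. a Postgres or Snowflake URL
    df.to_sql("payments_stage", engine, if_exists="append", index=False)
    return len(df)

if __name__ == "__main__":
    rows = load_payments("payments.jsonl", "postgresql://user:pass@host/db")
    print(f"loaded {rows} rows")
```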
Client: Macy's, Cincinnati, OH Jan 2018 - June 2020
Role: Big Data Developer
Responsibilities:
Loaded structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
Implemented data ingestion and handled clusters for real-time processing using Kafka.
Imported real-time weblogs using Kafka as a messaging system and ingested the data into Spark Streaming.
Built a scalable, distributed Hadoop cluster running Hortonworks Data Platform.
Built dimensional models and a data vault architecture on Snowflake.
Serialized JSON data and stored it into tables using Spark SQL.
Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (HBase).
Worked on the Spark framework for both batch and real-time data processing.
Developed Spark Streaming programs that take data from Kafka and push it to different sinks.
Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables (see the partitioning sketch following this section).
Created different Pig scripts and converted them into shell commands to provide aliases for common operations in the project business flow.
Implemented partitioning and bucketing in Hive for better organization of the data.
Created a few Hive UDFs to hide or abstract complex, repetitive rules.
Built Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
Scheduled different Snowflake jobs using NiFi, and used NiFi to ping Snowflake to keep the client session alive.
Developed Bash scripts to bring log files from an FTP server and process them to load into Hive tables.
Developed MapReduce programs for applying business rules on the data.
Developed a NiFi workflow to pick up data from the Data Lake as well as from the server and send it to a Kafka broker, using Python to handle proper closing and waiting stages as well.
Created tables and stored procedures and extracted data using T-SQL for business users whenever required.
Environment: Hadoop HDP, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Apache Spark, Git, SQL, Linux
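A brief sketch of the Hive partitioning and bucketing pattern mentioned in this section, written with PySpark's table writer; the database, table, column names, and staging path are assumptions for illustration only.

```python
# Sketch: write staged data into a partitioned, bucketed Hive table.
# Database, table, column names, and paths are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-partitioned-load")
    .enableHiveSupport()          # needed so saveAsTable targets the Hive metastore
    .getOrCreate()
)

staged = spark.read.parquet("hdfs:///data/staging/orders/")   # assumed landing path

(
    staged.write.mode("overwrite")
    .partitionBy("load_date")                 # one Hive partition per load date
    .bucketBy(16, "customer_id")              # bucketing requires saveAsTable
    .sortBy("customer_id")
    .format("parquet")
    .saveAsTable("sales_db.orders")           # assumed Hive database.table
)
```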
Client: Wipro, India Nov 2015 - Oct 2017
Data Engineer
Responsibilities:
Used the Django Framework for developing web applications using the model-view-controller architecture.
Built Python code for data modelling and presented it to multiple teams across the company.
Implemented configuration changes for data models, worked on data management, and presented reports to various teams for business analysis.
Created a script using VBA and macros in Excel 2010 to automate the merging of tables in different worksheets for comparison of various metrics in complaints data.
Implemented VBA and macro code to automate the creation of pivot tables and perform VLOOKUPs in Excel.
Implemented custom error handling in Talend jobs and worked on different methods of logging.
Created an interactive dashboard application using R and Shiny to evaluate quality measures with six-sigma charts based on complaint data metrics.
Successfully automated data collection from TrackWise into the backend MS SQL Server 2010 database.
Developed ETL functions for extracting, transforming, cleaning, and loading data from various sources to different destinations.
Used Python, Tableau, and Excel to analyze the number of products per customer and sales in a category for sales optimization.
Developed a high-level data dictionary of ETL data mappings and transformations from a series of complex Talend data integration jobs.
Visualized data using advanced Tableau functions like action filters, context filters, and Level-of-Detail (LOD) expressions.
Managed, developed, and designed a dashboard control panel for customers and administrators using Django.
Environment: Python 2.7, .NET, R programming, PyQuery, MVW, HTML5, Shell Scripting, JSON, Apache Web Server, SQL, UNIX, Windows, Python libraries, VBA, Excel
EDUCATIONAL DETAILS: Bachelor of Technology, Major: Computer Science and Engineering, JNUTK, INDIA.