Gugan
Sr Data Engineer | Phoenix, AZ
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE

Summary:
- 10 years of experience transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data, with industry experience in Banking and Healthcare.
- Expert in the data science process life cycle: data acquisition, data preparation, data governance (Collibra), IAM, DBT, Terraform, Informatica PowerCenter, modeling (feature engineering, model evaluation), and deployment.
- Experienced with statistical techniques including hypothesis testing, Principal Component Analysis (PCA), ANOVA, sampling distributions, chi-square tests, time-series analysis, discriminant analysis, Bayesian inference, and multivariate analysis.
- Efficient in preprocessing data, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (Scikit-Learn, Pandas, NumPy).
- Expertise in building machine learning models using algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Support Vector Machines (SVM), Decision Trees, KNN, K-Means Clustering, and ensemble methods (Bagging, Gradient Boosting).
- Experience in text mining, topic modeling, Natural Language Processing (NLP), content classification, sentiment analysis, market basket analysis, recommendation systems, and entity recognition.
- Experience practicing Infrastructure as Code (IaC) to develop automation routines and integrations.
- Applied text pre-processing and normalization techniques such as tokenization, POS tagging, and parsing. Expertise using NLP techniques (BOW, TF-IDF, Word2Vec) and toolkits such as NLTK, Gensim, and SpaCy.
- Experienced in tuning models using Grid Search, Randomized Grid Search, and K-Fold Cross Validation (see the sketch after this summary).
- Strong understanding of artificial neural networks, convolutional neural networks, and deep learning.
- Able to work across both GCP and Azure clouds in parallel.
- Skilled in statistical methods including exploratory data analysis, regression analysis, regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation, sampling, cross-validation, ANOVA, and A/B testing.
- Working experience in Natural Language Processing (NLP) and a deep understanding of statistics, linear algebra, calculus, and optimization algorithms such as gradient descent.
- Familiar with key data science concepts (statistics, data visualization, artificial intelligence, machine learning). Experienced in Python, R, MATLAB, SAS, and PySpark programming for statistical and quantitative analysis.
- Experience building production-quality, large-scale deployments of applications based on natural language processing and machine learning algorithms.
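The model-tuning workflow mentioned above (TF-IDF features tuned with Grid Search over K-Fold Cross Validation) can be illustrated with a minimal scikit-learn sketch; the toy complaint snippets, labels, and parameter grid below are hypothetical and stand in for real training data.

```python
# Minimal sketch: TF-IDF + logistic regression tuned with GridSearchCV (K-Fold CV).
# Assumes scikit-learn; the sample texts, labels, and grid values are hypothetical.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

texts = [
    "card charged twice this month", "claim denied without explanation",
    "statement shows an unknown fee", "great service at the branch",
    "mobile app works fine", "thanks for resolving my question",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = complaint, 0 = not a complaint (toy labels)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # bag-of-words / TF-IDF features
    ("clf", LogisticRegression(max_iter=1000)),
])

grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3, scoring="f1")
grid.fit(texts, labels)
print(grid.best_params_, grid.best_score_)
```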
Technical Skills:
- Data Sources: AWS Snowflake, OBIEE, PostgreSQL, MS SQL Server, NoSQL, MongoDB, MySQL, HBase, Amazon Redshift, Teradata
- Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, YARN, Apache Spark, Mahout, Spark MLlib
- Google Platform: GBQ, GCP, Cloud Functions, Cloud Pub/Sub
- Containers: Kubernetes, Docker, Bazel, Mesos
- Statistical Methods: Hypothesis testing, ANOVA, Principal Component Analysis (PCA), time series, correlation (chi-square test, covariance), multivariate analysis, Bayes' law
- Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, PCA, LDA, Natural Language Processing
- Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, RNN, Deep Learning on AWS, Keras API
- Hadoop Ecosystem: Hadoop, Spark, MapReduce, HiveQL, HDFS, Sqoop, Pig Latin
- Data Visualization: Tableau, Python (Matplotlib, Seaborn), R (ggplot2), Power BI, QlikView, D3.js
- Languages: Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, Go, SQL, MATLAB, Spark, Java, C#
- Operating Systems: UNIX shell scripting (via PuTTY client), Linux, Windows, macOS
- Other Tools and Technologies: TensorFlow, Keras, AWS ML, Azure ML Studio, NLTK, HVR, SpaCy, Gensim, MS Office Suite, Google Analytics, GitHub, AWS (EC2/S3/Redshift/EMR 4, 5, 6/Lambda/Snowflake)

Education Details:
- B.Tech in Information Technology, Anna University, Chennai, India, 2013
- Master's Degree in Analytics, Northeastern University, Boston, MA, USA, 2021

Professional Experience:

First Republic Bank, Boston, MA                                                        Dec 2022 - Till Date
Role: Sr Data Engineer
Responsibilities:
- Worked with the data governance team on CCPA (California Consumer Privacy Act) and GDPR (General Data Protection Regulation) projects.
- Loaded the aggregated data into MongoDB for reporting on the dashboard.
- Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake.
- Worked closely with security teams by providing logs for firewalls and VPCs and setting up vulnerability rules in GCP.
- Developed and monitored IBM DataStage jobs using various processing and debug stages.
- Kept current with new services added to the Google Cloud Platform (GCP) stack.
- Involved in loading JSON datasets into MongoDB and validating the data using the Mongo shell.
- Resolved Control-M agent, server, EM, and client communication issues.
- Used various AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis.
- Leveraged image, text, and numeric data within Confidential using Gen AI tools to deliver results.
- Clustered the microservice-based Docker containers using Kubernetes.
- Integrated Ataccama with Collibra using a MuleESB connector and published DQ rule results to Collibra via REST API calls.
- Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery services.
- Established formal data stewardship programs that include training, policies, and regular audits to ensure data quality and compliance.
- Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in Google BigQuery (GBQ).
- Used RabbitMQ queues for reliable, asynchronous message exchange.
- Kept current with Gen AI, which continues to evolve rapidly, offering possibilities across various domains while also posing significant technical and ethical challenges.
- Managed containers with Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
- Developed snippets of Java code that run on Flink in standalone mode on a local machine.
- Extensive experience with HVR as real-time database replication software; able to create and load tables using HVR and configure HVR for integration.
- Worked as a data engineer with the Flash EDW, blending traditional data engineering skills with specialized knowledge of high-performance flash storage systems.
- Prepared scripts to automate the ingestion process using Python and Scala as needed, from sources such as APIs, AWS S3, Teradata, and Snowflake.
- Created and maintained scalable data pipelines and built out new integrations.
- Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
- Engineered a data processing pipeline with GCP and the CSS and OpenSlide APIs to pre-process large-scale data.
- Serviced Azure Fabric microservices to support a mobile application.
- Performed database performance and SQL tuning, including table and index analysis for schema tuning.
- Optimized Informatica objects and SQL queries for better performance of the ETL loads.
- Created a data pipeline that ingests route data, mentor data, and DA's attendance by building a Glue workflow leveraging CloudWatch, Lambda, and SNS.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and Python/PySpark with Databricks.
- Downloaded Google BigQuery data into pandas or Spark data frames for advanced ETL capabilities (see the sketch after this section).
- Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements on the databases.
- Designed a third-normal-form target data model, mapped it to the logical model, and performed extensive data validation using ANSI SQL queries and back-end testing.
- Loaded fact tables and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Built ETL processes entirely in T-SQL and also using SSIS for different loads.
- Understanding of structured data sets, data pipelines, ETL tools, data reduction, transformation and aggregation techniques, and knowledge of tools such as DBT and DataStage.

Environment: Python, ML, AI, Postgres, GBQ, Big Data, AWS Snowflake, EMR, OBIEE, Data Catalog, Alation, SnowSQL, SQL tuning, NoSQL, Java, JSON, Airflow, ADO, AWS EC2, S3, Athena, Kafka, SSIS, AWS Lambda, IAM, GCP, Terraform, ANSI SQL, Data Governance, Data Migration, Collibra, Flash EDW, Bazel, Data Stewardship, Informatica, IaC, AWS SQS, gRPC, React, SDK, PowerCenter, MongoDB, RabbitMQ, Aurora, Mainframe, DB2, ADF, Azure Fabric, Scala
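As a hedged illustration of the BigQuery-to-pandas step referenced above, the following minimal sketch uses the google-cloud-bigquery client; the project id, dataset, table, and column names are hypothetical stand-ins.

```python
# Minimal sketch: pull BigQuery query results into a pandas DataFrame for downstream ETL.
# Assumes the google-cloud-bigquery library (with pandas support) and default application
# credentials; the project, dataset, and table names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="enterprise-analytics")  # hypothetical project id

sql = """
    SELECT route_id, report_date, COUNT(*) AS deliveries
    FROM `enterprise-analytics.routes.daily_activity`      -- hypothetical table
    WHERE report_date = CURRENT_DATE()
    GROUP BY route_id, report_date
"""

df = client.query(sql).to_dataframe()   # runs the query and materializes results as pandas
print(df.head())
```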
CVS, Boston, MA                                                                        Aug 2020 - Nov 2022
Role: Data Engineer
Responsibilities:
- Developed the BIX Extract application in Python to ingest Pega (complaint system) files to HDFS and configured Airflow DAGs to orchestrate the ETL workflow (see the sketch after this section).
- Developed PySpark code for AWS Glue jobs and for EMR.
- Involved in various sectors of business, with in-depth knowledge of the SDLC (System Development Life Cycle) across all phases of Agile (Scrum) and Waterfall.
- Developed a Python script to automate data cataloging in the Alation data catalog tool.
- Developed MapReduce jobs in Java to process large data sets by fitting the problem into the MapReduce programming paradigm.
- Developed Spark scripts using Java and Python shell commands as required.
- Orchestrated and migrated CI/CD processes using CloudFormation, Terraform, and Packer templates, and containerized the infrastructure using Docker, set up in OpenShift, AWS, and VPCs.
- Used SOAP and Java web services for interacting with other clients.
- Built data pipelines to move data from source to destination, scheduled with Airflow.
- Exposure to implementation and operations of data governance, data strategy, data management, and solutions.
- Created GCP projects and migrated on-prem/AWS instances to GCP.
- Consumed RabbitMQ messages using Spring listeners.
- Developed a Spark job using Spark DataFrames to flatten JSON documents into flat files.
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Worked on MongoDB schema/document modeling, querying, indexing, and tuning.
- Worked with source control tools such as Tortoise SVN, CVS, Perforce, and Git, as well as IBM DataStage.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Created Google BigQuery authorized views for row-level security and for exposing data to other teams.
- Managed clusters using Kubernetes and worked on creating pods, replication controllers, services, deployments, labels, and health checks.
- Implemented GCP cloud solutions: Cloud SQL, storage buckets, Cloud DNS, and GKE autoscaling groups in the k8s cluster.
- Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
- Familiar with MongoDB write concern settings to avoid loss of data during system failures.
- Experience writing RESTful HTTP web services and SOAP APIs in Golang.
- Created mappings and mapplets in Informatica PowerCenter to transform data according to business rules.
- Built, tested, and packaged the project with Bazel, a build tool that is easy to adopt and extend.
- Loaded the transformed data into the Flash EDW, ensuring efficient use of storage and processing capabilities.
- Wrote code that optimizes the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
- Monitored APIs and database clusters (Cassandra and Aerospike stacks) using New Relic and ELK (Elasticsearch, Logstash, Kibana), and optimized performance by reviewing logs in AWS CloudTrail and metrics from CloudWatch and X-Ray.
- Designed, built, and managed ELT data pipelines leveraging Airflow, Python, dbt, Stitch Data, and GCP solutions.
- Implemented incremental and differential loads using SSIS and scheduled them as daily jobs.
- Developed central and local Flume frameworks for loading large log files into the data lake.
- Designed and implemented distributed systems with Apache Spark and Python/Scala.
- Created Python/SQL scripts in Databricks notebooks to transform data from Redshift tables into Snowflake via S3 buckets.

Environment: SOAP, REST APIs, SQL, SQL tuning, AWS, EMR, Data Catalog, ETL, JSON, GBQ, Go, APIs, Big Data, Informatica, ADF, Alation, Athena, Data Governance, Collibra, Data Ingestion, Data Stewardship, OBIEE, IAM, Scala, Mainframe, DB2, GCP, Terraform, RabbitMQ, Flash EDW, DBT, SSIS, Bazel, PowerCenter, PostgreSQL, UNIX, PL/SQL, CI/CD, Power BI, IaC, MongoDB, Glue, Matplotlib, ADO, RDS, DWH, ADLS, Docker, Kubernetes, PyHive, Keras, Java, React, SDK, DevOps, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, Spark MLlib
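The Airflow orchestration mentioned in the first bullet of this section can be illustrated with a minimal DAG sketch. This assumes Airflow 2.x; the DAG id, task ids, and script path are hypothetical stand-ins for the BIX/Pega ingestion job, and the real extract and HDFS-load logic is not shown.

```python
# Minimal sketch of an Airflow 2.x DAG that runs a daily extract and then loads to HDFS.
# The DAG id, task ids, and script path are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def load_to_hdfs(**context):
    # Placeholder for the HDFS load step (e.g., hdfs dfs -put or a WebHDFS client call).
    print("loading extracted files to HDFS for run", context["ds"])


with DAG(
    dag_id="bix_pega_ingest",                          # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = BashOperator(
        task_id="extract_bix_files",
        bash_command="python /opt/jobs/bix_extract.py",  # hypothetical extract script
    )
    load = PythonOperator(task_id="load_to_hdfs", python_callable=load_to_hdfs)

    extract >> load   # run the load only after the extract succeeds
```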
Amazon, India                                                                          Aug 2018 - Dec 2019
Role: Data Engineer
Responsibilities:
- Designed and developed batch processing solutions using Data Factory and Azure Databricks.
- Designed, developed, and implemented solutions with data warehouse, ETL, data analysis, and BI reporting technologies.
- Identified, evaluated, and documented potential data sources in support of project requirements within the assigned departments, following Agile methodology.
- Created Python/SQL scripts in Databricks notebooks to transform data from Redshift tables into Snowflake via S3 buckets.
- Extensively worked on Data Services for migrating data from one database to another.
- Implemented various performance optimization techniques, such as caching and pushing down memory-intensive operations to the database server.
- Developed customized UDFs in Java to extend Hive and Pig Latin functionality.
- Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements; used Cassandra through Java services.
- Active member of scrum meetings in an Agile development methodology.
- Involved in continuous integration and deployment (CI/CD) using DevOps tools such as Looper and Concord.
- Involved in understanding and analyzing the software development life cycle requirements of the project using Java.
- Designed and implemented integrations from Collibra to other systems through Collibra Connect.
- Analyzed alternatives for NoSQL data stores and produced detailed documentation comparing HBase vs. Accumulo.
- Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
- Installed and configured Instana APM on a Linux machine and monitored the Kubernetes cluster, EC2 servers, and websites.
- Experience with automated build tools such as Bazel to produce executable programs and libraries and to assemble deployable packages.
- Good understanding of other AWS services such as S3, EC2, IAM, and RDS; experience with orchestration and data pipelines using AWS Step Functions, Data Pipeline, and Glue.
- Implemented a CI/CD pipeline with Jenkins, GitHub, Nexus, Maven, and AWS AMIs.
- Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Designed and implemented a test environment on AWS.
- Engineered a Python report-processing data pipeline in AWS to lay the framework for migrating the existing system off Cloudera and to enable business users to create their own custom reports without support.
- Created and managed ETL (Extract, Transform, Load) pipelines to move data from various sources into the Flash EDW.
- Extracted data from multiple applications and data sources via database integration using SSIS and linked-server ODBC connections.
- Built a Jenkins pipeline to push all microservice builds to the Docker registry and deploy them to Kubernetes; created and managed pods with Kubernetes.
- Linked data lineage to data quality and business glossary work within the overall data governance program.
- Generated consumer group lags from Kafka using its API (see the sketch after this section).
- Created S3 buckets, managed their policies, and utilized S3 and Glacier for storage and backup on AWS.
- Mentored and guided analysts on building purposeful analytics tables in DBT for cleaner schemas.
- Worked with large sets of big data when dealing with various security logs.
- Followed Agile and Scrum principles in development.
- Involved in porting existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
- Responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive.
- Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB, and SQL Data Warehouse.
- Built a data pipeline and data applications to analyze email marketing campaigns using PowerShell, SQL Azure, and Power BI.
- Built dashboards in DOMO and Tableau covering business and operational metrics for guest emails, giving management better insights.
- Supported the current data processing and compliance initiative by creating technical and summary documentation.
- Participated in daily standups, bi-weekly scrums, and PI planning; the New Management Services organization is SAFe (Agile) certified.
- Transferred data from AWS S3 to AWS Redshift.

Environment: Python, AWS S3, AWS Redshift, EMR, Athena, AWS Data Pipeline, Data Migration, IAM, GCP, Java, Data Catalog, Scala, Data Governance, Aurora, Alation, Collibra, Kafka, Google Cloud Computing, Terraform, SSIS, Big Data, Power BI, Spark, DevOps, CI/CD, IBM DB2, MongoDB, Flash EDW, Airflow, SAP ECC, Spark ML, SQL, Agile, Bazel, ELT, Kubernetes, S3, NoSQL, SQL DB, SQL Azure, AWS
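A hedged sketch of the Kafka consumer-group lag check referenced above, assuming the kafka-python client; the broker address, group id, and topic name are hypothetical stand-ins.

```python
# Minimal sketch: compute consumer-group lag per partition with kafka-python.
# Broker address, group id, and topic below are hypothetical.
from kafka import KafkaConsumer, TopicPartition


def consumer_group_lag(bootstrap_servers, group_id, topic):
    consumer = KafkaConsumer(
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        enable_auto_commit=False,        # inspect committed offsets without advancing them
    )
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)   # latest offset per partition

    lag = {}
    for tp in partitions:
        committed = consumer.committed(tp) or 0      # last offset committed by the group
        lag[tp.partition] = end_offsets[tp] - committed
    consumer.close()
    return lag


if __name__ == "__main__":
    print(consumer_group_lag("broker:9092", "guest-email-consumers", "guest-email-events"))
```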
IGene DI and VFX, India                                                                June 2017 - July 2018
Data Engineer
Responsibilities:
- Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export.
- Worked on the design, development, and documentation of the ETL strategy to populate the data warehouse from various source systems using the Talend ETL tool.
- Devised PL/SQL stored procedures, functions, triggers, views, and packages; used indexing, aggregation, and materialized views to optimize query performance.
- Developed logistic regression models (using R and Python) to predict subscription response rates based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies (see the sketch after this section).
- Created Tableau dashboards and reports for data visualization, reporting, and analysis and presented them to the business.
- Understood the differences between each MongoDB version and enabled the new features in each release.
- Managed the Kubernetes environment and deployed Docker-based applications using Spinnaker and Jenkins.
- Established connectivity between VB applications and MS SQL Server using ADO.
- Implemented FTP operations using Talend Studio to transfer files between network folders and to an FTP server using components such as File Copy, TFile Archive, TFile Delete, Create Temporary File, FTP Delete, FTP Copy, FTP Rename, FTP Put, and FTP Get.
- Designed and developed a Spark job in Scala to implement an end-to-end data pipeline for batch processing.
- Hands-on experience creating EMR clusters and developing Glue jobs.
- Created data connections and published them on Tableau Server for use with operational and monitoring dashboards.
- Installed Ranger in all environments for a second level of security on the Kafka broker.
- Traced connections to the Mongo database and monitored resource utilization for each process.
- Knowledge of the Tableau administration tool for configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views by integrating with other platforms.
- Worked with senior management to plan, define, and clarify dashboard goals, objectives, and requirements.
- Responsible for daily communications to management and internal organizations regarding the status of all assigned projects and tasks.

Environment: Hadoop Ecosystem (HDFS), AWS, EMR, Talend, SQL, Scala, Tableau, Java, MongoDB, Data Migration, Hive, Sqoop, Kafka, Kubernetes, Impala, Spark, UNIX shell scripting
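The subscription-response model described above can be illustrated with a minimal scikit-learn sketch; the CSV file and column names are hypothetical stand-ins for the customer variables listed (past transactions, prior-mailing response, demographics).

```python
# Minimal sketch: logistic regression predicting subscription response probability.
# Assumes scikit-learn and pandas; the CSV path and feature/label columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("campaign_history.csv")   # hypothetical campaign extract

X = df[["past_transactions", "prior_response", "age", "tenure_months"]]  # hypothetical features
y = df["subscribed"]                                                     # 1 = responded to the offer

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]   # predicted response probability per customer
print("AUC:", roc_auc_score(y_test, scores))
```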
Zoho, India                                                                            Apr 2014 - May 2017
Data Analyst
Responsibilities:
- Worked extensively on documenting source-to-target mapping documents with data transformation logic.
- Transformed requirements into data structures that can be used to efficiently store, manipulate, and retrieve information.
- Worked with different business users to develop the Vision Document and Business Requirement Specifications by gathering business and functional requirements; experience supporting the full Software Development Life Cycle (SDLC) in an Agile Scrum methodology.
- Documented the AS-IS business workflow adhering to UML standards; performed comprehensive requirement gathering for the enterprise reporting system using Requisite Pro.
- Collaborated with data modelers and ETL developers in creating the data functional design documents.
- Ensured that models conform to established best practices, including normalization rules, and accommodate change in a cost-effective and timely manner.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
- Involved in migration projects to move data from Oracle/DB2 data warehouses to Teradata; performed data integrity and balance-sheet verification procedures and contributed to process improvements in the product control function.
- Maintained a traceability matrix to ensure that all functional requirements are addressed at the use case level as well as the test case level.
- Performed functional and GUI testing to ensure that user acceptance criteria are met; created user training materials and coordinated with SMEs to make sure all business requirements are addressed in the application.
- Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ; created new reports based on requirements.
- Responsible for generating weekly ad hoc reports.
- Planned, coordinated, and monitored project performance and activities to ensure on-time project completion.
- Created complex Teradata scripts to generate ad hoc reports that supported and monitored day-to-day operations.
- Involved in building a specific data mart as part of a Business Objects universe, which replaced the existing reporting system that was based on exporting data sets from Teradata to Excel spreadsheets.
