Candidate's Name
Senior AWS Data Engineer
Cell: PHONE NUMBER AVAILABLE | E-mail: EMAIL AVAILABLE | LinkedIn: https://LINKEDIN LINK AVAILABLE

Professional Summary
Around 11 years of IT experience in data engineering, analysis, design, development and maintenance of various business applications.
A skilled developer with strong problem-solving, debugging and analytical capabilities who actively engages in understanding customer requirements.
Hands-on experience with Teradata, Hadoop HDFS, Hive, Amazon Web Services (AWS), Redshift, Databricks, Tableau and PySpark for data processing, storage and analysis.
Experienced in using distributed computing architectures such as AWS, Hadoop, Python and Spark.
Hands-on experience with the Hadoop ecosystem (HDFS, YARN, Hive, Oozie, Kafka and Spark) for data storage and analysis.
Experience with AWS data services (S3, Glue, IAM, Athena, Redshift, DynamoDB, Lambda, SQS, SNS, RDS and Step Functions) and with different file formats (Parquet, Avro, ORC, etc.).
Proficient in the AWS Redshift cloud data warehouse and the AWS S3 bucket for integrating and loading data from various source systems.
Worked with Redshift sort keys, distribution keys, data compression, constraints, smallest-possible column sizes, verifying data files before and after load, and performance tuning.
Expertise in core Python (data structures, OOP, data types, file handling, exception handling, generators, iterators, multi-threading, etc.).
Responsible for designing logical and physical data models for various data sources on Teradata, Hive, Redshift, Athena, Snowflake and Spectrum.
Created Redshift Spectrum external schemas/tables for S3 data (JSON, Avro) on a running Redshift instance, querying S3 data through Spectrum and loading it into fact and dimension tables rather than using the COPY command when data volumes are huge and semi-structured (see the sketch below).
Created Redshift Spectrum external schemas and tables with partitions using S3, Hive, Athena, the Glue catalog and Redshift with different file formats such as JSON, XML, CSV, ORC and Parquet.
Experience in processing both structured and semi-structured data in the given file formats.
Built ETL pipelines in and out of the data warehouse using a combination of Databricks and Python, writing complex SQL queries on fact and dimension tables against Hive, Redshift and Athena.
Modified and developed new ETL programs, transformations, indexes, data staging areas, summary tables and data quality checks.
Proficient in data analysis; created Athena data sources on S3 buckets and Redshift for ad hoc querying and business dashboards using Tableau and Denodo reporting tools.
Experience deploying configuration management and CI/CD services such as Git, Jenkins, Docker, JIRA, CloudFormation and Maven.
Investigation, profiling and resolution of performance bottlenecks, query optimization and performance tuning of large-scale databases using views, indexing mechanisms, temporary tables and partitions.
Familiar with scheduling tools such as Control-M, Autosys, UC4 and cron, and expert in source code management tools such as GitHub, GitLab and SVN.
Excellent interpersonal and communication skills with the ability to collaborate with stakeholders at all levels.
Highly motivated, adaptable to new technologies, and quick to learn.
Strong problem-solving and analytical abilities, capable of working independently and as part of a team.
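A minimal sketch of the Redshift Spectrum pattern described above, assuming a psycopg2 connection to the cluster; the endpoint, IAM role ARN, Glue database, S3 path and table names are placeholders, not project artifacts:

import psycopg2  # standard PostgreSQL driver; Redshift speaks the same wire protocol

# Hypothetical cluster endpoint and credentials for illustration only.
conn = psycopg2.connect(host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="etl_user", password="***")
conn.autocommit = True  # external-table DDL cannot run inside an explicit transaction

statements = [
    # External schema backed by the Glue Data Catalog (role ARN is a placeholder).
    """CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
       FROM DATA CATALOG DATABASE 'spectrum_db'
       IAM_ROLE 'arn:aws:iam::111111111111:role/redshift-spectrum-role'
       CREATE EXTERNAL DATABASE IF NOT EXISTS""",
    # Partitioned external table over Parquet files in S3; partitions still need
    # ALTER TABLE ... ADD PARTITION before they are visible to queries.
    """CREATE EXTERNAL TABLE spectrum.orders (
           order_id  bigint,
           order_amt decimal(12,2)
       )
       PARTITIONED BY (order_dt date)
       STORED AS PARQUET
       LOCATION 's3://example-bucket/curated/orders/'""",
    # Load Spectrum data into a local fact table instead of running COPY.
    "INSERT INTO dw.fact_orders SELECT order_id, order_amt, order_dt FROM spectrum.orders",
]
with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()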
Certifications
Teradata 14 Certified Professional
Teradata 14 SQL
AWS Certified Developer - Associate

Technical Skills:
AWS Cloud Technologies : S3, Redshift, RDS, Athena, EC2, Glue, Airflow, Kinesis, SNS, SQS, MSK
Data Warehousing : Redshift, Hive, Teradata, DB2, Snowflake
Programming Languages : Python, PySpark, SQL, HiveQL
Big Data Technologies : HDFS, YARN, Sqoop, Spark, Hive, HBase, Kafka
Cloud Environments : Snowflake, Amazon Web Services, Databricks
Version Control : Git, Bitbucket
Visualization Tools : Tableau, Denodo, Excel
ETL/ELT : Informatica, AWS Glue
Scheduling Tools : Control-M, Stonebranch

Projects

Project #1
Company : Infosys - TMNA (Toyota Motors North America), Aug 2023 - Jul 2024
Role : Sr. Spark - AWS Developer
Project Title : NDW FDW Modernization
Domain : Retail
Roles and Responsibilities:
Participated in requirements gathering and was actively involved in developing the requirements into technical specifications.
Used various AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, SQS, DMS and Kinesis.
Created multiple Glue ETL jobs in Glue Studio, processed the data using different transformations and loaded it into S3, Redshift and Athena.
Used AWS Glue for transformations and AWS Lambda to automate the process.
Created monitors, alarms, notifications (SNS) and logs for Lambda functions and Glue jobs using CloudWatch.
Implemented Databricks scripts using Python/Spark and loaded the data into multiple S3 layers (RAW, TRN, ING), working with multiple file formats such as JSON, CSV and Parquet.
Built data pipelines to load data from different sources into the stage area (S3), then implemented dimension and fact relations in Athena, Redshift and Spectrum.
Created AWS Lambda functions and API Gateways to submit data via API Gateway that is accessible via Lambda and MSK.
Created custom reports and dashboards using Tableau and Denodo to present data analysis and conclusions.
Managed Amazon Web Services (AWS) infrastructure with orchestration tools such as Terraform and Jenkins pipelines.
Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline (see the sketch below).
Performed all necessary day-to-day Git support for different projects; responsible for design and maintenance of the Git repositories and the access control strategies.
Performed unit, integration and A/B testing on the data loaded into the various tables.
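A minimal sketch of the Lambda-triggered Glue job pattern above (the job name, arguments and the Step Functions wiring are hypothetical, not the project's actual code):

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Invoked by a Step Functions task (or an S3/SNS trigger) to start a Glue ETL job."""
    # Job name and arguments are placeholders for illustration.
    run = glue.start_job_run(
        JobName="ndw-fdw-curation-job",
        Arguments={
            "--source_path": event.get("source_path", "s3://example-bucket/raw/"),
            "--target_path": event.get("target_path", "s3://example-bucket/trn/"),
        },
    )
    # Returning the run id lets the state machine poll status with glue.get_job_run.
    return {"JobRunId": run["JobRunId"]}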
Project #2
Company : CGI - CIGNA, May 2021 - May 2023
Role : Lead AWS Data Engineer
Project Title : D&A LCF (Oscar)
Domain : Healthcare Insurance
Roles and Responsibilities:
Played a lead role in gathering requirements, analyzing the entire system and providing estimates for development and testing efforts.
Worked on AWS services such as S3, EC2, IAM and Kafka; experience with orchestration and data pipelines using AWS Step Functions, Data Pipeline and Glue.
Built cloud data stores in S3 with logical layers for raw, curated and transformed data management.
Extracted data from multiple source systems (S3, Redshift, Athena) and created multiple tables/databases in the Glue Catalog by creating Glue crawlers.
Used Kafka to capture and process streaming data with Lambda, then output it into S3, DynamoDB and Redshift for storage and analysis.
Created Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots and a Jenkins server.
Improved Python/Spark scripts that de-normalize data from RDBMS to JSON and Parquet as part of the migration (EC2/SFTP).
Set up S3 event notifications with an SNS topic, an SQS queue and a Lambda function sending a message to the Slack channel (see the sketch below).
Created data ingestion modules using AWS Glue for loading data into the various S3 layers, with reporting via Athena and Tableau.
Used Git version control, Control-M and JIRA for project management and tracking issues and bugs.
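A minimal sketch of that S3 -> SNS -> SQS -> Lambda -> Slack notification chain (the webhook environment variable and message text are hypothetical):

import json
import os
import urllib.request

# Hypothetical Slack incoming-webhook URL supplied through the function's environment.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def lambda_handler(event, context):
    """Triggered by SQS; each record body is an SNS envelope wrapping the original S3 event."""
    for record in event.get("Records", []):
        sns_envelope = json.loads(record["body"])
        s3_event = json.loads(sns_envelope["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            payload = {"text": "New object landed: s3://{}/{}".format(bucket, key)}
            req = urllib.request.Request(
                SLACK_WEBHOOK_URL,
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)  # post the notification to the Slack channel
    return {"status": "ok"}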
Project #3
Company : Cognizant - PayPal, Jan 2019 - Apr 2021
Role : Sr. AWS Data Engineer
Project Title : PP_DIS (Data, Insights and Strategy) Language Strategy
Domain : Banking
Roles and Responsibilities:
Responsible for creating data engineering solutions for business-related problems.
Experience with batch processing using AWS cloud technologies; responsible for maintaining a Hadoop cluster on AWS EMR.
Performed end-to-end architecture and implementation evaluations of different AWS services such as EMR, Redshift, S3, Athena, Glue and Kinesis.
Used AWS Athena to import structured data from the S3 data lake into other systems such as Redshift to generate reports (see the sketch below).
Used AWS services such as EC2 and S3 for dataset processing and storage.
Developed and implemented ETL pipelines using Amazon cloud technologies.
Created and deployed AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Catalog and is queried from Athena.
Used AWS Glue Studio for data integration and data manipulation (merging, sorting, filtering and aggregating) in Redshift.
Additionally, experienced in deploying Java projects using Git, Jenkins and Control-M.
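A minimal sketch of querying the S3 data lake through Athena from Python (the database, table and result-bucket names are placeholders):

import time
import boto3

athena = boto3.client("athena")

def run_athena_query(sql, database="datalake_db", output="s3://example-bucket/athena-results/"):
    """Submit a query to Athena and wait for it to finish; results land in the output prefix."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return qid, state
        time.sleep(2)  # simple polling; a Step Functions wait state works just as well

# Example: aggregate raw events before loading the result into Redshift for reporting.
run_athena_query("SELECT event_dt, count(*) AS events FROM raw_events GROUP BY event_dt")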
Project #4
Company : IBM - DBS (Development Bank of Singapore), Aug 2017 - Dec 2018
Role : AWS Data Engineer
Project Title : BIP UR
Domain : Banking
Roles and Responsibilities:
Involved in requirements gathering, business analysis, design, development and implementation of business rules.
Implemented a serverless architecture using API Gateway, Lambda and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
Created a Lambda function and configured it to receive events from an S3 bucket.
Wrote code that optimizes the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
Created AWS Lambda functions in Python for deployment management in AWS; designed and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
Created AWS Lambda functions and API Gateways to submit data via API Gateway that is accessible via a Lambda function.
Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM and CloudWatch, integrated with Service Catalog.
Good understanding of other AWS services such as S3, EC2, IAM and RDS; experience with orchestration and data pipelines using AWS Step Functions, Data Pipeline and Glue.
Experience writing SAM templates to deploy serverless applications on the AWS cloud.
Hands-on experience working with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, IAM, etc.
Designed and developed ETL jobs in AWS Glue to extract data from S3 objects and load it into a data mart in Redshift.
Integrated Lambda with SQS, and DynamoDB with Step Functions, to iterate through a list of messages and update the status in a DynamoDB table.

Project #5
Company : RWaltz Group Inc. - Walmart Labs, Jan 2017 - Jul 2017
Role : Big Data - AWS Data Engineer
Project Title : GIS-GEC (Global Orders - Orders 360)
Domain : Retail
Roles and Responsibilities:
Responsible for provisioning key AWS cloud services and configuring them for scalability, flexibility and cost optimization.
Used PySpark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
Implemented PySpark integration with AWS Databricks notebooks, which reduced development work and improved performance.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the sketch below).
Created Hive tables and worked on them using HiveQL.
Designed and implemented partitioning (static and dynamic) and buckets in Hive.
Created Hive tables on HDFS to store the data processed by Apache Spark on the Hadoop cluster, with Text, Sequence, Avro, ORC and Parquet file formats and Hive table partitions.
Understood complex Informatica graphs and plans for preparing functional and technical requirement documents.
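A minimal Glue job sketch for that S3-to-Redshift migration (the paths, column mapping and Glue connection name are placeholders, not the project's actual job):

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read campaign files (Parquet here; ORC/CSV work the same way) from a placeholder S3 path.
src = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/campaign/"]},
    format="parquet",
)

# Rename/cast columns into the shape of the Redshift staging table.
mapped = ApplyMapping.apply(
    frame=src,
    mappings=[("campaignId", "string", "campaign_id", "string"),
              ("spend", "double", "spend_amt", "double")],
)

# Write through a pre-created Glue JDBC connection into Redshift, staging via S3.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",  # hypothetical Glue connection name
    connection_options={"dbtable": "stg.campaign", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/glue-tmp/",
)
job.commit()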
Project #6
Company : RWaltz Group Inc. - AT&T, Jun 2016 - Dec 2016
Role : Big Data Hadoop Developer
Project Title : DPV (Data Patterns for Visitors)
Domain : Telecom
Roles and Responsibilities:
Involved in requirements gathering, business analysis, design, development, testing and implementation of business rules.
Wrote scripts for data cleansing, data validation and data transformation for the data coming from different source systems into HDFS.
Created Hive tables and worked on them using HiveQL.
Designed and implemented partitioning (static and dynamic) and buckets in Hive (see the sketch below).
Modified views on databases; performed performance tuning and workload management.
Used derived tables, volatile tables and GTT tables extensively in many of the ETL scripts.
Ingested data from different databases such as SQL Server and Oracle into the Hadoop data lake using Sqoop.
Wrote Pig and Hive scripts to process the HDFS data.
Transformed the data according to the dataset before exporting it from Hadoop.
Wrote Sqoop queries for exporting the data into Teradata and SQL Server databases.
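A minimal PySpark sketch of the partitioned and bucketed Hive tables mentioned above (the database, table, columns and HDFS path are hypothetical):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dpv-visitor-patterns")
         .enableHiveSupport()          # lets saveAsTable register tables in the Hive metastore
         .getOrCreate())

# Raw visitor events landed on HDFS by the upstream cleansing/validation scripts;
# the files are assumed to carry a load_dt column used for partitioning.
events = spark.read.option("header", "true").csv("hdfs:///data/landing/visitor_events/")

# Partition by load date plus bucket on visitor_id for join and scan pruning.
(events.write
       .mode("overwrite")
       .format("parquet")
       .partitionBy("load_dt")
       .bucketBy(16, "visitor_id")
       .sortBy("visitor_id")
       .saveAsTable("dpv.visitor_events"))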
Project #7
Company : IBM - DBS (Development Bank of Singapore), Dec 2015 - May 2016
Role : Hadoop Teradata Developer
Project Title : BIP UR
Domain : Banking
Roles and Responsibilities:
Ingested data from different databases such as SQL Server, Teradata and DB2 into the Hadoop data lake using Sqoop.
Wrote Hive scripts to process the HDFS data.
Involved in the migration from Teradata to Hadoop, creating the tables and views in Hive (see the sketch below).
Imported and exported data between Teradata and Hadoop using Sqoop.
Worked on SQL and performance tuning, and on batch and distributed computing using ETL/ELT (Spark, SQL Server DWH, Teradata, etc.).
Used Informatica PowerCenter to create mappings and mapplets to transform the data according to the business rules.
Worked with partition components such as partition-by-key and partition-by-expression; made efficient use of the multi-file system for data parallelism.
Tuned Teradata SQL statements using EXPLAIN, analysing data distribution among AMPs and index usage, collecting statistics, defining indexes and revising correlated subqueries.
Prepared documents and generated reports for all the working sections.
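The project used Sqoop for the Teradata-to-Hadoop ingestion; as a Python-based sketch of the same data movement, a Spark JDBC read into a Hive table is shown below (host, credentials and table names are placeholders, and the Teradata JDBC driver is assumed to be on the classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("td-to-hive-migration").enableHiveSupport().getOrCreate()

# Pull a Teradata table over JDBC (Sqoop performs the equivalent import as a MapReduce job).
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:teradata://td-host.example.com/DATABASE=edw")
             .option("driver", "com.teradata.jdbc.TeraDriver")
             .option("dbtable", "edw.customer")
             .option("user", "etl_user")
             .option("password", "***")
             .load())

# Land it as a Hive table in the data lake for downstream Hive/Spark processing.
customers.write.mode("overwrite").saveAsTable("staging.customer")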
Project #8
Company : Accel Frontline, Dec 2012 - Nov 2015
Role : Teradata Developer
Project Title : WDD (Enterprises Data-DEV)
Domain : Retail
Roles and Responsibilities:
Involved in requirements gathering, business analysis, design, development, testing and implementation of business rules.
Developed scripts to load data from multiple sources into the Teradata data warehouse using the BTEQ, FastLoad and MultiLoad utilities.
Worked on Teradata query performance tuning via EXPLAIN plans, primary/secondary/partition indexes, collect statistics and volatile tables.
Developed Informatica mappings to load data from various sources using different transformations (Joiner, Filter, Update Strategy, Rank and Router).
Collected multi-column statistics on all the non-indexed columns used in join operations and all columns used in residual conditions (see the sketch below).
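A minimal sketch of that statistics collection driven from Python, assuming the teradatasql driver (host, credentials and the table/column names are placeholders):

import teradatasql

# Hypothetical connection details; in practice these would come from secured configuration.
con = teradatasql.connect(host="tdprod.example.com", user="etl_user", password="***")
cur = con.cursor()

# Multi-column statistics on non-indexed join columns and residual-condition columns.
cur.execute("COLLECT STATISTICS ON edw.daily_sales COLUMN (store_id, sale_dt)")
cur.execute("COLLECT STATISTICS ON edw.daily_sales COLUMN (region_cd)")

# EXPLAIN shows whether the optimizer now has confident row estimates for the join plan.
cur.execute("EXPLAIN SELECT * FROM edw.daily_sales s JOIN edw.store d ON s.store_id = d.store_id")
for row in cur.fetchall():
    print(row[0])

con.close()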
Education:
Bachelor of Technology in Computer Science and Engineering from JNTU-Hyderabad in 2012.