NAME: Sushma
EMAIL: EMAIL AVAILABLE
CONTACT: PHONE NUMBER AVAILABLE
LINKEDIN:
LOCATION: Powell, OH

SR. DATA ENGINEER

BACKGROUND SUMMARY:
IT professional with more than 10 years of experience and a strong background in end-to-end enterprise data warehousing and big data projects.
Excellent hands-on experience in requirement analysis and in designing, developing, testing, and maintaining complete data management and processing systems, including process documentation and ETL technical and design documents.
Experience in designing and building the data management lifecycle, covering data ingestion, integration, consumption, and delivery, as well as reporting, analytics, and system-to-system integration.
Proficient in big data environments, with hands-on experience using Hadoop ecosystem components for large-scale processing of structured and semi-structured data.
Strong experience with all project phases, including requirement analysis, design, coding, testing, support, and documentation.
Extensive experience with Azure cloud technologies such as Azure Data Lake Storage, Azure Data Factory, Azure SQL, Azure Data Warehouse, Azure Synapse Analytics, Azure Analysis Services, Azure HDInsight, and Databricks.
Solid knowledge of AWS services such as EMR, Redshift, S3, and EC2, including configuring servers for auto-scaling and elastic load balancing.
Hands-on experience with GCP services including Dataproc, Compute Engine VMs, BigQuery, Dataflow, Cloud Functions, Pub/Sub, Composer, Secret Manager, and Cloud Storage.
Hands-on experience with Databricks services including notebooks, Delta tables, SQL endpoints, Unity Catalog, secrets, and clusters.
Extensive experience in IT data analytics projects, with hands-on experience migrating on-premises data and data processing pipelines to the cloud (AWS, Azure, and GCP).
Experienced in dimensional modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).
Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
Expertise in resolving production issues, with hands-on experience in all phases of the software development life cycle (SDLC).
Aggregated data through Kafka, HDFS, Hive, Scala, and Spark Streaming on AWS.
Well versed in big data on AWS cloud services, i.e., EC2, EMR, S3, Glue, DynamoDB, and Redshift.
Developed, deployed, and managed event-driven and scheduled AWS Lambda functions, triggered in response to events from various AWS sources (including logging, monitoring, and security events) and invoked on a schedule to take backups.
Used AWS Data Pipeline for extracting, transforming, and loading data from homogeneous and heterogeneous data sources, and built various graphs for business decision-making using Python's Matplotlib library.
Worked with Cloudera and Hortonworks distributions.
Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining, data acquisition, data preparation, data manipulation, feature engineering, machine learning, validation, visualization, and reporting solutions that scale across massive volumes of structured and unstructured data.
Skilled in building and publishing customized interactive reports and dashboards with custom parameters and user filters in Tableau, including tables, graphs, and listings.
Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
Practical understanding of data modeling (dimensional and relational) concepts such as star schema modeling, snowflake schema modeling, and fact and dimension tables.
Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data (see the sketch at the end of this summary).
Experience in designing end-to-end scalable architectures to solve business problems using Azure components such as HDInsight, Data Factory, Data Lake, Storage, and ML Studio.
Experience with proofs of concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging and Teradata.
Experience in developing custom UDFs in Python to extend Hive and Pig Latin functionality.
Experience in data analysis, data profiling, data integration, migration, data governance, metadata management, master data management (MDM), and configuration management.
Scheduled jobs in Databricks using Databricks Workflows.
Used Spark Streaming APIs to perform on-the-fly transformations and build a common data model, reading data from Confluent Kafka in real time and persisting it to Snowflake.
Proficiency in SQL across several dialects (MySQL, PostgreSQL, SQL Server, and Oracle).
Experience working with NoSQL databases such as HBase and DynamoDB.
Hands-on use of Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
Good experience implementing and orchestrating data pipelines using Oozie and Airflow.
Developed transformation logic using Snowpipe.
Hands-on experience with Snowflake utilities such as SnowSQL and Snowpipe.
Solid experience in designing and operationalizing large-scale data and analytics solutions on the Snowflake data warehouse.
Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to and from different source systems, including flat files.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL.
Experience working with Flume and NiFi for loading log files into Hadoop.
Experience creating Teradata SQL scripts using OLAP functions such as RANK and RANK OVER to improve query performance when pulling data from large tables.
Strong experience working with databases such as Teradata, with proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
Knowledge and experience with continuous integration and continuous deployment using tools such as Docker and Jenkins.
Excellent working experience with Agile/Scrum development and Waterfall project execution methodologies.
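Illustrative example (not part of the original engagements): a minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern referenced in the summary above. The broker address, topic, schema, and output paths are hypothetical placeholders.

```python
# Minimal sketch: consume JSON events from Kafka with Spark Structured Streaming
# and persist them to HDFS as Parquet. Broker, topic, schema, and paths are
# illustrative placeholders, not values from any specific project.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-stream").getOrCreate()

# Assumed schema for the incoming JSON payload.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; cast and parse it into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")              # placeholder sink path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```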
TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR
Machine Learning: classification algorithms including Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naive Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.
Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)
IDEs: IntelliJ, Eclipse, Spyder, Jupyter
Ensemble and Stacking: averaged ensembles, weighted averaging, base learning, meta learning, majority voting, stacked ensembles, AutoML (Scikit-Learn, MLJAR), etc.
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase
Programming: Java, SQL, Python (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R, RStudio, PL/SQL, Linux shell scripts, Scala
Data Engineering Tools (Big Data / Cloud / Visualization / Other): Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Oozie, Airflow, Zookeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, GCP, Google Cloud Shell, Linux, PuTTY, Bash shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design, etc.

PROFESSIONAL SUMMARY

CLIENT: PNC Bank (Columbus, Ohio)
DURATION: Mar 2024 - Present
ROLE: Data Engineer
Responsibilities:
Worked with business/user groups to gather requirements and worked on the creation and development of pipelines.
Participated in technical planning and requirements gathering phases, including designing, coding, testing, and documenting big data-oriented software applications.
Worked on creating S3 file systems via code pipelines and on creating and using IAM roles.
Created and used AWS Transfer and Storage Gateway mechanisms.
Experienced in building and using CloudFormation templates.
Created, set up, and maintained EC2 instances, Lambda services, and Kubernetes clusters.
Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to establish connections from sources to HDFS for popular hashtags.
Worked on Kafka to bring data from data sources and keep it in HDFS for filtering.
Used Apache Flink for data processing, such as batch processing of data in the pipeline, administration, and web scraping.
Experienced with data visualization tools such as Power BI and Tableau.
Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python and in NoSQL databases such as HBase and Cassandra.
Collected data using Spark Streaming from AWS S3 buckets in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
Implemented efficient data workflows, data pipelines, and ETL processes to accommodate structured and unstructured data from various sources and ensure the timely delivery of high-quality data.
Created and used the data architecture and data modeling on Databricks.
Created CI/CD pipelines for Databricks and built notebooks with complex code structures.
Extensive experience in Databricks data engineering: job runs, data ingestion, and Delta Live Tables.
Worked on setting up and maintaining SQL warehouses, the SQL editor, and alerts.
Experienced in setting up and maintaining workspaces, catalogs, workflows, and compute, and built and maintained the Auto Loader process (see the sketch after this section).
Experienced with all-purpose compute, job compute, SQL warehouses, vector search, pools, and policies.
Experienced in crafting and maintaining Databricks schema objects, notebooks, the scheduler, and job clusters.
Worked on optimizing Hive queries using best practices and the right parameters, with Hadoop, YARN, Java, and Spark.
Experience with performance tuning and optimization in Azure Synapse, including optimizing SQL queries and Spark jobs.
Solid experience using table ACLs and row- and column-level security with Unity Catalog.
Skilled in data ingestion, storage, harmonization, and curation.
Experienced in performance tuning to ensure jobs run efficiently without performance bottlenecks.
Experienced in data ingestion using various methods: APIs, direct database reads, AWS cross-account data, and third-party vendor data via FTP.
Worked with Tableau for generating reports and created Tableau dashboards, pie charts, and heat maps according to the business requirements.
Migrated from relational databases to a streaming and big data architecture, including a complete overhaul of the data feeds.
Defined streaming event data feeds required for real-time analytics and reporting.
Leveled up the platform, including enhancing automation, test coverage, observability, alerting, and performance.
Worked with all phases of the software development life cycle and used Agile methodology for development.
Environment: Java, SQL, Cassandra DB, AWS EC2, AWS S3, EMR, AWS Redshift, AWS Lambda, Spark, Databricks, ETL, SQL Server, Kafka, Informatica, Delta Lake, Stream Analytics, PowerShell, Apache Airflow, Hadoop, YARN, PySpark, Hive, Hue, Impala, Teradata, Sqoop, HDFS, Agile.
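Illustrative example (not part of the original engagement): a minimal Databricks Auto Loader sketch of the ingestion pattern mentioned above. The S3 paths and target table name are hypothetical placeholders, and the snippet assumes it runs inside a Databricks notebook or job where `spark` is already provided.

```python
# Databricks Auto Loader sketch: incrementally ingest new files landing in S3
# into a Delta table. Paths and the table name are illustrative placeholders.
landing_path = "s3://example-landing/transactions/"               # placeholder source
schema_path = "s3://example-landing/_schemas/transactions"        # schema tracking location
checkpoint_path = "s3://example-landing/_checkpoints/transactions"
target_table = "analytics.transactions_bronze"                    # placeholder Delta table

stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", schema_path)   # enables schema inference/evolution
          .load(landing_path))

(stream.writeStream
 .format("delta")
 .option("checkpointLocation", checkpoint_path)
 .trigger(availableNow=True)        # drain the current backlog, then stop (batch-style run)
 .toTable(target_table))
```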
CLIENT: TD Bank (Cherry Hill, New Jersey)
DURATION: March 2022 - Feb 2024
ROLE: Azure Data Engineer
Responsibilities:
Worked with business/user groups to gather requirements and worked on the creation and development of pipelines.
Participated in technical planning and requirements gathering phases, including designing, coding, testing, and documenting big data-oriented software applications.
Migrated applications from Cassandra DB to Azure Data Lake Storage Gen2 using Azure Data Factory, created tables, and loaded and analyzed data in the Azure cloud.
Worked on creating Azure Data Factory instances, managed policies for Data Factory, and utilized Blob Storage for storage and backup on Azure.
Worked on developing the process to ingest data into the Azure cloud from web services and load it into Azure SQL DB.
Experience in designing and developing data pipelines using Azure Synapse and related technologies.
Understanding of Azure Synapse's integration with other Azure services, such as Azure Data Lake Storage and Azure Stream Analytics.
Experience with performance tuning and optimization in Azure Synapse, including optimizing SQL queries and Spark jobs.
Experience designing and developing data models using Azure data modeling tools such as Azure SQL Database, Azure Synapse Analytics, and Azure Cosmos DB.
Worked with Spark applications in Java to develop a distributed environment for loading high-volume files with different schemas into DataFrames using Scala Spark, processing them, and reloading them into Azure SQL DB tables (see the sketch after this section).
Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to establish connections from sources to HDFS for popular hashtags.
Used Apache Flink for data processing, such as batch processing of data in the pipeline, administration, and web scraping.
Designed and developed pipelines using Databricks, automated the pipelines for the ETL processes, and maintained the workloads.
Worked on creating ETL packages using SSIS to extract data from various data sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data using SQL Server.
Worked with ETL operations in Azure Databricks by connecting to different relational databases using Kafka, and used Informatica for creating, executing, and monitoring sessions and workflows.
Worked on automating data ingestion into the lakehouse, transformed the data using Apache Spark, and stored it in Delta Lake.
Ensured data quality and integrity using Azure SQL Database and automated ETL deployment and operationalization.
Used Databricks, Scala, and Spark to create data workflows and capture data from Delta tables in Delta Lake.
Performed streaming in pipelines using Azure Event Hubs and Stream Analytics to analyze data from the data-driven workflows.
Worked with Delta Lake for consistent unification of streaming, processed the data, and worked with ACID transactions using Apache Spark.
Worked with Azure Blob Storage and developed the framework for handling huge volumes of data and system files.
Implemented a distributed stream processing platform with low latency and seamless integration with data and analytics services inside and outside Azure to build the complete big data pipeline.
Worked with PowerShell scripting for maintaining and configuring data; automated and validated the data using Apache Airflow.
Worked on optimizing Hive queries using best practices and the right parameters, with Hadoop, YARN, Java, and Spark.
Used Sqoop to extract data from Teradata into HDFS and to export the analyzed patterns back to Teradata.
Created DA specs and mapping data flows and provided the details to developers along with HLDs.
Created build and release definitions for continuous integration (CI) and continuous deployment (CD).
Created an application interface document for the downstream team to create a new interface to transfer and receive files through Azure Data Share.
Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to perform streaming analytics in Databricks.
Created and provisioned the different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
Integrated Azure Active Directory authentication into every Cosmos DB request and demoed the feature to stakeholders.
Improved performance by optimizing compute time for processing the streaming data and saved the company cost by optimizing cluster run time.
Performed ongoing monitoring, automation, and refinement of data engineering solutions; prepared complex SQL views and stored procedures in Azure SQL Data Warehouse and Hyperscale.
Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
Created linked services to land data from an SFTP location into Azure Data Lake.
Created numerous pipelines in Azure using Azure Data Factory v2 to get data from disparate source systems, using different activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
Worked with complex SQL, stored procedures, triggers, and packages in large databases across various servers.
Developed complex SQL queries using stored procedures, common table expressions, and temporary tables to support Power BI reports.
Worked on Kafka to bring data from data sources and keep it in HDFS for filtering.
Used accumulators and broadcast variables to tune Spark applications and to monitor the created analytics and jobs.
Designed and modeled datasets with Power BI Desktop based on the measures and dimensions requested by customers and dashboard needs.
Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
Tracked Hadoop cluster job performance, performed capacity planning, and tuned Hadoop for high availability and cluster recovery.
Worked with Tableau for generating reports and created Tableau dashboards, pie charts, and heat maps according to the business requirements.
Worked with all phases of the software development life cycle and used Agile methodology for development.
Environment: Java, SQL, Cassandra DB, Azure Data Lake Storage Gen2, Azure Data Factory, Azure SQL DB, Spark, Databricks, SSIS, SQL Server, Kafka, Informatica, Apache Spark, Delta Lake, Azure Event Hubs, Stream Analytics, Azure Blob Storage, PowerShell, Apache Airflow, Hadoop, YARN, PySpark, Hive, Teradata, Sqoop, HDFS, Agile.
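Illustrative example (not part of the original engagement): a minimal PySpark sketch of the ADLS-to-Azure-SQL load pattern described above. The storage account, container, credentials, and table names are hypothetical placeholders; in practice the JDBC credentials would come from a secret scope rather than literals.

```python
# PySpark sketch: read Parquet files from ADLS Gen2, apply a light
# transformation, and write the result to an Azure SQL DB table over JDBC.
# All account, path, credential, and table names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("adls-to-azure-sql").getOrCreate()

source_path = "abfss://raw@examplestorage.dfs.core.windows.net/claims/"   # placeholder

df = (spark.read.parquet(source_path)
      .withColumn("claim_date", to_date(col("claim_ts")))
      .filter(col("status").isNotNull()))

jdbc_url = (
    "jdbc:sqlserver://example-server.database.windows.net:1433;"
    "database=exampledb;encrypt=true;loginTimeout=30"
)

(df.write
 .format("jdbc")
 .option("url", jdbc_url)
 .option("dbtable", "dbo.claims_curated")    # placeholder target table
 .option("user", "etl_user")                 # placeholder; use a secret scope in practice
 .option("password", "<secret>")
 .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
 .mode("append")
 .save())
```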
CLIENT: CareFirst (Baltimore, Maryland)
DURATION: Jun 2020 - Jan 2022
ROLE: AWS Data Engineer
Responsibilities:
Worked through the complete software development life cycle (SDLC) by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
Utilized analytical, statistical, and programming skills to collect, analyze, and interpret large data sets and to develop data-driven technical solutions to difficult business problems using tools such as SQL and Python.
Worked on designing AWS EC2 instance architecture to meet high-availability application architecture and security requirements.
Created AWS S3 buckets, managed policies for them, and utilized S3 and Glacier for storage and backup.
Worked on the Hadoop cluster and data querying tools to store and retrieve data from the stored databases.
Worked with different file formats such as Parquet and with Impala, using PySpark to access the data, and performed Spark Streaming with RDDs and DataFrames.
Aggregated log data from different servers and fed it to downstream systems for analytics using Apache Kafka.
Worked on designing and developing SSIS packages to import and export data from MS Excel, SQL Server, and flat files.
Worked on data integration for the extract, transform, and load processes of the designed packages.
Designed and deployed automated ETL workflows using AWS Lambda, organized and cleansed the data in S3 buckets using AWS Glue, and processed the data using Amazon Redshift.
Worked on developing an ETL data pipeline and loading the data into relational databases using Spark.
Created ETL packages using SSIS to extract data from various data sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data using SQL Server.
Worked on ETL architecture enhancements to increase performance using the query optimizer.
Processed the extracted data using Spark and Hive and handled large data sets using HDFS.
Worked on streaming data transfer from different data sources into HDFS and NoSQL databases.
Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
Worked on scripting with Python in Spark to transform data from various file formats such as text, CSV, and JSON.
Loaded data from different relational databases such as MySQL and Teradata using Sqoop jobs.
Worked on processing and testing the data using Spark SQL and on real-time processing with Spark Streaming and Kafka using Python.
Scripted with Python and PowerShell to set up baselines, branching, merging, and automation processes using Git.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Used various Spark transformations and actions to cleanse the input data.
Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
Enforced standards and best practices around the data catalog and data governance efforts.
Created DataStage jobs using different stages such as Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, and Row Generator.
Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow for ETL batch processing to load data into Snowflake for analytical processing.
Worked on building ETL pipelines for data ingestion, transformation, and validation on AWS, working with the data steward under data compliance requirements.
Worked on scheduling all jobs using Airflow scripts in Python, adding different tasks to DAGs and dependencies between the tasks, along with AWS Lambda (see the sketch after this section).
Used PySpark for extracting, filtering, and transforming data in data pipelines.
Skilled in monitoring servers using Nagios and CloudWatch and using the ELK stack (Elasticsearch and Kibana).
Used dbt (Data Build Tool) for transformations in the ETL process, along with AWS Lambda and AWS SQS.
Documented data validation processes, test cases, and results, ensuring that they are easily accessible and understandable to stakeholders and team members.
Conducted training sessions for team members on data validation techniques and best practices, resulting in improved data quality awareness and knowledge across the organization.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
Created Unix shell scripts to automate the data load processes into the target data warehouse.
Worked on the implementation of the ETL architecture to enhance the data, and optimized workflows by building DAGs in Apache Airflow to schedule the ETL jobs, using additional Airflow components such as pools, executors, and multi-node functionality.
Used various transformations in SSIS data flows and control flows, including For Loop containers and fuzzy transformations.
Worked on creating SSIS packages for data conversion using the Data Conversion transformation and produced advanced extensible reports using SQL Server Reporting Services.
Environment: Python, SQL, AWS EC2, AWS S3, Hadoop, PySpark, AWS Lambda, AWS Glue, Amazon Redshift, Spark Streaming, Apache Kafka, SSIS, Informatica, ETL, Hive, HDFS, NoSQL, Talend, MySQL, Teradata, Sqoop, PowerShell, Git, Apache Airflow.
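Illustrative example (not part of the original engagement): a minimal Apache Airflow sketch of the DAG scheduling pattern described above, with explicit task dependencies. The DAG id, schedule, and task callables are hypothetical placeholders.

```python
# Airflow sketch: a small daily ETL DAG with extract -> transform -> load tasks
# and explicit dependencies. DAG id, schedule, and callables are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull source data (e.g., from S3 or a source database) here.
    print("extracting source data")


def transform(**context):
    # Placeholder: apply cleansing and transformation logic here.
    print("transforming data")


def load(**context):
    # Placeholder: load curated data into the target warehouse here.
    print("loading data into the warehouse")


default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_etl_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependencies between the tasks.
    t_extract >> t_transform >> t_load
```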
CLIENT: Clinical Health (New York City, New York)
DURATION: March 2018 - Jan 2020
ROLE: Data Engineer
Responsibilities:
Participated in gathering requirements from the business analysts and collected the raw data from the analysis team for the implementation and development processes.
Involved in migrating objects using the custom ingestion framework from a variety of sources such as Oracle, SAP HANA, and Teradata.
Planned and designed the data warehouse in a star schema; designed the table structures and documented them.
Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Hadoop and Spark.
Worked on Apache Spark, utilizing the Spark SQL and Streaming components to support intraday and real-time data processing.
Developed Python and Bash scripts to automate and provide control flow.
Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms.
Worked extensively on AWS components such as Elastic MapReduce (EMR).
Migrated data from Oracle databases to the AWS cloud by creating instances in AWS RDS, replicating the instances, and cleaning the data.
Analyzed the data, applied transformations with Spark SQL to the DataFrames built from the Amazon SQS queue, and loaded them into database tables.
Worked on Amazon S3 for transforming Spark DataFrames in S3 buckets, using Amazon S3 as a data lake for the data pipeline running on Spark and MapReduce.
Developed logging functions in Scala, stored the pipelines in Amazon S3 buckets, and built applications for collecting data and queuing it with the Amazon SQS queue (see the sketch after this section).
Worked on developing an ETL data pipeline and loading the data into relational databases using Spark.
Created ETL packages using SSIS to extract data from various data sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data using SQL Server.
Worked with the Cloudera distribution platform for running nodes in a Hadoop cluster, and deployed and configured AWS EMR persistent clusters.
Worked on creating the EMR cluster and AWS SNS topics, subscribed AWS Lambda to them to alert when data reaches the lake, and polled the S3 messages using AWS SQS.
Used Apache Airflow for authoring, scheduling, and monitoring data pipelines, and designed and automated ETL pipelines.
Used Hive and Linux scripts to implement data integrity and quality checks in Hadoop.
Used Python and Spark to ingest large amounts of data from multiple sources (AWS S3, Parquet, APIs) and performed ETL operations.
Worked with the Spark architecture and RDDs for internal processes, processing data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing for performance.
Worked on fetching data from various source systems such as Hive, Amazon S3, and AWS Kinesis.
Spark Streaming gathers this data from AWS Kinesis in near real time, performs the necessary transformations and aggregations on the fly, and persists the data to HBase as the NoSQL store.
Worked on enhancing existing Python modules and writing APIs to load the processed data into HBase tables.
Worked on continuous integration using tools such as Jenkins and used Git for building and testing.
Used Tableau and MS SQL Server to design and develop dashboards, workbooks, and reports.
Worked within the SDLC using Agile methodology, participating in daily scrum meetings and sprint planning.
Environment: Python, SQL, Oracle, AWS RDS, Amazon SQS, Spark SQL, Amazon S3, AWS EMR, AWS Lambda, AWS SNS, MapReduce, Scala, ETL data pipelines, SSIS, SQL Server, Cloudera, Hadoop, Apache Airflow, Jenkins, Git, Hive, Linux, Spark, RDD, HDFS, AWS Kinesis, NoSQL, Tableau, Agile.
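Illustrative example (not part of the original engagement): a minimal boto3 sketch of the SQS polling pattern mentioned above, assuming S3 event notifications are delivered directly to the queue. The queue URL, region, and downstream handling are hypothetical placeholders.

```python
# boto3 sketch: long-poll an SQS queue for S3 event notifications, extract the
# bucket/key of newly landed objects, and delete handled messages. The queue
# URL and the downstream handling are illustrative placeholders.
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-landing-events"


def poll_once() -> None:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,   # long polling
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        # Assumes the S3 -> SQS notification format, which carries a "Records" list.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Placeholder: trigger the downstream load for this object here.
            print(f"new object landed: s3://{bucket}/{key}")
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    poll_once()
```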
CLIENT: CITI Bank (New York City, New York)
DURATION: April 2016 - Jan 2018
ROLE: Data Engineer
Responsibilities:
Migrated data from on-premises to AWS storage buckets.
Developed a Python script to transfer data from on-premises to AWS S3.
Developed a Python script to hit REST APIs and extract data to AWS S3 (see the sketch after this section).
Worked on ingesting data through cleansing and transformations, leveraging AWS Lambda, AWS Glue, and Step Functions.
Created YAML files for each data source, including Glue table stack creation.
Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
Developed Lambda functions and assigned IAM roles to run Python scripts with various triggers (SQS, EventBridge, SNS).
Created a Lambda deployment function and configured it to receive events from S3 buckets.
Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
Developed various mappings with collections of sources, targets, and transformations using Informatica Designer.
Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean and consistent data.
Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau for generating interactive reports using HiveServer2.
Used Sqoop to channel data between different HDFS and RDBMS sources.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python and in NoSQL databases such as HBase and Cassandra.
Collected data using Spark Streaming from AWS S3 buckets in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
Used Apache NiFi to copy data from the local file system to HDP.
Worked on dimensional and relational data modeling using star and snowflake schemas, OLTP/OLAP systems, and conceptual, logical, and physical data modeling using Erwin.
Automated data processing with Oozie to automate data loading into the Hadoop Distributed File System (HDFS).
Environment: Big Data 3.0, Hadoop 3.0, Oracle 12c, PL/SQL, Scala, Spark SQL, PySpark, Python, Kafka 1.1, SAS, Azure SQL, MDM, Oozie 4.3, SSIS, T-SQL, ETL, HDFS, Cosmos, Pig 0.17, Sqoop 1.4, MS Access.
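Illustrative example (not part of the original engagement): a minimal Python sketch of the REST-API-to-S3 extraction mentioned above. The endpoint, pagination scheme, bucket, and prefix are hypothetical placeholders.

```python
# Python sketch: call a REST API, page through the results, and land each page
# in S3 as JSON. Endpoint, pagination parameters, bucket, and prefix are
# illustrative placeholders.
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://api.example.com/v1/transactions"   # placeholder endpoint
BUCKET = "example-raw-zone"                           # placeholder bucket
PREFIX = "transactions/api"

s3 = boto3.client("s3")


def extract_to_s3(page_size: int = 500) -> None:
    run_ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    page = 1
    while True:
        resp = requests.get(API_URL, params={"page": page, "size": page_size}, timeout=30)
        resp.raise_for_status()
        records = resp.json()
        if not records:          # stop when the API returns an empty page
            break
        key = f"{PREFIX}/run={run_ts}/page_{page:05d}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records).encode("utf-8"))
        page += 1


if __name__ == "__main__":
    extract_to_s3()
```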
CLIENT: Capital One (McLean, Virginia)
DURATION: Oct 2015 - March 2016
ROLE: Database Developer
Responsibilities:
Coordinated with the front-end design team to provide them with the necessary stored procedures and packages and the necessary insight into the data.
Worked on SQL*Loader to load data from flat files obtained from various facilities every day.
Created and modified several UNIX shell scripts according to the changing needs of the project and client requirements.
Wrote UNIX shell scripts to process files on a daily basis, such as renaming files, extracting dates, unzipping files, and removing junk characters before loading them into the base tables.
Involved in continuous enhancements and fixing of production problems.
Generated server-side PL/SQL scripts for data manipulation and validation, and materialized views for remote instances.
Developed PL/SQL triggers and master tables for automatic creation of primary keys.
Created PL/SQL stored procedures, functions, and packages for moving data from the staging area to the data mart.
Created scripts to create new tables, views, and queries for new enhancements in the application using TOAD.
Created indexes on tables for faster data retrieval to enhance database performance.
Involved in data loading using PL/SQL and SQL*Loader, calling UNIX scripts to download and manipulate files.
Performed SQL, PL/SQL, and application tuning using tools such as EXPLAIN PLAN, SQL*TRACE, and AUTOTRACE.
Extensively involved in using hints to direct the optimizer to choose an optimum query execution plan.
Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
Involved in creating UNIX shell scripts; handled defragmentation of tables, partitioning, compression, and indexes for improved performance and efficiency.
Involved in table redesign with the implementation of partitioned tables and partitioned indexes to make the database faster and easier to maintain.
Experience in database application development, query optimization, performance tuning, and DBA solutions, with implementation experience across the complete system development life cycle.
Used the SQL Server SSIS tool to build high-performance data integration solutions, including extract, transform, and load packages for data warehousing; extracted data from XML files and loaded it into the database.
Designed and developed Oracle Forms and Reports, generating up to 60 reports.
Performed modifications on existing forms per change requests and maintained them.
Used Crystal Reports to track logins, mouse-overs, click-throughs, session durations, and demographic comparisons against the SQL database of customer information.
Used standard packages such as UTL_FILE, DBMS_SQL, and PL/SQL collections, and used bulk binding.
Involved in writing database procedures, functions, and packages for the front-end module.
Used principles of normalization to improve performance.
Involved in ETL code using PL/SQL to meet requirements for extracting, transforming, cleansing, and loading data from source to target data structures.
Designed, implemented, and tuned interfaces and batch jobs using PL/SQL.
Involved in data replication and high-availability design scenarios with Oracle Streams.
Developed UNIX shell scripts to automate repetitive database processes.
Developed code fixes and enhancements for inclusion in future code releases and patches.
Environment: PL/SQL, SQL, SSIS, Oracle, TOAD, ETL, Crystal Reports, UNIX scripts.

CLIENT: Accord Solutions (Hyderabad, India)
DURATION: May 2010 - Nov 2012
ROLE: Data Analyst
Responsibilities:
Gathered requirements from the business users and created technical specification documents.
Designed the database tables and reviewed new report standards to ensure optimized performance under the new reporting service, SSRS.
Planned, designed, and documented the optimal usage of space requirements and the distribution of space for the data warehouse.
Worked with MS SQL Server and managed the programs using MS SQL Server Setup; designed and developed packages for data warehousing and data migration projects using Integration Services (SSIS) on MS SQL Server.
Extracted data from Oracle and flat files, transformed it and implemented the required business logic, and loaded it into the target data warehouse using SSIS.
Created OLAP cubes on top of the data warehouse, based on various fact and dimension tables, for analysis purposes using SQL Server Analysis Services.
Worked on the setup and implementation of the reporting servers, wrote T-SQL queries and stored procedures, and used them to build packages.
Worked on modifying a variety of parameterized, drill-down, click-through, chart, matrix, and sub-reports using SSRS, using data from a variety of sources.
Worked on scheduling the reports to run daily across different servers based on capacity, and sent the results to the business users in the required format using Tableau.
Designed and implemented stored procedures and triggers for automating tasks.
Managed all indexing, debugging, optimization, and performance tuning using T-SQL.
Worked on creating and modifying SQL joins, subqueries, and other T-SQL and PL/SQL code to implement business rules.
Environment: SQL, PL/SQL, T-SQL, XML, Informatica, Python, Tableau, OLAP, SSIS, SSRS, Excel, OLTP, Git.

EDUCATIONAL DETAILS
