Candidate's Name
Sr. Data Engineer
Contact: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

Professional Summary:
• Around 10 years of extensive IT experience with multinational clients, including Hadoop architecture experience developing Big Data / Hadoop applications.
• Hands-on knowledge of the Hadoop stack (MapReduce, Spark, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie and Zookeeper).
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Well versed in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop, MapR and Cloudera.
• Expertise in designing and implementing Snowflake data warehouses, including database objects, schemas, tables, views, and stored procedures.
• Proven expertise in Big Data analytics using MapReduce, Spark, Hive and Pig.
• Familiar with real-time analytics on NoSQL databases such as HBase.
• Hands-on experience importing and exporting data between relational databases and HDFS, Hive and HBase using Sqoop/Spark.
• Experienced with ETL tools such as Informatica, DataStage and Azure Data Factory.
• Good understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
• Analyzed large data sets by writing Pig scripts and Hive queries.
• Experienced in writing MapReduce programs and UDFs for both Hive and Spark in Java/Scala.
• Experience configuring Hadoop ecosystem components: Hive, Spark, HBase, Pig, Sqoop, Mahout, Zookeeper and Flume.
• Supported MapReduce programs running on the cluster and wrote custom MapReduce data processing scripts in Scala.
• Experienced with build and continuous integration tools such as Jenkins.
• Familiar with all facets of the Software Development Life Cycle (analysis, design, development, testing and maintenance) using Waterfall and Agile methodologies.
• Motivated team player with excellent communication, interpersonal, analytical and problem-solving skills.

Technical Skills:
Big Data Technologies: Hadoop, MapReduce 2 (YARN), Hive, Pig, Apache Spark, HDFS, Sqoop, Cloudera Manager, Kafka, Amazon EC2, Azure, GCP
Languages: Scala, Python, REST, Java, XML, SQL, PL/SQL, HTML, Shell Scripting
J2EE Technologies: JDBC, JSP, Servlets, AngularJS, Angular 2/4/5, ReactJS, Node.js
Web Technologies: JavaScript, HTML, jQuery
Web/App Servers: Apache Tomcat, WebLogic, JBoss
Web Services: SOAP, WSDL, REST, SOA
Databases: HDFS, Oracle 9i/10g/11g, MySQL, DB2, HBase
Operating Systems: Windows 7/8/10, UNIX
Methodologies: Agile Scrum, Waterfall, Design Patterns

Professional Experience:

Client: Fifth Third Bank, Cincinnati, Ohio Oct 23 - Present
Role: Sr. Data Engineer
Description: Fifth Third Bank (5/3 Bank), the principal subsidiary of Fifth Third Bancorp, is an American bank holding company headquartered in Cincinnati, Ohio. Fifth Third is one of the largest consumer banks in the Midwestern United States. Fifth Third's client base spans retail, small business, corporate, and investment clients.
Responsibilities:
• Proven track record in optimizing query performance and improving overall system efficiency in Snowflake.
• Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage and BigQuery.
• Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETL to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage and Composer.
• Led cross-functional teams in requirements analysis, solution design, and implementation phases, collaborating closely with stakeholders to deliver ETL solutions aligned with business objectives and technical requirements.
• Leveraged Spark's distributed computing capabilities to parallelize data transformations and accelerate processing speeds, enhancing the efficiency and throughput of ETL workflows in the AWS ecosystem.
• Implemented robust error-handling mechanisms and data quality checks within Spark-based ETL processes, ensuring data integrity and reliability across the Snowflake data warehouse and downstream systems.
• Integrated Spark-based ETL pipelines with diverse data sources and destinations on AWS, including Amazon S3, Amazon Redshift, and relational databases, facilitating seamless data integration and synchronization.
• Orchestrated performance tuning initiatives to optimize Spark job execution and resource utilization, employing techniques such as partitioning, caching, and data skew handling to improve query performance and reduce processing times.
• Mentored and coached team members on Spark programming best practices and AWS cloud services, fostering knowledge sharing and skill development in ETL development and big data technologies.
• Collaborated with architecture teams to define data models, schema designs, and data governance policies tailored to Spark-based ETL solutions deployed on AWS, ensuring adherence to enterprise standards and compliance requirements.
• Imported raw data such as CSV and JSON files into Azure Data Lake Gen2, performing data ingestion by writing PySpark to extract flat files.
• Designed and developed jobs using DataStage Designer to load data from heterogeneous source files into target databases.
• Created BigQuery authorized views for row-level security and for exposing data to other teams.
• Created DataStage routines to capture counts from source and target to fulfill audit requirements for ETL jobs.
• Involved in designing, developing and documenting the ETL (Extract, Transform and Load) strategy to populate data from various source system feeds using Informatica.
• Constructed data transformations by writing PySpark in Databricks to rename, drop, clean, validate and reformat data into parquet files and load them into an Azure Blob Storage container (see the sketch following this section).
Environment: Apache Spark, Apache Hadoop, Snowflake, Apache Kafka, MySQL, Apache Flink, Apache Airflow, GCP, DBT, Cassandra, Teradata, Talend, Ambari, PowerBI, AWS, DataStage, Apache NiFi, ETL, Informatica, QlikView, Qlik Sense, Tableau, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Python, Git, Informatica PowerCenter, PySpark, Hive, HBase.
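Illustrative sketch (referenced above): a minimal PySpark job, in the spirit of the Databricks transformation described in this section, that reads raw CSV files from Azure Data Lake Gen2, applies rename/drop/validate steps, and writes parquet to a Blob Storage container. The storage account, container, and column names are hypothetical placeholders, and authentication for the abfss/wasbs paths is assumed to be configured separately.

# Minimal PySpark sketch (hypothetical paths and column names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw-to-parquet").getOrCreate()

# Ingest raw CSV flat files landed in ADLS Gen2.
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/landing/"))

# Rename, drop, validate and reformat.
cleaned = (raw
           .withColumnRenamed("cust_id", "customer_id")
           .drop("unused_col")
           .filter(F.col("customer_id").isNotNull())
           .withColumn("load_date", F.current_date()))

# Load the curated parquet output into a Blob Storage container.
(cleaned.write
 .mode("overwrite")
 .parquet("wasbs://curated@exampleblob.blob.core.windows.net/customers/"))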
Client: CommonSpirit Health, Chicago, IL Sep 22 - Oct 23
Role: Sr. Data Engineer
Description: CommonSpirit Health was created in early 2019 when Catholic Health Initiatives and Dignity Health came together as one ministry. Drawing on our combined resources, CommonSpirit is dedicated to building healthy communities, advocating for the poor and vulnerable, and innovating how and where healing can happen, both inside our hospitals and out in the community.
Responsibilities:
• Participated in weekly release meetings with technology stakeholders to identify and mitigate potential risks associated with the releases.
• Strong understanding of Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes; experience integrating data from various sources into Snowflake.
• Proven track record in optimizing query performance and improving overall system efficiency in Snowflake.
• Created Spark clusters and managed the all-purpose and job clusters in Databricks running in the Azure cloud service.
• Mounted Azure Data Lake containers to Databricks and created service principals, access keys and tokens to access the Azure Data Lake Gen2 storage account.
• Keep up with the newer technology stack that Google Cloud Platform (GCP) adds.
• Downloaded BigQuery data into pandas or Spark data frames for advanced ETL capabilities (see the sketch following this section).
• Performed data modeling to develop the mappings accordingly; worked with the data modeling team and provided suggestions in creating the data model.
• Created DataStage routines to capture counts from source and target to fulfill audit requirements for ETL jobs.
• Involved in designing, developing and documenting the ETL (Extract, Transform and Load) strategy to populate data from various source system feeds using Informatica.
• Constructed data transformations by writing PySpark in Databricks to rename, drop, clean, validate and reformat data into parquet files and load them into an Azure Blob Storage container.
• Developed Azure linked services to construct connections between on-premises Oracle Database, SQL Server and Apache Hive and Azure datasets in the cloud.
• Connected Databricks notebooks with Airflow to schedule and monitor the ETL process.
• Trained NLP question-answering models using BERT transfer learning to answer domain questions and expedite the named-entity recognition process.
• Launched NLP dashboards using Dash, Plotly and PowerBI and maintained them on the server, saving 20% of the time in Sprint review meetings.
• Provided user management and support by administering epics, user stories and tasks in Jira using Agile methodology; logged process flow documents in Confluence.
• Experience integrating Snowflake with external systems using Snowflake connectors and APIs.
• Stayed updated with the latest Snowflake features and best practices, actively exploring opportunities for innovation and process improvement.
Environment: Azure HDInsight, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Snowflake, Cassandra, Teradata, Ambari, PowerBI, Azure, DataStage, Blob Storage, ETL, Informatica, Data Factory, Data Storage Explorer, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Spark, Git, PySpark, Airflow, Hive, HBase, GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub.
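Illustrative sketch (referenced above): a minimal example of downloading a BigQuery result set into a pandas DataFrame with the google-cloud-bigquery client. The project, dataset, table and column names are hypothetical, and credentials are assumed to come from the environment (e.g. GOOGLE_APPLICATION_CREDENTIALS).

# Minimal BigQuery-to-pandas sketch (hypothetical project and table names).
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

sql = """
    SELECT encounter_id, encounter_date, total_charges
    FROM `example-project.clinical.encounters`
    WHERE encounter_date >= '2023-01-01'
"""

# Run the query and materialize the result as a pandas DataFrame
# (requires the pandas/pyarrow extras of google-cloud-bigquery).
df = client.query(sql).to_dataframe()
print(df.head())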
Client: Credit Suisse, Morrisville, NC April 21 - Aug 22
Role: Sr. Big Data Engineer
Description: Credit Suisse Group AG is a global investment bank and financial services firm founded and based in Switzerland. Headquartered in Zürich, it maintains offices in all major financial centers around the world and is one of the nine global "Bulge Bracket" banks providing services in investment banking, private banking, asset management, and shared services.
Responsibilities:
• Collaborated with various teams and management to understand requirements and design the complete system. Implemented the complete Big Data pipeline with batch and real-time processing.
• Worked with Sqoop to ingest and retrieve data from various RDBMSs such as Oracle DB and MySQL.
• Designed and developed Snowflake data warehouse solutions for clients, ensuring optimal performance and scalability.
• Implemented data ingestion pipelines using Snowflake's built-in features and external ETL tools such as Informatica and Talend.
• Created and maintained database objects such as schemas, tables, views, and stored procedures in Snowflake.
• Designed a data quality framework to perform schema validation and data profiling on Spark (PySpark); see the sketch following this section.
• Performed automation using Pig, HQL, shell scripts and Python for ingestion and consumption.
• Used tools such as Datameer to validate HBase tables.
• Developed custom ETL solutions, batch processing and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
• Worked on Azure deployment of Hadoop clusters and environments.
• Used Spark and Spark SQL to read ORC data and create tables in Hive using the Scala API.
• Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
• Worked on Teradata, EDW and Hadoop technologies (Teradata, DB2, Hive, Sqoop, Phoenix, HBase).
• Worked on SQL scripts and an automation framework for DDA, Cards and Bill Pay data.
• Built an automation framework for Hive tables to validate source (text, fixed-width, XML, JSON and ORC format files) against target (Hive tables, Phoenix, JSON and XML files) and generated test reports published to the project dashboard.
• Designed end-to-end scalable architecture to solve business problems using Azure components such as HDInsight, Data Factory, Data Lake, Azure Monitoring, Key Vault, Function Apps and Event Hubs.
• Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage and BigQuery.
• Good experience tracking and logging end-to-end software application builds using Azure DevOps.
• Used Terraform scripts for deploying applications to higher environments.
• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
• Extracted, transformed and loaded data from source systems to Azure data storage services with a combination of Azure Data Factory, T-SQL and Spark SQL, processing the data in Databricks.
• Verified the data loaded into the target database by the ETL process.
• Assisted in migrating data from legacy systems to Snowflake, ensuring data integrity and minimal disruption.
• Helped various teams deploy applications on GCP, RDS, IBM SoftLayer and Microsoft Azure.
• Verified column mapping between source and target and performed metadata testing.
• Created PySpark scripts for processing and presenting data to the target databases/directories (e.g., Sybase, MS SQL Server, HDFS and Hive).
• Involved in test plan, test strategy and test case preparation; loaded test cases in ALM, raised defects, participated in triage calls and obtained sign-off from stakeholders.
• Validated files in the Hadoop S3 filesystem after cloud migration: Hadoop API calls, Metadata Store contract tests, running existing S3A unit and integration tests with S3Guard enabled, source-to-target data validation, etc.
• Created schemas in Hive with performance optimization using bucketing and partitioning.
• Created Oozie workflow jobs for scheduling queries and actions.
Environment: Sqoop, RDBMS, Oracle DB, MySQL, HBase, Talend, PySpark, Azure, Scala, Hadoop, ETL, Teradata, EDW, DB2, Hive, Phoenix, SQL, Informatica, Data Warehouse, XML, JSON, ORC, ALM, Oozie, Snowflake, Metadata, Google BigQuery, GCP, Cloud SQL, Cloud Functions, Cloud Pub/Sub.
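Illustrative sketch (referenced above): a minimal PySpark check of the kind a schema-validation and data-profiling framework might run. The expected schema, input path and column names are hypothetical placeholders, not the actual framework.

# Minimal PySpark data-quality sketch (hypothetical schema and path).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DecimalType, DateType

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

expected = StructType([
    StructField("account_id", StringType(), False),
    StructField("balance", DecimalType(18, 2), True),
    StructField("as_of_date", DateType(), True),
])

df = spark.read.orc("/data/cards/daily/")

# Schema validation: compare field names and types against the contract.
actual = {(f.name, f.dataType.simpleString()) for f in df.schema.fields}
wanted = {(f.name, f.dataType.simpleString()) for f in expected.fields}
drift = wanted - actual
if drift:
    raise ValueError(f"Schema drift detected for fields: {drift}")

# Lightweight profiling: total rows and null counts per column.
nulls = df.select([F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in df.columns])
print("row count:", df.count())
nulls.show()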
Client: BNSF Railways, Fort Worth, TX Nov 19 - Mar 21
Role: Data Engineer
Description: BNSF Railway Company provides railroad transportation services. The company provides freight transportation services to industrial and commercial clients, including transportation of consumer, coal, industrial, and agricultural products.
Responsibilities:
• Followed Agile methodology and used Rally to maintain user stories.
• Installed, configured and maintained the Hadoop MapR 5.2.1 distribution.
• Developed various Scala APIs that work on top of Spark and the Hadoop ecosystem.
• Executed programs using the Python API for Apache Spark (PySpark).
• Created a Lambda deployment function and configured it to receive events from an S3 bucket (see the sketch following this section).
• Developed business logic using Python on the Django web framework.
• Wrote extensive Spark/Scala programs using DataFrames, Datasets and RDDs to transform transactional database data and load it into Hive/HBase tables.
• Helped the UNIX and Splunk administrators deploy Splunk across the UNIX and Windows environments.
• Built ETL data pipelines in Azure Data Factory (ADF) to manage and process more than 1B rows into Azure SQL DW.
• Configured input and output bindings of Azure Functions with an Azure Cosmos DB collection to read and write data from the container whenever the function executes.
• Imported and exported data in HDFS and Hive from RDBMSs using Sqoop.
• Handled BLOB/CLOB data types in Hive/Spark.
• Implemented Kafka for broadcasting the logs generated by Spark Streaming.
• Used RESTful APIs with JSON and XML to extract network traffic information.
• Worked on Apache Drill 1.6 and Spark SQL for better query performance.
• Implemented RESTful web services in JSON format to query from web browsers.
• Involved in loading data into HBase from Hive tables to compare performance.
• Developed views and templates with Python, Django view controllers and the templating language to create a user-friendly website interface.
• Created and maintained ETL processes for data extraction, transformation and loading into Snowflake.
• Analyzed SQL scripts and designed solutions to implement them using PySpark.
• Involved in creating Hive tables and loading data as text, parquet and ORC for Hive queries.
• Wrote a custom MapReduce program for merging data in incremental Sqoop loads.
• Responsible for loading data from the UNIX file system to HDFS.
• Created UNIX shell scripts to parameterize the Sqoop and Hive jobs.
• Worked on production deployment and post-production support in the team.
• Used Maven as the build tool and Git for code management.
• Experience working with offshore teams.
• Worked closely with the BRT and QA teams to fix issues.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Scala, MapR, Java, Oozie, Flume, HBase, Hue, Hortonworks, Snowflake, ETL, PySpark, Azure, Zookeeper, Cloudera, Oracle, Kerberos, RedHat 6.5, Windows, Splunk knowledge objects, Amazon Web Services, Google Cloud, AWS Lambda, Jenkins, Java/J2EE, GCP, BigQuery.
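Illustrative sketch (referenced above): a minimal Python Lambda handler wired to S3 object-created event notifications, in the spirit of the Lambda function described in this section. The bucket/key handling is illustrative only.

# Minimal AWS Lambda handler for S3 event notifications.
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record corresponds to one object-created notification.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        print(json.dumps({"bucket": bucket, "key": key, "size": head["ContentLength"]}))
    return {"statusCode": 200}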
Client: T-Mobile, Washington, DC Jun 18 - Dec 19
Role: Data Engineer
Description: T-Mobile, Inc. is a leading telecommunications corporation that provides communications and digital entertainment services in the United States and around the world. The project involves examining customer information to improve the customer experience and to provide better, feasible alternatives in line with the ongoing marketing strategy. As part of the team, I worked with data related to customers' reviews, suggestions and other inputs, grouped together at regular intervals.
Responsibilities:
• Understood the business needs and objectives of the system, interacted with the end clients/users and gathered requirements for the integrated system.
• Worked as a source system analyst to understand the various source systems by interacting with each source's SMEs.
• Analyzed clickstream data from Google Analytics with BigQuery.
• Deep experience developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment and loading into target data destinations.
• Wrote Python utilities and scripts to automate tasks in AWS using boto3 and the AWS SDK; automated backups using the AWS SDK (boto3) to transfer data into S3 buckets (see the sketch following this section).
• Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
• Worked in the API group running Jenkins in a Docker container with RDS and GCP slaves in Amazon AWS.
• Used APIs, OpenNLP and Stanford NLP for natural language processing and sentiment analysis.
• Integrated the application with third-party APIs (Google, Facebook, Stripe, PayPal, Google's Natural Language API).
• Created APIs with the Serverless framework in Python 3.6.
• Implemented security measures, including Azure Key Vault and Azure Managed Identity, to protect sensitive data and credentials within the Azure environment.
• Optimized Azure cost management through the use of Azure Advisor and Azure Cost Management + Billing, resulting in a [mention cost reduction percentage] reduction in cloud spending.
• Implemented an incremental load approach in Spark for very large data tables.
• Used RESTful APIs with JSON to extract network traffic/memory performance information.
• Contributed to new and existing projects using Python, Django and GraphQL with deployment to the cloud (AWS).
• Used Amazon Web Services (AWS) for storage and processing of data in the cloud.
• Created the incremental eligibility document and developed code for the initial load process.
• Extracted data from a Teradata database and loaded it into the data warehouse using Spark JDBC.
• Performed transformations and actions using Spark to improve performance.
• Loaded the transformed data into Hive / saved as tables in Spark.
• Used build tools such as Maven to build projects.
• Extensive experience in unit testing by creating test cases.
• Used Kafka and Spark Streaming for streaming purposes.
• Experience with development methodologies such as Agile and Waterfall.
• Experience with code repositories such as GitHub.
Environment: Apache Spark, Scala, Eclipse, HBase, Talend, Data Warehouse, Python, PySpark, Hortonworks, Spark SQL, Hive, Teradata, Hue, Spark Core, Linux, GitHub, AWS, JSON.
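Illustrative sketch (referenced above): a minimal boto3 utility of the kind used to automate backups into S3. The bucket name, key prefix and local directory are hypothetical placeholders; AWS credentials are assumed to come from the environment.

# Minimal boto3 backup-to-S3 sketch (hypothetical bucket and paths).
import os
from datetime import date

import boto3

def backup_directory(local_dir: str, bucket: str, prefix: str) -> None:
    """Upload every file under local_dir to s3://bucket/prefix/<date>/..."""
    s3 = boto3.client("s3")
    stamp = date.today().isoformat()
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            key = f"{prefix}/{stamp}/{rel}"
            s3.upload_file(path, bucket, key)
            print(f"uploaded {path} -> s3://{bucket}/{key}")

if __name__ == "__main__":
    backup_directory("/var/exports/reviews", "example-backup-bucket", "nightly")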
Client: Raymond James, Clearwater, Florida Oct 16 - July 18
Role: Data Engineer
Description: Built a Hadoop cluster ensuring high availability for the NameNode, mixed-workload management, performance optimization, and backup and recovery across one or more nodes. Customer sentiment rating is an important measure that helps an organization know the actual response of customers to the product or service they are being provided.
Responsibilities:
• Involved in requirements analysis and design of an object-oriented domain model.
• Implemented test scripts to support test-driven development and continuous integration.
• Designed and implemented an end-to-end data platform on Azure, leveraging services such as Azure Data Factory, Azure Databricks, and Azure Synapse Analytics for data ingestion, processing, and visualization.
• Orchestrated data workflows and ETL pipelines using Azure Data Factory, ensuring timely and accurate data delivery to downstream applications and stakeholders.
• Created a database using MySQL; wrote several queries and a Django API to extract data from the database.
• Wrote scripts using Python modules and libraries to develop programs that improve the processing of access requests.
• Explored Spark to improve the performance and optimization of existing algorithms.
• Performed Enterprise Linux tasks as they pertain to supporting the Splunk application.
• Worked in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
• Involved in writing queries in Spark SQL using Scala; worked with Splunk to analyze and visualize data.
• Worked on integrating Apache Kafka with the Spark Streaming process to consume data from external REST APIs and run custom functions (see the sketch following this section).
• Worked in complete SDLC phases: requirements, specification, design, implementation and testing.
• Developed the mechanism for logging and debugging with Log4j; involved in developing database transactions through JDBC; used Git for version control.
• Created RESTful services with the Dropwizard framework for various web services involving both JSON and XML.
• Used Oracle as the database and was involved in writing SQL scripts and SQL code for procedures and functions, and in executing queries for data loads.
• Hands-on experience exporting results into relational databases using Sqoop for visualization and to generate reports for the BI team.
• Involved in post-production support and testing; used JUnit for unit testing of the module.
• Worked in Agile methodology.
Environment: Hadoop, HDFS, MapReduce, Spark, Python, AWS, Scala, Hive, Sqoop, Kafka, Oracle, ETL, UNIX, MapR, REST, Tableau, Eclipse, Bitbucket.
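Illustrative sketch (referenced above): a minimal PySpark consumer for a Kafka topic using Structured Streaming, in the spirit of the Kafka and Spark Streaming integration described in this section. The broker address and topic name are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

# Minimal Kafka to Spark Structured Streaming sketch (hypothetical broker/topic).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "customer-feedback")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the value to string for downstream parsing.
messages = stream.select(F.col("value").cast("string").alias("payload"))

query = (messages.writeStream
         .format("console")
         .outputMode("append")
         .start())

query.awaitTermination()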
Company: Snipe IT Solutions, India Aug 14 - Sept 16
Role: Java Developer
Description: This project enables dealers to provide a service warranty to end customers. End customers can buy the warranty from one dealer and utilize the warranty service at any other dealer. The IT system for this acts as the centralized system; it aids the dealers' IT systems in generating invoices to end customers for service repairs.
Responsibilities:
• Analyzed the business requirements and system specifications to understand the application.
• Hands-on experience with the Hadoop/Spark distribution: Hive, HBase, Oozie, Cloudera, Hortonworks. The system is a full microservices architecture written in Python utilizing distributed message passing via Kafka with JSON as the data exchange format.
• Experience in Angular 4 implementing TypeScript, components, two-way data binding, services, dependency injection, directives, pipes and routing for the user interface (UI).
• Implemented best practices by developing Angular applications in a modular fashion using core, shared and feature modules.
• Involved in coding and unit testing the new code.
• Prepared the test plan and test data.
• Implemented migration using the ngUpgrade module, which allows an AngularJS and Angular 2.0 application to coexist and be upgraded part by part while working harmoniously.
• Used Angular CLI to manage the project; developed custom modules such as an app-routing module and pipes to make the Angular application more comprehensible.
• Deployed web modules in the Tomcat web server.
• Tested code changes at the functional and system level.
• Conducted quality reviews of design documents, code and test plans.
• Ensured availability of documents/code for review.
• Conducted quality reviews of testing.
• Made modifications to the code to prevent problems from occurring in the future (preventive maintenance).
• Involved in presenting inductions to new joiners in the project.
Environment: Java, Maven, UNIX, Eclipse, SOAP UI, WinSCP, Tomcat, JSP, Quality Center, HTML5, CSS3, JavaScript, Angular 4, TypeScript, Bootstrap.

EDUCATION & CERTIFICATIONS:
Masters in Information Systems, University of Missouri St. Louis, USA
Bachelors in Computer Science, Jawaharlal Nehru Technological University, India

CERTIFICATIONS:
Microsoft Certified: Azure Fundamentals
Databricks Certified Associate Developer for Apache Spark 2.4
AWS Certified Database - Specialty