
Big Data Engineer Resume Milwaukee, WI
Name: Candidate's Name
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

Senior Big Data Engineer

Professional Summary:
- Over 8 years of IT experience analyzing, designing, developing, implementing, maintaining, and supporting solutions, with a focus on strategic deployment of big data technologies to address complex processing requirements efficiently.
- Experienced in managing Hadoop clusters and services using Cloudera Manager.
- Good understanding of Spark architecture with Databricks and Structured Streaming; set up Databricks on AWS and Microsoft Azure.
- Good experience with the Oozie framework and automating daily import jobs.
- Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Good working experience with Spark (Spark Streaming, Spark SQL) with Scala and Kafka; worked on reading multiple data formats on HDFS using Scala.
- Ability to work effectively in cross-functional team environments; excellent communication and interpersonal skills.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this summary).
- Expertise with big data on AWS cloud services, i.e., EC2, S3, Auto Scaling, Glue, Lambda, CloudWatch, CloudFormation, Athena, DynamoDB, and Redshift.
- Developed custom Kafka producers and consumers for publishing to and subscribing to Kafka topics.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Experience working with Cloudera, Amazon Web Services (AWS), Microsoft Azure, and Hortonworks.
- Excellent experience in designing and developing enterprise applications for the J2EE platform using Servlets, JSP, Struts, Spring, Hibernate, and web services.
- Created machine learning models with the help of Python and scikit-learn.
- Creative skills in developing elegant solutions to pipeline-engineering challenges.
- Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
- Extensive knowledge of reporting objects such as facts, attributes, hierarchies, transformations, filters, prompts, calculated fields, sets, groups, and parameters in Tableau; experience working with Flume and NiFi for loading log files into Hadoop.
- Expertise in using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala.
- Experienced in creating Vizboards for data visualization in Platfora for real-time dashboards on Hadoop.
- Designed and implemented a product search service using Apache Solr.
- Good understanding of Azure big data technologies such as Azure Data Lake Analytics, Azure Data Lake Store, Azure Data Factory, and Azure Databricks; created a POC for moving data from flat files and SQL Server using U-SQL jobs.
- Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL); used UDFs from the Piggybank UDF repository.
- Experienced in running queries using Impala and used BI tools to run ad hoc queries directly on Hadoop.
- Experience using Kafka and Kafka brokers to initiate the Spark context and process live streaming data.
- Worked in various programming languages using IDEs and tools such as Eclipse, NetBeans, IntelliJ, PuTTY, and Git.
- Flexible working across operating systems such as UNIX/Linux (CentOS, Red Hat, Ubuntu) and Windows environments.
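One of the summary bullets above refers to converting Hive/SQL queries into Spark transformations with DataFrames. The following is a minimal PySpark sketch of that idea (the resume cites Scala for this work; Python is used here for consistency with the other sketches). The database, table, and column names are hypothetical placeholders, not details from the resume.

# Minimal sketch: expressing a Hive/SQL aggregation as equivalent Spark
# DataFrame transformations. Table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-query-to-dataframe")
    .enableHiveSupport()   # assumes a Hive metastore is configured
    .getOrCreate()
)

# Original Hive/SQL form of the query.
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions
    WHERE txn_date >= '2024-01-01'
    GROUP BY customer_id
""")

# Equivalent DataFrame transformation chain; returns the same rows.
df_result = (
    spark.table("sales.transactions")
    .filter(F.col("txn_date") >= "2024-01-01")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

df_result.show(10)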
Technical Skills:
- Languages: Python, Scala, PL/SQL, SQL, T-SQL, UNIX, Shell Scripting
- Big Data Technologies: Hadoop, HDFS, Hive, Airflow, Pig, HBase, Sqoop, Flume, YARN, Spark SQL, Kafka, Presto
- Operating Systems: Windows, Unix, Linux
- Business Intelligence Tools: SSIS, SSRS, SSAS
- Modeling Tools: IBM InfoSphere, SQL Power Architect, Oracle Designer, Erwin, ER/Studio, Sybase PowerDesigner
- Database Systems: Oracle, MS Access, Microsoft SQL Server, Teradata, PostgreSQL, Snowflake
- Cloud Platforms: AWS (Amazon Web Services), Microsoft Azure
- Reporting Tools: Business Objects, Crystal Reports
- Software & Tools: TOAD, MS Office, BTEQ, Teradata SQL Assistant
- ETL Tools: Pentaho, Informatica PowerCenter, SAP Business Objects XI R3.1/XI R2, Web Intelligence
- Additional Tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio, C++, UNIX, PL/SQL

Project Experience:
Client: AT&T, Middletown, NJ                                            Feb 2024 - till date
Senior Big Data Engineer
- Used PySpark for DataFrames, ETL, data mapping, transformation, and loading in a complex, high-volume environment.
- Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
- Created Spark code to process streaming data from the Kafka cluster and load the data into a staging area for processing (see the sketch at the end of this section).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Built business applications and data marts for reporting; involved in different phases of the development life cycle including analysis, design, coding, unit testing, integration testing, review, and release as per business requirements.
- Built performant, scalable ETL processes to load, cleanse, and validate data.
- Implemented business use cases in Hadoop/Hive and visualized them in Tableau.
- Created data pipelines for business reports and processed streaming data using Kafka on an on-premises cluster.
- Processed data from Kafka topics and displayed the real-time streams in dashboards.
- Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Worked on developing ETL workflows on the ingested data using Scala, processing it in HDFS and HBase with Oozie.
- Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
- Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near-real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
- Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores.
- Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling.
- Implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Performed data extraction, transformation, loading, and integration for the data warehouse, operational data stores, and master data management.
- Experienced in ETL concepts, building ETL solutions, and data modeling.
- Worked on architecting the ETL transformation layers and writing Spark jobs to do the processing.
- Aggregated daily sales team updates to send reports to executives and to organize jobs running on Spark clusters.
- Optimized the TensorFlow model for efficiency.
- Compiled data from various sources to perform complex analysis for actionable results.
- Measured efficiency of the Hadoop/Hive environment, ensuring SLAs are met.
- Implemented the Copy activity and custom Azure Data Factory pipeline activities.
- Implemented ad hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
- Designed several DAGs (directed acyclic graphs) for automating ETL pipelines.
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
- Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
- Collaborated with team members and stakeholders in the design and development of the data environment.
- Prepared associated documentation for specifications, requirements, and testing.
Environment: Kafka, Impala, PySpark, Azure, HDInsight, Data Factory, Databricks, Data Lake, Apache Beam, Cloud Shell, Tableau, Cloud SQL, MySQL, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL, NoSQL, MongoDB, TensorFlow, Jira.
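The bullet above about processing streaming data from the Kafka cluster into a staging area could look roughly like the following PySpark Structured Streaming sketch. The broker address, topic, message schema, and staging/checkpoint paths are hypothetical placeholders, not values from the project.

# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and
# land the parsed records in a staging area as Parquet.
# Run with the Kafka connector on the classpath, e.g.
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark-version> ...
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-staging").getOrCreate()

# Hypothetical message schema for the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
    .option("subscribe", "clickstream-events")           # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload in the binary `value` column; parse it as JSON.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/staging/clickstream")              # hypothetical staging path
    .option("checkpointLocation", "/checkpoints/clickstream")
    .outputMode("append")
    .start()
)
query.awaitTermination()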
Client: VF Corp, Greensboro, NC                                         Dec 2022 - Feb 2024
Big Data Engineer
Responsibilities:
- Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
- Used Apache NiFi to copy data from the local file system to HDP.
- Designed both 3NF data models and dimensional data models using star and snowflake schemas; handled streaming message data through Kafka to S3.
- Implemented a Python script for creating the AWS CloudFormation template to build an EMR cluster with the required instance types.
- Successfully loaded files to Hive and HDFS from Oracle and SQL Server using Sqoop.
- Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using the Spark context, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.
- Experience deploying Hadoop in a VM and in the AWS cloud as well as in physical server environments.
- Monitored Hadoop cluster connectivity, security, and file system management.
- Performed data engineering using Spark, Python, and PySpark.
- Developed mappings using transformations such as Expression, Filter, Joiner, and Lookup for better data massaging and to migrate clean and consistent data.
- Replaced and appended data in the Hive database by pulling it with Sqoop into HDFS from multiple data marts as sources.
- Worked with data engineers and data architects to define back-end requirements for data products (aggregations, materialized views, tables, visualization).
- Expertise in using Docker to run and deploy applications in multiple containers with Docker Swarm and Docker Weave.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Involved in unit testing the code and provided feedback to the developers; performed unit testing of the application using NUnit.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Brought data from various sources into Hadoop and Cassandra using Kafka.
- Created multiple dashboards in Tableau for multiple business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs, for Pig Latin.
- Worked on ingesting data through cleansing and transformations, leveraging AWS Lambda, AWS Glue, and Step Functions (see the sketch at the end of this section).
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Worked on AWS EMR clusters for processing big data across a Hadoop cluster of virtual servers.
- Developed various mappings with the collection of all sources, targets, and transformations using Informatica Designer.
- Developed a Python script to transfer data and extract data from on-premises systems to AWS S3 via REST APIs; implemented a microservices-based cloud architecture using Spring Boot.
- Worked on Docker container snapshots, attaching to a running container, removing images, managing directory structures, and managing containers.
- Successfully implemented a POC (proof of concept) in development databases to validate the requirements and benchmark the ETL loads.
- Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
- Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
- Used Pandas in Python for data cleansing and validating the source data.
- Deployed the big data Hadoop application using Talend on AWS (Amazon Web Services) and also on Microsoft Azure.
- After the data transformation is done, the transformed data is moved to the Spark cluster, where it goes live to the application using Spark Streaming and Kafka.
Environment: Hortonworks, Hadoop, HDFS, AWS Glue, AWS Athena, EMR, Pig, Sqoop, Hive, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL, AWS, SQL Server, Tableau, Kafka.
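One bullet in this section mentions AWS Lambda functions driving a serverless pipeline built on AWS Glue and queryable via Athena. A minimal boto3 sketch of such a Lambda handler is shown below; the Glue job name, arguments, and the assumption that it is triggered by an S3 event are hypothetical, not details taken from the project.

# Minimal sketch of an AWS Lambda handler that starts an AWS Glue job when
# triggered (e.g., by an S3 event). Job name and arguments are hypothetical.
import json
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Pull the S3 object key that triggered the event, if present.
    records = event.get("Records", [])
    s3_key = records[0]["s3"]["object"]["key"] if records else "unknown"

    response = glue.start_job_run(
        JobName="serverless-etl-job",          # hypothetical Glue job name
        Arguments={
            "--source_key": s3_key,            # passed through to the Glue script
            "--target_database": "analytics",  # hypothetical Glue Catalog database
        },
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"glue_job_run_id": response["JobRunId"]}),
    }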
Client: US Bank, Madison, WI                                            Apr 2021 - Nov 2022
Data Engineer
Responsibilities:
- Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
- Developed SSRS reports and SSIS packages to extract, transform, and load data from various source systems.
- Experience developing a data pipeline using Kafka to store data into HDFS.
- Created entity relationship diagrams (ERD), functional diagrams, and data flow diagrams, enforced referential integrity constraints, and created logical and physical models using Erwin.
- Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the job done.
- Integrated Azure Synapse with other Azure services such as Azure Data Lake Storage, Azure Data Factory, Azure Databricks, and Power BI to create end-to-end data solutions and enable seamless data workflows across the Azure ecosystem.
- Created numerous pipelines in Azure using Azure Data Factory and worked with different Databricks clusters, notebooks, and jobs (see the sketch at the end of this section).
- Measured efficiency of the Hadoop/Hive environment, ensuring SLAs are met.
- Optimized the TensorFlow model for efficiency.
- Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
- Worked with Azure Cloud, Azure Databricks, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, PySpark, Azure SQL Server, and Azure Data Warehouse.
- Involved in forward engineering of the logical models to generate the physical model and data models using Erwin, with subsequent deployment to the enterprise data warehouse.
- Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and the write-back tool, and backwards.
- Created various complex SSIS/ETL packages to extract, transform, and load data.
- Was responsible for ETL and data validation using SQL Server Integration Services.
- Worked on publishing interactive data visualization dashboards, reports, and workbooks in Tableau and SAS Visual Analytics.
- Implemented and managed ETL solutions and automated operational processes.
- Collaborated with team members and stakeholders in the design and development of the data environment.
- Prepared associated documentation for specifications, requirements, and testing.
Environment: Hadoop, Kafka, MS Azure, SQL Server, Erwin, Oracle 10g/11g, Informatica, RDS, NoSQL, Snowflake Schema, MySQL, PostgreSQL, Tableau, GitHub.
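Several bullets in this section mention Azure Data Factory pipelines feeding Azure Databricks clusters and notebooks. The sketch below shows, under stated assumptions, what one Databricks-style PySpark curation step might look like: reading raw files from ADLS Gen2 and writing a curated Delta table. The storage account, container, paths, column names, and table name are hypothetical, and the cluster is assumed to already have access to the storage account.

# Minimal sketch of a Databricks-style PySpark batch step: read raw files from
# ADLS Gen2, apply light conformance, and write a curated Delta table.
# Storage account, container, paths, and names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-curation").getOrCreate()

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/sales/2022/"  # hypothetical

orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)

# Light conformance: standardize types and derive a load date for auditing.
curated = (
    orders
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("load_date", F.current_date())
)

# Write as a Delta table (Delta Lake is available by default on Databricks).
(
    curated.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("curated.orders")   # hypothetical target schema.table
)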
Client: Emblem Health Care, New York                                    Dec 2018 - Mar 2021
Data Engineer
Responsibilities:
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Used AWS Glue for data transformation, validation, and cleansing.
- Used Sqoop widely to import data from various systems/sources (such as MySQL) into HDFS.
- Created components such as Hive UDFs for functionality missing in Hive for analytics.
- Developed scripts and batch jobs to schedule bundles (groups of coordinators).
- Used different file formats such as text files, SequenceFiles, and Avro.
- Provided cluster coordination services through ZooKeeper.
- Worked extensively with Hive DDLs and Hive Query Language (HQL).
- Analyzed the data using MapReduce, Pig, and Hive and produced summary results from Hadoop for downstream systems.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Used Sqoop to import and export data between HDFS and RDBMSs.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMSs.
- Processed data into HDFS by developing solutions.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Hadoop, HDFS, MapReduce, AWS, Hive, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.

Client: Irus Infotech Pvt. Ltd, India                                   Apr 2016 - July 2018
Data Engineer
Responsibilities:
- Loaded data from the MySQL server to the Hadoop clusters using the data ingestion tool Sqoop.
- Extensively worked with PySpark / Spark SQL for data cleansing and generating DataFrames and RDDs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries on top of data present in HDFS.
- Worked on tuning the performance of Pig queries; involved in developing Pig scripts for processing data.
- Wrote Hive queries to transform the data into tabular format and processed the results using Hive Query Language.
- Loaded real-time unstructured data such as XML data and log files into HDFS using Apache Flume.
- Processed large amounts of both structured and unstructured data using the MapReduce framework.
- Designed solutions to perform ETL tasks such as data acquisition, data transformation, data cleaning, and efficient data storage on HDFS.
- Developed Spark code using Scala and Spark Streaming for faster testing and processing of data.
- Stored the resulting processed data back into the Hadoop Distributed File System.
- Applied machine learning algorithms (k-nearest neighbors, random forest) using Spark MLlib on top of HDFS data and compared the accuracy between the models (see the sketch after this section).
- Used Tableau to visualize the data outcomes from the ML algorithms.
Environment: Apache Sqoop, Apache Flume, Hadoop, MapReduce, Spark, Hive, Pig, Spark MLlib, Tableau
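The last project above mentions applying k-nearest neighbors and random forest models with Spark MLlib and comparing their accuracy. MLlib does not provide a built-in k-NN estimator, so the sketch below covers only the random forest half of that comparison; the input path, feature columns, and label column are hypothetical placeholders.

# Minimal sketch: train a random forest with Spark MLlib on data in HDFS and
# report test accuracy. Input path, feature columns, and label are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("mllib-random-forest").getOrCreate()

# Hypothetical HDFS dataset with numeric features and an integer label column.
df = spark.read.parquet("hdfs:///data/training/events.parquet")

assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b", "feature_c"],  # hypothetical features
    outputCol="features",
)
prepared = assembler.transform(df)

train, test = prepared.randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50)
model = rf.fit(train)

predictions = model.transform(test)
evaluator = MulticlassClassificationEvaluator(
    labelCol="label", predictionCol="prediction", metricName="accuracy"
)
print("Random forest test accuracy:", evaluator.evaluate(predictions))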
