Candidate Information
Title: Data Engineer (Big Data)
Target Location: US-IL-Carbondale
Candidate's Name
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
Sr. Data Engineer

Professional Summary:
- Around 10 years of professional experience applying data-driven approaches to diverse business problems using big data, analytics, and parallel processing.
- Experience with the Hadoop ecosystem: HDFS (storage), Spark, MapReduce (processing), Hive, Pig, Sqoop, YARN, JobTracker, TaskTracker, NameNode, DataNode, Apache Cassandra, and the MapReduce programming paradigm, along with AWS.
- Expertise in the AWS stack, including Snowflake, EC2, S3, Lambda, Data Pipeline, EMR, SNS, CloudWatch, Redshift, DMS, and Athena.
- Practical experience with the AWS family of services, including EC2, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, SES, and SQS.
- Expertise in data warehousing and ETL software such as Informatica.
- Ingested data with Sqoop from many sources, including SQL Server, into HDFS.
- Experience using tools such as Sqoop, Flume, Kafka, and Pig to ingest structured, semi-structured, and unstructured data into the cluster.
- Proficient with the Apache Spark ecosystem, including Spark and Spark Streaming, using Scala and Python.
- Developed highly optimized Spark applications for data cleansing, validation, transformation, and summarization according to requirements (a brief sketch follows this summary).
- Employed Ab Initio supply-chain modeling techniques to optimize inventory management, leading to a 10% decrease in stockouts and a 5% improvement in order-fulfillment efficiency.
- Profound knowledge of root-cause analysis, metadata analysis, Tableau dashboards, Tableau calculated fields, SQL, and Tableau data blending for code automation, customized reports, code reusability, and ad-hoc reports.
- Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
- Hands-on experience with Spark architecture and its integrations, including the Spark SQL, DataFrame, and Dataset APIs.
- Extensive experience migrating on-premises Hadoop platforms to cloud solutions on AWS.
- Hands-on experience with statistical work in R, including ggplot, Shiny apps, and dashboard building.
- Expertise in creating and customizing Splunk applications, searches, and dashboards as required by IT teams and the business.
- Experience collecting log data from various sources and integrating it into HDFS using Flume, staging the data in HDFS for further analysis.
- Designed and developed logical and physical data models using concepts such as star schema, snowflake schema, and slowly changing dimensions.
- Good understanding of cloud-based technologies such as AWS and Azure.
- Experience converting existing AWS infrastructure to serverless architecture (AWS Lambda, AWS Kinesis) using AWS Lambda, API Gateway, Route 53, and S3 buckets.
- Experience writing Python as an ETL framework and PySpark to process large volumes of data daily.
- Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
- Experience with data sources such as Oracle SE2, SQL Server, flat files, and unstructured files loaded into a data warehouse.
- Experience in extraction, transformation, and loading (ETL) of data from various sources into data warehouses, as well as collecting, aggregating, and moving data using Apache Flume, Kafka, Power BI, and Microsoft SSIS.
- Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
- Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis and landed data into an S3 data lake.
- Extensive experience with Spark tools such as Spark SQL and RDD transformations.
- Integrated AWS DynamoDB with AWS Lambda to back up DynamoDB streams and store value items.
- Adept in statistical programming languages such as R and Python, as well as big data technologies including Hadoop 2, Hive, HDFS, MapReduce, and Spark.
- Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
- Automated infrastructure creation for Kafka clusters using Terraform, provisioning multiple EC2 instances and attaching ephemeral or EBS volumes per instance type across availability zones and regions in AWS.
- Used version control tools such as Bitbucket, Git, and SVN.
- Used AWS Elastic Beanstalk for application deployments; experienced with AWS Lambda and Amazon Kinesis.
- Highly capable of handling structured, semi-structured, and unstructured big data sets, supporting big data and machine learning applications.
- Experience with HBase, Cassandra, and MongoDB NoSQL databases; created Sqoop scripts for data migration from Teradata and Oracle to the big data environment.
- Extensive ETL experience with Talend, covering data profiling, data migration, extraction, transformation, and loading.
- In-depth understanding of and experience with machine learning and AI using Python frameworks such as TensorFlow and scikit-learn.
- Created Spark jobs that perform a variety of data transformations on the provided data using the Spark DataFrame and Spark SQL APIs.
- Practical experience with waterfall methodologies and with supporting development, testing, and operations teams during the rollout of new systems.
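
As an illustration of the Spark-based cleansing and summarization work referenced above, the following PySpark sketch shows one plausible shape such a job could take. It is a minimal sketch only: the bucket paths and column names (order_id, customer_id, amount, order_ts) are hypothetical, not taken from the candidate's projects.

# Illustrative PySpark cleansing/summarization job; paths and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-cleansing").getOrCreate()

# Read the raw CSV input from a landing location (path is a placeholder).
raw = spark.read.option("header", True).csv("s3a://landing-zone/orders.csv")

# Cleansing and validation: drop duplicates, enforce types, filter bad rows.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
)

# Summarization: daily revenue per customer, written back as Parquet.
daily = (
    clean.groupBy(F.to_date("order_ts").alias("order_date"), "customer_id")
         .agg(F.sum("amount").alias("daily_revenue"),
              F.count("order_id").alias("order_count"))
)
daily.write.mode("overwrite").partitionBy("order_date").parquet("s3a://curated-zone/daily_revenue/")
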
TECHNICAL SKILLS
Programming skills: SQL, Python, Linux
Data science libraries: NumPy, Pandas, TensorFlow, PyTorch, Matplotlib, Plotly (Dash), Seaborn
Business intelligence tools: Tableau, Power BI, Snowflake, Flourish
Cloud: AWS (EC2, S3, RDS, Redshift), Heroku, Azure DevOps
Tools: Jira, Git, Alteryx (ETL), Talend (ETL), Snowflake
Databases: SQL Server, MongoDB, PostgreSQL, MySQL, Oracle

PROFESSIONAL EXPERIENCE

Client: Chewy - Dania Beach, FL    March 2022 to Present
Role: Sr. Data Engineer
Responsibilities:
- Developed real-time data processing applications in Scala and Python and implemented Apache Spark Streaming from sources such as Kafka and JMS (a brief sketch follows this section). Utilized Ab Initio ETL coding in GDE for data transformations.
- Performed ETL testing activities such as running jobs, extracting data from the database with the necessary queries, transforming it, and loading it into the data warehouse servers.
- Created ETL jobs to load server data and data from other sources into S3 buckets and to move S3 data into the data warehouse.
- Used Talend to create workflows for processing data from multiple source systems.
- Used ETL to implement slowly changing dimension transformations to maintain historical data in the data warehouse.
- Employed Ab Initio supply-chain modeling techniques to optimize inventory management, leading to a 10% decrease in stockouts and a 5% improvement in order-fulfillment efficiency.
- Created BigQuery authorized views for row-level security and for exposing data to other teams. Utilized Teradata SQL Assistant (a.k.a. QueryMan) for EDS support.
- Developed and implemented comprehensive data retention strategies, aligning technical solutions with organizational data retention policies and legal requirements to ensure efficient storage management and compliance.
- Designed and architected PI systems tailored to specific organizational needs and requirements.
- Loaded datasets into Hive and Cassandra from source CSV files using Spark/PySpark. Utilized FastLoad, MultiLoad, and FastExport for efficient data loading.
- Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive. Leveraged TPump for efficient data loading into Teradata.
- Conducted data analysis using an ab initio approach to identify customer behavior patterns, leading to a 10% increase in sales.
- Developed Cloud Functions in Python to process JSON files from source systems and load them into BigQuery. Implemented cloud operations and cloud governance practices.
Environment: HDFS, Hive, Sqoop, Pig, Oozie, Cassandra, MySQL, Kafka, Spark, Redshift, Cornerstone, Snowflake, data modeling, Scala, Cloudera Manager (CDH4), Azure, Amazon Web Services (AWS), SNS, Splunk, Glue, Python, Git, SSIS, T-SQL, Jenkins, Ab Initio ETL coding in GDE, Ab Initio Metadata Hub lineage, Ab Initio TRMC, ANSI SQL, Teradata SQL Assistant, BTEQ, FastLoad, MultiLoad, FastExport, TPump, cloud operations, cloud governance, data warehouse.
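
The real-time ingestion responsibility above (Kafka into Spark, landing in S3) could look roughly like the Structured Streaming sketch below. The broker address, topic name, event schema, and S3 paths are assumptions for illustration only.

# Hypothetical Spark Structured Streaming job: read JSON events from Kafka, write Parquet to S3.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("sku", StringType()),
    StructField("price", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to a placeholder "order-events" topic on a placeholder broker.
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "order-events")
         .load()
)

# Kafka delivers the payload as bytes; parse the JSON value against the schema.
events = (
    stream.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
          .select("e.*")
)

# Land the parsed events in S3 as Parquet, with a checkpoint location for fault tolerance.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3a://data-lake/order_events/")
          .option("checkpointLocation", "s3a://data-lake/_checkpoints/order_events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()
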
Client: Cummins - Columbus, Indiana    August 2019 to February 2022
Role: Sr. Data Engineer
Responsibilities:
- Analysed data in the Hadoop cluster using big data tools including Pig, Hive, and Sqoop, leveraging knowledge of ANSI SQL and Teradata SQL extensions.
- Developed a comprehensive understanding of system components and their interactions to optimize performance and reliability, utilizing BTEQ for efficient data loading.
- Developed Spark SQL applications to perform complex data operations on structured and semi-structured data stored as Parquet, utilizing FastExport for data export tasks.
- Created ETL jobs on AWS Glue to load vendor data from different sources, applying transformations for data cleaning, imputation, and mapping, and stored the results in S3 buckets (a brief sketch follows this section).
- Implemented an ETL (extract, transform, and load) strategy to populate the data warehouse from various source systems.
- Performed data engineering functions - data extraction, transformation, loading, and integration - in support of enterprise data infrastructures: the data warehouse, operational data stores, and master data management.
- Loaded application analytics data into the data warehouse at regular intervals.
- Participated in the development of a data pipeline and carried out analytics using the AWS stack (EMR, EC2, S3, RDS, Lambda, Kinesis, Athena, SQS, Redshift, and ECS), integrating cloud operations and cloud governance practices.
- Applied risk management principles to identify and mitigate potential data-related risks, drawing on cloud security best practices.
- Developed Python code to retrieve and manipulate data.
- Spearheaded the design and implementation of large-scale data management systems using Java and Spring Boot, ensuring high performance and scalability for massive datasets.
- Wrote Hive queries that enabled market analysts to identify emerging trends, applying data observability practices.
- Optimized data processing pipelines for performance and reliability, using techniques such as parallel processing, caching, and data partitioning to maximize throughput and minimize latency.
- Collaborated with legal and compliance teams on data governance initiatives, providing technical expertise and guidance on data management solutions that align with regulatory requirements and industry standards.
Environment: HDFS, Hive, Sqoop, Pig, Impala, Flume, Oozie, Kafka, Spark, HBase, Unix shell scripting, Cloudera, Amazon Web Services (AWS), ETL, YARN, Spark SQL, Redshift, MongoDB, Databricks, R, Athena, Power BI, Tableau, data warehouse, PySpark, Snowflake, Jenkins, Azure, SQL Server (SSIS, SSRS), Python, ANSI SQL, Teradata SQL extensions, BTEQ, FastExport, cloud operations, cloud governance, data engineering, data observability.
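
A minimal sketch of the kind of AWS Glue (PySpark) job described above: read a cataloged vendor table, map and clean fields, and write Parquet to S3. The database, table, column, and bucket names are placeholders, not actual project assets.

# Hypothetical AWS Glue job outline; catalog and S3 names are assumptions.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, DropNullFields
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: a table registered in the Glue Data Catalog (placeholder names).
vendors = glue_context.create_dynamic_frame.from_catalog(
    database="vendor_db", table_name="raw_vendor_feed"
)

# Data mapping/cleaning: rename and retype columns, then drop null-only fields.
mapped = ApplyMapping.apply(
    frame=vendors,
    mappings=[
        ("VendorID", "string", "vendor_id", "string"),
        ("InvoiceAmt", "string", "invoice_amount", "double"),
        ("InvoiceDate", "string", "invoice_date", "timestamp"),
    ],
)
cleaned = DropNullFields.apply(frame=mapped)

# Sink: curated S3 prefix in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/vendors/"},
    format="parquet",
)
job.commit()
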
Client: Ford - Dearborn, MI    April 2017 to July 2019
Role: Sr. Data Engineer
Responsibilities:
- Anti-Money Laundering (AML): validated that the EBA process runs as originally designed and documented.
- Developed a LightGBM regression model to predict the risk rating associated with each location.
- Provided data science solutions in an Agile environment, from data gathering to deliverables.
- Researched and created AWS architecture in collaboration with data engineers and the DevOps team to migrate models from on-premises to the AWS cloud.
- Worked with the project team to develop and maintain the ETL (extract, transform, and load) process.
- Ran SQL scripts and created indexes and stored procedures for data analysis.
- Designed ETL strategies for various heterogeneous sources.
- Worked with different data formats such as JSON and XML.
- Used Spark SQL and Python on the Spark engine to develop an end-to-end ETL pipeline.
- Participated in code/design analysis, strategy development, and project planning.
- Gathered and imported data from different sources into Spark RDDs for further transformation and analysis. Worked with vendors to onboard external data into target S3 buckets.
- Deployed code for stream processing using Apache Kafka, landing results in Amazon S3.
- Monitored and controlled local disk storage and log files using Amazon CloudWatch.
- Explored AWS services and Blue Prism classification models for automating document classification.
- Mentored analysts across teams on Python libraries, packages, frameworks, and AWS services.
- Automated Python scripts on the server and locally, depending on the job and data size.
- Delivered effective presentations of findings and recommendations to multiple levels of leadership through visual displays of quantitative information.
- Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, random forests, K-means clustering, KNN, PCA, and regularization (a brief sketch follows this section).
- Created and customized Splunk applications, searches, and dashboards as required by IT teams and the business.
- Collected log data from various sources and integrated it into HDFS using Flume, staging the data in HDFS for further analysis.
- Applied natural language processing (NLP) techniques to optimize customer satisfaction.
- Designed rich data visualizations with Matplotlib to present data in human-readable form.
- Developed MapReduce/Spark Python modules for predictive analytics and machine learning in Hadoop on AWS.
- Performed data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy.
- Used NumPy, SciPy, pandas, NLTK (Natural Language Toolkit), and Matplotlib to build models.
- Applied various artificial intelligence (AI)/machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.
- Collaborated with cross-functional partners to understand their business needs, formulating end-to-end analyses covering data gathering, analysis, ongoing deliverables, and presentations.
Environment: AML, AWS cloud, SPSS, Splunk, SQL, XML, SSIS, JSON, CloudWatch, RDD, ETL
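
One way the predictive-modeling work above could be sketched with scikit-learn is shown below (a random-forest classifier with a held-out test set). The input file, feature columns, and label are hypothetical, and the model settings are illustrative defaults rather than the values used on the project.

# Illustrative scikit-learn workflow; file path and column names are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load a prepared feature table (hypothetical path and columns).
df = pd.read_csv("risk_features.csv")
X = df[["feature_1", "feature_2", "feature_3"]]
y = df["risk_flag"]

# Hold out a stratified test set for honest evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the model and report precision/recall per class.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
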
Client: IBing Software Solutions Private Limited - Hyderabad, India    October 2015 to December 2016
Role: Data Engineer
Responsibilities:
- Designed ARM templates in Azure and custom PowerShell scripts to automate resource creation and deployment, saving 140 hours of effort for every new environment.
- Created pipelines in Azure Data Factory using linked services and datasets to extract, transform, and load data between storage systems such as Azure SQL, Blob Storage, Azure DW, and Azure Data Lake.
- Migrated data from on-premises SQL databases to Azure SQL DB in the cloud using SSIS packages.
- Created Databricks notebooks to perform ETL operations to stage business data based on requirements.
- Developed and implemented novel ab initio machine learning algorithms to enhance product recommendation systems, resulting in a 30% increase in cross-selling revenue.
- Landed different source datasets into Azure Data Lake Storage (ADLS) as Parquet files.
- Extensively used Agile methodology as the organization standard to implement the data models.
- Performed regression testing for golden test cases from the state (end-to-end test cases) and automated the process using Python scripts.
- Designed and developed Python programs and scripts to prepare, transform, and harmonize data sets in preparation for modeling.
- Decommissioned and added nodes in the clusters for maintenance.
- Added new users and groups of users per client requests.
- Built logistic regression and linear regression models in Python to determine the accuracy rate of each model.
- Migrated customer and employee databases from an on-premises SQL database to Snowflake.
- As part of the data migration, wrote SQL scripts to identify data mismatches and loaded history data from Teradata SQL into Snowflake.
- Cleaned and transformed data by executing SQL queries in Snowflake worksheets (a brief sketch follows this section).
- Performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
- Experience with data warehouse technical architectures, ETL/ELT, reporting and analytic tools, and data security.
- Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
- Retrieved data from FS to S3 using Spark commands.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Imported legacy data from SQL Server and Teradata into Amazon S3.
- Worked with stakeholders to communicate campaign results, strategy, issues, and needs.
Environment: Python, Azure, PySpark, Hive, Spark, SQL, ETL, EC2, Jenkins, Jira, JavaScript, AWS.
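
The Snowflake query work above might be driven from Python roughly as in this sketch, using the snowflake-connector-python package. The account, credentials, warehouse, database, and table names are placeholders, and the mismatch check is only an example of the kind of validation described.

# Hypothetical Snowflake data-validation query via the Python connector.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",    # placeholder account identifier
    user="etl_user",         # placeholder credentials
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGE",
)

try:
    cur = conn.cursor()
    # Example mismatch check between a staging copy and the migrated target table.
    cur.execute(
        """
        SELECT COUNT(*) AS missing_rows
        FROM STAGE.CUSTOMER_HISTORY src
        LEFT JOIN ANALYTICS.PUBLIC.CUSTOMER tgt
          ON src.CUSTOMER_ID = tgt.CUSTOMER_ID
        WHERE tgt.CUSTOMER_ID IS NULL
        """
    )
    print("Rows missing in target:", cur.fetchone()[0])
finally:
    conn.close()
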
Client: Hudda Infotech Private Limited - Hyderabad, India    April 2014 to September 2015
Role: Data Engineer
Responsibilities:
- Worked as a Data Engineer with several Hadoop ecosystem components on the Cloudera Hadoop distribution.
- Managed and reviewed Hadoop log files.
- Tested and reported defects from an Agile methodology perspective.
- Migrated Pig scripts to Spark and Spark SQL to improve performance.
- Extensively involved in writing Oracle PL/SQL stored procedures, functions, and packages.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Interviewed business users to gather and document requirements.
- Used Flume to collect, aggregate, and store web log data from different sources.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Used pattern-matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive, storing the results in HBase.
- Implemented a proof of concept deploying the product on Amazon Web Services (AWS).
- Developed and maintained stored procedures and implemented changes to the database design, including tables.
- Ingested data from various sources and processed data at rest using big data technologies.
- Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Provided thought leadership for the architecture and design of big data analytics solutions for customers, actively driving proof-of-concept (POC) and proof-of-technology (POT) evaluations and implementing a big data solution.
- Created integration relational 3NF models that functionally relate to other subject areas and determined the corresponding transformation rules in the functional specification document.
- Involved in report development using reporting tools.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created Hive queries and tables that helped lines of business identify trends by applying strategies to historical data before promoting them to production (a brief sketch follows this section).
- Developed Pig scripts to parse raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Performed performance tuning of OLTP and data warehouse environments using SQL.
- Created data structures to store dimensions efficiently for retrieval, deletion, and insertion.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for dashboard reporting.
- Implemented referential integrity using primary key and foreign key relationships.
- Developed staging jobs using data from different sources.
Environment: HBase, Oozie 4.3, Hive 2.3, Sqoop 1.4, SDLC, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS, Flume
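
A small sketch of the Hive-based trend analysis mentioned above, run through PySpark with Hive support. The weblogs database, table, and column names are assumptions for illustration, not the actual project objects.

# Illustrative Spark-with-Hive trend aggregation; table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("weblog-trends")
    .enableHiveSupport()      # lets Spark read tables defined in the Hive metastore
    .getOrCreate()
)

# Aggregate historical web-log events into monthly hit counts per page.
trends = spark.sql(
    """
    SELECT page, date_format(event_ts, 'yyyy-MM') AS month, COUNT(*) AS hits
    FROM weblogs.page_events
    GROUP BY page, date_format(event_ts, 'yyyy-MM')
    ORDER BY month, hits DESC
    """
)

# Persist the result as a Hive table for downstream reporting.
trends.write.mode("overwrite").saveAsTable("weblogs.monthly_page_trends")
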
