Candidate Information
Title: Senior Data Engineer
Target Location: US-FL-Delray Beach
 Candidate's Name
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE

Sr. Data Engineer

Professional Summary
      9+ years of IT experience across Big Data technologies, including 5+ years of Big Data processing for large-scale distributed applications using Big Data ecosystem tools.
      Comprehensive experience with the Hadoop ecosystem, utilizing technologies such as MapReduce, Hive, HBase, YARN, Pig, Spark, Sqoop, Kafka, Oozie, and Zookeeper.
      Experience with the AWS platform and its services, including IAM, EC2, EBS, VPC, RDS, CloudWatch, AWS Config, CloudFront, S3, SQS, SNS, Lambda, and Route 53.
      Experience working on Hortonworks, Cloudera, and MapR distributions.
      Proficient in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis) and deploying it with AWS CloudFormation.
      Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
      Experience with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
      Hands-on experience with AWS data analytics services such as Athena, Glue Data Catalog, and QuickSight.
      Keen on keeping up with the newer technology stack that Google Cloud Platform (GCP) adds.
      Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
      Knowledge of the HDFS filesystem and Hadoop daemons such as Resource Manager, Node Manager, Name Node, Data Node, Secondary Name Node, and Containers, plus the MapReduce programming paradigm, with good hands-on experience in PySpark and SQL queries.
      In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks, as well as Spark Streaming.
      Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames.
      Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
      Performed map-side joins on RDDs and imported data from sources such as HDFS and HBase into Spark RDDs.
      Proficient in SQL, PL/SQL, Scala and Python coding.
      Strong knowledge of NoSQL column-oriented databases like HBase and their integration with the Hadoop cluster.
      Expertise in database performance tuning and data modeling.
      Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data (see the sketch after this summary).
      Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs for daily imports.
      Loaded data into EMR from various sources such as S3 and processed it using Hive scripts.
      Used the Spark-Cassandra Connector to load data to and from Cassandra.
      Populated HDFS with vast amounts of data using Apache Kafka and Flume.
      Knowledge of Kafka installation and integration with Spark Streaming.
      Hands-on experience building data pipelines using Hadoop components Sqoop, Hive, Pig, MapReduce, Spark, Spark SQL.
      Set up and managed a CDN on Amazon CloudFront to improve site performance.
      Good working experience with Hadoop data-warehousing tools such as Hive and Pig, including extracting data onto the cluster using Sqoop.
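A minimal PySpark sketch of the Kafka-to-HDFS ingestion pattern described in this summary is shown below. It uses Structured Streaming; the broker address, topic name, and HDFS paths are illustrative placeholders rather than details from this resume, and the job assumes the spark-sql-kafka connector package is available at submit time.

# Minimal sketch of a Kafka -> Spark -> HDFS ingestion pipeline.
# Broker, topic, and paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-hdfs-ingest")
    .getOrCreate()
)

# Read the raw event stream from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "events-topic")                  # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; keep the value as a string payload.
payload = events.select(col("value").cast("string").alias("payload"))

# Append the stream to HDFS as Parquet; checkpointing makes the job restartable.
query = (
    payload.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/events")            # placeholder path
    .option("checkpointLocation", "hdfs:///checkpoints/events")
    .outputMode("append")
    .start()
)

query.awaitTermination()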

PROFESSIONAL EXPERIENCE

Sr. Data Engineer
Walmart, Rogers, Arkansas                                                                May 2023 to present
Responsibilities:
      Responsible for designing and developing analytical solutions that deliver insights into large data sets by ingesting and transforming those datasets in the Big Data environment using technologies such as Spark.
      Experience building, operating, and maintaining fault-tolerant and scalable data-processing integrations using AWS.
      Migrated terabytes of data from the data warehouse into the cloud environment in an incremental format.
      Developed Data-Pipelines using Apache Spark and other ETL solutions to get data from various sources to the central warehouse.
      Developed generic code modules and defined strategy to extract data from different vendors and build ETL logic on top of them to feed the data to the central data warehouse.
      Performed data pre-processing, data profiling, and data modeling, and created Python frameworks for validation and automation of regular checks.
      Involved in daily operational activities, troubleshooting ad-hoc production and data issues and enhancing the Big Data and AWS cloud infrastructure to better address existing issues.
      Developed and enhanced code implementing the required data-quality checks before data is sent into the cloud-based data warehouse.
      Developed Spark jobs to transform data and apply business transformation rules to load/process data across enterprise and application specific layers.
      Worked on submitting Spark jobs that report data metrics used for data-quality checking (see the sketch after this section).
      Worked on building efficient data pipelines that transform high-volume data into a format used for analytics and fraud-prevention cases.
      Extensively used Splunk Search Processing Language (SPL) queries, Reports, Alerts and Dashboards.
      Excellent knowledge of source control management concepts such as branching, merging, labeling/tagging, and integration, with tools like Git.
      Collaborated with and across Agile teams to design, develop, test, implement, and support technical solutions in full-stack development tools and technologies.
      Performed unit tests and conducted reviews with other team members to make sure the code was rigorously designed, elegantly coded, and effectively tuned for performance.

Environment: AWS, Python, MS SQL Server, Git, Splunk, IntelliJ, Spark, Jenkins, MySQL, Pig, Hadoop, Hortonworks, Tableau, Cassandra, AWS Kinesis Firehose, AWS Glue, Athena, UNIX Shell Scripting, HBase, Kafka
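The data-quality checks referenced in the bullets above could be expressed roughly as the PySpark sketch below. The input path, column names, and rules are hypothetical examples of the kind of metrics such a job might report, not the actual checks used on this project.

# Minimal sketch of the data-quality metrics a Spark job might emit
# before loading into the warehouse. Path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.parquet("s3://example-bucket/staging/orders/")  # placeholder path

checks = {
    "row_count": df.count(),
    "null_order_ids": df.filter(F.col("order_id").isNull()).count(),
    "duplicate_order_ids": df.count() - df.dropDuplicates(["order_id"]).count(),
    "negative_amounts": df.filter(F.col("amount") < 0).count(),
}

# Fail fast if any hard rule is violated; otherwise report the metrics.
if checks["null_order_ids"] or checks["duplicate_order_ids"]:
    raise ValueError(f"Data-quality check failed: {checks}")

print(checks)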

Sr. Data Engineer
Chevron Corporation, Santa Rosa, NM                                                      November 2021 to April 2023
Responsibilities:
      Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
      Transformed the raw data into actionable insights by incorporating various statistical techniques and using data mining tools such as Python (Scikit-Learn, Pandas, NumPy, Matplotlib) and SQL.
      Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python.
      Created Hive tables and loaded and analyzed data using Hive scripts. Implemented partitioning, dynamic partitions, and buckets in Hive.
      Compared self-hosted Hadoop with GCP's Dataproc and explored Bigtable (managed HBase) use cases and performance evaluation.
      Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
      Worked on loading CSV/TXT/DAT files with Scala in the Spark framework, processing the data through Spark DataFrames and RDDs, saving the files in Parquet format in HDFS, and loading them into a fact table using the ORC reader.
      Involved in Platform Modernization project to get the data into GCP.
      Experience in Creating, Scheduling and Debugging Spark jobs using Python.
      Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and BigData technologies.
      Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
      Performed the Data Mapping, Data design (Data Modeling) to integrate the data across the multiple databases into EDW.
      Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the sketch after this section).
      Experience in GCP Dataproc, GCS, Cloud functions, BigQuery.
      Created Airflow Scheduling scripts in Python.
      Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Storage, and BigQuery.
      Built models using Python and PySpark to predict the probability of attendance for various campaigns and events.
      Developed Machine Learning models to build media planning and optimization specifications and implemented SQL queries to mine CRM data.
      Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
      Designed and built data pipelines to load data into the GCP platform.
      Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
      Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation, and summarization activities according to the requirement.
      Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
      Performed data visualization and designed dashboards with Tableau, generating complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
      Used broadcast variables in Spark, effective and efficient joins, transformations, and other capabilities for data processing, and utilized Spark's in-memory capabilities to handle large datasets.

Environment: GCP, Redshift, Spark, Hive, Sqoop, Oozie, HBase, Scala, MapReduce, Azure, Teradata, SQL, Python, RStudio, Excel, PowerPoint, Tableau, Hadoop, PySpark, random forest, Apache Airflow
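The Airflow-on-GCP pipelines mentioned above could be wired together roughly as in the sketch below, which pairs BashOperator with the gsutil and bq command-line utilities also listed in the summary. It assumes Airflow 2-style imports; the bucket, dataset, and table names are hypothetical, and the target BigQuery dataset is assumed to exist.

# Hedged sketch of a GCP ETL DAG using BashOperator with gsutil/bq.
# Bucket, dataset, and table names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="gcs_to_bigquery_daily",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Stage the day's extract from a local drop zone into GCS.
    stage_to_gcs = BashOperator(
        task_id="stage_to_gcs",
        bash_command=(
            "gsutil cp /data/exports/{{ ds }}/*.csv "
            "gs://example-bucket/raw/{{ ds }}/"
        ),
    )

    # Load the staged files into a BigQuery table.
    load_to_bq = BashOperator(
        task_id="load_to_bq",
        bash_command=(
            "bq load --source_format=CSV --skip_leading_rows=1 --autodetect "
            "example_dataset.orders "
            "gs://example-bucket/raw/{{ ds }}/*.csv"
        ),
    )

    stage_to_gcs >> load_to_bq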

Hadoop/Big Data Developer
Homesite Insurance, Boston, MA                                                           December 2018 to October 2021
Responsibilities:
      Created and maintained reporting infrastructure to facilitate visual representation of manufacturing data for operations planning.
      Created continuous integration and continuous delivery (CI/CD) pipeline on AWS that helps to automate steps in software delivery process.
      Experience loading and transforming large data sets of structured, unstructured, and semi-structured data in Hortonworks.
      Implemented partitioning, dynamic partitions, and buckets in Hive and Impala for efficient data access (see the sketch after this section).
      Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
      Involved in HBase setup and storing data into HBase, which will be used for further analysis.
      Worked on AWS Data Pipeline to configure data loads from S3 into Redshift and used Glue to define table and column mappings from S3 data to Redshift.
      Worked on distributed/cloud computing (MapReduce/Hadoop, Pig, HBase, Avro, Zookeeper) and Amazon Web Services (S3, EC2, EMR, etc.).
      Worked on importing some of the data from NoSQL databases including HBase.
      Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
      Extensively worked on HiveQL and join operations, writing custom UDFs, with good experience in optimizing Hive queries.
      Experienced in running queries using Impala and used BI and reporting tools (Tableau) to run ad-hoc queries directly on Hadoop.
      Worked on Apache Tez, an extensible framework for building high-performance batch and interactive data-processing applications, for Hive jobs.
      Experience in using the Spark framework with Scala and Python. Good exposure to performance tuning of Hive queries and MapReduce jobs in the Spark (Spark SQL) framework on Hortonworks.
      Developed Scala and Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark SQL for data aggregation and queries, writing data back into the RDBMS through Sqoop.
      Configured Spark Streaming receivers to consume Kafka input streams and specified the exact block interval for processing data into HDFS using Scala.
      Collected data using Spark Streaming and stored it in HBase. Used the Spark-Cassandra Connector to load data to and from Cassandra.
      Collected and aggregated large amounts of log data using Kafka, staging it in the HDFS data lake for further analysis.
      Worked on Batch processing and Real-time data processing on Spark Streaming using Lambda architecture.
      Extensively used Sqoop to import/export data between the RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value.
      Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
      Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.
      Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.

Environment: HDFS, Python scripting, MapReduce, Hive, Impala, Spark SQL, Spark Streaming, Sqoop, AWS, Python, Scala, UNIX Shell Scripting, Git, Hadoop, Hue, Pig, HBase, Cloudera, Kafka, Teradata, Oozie, Flume
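A hedged sketch of dynamic partitioning into a Hive table from Spark, as referenced in the bullets above, is shown below. The database, table, column names, and staging path are illustrative placeholders, not details from this project.

# Hedged sketch of dynamic partitioning into a Hive table from Spark.
# Database, table, and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partition-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow dynamic partitions so partition values come from the data itself.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS claims_db")
spark.sql("""
    CREATE TABLE IF NOT EXISTS claims_db.claims (
        claim_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (claim_date STRING)
    STORED AS ORC
""")

staged = spark.read.parquet("hdfs:///data/staging/claims/")  # placeholder path

# The partition column must come last for insertInto to map it correctly.
(
    staged.select("claim_id", "amount", "claim_date")
    .write.mode("append")
    .insertInto("claims_db.claims")
)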

Hadoop Engineer
Dhruvsoft Services Private Limited, Hyderabad, India                                     November 2016 to September 2018
Responsibilities:
      Involved in the high-level design of the Hadoop architecture for the existing data structure and problem statement; set up the multi-node cluster and configured the entire Hadoop platform.
      Extracted files from MySQL, Oracle, and Teradata through Sqoop, placed them in HDFS, and processed them.
      Worked with various HDFS file formats like Avro, Parquet, ORC, Sequence File and various compression formats like Snappy, bzip2, GZip.
      Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing.
      Developed Hive UDFs to pre-process the data for analysis and migrated ETL operations into the Hadoop system using Pig Latin and Python scripts.
      Used Hive to do transformations, event joins, filtering and some pre-aggregations before storing the data into HDFS.
      Developed bash scripts to automate the data flow by using different commands like awk, sed, grep, xargs, exec and integrated the scripts with YAML.
      Developed Hive queries for data sampling and analysis for the analysts.
      Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
      Developed Bash scripts and Python modules to convert mainframe fixed-width source files to delimited files (see the sketch after this section).
      Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
      Created workflows on Talend to extract data from various data sources and dump them into HDFS.
      Designed an ETL data pipeline flow to ingest data from an RDBMS source into HDFS using shell scripts.
      Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
      Used Hive to perform data validation on the data ingested with Sqoop and Flume, and pushed the cleansed data set into HBase.

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Scala, Python, Pig, Sqoop, AWS, DB2, UNIX Shell Scripting
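The fixed-width-to-delimited conversion mentioned above could be handled by a small Python module along these lines. The field layout, widths, and file names are hypothetical examples, not the actual mainframe record format.

# Minimal sketch of a fixed-width -> pipe-delimited converter.
# Field widths and file names are hypothetical examples.
import csv

# (field name, width in characters) describing the mainframe layout.
LAYOUT = [("account_id", 10), ("customer_name", 30), ("balance", 12)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into trimmed field values."""
    fields, offset = [], 0
    for _name, width in layout:
        fields.append(line[offset:offset + width].strip())
        offset += width
    return fields


def convert(src_path, dst_path, delimiter="|"):
    with open(src_path, encoding="ascii") as src, \
         open(dst_path, "w", newline="", encoding="ascii") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow([name for name, _ in LAYOUT])   # header row
        for line in src:
            if line.strip():                             # skip blank records
                writer.writerow(parse_fixed_width(line.rstrip("\n")))


if __name__ == "__main__":
    convert("accounts.dat", "accounts.psv")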

Data Engineer
Cybage Software Private Limited, Hyderabad, India                                        August 2014 to October 2016
Responsibilities:
      Designed ETL Process using Informatica to load data from Flat Files, and Excel Files to target Oracle Data Warehouse database.
      Interacted with the business community and database administrators to identify the business requirements and data realities.
      Created various transformations according to the business logic like Source Qualifier, Normalizer, Lookup, Stored Procedure, Sequence Generator, Router, Filter, Aggregator, Joiner, Expression and Update Strategy.
      Improving workflow performance by shifting filters as close as possible to the source and selecting tables with fewer rows as the master during joins.
      Used connected and unconnected lookups whenever appropriate, along with the use of appropriate caches.
      Created tasks and workflows in the Workflow Manager and monitored the sessions in the Workflow Monitor.
      Performed maintenance, including managing space, removing bad files, removing cache files, and monitoring services.
      Set up Permissions for Groups and Users in all Development Environments.
      Migrated developed objects across different environments.
      Designed and developed Web services using XML and jQuery.
      Improved performance by using a more modularized approach and more built-in methods.
      Experienced in Agile Methodologies and SCRUM Process.
      Maintained program libraries, user manuals, and technical documentation.
      Wrote unit test cases for testing tools.
      Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support.
      Built various graphs for business decision-making using the Python Matplotlib library.
      Worked on application development, especially in the UNIX environment, and am familiar with all its commands.
      Used NumPy for numerical analysis of insurance premiums (see the sketch after this section).
      Handled day-to-day issues and fine-tuned the applications for enhanced performance.
      Implemented code in Python to retrieve and manipulate data.

Environment: Python, Django, MySQL, Linux, Informatica PowerCenter 9.6.1, PL/SQL, HTML, XHTML, CSS, AJAX, Apache Web Server, NoSQL, jQuery
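A minimal sketch of the NumPy and Matplotlib premium analysis referenced above follows. The premium figures are synthetic, generated only to illustrate the summary statistics and histogram; they are not project data.

# Hedged sketch of premium analysis with NumPy and a Matplotlib chart.
# The premium figures are synthetic, generated only for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
premiums = rng.normal(loc=1200.0, scale=250.0, size=1000)  # synthetic premiums

summary = {
    "mean": float(np.mean(premiums)),
    "median": float(np.median(premiums)),
    "p95": float(np.percentile(premiums, 95)),
}
print(summary)

# Histogram of the premium distribution for a quick visual check.
plt.hist(premiums, bins=40)
plt.xlabel("Annual premium ($)")
plt.ylabel("Policy count")
plt.title("Distribution of insurance premiums (synthetic data)")
plt.savefig("premium_distribution.png")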
