Candidate's Name
e-mail: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE


Professional Summary:

        Senior Big Data/Spark Engineer with over 8 years of experience in Information Technology, specializing in Big Data technologies
        built on the Hadoop framework. Proven expertise in analysis, design, development, testing, documentation, deployment, and
        integration with SQL and advanced Big Data technologies. Skilled in Hadoop architecture, including HDFS, JobTracker, TaskTracker,
        NameNode, DataNode, and the MapReduce programming paradigm.
        Extensive knowledge of the Hadoop ecosystem encompassing HDFS, YARN, MapReduce, Kafka, Sqoop, Avro, Spark, Hive, HBase,
        Impala, Pig, Oozie, Hue, Flume, and Zookeeper.
        Proficient in transforming and retrieving data using Spark, Impala, Pig, Hive, SSIS, and MapReduce.
        Skilled in data pipeline development, data streaming, and cloud migration strategy, including streaming data from diverse
        sources such as cloud platforms (AWS, Azure) and on-premises systems using Spark and Flume.
        Experience in developing customized UDFs in Python to enhance Hive and Pig functionality.
        Hands-on use of the Spark and Scala APIs for performance comparison with Hive and SQL, and manipulation of DataFrames in
        Scala using Spark SQL (an illustrative sketch follows this summary).
        Proficiency in data importing and exporting using Sqoop between HDFS and Relational Database Systems.
        Extensive use of open-source languages such as Perl, Python, Scala, and Java.
        Hands-on experience with distributed computing architectures, including AWS products (e.g., EC2, Redshift, EMR, Elasticsearch),
        Hadoop, Python, and Spark, and effective use of Azure SQL Database, MapReduce, Hive, SQL, and PySpark for addressing big data
        challenges.
        Expertise in Hadoop Streaming, machine learning models, disaster recovery systems, and writing MapReduce jobs in Perl and
        Python in addition to Java.
        Excellent understanding and extensive use of the WebHDFS REST API.
        Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex
        data using in-memory computing capabilities written in Scala.
        Utilized Spark to enhance the efficiency of existing algorithms with SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs,
        and Spark on YARN.
        Experience in automation and building CI/CD pipelines using Jenkins and Chef.
        Developed generic SQL procedures and complex T-SQL statements for report generation.
        Hands-on experience in data modeling with Star and Snowflake schemas.
        Proficient in Business Intelligence tools like SSIS, SSAS, SSRS, Informatica, and PowerBI.
        Designed and implemented data distribution mechanisms on SQL Server.
        Experience with Microsoft Azure/Cloud Services such as SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data
        Lake, and Azure Blob Storage.
        Familiarity with Data Marts, OLAP, Dimensional Data Modeling using Ralph Kimball Methodology, and Analysis Services.
        Designed and implemented High Availability and Disaster Recovery Systems on SQL Server.
        Expertise in SQL Server Failover Cluster with Active/Passive model.
        Excellent knowledge of Database/Data Warehousing concepts, including Normalization, Entity-Relationship Modeling, Dimensional
        Data Modeling, Schema, and Metadata.
        Proficient in monitoring data activities and applying improvements.
        Developed Spark Applications for handling data from various RDBMS and Streaming sources.
        Extensive knowledge and utilization of NoSQL databases like HBase, MongoDB, and Cassandra.
        Familiarity with Confidential Azure Services, Amazon Web Services, and Management.
        Experienced in side-by-side upgrade, in-place upgrade, and data migration.
        Proficient in Incident Management, SLA Management, TSG Maintenance, and FTM Improvement.
        Effective planning and management of project deliverables using onsite and offshore models to enhance client satisfaction.
        Responsible for team goal setting, providing timely feedback, and improving performance.
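
The DataFrame work noted above was done in Scala with Spark SQL; as a rough, hypothetical illustration of the same pattern
(shown here in PySpark for brevity, with made-up table, column, and path names, not code from any project below), a DataFrame
can be transformed through the DataFrame API and queried through a temporary view:

    # Minimal PySpark sketch (illustrative only).
    # The Parquet path and the orders/region/order_amount names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()
    orders = spark.read.parquet("hdfs:///data/orders")

    # DataFrame API: filter rows, then aggregate per region
    totals_df = (orders
                 .filter(F.col("order_amount") > 0)
                 .groupBy("region")
                 .agg(F.sum("order_amount").alias("total_amount")))

    # The same aggregation expressed in Spark SQL against a temporary view
    orders.createOrReplaceTempView("orders")
    totals_sql = spark.sql(
        "SELECT region, SUM(order_amount) AS total_amount "
        "FROM orders WHERE order_amount > 0 GROUP BY region")

    totals_df.show()
    totals_sql.show()
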
     Technical Skills:
     Big Data/Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Kafka, Impala, Zookeeper, Sqoop, Oozie, DataStax &
     Apache Cassandra, Drill, Flume, Spark, Solr, and Avro
     Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
     RDBMS: Oracle 12c, MySQL, SQL Server, Teradata
     NoSQL: HBase, Cassandra, MongoDB
     Web/Application Servers: Tomcat, LDAP
     Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
     Cloud Environment: AWS, MS Azure
     Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
     Programming Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripts
     Tools Used: Eclipse, Putty, Cygwin, MS Office, Nitro, Copilot
     BI Tools: Platfora, Tableau, Pentaho




Project Experience:
Client: Ascena Retail Group, Pataskala, Ohio                                                          July 2021 - Present
Role: Senior Big Data/Spark Engineer
Roles & Responsibilities:

        Spearheaded the development of data pipelines and data integration, achieving 99% data accuracy and reducing processing time
        by up to 35% using Hive, Spark, Scala, Kafka, and other advanced techniques.
        Directed extraction and integration of data from AWS S3 and third-party sources, enhancing insight accuracy by 20% and processing
        speed by 20% using Hadoop on Qubole.
        Architected a resilient ETL framework and CI/CD pipelines with Jenkins and Chef, reducing data acquisition and deployment times by
        30% and 40%, respectively.
        Utilized Platfora for real-time data visualization from Hive tables, creating 5 dashboards for data-driven decisions, and developed
        multiple Tableau dashboards for diverse business needs.
        Employed Sqoop, Apache Drill, and Cassandra to optimize data access, reduce data processing time by 25%, and improve
        accessibility by 30%.
        Configured and optimized Hive, developed Hive UDFs, and implemented Spark streaming for real-time data transformation (see
        the sketch after this list).
        Managed data movement and ensured consistency between cloud and on-premise Hadoop using DISTCP and proprietary
        frameworks.
        Applied DataFrame manipulation and the ORC, Parquet, and Avro formats with compression for efficient storage, and adopted
        Avro for data ingestion, improving efficiency.
        Implemented Apache Sentry for secure access control on Hive tables and utilized Tidal and Oozie for workflow scheduling, ensuring
        smooth data processing.
        Led Hadoop updates, patches, and version upgrades, effectively addressing post-upgrade issues in collaboration with Enterprise
        data support teams.
        Employed Jenkins, Maven, and test scripts for seamless project development, deployment, and continuous integration.
        Developed production-level machine learning models with Python and PySpark, and provided Git support for repository
        management and access control.
        Produced comprehensive research reports, providing strategic recommendations to senior management based on detailed
        experiment findings.
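
As a rough illustration of the Spark streaming work above (the resume does not say whether DStreams or Structured Streaming
were used, so Structured Streaming is assumed; broker, topic, schema, and paths are hypothetical):

    # Minimal PySpark Structured Streaming sketch (illustrative only).
    # Requires the spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    event_schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
           .option("subscribe", "orders")                      # hypothetical topic
           .load())

    # Parse the Kafka value as JSON and keep only well-formed events
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", event_schema).alias("e"))
              .select("e.*")
              .where(F.col("order_id").isNotNull()))

    # Land the transformed stream as Parquet files (hypothetical paths)
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/orders_stream")
             .option("checkpointLocation", "hdfs:///checkpoints/orders_stream")
             .start())
    query.awaitTermination()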

    Environment: Hadoop, MapReduce, Hortonworks, Spark, Zeppelin, Qubole, Oozie, Hive, Impala, Kafka, Sqoop, AWS, Cassandra,
    Tableau, Pig, Teradata, Java, Scala, Jenkins, Maven, Chef, Python, Linux Red Hat, Git.


Client: Molina Healthcare, Bothell, WA                                                                       Aug 2019 - June 2020
Role: Big Data/Spark Engineer
Roles & Responsibilities:

        Managed ingestion and transformation of over 10TB of diverse data types, maintaining 98% data integrity and enhancing processing
        efficiency by 30% using Perl and Python for data collectors and parsers.
        Developed custom Python UDFs for Hive and Pig, improving processing efficiency by 25% (see the sketch after this list), and
        managed data ingestion into Azure services, reducing latency by 30%.
        Executed ETL processes and data migration with Azure Data Factory, achieving a 95% success rate and ensuring 100% data integrity
        during migrations to SQL Azure.
        Designed 10+ interactive dashboards in Zoom-Data and developed over 20TB of Big Data in Cloudera Hadoop, boosting processing
        efficiency by 40%.
        Automated tasks with Shell scripting and Crontab, reducing manual processing time by 50%, and optimized Hadoop MapReduce,
        HDFS, and multiple MapReduce jobs in Java and Scala for data preprocessing.
        Utilized Python libraries like pandas and NumPy for data analysis, and leveraged Web HDFS REST API commands in Perl scripting for
        efficient Big Data management within Hadoop.
        Managed Big Data in Hive and Impala, using partitioning and ETL processes, and employed Cloudera tools for streamlined data
        management and analytics.
        Conducted real-time data streaming and transformation with Spark, enabling advanced analytics and actionable insights for
        stakeholders.
        Integrated NoSQL databases such as HBase and Cassandra, addressing diverse data storage needs and using Cassandra for
        distributed metadata resolution.
        Installed and configured Hive, developed Hive UDFs, and utilized MapReduce and JUnit for unit testing to ensure data accuracy and
        reliability.
        Leveraged DataStax Cassandra for reporting and analytics, enhancing search capabilities with Apache Solr/Lucene and providing
        quick data access.
        Managed various data sources like Teradata and Oracle, facilitating seamless data loading and transformation.
        Supported Hadoop cluster operations, including installation, commissioning, decommissioning, and Name node recovery, ensuring
        optimal performance and scalability.
        Utilized Yarn Architecture and MapReduce 2.0 for POC projects and supported MapReduce programs for smooth data processing
        operations at Molina Healthcare.
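
The custom Python UDFs mentioned above refer to scripts plugged into Hive; a common pattern (a minimal, hypothetical sketch
with made-up column names, not the actual project code) is a streaming script registered with ADD FILE and invoked through
Hive's TRANSFORM clause:

    # clean_codes.py - minimal sketch of a Python "UDF" used via Hive TRANSFORM.
    # Hive streams tab-separated rows to stdin and reads tab-separated rows from stdout.
    # The (member_id, diagnosis_code) column layout is hypothetical.
    import sys

    for line in sys.stdin:
        member_id, diagnosis_code = line.rstrip("\n").split("\t")
        # Example transformation: strip dots and upper-case the code
        cleaned = diagnosis_code.replace(".", "").upper()
        print(member_id + "\t" + cleaned)

A HiveQL SELECT TRANSFORM(...) USING clause would then apply the script row by row; Pig can drive the same style of script
through its streaming operator.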

Environment: Hadoop, Cloudera, MapReduce, Kafka, Impala, Spark, Azure (Databricks, Data Factory, Data Lake), Zeppelin, Hue, Pig,
Hive, Sqoop, Java, Scala, Cassandra, SQL, Tableau, Zookeeper, Teradata, Zoom-Data, Linux Red Hat, and Oracle.

Client: Global Atlantic Financial Group, Indianapolis, IN                                                      Feb 2017 - July 2019
Role: Big Data/Hadoop Engineer
Roles & Responsibilities:
       Led the development of scalable data solutions with Hadoop, increasing processing speed by 40% and overseeing the migration of
         legacy applications to Hadoop infrastructure.
       Implemented Spark code in Scala for faster data operations with HBase, and optimized Kafka producers, enhancing data reliability
         by 30% (see the sketch after this list).
       Provided real-time data insights with Confluent Kafka, reducing ETA errors by 20%, and improved data processing efficiency by 35%
         using Apache Spark on Mesos.
       Utilized Flume and Spark for real-time data processing, improving speed by 30%, and developed MapReduce jobs with Hive and Pig,
         reducing analysis time by 25%.
       Engineered a robust multi-terabyte data ingestion framework with quality checks and storage optimization using Parquet and
         Amazon S3, reducing transfer time by 25%.
       Leveraged AWS EC2 for scalable data operations and designed efficient data storage solutions compatible with Python for data
         modeling and schema design.
       Crafted Spark Streaming code for real-time ingestion and automated job scheduling with Oozie, streamlining workflows and
         improving efficiency.
       Deployed an Apache Solr/Lucene search engine server, enhancing financial document search capabilities, and exported analyzed
         data to relational databases using Sqoop.
       Developed Spark programs in Scala, Spark SQL queries, and Oozie workflows, facilitating streamlined data processing and analysis.
       Designed analytics solutions for structured and unstructured data, managing large data ingestion using Avro, Flume, Thrift, Kafka,
         and Sqoop.
       Created Pig UDFs and scripts to analyze customer behavior and processed data in Hadoop, improving data insights and operational
         efficiency.
       Scheduled automated tasks with Oozie for data loading and preprocessing in HDFS using Sqoop, Pig, and Hive.
       Managed scalable distributed computing systems and designed software architecture using Hadoop, Apache Spark, and Apache
         Storm.
       Ingested streaming data into Hadoop using Spark and Storm, ensuring timely processing and visualized data in Tableau dashboards
         for enhanced insights.
       Facilitated data transfer from HDFS to MongoDB and monitored the Hadoop Cluster using Cloudera Manager to ensure optimal
         performance and reliability.
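
As a rough sketch of the kind of reliability-oriented producer configuration the Kafka bullet above refers to (broker, topic, and
payload are hypothetical, the settings shown are generic durability options rather than the project's actual tuning, and the client
is the Python kafka-python library rather than the Scala/Java clients used on the project):

    # Minimal kafka-python producer sketch (illustrative only).
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",          # hypothetical broker
        acks="all",                               # wait for full in-sync-replica acknowledgement
        retries=5,                                # retry transient send failures
        linger_ms=20,                             # small batching window
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    producer.send("trade-events", {"trade_id": "T-1", "amount": 125.0})  # hypothetical topic and payload
    producer.flush()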

Environment: Hadoop, Spark, Scala, HBase, AWS, EC2, S3, Oozie, Spark Streaming, Pig, Kafka, MongoDB, Hive, MapReduce, Flume.

Client: ICICI Financial Services, HYD, IND                                                                  Dec 2015 - Nov 2016
Role: Hadoop Developer
Roles & Responsibilities:

        Installed and configured Hadoop MapReduce and HDFS, developing multiple MapReduce jobs in Java for data cleaning and
        preprocessing tasks.
        Assessed the suitability of Hadoop and its ecosystem for the project, conducting proof of concept (POC) applications to validate and
        eventually leverage the benefits of the Big Data Hadoop initiative.
        Estimated software and hardware requirements for Name Node and Data Node, and planned the cluster accordingly.
        Extracted necessary data from servers into HDFS and performed bulk loading of cleaned data into HBase.
        Led NoSQL column family design, client access software, and Cassandra tuning during the migration from Oracle-based data stores.
        Implemented data streaming capabilities using Kafka and Talend for handling multiple data sources.
        Participated in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch
        after this list).
        Developed large-scale data processing systems in data warehousing solutions and worked with unstructured data mining on NoSQL
        platforms.
        Specified cluster size, allocated resource pools, and distributed Hadoop by writing specification texts in JSON format.
        Designed, implemented, and deployed custom parallel algorithms within a customer's existing Hadoop/Cassandra cluster for various
        customer-defined metrics and unsupervised learning models.
        Enhanced and optimized Spark code for product aggregation, grouping, and data mining tasks.
        Utilized DataStax Cassandra CQL to write queries for creating, altering, inserting, and deleting elements.
        Developed MapReduce programs and Hive UDFs in Java, utilizing MapReduce JUnit for unit testing.
        Created Hive queries for analysts to analyze data using Hive Query Language (HQL).
        Queried both Managed and External tables created by Hive using Impala.
        Implemented an email notification service upon job completion for specific teams requesting data.
        Defined job workflows based on dependencies in Oozie.
        Played a pivotal role in productionizing applications after testing by BI analysts.
        Conducted a POC of FLUME to handle real-time log processing for attribution reports.
        Ensured system integrity of all sub-components related to Hadoop.
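
As a minimal, hypothetical illustration of the Hive-to-Spark conversion mentioned above (table and column names are made up,
and the example is shown in PySpark even though the work also used Scala):

    # Original HiveQL, for reference:
    #   SELECT account_id, SUM(amount) AS total FROM transactions GROUP BY account_id
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # enableHiveSupport() assumes the session can reach a Hive metastore
    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    txns = spark.table("transactions")   # hypothetical Hive table

    # RDD-style transformation of the same aggregation
    totals_rdd = (txns.rdd
                  .map(lambda row: (row["account_id"], row["amount"]))
                  .reduceByKey(lambda a, b: a + b))

    # Equivalent DataFrame transformation
    totals_df = txns.groupBy("account_id").agg(F.sum("amount").alias("total"))

    print(totals_rdd.take(5))
    totals_df.show(5)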

Environment: Apache Hadoop, HDFS, Spark, Solr, Hive, DataStax Cassandra, MapReduce, Pig, Java, Flume, Cloudera CDH4, Oozie, Oracle,
MySQL, Amazon S3.


Client: Neo App Technologies, HYD, IND                                                                     Oct 2014 - Nov 2015
Role: Support Analyst
Roles & Responsibilities:

        Resolving issues pertaining to the Enterprise Data Warehouse (EDW) and stored procedures in OLTP systems, and devising,
        analyzing, designing, and implementing ETL strategies.
        Identifying performance bottlenecks in existing sources, targets, and mappings through comprehensive data flow analysis,
        transformation evaluation, and tuning to enhance overall performance.
        Extracting data from heterogeneous sources including Oracle databases, XML, and flat files, and loading it into a relational Oracle
        warehouse.
        Troubleshooting and optimizing standard and reusable mappings and mapplets using a variety of transformations such as
        Expression, Aggregator, Joiner, Router, Lookup (Connected and Unconnected), and Filter.
        Performing SQL query and Stored Procedure tuning to expedite data extraction in OLTP environments, addressing and resolving
        related issues.
        Investigating and resolving long-running sessions and associated issues.
        Utilizing Variables and Parameters within mappings to facilitate value passing between sessions.
        Contributing to the development of PL/SQL stored procedures, functions, and packages for processing business data in OLTP
        systems.
        Collaborating with Services and Portal teams to address data-related issues in OLTP systems.
        Working closely with testing teams to address bugs in ETL mappings prior to production deployment.
        Generating weekly project status reports, monitoring task progress according to schedule, and communicating any risks and
        contingency plans to management and business users.

Environment: Informatica PowerCenter, Oracle, PL/SQL, SQL Developer, ETL, OLTP, XML, Toad.


EDUCATION:
Bachelor's degree in Computer Science from JNT University, Hyderabad                                Aug 2015
