Candidate Information
Title: Senior Data / Cloud Engineer
Target Location: US-NY-Manhattan
Candidate's Name
Lead Data/Cloud Engineer
Phone: PHONE NUMBER AVAILABLE; Email: EMAIL AVAILABLE

Professional Summary
10+ years of experience in Big Data solutions development, with expertise in Hadoop and cloud platforms (AWS, Azure).
Proficient in data ingestion, processing, and manipulation across various sectors using AWS and Azure.
Experienced with Big Data systems, ETL pipelines, and real-time analytics systems, designing and implementing end-to-end pipelines that extract, transform, and load data from various sources to destinations.
Strong skills in designing scalable data architectures on Azure, using services such as Azure Data Lake, Azure Synapse Analytics, and Azure SQL Data Warehouse.
Proven track record of building and managing data pipelines in Azure Data Factory.
Experienced with AWS services, including EMR, Redshift, Lambda, and Glue, for data processing and analysis.
Strong exposure to software development methodology and application integration; well versed in process narratives and involved in all phases of the SDLC, from requirements gathering, planning, and design through construction, implementation, testing, and support of large Business Intelligence and Data Warehousing systems (on-premises and cloud).
Proficient in managing and monitoring complex NiFi data flows, configuring processors, controllers, and reporting tasks to optimize data movement.
Led teams in implementing best practices for data security, availability, and performance in ETL pipelines.
Skilled in Agile/Scrum processes and CI/CD pipeline implementation.
Proficient in Hadoop ecosystems, Microsoft Azure, and Spark on Databricks.
Achievement-driven professional skilled in databases, data management, analytics, data processing, data cleansing, data modeling, and data-driven projects.
Adept at integrating NiFi with various systems, including Kafka, HDFS, and relational databases, for seamless data exchange and analytics.
Expert in on-premises data migration and CloudFormation scripting for streamlined project execution.
Adept at optimizing Spark performance across multiple platforms.
Expertise in troubleshooting and fixing issues with scripts and processes.
Orchestrated secure Virtual Private Cloud (VPC) deployments with CloudFormation scripting.

Technical Skills
Cloud Platforms: AWS (S3, EMR, DynamoDB, Redshift, RDS, Athena, Elasticsearch, Kinesis), Azure (Data Factory, Databricks, Data Lake Gen2, SQL, HDInsight, Synapse, Stream Analytics).
Programming Languages: Scala, Python, Java, Bash, SQL, HiveQL.
Hadoop Components: Hive, Pig, Zookeeper, Sqoop, Oozie, YARN, Maven, Flume, HDFS, Airflow.
Hadoop Administration: Zookeeper, Oozie, Cloudera Manager, Ambari, YARN.
ETL Data Pipelines: Apache Airflow, Hive, Sqoop, Flume, Scala, Python, Kafka, Logstash.
Scripting: HiveQL, SQL, Shell scripting.
Big Data Frameworks: PySpark, Kafka.
Data Warehouses/Databases: Snowflake, Redshift, MySQL, PostgreSQL, MongoDB, HBase, DynamoDB, BigQuery, Oracle, RDBMS.
Spark Framework: PySpark API, Spark Streaming, Spark SQL, Spark Structured Streaming.
Visualization: Tableau, QlikView, Power BI.
IDEs: Jupyter Notebooks, PyCharm, IntelliJ.
CI/CD: Jenkins. Versioning: Git, GitHub, Bitbucket.
Methodologies: Agile Scrum, Test-Driven Development, Continuous Integration, Unit Testing, Functional Testing, Scenario Testing.

Professional Experience
Lead Data/Cloud Engineer, Oct 2023 - Present
Bloomberg, New York City, NY
Summary: I was responsible for maintaining multiple Hadoop clusters for various tenants, overseeing daily maintenance tasks and performing upgrades across the cluster machines. I worked closely with the teams using these clusters, helping troubleshoot application issues and implementing the adjustments needed to keep operations running smoothly. I also collaborated with application and hardware teams to develop and refine maintenance workflows, enhancing internal tooling to improve the efficiency of our maintenance processes. This proactive approach ensured the stability and performance of the clusters while supporting seamless operations for all teams involved.
Built scalable, fault-tolerant data pipelines using Python and distributed computing frameworks such as Apache Spark and Hadoop to process large datasets, ensuring efficient data ingestion and transformation.
Collaborated with cross-functional teams to design and build APIs that facilitated integration between Hadoop services and other internal systems.
Partnered with hardware teams to develop and optimize maintenance tools, improving the speed and accuracy of cluster upkeep.
Designed and implemented real-time data pipelines using Apache Kafka to stream, process, and analyze high-volume event data, ensuring low-latency, fault-tolerant ingestion for downstream analytics and machine learning models (see the sketch below).
Conducted in-depth analysis and performance tuning of Hadoop clusters and underlying machines to improve efficiency and resource utilization.
Coordinated with multiple teams across the organization to collect statistical data, providing insights for performance optimization and capacity planning.
Used Python for scripting and automating cluster maintenance and monitoring tasks.
Used Java for developing and maintaining components of the Hadoop ecosystem and related applications.
Managed and monitored Hadoop clusters using Cloudera Manager to ensure smooth operations and optimal resource allocation.
Worked extensively in a Linux environment to manage Hadoop clusters, applying shell scripting and system administration skills.
Administered and optimized core components of the Hadoop ecosystem (Apache Hadoop, HBase, YARN, Oozie, Hive, HDFS, Spark, Zookeeper), ensuring the reliability and scalability of distributed data processing.
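For illustration, a minimal PySpark Structured Streaming sketch of the kind of Kafka-based real-time pipeline described above; the broker address, topic name, event schema, and output paths are placeholders rather than details of the actual systems:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Illustrative event schema; the real schema is not shown in the resume.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

spark = SparkSession.builder.appName("kafka-event-ingest").getOrCreate()

# Read the raw event stream from Kafka (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Parse the JSON payload and keep only well-formed records.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*")
          .filter(col("event_id").isNotNull()))

# Land the parsed events as Parquet for downstream analytics; the checkpoint
# directory is what makes the job restartable and fault tolerant.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events/parsed")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```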
Sr. Data Engineer, May 2021 - Sep 2023
Chevron - San Ramon, CA
Summary: Collaborated on an Operations Research and Analytics project for Chevron, which processes almost two million barrels of crude oil per day, so an improvement of one cent per barrel is worth millions of dollars annually. I was part of the team that migrated the analytics platform to consistently generate valuable business insights and support data-driven decisions.
Successfully migrated on-premises data to AWS, enhancing data accessibility and scalability, and reduced data processing time by implementing real-time data analysis with Amazon Kinesis Data Analytics.
Improved data quality and reliability through automated testing in CI/CD pipelines, and enabled interactive reporting with visualization tools to strengthen data-driven decision-making.
Built a unified metadata repository for streamlined data management using AWS Glue, with AWS S3 for data collection and storage, facilitating easy access to large datasets.
Evaluated the on-premises data infrastructure and identified migration opportunities to AWS; orchestrated data pipelines with AWS Step Functions and used Amazon Kinesis for event messaging.
Implemented automated ETL pipeline deployment using Jenkins and version-controlled code with Git.
Automated end-to-end ETL processes with Python scripts, extracting data from multiple sources, transforming it into the required formats, and loading it into data lakes and warehouses such as AWS S3, Snowflake, and Redshift (see the sketch below).
Developed and optimized PySpark scripts for data cleansing, transformation, and feature extraction on large-scale datasets, reducing data processing time by 30% while ensuring seamless integration with data lakes and warehouses.
Used Apache Spark in a cluster environment to process and analyze petabyte-scale datasets, optimizing job execution with Spark's RDD and DataFrame APIs and improving query performance by 40%.
Employed AWS Glue for data cleaning and preprocessing, performed real-time analysis with Amazon Kinesis Data Analytics, and leveraged AWS services such as EMR, Redshift, DynamoDB, and Lambda for scalable, high-performance data processing.
Designed and monitored computing clusters using AWS Lambda, S3, Redshift, Databricks, and CloudWatch, integrating testing into CI/CD pipelines to ensure data quality and infrastructure stability.
Connected visualization tools (Tableau, Power BI) to Amazon Redshift for interactive reporting, and managed tasks and tracked progress on Jira Kanban boards, following Agile methodology.
Configured monitoring, logging, and alerting solutions for cloud-based data infrastructure.
Designed and managed scalable data warehousing solutions in Snowflake, optimizing storage and query performance for large datasets through partitioning and clustering and improving query response times by 35%.
Developed complex SQL queries, triggers, and stored procedures in PostgreSQL for data extraction, transformation, and reporting, ensuring optimal performance and data consistency in high-volume transactional systems.
Built and optimized distributed data processing pipelines in Scala with Apache Spark, leveraging functional programming paradigms to handle large-scale data transformations efficiently and improve code maintainability.
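A simplified PySpark sketch of the batch ETL pattern described above (extract raw CSV from S3, cleanse and derive a metric, load a curated Parquet layer); the bucket names, columns, and derived metric are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("crude-ops-etl").getOrCreate()

# Extract: raw CSV exports in S3 (bucket and layout are placeholders).
raw = (spark.read
       .option("header", True)
       .option("inferSchema", True)
       .csv("s3://example-raw-bucket/ops/measurements/"))

# Transform: basic cleansing plus one derived feature.
clean = (raw.dropDuplicates(["reading_id"])
         .filter(F.col("barrels").isNotNull())
         .withColumn("reading_date", F.to_date("reading_ts"))
         .withColumn("barrels_per_hour", F.col("barrels") / F.col("hours")))

# Load: write curated Parquet partitioned by date; a warehouse such as
# Redshift Spectrum or Snowflake can then query this layer directly.
(clean.write
 .mode("overwrite")
 .partitionBy("reading_date")
 .parquet("s3://example-curated-bucket/ops/measurements/"))
```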
Sr. Big Data Engineer, Jan 2020 - May 2021
First American Financial - Santa Ana, CA
Summary: This project involved the successful migration of data, implementation of automation, and optimization of data solutions, contributing to enhanced data management and analytics capabilities for First American Mortgage Solutions. We provided clients with current, accurate property, homeowner, and mortgage data to help them deliver on their business goals.
Architected scalable ETL pipelines using Apache Spark, Hadoop, and AWS Glue for data ingestion, transformation, and loading from multiple sources into data lakes and data warehouses.
Designed a scalable cloud-based architecture to accommodate growing data volumes and continuously monitored and optimized cloud infrastructure for cost efficiency. Migrated and thoroughly tested data while ensuring accuracy and completeness.
Validated the performance and scalability of cloud-based solutions and configured Jenkins for CI/CD, automating deployment tasks.
Designed data pipelines in Azure (ADLS Gen2, Blob Storage, ADF, Azure Databricks, Azure SQL, Azure Synapse) and AWS (Redshift, S3, Lambda) environments to develop consumer intelligence reports based on market research and social media analytics.
Used PySpark for data extraction, transformation, and cleansing across various file formats, leveraging its data manipulation, aggregation, and filtering capabilities, fully integrated with MongoDB and Snowflake to enrich existing data.
Implemented streaming data processing using AWS fully managed Kafka, Spark Streaming, and DynamoDB, and managed Hadoop clusters on AWS EC2 and Cloudera distributions.
Extracted, transformed, and loaded large-scale structured and unstructured datasets from sources such as APIs, relational databases (MySQL, SQL Server), NoSQL databases (MongoDB, Cassandra), and data streams.
Used Spark SQL, Hive Query Language (HQL), and AWS Kinesis for client insights.

Sr. Data Engineer, Jul 2018 - Dec 2019
Burlington Stores Inc. - North Burlington, NJ
Summary: Worked alongside professionals from several areas on a transformative data engineering project for Burlington Stores Inc. The project enabled the company to efficiently manage and analyze its data, paving the way for a successful entry into the stock market and capitalizing on the demand for affordable clothing in a challenging economic landscape.
Integrated Python with big data tools and technologies such as Apache Kafka, Hadoop HDFS, and Hive to process and analyze real-time and batch data, optimizing system performance and resource utilization.
Defined an efficient data architecture that seamlessly integrated Snowflake, Oracle, and GCP services, optimizing data flow for analytics, and configured AWS Lambda functions to automate PySpark ETL jobs with scheduling and error handling.
Transformed ETL processes to leverage Hadoop's distributed file system (HDFS), enhancing data processing capability and scalability, while tuning PySpark jobs for optimal performance and better cluster resource utilization.
Leveraged Snowflake's optimization features to boost query performance for analytical workloads, and seamlessly transferred data between Hadoop and MySQL using Sqoop, ensuring data synchronization.
Implemented infrastructure provisioning with Terraform, ensuring consistency and repeatability across project stages; defined Snowflake, Oracle, and GCP resources and configurations in code.
Installed and configured Hive for data warehousing and developed custom Hive UDFs to meet specific business needs.
Set up Cloud Storage buckets for raw and processed data, facilitating easy data sharing and backup, and integrated Snowflake as the data warehousing solution for efficient storage, querying, and analytics.
Developed ETL pipelines to extract data from Oracle databases using CDC and batch processing, utilizing Google Dataflow for real-time and batch ingestion into Snowflake and BigQuery (see the sketch below).
Created a structured data model in BigQuery to support advanced analytics and reporting, and developed SQL queries and views in BigQuery to derive actionable insights from integrated data sources.
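A condensed Apache Beam sketch of the kind of Dataflow load into BigQuery mentioned above, assuming pre-exported JSON files in Cloud Storage and an existing target table; the project, bucket, table, and field names are hypothetical, and the real pipelines also handled CDC and streaming ingestion:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line):
    """Parse one exported JSON record into a BigQuery-ready dict."""
    rec = json.loads(line)
    return {
        "order_id": rec.get("order_id"),
        "store_id": rec.get("store_id"),
        "amount": float(rec.get("amount", 0)),
        "order_date": rec.get("order_date"),
    }


# Placeholder project, region, and staging bucket.
options = PipelineOptions(runner="DataflowRunner", project="example-project",
                          region="us-east1", temp_location="gs://example-bucket/tmp")

with beam.Pipeline(options=options) as p:
    (p
     | "ReadExports" >> beam.io.ReadFromText("gs://example-bucket/oracle_exports/*.json")
     | "Parse" >> beam.Map(parse_record)
     | "WriteToBQ" >> beam.io.WriteToBigQuery(
           "example-project:retail.orders",  # assumed to exist already
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))
```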
Sr. Hadoop Developer, Nov 2017 - Jun 2018
Old Republic International Corporation - Chicago, IL
Summary: Developed a fraud detection system for property insurance using real-time analysis and classification algorithms. Enhanced data processing efficiency, managed Hadoop clusters, and ensured data security for seamless operations. The platform helped our clients, ultimately boosting trust in the sector, addressing the problem of rising real estate costs, and reducing the effects of fraud.
Optimized and integrated Hive, Sqoop, and Flume into existing ETL processes, accelerating the extraction, transformation, and loading of massive structured and unstructured datasets. Used Hive to simulate a data warehouse for client-based transit system analytics, handling highly structured and unstructured data efficiently.
Worked with various data formats (JSON, XML, CSV, ORC) and implemented Hive partitioning and bucketing techniques (see the sketch below).
Installed, commissioned, and decommissioned data nodes; configured slots and NameNode high availability. Executed cluster upgrades on staging platforms before production to minimize disruptions, and implemented Kerberized authentication for the cluster, ensuring secure user access within the Hadoop environment.
Developed and optimized Java-based MapReduce jobs for processing large-scale datasets in a Hadoop environment, enhancing data processing efficiency and reducing job execution time by 25% through custom partitioning and tuning.
Handled Hadoop system administration using Hortonworks/Ambari and Linux system administration (RHEL 7, CentOS). Conducted HDFS balancing and fine-tuning and optimized MapReduce counters to improve data processing efficiency.
Used Python with Apache Kafka and Spark Streaming to build real-time data processing pipelines, enabling real-time monitoring and reporting of streaming data.
Developed a comprehensive data migration plan for seamless integration of various data sources into the Hadoop system, facilitating unified data management and leveraging Cassandra for JSON-documented data and HBase for region-based data storage.
Conducted cluster capacity and growth planning, providing valuable insights for node configuration and resource allocation to accommodate future needs. Designed backup and disaster recovery methodologies for Hadoop clusters and related databases, ensuring data resiliency and business continuity.
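The Hive partitioning and bucketing noted above can also be expressed through Spark's DataFrame writer; a rough PySpark sketch of the same table-layout idea, with an assumed database, table, and columns (the project itself used Hive DDL directly):

```python
from pyspark.sql import SparkSession, functions as F

# Hive support lets Spark manage tables in the Hive metastore.
spark = (SparkSession.builder
         .appName("claims-table-layout")
         .enableHiveSupport()
         .getOrCreate())

# Illustrative source table; real table and column names differ.
claims = spark.table("insurance.claims_raw").withColumn(
    "ingest_date", F.to_date("ingest_ts"))

# Partition by ingest date so date filters prune whole directories, and
# bucket by policy_id so joins and lookups on policy_id touch fewer files.
(claims.write
 .partitionBy("ingest_date")
 .bucketBy(32, "policy_id")
 .sortBy("policy_id")
 .format("orc")
 .mode("overwrite")
 .saveAsTable("insurance.claims_curated"))
```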
Hadoop & Data Engineer, Aug 2016 - Nov 2017
Abbott Laboratories - Green Oak, Illinois
Summary: As a Hadoop & Data Engineer at Abbott Laboratories, I developed and maintained robust Hadoop architectures, optimizing ETL workflows with Apache Pig, Hive, and Spark for high-quality data processing. I managed real-time data ingestion using Apache Flume and Kafka, enabling timely analytics, while implementing performance tuning and security measures to ensure data integrity and compliance. Collaborating closely with data scientists, I provided structured datasets for machine learning projects, enhancing insights and decision-making across business units.
Developed and maintained scalable Hadoop architectures, including HDFS and YARN, to enable efficient storage, processing, and analysis of large datasets across business units.
Created and optimized ETL workflows using Apache Pig, Hive, and Spark to extract, transform, and load data from diverse sources into Hadoop clusters, ensuring high data quality and availability for analytics.
Managed data ingestion processes using Apache Flume and Kafka to stream data from various sources into the Hadoop ecosystem, enabling real-time analytics and reporting.
Conducted performance tuning and optimization of Hadoop jobs and workflows, applying best practices for resource management, job scheduling, and query performance.
Worked closely with data scientists and analysts to understand data requirements, providing clean, structured datasets for machine learning and data analysis projects and facilitating data exploration.
Established security protocols and compliance measures in Hadoop environments, using Kerberos authentication and data encryption to safeguard sensitive information.
Developed monitoring solutions for Hadoop clusters using tools such as Cloudera Manager and Grafana, proactively identifying and resolving issues to ensure high availability and reliability of data pipelines.
Created comprehensive documentation for Hadoop processes, data flow architectures, and best practices, enabling knowledge sharing and improving team collaboration.
Provided training and mentorship to junior data engineers on Hadoop technologies, data modeling, and ETL processes, fostering a culture of continuous learning and development.
Implemented data quality checks and validation procedures using Apache NiFi and Python, ensuring the accuracy and integrity of data processed within the Hadoop ecosystem.
Implemented machine learning algorithms using PySpark's MLlib for predictive analytics, including clustering and classification, enabling more accurate insights from large datasets and improving model training efficiency by 20% (see the sketch below).
Designed and executed complex Apache Spark jobs for real-time data streaming using Spark Streaming and Kafka, processing and analyzing live data feeds to support real-time decision-making with low-latency performance.
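A condensed PySpark MLlib sketch of the kind of classification pipeline mentioned above; the source table, feature columns, and label are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mllib-classification").getOrCreate()

# Hypothetical curated dataset with numeric features and a string outcome.
df = spark.table("analytics.device_readings")

label = StringIndexer(inputCol="outcome", outputCol="label", handleInvalid="keep")
features = VectorAssembler(
    inputCols=["temperature", "pressure", "duration_sec"], outputCol="features")
clf = RandomForestClassifier(featuresCol="features", labelCol="label", numTrees=50)

# Chain indexing, feature assembly, and the classifier into one pipeline.
pipeline = Pipeline(stages=[label, features, clf])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate area under ROC on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"AUC = {auc:.3f}")
```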
Data Analyst, Jul 2014 - Jul 2016
Virginia Commonwealth University - Richmond, VA
Summary: Supported data analytics efforts to develop AI models for generating representations of complex systems, enhancing predictive capabilities across domains including physics, ecology, chemistry, and business research. This project showcased strong skills in data engineering, data processing, and AI model preparation, contributing to enhanced predictive capabilities for VCU's research initiatives.
Developed Python-based Alteryx notebooks for automated weekly, monthly, and quarterly reporting, streamlining data processing and analysis, and orchestrated the creation of Spark Scala Datasets within Azure Databricks, defining schema structures through Scala case classes.
Managed complex, long-running jobs in Azure Synapse Analytics, ensuring thorough pre-processing of product and warehouse data and producing clean, optimized data for downstream analysis. Applied Azure Stream Analytics to segment streaming data into batches for comprehensive processing in Azure Databricks.
Wrote complex Python functions to clean, normalize, and transform large datasets, ensuring high data quality before loading into analytics platforms for further processing.
Prepared data for machine learning modeling, addressing challenges with censored observations, and improved processing efficiency by repartitioning datasets after ingesting gzip files, reducing processing time (see the sketch below).
Harnessed Azure Event Hubs for seamless data processing, ensuring uninterrupted data flow similar to a Kafka consumer, and leveraged Azure Databricks for Python-based solutions using DataFrames and the Spark SQL API, accelerating data processing.
Interacted with Azure Data Lake Storage through Azure Databricks, efficiently processing stored data and populating DataFrames within Databricks jobs to load structured data into clusters.
Transformed real-time data into a format suitable for scalable analytics using Azure Stream Analytics, and forwarded requests to source REST-based APIs from a Scala script via an Azure Event Hubs producer, streamlining data sourcing.
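A small PySpark sketch of the gzip repartitioning point above: gzip files are not splittable, so each file arrives as a single partition, and repartitioning immediately after the read restores parallelism (the storage paths and partition count are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gzip-repartition").getOrCreate()

# Each .json.gz file is read by a single task because gzip is not splittable,
# so a handful of large files leaves most executors idle.
raw = spark.read.json(
    "abfss://raw@exampleaccount.dfs.core.windows.net/events/*.json.gz")

# Spread the data across the cluster before the expensive transformations.
events = raw.repartition(200)

cleaned = events.dropDuplicates(["event_id"]).na.drop(subset=["event_ts"])

# Write back as Parquet, which is splittable and columnar for later reads.
cleaned.write.mode("overwrite").parquet(
    "abfss://curated@exampleaccount.dfs.core.windows.net/events/")
```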
Education
Bachelor of Science in Engineering from Virginia Commonwealth University