
Big Data Engineer Resume Irving, TX

Candidate Information
Title: Big Data Engineer
Target Location: US-TX-Irving


Candidate's Name
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- Over 6 years of experience in the Hadoop/Spark ecosystem, including ingestion, storage, querying, processing, and analysis of big data using tools such as Spark, MapReduce, HDFS, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Airflow, and NiFi.
- Extensive experience with the AWS and Azure cloud platforms and a strong understanding of Google Cloud Platform; skilled in deploying serverless applications and AWS Lambda functions.
- Expert in data warehousing techniques and dimensional modeling, with hands-on experience with Snowflake multi-cluster virtual warehouses and migrating Teradata objects to Snowflake.
- Proven ability to build high-throughput ETL pipelines for data lakes using Apache NiFi, Spark SQL, and the DataFrame API (a minimal PySpark sketch follows this summary); developed complex ETL workflows and implemented MapReduce jobs for data processing.
- Strong knowledge of real-time streaming technologies such as Spark Streaming and Kafka, ensuring high performance and scalability for data processing applications.
- Proficient in Python, shell scripting, and core Java, with experience using Python libraries such as NumPy, Pandas, and Matplotlib for data analysis and visualization.
- Hands-on experience with NoSQL databases such as Cassandra, HBase, and DynamoDB; implemented complex SQL queries and joins in relational and non-relational databases.
- Certified Databricks professional with experience in data migrations and transformations using Databricks; skilled in configuring and managing Databricks clusters for optimal performance.
- Experienced with version control systems such as Git and Bitbucket; strong experience with Docker and Kubernetes runtime environments, enabling efficient CI/CD workflows.
- Strong understanding of Agile, Scrum, Kanban, and Waterfall methodologies, with experience managing projects under these frameworks to deliver solutions on time.
- Experienced in designing and implementing scalable data architectures, ensuring optimal performance, reliability, and data integrity across diverse environments.
- Strong expertise in developing and optimizing data pipelines, using Apache Airflow for scheduling and monitoring workflows.
- In-depth understanding of data visualization tools, including Tableau and Power BI, with the ability to create interactive dashboards and reports for business stakeholders.
- Skilled in using Apache Kafka to build real-time data streaming applications, enhancing data integration and processing capabilities.
- Adept at performing data quality checks and implementing data validation frameworks to ensure accuracy and consistency across data pipelines.
- Knowledgeable in security best practices and data governance, including data encryption, role-based access control, and compliance with data protection regulations.
- Experienced in developing and deploying machine learning models using big data technologies to derive actionable insights and predictive analytics.
- Proficient in leveraging DevOps practices to streamline the development, testing, and deployment of data engineering solutions.
- Strong problem-solving skills with a focus on continuous improvement and innovation in data engineering practices.
- Excellent communication and collaboration skills, capable of working effectively with cross-functional teams to deliver high-quality data solutions.
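To make the DataFrame-based ETL claim above concrete, here is a minimal PySpark sketch of the read-clean-write pattern. The bucket paths and column names (`event_id`, `event_time`) are hypothetical placeholders, not taken from any actual project:

```python
# Minimal PySpark ETL sketch. All paths and column names are
# hypothetical placeholders, not from a real pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read semi-structured JSON landed in the data lake.
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: deduplicate, normalize the timestamp, derive a partition column.
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_ts").isNotNull())
)

# Load: write partitioned Parquet to the curated zone.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```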
TECHNICAL SKILLS:
Programming Languages: SQL, Shell Scripting, Python, Scala
Data Visualization Tools: Tableau Desktop, Power BI, Alteryx, MicroStrategy
NoSQL Databases: Cassandra, HBase
Relational Databases: SQL Server (SSIS, SSMS), Oracle, PostgreSQL, MySQL
Cloud Platforms: AWS, Azure, GCP, Databricks, Snowflake
AWS Services: S3, Athena, Glue, Lambda, API Gateway, CloudWatch, SNS, EMR, Databricks, Redshift, RDS
Operating Systems: Linux, Windows, Unix
Big Data: HDFS, Apache Hive, Apache MapReduce, Apache Spark, YARN, Kafka, Apache NiFi, Oozie, Airflow, HBase
Version Control: Git, Bitbucket, GitHub, GitLab, Git Bash, Azure Repos

EDUCATION:
MS in Computer Science, University of Central Missouri, January 2020 - May 2021

CERTIFICATIONS:
Databricks Certified Associate Developer for Apache Spark
Snowflake SnowPro Core
Data Engineering Essentials

PROFESSIONAL EXPERIENCE:

BLUE CROSS BLUE SHIELD, FLORIDA (June 2022 - Present)
Big Data Engineer
- Optimized Spark job performance by implementing `repartition` and `coalesce` strategies to balance data distribution and reduce shuffle operations; addressed data skew with salting techniques (see the salted-join sketch at the end of this role), leading to a 30% improvement in job execution time and more efficient resource utilization.
- Improved Python-based Spark job performance by leveraging `cache` and `persist` for frequently accessed data and applying memory tuning such as adjusting `spark.executor.memory`, `spark.driver.memory`, and `spark.memory.fraction`.
- Processed and transformed complex file formats, including Parquet, Avro, JSON, and XML, using Spark's data handling capabilities; developed custom parsers and implemented schema evolution strategies to ensure data integrity and seamless integration, resulting in a 30% improvement in data processing speed.
- Configured and tuned AWS Redshift clusters, optimized Spark-Redshift connections for seamless data transfer, and performed query optimization for better performance.
- Deployed and monitored Python-based Spark scripts in Databricks, including creating and maintaining Delta Live Tables for real-time data processing and consistency.
- Created and maintained workflows and job schedules with Apache Airflow, including DAG creation in Python, dependency management, and log monitoring for ETL jobs.
- Configured Databricks clusters for optimal performance by fine-tuning Spark configurations, adjusting executor memory and cores, and enabling adaptive query execution.
- Implemented dynamic resource allocation and configured autoscaling policies to ensure efficient use of computational resources.
- Managed Databricks workloads by scheduling jobs based on priority and resource requirements and using cluster pools for job isolation and concurrency control; this balanced resource distribution, prevented bottlenecks, and improved job throughput by 25%.
Environments: Spark, S3, Redshift, Databricks, Python, Git, Power BI, Airflow, Hive.
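The salting technique named in the first bullet of this role can be illustrated with a short sketch. This is the generic pattern under assumed names (`facts`, `dims`, `customer_id`, example S3 paths), not the actual production code:

```python
# Salted-join sketch for mitigating data skew in Spark.
# Table paths, names, and the join key are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("salting-sketch").getOrCreate()
SALT_BUCKETS = 16  # number of salt values; tune to the observed skew

# Assumed inputs: a large fact table skewed on customer_id and a
# smaller dimension table joined on the same key.
facts = spark.read.parquet("s3://example-bucket/facts/")
dims = spark.read.parquet("s3://example-bucket/dims/")

# Append a random salt to the skewed side so a hot key is spread
# across SALT_BUCKETS partitions instead of landing in one.
facts_salted = facts.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Replicate the other side once per salt so every (key, salt) pair matches.
dims_salted = dims.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(SALT_BUCKETS)]))
)

joined = facts_salted.join(dims_salted, ["customer_id", "salt"]).drop("salt")
```

After the join, `coalesce` can shrink the output partition count before writing, complementing the `repartition`/`coalesce` strategy the same bullet describes.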
US BANK, TEXAS (June 2021 - June 2022)
Data Engineer
- Developed and optimized Spark jobs on Databricks to read data from Azure Blob Storage, implementing efficient data ingestion techniques and handling file formats including CSV, JSON, and Parquet.
- Designed and implemented incremental data processing strategies to handle streaming and batch data, ensuring timely updates and minimizing reprocessing overhead.
- Configured Spark to write transformed data to Azure Synapse Analytics using the optimized `DataFrame.write` API, ensuring efficient data transfer and minimizing load times.
- Used PolyBase and bulk insert techniques to improve the performance of data loading into Azure Synapse Analytics.
- Administered and monitored Kafka brokers on Azure, ensuring efficient operation and stability by balancing load across brokers and tuning broker-level configurations for optimal throughput and low latency.
- Optimized Kafka producer and consumer performance by adjusting key configurations such as batch size, linger time, and fetch size (a producer-tuning sketch appears at the end of this resume), reducing latency and improving data processing efficiency.
- Implemented message compression with Snappy and Gzip to reduce network bandwidth usage and storage requirements, enhancing Kafka throughput and performance.
- Achieved high throughput by configuring replication factors, in-sync replicas, and message acknowledgment settings, ensuring data durability and consistency while optimizing performance.
Environments: Kafka, Spark, Python, Java, REST API, Databricks, Azure Repos, Azure Synapse, Azure Data Factory.

FIS GLOBAL, INDIA (May 2017 - December 2019)
Data Engineer
- Created a new data quality check framework in Python using Pandas, enhancing data validation and ensuring data accuracy across pipelines.
- Developed and managed AWS Glue pipelines that extracted data from various sources, transformed it according to business rules using PySpark scripts, and consumed APIs to move data into Snowflake.
- Developed prototypes for big data analysis using Spark RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, Parquet, and HDFS files, demonstrating the feasibility and scalability of big data solutions.
- Designed and developed data pipelines in S3, using boto3 for file encryption and decryption and PySpark transformations for data processing.
- Automated task scheduling with Airflow DAGs in Python for daily runs (see the DAG sketch after this role); developed custom Python connectors for Box applications.
- Developed Snowflake procedures to implement metadata configuration tables in Snowflake, yielding million-dollar cost savings and reduced resource expenses through automation.
- Designed YAML configuration files for parallel runs of ETL/ELT pipelines across multiple environments.
- Deployed PySpark scripts to EMR clusters and viewed and managed Spark logs from EMR.
Environments: Python, Airflow, Spark, Bitbucket, Snowflake, AWS Glue, AWS S3.
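The daily Airflow scheduling mentioned in this role (and at Blue Cross) follows this general shape. Below is a minimal Airflow 2.x sketch with hypothetical DAG and task names, not the actual DAGs:

```python
# Minimal Airflow 2.x DAG sketch for a daily ETL run.
# DAG id, task names, and bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull source data (placeholder)."""


def transform():
    """Apply business rules (placeholder)."""


def load():
    """Write results to the warehouse (placeholder)."""


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one run per day
    catchup=False,               # skip backfilling past dates
) as dag:
    extract_t = PythonOperator(task_id="extract", python_callable=extract)
    transform_t = PythonOperator(task_id="transform", python_callable=transform)
    load_t = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: extract -> transform -> load.
    extract_t >> transform_t >> load_t
```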

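Finally, the Kafka producer tuning described in the US Bank role (batch size, linger time, compression, and acknowledgment settings) generally takes the following shape. This is a kafka-python sketch; the broker address and topic are hypothetical, not the deployed configuration:

```python
# Kafka producer tuning sketch (kafka-python). Broker and topic
# names are hypothetical placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1.example.com:9092"],
    batch_size=64 * 1024,       # larger batches amortize per-request overhead
    linger_ms=20,               # wait up to 20 ms to fill a batch
    compression_type="snappy",  # compress batches to cut network and storage use
    acks="all",                 # require all in-sync replicas for durability
)

producer.send("events", b"example-payload")
producer.flush()  # block until buffered records are sent
```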