Candidate's Name
Charlotte, North Carolina | Street Address | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE

Summary

Innovative Big Data Engineer with 5 years of experience, known for high productivity and efficiency in task completion. Possesses specialized skills in the Hadoop ecosystem, Spark programming, and data modeling that contribute to solving complex data challenges. Excels in analytical thinking, problem-solving, and communication, leveraging these soft skills to collaborate effectively with cross-functional teams and deliver insightful data solutions.

Skills

- Application development, testing, and deployment
- Programming languages: Python, Scala, SQL, Java
- Databases: MySQL, PostgreSQL, MongoDB, Cassandra
- ETL tools: Apache NiFi, Microsoft SQL Server Integration Services (SSIS), Apache Airflow, Informatica, Talend, AWS Glue
- Data visualization tools: Tableau, Power BI
- Data processing and analysis: Apache Spark, Apache Kafka, AWS EMR
- Big data technologies: Hadoop, HBase, Spark, Hive, Scikit-learn
- Continuous integration and performance tuning
- Data warehousing: Amazon Redshift, Google BigQuery, Snowflake
- Version control systems: Git, GitHub, GitLab, Bitbucket

Experience

Senior Big Data Engineer, 02/2022 to Current
Gainwell Technologies, Irving, TX

- Researched advancements in technology related to Big Data processing, storage, and analytics.
- Contributed to the design and implementation of efficient solutions for managing large-scale Big Data workloads.
- Utilized ETL techniques to extract data from multiple sources and populate the target system through customized jobs.
- Created and deployed NoSQL databases, including Cassandra, MongoDB, and HBase, for efficient storage of large-scale data.
- Executed the development and implementation of Spark applications using Python and Scala.
- Developed and maintained data pipelines to ingest, store, process, and analyze large datasets in AWS S3 buckets.
- Conducted comprehensive testing of all components within the Big Data architecture.
- Identified potential performance and scalability problems by monitoring production systems.
- Automated deployment of applications across various cloud environments, including containerized applications on OpenShift, resulting in streamlined deployment workflows.
- Optimized data handling capabilities by creating and deploying high-performance real-time applications with Kafka Streams or Spark Streaming.
- Implemented automated monitoring of data flows using CloudWatch and Lambda functions, integrated with OpenShift for enhanced operational insights.
- Designed and managed data integration workflows using Apache NiFi to ensure seamless data movement across different platforms.
- Implemented data governance and security policies to protect sensitive data and ensure compliance with industry standards.
- Collaborated with cross-functional teams to integrate machine learning models into production data pipelines, leveraging tools such as AWS SageMaker and TensorFlow.
- Conducted root cause analysis and debugging of data pipeline issues, ensuring timely resolution and minimal impact on data processing operations.
- Engaged in continuous learning and professional development to stay current with emerging trends and best practices in Big Data technologies and tools.

Big Data Developer, 06/2020 to 01/2022
Premier Inc, Charlotte, North Carolina

- Performed analysis of large datasets using complex SQL queries and advanced Python scripting (e.g., Pandas, NumPy), identifying data patterns, trends, and anomalies to inform system requirements and business strategies.
- Architected and implemented Azure Storage solutions, including Blob Storage for unstructured data, Azure Files for shared storage, Azure Queue for message queueing, and Table Storage for NoSQL storage, ensuring high availability, redundancy, and cost-effectiveness for diverse application requirements.
- Configured Kafka clusters with custom ZooKeeper setups and developed custom consumer applications using the Kafka Streams API to facilitate real-time data ingestion and processing from various sources, ensuring low-latency and fault-tolerant data streaming solutions.
- Developed and deployed Big Data applications leveraging Hadoop ecosystem components (Hadoop, MapReduce, HDFS, Hive, Pig) and Apache Spark, designing and implementing ETL pipelines for processing and analyzing petabyte-scale datasets.
- Optimized SQL queries on relational databases such as Oracle, SQL Server, and MySQL, using indexing, query rewriting, and partitioning techniques to improve query performance and reduce latency.
- Developed and implemented automation scripts for Azure services using PowerShell, Python, and Bash, automating cloud infrastructure tasks such as provisioning, configuration management, and deployment with Azure CLI and Azure DevOps.
- Debugged existing Java applications using integrated development environments (IDEs) such as IntelliJ and Eclipse, employing debugging tools, logging frameworks (e.g., Log4j), and performance profiling to identify and resolve application bugs and performance bottlenecks.
- Automated deployment of Kafka clusters and custom consumer applications using CI/CD pipelines (e.g., Jenkins, GitLab CI/CD), leveraging containerization with Docker and orchestration with Kubernetes to ensure scalable and reliable application deployments across various cloud environments.
- Implemented and maintained Azure Storage solutions with performance tuning and cost optimization strategies, including lifecycle management policies for Blob Storage, automated backup and restore for Azure Files, and monitoring with Azure Monitor and Azure Storage Analytics.

Big Data Intern, 06/2018 to 07/2019
Avon Technologies, Hyderabad, Telangana

- Leveraged Agile methodologies to efficiently progress the development lifecycle from initial prototyping to enterprise-quality testing and final implementation.
- Designed and executed advanced data pipelines to transfer both structured and unstructured data into HDFS.
- Created and integrated specialized user-defined functions to extend the functionality of HiveQL queries.
- Developed and optimized algorithms to analyze and manage substantial data volumes from different file systems.
- Deployed and managed Apache Spark applications on YARN clusters for efficient execution of distributed computing tasks.
- Fine-tuned parameters based on analysis to enhance the performance of MapReduce jobs.
- Analyzed big data sets using R and Python libraries, including SciPy and NumPy.
- Achieved performance improvements in MapReduce jobs by optimizing Apache Hadoop clusters.
- Continuously monitored and adjusted system configurations to ensure optimal performance of data processing tasks.

Education

Master of Science: Computer Science, 12/2020
Southeast Missouri State University, Cape Girardeau, MO

Bachelor of Technology: Computer Science and Engineering, 05/2019
Jawaharlal Nehru Technological University, India

ACADEMIC PROJECTS:

- Designed an ETL pipeline using Apache NiFi and Talend to ingest, transform, and load retail data into the Amazon Redshift data warehouse.
- Real-Time Data Processing System: Designed a real-time data processing pipeline with Apache Kafka and Apache Spark for the analysis of streaming social media data.
- Built a machine learning pipeline using Python and Scikit-learn for predictive analytics of student outcomes, integrated with Apache Airflow for data processing automation.

LinkedIn
LINKEDIN LINK AVAILABLE