Candidate's Name
Data Engineer
Memphis, Tennessee, Street Address | Phone: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE

SUMMARY
- Around 3 years of experience as a Data Engineer, specializing in a broad array of technologies, including Hadoop for distributed data management, Spark for advanced analytics, Kafka for real-time processing, and Hive and Snowflake for data warehousing, all integrated with cloud solutions across various industries.
- Proficient in Python, Scala, SQL, and R for big data processing, robust application development, and data analysis.
- Employed advanced features and libraries such as NumPy, Pandas, SciPy, Scikit-Learn, and TensorFlow for streamlined data manipulation and machine learning tasks.
- Administered diverse databases (MySQL, Oracle, MongoDB), optimizing for performance, scalability, and reliability.
- Engineered enterprise applications using the Hadoop ecosystem, integrating HDFS, MapReduce, Hive, Pig, Sqoop, and Oozie.
- Proficient in AWS, utilizing services such as EC2, S3, Lambda, VPCs, AWS Glue, Redshift, CloudFormation, and CloudWatch for scalable and cost-efficient data solutions.
- Optimized data warehousing solutions, encompassing Amazon Redshift, Snowflake, and traditional warehouses, emphasizing performance and cost-effectiveness.
- Crafted analytical prototypes using Power BI, Power Pivot, Tableau, and Matplotlib; visualized reports with Power View and Power Map.
- Applied diverse machine learning techniques and predictive modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, KNN, Neural Networks, and K-means Clustering.
- Implemented Git, GitHub, Jenkins, and Bitbucket for version control; orchestrated CI/CD processes for code quality and reliability.

SKILLS
Programming Languages: Python, Scala, R, SQL
Libraries: NumPy, Pandas, SciPy, Scikit-Learn, TensorFlow, Matplotlib
Database Management: MySQL, Oracle, MongoDB
Big Data Technologies: Hadoop (HDFS, MapReduce, Hive, Pig, Oozie), Apache Spark, Apache Kafka
ETL Processes: Talend, Apache NiFi, Informatica, Custom ETL Pipelines
Data Warehousing: Amazon Redshift, Snowflake
Cloud Platforms: AWS (EC2, S3, Lambda, VPCs, AWS Glue, Redshift, CloudFormation, CloudWatch)
Machine Learning:
Regression, Decision Tree, KNN, SVM, Naive Bayes, Random Forest, K-Means, Statistics (Hypothesis testing, ANOVA), A/B Testing, NLP, Time Series Analysis
Version Control Systems: Git, GitHub, Jenkins, Bitbucket, CI/CD
Tools: Docker, Kubernetes, Apache Airflow
Data Visualization: Tableau, Power BI, Excel
Methodologies: Agile/Scrum, Waterfall
EDUCATION
Master's in Computer Science, Aug 2022 - May 2024
University of Memphis
Bachelor of Technology in Computer Science, Jul 2017 - May 2021
Karunya Institute of Technology and Science, Coimbatore, India

WORK EXPERIENCE

Data Engineer Intern | R K Software LLC, Jun 2024 - Current
- Facilitated data-centric process automation using Python, Scala, and custom ETL pipelines, achieving a 30% reduction in manual data handling and lowering the risk of errors in data processing workflows.
- Optimized MapReduce jobs for distributed data processing, ensuring efficient execution across Hadoop clusters; tuned job configurations, implemented combiners, and applied parallelization techniques for enhanced performance.
- Implemented real-time data streaming using Apache Kafka, ensuring reliable and scalable data ingestion and increasing data throughput by 40%; designed and managed Kafka topics, producers, and consumers for continuous data processing.
- Performed advanced query optimization and performance tuning within Snowflake, utilizing features such as materialized views, clustering, and indexing; addressed and resolved performance bottlenecks for efficient data processing.

Data Engineer | Mphasis, Jun 2021 - Jul 2022
- Crafted and enhanced Pig scripts for Extract, Transform, Load (ETL) workflows within Hadoop environments, executing data transformation and processing logic to prepare data for downstream analytical applications.
- Orchestrated Oozie workflows for daily incremental data loads from RDBMS sources, managed through Control-M.
- Applied Pig scripts for data cleansing and partition management, resulting in a 20% reduction in data processing time and improved data accuracy.
- Developed and deployed machine learning models with Scikit-Learn for classification, regression, clustering, and dimensionality reduction, applying advanced techniques such as hyperparameter tuning and feature engineering.
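The in-mapper combining mentioned above can be sketched in plain Python; this is a framework-free illustration of the idea (pre-aggregate mapper output locally so less intermediate data is shuffled to reducers), with all names and sample data invented rather than taken from any actual Hadoop job:

```python
from collections import Counter
from itertools import chain

def map_with_combiner(lines):
    """Map phase with in-mapper combining: emit (word, partial_count)."""
    local = Counter()
    for line in lines:
        for word in line.split():
            local[word.lower()] += 1  # combine locally instead of emitting (word, 1)
    return local.items()

def reduce_counts(partials):
    """Reduce phase: sum the pre-combined partial counts per word."""
    totals = Counter()
    for word, count in partials:
        totals[word] += count
    return dict(totals)

# Two input "splits" processed by two hypothetical mappers, then one reduce.
split_a = ["spark kafka spark", "hive spark"]
split_b = ["kafka hive", "kafka kafka"]
partials = chain(map_with_combiner(split_a), map_with_combiner(split_b))
print(reduce_counts(partials))  # {'spark': 3, 'kafka': 4, 'hive': 2}
```

Each mapper here emits at most one record per distinct word instead of one per occurrence, which is exactly the shuffle-volume saving a Hadoop combiner provides.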
- Built a real-time monitoring and alerting system using Apache Kafka and AWS CloudWatch to track key metrics and system health, reducing incident response time by 40% and improving system uptime by 20%.
- Leveraged ETL tools including Talend, Apache NiFi, and Informatica to automate the extraction, transformation, and loading of data, ensuring seamless integration with downstream systems in the data pipeline.
- Optimized the data warehouse architecture by incorporating best practices and leveraging cloud-based data warehousing with Redshift, achieving a 40% improvement in query performance and reducing response times by 50%.
- Implemented CI/CD pipelines using Jenkins and Bitbucket to automate the deployment of data processing code and infrastructure, resulting in a 50% reduction in deployment time and enhanced development agility.
- Applied statistical methods and data analysis techniques to identify anomalies and conduct root cause analysis, leading to a 15% improvement in data quality and reliability.

Data Engineer Intern | Merizon Technologies LLC, Mar 2020 - May 2021
- Created and executed sophisticated data cleansing and preprocessing methods in Python to maintain the integrity and quality of extensive datasets, resolving issues such as missing values, outliers, and discrepancies.
- Utilized advanced SQL techniques to manipulate and transform data, including window functions, common table expressions (CTEs), and complex joins, to derive valuable insights from diverse and interconnected datasets.
- Leveraged Pandas and NumPy for complex data manipulation tasks, including multi-dimensional array operations, efficient indexing, and handling of missing data, improving data processing efficiency by 50% and reducing data preparation time from days to hours.
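The window-function and CTE techniques mentioned above follow a common pattern; the sketch below shows one instance of it using SQLite purely for illustration (the schema, table, and data are invented, not from any real engagement):

```python
import sqlite3

# Invented sample table standing in for a production RDBMS source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('acme',  '2021-01-01', 100.0),
        ('acme',  '2021-01-03', 250.0),
        ('zenco', '2021-01-02',  80.0),
        ('zenco', '2021-01-05', 120.0);
""")

# The CTE scopes the data; RANK() orders each customer's purchases by size,
# and the windowed SUM keeps a per-customer running total by date.
query = """
WITH recent AS (
    SELECT customer, order_day, amount
    FROM orders
    WHERE order_day >= '2021-01-01'
)
SELECT customer,
       amount,
       RANK() OVER (PARTITION BY customer ORDER BY amount DESC)       AS amt_rank,
       SUM(amount) OVER (PARTITION BY customer ORDER BY order_day)    AS running_total
FROM recent
ORDER BY customer, order_day;
"""
for row in conn.execute(query):
    print(row)
```

The same SELECT shape (CTE for scoping, `PARTITION BY` for per-group ranking and running aggregates) carries over to MySQL 8+ and most other engines with window-function support.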
- Developed and maintained ETL processes to extract, transform, and load data into MySQL databases from various sources, achieving a 95% accuracy rate in data migration and a 30% reduction in data processing time.
- Enhanced data organization in Hive through strategic partitioning and bucketing, and converted Hive SQL queries into Spark jobs in Scala for advanced data analysis and operations.
- Automated routine data processing tasks using scheduling and workflow management tools such as Apache Airflow and custom scripts, increasing operational efficiency by 70% and freeing up 5 hours per week for the data engineering team to focus on strategic initiatives.
- Troubleshot and resolved issues within data pipelines, ETL processes, and data integration points, using AWS CloudWatch and other monitoring tools to identify performance bottlenecks and data discrepancies.
- Established and enforced data governance policies, ensuring data quality, security, and compliance with industry regulations; implemented encryption, access controls, and auditing mechanisms to safeguard sensitive information.
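The extract-transform-load pattern described throughout can be sketched minimally as below; SQLite stands in for MySQL, and the CSV source, table name, and cleansing rules are invented for illustration rather than drawn from any actual pipeline:

```python
import csv
import io
import sqlite3

# Invented raw feed: whitespace in names, one record missing its amount.
RAW_CSV = """id,name,amount
1, Alice ,100
2,Bob,
3,Carol,250
"""

def extract(text):
    """Extract: parse raw CSV rows into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, drop rows with missing amounts, cast types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():  # skip records with no amount
            continue
        clean.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: bulk-insert the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # 2 rows survive the missing-amount filter
```

Keeping extract, transform, and load as separate pure-ish steps is what makes a pipeline like this easy to schedule from Airflow (one task per step) and to test in isolation.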