Candidate's Name
Data Engineer
Memphis, Tennessee, Street Address | Phone: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE

SUMMARY
- Around 3 years of experience as a Data Engineer, specializing in a broad array of technologies, including Hadoop for distributed data management, Spark for advanced analytics, Kafka for real-time processing, and Hive and Snowflake for data warehousing, all integrated with cloud solutions across various industries.
- Proficient in Python, Scala, SQL, and R for big data processing, robust application development, and data analysis.
- Employed advanced features and libraries such as NumPy, Pandas, SciPy, Scikit-Learn, and TensorFlow for streamlined data manipulation and machine learning tasks.
- Administered diverse databases (MySQL, Oracle, MongoDB), optimizing for performance, scalability, and reliability.
- Engineered enterprise applications using the Hadoop ecosystem, integrating HDFS, MapReduce, Hive, Pig, Sqoop, and Oozie.
- Proficient in AWS, utilizing services such as EC2, S3, Lambda, VPCs, AWS Glue, Redshift, CloudFormation, and CloudWatch for scalable and cost-efficient data solutions.
- Optimized data warehousing solutions, encompassing Amazon Redshift, Snowflake, and traditional warehouses, emphasizing performance and cost-effectiveness.
- Crafted analytical prototypes using Power BI, Power Pivot, Tableau, and Matplotlib; visualized reports with Power View and Power Map.
- Applied diverse machine learning techniques and predictive modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, KNN, Neural Networks, and K-means Clustering.
- Implemented Git, GitHub, Jenkins, and Bitbucket for version control; orchestrated CI/CD processes for code quality and reliability.

SKILLS
Programming Languages: Python, Scala, R, SQL
Libraries: NumPy, Pandas, SciPy, Scikit-Learn, TensorFlow, Matplotlib
Database Management: MySQL, Oracle, MongoDB
Big Data Technologies: Hadoop (HDFS, MapReduce, Hive, Pig, Oozie), Apache Spark, Apache Kafka
ETL Processes: Talend, Apache NiFi, Informatica, Custom ETL Pipelines
Data Warehousing: Amazon Redshift, Snowflake
Cloud Platforms: AWS (EC2, S3, Lambda, VPCs, AWS Glue, Redshift, CloudFormation, CloudWatch)
Machine Learning:
Regression, Decision Tree, KNN, SVM, Naive Bayes, Random Forest, K-Means, Statistics (Hypothesis testing, ANOVA), A/B Testing, NLP, Time Series Analysis
Version Control Systems: Git, GitHub, Jenkins, Bitbucket, CI/CD
Tools: Docker, Kubernetes, Apache Airflow
Data Visualization: Tableau, Power BI, Excel
Methodologies: Agile/Scrum, Waterfall
EDUCATION
Master's in Computer Science, Aug 2022 - May 2024
University of Memphis
Bachelor of Technology in Computer Science, Jul 2017 - May 2021
Karunya Institute of Technology and Science, Coimbatore, India

WORK EXPERIENCE

Data Engineer Intern | R K Software LLC, Jun 2024 - Current
- Facilitated data-centric process automation using Python, Scala, and custom ETL pipelines, achieving a 30% reduction in manual data handling and lowering the risk of errors in data processing workflows.
- Optimized MapReduce jobs for distributed data processing, ensuring efficient execution across Hadoop clusters; tuned job configurations, implemented combiners, and applied parallelization techniques for enhanced performance.
- Implemented real-time data streaming using Apache Kafka, ensuring reliable and scalable data ingestion and increasing data throughput by 40%; designed and managed Kafka topics, producers, and consumers for continuous data processing.
- Performed advanced query optimization and performance tuning within Snowflake, utilizing features such as materialized views, clustering, and indexing; addressed and resolved performance bottlenecks for efficient data processing.

Data Engineer | Mphasis, Jun 2021 - Jul 2022
- Crafted and enhanced Pig scripts for Extract, Transform, Load (ETL) workflows within Hadoop environments, executing data transformation and processing logic to prepare data for downstream analytical applications.
- Orchestrated Oozie workflows for daily incremental data loads from RDBMS sources, managed through Control-M.
- Applied Pig scripts for data cleansing and partition management, resulting in a 20% reduction in data processing time and improved data accuracy.
- Developed and deployed machine learning models with Scikit-Learn for classification, regression, clustering, and dimensionality reduction, applying advanced techniques such as hyperparameter tuning and feature engineering.
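The in-mapper combining mentioned above can be sketched in plain Python; this is a framework-free illustration of the idea (pre-aggregate mapper output locally so less intermediate data is shuffled to reducers), with all names and sample data invented rather than taken from any actual Hadoop job:

```python
from collections import Counter
from itertools import chain

def map_with_combiner(lines):
    """Map phase with in-mapper combining: emit (word, partial_count)."""
    local = Counter()
    for line in lines:
        for word in line.split():
            local[word.lower()] += 1  # combine locally instead of emitting (word, 1)
    return local.items()

def reduce_counts(partials):
    """Reduce phase: sum the pre-combined partial counts per word."""
    totals = Counter()
    for word, count in partials:
        totals[word] += count
    return dict(totals)

# Two input "splits" processed by two hypothetical mappers, then one reduce.
split_a = ["spark kafka spark", "hive spark"]
split_b = ["kafka hive", "kafka kafka"]
partials = chain(map_with_combiner(split_a), map_with_combiner(split_b))
print(reduce_counts(partials))  # {'spark': 3, 'kafka': 4, 'hive': 2}
```

Each mapper here emits at most one record per distinct word instead of one per occurrence, which is exactly the shuffle-volume saving a Hadoop combiner provides.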
- Built a real-time monitoring and alerting system using Apache Kafka and AWS CloudWatch to track key metrics and system health, reducing incident response time by 40% and improving system uptime by 20%.
- Leveraged ETL tools including Talend, Apache NiFi, and Informatica to automate the extraction, transformation, and loading of data, ensuring seamless integration with downstream systems in the data pipeline.
- Optimized the data warehouse architecture by incorporating best practices and leveraging cloud-based data warehousing with Redshift, achieving a 40% improvement in query performance and reducing response times by 50%.
- Implemented CI/CD pipelines using Jenkins and Bitbucket to automate the deployment of data processing code and infrastructure, resulting in a 50% reduction in deployment time and enhanced development agility.
- Applied statistical methods and data analysis techniques to identify anomalies and conduct root cause analysis, leading to a 15% improvement in data quality and reliability.

Data Engineer Intern | Merizon Technologies LLC, Mar 2020 - May 2021
- Created and executed sophisticated data cleansing and preprocessing methods in Python to maintain the integrity and quality of extensive datasets, resolving issues such as missing values, outliers, and discrepancies.
- Utilized advanced SQL techniques to manipulate and transform data, including window functions, common table expressions (CTEs), and complex joins, to derive valuable insights from diverse and interconnected datasets.
- Leveraged Pandas and NumPy for complex data manipulation tasks, including multi-dimensional array operations, efficient indexing, and handling of missing data, improving data processing efficiency by 50% and reducing data preparation time from days to hours.
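The window-function and CTE techniques mentioned above follow a common pattern; the sketch below shows one instance of it using SQLite purely for illustration (the schema, table, and data are invented, not from any real engagement):

```python
import sqlite3

# Invented sample table standing in for a production RDBMS source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('acme',  '2021-01-01', 100.0),
        ('acme',  '2021-01-03', 250.0),
        ('zenco', '2021-01-02',  80.0),
        ('zenco', '2021-01-05', 120.0);
""")

# The CTE scopes the data; RANK() orders each customer's purchases by size,
# and the windowed SUM keeps a per-customer running total by date.
query = """
WITH recent AS (
    SELECT customer, order_day, amount
    FROM orders
    WHERE order_day >= '2021-01-01'
)
SELECT customer,
       amount,
       RANK() OVER (PARTITION BY customer ORDER BY amount DESC)       AS amt_rank,
       SUM(amount) OVER (PARTITION BY customer ORDER BY order_day)    AS running_total
FROM recent
ORDER BY customer, order_day;
"""
for row in conn.execute(query):
    print(row)
```

The same SELECT shape (CTE for scoping, `PARTITION BY` for per-group ranking and running aggregates) carries over to MySQL 8+ and most other engines with window-function support.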
- Developed and maintained ETL processes to extract, transform, and load data into MySQL databases from various sources, achieving a 95% accuracy rate in data migration and a 30% reduction in data processing time.
- Enhanced data organization in Hive through strategic partitioning and bucketing, and converted Hive SQL queries into Spark jobs in Scala for advanced data analysis and operations.
- Automated routine data processing tasks using scheduling and workflow management tools such as Apache Airflow and custom scripts, increasing operational efficiency by 70% and freeing up 5 hours per week for the data engineering team to focus on strategic initiatives.
- Troubleshot and resolved issues within data pipelines, ETL processes, and data integration points, using AWS CloudWatch and other monitoring tools to identify performance bottlenecks and data discrepancies.
- Established and enforced data governance policies, ensuring data quality, security, and compliance with industry regulations; implemented encryption, access controls, and auditing mechanisms to safeguard sensitive information.
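The extract-transform-load pattern described throughout can be sketched minimally as below; SQLite stands in for MySQL, and the CSV source, table name, and cleansing rules are invented for illustration rather than drawn from any actual pipeline:

```python
import csv
import io
import sqlite3

# Invented raw feed: whitespace in names, one record missing its amount.
RAW_CSV = """id,name,amount
1, Alice ,100
2,Bob,
3,Carol,250
"""

def extract(text):
    """Extract: parse raw CSV rows into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, drop rows with missing amounts, cast types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():  # skip records with no amount
            continue
        clean.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: bulk-insert the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # 2 rows survive the missing-amount filter
```

Keeping extract, transform, and load as separate pure-ish steps is what makes a pipeline like this easy to schedule from Airflow (one task per step) and to test in isolation.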