Candidate's Name
Tel: PHONE NUMBER AVAILABLE
E: EMAIL AVAILABLE
LINKEDIN LINK AVAILABLE

SUMMARY
Experienced Data Engineer with over 3 years of experience specializing in cloud platforms, particularly AWS and Azure environments.
Proficient in data pre-processing techniques such as data cleansing, correlation analysis, imputation of missing values, visualization, feature normalization, and dimensionality reduction, using Python libraries such as Pandas, NumPy, and Scikit-learn.
Skilled in Python and SQL programming, adept at managing large-scale datasets with distributed computing technologies, including Apache Spark.
Developed and optimized ETL workflows to facilitate smooth data extraction, transformation, and loading processes for comprehensive analytics and reporting.
Implemented and deployed machine learning algorithms, automating data-driven insights to enhance decision-making capabilities.
Extensive background with Python, R, MATLAB, SAS, and PySpark for performing advanced statistical analysis and quantitative research.
Knowledgeable in Big Data technologies, such as Hadoop (HDFS and MapReduce), Hive, Sqoop, Pig, and Apache Spark.
Proficient in Agile methodologies, actively contributing to Scrum teams and using tools like Jira, ProjectLibre, and Git for project management and version control.

SKILLS
ETL Tools: AWS Glue, Apache NiFi, Talend, DataStage
Data Processing: Apache Spark (including Spark SQL), Kafka
Data Integration: Apache NiFi, Talend, DataStage
Languages: Python, SQL
Big Data Frameworks: PySpark, Scala
AWS: EC2, S3, Redshift, EMR, Lambda, Glue
Azure: Azure Data Factory, Azure SQL Database
GCP: BigQuery
Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, PCA
Libraries: Scikit-learn, TensorFlow, Keras
SQL Databases: SQL Server, MySQL, PostgreSQL
NoSQL Databases: MongoDB, Cassandra
Tools: Tableau, Power BI
IDEs: PyCharm, Jupyter Notebook, Spyder
Version Control: Git, GitHub
Platforms: Linux, Unix, Windows
Hadoop Ecosystem: HDFS, Hive, Pig, Sqoop

PROFESSIONAL EXPERIENCE
Data Engineer, Jan 2024 - Present
Goldman Sachs Global Remote, USA
Optimized data ingestion workflows using Apache Spark and Delta Lake, resulting in a 40% improvement in data processing speed and enhanced system performance.
Deployed predictive models like Decision Trees to analyze user behavior on media platforms, successfully identifying high-value users. Utilized frameworks like PyTorch and Keras for building and fine-tuning classification models.
Designed and implemented scalable ETL pipelines with Azure Synapse and Databricks, efficiently processing over 5 TB of data daily with minimal downtime.
Enhanced model precision by employing ensemble techniques like stacking and boosting, deploying models in AWS SageMaker for real-time predictions in production environments.
Streamlined SQL query performance and reporting processes using MySQL and Tableau, achieving a 25% reduction in query execution times and a 20% faster report generation cycle.
Automated data pipeline orchestration with Apache NiFi, improving task scheduling and cross-environment workflow reliability, boosting overall operational efficiency.
Transitioned legacy ETL systems to a cloud-based architecture using Snowflake, improving scalability by 30% while reducing operational costs.
Integrated advanced data visualization techniques into BI dashboards, enabling real-time monitoring and improved business decision-making.
Achieved a 15% increase in revenue through more accurate customer segmentation and targeting, leveraging machine learning insights for personalized marketing strategies.
Reduced operational costs by 20% by automating and optimizing cloud-based data workflows, resulting in increased cost-efficiency and better resource allocation.
Improved SQL query performance and optimized database schema, resulting in a 35% enhancement in query execution speed and quicker data retrieval.

Data Engineer, Jan 2020 - Jul 2022
Coforges India
Performed an in-depth evaluation of big data needs to design and implement ETL and BI solutions tailored to organizational goals.
Optimized data integration workflows in Snowflake, reducing data duplication and facilitating real-time data ingestion from diverse sources into HDFS via Kafka.
Created a comprehensive data warehouse schema in Snowflake, integrating over 100 datasets to improve data accessibility and coherence.
Oversaw the ingestion and processing of structured and semi-structured data on AWS with S3 and Python, enabling the transition of on-premises big data systems to a cloud environment.
Managed the extraction and processing of live data streams using Kafka and Spark Streaming, converting them into RDDs and DataFrames, and storing the processed data in Parquet format within HDFS.
Developed and implemented RESTful APIs using Python, simplifying data access and boosting reporting efficiency, which led to a significant increase in operational productivity.
Conducted thorough big data requirement analysis to design and execute ETL and BI solutions that met strategic business objectives.
Utilized Pandas and NumPy for meticulous data cleansing, addressing issues like missing values and ensuring high data quality, consistency, and integrity.
Built and deployed machine learning models including Logistic Regression, K-Nearest Neighbors, and Gradient Boosting with Python libraries such as Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn.
Leveraged AWS to execute machine learning models on extensive datasets, optimizing model performance and scalability.
Engineered and enhanced Spark-based Python modules for machine learning and predictive analytics within a Hadoop ecosystem, driving more accurate insights.
Reduced data redundancy and incorporated real-time data feeds into HDFS through Kafka using Snowflake, designing a robust data warehouse architecture with over 100 datasets.
Managed real-time data processing with Kafka and Spark Streaming, transforming and storing data in Parquet format within HDFS for efficient, scalable storage solutions.

EDUCATION & CERTIFICATION
Master's in Data Science, University of Maryland, Baltimore County
Bachelor's in Information Technology, Malla Reddy Engineering College
Microsoft Certified: Azure Data Engineer Associate (DP-203)