Big Data Engineering Resume (Manhattan, NY)
SHRAVYA SRI
LinkedIn: LINKEDIN LINK AVAILABLE | Email: EMAIL AVAILABLE | Phone: PHONE NUMBER AVAILABLE

PROFESSIONAL SUMMARY:
- Over 8 years of expertise in designing, developing, deploying, and maintaining Big Data applications.
- Specialization in data engineering and pipeline design.
- Strong background in Waterfall and Agile SDLC methodologies.
- Proficient in Big Data technologies, including Spark, Hadoop, and NoSQL databases.
- Skilled in Python and Scala programming, with extensive experience in PySpark and the Spark APIs.
- Expertise in ETL processes and data warehousing, with hands-on use of tools such as Sqoop, Flume, Kafka, Power BI, and Microsoft SSIS.
- Experienced in Natural Language Processing (NLP) techniques using libraries such as NLTK and spaCy.
- Proficient in cloud services, including AWS (EC2, S3, EMR) and Azure (Data Lake, SQL Database, Data Factory).
- Strong problem-solving and leadership skills, with a proven track record of optimizing data systems and enhancing performance.
- Excellent communication and collaboration abilities, effective in cross-functional team environments.
- Experienced in developing and deploying large-scale, highly available web applications.
- Proficient in web services and integration patterns.
- Skilled in data analysis, data profiling, data integration, migration, data governance, and metadata management.
- Hands-on experience with reinforcement learning methodologies to optimize data engineering pipelines.
- Extensive experience in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features.
- Expertise in implementing a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
- Proven ability to implement and enhance data-driven decision-making processes through the application of reinforcement learning and AI techniques.
- Adept at performance tuning, handling Slowly Changing Dimensions (SCD), and designing complex mappings.
- Expertise in developing user-defined functions (UDFs) in Python to extend Hive and Pig Latin functionality (a minimal sketch follows the skills list below).

TECHNICAL SKILLS
Data Modeling Tools: Erwin Data Modeler, ER/Studio v17, Snowflake, dimensional modeling, Tableau, Power BI
Programming Languages: SQL, PL/SQL, UNIX shell scripting, Python, Scala, R, Java, PySpark, C, C++, Perl, PostgreSQL
Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Microservices, Agile, Waterfall
Cloud Platforms: AWS, Azure, Google Cloud Platform (GCP)
Data Analysis Tools: BERT, spaCy, NLTK, NLP, sentiment analysis
Databases: Oracle 12c/11g, Teradata R15/R14, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, DB2, Cosmos DB
OLAP Tools: Tableau, SSAS, Business Objects, Crystal Reports 9, ggplot2, matplotlib
ETL Tools: Informatica 9.6/9.1, Talend, SSIS
Cloud Data Stores: Snowflake, Amazon Redshift, Google BigQuery, Microsoft Azure SQL
Operating Systems: Windows, Unix, Sun Solaris, macOS
Big Data Tools: HDFS, MapReduce, YARN, Hive, HBase, Kafka, Pig, Hadoop, Sqoop, Oozie, Apache Spark, Flume, NiFi
Monitoring Tool: Airflow
Frameworks: Django REST framework, MVC
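The summary above mentions extending Hive with Python UDFs. A common pattern is a streaming script invoked through Hive's TRANSFORM clause; the minimal sketch below assumes tab-separated rows on stdin and a hypothetical two-column input (user_id, email), and is illustrative rather than the candidate's actual code.

    #!/usr/bin/env python
    # normalize_email.py: hypothetical Hive TRANSFORM script.
    # Hive streams each input row to stdin as tab-separated fields and
    # reads the transformed row back from stdout.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue                      # skip malformed rows
        user_id, email = fields[0], fields[1]
        print("\t".join([user_id, email.strip().lower()]))

In HiveQL such a script would be wired up with something like ADD FILE normalize_email.py; followed by SELECT TRANSFORM(user_id, email) USING 'python normalize_email.py' AS (user_id, email) FROM users; (table and column names are placeholders).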
PROFESSIONAL EXPERIENCE:

Data Engineer
Maximus, Denver, CO | September 2022 to Present
Responsibilities:
- Spearheaded data cleansing and transformation initiatives across various formats using HiveQL and MapReduce, ensuring data integrity and quality with SQL, PL/SQL, and UNIX shell scripting.
- Led collaborative efforts with stakeholders, driving proofs of concept (POCs) and documenting results to align technical implementations with business use cases.
- Optimized Hadoop performance with Spark, leveraging SparkContext, Spark SQL, DataFrames, and Spark on YARN, coding in Python, Scala, and Java.
- Streamlined loading of large files into HDFS with robust UNIX shell scripts and Python for efficient data ingestion.
- Played a pivotal role in selecting the project tech stack, conducting end-to-end testing, and ensuring alignment with Agile/Waterfall methodologies and business objectives.
- Designed and implemented Python scripts for terabyte-scale CSV generation, improving data processing efficiency for Hadoop MapReduce jobs.
- Utilized Snowflake and Amazon Redshift for efficient data storage and query performance optimization.
- Applied analysis and feature-extraction techniques to Kafka streams using Apache Spark ML libraries, supporting data-driven decision-making.
- Developed and maintained Python code using GitHub and SVN for code reliability and efficient collaboration.
- Orchestrated migration of on-premises applications to AWS, ensuring scalability and reliability with EC2 and S3.
- Designed ETL data pipelines using Spark, optimizing workflows and ensuring data integrity (see the PySpark sketch after this role).
- Implemented monitoring with CloudWatch and integrated RESTful APIs for proactive issue resolution and automation.
- Analyzed SQL scripts and designed PySpark solutions for efficient data processing and analysis.
- Implemented audit logging mechanisms and resolved technical challenges to streamline data ingestion and analysis.
- Adapted end results to the formats requested for Sources of Record (SORs), ensuring seamless data integration and usability.
Environment: Spark, AWS, Python, Pandas, HiveQL, MySQL, SOAP, Snowflake, NiFi, Cassandra, Spark SQL, PySpark, Cloudera, HDFS, Hive, Apache Kafka, Sqoop, Scala, shell scripting, Linux, Oracle Enterprise DB, Jenkins, Eclipse, Oracle, Git.
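The Maximus role references Spark-based ETL pipelines built with PySpark, Spark SQL, and DataFrames. The sketch below shows the general shape of such a job under stated assumptions: CSV landing files in HDFS, a Hive-compatible warehouse table, and path, table, and column names that are illustrative rather than taken from the actual pipeline.

    # Minimal PySpark ETL sketch: read CSV landing files, apply basic
    # integrity rules, and publish a partitioned table for downstream use.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("claims_etl_sketch")
             .enableHiveSupport()
             .getOrCreate())

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("hdfs:///landing/claims/*.csv"))          # hypothetical landing path

    cleaned = (raw
               .dropDuplicates(["claim_id"])               # basic integrity rule
               .withColumn("load_date", F.current_date())  # audit column
               .filter(F.col("claim_amount").isNotNull()))

    # Write partitioned output and expose it as a Hive table.
    (cleaned.write
     .mode("overwrite")
     .partitionBy("load_date")
     .saveAsTable("analytics.claims_cleaned"))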
Data Engineer
UBS, Weehawken, NJ | May 2020 to August 2022
Responsibilities:
- Led end-to-end extraction, transformation, and loading (ETL) of data across various repositories, following ETL best practices.
- Developed Spark applications in Scala for efficient data processing.
- Orchestrated Azure data ingestion and processing in Databricks.
- Implemented Apache Airflow for data pipeline management, workflow orchestration, and automation (a minimal DAG sketch follows this role).
- Designed and executed transformations for data messaging and migration.
- Established high availability and failover mechanisms for production clusters, ensuring system reliability and uptime.
- Automated ETL pipelines and provided troubleshooting support to ensure smooth data processing.
- Conducted unit tests for Spark transformations and pipeline design, ensuring the quality and reliability of data processing workflows.
- Developed batch jobs for processing data from multiple sources.
- Utilized cloud technologies to build automated, scalable analytics pipelines.
- Designed configurable data delivery pipelines for customer-facing stores, ensuring timely and accurate data delivery to end users.
- Created ETL mappings and batch ingestion pipelines for seamless data integration.
- Conducted performance tuning and query optimization for data processing workflows.
- Provided on-call support and collaborated with QA teams on quality assurance of data pipelines.
- Participated in daily meetings and project updates for effective communication and collaboration.
- Analyzed system enhancements and the impact of ETL changes.
- Implemented Continuous Delivery pipelines for efficient deployment of data solutions per business needs.
- Built scalable ETL processes and collaborated with stakeholders throughout project lifecycles.
- Prepared documentation for specifications and testing for future reference and maintenance.
Environment: Azure Data Factory, U-SQL (Azure Data Lake Analytics), Azure SQL, Azure SQL DW, Databricks, GitHub, Docker, Talend Big Data Integration, Snowflake, Oracle, SQL Server, MySQL, NoSQL, MongoDB, HBase, Cassandra, Python (PySpark, pytest, PyMongo, pyexcel, psycopg, Matplotlib, NumPy, Pandas).
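The UBS role mentions Apache Airflow for pipeline management and automation. Below is a minimal DAG sketch showing how a daily ingest-then-transform dependency might be wired; the DAG id, schedule, owner, and shell commands are placeholders, not the production configuration.

    # Minimal Airflow DAG sketch for a daily ingest -> transform pipeline.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-eng",              # hypothetical owner
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_trade_ingest_sketch",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:

        ingest = BashOperator(
            task_id="ingest_to_lake",
            bash_command="echo 'copy source extracts into the data lake'",
        )

        transform = BashOperator(
            task_id="spark_transform",
            bash_command="echo 'spark-submit transform job against the lake'",
        )

        ingest >> transform   # transform runs only after ingestion succeeds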
Data Analyst / Data Engineer
Edward Jones, St. Louis, MO | June 2018 to April 2020
Responsibilities:
- Gathered business requirements and collaborated on the development of logical data models.
- Created sophisticated visualizations and developed various types of reports using tools such as Tableau and Matplotlib.
- Conducted market sizing, competitive analysis, and business forecasting.
- Utilized Agile methodology and a microservices architecture for data model implementation.
- Performed reverse engineering using Erwin and translated business requirements into technical data requirements.
- Developed Spark jobs in Scala for real-time analytics and used Spark SQL for querying.
- Developed and implemented machine learning models using Python and R.
- Conducted research on reinforcement learning and machine learning models.
- Implemented various statistical techniques for data manipulation and modeling.
- Designed and developed Python scripts for data preparation and transformation.
- Utilized market mix modeling and clustering techniques for data analysis and strategy development.
- Used grid search and k-fold cross-validation for model evaluation and training (a brief scikit-learn sketch appears at the end of this document).
- Developed models and algorithms using Python and Spark for analytic purposes.
- Conducted data cleaning, feature scaling, and feature engineering using Python packages.
- Performed various types of analysis on the cleaned data using visualization techniques.
- Processed image data through Hadoop distributed systems.
- Designed dashboards and reports with Tableau for data visualization.
- Utilized Waterfall methodology for project management and Git for version control.
Environment: Spark, YARN, Hive, Pig, Scala, Mahout, NiFi, TDD, Python, Hadoop, Azure, DynamoDB, Kibana, NoSQL, Sqoop, MySQL

Hadoop Developer
Avon Technologies Pvt Ltd, Hyderabad, India | November 2016 to March 2018
Responsibilities:
- Led requirement discussions and solution design sessions.
- Estimated Hadoop cluster requirements to ensure optimal performance.
- Selected appropriate Hadoop components, including Hive, Pig, MapReduce, Sqoop, and Flume.
- Built scalable distributed data solutions using the Hadoop technology stack.
- Set up Hadoop clusters and managed data ingestion using Sqoop.
- Imported streaming logs to HDFS via Flume, aggregating data from diverse sources such as web servers and network devices.
- Developed technical prototypes and implemented Hive and Pig use cases.
- Analyzed data using Hive, Pig, and custom MapReduce programs in Java.
- Implemented advanced techniques such as partitioning and dynamic partitions in Hive for optimized data storage.
- Installed and configured Hive, Sqoop, Flume, and Oozie on Hadoop clusters.
- Scheduled Oozie workflows to execute Hive and Pig jobs efficiently.
- Tuned Hadoop clusters and monitored memory management and MapReduce jobs for optimal performance.
- Managed cluster maintenance tasks, including node addition/removal, monitoring, and troubleshooting.
- Developed a custom framework to address the small-files problem in the Hadoop environment.
- Administered large-scale Hadoop clusters comprising 70 nodes, in addition to smaller clusters.
Environment: MapReduce, HBase, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.

Java Developer
Couth InfoTech Pvt. Ltd, Hyderabad, India | May 2015 to October 2016
Responsibilities:
- Contributed to all stages of application enhancement, including analysis, development, and testing.
- Prepared comprehensive high-level and low-level design documents and facilitated digital signature generation.
- Developed logic and code for customer registration and enrollment validation.
- Designed and implemented web-based user interfaces using J2EE technologies for seamless user interaction.
- Implemented client-side validations using JavaScript to enhance user experience and data integrity.
- Used the Validation Framework for robust server-side data validation, ensuring data accuracy and security.
- Created comprehensive test cases for unit and integration testing to ensure application reliability.
- Integrated front-end components with an Oracle database using the JDBC API via the JDBC-ODBC bridge driver, enabling efficient data management and retrieval.
Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, Eclipse, JDBC, Oracle 10g.

EDUCATION:
Bachelor's in Computer Science
CVR College of Engineering, Hyderabad, India
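The Edward Jones role references grid search and k-fold cross-validation for model evaluation. As a closing illustration, the sketch below shows that pattern with scikit-learn on synthetic data; the estimator choice, parameter grid, and scoring metric are hypothetical, not drawn from the actual project.

    # Minimal scikit-learn sketch of grid search with 5-fold cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, KFold

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    param_grid = {
        "n_estimators": [100, 200],
        "max_depth": [5, 10, None],
    }

    cv = KFold(n_splits=5, shuffle=True, random_state=42)   # k-fold splitter
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid=param_grid,
        cv=cv,
        scoring="f1",
    )
    search.fit(X, y)

    print(search.best_params_, search.best_score_)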
