SAI TEJA REPALA
DATA ENGINEER
Irving, TX | PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LINKEDIN

SUMMARY
6+ years of experience as a Data Engineer with demonstrated expertise in building and deploying data pipelines using open-source Hadoop-based technologies such as Apache Spark, Hive, Hadoop, Python, Java, and PySpark.
Monitored the performance of data pipelines and conducted regular optimizations, using Java, Spark, and other relevant technologies, to enhance the efficiency and speed of processing.
Demonstrated a strong track record of project planning, execution, and completion, overseeing numerous data engineering projects from inception to delivery.
Assessed data quality and ensured data integrity during the AWS migration process.
Created complex data workflows using AWS Glue's serverless architecture.
Implemented end-to-end data pipelines by integrating Glue jobs with AWS Lambda functions and other AWS services.
Experienced in building and deploying Spark applications on Hortonworks Data Platform and AWS EMR.
Proficient in creating scalable DataFrames in Spark and Spark Streaming using Python.
Experienced in working with the Snowflake cloud data warehouse and Snowflake data modeling.
Built ELT workflows using Python, Java, and Snowflake COPY utilities to load data into Snowflake (see the sketch after this summary).
Proficient in orchestrating scalable data solutions using Databricks within AWS ecosystems, leveraging Spark to enhance data processing and machine learning initiatives.
Skilled in optimizing Oracle database performance, with a strong foundation in PL/SQL for complex data manipulation, stored procedures, and transactional database management.
Solid background in Hadoop-based technologies, proficient in managing big data infrastructures and performing large-scale data analytics to drive business intelligence.
Proven ability in developing and maintaining robust data pipelines, incorporating best practices in data quality and integrity across AWS cloud migrations.
Managed and optimized large-scale data infrastructures on Hadoop and AWS EMR, ensuring high availability and fault tolerance.
Leveraged Databricks on AWS for scalable data solutions, enhancing data processing capabilities and supporting machine learning initiatives.
Implemented automated testing frameworks and CI/CD pipelines for data pipelines and infrastructure as code (IaC).
Conducted performance tuning of SQL queries, Spark jobs, and ETL processes to optimize processing speed and system performance.
Led successful data migration projects, ensuring seamless transitions between on-premises systems and AWS cloud environments.
Created technical documentation and conducted knowledge-sharing sessions to disseminate best practices and empower team members.
Designed and implemented data archival strategies to optimize storage costs and maintain data accessibility.
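The ELT bullet above references Snowflake COPY utilities. A minimal sketch of that kind of load follows, assuming the snowflake-connector-python package; the connection parameters, file, and table names (ETL_USER, LOAD_WH, ANALYTICS.RAW.ORDERS, orders_2024.csv) are hypothetical placeholders, not details taken from this resume.

    # Minimal ELT sketch: stage a local extract and load it into Snowflake with COPY.
    # All credentials and object names below are hypothetical placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="ETL_USER",
        password="***",
        account="my_account",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # Upload the extracted file to the table's internal stage.
        cur.execute("PUT file:///tmp/orders_2024.csv @%ORDERS AUTO_COMPRESS=TRUE")
        # COPY the staged file into the target table; transformations happen downstream (ELT).
        cur.execute("""
            COPY INTO ORDERS
            FROM @%ORDERS
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'ABORT_STATEMENT'
        """)
    finally:
        conn.close()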
TECHNICAL SKILLS
Big Data Ecosystems: HDFS, MapReduce, Pig, Pig Latin, Hive, Apache Spark, Apache Kafka, Databricks
Cloud Technologies: AWS, GCP
Scripting Languages: Python, Visual Basic Scripting, Windows PowerShell
Programming Languages: Scala, Java, J2EE, JDK, JDBC, XML parsers
Databases: MongoDB, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Snowflake, Teradata, Cassandra
IDEs / Tools: Eclipse, Anaconda Navigator, Maven, MS Visual Studio

PROFESSIONAL EXPERIENCE
Sr. Data Engineer May 2023 - Current
American Express, Irving, TX
Implemented scalable data pipelines in AWS using EMR, EC2, and Glue to process multi-terabyte data sets, achieving significant reductions in processing time.
Designed and developed data transformations using PySpark on AWS Databricks, enhancing the analytics capabilities for financial datasets.
Created a real-time data ingestion system with AWS Lambda and S3, which facilitated efficient data storage and processing.
Developed Python scripts for automated data quality checks, integrating with AWS services to ensure compliance and data integrity (see the sketch after this role).
Designed data lake solutions on AWS to support compliance and reporting requirements, using PySpark and Scala for data aggregation.
Developed data visualization tools using Scala and AWS QuickSight to provide actionable insights into credit risk management.
Developed an automated monitoring system using AWS CloudWatch and Lambda, which proactively manages and scales data processing resources based on real-time analytics workloads, ensuring optimal performance and cost efficiency.
Designed and executed complex SQL queries on AWS Redshift to perform data analysis and reporting, which supported strategic decision-making by providing deep insights into customer behaviors and market trends.
Developed an automated testing framework using Python to validate data integrity and accuracy across multiple data pipelines, enhancing the reliability of data transformations and load processes.
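The data quality bullet in the role above is the kind of check sketched below, assuming PySpark (on EMR, Glue, or Databricks) plus boto3 for publishing results to CloudWatch; the bucket, dataset, column, and metric names are hypothetical and the actual checks are not specified in this resume.

    # Sketch of an automated data quality check: validate an S3 dataset with PySpark
    # and publish the results to CloudWatch. All names below are hypothetical.
    import boto3
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()
    df = spark.read.parquet("s3://example-bucket/curated/transactions/")

    total = df.count()
    null_ids = df.filter(F.col("transaction_id").isNull()).count()
    duplicates = total - df.dropDuplicates(["transaction_id"]).count()

    # Publish the check results as custom CloudWatch metrics.
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace="DataQuality/Transactions",
        MetricData=[
            {"MetricName": "NullTransactionIds", "Value": float(null_ids), "Unit": "Count"},
            {"MetricName": "DuplicateTransactionIds", "Value": float(duplicates), "Unit": "Count"},
        ],
    )

    # Fail the pipeline step if the checks do not pass.
    if null_ids > 0 or duplicates > 0:
        raise ValueError(f"Data quality check failed: {null_ids} null ids, {duplicates} duplicates")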
Data Engineer Mar 2020 - May 2022
Bank of America, Bangalore, India
Experience working with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH, and Data Storage Explorer).
Involved in building an Enterprise Data Lake using Data Factory and Blob Storage, enabling other teams to work with more complex scenarios and ML solutions.
Used Azure Data Factory, the SQL API, and the Mongo API to integrate data from MongoDB, MS SQL, and the cloud (Blob Storage, Azure SQL DB).
Developed PySpark scripts for mining data and performed transformations on large datasets to provide real-time insights and reports.
Supported the analytical platform, handled data quality, and improved performance using Python's higher-order functions, lambda expressions, pattern matching, and collections.
Performed data cleansing and applied transformations using Databricks and Spark data analysis.
Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
Designed and automated custom-built input adapters using Spark, Sqoop, and Airflow to ingest and analyze data from RDBMS sources into Azure Data Lake.
Reduced access time by refactoring data models, optimizing queries, and implementing a Redis cache to support Snowflake.
Involved in developing automated workflows for daily incremental loads, moving data from RDBMS sources to the Data Lake.
Monitored the Spark cluster using Log Analytics and the Ambari Web UI.
Transitioned log storage from MS SQL to Cosmos DB and improved query performance.
Created automated ETL jobs in Talend and pushed the data to the Snowflake data warehouse.
Managed resources and scheduling across the cluster using Azure Kubernetes Service.
Used Azure DevOps for CI/CD, debugging, and monitoring of jobs and applications.
Used Azure Active Directory and Ranger for security.
Worked with the data science team on preprocessing and feature engineering and assisted with machine learning algorithms running in production.
Fine-tuned parameters of Spark NLP applications, such as batch interval time, level of parallelism, and memory settings, to improve processing time and efficiency.
Facilitated data for interactive Power BI dashboards and reporting purposes.

AWS Data Engineer Aug 2018 - Feb 2019
Accenture, Hyderabad, India
Designed and implemented a multi-tier data architecture on AWS, leveraging S3, Redshift, and RDS for high-volume data analytics.
Created ETL frameworks using Python, integrated with AWS Lambda for automated data handling from various data sources.
Developed advanced analytics models on AWS EMR using Spark, providing insights into transportation patterns and customer behavior.
Implemented Databricks on AWS for data aggregation, improving data quality and preparation for machine learning applications.
Designed real-time data ingestion systems with AWS Kinesis and Lambda, optimizing data flows for immediate analysis.
Developed a Scala-based real-time recommendation engine, leveraging AWS technologies to optimize ride-sharing matches.
Implemented a data reconciliation framework using AWS Glue and Python, which ensured data accuracy and consistency across different storage platforms, significantly reducing discrepancies in reporting and analytics.
Created automation scripts in Python to streamline ETL processes, reducing the time required for data extraction, transformation, and loading by 30% while ensuring data consistency.
Utilized SQL for complex data querying and management tasks, optimizing database performance and enabling more efficient data analysis and reporting across cloud-based and on-premises environments.

EDUCATION
Bachelor of Computer Science and Technology Jul 2014 - Jul 2018
St Peters Engineering College
Master's in Information Technology Sept 2022 - May 2024
Franklin University
Major in Cyber Security