Sai Srinivasa
Data Engineer
EMAIL AVAILABLE
PHONE NUMBER AVAILABLE

PROFESSIONAL SUMMARY:
- Over 5 years of IT experience with deep expertise in the Big Data ecosystem, including data acquisition, ingestion, modeling, storage, analysis, integration, and processing.
- Azure Cloud Proficiency: Extensive experience with Azure Cloud services, including Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, and other tools for managing and transforming both structured and unstructured data.
- Data Pipeline Development: Skilled in creating and managing data pipelines using Azure Data Factory and Azure Databricks, and proficient in loading data into Azure Data Lake and Azure SQL Database/Data Warehouse with user access controls.
- AWS Experience: Proven experience with AWS services such as EC2, S3, EMR, Redshift, Lambda, and others, focusing on data storage, processing, and application development.
- Data Visualization: Proficient in using Power BI, Tableau, and AWS QuickSight to create visualizations and reports, providing actionable insights from data.
- Database Management: Expertise in setting up and managing Azure SQL databases, performing migrations from Microsoft SQL Server to Azure SQL Database/Managed Instances, and overseeing database performance and maintenance.
- ETL and Data Ingestion: Hands-on experience with AWS services such as Athena, Redshift, and Glue for managing and analyzing internal business data; responsible for storing data on S3 using Lambda functions and AWS Glue with PySpark.
- Batch Ingestion and Frameworks: Skilled in batch ingestion processes for Snowflake and familiar with distributed frameworks such as Apache Spark and Presto on Amazon EMR, interacting with AWS data storage solutions such as S3 and DynamoDB.
- Automation and Scripting: Experienced in automating ETL jobs using Apache Airflow, and proficient in Python and shell scripting for Spark job and Hive script automation.
- SQL and Data Modeling: Adept in Microsoft SQL Server database programming and ETL development using SSIS, SSRS, and SSAS, with strong skills in E-R and dimensional data modeling.
- Transformation and Formats: Capable of converting Hive/SQL queries into PySpark transformations and experienced with various data formats including JSON, Avro, Parquet, and CSV.
- Integration and Performance: Experienced in developing Spark applications using Python and Scala, comparing Spark performance with Hive, and integrating with Hive and HBase/MapR-DB.
- Version Control and Incident Management: Skilled in using Git, SVN, Jira, ServiceNow, and Remedy for version control and incident tracking.
- Agile Methodology: Adept at working within Agile frameworks, including bi-weekly sprints and presenting progress through internal and external demonstrations.
- Project Management: Proven ability to manage multiple projects simultaneously, with a strong desire to learn and adopt new technologies.

TOOLS AND TECHNOLOGIES:
Hadoop/Big Data Technologies: Hadoop (MapReduce, HDFS, YARN), Oozie, Hive, Sqoop, Spark (PySpark, Spark SQL), Apache NiFi, ZooKeeper, Cloudera Manager, Hortonworks
Azure Cloud Services: Azure Data Factory, Azure Synapse Analytics, Data Lake, Blob Storage, HDInsight, Azure Databricks, Azure Data Analytics, Azure Functions, Azure Key Vault, Azure SQL Database
AWS Cloud Services: EC2, EMR, Redshift, S3, Databricks, Athena, Lambda, Glue, AWS Kinesis, CloudWatch, SNS, SQS, SES
NoSQL Databases: HBase, DynamoDB, MongoDB
ETL/BI: Power BI, Tableau, Snowflake, Informatica, Talend, SSIS, SSRS, SSAS, QlikView, Qlik Sense
Hadoop Distributions: Hortonworks, Cloudera
Programming & Scripting: Python, Scala, SQL, Shell Scripting, Kafka
File Formats: JSON, Avro, Parquet, CSV
Operating Systems: Linux (Ubuntu, CentOS, RedHat), Windows (XP/7/8/10)
Databases: Oracle, MySQL, Teradata, PostgreSQL, SQL Server
Version Control & CI/CD: Bitbucket, GitLab, GitHub, SVN, Azure DevOps, Apache Airflow

Education: Master's in Information Technology - Wilmington University

PROFESSIONAL EXPERIENCE:
PwC, Kansas City, MO  Dec 2021 - Present
Azure Data Engineer

Description:
To design and implement a comprehensive data analytics platform that consolidates disparate data sources, enhances data accessibility, and provides advanced analytics capabilities for PwC's clients. This platform aims to enable data-driven decision-making by integrating various data sources into a unified system, leveraging advanced analytics tools, and ensuring data governance and security.

Roles & Responsibilities:
- Integrated data from various on-premises and external sources into Azure Data Lake, enhancing data accessibility and performance in Azure SQL Data Warehouse.
- Created Databricks notebooks to extract data from systems such as DB2 and Teradata; conducted data cleansing, wrangling, and ETL, and loaded the processed data into Azure SQL Database.
- Designed and built data pipelines using Azure Data Factory, connecting to various databases through JDBC connectors.
- Managed ingested data with Azure Databricks and automated workflows using Python to enable multiple data loads and boost parallel processing capabilities.
- Facilitated the migration of data from Microsoft SQL Server to Azure SQL Database.
- Developed ETL pipelines to transfer data from Blob Storage to Azure Data Lake Gen2 using Azure Data Factory (ADF).
- Utilized RDD transformations for streaming analytics in Databricks, employing Spark Streaming to process ingested data.
- Leveraged Spark SQL for data extraction, transformation, and aggregation from diverse file formats for in-depth analysis.
- Configured ingestion and orchestration pipelines in Azure Data Factory, setting up email alerts for trigger failures to assist in monitoring both Development and Production environments (CI/CD).
- Automated data processing tasks using Azure Data Factory and surfaced the ingested data for analytics in Power BI.
- Implemented scalable data solutions using serverless Azure services, including Azure Data Factory, Synapse, Databricks, Azure Functions, and traditional SQL resources.
- Developed data warehousing solutions with Azure Data Analytics and Azure Synapse Analytics, utilizing Apache Spark pools and Synapse to manage and move data at scale.
- Built Spark applications using Python libraries such as PySpark, NumPy, and Pandas for data transformations within Databricks and Azure Functions.
- Integrated Azure Key Vault with Azure services such as Data Factory, Databricks, and Azure Functions to securely manage and access secrets within pipelines and workflows.
- Created PySpark scripts to flatten complex nested JSON files for ingestion into raw tables (see the sketch after this section).
- Worked on dimensional modeling using star and snowflake schemas, and managed slowly changing dimensions.
- Developed stored procedures in Snowflake for loading dimension and fact tables, and created views based on business requirements.
- Participated in an Agile development team, adhering to Agile methodologies and principles through bi-weekly sprints.

Environment: Azure Data Factory (ADF v2), Azure Databricks, Azure Data Lake, Microsoft Azure, Azure SQL Database, Azure Function Apps, Blob Storage, SQL Server, UNIX Shell Scripting, ADLS Gen2, Azure Cosmos DB, Azure Event Hub, Kafka, Spark Streaming, SQL, Agile Methodology, Snowflake.
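Illustrative sketch of the nested-JSON flattening approach referenced above, written as a minimal PySpark example; the ADLS Gen2 path, target table name, and schema conventions are placeholder assumptions, not the actual client pipeline.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import ArrayType, StructType

spark = SparkSession.builder.appName("flatten_raw_json").getOrCreate()

def flatten(df):
    """Recursively promote struct fields to top-level columns and explode arrays."""
    complex_cols = {f.name: f.dataType for f in df.schema.fields
                    if isinstance(f.dataType, (StructType, ArrayType))}
    while complex_cols:
        name, dtype = next(iter(complex_cols.items()))
        if isinstance(dtype, StructType):
            # customer.address becomes customer_address, then the struct column is dropped
            expanded = [F.col(f"{name}.{child.name}").alias(f"{name}_{child.name}")
                        for child in dtype.fields]
            df = df.select("*", *expanded).drop(name)
        else:
            # one row per array element; explode_outer keeps rows with empty/null arrays
            df = df.withColumn(name, F.explode_outer(F.col(name)))
        complex_cols = {f.name: f.dataType for f in df.schema.fields
                        if isinstance(f.dataType, (StructType, ArrayType))}
    return df

# Hypothetical source path and raw table name
raw = spark.read.option("multiLine", "true").json(
    "abfss://raw@datalake.dfs.core.windows.net/events/")
flatten(raw).write.mode("append").saveAsTable("raw_db.events_flattened")
```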
Varo Bank, San Francisco, CA  Aug 2019 - Nov 2021
Data Engineer

Description:
To build a real-time fraud detection system that identifies and mitigates fraudulent transactions as they occur. This system aims to protect Varo Bank's customers and minimize financial losses by leveraging data-driven insights to detect unusual patterns and anomalies.

Roles & Responsibilities:
- Performed data transformations in Hive, utilizing partitioning and bucketing to optimize performance (see the sketch after this section).
- Managed and worked with various Hadoop components, including HDFS, Job Tracker, Task Tracker, NameNode, DataNode, YARN, Spark, and MapReduce.
- Configured and monitored cluster resources using Cloudera Manager, and utilized Search and Navigator for cluster management.
- Created external Hive tables for data consumption, storing data in HDFS using formats such as ORC, Parquet, and Avro.
- Built ETL pipelines using Apache PySpark, leveraging Spark SQL and DataFrame APIs for data processing.
- Analyzed and worked with Hadoop ecosystem tools such as Pig, Hive, HBase, Spark, and Sqoop for big data analytics.
- Utilized Sqoop to load data from relational databases and dynamically generated files into HDFS.
- Implemented partitioning, dynamic partitions, and bucketing in Hive to enhance data organization and performance.
- Developed Hive Query Language (HQL) queries, mappings, tables, and external tables for comprehensive data analysis, including optimization and execution.
- Monitored and managed the Hadoop cluster continuously with Cloudera Manager.
- Migrated data from on-premises systems to AWS EMR and S3 buckets using shell scripts.
- Executed Python scripts for large-scale data transformations with AWS Kinesis.
- Created reusable mappings with worklets and mapplets for efficient data transformation.
- Automated data movement between various components using Apache NiFi.
- Loaded data from multiple sources, including SQL Server, DB2, and Oracle, into HDFS with Sqoop, and organized it in Hive tables.
- Migrated data from Teradata into HDFS using Sqoop.

Environment: Hive 2.3, Pig 0.17, Python, HDFS, Hadoop 3.0, AWS, NoSQL, Sqoop 1.4, Oozie, Power BI, Agile, OLAP, Cloudera Manager, ORC, Parquet, Avro.
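Illustrative sketch of the Hive partitioning and bucketing pattern referenced above, as a minimal PySpark example; the HDFS landing path, column names, and target table are assumed placeholders, not the production fraud pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("curate_transactions")
         .enableHiveSupport()       # write managed tables through the Hive metastore
         .getOrCreate())

# Hypothetical HDFS landing path and column names
txns = (spark.read.parquet("/data/raw/transactions/")
        .filter(F.col("txn_amount").isNotNull())
        .withColumn("txn_date", F.to_date("txn_timestamp")))

(txns.write
     .mode("overwrite")
     .format("orc")
     .partitionBy("txn_date")       # date partitions allow pruning for time-bounded queries
     .bucketBy(32, "account_id")    # bucketing by account speeds up joins and aggregations
     .sortBy("account_id")
     .saveAsTable("fraud_db.transactions_curated"))
```

Partitioning by date supports partition pruning for time-bounded fraud queries, while bucketing by account keeps joins against account reference data efficient.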
T-Mobile, Atlanta, GA  Jun 2018 - Aug 2019
Data Engineer

Description:
To develop a real-time network performance monitoring system that tracks and analyzes T-Mobile's network performance metrics. The goal is to enhance network reliability, quickly identify and resolve issues, and optimize network performance using real-time data analytics.
Roles & Responsibilities:
- Designed and implemented data warehouses and data lakes using high-performance databases such as Oracle and SQL Server and big data platforms like Hadoop (Hive and HBase).
- Engineered and deployed scalable applications on Hadoop/Spark.
- Crafted ad-hoc SQL queries using joins, database connections, and transformation rules to retrieve data from legacy DB2 and SQL Server databases.
- Translated business requirements into logical and physical data models for OLTP and OLAP systems.
- Developed BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
- Reviewed stored procedures for reporting and wrote test queries to validate DataMart (Oracle) results against source systems (SQL Server/SSRS).
- Conducted data profiling and preliminary analysis to manage anomalies such as missing values, duplicates, outliers, and irrelevant data.
- Utilized proximity- and density-based techniques to remove outliers.
- Analyzed, designed, and implemented solutions based on business user requirements.
- Applied supervised, unsupervised, and regression techniques to build models.
- Performed market basket analysis to identify asset groupings and provided recommendations on associated risks.
- Developed ETL procedures and data conversion scripts using Pre-Stage, Stage, Pre-Target, and Target tables.
- Built data pipelines using advanced Big Data frameworks and tools.
- Extracted relevant features from datasets and handled bad, null, and partial records using Spark SQL.
- Stored data frames in Hive tables using Python (PySpark).
- Ingested data into HDFS from relational databases such as Teradata using Sqoop, and exported data back to Teradata for storage.
- Developed various Spark applications using Spark shell (Scala).
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Executed Hive queries on ORC tables to analyze data according to business needs.
- Used Spark to process data from HDFS and store it back into HDFS.
- Developed a Python script to load CSV files into AWS S3 buckets and managed bucket folders, logs, and objects (see the sketch after this section).
- Created Airflow scheduling scripts in Python to automate the extraction of data across a wide range of datasets.
- Managed file movements between HDFS and AWS S3, working extensively with S3 buckets.
- Developed Spark SQL scripts using Python for efficient data processing.
- Used Sqoop to extract data from warehouses and SQL Server, and loaded it into Hive.
- Applied the Spark framework to transform data for analytical applications.
- Scheduled Oozie workflows to automate Hive jobs and data loading into HDFS, as well as pre-processing with Spark.
- Conducted exploratory data analysis using R and generated graphs and charts with Python libraries.
- Implemented SQL Server tasks using SQL Developer, and managed continuous integration and automation with Jenkins.
- Developed Service-Oriented Architecture (SOA) using JMS for web service messaging.
- Executed multiple business plans and projects, ensuring business needs were met and interpreting data to identify trends for future datasets.
- Worked on a pilot project to transition to Amazon EMR and other AWS cloud solutions.
- Developed interactive dashboards and various ad-hoc reports in Tableau, connecting to multiple data sources.

Environment: Python, SQL Server, Hadoop, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Mahout, LSTM, RNN, Spark MLlib, MongoDB, AWS, Tableau, Unix/Linux.
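Illustrative sketch of the CSV-to-S3 loader described above, as a minimal boto3 example; the bucket name, prefix, and local export directory are placeholder assumptions, and credentials are taken from the standard AWS configuration chain.

```python
import logging
from pathlib import Path

import boto3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("csv_to_s3")

# Placeholder bucket and key prefix
BUCKET = "network-metrics-raw"
PREFIX = "daily_loads"

s3 = boto3.client("s3")

def upload_csv_dir(local_dir: str, bucket: str = BUCKET, prefix: str = PREFIX) -> None:
    """Upload every CSV in local_dir to s3://bucket/prefix/<file>, logging each object key."""
    for path in sorted(Path(local_dir).glob("*.csv")):
        key = f"{prefix}/{path.name}"
        s3.upload_file(str(path), bucket, key)
        log.info("uploaded %s to s3://%s/%s", path, bucket, key)

if __name__ == "__main__":
    upload_csv_dir("./exports")   # local export directory is a placeholder
```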