Name: Sarath Ravi
Email: EMAIL AVAILABLE
PH: PHONE NUMBER AVAILABLE
LinkedIn: Link

PROFESSIONAL SUMMARY:
- 9+ years of hands-on experience designing, developing, and maintaining data pipelines, data governance processes, and data warehouses, with proficiency in Python, advanced SQL, PL/SQL, Scala, and Apache Spark for data engineering tasks.
- Extensive working knowledge of traditional databases such as MS SQL Server, Oracle, PostgreSQL, and Netezza.
- Proficient in big data technologies such as Hadoop, Spark, and Kafka.
- Skilled in designing and implementing efficient data models for both OLTP and OLAP systems.
- Deep conceptual knowledge of data warehouses, data aggregation strategies, and schema architectures such as star and snowflake schemas for efficiently storing and querying large volumes of structured data.
- Strong analytical skills to interpret data and provide actionable insights using reporting tools such as Tableau and Power BI.
- Automated data workflows using tools such as Airflow, Jenkins, and other CI/CD tools.
- Expert in managing and delivering data engineering projects on time and within scope.
- Expertise in integrating data from various sources, including APIs, flat files, and streaming data.
- Strong conceptual understanding of insurance, pharmaceutical, HR services, and sales data.
- Ability to work closely with data scientists, analysts, and business stakeholders to understand requirements and deliver solutions.
- Led an offshore team, assigning daily tasks and collecting status updates.
- Experience with Snowflake, AWS, Microsoft Azure, Google Cloud Platform, and BigQuery for storing and processing large-scale datasets.
- Proficient in using version control systems such as Git for code and data pipeline management.

TECHNICAL SKILLS:
Programming Languages: SQL, PL/SQL, Python, Shell Scripting, Machine Learning, Ruby, Apache Spark, Scala
ETL Tools: Informatica PowerCenter, IICS, SSIS, Azure Data Factory, SnapLogic
Databases: Oracle, MS SQL Server, PostgreSQL, Netezza
Cloud Services: Amazon Web Services (AWS Redshift, S3, Lambda, EC2), Google Cloud Platform (GCP), Microsoft Azure, BigQuery, Snowflake
Reporting Tools: Power BI, Tableau
Big Data Technologies: Hive, Hadoop, MapReduce
Version Control: Git, Bitbucket, Jenkins, Docker
PROFESSIONAL EXPERIENCE:

Client: Federated Mutual Insurance, Boston, MA    Oct 2021 - Present
Role: Sr. Data Engineer
Responsibilities:
- Serve as the team's data subject-matter expert, working primarily with databases and a wide range of data software.
- Worked through the complete Software Development Life Cycle (SDLC), including gathering and analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
- Ingested, analyzed, and interpreted large data sets to develop technical and data-driven solutions to difficult business problems using tools such as SQL and Python.
- Worked with databases, both relational and multidimensional.
- Migrated applications from Teradata to Azure Data Lake Storage Gen 2 using Azure Data Factory.
- Ingested data into the Azure cloud from web services and loaded it into Azure SQL DB.
- Used Python to develop Spark applications that load files with large volumes of data into Azure SQL DB (see the sketch at the end of this section), and developed and automated pipelines using Databricks.
- Built data pipelines in Airflow in GER for ETL-related jobs using different Airflow operators.
- Used PyTorch, an open-source machine learning library for Python primarily used for developing deep learning models, and leveraged features such as dynamic computational graphs for easier debugging.
- Worked with DevOps and Terraform.
- Managed Teradata servers across the globe: monitoring, responding to alerts, tuning, creating appropriate indexes, and creating incidents with Teradata with assistance/follow-up from CSR.
- Performed system-level and application-level tuning and supported the application development team with database needs and guidance using tools and utilities such as Explain, Visual Explain, PMON, and DBC views.
- Developed shell scripts, stored procedures, and macros to automate jobs to run at the required frequency.
- Handled the day-to-day needs of the development team within the specified SLA, as well as ad-hoc requests.
- Strong experience in NoSQL databases such as Cassandra.
- Used Postgres geospatial data types and functions for location data, such as mapping and geocoding, making it an excellent choice for location-aware applications.
- Used Postgres as a data warehouse for storing large volumes of data for analytics, taking advantage of advanced SQL features such as window functions, common table expressions, and subqueries for complex analysis.
- Used Postgres as the database backend for content management systems (CMS) such as Drupal and Joomla.
- Used data lineage to track data's path, transformations, and interactions with various systems and processes throughout its lifecycle, applying it to data integration, data migration, data quality management, and data governance.
- Responsible for migration activities in the qualified environment.
- Used Databricks, Scala, and Spark for creating data workflows and capturing data from Delta tables in Delta Lake.
- Used Airbyte, an open-source data integration platform, to replicate data from various sources, transform it, and load it into different destinations, providing a simple and flexible way to connect data sources and data warehouses; used its pre-built connectors for popular sources such as MySQL, Postgres, MongoDB, Salesforce, and Google Analytics.
- Performed streaming of pipelines using Azure Event Hubs and Stream Analytics to analyze data from data-driven workflows.
- Mined data and presented it in an understandable way, creating tables, writing reports on findings, and simplifying highly technical language for others in the company.
- Supported AI-powered chatbots and virtual assistants used to handle customer queries, reduce wait times, and improve customer satisfaction.
- Used AI and ML to predict when network equipment will require maintenance or replacement, which helped reduce downtime and prevent service disruptions.
- Developed and tested scripts for the new billing system and validated the data.
- Developed several packages, functions, and triggers for use in a series of other backend programs using PL/SQL.
- Coded PL/SQL subprograms and modified existing PL/SQL program units.
- Loaded customer data from flat files into Oracle using PL/SQL procedures.
- Provided temporary workarounds and worked on permanent fixes for P1 incidents caused by data issues.
Environment: Teradata 14, Oracle, Teradata SQL Assistant, SQL Developer, BTEQ, FastLoad, MultiLoad, FastExport, UNIX, Shell scripting, Viewpoint, Teradata Administrator, Tableau, Python, SQL, Cassandra DB, Azure Data Lake Storage Gen 2, Azure Data Factory, Azure SQL DB, Spark, Databricks, SQL Server, Kafka, Apache Spark, Delta Lake, Azure Event Hubs, Stream Analytics, Azure Blob Storage, PowerShell, Apache Airflow, Hadoop, YARN, PySpark, Hive
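A minimal sketch of the kind of PySpark flat-file load into Azure SQL DB referenced above; the storage path, server, database, table, key column, and credential values are placeholders, and a SQL Server JDBC driver is assumed to be available on the cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("flat_file_to_azure_sql").getOrCreate()

    # Read a large delimited file from the data lake (placeholder path)
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("abfss://raw@<storage_account>.dfs.core.windows.net/input/*.csv"))

    # Light cleanup before loading; <key_column> is a placeholder
    df = df.dropDuplicates().na.drop(subset=["<key_column>"])

    # Append into an Azure SQL DB table over JDBC (placeholder connection values)
    jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"
    (df.write
       .format("jdbc")
       .option("url", jdbc_url)
       .option("dbtable", "dbo.staging_table")
       .option("user", "<user>")
       .option("password", "<password>")
       .mode("append")
       .save())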
Client: Splunk, San Francisco, CA    Dec 2019 - Sep 2021
Role: Data Engineer
Responsibilities:
- Created Azure Data Factory, managed policies for Data Factory, and utilized Blob storage for storage and backup on Azure.
- Developed the ingestion process to bring data into the Azure cloud from web services and load it into Azure SQL DB.
- Worked with Spark applications in Python, building a distributed environment to load high-volume files with different schemas into PySpark DataFrames and process them for reload into Azure SQL DB tables.
- Designed and developed pipelines using Databricks and automated them for ETL processes and ongoing maintenance of the workloads.
- Created ETL packages using SSIS to extract data from various sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data in SQL Server.
- Performed ETL operations in Azure Databricks by connecting to different relational databases using Kafka, and used Informatica for creating, executing, and monitoring sessions and workflows.
- Automated data ingestion into the Lakehouse, transformed the data using Apache Spark, and stored it in Delta Lake (see the sketch at the end of this section).
- Ensured data quality and integrity using Azure SQL Database and automated ETL deployment and operationalization.
- Used Databricks, Scala, and Spark for creating data workflows and capturing data from Delta tables in Delta Lake.
- Worked with Azure Blob Storage and developed the framework for handling high volumes of data and system files.
- Implemented a distributed stream processing platform with low latency and seamless integration with data and analytics services inside and outside Azure to build a complete big data pipeline.
- Worked with PowerShell scripting for maintaining and configuring the data.
- Automated and validated the data using Apache Airflow.
- Optimized Hive queries using best practices and the right parameters, working with Hadoop, YARN, Python, and PySpark.
- Used Sqoop to extract data from Teradata into HDFS and export the analyzed patterns back to Teradata.
- Used accumulators and broadcast variables to tune Spark applications and to monitor the created analytics and jobs.
- Tracked Hadoop cluster job performance and capacity planning, and tuned Hadoop for high availability and cluster recovery.
- Generated reports with Tableau and created Tableau dashboards, pie charts, and heat maps according to business requirements.
Environment: Python, SQL, Cassandra DB, Azure Data Lake Storage Gen 2, Azure Data Factory, Azure SQL DB, Spark, Databricks, SSIS, SQL Server, Kafka, Informatica, Apache Spark, Delta Lake, Azure Event Hubs, Stream Analytics, Azure Blob Storage, PowerShell, Apache Airflow
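A minimal sketch of the Lakehouse ingestion pattern noted above, as it might look in a Databricks notebook where the spark session is predefined; the landing path, column names, and Delta table name are placeholders.

    from pyspark.sql import functions as F

    # Read newly landed JSON files from the landing zone (placeholder path)
    raw = spark.read.json("abfss://landing@<storage_account>.dfs.core.windows.net/events/")

    # Light transformation: standardize a column name and stamp the load time
    curated = (raw
               .withColumnRenamed("eventTimestamp", "event_ts")
               .withColumn("load_ts", F.current_timestamp()))

    # Append into a managed Delta table for downstream consumption
    (curated.write
            .format("delta")
            .mode("append")
            .saveAsTable("curated.events"))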
Client: Copart, Dallas, TX    Sep 2017 - Nov 2019
Role: Data Engineer
Responsibilities:
- Managed data collection, transformation, and normalization processes using Azure Data Factory, leveraging Azure Synapse Analytics for efficient data processing.
- Integrated insurance data such as policy databases, claims systems, and customer information from diverse on-premises data platforms like PostgreSQL into Azure cloud data services, including Azure Synapse Analytics and Azure Data Lake Storage, ensuring adherence to data governance policies and standards specific to the insurance industry.
- Leveraged Azure Data Factory and Azure Databricks for orchestrating and scheduling intricate data processing workflows, ensuring reliable and timely execution while adhering to data lifecycle management policies.
- Utilized Databricks notebooks integrated with Azure to write and execute Spark SQL queries, documenting data transformation steps and analysis results (see the sketch at the end of this section).
- Optimized SQL queries and data processing workflows for performance and scalability, utilizing Spark SQL's query optimization and execution planning capabilities.
- Automated the scheduling and execution of ETL jobs using Azure Data Factory pipelines, providing increased visibility into data processing tasks and simplifying troubleshooting while maintaining data security and privacy measures.
- Collaborated with actuarial and risk management teams to perform predictive analytics on insurance data using Azure Machine Learning services, applying techniques such as predictive modeling, risk scoring, and fraud detection to improve insurance underwriting and claims management processes.
- Generated Power BI dashboards to review sales data, conducting in-depth analysis with DAX (Data Analysis Expressions) to calculate metrics, perform aggregations, and create calculated measures that surfaced insights which increased sales by 20%.
- Provided leadership to a team of associates by assigning tasks, collecting progress updates, and offering guidance on Azure data and analytics best practices.
Environment: Azure Synapse Analytics, Azure Data Lake Storage, Azure Data Factory, Azure Databricks, Spark SQL, Azure Machine Learning, PostgreSQL, Power BI, Python
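A minimal sketch of a Spark SQL transformation run from a Databricks notebook (spark session predefined), as referenced above; the database, table, and column names are hypothetical, not the client's actual schema.

    # Aggregate claims per policy with Spark SQL (hypothetical schema)
    claims_by_policy = spark.sql("""
        SELECT policy_id,
               COUNT(*)          AS claim_count,
               SUM(claim_amount) AS total_claim_amount
        FROM   insurance.claims
        WHERE  claim_status = 'CLOSED'
        GROUP  BY policy_id
    """)

    # Persist the aggregate as a table for downstream reporting (e.g., Power BI)
    (claims_by_policy.write
                     .mode("overwrite")
                     .saveAsTable("reporting.claims_by_policy"))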
Client: McAfee, San Jose, CA    June 2015 - Aug 2017
Role: Data Analyst
Responsibilities:
- Good knowledge of and hands-on experience in Hadoop and big data technologies such as Hive, Pig, MapReduce, YARN, Oozie, and Sqoop, and cloud environments such as AWS.
- Designed and implemented a scoring algorithm in PySpark to produce custom credit ratings, applying data science techniques including RFM segmentation, anomaly detection, peer group scoring, and suspicious counts.
- As one of the senior members of the team, converted the Medical Provider Fraud analytics, previously written in SPSS, DB2, and Python, into PySpark and Apache Spark (Scala), reducing model execution time and cost by about 45%.
- Created charts and graphs in PySpark using HandySpark and the matplotlib visualization package by converting PySpark DataFrames into pandas DataFrames (see the sketch at the end of this section).
- Implemented long-term strategy, roadmaps, and milestones, in addition to being responsible for hiring and providing technical direction to the teams.
- Reduced development costs by 20% by creating a plan to merge related projects across business lines into one streamlined project.
- Mentored 30+ employees on project deliveries and gave knowledge transfers on solutions.
- Led a team building an ELT Spark framework from scratch to migrate the Marketing and Sales application onto CEDP using Apache Spark (PySpark), Apache Airflow, cloud technologies, and Kubernetes.
- Responsible for initiating, planning, and executing projects while tracking and managing dependencies and predicting and mitigating risks.
- Worked in a cross-functional marketing automation platform squad, responsible for creating end-to-end data analytics systems for MAP ODS by onboarding IBM Cloud subscription and usage data.
- Responsible for developing, documenting, and delivering data analytics solutions to support onboarding of several offerings into the digital Marketing Platform using NoSQL Cloudant, DB2 Warehouse, DataStage, and Python.
Environment: Informatica 8.1, Oracle 10g/SAP, Teradata 13.1, Control-M, Tableau, Airflow, Hadoop, YARN, PySpark, Hive, Teradata, Sqoop, HDFS, Spark, Agile
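A minimal sketch of the PySpark-to-pandas charting step mentioned above, assuming a notebook-provided spark session; the table, column, and output file names are illustrative only.

    import matplotlib.pyplot as plt

    # Aggregate in Spark first so only a small summary reaches the driver
    provider_counts = (spark.table("fraud.provider_scores")
                            .groupBy("risk_band")
                            .count())

    # Convert the small aggregate to a pandas DataFrame for plotting
    pdf = provider_counts.toPandas()

    pdf.plot(kind="bar", x="risk_band", y="count", legend=False)
    plt.title("Providers per risk band")
    plt.ylabel("Provider count")
    plt.tight_layout()
    plt.savefig("providers_per_risk_band.png")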