Candidate's Name
Contact: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
Sr. Data Engineer
LinkedIn: LINKEDIN LINK AVAILABLE
__________________________________________________________________________________________
Profile Summary:
10+ years of professional IT experience in developing, implementing, and configuring solutions with Python, Hadoop, and Big Data technologies.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, including controlling and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Expertise in Hadoop architecture and its components, including HDFS, YARN, Hive, Pig, High Availability, Job Tracker, Task Tracker, NameNode, DataNode, Apache Cassandra, and the MapReduce programming paradigm.
Strong experience working with Azure Cloud, Azure DevOps, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, Big Data technologies (Hadoop and Apache Spark), and Databricks.
Experience using tools such as Sqoop, Flume, Kafka, and Pig to ingest structured, semi-structured, and unstructured data into the cluster.
Hands-on experience developing web applications, RESTful web services, and APIs using Python, Flask, and Django.
Good working knowledge of Snowflake and Teradata databases, with strong experience migrating other databases to Snowflake.
Proficient with the Apache Spark ecosystem, including Spark and Spark Streaming, using Scala and Python.
Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to requirements.
Constructed data staging layers and fast real-time systems to feed BI applications and machine learning algorithms.
Hands-on experience with Spark architecture and its integrations, including the Spark SQL, DataFrame, and Dataset APIs.
3+ years of experience writing Python ETL frameworks and PySpark to process large volumes of data daily.
Hands-on experience with application deployment using CI/CD pipelines.
Experience implementing Spark using Scala and Spark SQL for faster data processing.
Strong experience extracting and loading data with complex business logic using Hive from different data sources, and building ETL pipelines that process terabytes of data daily.
Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch processing into the Hive data warehouse.
Experience working with GitHub/Git 2.12 source and version control systems.

Technical Skills:
Hadoop: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Spark, PySpark, Scala, and MongoDB
Cloud Technologies: Azure Analysis Services, Azure SQL Server, Azure Synapse, DynamoDB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Functions, Azure SQL Data Warehouse, Databricks, HDInsight, AWS S3, EMR, EC2, Lambda, SQS
DBMS: Amazon Redshift, PostgreSQL, Oracle 9i, SQL Server, IBM DB2, and Teradata
ETL Tools: DataStage, Talend, and Ab Initio
Deployment Tools: Git, Jenkins, Terraform, and CloudFormation
Programming Languages: Java, Python, Scala
Scripting: Unix Shell and Bash scripting

Education: Bachelor's in Computer Science, Jawaharlal Nehru Technological University, India - 2013
Certification: Microsoft Certified: Azure Data Engineer Associate. Credential ID: 724D15B45563911F.

Work Experience:
State Farm Insurance, Bloomington, IL    May 2024 - Present
Role: Sr. Azure Data Engineer
Project: Real-time Data Ingestion and Processing for State Farm Claims
Description: The broad aim of the project was to consolidate on-premises data sources into a scalable cloud data platform and enhance the existing analytics environment for advanced visualization. The project scope included implementing robust data ingestion pipelines using Azure Data Factory, enabling extraction of data from various sources and seamless loading into the designated data lake or databases. The firm was strategically positioned to make faster data-backed decisions and mitigate risks proactively while gaining a competitive edge within the dynamic financial services industry.
Responsibilities:
Analyzed requirements and discussed functionality with managers and leads.
Architected a scalable Azure data platform, leading data modeling efforts for efficient migration of claims data to Azure Data Lake Storage.
Designed a Self-hosted Integration Runtime (SHIR) for secure connectivity between on-premises systems and the Azure cloud.
Leveraged Azure Data Lake Storage Gen2 for high-volume storage and backup of structured and unstructured data.
Designed and implemented ETL pipelines for migrating claims data from on-premises systems to Azure cloud services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL.
Managed data ingestion into Azure Data Lake, Azure Blob Storage, Azure SQL Database, and Synapse Analytics, leveraging Azure Databricks with Apache Spark for scalable real-time data processing.
Worked with Spark applications in Python in a distributed environment to load high-volume files with different schemas into PySpark DataFrames and process them for reloading into Azure SQL DB tables (a minimal sketch follows this section).
Implemented Azure Event Hubs for real-time ingestion of claims data and customer interactions, enabling seamless integration with the cloud data platform.
Consolidated data into Azure Data Lake Storage Gen2 for scalable and secure cloud storage, ensuring efficient data retrieval and integration with Azure Synapse Analytics for complex queries and reporting.
Set up Azure SQL Database for structured data storage and supported data warehousing with Azure Synapse Analytics to enable advanced data analytics and visualization.
Implemented real-time data streaming pipelines with Apache Kafka and Azure Stream Analytics, optimizing transformation and timely delivery of insights.
Performed performance tuning of PL/SQL queries to enhance execution speed and optimize resource usage, improving overall system performance.
Automated data validation scripts using Python to ensure data integrity and consistency between on-premises and cloud data environments, minimizing data discrepancies and errors post-migration.
Analyzed SQL scripts and designed solutions to implement them using PySpark.
Developed custom aggregate functions using Spark SQL and performed interactive querying.
Used Azure Monitor and Azure Log Analytics to track pipeline performance, troubleshoot issues, and optimize the cloud infrastructure for cost-efficiency and scalability.
Integrated Azure Key Vault for secure management of encryption keys and credentials across data pipelines.
Conducted performance tuning for Azure Synapse Analytics and Databricks to improve query response times and real-time data processing efficiency.
Worked on Jenkins continuous integration for project deployment and deployed the project into Jenkins using the Git version control system.
Designed and maintained reports in Power BI, built on top of Azure Synapse/Azure Data Warehouse, Azure Data Lake, and Azure SQL.
Environment: Azure SQL, Azure Data Warehouse, Azure Databricks, Azure Data Lake Analytics, Azure Blob Storage, MapReduce, Snowflake, Python, PySpark, T-SQL, Git and GitHub
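A minimal, illustrative PySpark sketch of the load-and-reload pattern referenced above (loading files with varying schemas into DataFrames and writing the curated result to an Azure SQL table). The storage path, column names, and JDBC connection details are placeholders and assumptions, not details taken from the project.

```python
# Minimal PySpark sketch of the ingestion pattern described above.
# Storage account, container, column names, and JDBC details are placeholders;
# the SQL Server JDBC driver must be available on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-ingest-sketch").getOrCreate()

# Read semi-structured claim files from ADLS Gen2 (hypothetical path).
claims_raw = (
    spark.read
    .option("multiLine", True)
    .json("abfss://claims@examplestorage.dfs.core.windows.net/raw/claims/*.json")
)

# Light cleansing/validation before loading the relational layer.
claims_clean = (
    claims_raw
    .dropDuplicates(["claim_id"])
    .withColumn("ingest_ts", F.current_timestamp())
    .filter(F.col("claim_amount").isNotNull())
)

# Reload the curated frame into an Azure SQL Database table over JDBC.
(
    claims_clean.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-server.database.windows.net:1433;database=claimsdb")
    .option("dbtable", "dbo.claims_curated")
    .option("user", "<user>")
    .option("password", "<password>")
    .mode("append")
    .save()
)
```

In practice the connection string and credentials would be pulled from Azure Key Vault rather than hard-coded, keeping the curated layer queryable from Synapse and Power BI without exposing secrets in code.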
Virtusa/Dow Jones, NY    May 2021 - Apr 2024
Role: Azure Data Engineer
Project: Enhancing Financial Data Analysis and Data Optimization Initiative
Description: Engaged in a pivotal project as an Azure Data Engineer in collaboration with Virtusa and Dow Jones. Utilized Azure's data tools to optimize the processing of financial data at Dow Jones. The project focused on enhancing the structure and analysis of financial data, with the goal of efficiently delivering premium content to users.
Responsibilities:
Worked directly with the Big Data Architecture team, which created the foundation of this enterprise analytics initiative in a Hadoop-based data lake.
Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
Used Snowflake expertise to create and maintain tables and views.
Implemented data ingestion and handled clusters for real-time processing using Apache Storm and Kafka.
Evaluated Snowflake design considerations for any change in the application.
Developed a database unloader framework using Spark and Scala that extracts data from a PostgreSQL database and saves the output file in Azure Blob Storage.
Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, and an understanding of how to integrate them with other Azure services.
Developed and implemented data pipelines to integrate IoT device data into Snowflake for scalable storage and analysis.
Utilized Snowflake's data warehousing capabilities to enable real-time analytics and reporting on IoT data.
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Collaborated with cross-functional teams to design data models that improved data accessibility and decision-making processes.
Responsible for estimating the cluster size, monitoring, and troubleshooting the Spark Databricks cluster.
Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.
Proficient with Azure Data Lake Storage (ADLS), Databricks, and Databricks Delta Lake.
Migrated data using Azure Database Migration Service (DMS); migrated SQL Server and Oracle databases to the Microsoft Azure cloud.
Extracted a real-time feed using Kafka and Spark Streaming, converted it to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (sketched at the end of this section).
Used Azure Data Factory extensively for ingesting data from disparate source systems and as an orchestration tool for integrating data from upstream to downstream systems.
Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
Developed Spark scripts to import large files from Azure Blob Storage and imported data from sources such as HDFS/HBase into Spark RDDs.
Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
Designed and architected scalable data processing and analytics solutions, including technical feasibility, integration, and development for Big Data storage, processing, and consumption on Azure HDInsight.
Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
Designed and maintained reports in Power BI, built on top of Azure Synapse/Azure Data Warehouse, Azure Data Lake, and Azure SQL; also created workspaces and designed security, including row-level security, for various reports.
Worked with users to gather requirements and trained users.
Environment: Azure Storage, Azure SQL, ADLS, DMS, ADF, Azure Data Warehouse, Databricks, Hive, Sqoop, Linux, Scala, Snowflake, Apache MapReduce, YARN, Pig, Zookeeper, Kafka, HBase, Spark, Python, PostgreSQL, MongoDB, and Airflow.
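Below is a minimal sketch of the Kafka-to-HDFS flow referenced above, written here with Spark Structured Streaming rather than the DStream/RDD API named in the bullet; the broker address, topic, message schema, and output paths are assumptions, and the spark-sql-kafka connector package must be on the classpath when submitting.

```python
# Minimal Spark Structured Streaming sketch of the Kafka-to-Parquet flow above.
# Broker, topic, message schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-feed-sketch").getOrCreate()

# Assumed schema of the incoming feed messages.
feed_schema = StructType([
    StructField("event_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("event_time", StringType()),
])

# Subscribe to the real-time feed topic on Kafka.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "market-feed")
    .load()
)

# Parse the Kafka value payload into a typed DataFrame.
parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json_value")
    .select(F.from_json("json_value", feed_schema).alias("r"))
    .select("r.*")
)

# Persist the processed stream to HDFS as Parquet, with checkpointing for recovery.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/market_feed/parquet")
    .option("checkpointLocation", "hdfs:///checkpoints/market_feed")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```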
Baker & Taylor, NC    Feb 2019 - Apr 2021
Role: Big Data Engineer
Project: Data Infrastructure Enhancement
Description: The goal at Baker & Taylor was to create a comprehensive analytics and data processing platform for handling massive amounts of heterogeneous data. The project's tasks included ensuring effective data storage and transfer, streamlining data pipelines, and extracting data from multiple sources. Key tasks included developing and managing data workflows, optimizing data processing performance, and implementing solutions for advanced data analytics and reporting to support business insights and decision-making.
Responsibilities:
Participated in extracting customer big data from various sources and transferring it to Azure Blob Storage for storage in data lakes, including handling data from mainframes and databases.
Created Azure Data Factory pipelines to copy data from source to target systems.
Developed UNIX shell scripts to load large numbers of files into HDFS from the Linux file system.
Involved in loading data from the UNIX file system and FTP to HDFS.
Experience optimizing MapReduce programs using combiners, partitioners, and custom counters to deliver the best results.
Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
Worked with different file formats such as TEXTFILE, AVRO, ORC, and PARQUET for Hive querying and processing.
Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
Migrated data from the client data scope to Azure SQL Server and Azure Data Lake Gen2.
Set up monitoring and logging for Azure Synapse to proactively identify and resolve data issues, optimizing data pipeline performance.
Created multiple components using Java and employed Spring Batch for executing ETL batch processing tasks.
Used Azure Blob Storage and Azure Data Factory (ADF) to move data effectively across databases, and Azure Event Hubs to feed server log data.
Monitored daily running Azure pipelines for different applications and supported multiple applications.
Introduced partitioning and bucketing techniques in Hive to improve data organization and efficiency (a minimal sketch follows this section).
Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tools.
Worked with expressions and the Tabular Model; skilled in Power BI visualization and building high-quality Power BI reports and dashboards.
Used the Azure Logic Apps workflow engine to manage interdependent jobs and to automate several types of Hadoop jobs, including Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
Applied transformation rules for data conversion, data cleansing, data aggregation, data merging, and data splits.
Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
Worked with Hadoop ecosystem tools, including HDFS, Hive, and Sqoop, for data management.
Environment: Azure services, HDFS, MapReduce, Spark, YARN, Hive, Sqoop, Pig, Java, Python, Jenkins, SQL, ADF, Databricks, Data Lake, ADLS Gen2, Blob, MySQL, Azure Synapse, Power BI.
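A minimal sketch of the Hive partitioning and bucketing approach referenced above, issued through Spark SQL with Hive support; the database, table, and column names are placeholders rather than project details.

```python
# Minimal sketch of introducing partitioning and bucketing for a Hive table,
# issued through Spark SQL with Hive support. Database, table, and column
# names are placeholders; the table load itself would typically run in Hive
# (e.g. via beeline) so that bucketing is enforced on insert.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-layout-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partition by order_date so queries can prune whole date directories, and
# bucket by customer_id to speed up joins/aggregations on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.orders_optimized (
        order_id     STRING,
        customer_id  STRING,
        order_total  DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query with a partition filter like this scans only the matching partition.
daily = spark.sql("""
    SELECT customer_id, SUM(order_total) AS day_total
    FROM analytics.orders_optimized
    WHERE order_date = '2020-06-01'
    GROUP BY customer_id
""")
daily.show()
```

Partition pruning cuts scan volume on date-bounded queries, while bucketing on the join key reduces shuffle during joins and aggregations on that key.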
Wipro/BestBuy, India    Mar 2017 - Jan 2019
Role: AWS Data Engineer
Project: E-commerce Solutions
Description: Wipro/BestBuy aimed to develop a scalable data processing and analytics platform to manage and analyze large volumes of retail data. The project involved designing, implementing, and optimizing ETL pipelines and cloud-based data storage solutions to enable advanced analytics and reporting. To improve operational insights and decision-making, key duties included creating frameworks for data ingestion, transferring data from on-premises systems, and enabling real-time data processing.
Responsibilities:
Created an ETL framework using Spark on AWS EMR in Python (a minimal sketch follows this section).
Developed a POC for project migration from an on-premises Hadoop MapReduce system to AWS.
Worked on AWS Data Pipeline to configure data loads from S3 to Redshift.
Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated data visualizations using Tableau.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources.
Constructed AWS data pipelines using VPC, EC2, S3, Auto Scaling Groups, EBS, Snowflake, and IAM.
Generated a script in AWS Glue to transfer the data, and utilized AWS Glue to run ETL jobs and run aggregations in PySpark code.
Created an ingestion framework using Kafka, EMR, and Cassandra in Python.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Capable of using AWS utilities such as EMR and S3 to run and monitor Hadoop/Spark jobs.
Developed Cloud Functions in Python to process JSON files from the source and load the files to BigQuery.
Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web applications and database templates.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
Created S3 buckets, managed policies for S3 buckets, and utilized S3 and Glacier for storage and backup on AWS.
Installed and configured Apache Airflow for the S3 bucket and the Snowflake data warehouse.
Developed a capability to implement audit logging at required stages while applying business logic.
Used the AWS Glue catalog with a crawler to get data from S3 and performed SQL query operations.
Environment: AWS Glue, S3, EC2, VPC, Redshift, EBS, EMR, Apache Spark, PySpark, SQL, Python, HDFS, Hive, Apache Kafka, Sqoop, YARN, Oozie, Shell scripting, Linux, Eclipse, Jenkins, Git, GitHub, MySQL, Cassandra, and Agile Methodologies.
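A minimal sketch of the kind of PySpark ETL step run on AWS EMR described above; the bucket names, prefixes, and sales schema are assumptions rather than project details.

```python
# Minimal PySpark ETL sketch of the kind submitted to AWS EMR above.
# Bucket names, prefixes, and column names are placeholders; s3:// URIs
# resolve through EMRFS when the job runs on EMR.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("retail-etl-sketch").getOrCreate()

# Read raw sales extracts from S3 (CSV with header, hypothetical layout).
sales = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3://example-retail-raw/sales/2018/")
)

# Aggregate to a daily store-level summary for downstream reporting.
daily_summary = (
    sales
    .withColumn("sale_date", F.to_date("sale_timestamp"))
    .groupBy("store_id", "sale_date")
    .agg(
        F.sum("amount").alias("total_sales"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Write the curated layer back to S3 as partitioned Parquet, ready for
# Redshift Spectrum or Athena to query.
(
    daily_summary.write
    .mode("overwrite")
    .partitionBy("sale_date")
    .parquet("s3://example-retail-curated/daily_sales_summary/")
)
```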
Datagaps, Hyderabad, India    Feb 2015 - Feb 2017
Role: Python Developer
Project: Integrated Data Platform Development
Description: The Integrated Data Platform Development project involved the creation of a robust and versatile data ecosystem to streamline processes, enhance data accessibility, and ensure seamless integration across diverse sources and formats. Datagaps spearheaded the architecture, development, and maintenance of this sophisticated platform.
Responsibilities:
Developed frontend components using Python, HTML5, CSS3, AJAX, JSON, and jQuery for interactive web applications.
Analyzed system requirements specifications and actively engaged with clients to gather and refine project requirements.
Established and maintained automated continuous integration systems using Git, MySQL, and custom Python and Bash scripts.
Successfully migrated Django databases between SQLite, MySQL, and PostgreSQL while ensuring complete data integrity.
Developed custom directives using Angular.js and interfaced with jQuery UI using Python and Django for efficient content management.
Created test harnesses and utilized Python's unittest framework for comprehensive testing to ensure application quality and reliability.
Developed multi-threaded standalone applications in Python for various purposes, including viewing circuit parameters and performance analysis.
Proficient in performing various mathematical operations, data cleaning, feature scaling, and feature engineering using Python libraries such as Pandas and NumPy.
Developed machine learning algorithms such as classification, regression, and deep learning using Python, and created and wrote result reports in different formats.
Utilized Python libraries such as Beautiful Soup for web scraping to extract data for analysis and visualization.
Developed REST APIs using Python with the Flask and Django frameworks, integrating various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files (a minimal sketch follows this section).
Led the development of data platforms from scratch, participating in the requirement gathering, analysis, and documentation phases of the project.
Environment: Python, Jupyter Notebook, PyCharm, Django, RDBMS, Shell scripting, SQL, PySpark, Pandas, NumPy, Matplotlib, MySQL packages, Postgres, UNIX, Jenkins, Git, XML, HTML, CSS, JavaScript, Shell Scripts, Oracle, PostgreSQL, JSON.
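A minimal Flask sketch of the REST API work referenced above; the endpoint paths, payload fields, and in-memory store are illustrative placeholders standing in for the real persistence layer (MySQL/PostgreSQL in the project).

```python
# Minimal Flask REST API sketch; routes and fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a real database layer (MySQL/PostgreSQL in practice).
_records = {}


@app.route("/api/records", methods=["POST"])
def create_record():
    # Accept a JSON payload and assign it a simple sequential id.
    payload = request.get_json(force=True)
    record_id = str(len(_records) + 1)
    _records[record_id] = payload
    return jsonify({"id": record_id, "record": payload}), 201


@app.route("/api/records/<record_id>", methods=["GET"])
def get_record(record_id):
    # Return the stored record or a 404 if the id is unknown.
    record = _records.get(record_id)
    if record is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": record_id, "record": record})


if __name__ == "__main__":
    app.run(debug=True)
```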
Magneto IT Solutions, India    Dec 2013 - Jan 2015
Role: Software Engineer
Project: E-commerce Solutions
Description: Magneto IT Solutions focuses on innovative technology to empower businesses, improve operational efficiency, and enhance their digital presence in a competitive market by developing robust and scalable e-commerce platforms that enable businesses to establish and grow their online presence, facilitating seamless transactions and customer interactions.
Responsibilities:
Used a microservices architecture to break down the monolithic application into independent components.
Involved in developing REST web services using Spring MVC to extract client-related data from databases, and implemented microservices based on RESTful APIs using Spring Boot with Spring MVC.
Customized the RESTful web service layer to interface with the DB2 system, sending JSON-format data packets between front-end and middle-tier controllers.
Improved the performance of backend batch processes using multithreading and the concurrent package API.
Used jBPM to control workflows in different modules of the application, providing the interface for document approvals.
Developed the persistence layer using the Hibernate framework by configuring the various mappings in Hibernate files and created the DAO layer.
Developed Hibernate and Spring integration as the data abstraction layer to interact with the Oracle database.
Experience working with application servers such as Apache Tomcat.
Used Maven as the build automation tool for deploying the project on the Tomcat application server.
Used Jenkins alongside Maven to compile and build microservices code and to configure build triggers.
Worked with Jira for user requirements and as the bug tracking tool.
Followed the Agile methodology for the software development process of functional and scalable applications.
Used Git to maintain file versions and took responsibility for code merges and creating new branches when new feature implementation started.
Environment: Java, MySQL, Spring Boot, Oracle Database, Apache Tomcat, Maven, Jenkins, Jira, Git, GitHub, and Agile Methodologies. |