Candidate's Name
Data Engineer
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- Overall 5+ years of hands-on experience as a Data Engineer in Data Warehousing, Data Integration, and Data Modelling, including design, development, coding, testing, bug fixing, and production support of data warehousing applications.
- Experience creating and executing data pipelines in Azure.
- Experience gathering and analyzing requirements, and designing, developing, testing, and implementing business intelligence solutions using data warehouse/data mart design, data modelling, ETL, OLAP, OLTP, BI, and client/server applications.
- Excellent working knowledge of Star Schema, Snowflake, and Sqoop for importing data from RDBMS into HDFS and vice versa.
- Built a data validation program with Python and Apache Beam and executed it in Cloud Dataflow to compare raw source files against BigQuery tables (a sketch follows this summary).
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on public or private clouds.
- Experience with Snowflake and Lambda data processing, such as collecting, aggregating, and moving data from multiple sources using Apache Flume and Kafka.
- Extensive knowledge of Spark real-time streaming with Kafka and the Spring Boot API.
- Experience with Apache Spark, Scala, and Python to convert Hive/SQL queries into RDD transformations.
- Experience designing and implementing end-to-end data integration and ETL workflows in Azure Synapse pipelines.
- Collaborative team member, working closely with Azure Logic Apps administrators and DevOps engineers to monitor and resolve issues related to process automation and data processing pipelines.
- Proficient in implementing CI/CD frameworks for data pipelines using tools like Jenkins, ensuring efficient automation and deployment.
- Expertise in enabling CI/CD processes using Jenkins to integrate all object deployments, such as ETL code, SQL files, shell scripts, and Python code base files.
- Working knowledge of version control and tracking tools such as SVN, JIRA, GitHub, and GitLab.
- Database modelling experience with SQL, PostgreSQL, and MySQL.
- Working knowledge of various operating systems, including Windows, Linux, and UNIX.
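The sketch below illustrates the kind of Beam validation pipeline referenced in the summary: count records in raw GCS files and in the target BigQuery table, then compare. It is a minimal, hedged example; the project, bucket, table, and column names are illustrative placeholders, not details from this resume.

# Minimal Apache Beam sketch of a row-count validation between raw files and
# a BigQuery table. All resource names below are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions(
        runner="DataflowRunner",       # use "DirectRunner" for local testing
        project="example-project",     # hypothetical project id
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        # Count records in the raw source files (assumes one record per line).
        raw_count = (
            p
            | "ReadRaw" >> beam.io.ReadFromText(
                "gs://example-bucket/raw/orders-*.csv", skip_header_lines=1)
            | "CountRaw" >> beam.combiners.Count.Globally()
        )

        # Count rows already loaded into the target BigQuery table.
        bq_count = (
            p
            | "ReadBQ" >> beam.io.ReadFromBigQuery(
                table="example-project:warehouse.orders")
            | "CountBQ" >> beam.combiners.Count.Globally()
        )

        # Compare the two counts and emit a simple pass/fail record.
        (
            raw_count
            | "Compare" >> beam.Map(
                lambda raw, bq: {"raw": raw, "bigquery": bq, "match": raw == bq},
                bq=beam.pvalue.AsSingleton(bq_count),
            )
            | "Report" >> beam.Map(print)
        )

if __name__ == "__main__":
    run()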
EDUCATION:
Master's in Information Technology Management, Webster University, 2022 to 2023

TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Pig, Hadoop distributions, HBase, Spark, Spark Streaming, Kafka
Cloud Services: Azure (Databricks, Azure Data Lake, Azure HDInsight), GCP
Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake
Programming Languages: Python, PySpark, Shell script, Perl script, SQL, Java
Version Control: Git, GitHub, Maven
Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux, OS
Visualization/Reporting: Tableau, Power BI, matplotlib
Big Data Ecosystems: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Spark, Kafka, HBase
Deployment Tools: Git, Jenkins, Terraform, and CloudFormation
Cloud Technologies: Azure Analysis Services, Azure SQL Server, DynamoDB, Step Functions, Glue, CloudWatch, Azure Data Factory, Azure Data Lake, Azure Functions, Azure SQL Data Warehouse, Databricks, and HDInsight
Data Visualization: Tableau, BO Reports, Power BI

PROFESSIONAL EXPERIENCE:

Client: Publix Supermarkets (Feb 2024 to Present)
Role: Azure Data Engineer
Responsibilities:
- Migrated vendor-related order processing and invoicing data from on-prem SQL Server to Azure Delta Lake for faster reconciliation and payment processing.
- Designed, developed, and maintained data pipelines using Azure Data Factory, enabling efficient data extraction, transformation, and loading (ETL) processes.
- Utilized Azure Data Factory (ADF) for real-time data ingestion into Azure Blob Storage.
- Managed and optimized Azure SQL databases for high-performance data storage.
- Applied data transformation and validation using Azure Databricks (PySpark) to ensure data quality before loading into Azure Delta Lake (see the sketch after this section).
- Managed data lakes and storage accounts on Azure Blob Storage and Azure Data Lake Storage Gen2.
- Developed real-time data processing solutions using Azure Stream Analytics.
- Implemented continuous integration and continuous deployment (CI/CD) pipelines using Azure DevOps.
- Integrated Azure Databricks with Azure Data Factory for big data processing, optimizing data workflows and improving data processing speed.
- Implemented complex data transformation logic using ADF Data Flows, including data mapping, data type conversions, aggregations, filtering, and custom data manipulations.
- Utilized Delta Lake features such as schema evolution, time travel, and optimization techniques for query performance.
- Designed and built data pipelines on Azure Databricks using Spark-based transformations, including data cleansing, aggregation, data type conversions, and advanced analytics.
- Integrated Databricks with other Azure services such as Azure Blob Storage, Azure Data Lake Storage, and Azure SQL Database.
- Integrated Azure Functions with other Azure services such as Azure Storage, Azure Event Hubs, and Azure Cosmos DB to build robust and scalable serverless applications.
- Wrote Databricks notebooks (Python) for handling large volumes of data, transformations, and computations across several types of file formats.
- Developed and optimized Spark applications using Spark Core, Spark SQL, Spark Streaming, and Spark MLlib.
- Utilized Databricks built-in libraries and tools, such as Delta Lake, MLflow, and Databricks Runtime, for advanced data engineering and machine learning tasks.
- Implemented CI/CD pipelines in Azure for automated build, test, and deployment of applications and infrastructure.
- Expertise with version control platforms (Git) and Agile methodologies, with supporting tools such as Jira and Jira Service Desk.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
Environment: Azure Data Factory (ADF), Azure SQL Database, Azure Databricks, Azure Synapse Analytics, Azure Blob Storage and Azure Data Lake Storage, Azure Stream Analytics, Azure HDInsight, GCP, Azure Cosmos DB, Azure Data Lake Analytics, Azure DevOps, CI/CD, Azure Monitoring and Security, Azure Data Catalog, Spark SQL, Delta Lake, Azure Functions, Python, algorithms, Git, Jira.
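The sketch below shows the validate-then-load pattern into Delta Lake described in the section above: apply simple data-quality rules in PySpark, quarantine bad rows, and append good rows with additive schema evolution. It is a minimal example under assumed names; the storage paths, columns, and rules are illustrative, not taken from the engagement.

# Minimal PySpark sketch: validate raw records before loading into Delta Lake.
# Paths, column names, and rules are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("invoice-validate-load").getOrCreate()

# Read the raw extract landed by the ADF copy activity (hypothetical path).
raw = spark.read.format("parquet").load(
    "abfss://landing@examplelake.dfs.core.windows.net/invoices/")

# Basic data-quality rules: required keys present, amounts non-negative,
# dates parseable. Rows failing the rules are quarantined, not loaded.
validated = (
    raw
    .withColumn("invoice_date", F.to_date("invoice_date", "yyyy-MM-dd"))
    .withColumn(
        "is_valid",
        F.col("invoice_id").isNotNull()
        & F.col("vendor_id").isNotNull()
        & (F.col("amount") >= 0)
        & F.col("invoice_date").isNotNull(),
    )
)

good = validated.filter("is_valid").drop("is_valid")
bad = validated.filter("NOT is_valid")

# Append clean rows to the Delta table; mergeSchema allows additive schema
# evolution when new columns appear upstream.
(
    good.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("abfss://curated@examplelake.dfs.core.windows.net/delta/invoices/")
)

# Quarantine rejected rows for later reconciliation.
bad.write.format("delta").mode("append").save(
    "abfss://curated@examplelake.dfs.core.windows.net/delta/invoices_rejected/")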
Client: Molina Healthcare (May 2023 to Dec 2023)
Role: Data Engineer
Responsibilities:
- Responsible for extracting, transforming, and loading data from source systems to Azure data storage services using a combination of Azure Data Factory, Databricks, PySpark, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
- Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW).
- Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL.
- Designed and implemented a real-time data streaming solution using Azure Event Hubs.
- Conducted performance tuning and optimization activities to ensure optimal performance of Azure Logic Apps and associated data processing pipelines.
- Served as the OMS team's point person for all data migration related tasks.
- Developed a Spark Streaming application to process real-time data from sources such as Kafka and Azure Event Hubs.
- Built streaming ETL pipelines using Spark Streaming to extract data from various sources, transform it in real time, and load it into a data warehouse such as Azure Synapse Analytics (see the sketch after this section).
- Used tools such as Azure Databricks or HDInsight to scale out the Spark Streaming cluster as needed.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Experienced in version control tools like Git and ticket tracking platforms like Jira.
Environment: Azure, Hadoop, HDFS, YARN, MapReduce, Hive, Sqoop, Oozie, Kafka, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, CI/CD, PL/SQL, UNIX shell scripting, Cloudera.
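The sketch below illustrates the streaming ETL pattern described above with Spark Structured Streaming: read from a Kafka-compatible endpoint (Azure Event Hubs exposes one), parse JSON events, and land the result for downstream loading into the warehouse. The broker, topic, schema, and paths are assumptions for illustration, and authentication options are omitted for brevity.

# Minimal Structured Streaming sketch: Kafka-compatible source to Delta sink.
# All endpoint, topic, and path names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("claims-stream-etl").getOrCreate()

event_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw_stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers",
            "example-namespace.servicebus.windows.net:9093")  # hypothetical; SASL options omitted
    .option("subscribe", "claims-events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
parsed = (
    raw_stream
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withWatermark("event_time", "10 minutes")
)

# Land the transformed stream as Delta files; a separate batch step (or a
# foreachBatch writer) can then load them into Azure Synapse Analytics.
query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation",
            "abfss://stream@examplelake.dfs.core.windows.net/_checkpoints/claims/")
    .outputMode("append")
    .start("abfss://stream@examplelake.dfs.core.windows.net/delta/claims/")
)

query.awaitTermination()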
Client: Yana Software Private Limited, Hyderabad, India (May 2018 to Dec 2021)
Role: Big Data Developer
Responsibilities:
- Used Sqoop to import and export data between Hadoop and relational databases.
- Utilized Sqoop's built-in support for Kerberos authentication to securely transfer data between Hadoop and external systems.
- Developed and optimized Hive queries using techniques like partitioning, bucketing, and indexing.
- Created and managed Hive tables, views, and functions to organize and manipulate data efficiently.
- Developed Spark applications using Spark Core, Spark SQL, and Spark Streaming.
- Worked with Spark SQL for querying structured data in Hadoop.
- Monitored Oozie workflows for troubleshooting and optimization.
- Designed and developed automated data ingestion pipelines using Sqoop and Apache Oozie, allowing seamless data movement.
- Integrated Oozie with other Hadoop ecosystem tools, such as Sqoop and Hive, to create end-to-end data processing pipelines.
- Created reports, dashboards, and visualizations based on SQL query results using tools like Tableau, Power BI, and Excel.
- Conducted data validation and integrity checks using SQL constraints, data profiling, and data cleansing techniques to ensure data accuracy and consistency.
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations (see the sketch after this section).
- Developed automated job flows run through Oozie daily and on demand, which execute MapReduce jobs internally.
- Designed interfaces using Boom where quality and performance were of utmost importance.
Environment: Apache Sqoop, Apache Hive, Apache Spark, Apache Oozie, Apache HDFS, Hadoop, SQL, Tableau, Power BI, Microsoft Excel, Jenkins, Git.
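The sketch below shows the kind of table-to-table PySpark job mentioned above: read a raw Hive table, aggregate it, and write the result to a partitioned curated table so that date-filtered queries prune partitions instead of scanning everything. The database, table, and column names are illustrative assumptions.

# Minimal PySpark sketch of a Hive table-to-table rollup job.
# Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("daily-sales-rollup")
    .enableHiveSupport()
    .getOrCreate()
)

# Source table ingested into Hive from the relational system (hypothetical name).
orders = spark.table("raw_db.orders")

daily_sales = (
    orders
    .groupBy("order_date", "store_id")
    .agg(
        F.countDistinct("order_id").alias("order_count"),
        F.sum("order_amount").alias("total_amount"),
    )
)

# Partitioning the output by order_date mirrors the Hive partitioning strategy,
# so downstream queries that filter on date read only the relevant partitions.
(
    daily_sales.write.mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("curated_db.daily_store_sales")
)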