Jahnavi R
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
SUMMARY
- 3+ years of experience as a Data Engineer with expertise in Big Data, cloud platforms, and data warehousing, including ensuring compliance with data governance policies.
- Proficient with AWS services such as S3, Redshift, EMR, Lambda, Glue, and SageMaker for building data pipelines.
- Experienced with Azure Data Factory, Databricks, Synapse Analytics, and Cosmos DB for ETL and data processing, with knowledge of the medallion architecture.
- Good knowledge of GCP services such as BigQuery, Dataflow, and Dataproc for data engineering workloads.
- Skilled in designing star and snowflake schema data models optimized for analytics, with a central fact table for aggregations and normalized (snowflake) dimensions to reduce data redundancy.
- Strong data visualization skills, developing interactive Power BI dashboards and reports that deliver actionable insights.
- Expertise in data warehousing techniques, including data cleansing, handling Slowly Changing Dimensions (SCD), assigning surrogate keys, and implementing change data capture (CDC) for snowflake-schema modelling.
- Strong working experience with SQL and NoSQL databases (Cosmos DB, MongoDB, HBase, Cassandra) for data modelling, tuning, disaster recovery, and data pipelines, plus knowledge of machine learning techniques.
- Experienced in using GitHub and Bitbucket for version control and team collaboration.
- Strong understanding of Microsoft Dynamics CRM for customer relationship management, data integration, and reporting.
- Skilled in SQL, Python (NumPy, Pandas), Spark, Scala, and Spark SQL for aggregating and transforming data in formats such as XML, JSON, CSV, and Parquet.
- Well-versed in DevOps (CI/CD) best practices, including setting up automated build triggers, integration pipelines, and test suites with Jenkins and Docker for consistent deployments and end-to-end monitoring.

EXPERIENCE
State Farm, TX | Data Engineer | Jul 2023 - Present
- Developed healthcare data pipelines integrating S3, Redshift, Glue, Lambda, SageMaker, Airflow, and Spark.
- Created monitoring, access controls, and alerts using IAM, CloudWatch, and security best practices.
- Developed PySpark scripts for data transformations at scale, handling large datasets.
- Optimized SQL queries and tuned sort and distribution keys to improve Redshift query performance.
- Used Boto3 and Python for programmatic interaction with AWS services when orchestrating data pipelines and workflows.
- Designed and developed intuitive Tableau dashboards and visualizations for insurance data stored in the Snowflake data warehouse, enabling data-driven decision-making and insights.
- Automated thousands of ETL jobs with Airflow, transformed data at scale with PySpark, and loaded data into Redshift.

BCBS, TX | Data Engineer | Nov 2022 - May 2023
- Built ETL pipelines, workflows, and CI/CD automation with Azure Data Factory, Databricks, Synapse, and Spark.
- Created interactive Power BI dashboards and optimized SQL scripts and Spark processes.
- Utilized SQL for data exploration, cleaning, and preparing datasets for analytics.
- Created reusable PySpark functions for common ETL transformations within Databricks notebooks.
- Developed Python scripts for data preprocessing, cleaning, and integration tasks within Databricks notebooks.
- Strong working knowledge of Azure data platforms, ETL processing, and traditional RDBMS.

Espire Infolabs, India | Data Engineer | Aug 2019 - Jul 2021
- Wrote efficient SQL queries and scripts for data manipulation using complex DDL, DML, joins, subqueries, window functions, triggers, CTEs, views, packages, functions, and stored procedures.
- Established scalable and reliable data pipelines, ETL processes, and scheduled workflows.
- Designed batch and streaming data pipelines using SSIS, Sqoop, Kafka, Hive, Spark, Hadoop, and Power BI.
- Developed dimensional data models and star schema data marts using Slowly Changing Dimension (SCD) and Change Data Capture (CDC) processes.
- Designed and developed Tableau dashboards and reports for data visualization and analysis.
- Performed data profiling and cleansing using Python scripts and SQL queries.
- Implemented Slowly Changing Dimension (SCD) logic in SQL for handling data mutations.
- Used SQL Server Reporting Services (SSRS) to create and deploy reports, including writing complex SQL queries, creating data sources and datasets, designing report layouts, and scheduling report deliveries, providing valuable insights to business stakeholders.

CERTIFICATION
DP-900: Microsoft Azure Data Fundamentals

EDUCATION
Master's: Data Science, University of North Texas, Denton, Texas (Aug 2021 - May 2023)
Bachelor's: Information Technology, Sastra Deemed University, Tamil Nadu, India (Jun 2016 - Jun 2020)

SKILLS
Programming Languages: Python, SQL, HiveQL, Scala, Java, C++
Databases: MS SQL Server, Oracle, MS Access, MySQL, Teradata, PostgreSQL, DB2
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, Cassandra, Oozie, Apache Spark, Scala, Impala, Apache Kafka
AWS Services: S3, EC2, EMR, Redshift, CloudWatch, Glue, Lambda, Athena, SNS, SQS
Azure Services: ADF (Azure Data Factory), ADLS Gen2 (Azure Data Lake Storage), Synapse Analytics, Databricks, Logic Apps, Function Apps
Methodologies: Agile/Scrum, Waterfall
Development Tools: Eclipse, Microsoft Office Suite (Word, Excel, PowerPoint, Access)
Soft Skills: Problem-solving, communication, collaboration, decision-making