
Big Data Azure Resume Denton, TX

Candidate Information
Title Big Data Azure
Target Location US-TX-Denton
Sr Data Engineer
Name: MANJU RANI
Email: EMAIL AVAILABLE
Mobile: PHONE NUMBER AVAILABLE

Professional Summary:
Around 9 years of professional IT experience working with various legacy database systems, including work experience in Big Data technologies.
Good experience in architecting, designing, and operating large-scale data and analytics solutions on the Snowflake Cloud Data Warehouse.
Experience migrating databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Experience in requirement gathering, system analysis, handling business and technical issues, and communicating with both business and technical users.
Hands-on experience with the complete Software Development Life Cycle (SDLC) using methodologies such as Agile and hybrid methods.
Stay up to date on the latest advancements in data engineering tools and technologies (e.g., Apache Spark, Airflow, Snowflake, Databricks).
Experience analyzing data using the Big Data ecosystem, including HDFS, Hive, HBase, Zookeeper, Pig, Sqoop, and Flume.
Knowledge and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
Good understanding of Apache Airflow.
Designed, implemented, and maintained end-to-end data pipelines on Databricks to ingest, transform, and load large volumes of structured and unstructured data.
Leveraged Databricks Delta Lake for managing and optimizing data storage, ensuring reliability, consistency, and ACID compliance.
Experience in workflow scheduling with Airflow, AWS Data Pipeline, Azure, SSIS, etc.
Good working knowledge of Snowflake and Teradata databases.
Good understanding of Big Data Hadoop and YARN architecture along with the various Hadoop daemons, such as Job Tracker, Task Tracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
Experience in text analytics and data mining solutions for various business problems and in generating data visualizations using SAS and Python.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Hands-on experience building PySpark, Spark Java, and Scala applications for batch and stream processing involving transformations, actions, and Spark SQL queries on RDDs and DataFrames.
Strong experience writing, troubleshooting, and optimizing Spark scripts using Python and Scala.
Experience with PostgreSQL features such as ACID transactions, row-level locking, and replication.
Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver node, worker nodes, stages, executors, and tasks.
Managed databases and Azure data platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB), SQL Server, Oracle, data warehouses, etc.
Built multiple data lakes.
Implemented security best practices in Azure Databricks, including authentication, authorization, and data encryption, ensuring data security and compliance with Azure data governance standards.
Involved in building data models and dimensional modeling with 3NF, Star, and Snowflake schemas for OLAP and operational data store (ODS) applications.
Collaborated with cross-functional teams to design and implement data solutions using Azure Databricks, showcasing teamwork and communication skills in a collaborative Azure environment.
Stay up to date with the latest Azure Databricks features, updates, and best practices, continuously improving skills and knowledge in Azure Databricks and data processing technologies in the Azure ecosystem.
Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
Experience in development and support with Oracle, SQL, PL/SQL, and T-SQL queries.
Experienced in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
Experienced in building highly scalable big data solutions using NoSQL column-oriented databases such as Cassandra, MongoDB, and HBase by integrating them with the Hadoop cluster.
Implemented advanced Spark optimizations, such as partitioning, caching, and parallelism, to improve query performance and resource utilization.
Experience using PostgreSQL in cloud environments such as AWS, Azure, and GCP.
Experience with Jira, Confluence, and Rally for project management, and with the Oozie and Airflow scheduling tools.
Excellent experience in creating cloud-based solutions and architecture using Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, EMR, Glue) and Microsoft Azure.
Experienced in technical consulting and end-to-end delivery with architecture, data modeling, data governance, and design, development, and implementation of solutions.
Experience designing, developing, and maintaining data pipelines using PostgreSQL.
Experience in the Big Data Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
Extensive working experience in an Agile environment using a CI/CD model.
Strong scripting skills in Python, Scala, and UNIX shell.
Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries, and incorporating complex UDFs in business logic.

Technical Skills:
Big Data & Hadoop Ecosystem: Hadoop 3.3/3.0, Hive 2.3, Solr 7.2, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets
Cloud Technologies: AWS, Glue, EC2, EC3, EMR, Redshift, MS Azure, Snowflake
Programming Languages: Python, Scala, SQL, Java, C/C++, Shell Scripting
Data Modeling Tools: Erwin R9.7, ER Studio v16
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server
RDBMS / NoSQL Databases: Oracle 19c, Teradata R15, MS SQL Server 2019, Cosmos DB, Cassandra 3.11, HBase 1.2
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio 2016, Visual SourceSafe
Operating Systems: Windows 10/8, Unix, Sun Solaris
ETL/Data Warehouse Tools: Informatica 9.6, SAP Business Objects XI R3.1/XI R2, Talend, Tableau
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
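The summary above repeatedly references end-to-end Databricks pipelines that land data in Delta Lake. As a hedged illustration only (the paths, column names, and cleaning rules below are hypothetical and not taken from the resume), a minimal PySpark batch job of that shape might look like this:

    from pyspark.sql import SparkSession, functions as F

    # Minimal sketch of a Databricks-style batch pipeline: ingest CSV, apply
    # basic transformations, and write to a Delta table. All names are illustrative.
    spark = SparkSession.builder.appName("orders-ingest-sketch").getOrCreate()

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/mnt/raw/orders/"))          # hypothetical landing path

    cleaned = (raw
               .dropDuplicates(["order_id"])                        # basic de-duplication
               .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize timestamps
               .withColumn("order_date", F.to_date("order_ts"))     # derive partition column
               .filter(F.col("amount") > 0))                        # drop invalid rows

    (cleaned.write
     .format("delta")                 # Delta Lake provides the ACID guarantees noted above
     .mode("overwrite")
     .partitionBy("order_date")
     .save("/mnt/curated/orders"))    # hypothetical curated-zone path

On Databricks the Delta format is available out of the box; running the same sketch elsewhere would additionally require the delta-lake libraries.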
Client: AT&T, St. Louis    Dec 2023 - Till Date
Role: Sr. Data Engineer
Responsibilities:
Involved in building database models and views utilizing Python to build an interactive web-based solution.
Collaborated with other developers to handle complicated issues related to deployment of Django-based applications.
Handled development and management of front-end user interfaces using HTML, CSS, jQuery, and JavaScript.
Modified existing Python/Django modules to deliver certain formats of data and to add new features.
Automated a reporting process using Python, Luigi (a library for task workflows and dependencies), and other APIs.
Wrote Python scripts using libraries such as pandas and NumPy to perform read/write operations on large CSV files, run data aggregations, and compare data by column.
Developed real-time data processing solutions using Databricks Structured Streaming, processing streaming data from Kafka, Kinesis, and other message brokers.
Involved in porting the existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
Experience integrating Python REST API frameworks using Django.
Working experience of data warehouse ETL design and implementation of complex big data pipelines using Python, PySpark, shell scripts, Oracle Scheduler, Luigi, Oracle, PL/SQL, etc.
Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using Azure Synapse SQL activities, demonstrating experience with data pipeline development and integration with Azure services.
Designed and optimized streaming workflows for low-latency data ingestion, transformation, and analytics in Databricks notebooks.
Used Azure Databricks for processing large-scale data with distributed computing, including data ingestion, transformation, and analysis tasks.
Developed and implemented machine learning models using Azure Databricks, leveraging its built-in machine learning libraries and distributed computing capabilities.
Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
Designed, developed, and managed the data warehouse in Redshift and Snowflake and the data lake for analytics and reporting.
Responsible for building scalable distributed data solutions using Hadoop.
Leveraged Azure Databricks notebooks for interactive data exploration, visualization, and analysis in a collaborative notebook environment.
Integrated Azure Databricks with other Azure services, such as Azure Synapse Analytics, Azure Blob Storage, and Azure SQL Database, for seamless end-to-end data processing and analysis workflows.
Built data pipelines and data integration using Snowflake Snowpipe.
Utilized Azure Databricks for real-time data processing and stream analytics.
Designed workflows using Airflow to automate the services developed for change data capture.
Implemented data quality checks and validation routines within Databricks pipelines to ensure data accuracy, completeness, and consistency.
Implemented monitoring and alerting solutions for Databricks workloads using tools such as Prometheus, Grafana, and Databricks monitoring to proactively identify and address performance issues.
Used Jenkins to deploy code into different environments and to schedule jobs.
Expertise in Apache Spark and Databricks, including Spark SQL, the DataFrame API, and Spark MLlib, for distributed data processing, analytics, and machine learning.
Used bug-tracking tools such as Jira and Confluence and the version control systems Git and GitLab.
Environment: Python, Django, Luigi, Windows, Linux, MySQL, SQL, Cassandra, AWS RDS, AWS S3, AWS EC2, Kafka, JSON, RESTful API, MVC architecture, GitLab, Agile, PostgreSQL, Enterprise Scheduler, Bitvise SSH Client, Scrum, JIRA, Git.
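One of the bullets above describes Databricks Structured Streaming jobs that consume from Kafka. As a hedged sketch only (the broker address, topic, schema, and output paths are hypothetical, not taken from the resume), such a job might be wired up as follows:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Hypothetical schema for JSON events arriving on the Kafka topic.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_ts", StringType()),
        StructField("amount", DoubleType()),
    ])

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
              .option("subscribe", "events")                      # hypothetical topic
              .option("startingOffsets", "latest")
              .load())

    # Parse the Kafka value bytes as JSON and normalize the timestamp column.
    parsed = (events
              .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
              .select("e.*")
              .withColumn("event_ts", F.to_timestamp("event_ts")))

    # Append the parsed stream to a Delta table, with checkpointing for recovery.
    query = (parsed.writeStream
             .format("delta")
             .option("checkpointLocation", "/mnt/chk/events")      # hypothetical path
             .outputMode("append")
             .start("/mnt/curated/events"))                        # hypothetical path

    query.awaitTermination()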
Client: Elevance Health, Texas    Sep 2022 - Nov 2023
Role: Data Engineer
Responsibilities:
Responsible for building scalable distributed data solutions using Hadoop.
Involved in the Agile development process (Scrum and sprint planning).
Handled Hadoop cluster installations in a Windows environment.
Migrated the on-premises environment to GCP (Google Cloud Platform).
Migrated data warehouses to the Snowflake data warehouse.
Defined virtual warehouse sizing in Snowflake for different types of workloads.
Integrated and automated data workloads to the Snowflake warehouse.
Built and maintained data pipelines on AWS Databricks using Python and SQL.
Utilized dbt Cloud/Core to architect and develop data models, ensuring scalability, flexibility, and maintainability of data pipelines for analytics.
Proficient in Snowflake's data loading mechanisms, including bulk loading and Snowpipe.
Managed resources in Spark-on-Kubernetes by leveraging the improvements brought by the Kubernetes Resource Staging Server (RSS), ensuring efficient resource allocation and management.
Built PySpark, Spark Java, and Scala applications for batch and stream processing.
Created tables in the Snowflake database, loading and analyzing data using Spark Scala scripts.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
Wrote POCs in Python to analyze data quickly before applying big data solutions to process it at scale.
Responsible for data governance rules and standards to maintain the consistency of business element names.
Built a data warehouse on the Azure platform using Azure Databricks and Data Factory.
Developed a data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
Designed ETL using internal/external tables and stored the data in Parquet format for efficiency.
Involved in porting the existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
Designed and implemented Snowflake data models, schemas, tables, and views to support efficient data organization and querying.
Extracted data from data lakes and the EDW into relational databases for analysis and deeper insights using SQL queries and PySpark.
Worked with real-time streaming data processing using AWS Databricks Streaming, integrating with AWS services such as Kinesis and Kafka.
Later migrated applications from Django to Flask and from NoSQL (DynamoDB) to SQL (Snowflake).
Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
Designed and implemented Data Vault 2.0 methodologies using the automated package within dbt, ensuring scalability and flexibility in data warehousing solutions.
Wrote Sqoop scripts for importing and exporting data between RDBMS and HDFS.
Set up a data lake in Google Cloud using Google Cloud Storage, BigQuery, and Bigtable.
Developed scripts in BigQuery and connected it to reporting tools.
Designed workflows using Airflow to automate the services developed for change data capture.
Carried out data transformation and cleansing using SQL queries and PySpark.
Used Kafka and Spark Streaming to ingest real-time or near-real-time data into HDFS.
Worked on downloading BigQuery data into Spark DataFrames for advanced ETL capabilities.
Built reports for monitoring data loads into GCP and drove reliability at the site level.
Participated in daily stand-ups, bi-weekly scrums, and PI planning.
Environment: Hadoop, GCP, BigQuery, Snowflake DB, Bigtable, Spark, Sqoop, ETL, HDFS, Snowflake DW, Oracle, SQL, MapReduce, Kafka, and Agile process.
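Several bullets above mention loading data into Snowflake from Python. As a hedged illustration only (the connection parameters, table, and DataFrame are hypothetical, and the real pipelines presumably relied on the SnowSQL/Snowpipe setup named in the resume), a minimal load using the snowflake-connector-python package could look like this:

    import pandas as pd
    import snowflake.connector
    from snowflake.connector.pandas_tools import write_pandas

    # Hypothetical credentials and identifiers; in practice these would come
    # from a secrets manager rather than literals in code.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    # Stand-in for data extracted upstream; column names are uppercase so they
    # match the unquoted (uppercased) identifiers of the target table.
    df = pd.DataFrame({"ORDER_ID": [1, 2], "AMOUNT": [10.5, 20.0]})

    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS ORDERS_STG (ORDER_ID NUMBER, AMOUNT FLOAT)")

    # Bulk-load the DataFrame into the staging table.
    success, _, nrows, _ = write_pandas(conn, df, "ORDERS_STG")
    print(f"loaded={success}, rows={nrows}")

    conn.close()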
Client: Aetna, Connecticut    Sep 2018 - Mar 2022
Role: Data Engineer
Responsibilities:
Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
Conducted technical orientation sessions using documentation and training materials.
Gathered business requirements from business partners and subject matter experts.
Served as a technical expert guiding choices to implement analytical and reporting solutions for the client.
Worked closely with the business, other architecture team members, and global project teams to understand, document, and design data warehouse processes and needs.
Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
Developed code using Apache Spark and Scala, IntelliJ, NoSQL databases (Cassandra), Jenkins, Docker pipelines, GitHub, Kubernetes, the HDFS file system, Hive, Kafka for streaming real-time data, Kibana for monitoring logs, and authentication/authorization to the data.
Responsible for deployments to DEV, QA, PRE-PROD (CERT), and PROD using AWS.
Developed a reconciliation process to make sure the Elasticsearch index document count matches the source records.
Maintained Tableau functional reports based on user requirements.
Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
Used Agile (Scrum) methodologies for software development.
Developed data pipelines to consume data from the enterprise data lake (MapR Hadoop distribution - Hive tables/HDFS) for analytics solutions.
Created Hive external tables to stage data and then moved the data from staging to main tables.
Wrote complex Hive queries to extract data from heterogeneous sources (data lake) and persist the data into HDFS.
Implemented the big data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
Developed incremental and full-load Python processes to ingest data into Elasticsearch from Hive.
Pulled data from the data lake (HDFS) and massaged the data with various RDD transformations.
Created Oozie workflow and coordinator jobs to kick off jobs on time for data availability.
Developed REST services to write data into an Elasticsearch index using Python Flask specifications.
Developed complete end-to-end big data processing in the Hadoop ecosystem.
Used AWS Cloud with infrastructure provisioning and configuration.
Created dashboards for analyzing POS data using Tableau.
Developed Tableau visualizations and dashboards using Tableau Desktop.
Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Deployed RMAN to automate backups and maintained scripts in the recovery catalog.
Worked on QA of the data and on adding data sources, snapshots, and caching to the report.
Environment: AWS, Python, Agile, Hive, Oracle 12c, Scala 2.1.1, Tableau, HDFS, PL/SQL, Snowflake, Sqoop, Flume
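One bullet above mentions REST services that write data into an Elasticsearch index using Python Flask. As a hedged sketch only (the endpoint path, index name, and cluster URL are hypothetical, and the client call assumes the elasticsearch-py 8.x API), such a service might look like this:

    from flask import Flask, request, jsonify
    from elasticsearch import Elasticsearch

    app = Flask(__name__)
    es = Elasticsearch("http://localhost:9200")   # hypothetical cluster endpoint

    @app.route("/events", methods=["POST"])
    def index_event():
        # Accept a JSON document and index it into a hypothetical 'events' index.
        doc = request.get_json(force=True)
        resp = es.index(index="events", document=doc)   # 'document=' is the 8.x keyword
        return jsonify({"result": resp["result"], "id": resp["_id"]}), 201

    if __name__ == "__main__":
        app.run(port=5000)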
Client: Zensar Technologies    Aug 2015 - May 2018
Role: Data Analyst
Responsibilities:
Effectively led client projects involving heavy Python, SQL, Tableau, and data modelling work.
Performed data merging, cleaning, and quality control procedures by programming data object rules into a database management system.
Created detailed reports for management.
Reported daily on returned survey data and thoroughly communicated survey progress statistics, data issues, and their resolution.
Involved in data analysis and quality checks.
Extracted data from source files, transformed it, and loaded it to generate CSV data files with Python programming and SQL queries.
Stored and retrieved data from data warehouses.
Created the source-to-target mapping spreadsheet detailing the source, the target data structure, and the transformation rules around it.
Worked on writing Scala programs using Spark on YARN for analyzing data.
Wrote Python scripts to parse files and load the data into a database, used Python to extract weekly information from the files, and developed Python scripts to clean the raw data.
Worked extensively with the Tableau business intelligence tool to develop various dashboards.
Worked on datasets of various file types, including HTML, Excel, PDF, and Word, and their conversions.
Analyzed data from company databases to drive optimization and improvement of product development, marketing techniques, and business strategies.
Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Performed database and ETL development per new requirements, and was actively involved in improving overall system performance by optimizing slow-running and resource-intensive queries.
Developed data mapping documentation to establish relationships between source and target tables, including transformation processes, using SQL.
Participated in data modelling discussions and provided input on both logical and physical data modelling.
Reviewed performance test results to ensure all test results meet requirement needs.
Created a master data workbook representing the ETL requirements, such as mapping rules, physical data element structures, and their descriptions.
Environment: Oracle 10g, UNIX Shell Scripts, MS Excel, Scala, MS PowerPoint, Python, SQL.
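Several bullets above describe Python scripts that parse raw files, clean the data, extract weekly information, and generate CSV outputs. As a hedged sketch only (the file names, columns, and cleaning rules are hypothetical, not taken from the resume), such a script might look like this:

    import pandas as pd

    # Hypothetical raw survey extract; the real inputs and columns are not named in the resume.
    raw = pd.read_csv("survey_raw.csv")

    cleaned = (
        raw
        .drop_duplicates(subset=["respondent_id"])   # remove duplicate responses
        .dropna(subset=["response_date"])            # drop rows missing key fields
        .assign(
            response_date=lambda d: pd.to_datetime(d["response_date"], errors="coerce"),
            score=lambda d: pd.to_numeric(d["score"], errors="coerce").fillna(0),
        )
    )

    # Simple weekly roll-up, similar to the "extract weekly information" bullet.
    weekly = (
        cleaned
        .set_index("response_date")
        .resample("W")["score"]
        .agg(["count", "mean"])
        .reset_index()
    )

    weekly.to_csv("survey_weekly_summary.csv", index=False)   # hypothetical output file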
