Senior Data Engineer Resume Chandler, AZ
Candidate Information
Title: Senior Data Engineer
Target Location: US-AZ-Chandler
Candidate's Name
Mobile: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE | LinkedIn: https://LINKEDIN LINK AVAILABLE

Professional Summary
12+ years of diverse experience in the data space, including requirement gathering, design, development, and implementation of various applications, as well as experience as a Data Engineer on cloud platforms such as AWS, Azure, GCP, Snowflake, and Databricks.
Proficient in Druid and other big data technologies, using Spark for data cleansing, data analysis, structured and unstructured data transformations, data modeling, data warehousing, and data visualization with PySpark, Spark SQL, Python, SQL, Airflow, Kafka, SSIS, Sqoop, Oozie, Hive, Tableau, and Power BI.
Experienced in designing and developing Azure cloud solutions, specializing in data migrations, Business Intelligence, ETL, ELT, data integration, and BI report development; hold the DP-203 Azure Data Engineer certification.
Experience with Apache Spark and Apache Hadoop components such as RDDs, DataFrames, Spark Streaming, HDFS, MapReduce, Hive, HBase, Pig, and Sqoop.
Proficient in working with Azure data solutions such as Azure Data Factory, Azure Cosmos DB, large-scale data processing with Azure Data Lake Storage, data streaming with Azure Stream Analytics, Azure Synapse Analytics, and data engineering with Databricks, Spark, and languages such as Python.
Experience in data ingestion and processing using a combination of Azure services including Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
Responsible for designing, developing, and maintaining data transformation pipelines using dbt, enabling data-driven decision-making across the organization.
Experience developing Spark applications using Spark SQL/PySpark in Databricks for ETL across multiple file formats (Parquet, CSV, Avro) to analyze and transform data into customer usage patterns (an illustrative sketch follows this summary).
Working experience creating ADF pipelines using linked services, datasets, and activities to extract and load data from sources such as Azure SQL, ADLS, Blob Storage, and Azure SQL Data Warehouse/Synapse.
Successfully migrated data from on-premises SQL Server to Azure Synapse Analytics (DW) and Azure SQL DB, enabling streamlined access to data from the cloud.
Experience migrating CI/CD from Jenkins to Harness using Kubernetes and Terraform within Azure DevOps.
Well-versed in Python, PySpark, Databricks clusters, Delta Lake, Unity Catalog, Workflows, SQL warehouse endpoints, and Azure Data Factory's dynamic expression language, with a strong understanding of how to use them in data migration, processing, and analysis tasks.
Data migration specialist with hands-on experience executing large-scale data migration projects from on-premises to Google Cloud Platform (GCP) using Talend.
Proficient in building and executing SSIS packages within Azure Data Factory using the SSIS integration runtime, facilitating seamless data movement across on-premises and cloud environments.
Proficient with GitHub and Azure DevOps Repos; deployed Azure DevOps release pipelines with ADF ARM templates and Databricks notebooks.
Utilized Python and PySpark within Azure Databricks to process and analyze data stored across Azure services, providing actionable insights for stakeholders.
Strong understanding of data modeling (relational, dimensional, star and snowflake schemas), data analysis, and data warehouse implementations on Windows and UNIX.
Experience with data cleansing, data profiling, and data analysis; UNIX shell scripting, SQL, and PL/SQL coding.
Experienced in Azure Purview, data governance, risk management, and data flows, providing a comprehensive approach to data management.
Implemented data lineage in Azure Purview from ADF pipelines.
Created data sources such as Oracle, Azure Data Lake Storage Gen2, and Snowflake in Azure Purview.
Created various glossaries in Azure Purview.
Experience managing Azure resource groups, subscriptions, Azure Blob and File Storage, and Azure Active Directory users, groups, and service principals.
Hands-on experience with Apache Airflow, an open-source platform for orchestrating and scheduling data workflows; developed and maintained Airflow DAGs (Directed Acyclic Graphs) for automating data pipelines, including task dependencies, scheduling, and monitoring.
Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS; expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
Experienced in optimizing virtual warehouses and SQL queries for cost in Snowflake.
Well-versed in Hadoop distributions and ecosystem components such as HDFS, YARN, MapReduce, Spark, Sqoop, Hive, and Kafka.
Proficient in Spark for processing and manipulating complex data using Spark Core, SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
Demonstrated expertise in cloud-native architecture and on-premises networking, Azure SQL Database, SQL Managed Instance, SQL Elastic Pool on Azure, and SQL Server in Azure VMs.
Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 and text files into AWS Redshift.
Experience using Fivetran for data integration, automating the ETL process and ensuring accurate data is available for analysis promptly.
Utilized Fivetran connectors to integrate data from sources such as Salesforce, HubSpot, and Google Analytics into the data warehouse, enabling confident data-driven decisions.
Expertise in AWS services including S3, EC2, SNS, SQS, RDS, EMR, Kinesis, Lambda, Step Functions, Glue, Redshift, DynamoDB, Elasticsearch, Service Catalog, CloudWatch, and IAM.
Experienced in using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).
Proficient in working with Amazon EC2 for compute, query handling, and storage across a wide range of applications.
Experienced in using AWS S3 to support data transfer over SSL and ensure automatic encryption of data on upload.
Proficient in Docker containerization, packaging applications and dependencies into lightweight, portable containers.
Experienced in deploying Docker containers to streamline application development, deployment, and scalability.
Skilled in the Kubernetes orchestration platform, enabling the deployment, scaling, and management of containerized applications across clusters.
Knowledgeable in leveraging Kubernetes to achieve high availability, fault tolerance, and automated scaling for distributed systems.
Expertise in configuring and deploying Hadoop clusters on AWS EC2 instances.
Proficient in using AWS Glue to extract, transform, and load data from various sources to target destinations.
Experienced in dimensional modeling (star and snowflake schemas), transactional modeling, and slowly changing dimensions (SCD).
Experienced in working with Google Cloud Platform, Cloud Functions, and OKTA integration.
Developed and maintained data pipelines using Airflow, reducing data processing time.
Strong RDBMS concepts; created and maintained views, stored procedures, user-defined functions, and system functions using SQL Server and T-SQL, and worked on the design of star and snowflake schemas.
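
Illustrative sketch (not taken from any specific project above): a minimal PySpark job of the kind described in the Databricks ETL bullet, reading raw files in more than one format, applying simple cleansing, and writing a curated dataset. The paths, column names, and aggregation are hypothetical placeholders.

# Minimal PySpark ETL sketch: read raw files in several formats, apply a simple
# cleansing/transformation step, and write a curated dataset.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("multi_format_etl_sketch").getOrCreate()

# Ingest raw data from different file formats (locations are illustrative).
orders_csv = (spark.read.option("header", True).option("inferSchema", True)
              .csv("/mnt/raw/orders/*.csv"))
events_parquet = spark.read.parquet("/mnt/raw/events/")

# Basic cleansing: drop duplicates, normalize the timestamp, filter bad rows.
orders = (orders_csv
          .dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .filter(F.col("amount") > 0))

# Example "usage pattern" style output: daily spend and event count per customer.
daily_usage = (orders.join(events_parquet, "customer_id", "left")
               .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
               .agg(F.sum("amount").alias("daily_spend"),
                    F.count("*").alias("event_count")))

# Land the curated output for downstream consumers (e.g., a warehouse load job).
(daily_usage.write.mode("overwrite")
            .partitionBy("order_date")
            .parquet("/mnt/curated/daily_usage/"))

A real pipeline would typically add explicit schemas, data-quality checks, and a load step into a warehouse such as Snowflake; this sketch only shows the basic read-transform-write shape.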
Technical Summary
Programming: Python, PySpark, Scala, Java, Shell Script, Perl, Tcl, PowerShell, SQL, JavaScript, HTML, CSS
Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Informatica, Teradata, Cosmos DB, Neo4j
Cloud Technologies: Microsoft Azure, Azure Data Factory v2, Databricks, Azure Purview, Azure Analysis Services, Azure Data Lake Store, Storage Blob, Logic Apps, Log Analytics, Azure SQL DB, Azure Synapse, Azure Search, Power BI, Terraform, AWS (Lambda, EC2, EMR, Amazon S3, Kinesis, SageMaker, Athena, Redshift, Glue, DynamoDB, Elasticsearch, CloudWatch, IAM), Google Cloud Platform
Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, ZooKeeper, YARN, Apache Spark, Spark MLlib, Python
Frameworks: Flask, Django
Tools: Jupyter, Databricks Notebooks, PyCharm, Eclipse, DBT, Visual Studio, OKTA, SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, Informatica, Talend, SSIS, Postman, Maven, Jenkins, Harness, Terraform, Kubernetes, Docker
Versioning Tools: SVN, Git, GitHub
Network Security: Kerberos
Database Modelling: Dimensional Modelling, ER Modelling, Star Schema Modelling, Snowflake Modelling
Monitoring/Workflow Tools: Apache Airflow, Agile, Jira, Rally
Visualization/Reporting: Tableau, ggplot2, Matplotlib, SSRS, Power BI
Web Technologies: HTML, CSS, Bootstrap, JavaScript
Machine Learning: Pandas, SparkML

Experience

Sr. Big Data Engineer
HSBC, Los Angeles, CA
Mar 2023 - Present

Key Responsibilities:
Collaborating with managers and leads to understand project requirements and create functional specifications for Azure-based data platforms.
Led the initiation of an Azure data governance project to address data security and compliance requirements.
Implemented data classification policies to categorize data based on sensitivity and regulatory requirements.
Ensured consistent tagging of data assets to enforce security and compliance standards.
Configured and deployed Azure Data Catalog to centralize metadata management and data asset discovery.
Maintained the data catalog to provide a unified view of available data resources.
Established data lineage tracking to visualize the flow of data within the organization.
Conducted impact analysis to assess the consequences of data source changes on downstream processes.
Implemented data quality improvement measures, including data profiling, data cleansing, and data monitoring.
Implemented data lineage in Azure Purview from ADF pipelines.
Created data sources such as Oracle, Azure Data Lake Storage Gen2, and Snowflake in Azure Purview.
Created various glossaries in Azure Purview.
Reduced data errors and inconsistencies, resulting in improved data accuracy.
Developed a scalable ETL pipeline using PySpark on Azure Databricks, loaded the enriched data into Azure Data Lake, and finally loaded it into Snowflake.
Leveraged Cloudera's Apache Hadoop and Apache Spark for scalable, distributed data processing, and designed and implemented optimized data pipelines for efficient data ingestion, processing, and analysis.
Employed Cloudera tools such as Apache Hive and Apache Spark to optimize query performance and accelerate data processing, and developed data management strategies for large-scale datasets, including data partitioning, compression, and indexing.
Developed Spark Scala scripts and UDFs to read from Azure Blob Storage and perform transformations on large datasets using Azure Databricks.
Developed scalable data ingestion pipelines on an Azure HDInsight Spark cluster using Spark SQL; also worked with Cosmos DB.
Developed a robust ETL pipeline in Azure Data Factory to integrate data from on-premises sources to the cloud, applying transformations with Python and PySpark to load enriched data into Azure SQL Data Warehouse.
Configured Spark Streaming to receive real-time data from Apache Flume and store the streamed data in Azure Table storage using Scala; utilized the Spark Streaming API to stream data from various sources and optimized existing Scala code to improve performance.
Created pipelines in ADF using linked services to extract, transform, and load data from multiple sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory and Azure HDInsight.
Implemented dimensional data modeling to deliver multi-dimensional star schemas and developed snowflake schemas by normalizing dimension tables as appropriate.
Designed and implemented new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and installed data management gateways on VMs to connect to on-premises servers.
Implemented Unity Catalog in Azure Databricks, enhancing data security, lineage tracking, and access control.
Streamlined data management processes by integrating Unity Catalog with existing data pipelines, improving data accessibility and operational efficiency.
Developed Spark DataFrames from various datasets and applied business transformations and data cleansing operations in Azure Databricks.
Demonstrated expertise in using Apache Spark's core functionality for distributed data processing.
Proficient in writing Spark applications using Scala, Java, and/or Python.
Experience using Spark SQL for querying structured data within Spark.
Expertise in working with Spark DataFrames and Datasets for efficient data manipulation and transformation.
Hands-on experience with Spark Streaming for real-time data processing and analytics.
Implemented schema evolution and enforcement within Delta tables to dynamically handle changes in data structure, reducing ETL errors and simplifying maintenance (see the illustrative sketch after this section).
Employed Delta Lake's time travel feature for data auditing and version control, enabling easy rollback to previous data states and ensuring compliance with data governance standards.
Integrated Delta tables with BI tools (e.g., Power BI, Tableau) to provide real-time data insights, enhancing decision-making for stakeholders.
Implemented machine learning algorithms using Spark MLlib for scalable machine learning tasks.
Implemented version-controlled CI/CD pipelines using tools such as Jenkins and Git, ensuring consistent and reliable deployments across multiple environments.
Managed versioning of application code and infrastructure as code (IaC) within CI/CD pipelines, enabling automated rollbacks and traceability of changes.
Integrated CI/CD workflows with version control systems (e.g., Git), ensuring seamless collaboration, code reviews, and version management among development teams.

ENVIRONMENT: Microsoft Azure, Azure Purview, Databricks, PySpark, Python, Azure Data Factory V2, Azure Synapse (DWH), Azure Data Lake, Azure Analysis Services, Power BI, Azure Logic Apps, Snowflake, Oracle.
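
Illustrative sketch for the Delta Lake bullets above (schema evolution and time travel): a minimal PySpark example, assuming a Databricks or otherwise Delta-enabled Spark session; the table path and columns are hypothetical.

# Sketch of Delta Lake schema evolution and time travel in PySpark.
# Assumes a Delta-enabled Spark session (e.g., Databricks); path and columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta_sketch").getOrCreate()
delta_path = "/mnt/curated/claims_delta"  # hypothetical table location

# Initial load with a two-column schema.
base = spark.createDataFrame([(1, "open"), (2, "closed")], ["claim_id", "status"])
base.write.format("delta").mode("overwrite").save(delta_path)

# A later batch arrives with an extra column; mergeSchema lets the table evolve
# instead of failing the ETL job on a schema mismatch.
update = spark.createDataFrame([(3, "open", "AZ")], ["claim_id", "status", "state"])
(update.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")
       .save(delta_path))

# Time travel: read the table as of an earlier version for auditing or rollback checks.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
v0.show()

The mergeSchema option is what allows the second batch's extra column to be absorbed, and versionAsOf reads provide the audit and rollback view described above.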
Sr. Data Engineer
Bank of America, Plano, Texas
Jun 2019 - Mar 2023

Key Responsibilities:
Collaborating with managers and leads to understand project requirements and create functional specifications for Azure-based data platforms.
Utilized Fivetran for data integration and automation of the ETL process.
Integrated data from various sources such as Salesforce and HubSpot.
Enabled data-driven decision-making with accurate and timely data in the data warehouse.
Developed architecture and data models for Azure Data Lake and Azure Data Factory to enable the migration of claims data from upstream databases such as Oracle and from storage accounts.
Responsible for orchestrating the Extract, Transform, and Load (ETL) process from various source systems into Azure data storage services.
Created linked services, datasets, and pipelines in Azure Data Factory to extract, transform, and load data into Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure Synapse DW.
Developed and maintained data pipelines using Apache Airflow, automating the processing and transformation of large-scale datasets.
Implemented Kafka high-level consumers to read data from Kafka partitions and move it into HDFS (see the illustrative sketch after this section).
Designed and implemented a real-time analytics system using Druid, Apache Kafka, and Apache Flink to analyze user behavior on the banking platform.
Worked closely with the data team to ensure data integrity, quality, and consistency across all dbt models and downstream applications.
Monitored and troubleshot dbt pipelines, identifying and resolving performance bottlenecks and issues.
Implemented complex task dependencies and scheduling within Airflow, ensuring efficient and reliable execution of data workflows.
Worked with other Databricks admins and users to ensure effective use of Unity Catalog, including providing training and support to users and troubleshooting issues with other admins.
Utilized Python (PySpark), Scala, and Spark SQL to build efficient Databricks notebooks, facilitating the smooth transformation of data from raw to stage and curated zones within Azure Data Lake Gen2.
Configured Spark clusters and optimized high-concurrency clusters in Azure Databricks to improve data processing performance.
Extracted, transformed, and loaded data from source systems to Azure data storage using Azure Data Factory, leveraging Azure Data Lake Storage Gen2 to store various file formats such as Excel and Parquet files.
Created data models in Azure Analysis Services to enable users to query data directly from Power BI.
Developed complex stored procedures, functions, views, and tables to meet application requirements.
Created distribution tables in Azure Synapse DB and implemented user-level security on Synapse DB tables.
Automated error notification and Power BI data auto-refresh using Logic Apps within the Azure Data Factory (V2) framework using Web Activity.
Implemented Azure DevOps branching and merging processes for team development in Databricks notebooks, Azure Analysis Services tabular cubes, Azure Synapse, and Azure Data Factory pipelines for continuous integration (CI).
Worked on snowflake schema, data modeling and elements, source-to-target mappings, interface matrices, and design elements; performed data quality issue analysis using SnowSQL by building analytical warehouses on Snowflake.
Implemented StreamSets for a real-time data integration project in the financial services sector; the objective was to ingest data from sources such as relational databases and APIs, transform it in real time, and load it into Kafka and HDFS for downstream analytics.
Leveraged Azure Databricks to process and analyze data, including creating data flows, and implemented data governance and lineage tracking in Azure Purview.
Collaborated closely with data architects and engineers to establish robust data governance and cataloging practices, taking an active role in implementing Master Data Management (MDM) and security measures, including key vault management and network security at both schema and row levels.
Analyzed claims data using the Content Manager and CRS applications to identify trends and insights.

ENVIRONMENT: Microsoft Azure, Databricks, PySpark, Python, Azure Data Factory V2, Azure Synapse (DWH), Azure Analysis Services, Power BI, Azure Logic Apps, Snowflake, Content Manager, CRS, Azure Purview.
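
Illustrative sketch for the Kafka-to-HDFS ingestion bullet above: one common way to implement such a pipeline is Spark Structured Streaming rather than a hand-written consumer. The broker address, topic name, and paths are placeholders, and the spark-sql-kafka connector is assumed to be available.

# Sketch of moving Kafka data into HDFS with Spark Structured Streaming.
# Broker, topic, and paths are hypothetical; requires the spark-sql-kafka connector.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka_to_hdfs_sketch").getOrCreate()

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "transactions")                # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast the payload and keep the event timestamp.
events = raw.select(F.col("value").cast("string").alias("payload"),
                    F.col("timestamp").alias("event_ts"))

# Continuously append the stream to HDFS as Parquet, with checkpointing for recovery.
query = (events.writeStream.format("parquet")
         .option("path", "hdfs:///data/landing/transactions/")
         .option("checkpointLocation", "hdfs:///checkpoints/transactions/")
         .outputMode("append")
         .start())
query.awaitTermination()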
AWS Data Engineer
AGFA HealthCare, Bangalore, India
Mar 2016 - May 2019

Key Responsibilities:
Worked on building a centralized data lake on AWS Cloud utilizing primary services such as S3, EMR, Redshift, and Athena.
Led the team in deploying methodologies, ensuring cohesiveness and adherence to best practices.
Led data curation efforts to ensure high-quality datasets, including data validation, cleansing, and monitoring processes.
Involved in designing, developing, and maintaining SAP BW flows and reports used for business intelligence and data analytics, including data extraction, transformation, and loading (ETL) processes, data modeling, and report development using SAP BW tools.
Experience creating HANA views, virtual data models that simplify data access and enable faster query processing; used them for reporting, analytics, and other data-driven applications.
Worked on migrating datasets and ETL workloads with Scala from on-premises to AWS cloud services.
Extensive experience utilizing the ETL process to design and build very large-scale data pipelines using Apache Spark.
Migrated data from the local Teradata data warehouse to AWS S3 data lakes.
Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
Worked extensively on building and automating data pipelines and moving terabytes of data from existing data warehouses to the cloud.
Responsible for data ingestion projects loading data into the data lake from multiple source systems using Talend Big Data.
Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
Developed Python code to gather data from HBase and designed the solution using PySpark.
Developed and modified IBM DataStage ETL jobs, Python, and UNIX Bash scripts to process feeds and load data into the target database.
Created wrapper scripts using UNIX shell scripting to execute Teradata utilities.
Developed and optimized Python-based ETL pipelines in both legacy and distributed environments.
Developed Spark-with-Python pipelines using Spark DataFrame operations to load data to the EDL, using EMR for job execution and AWS S3 as the storage layer.
Worked closely with business and data science teams to ensure all requirements were translated accurately into data pipelines.
Migrated on-premises data (SQL Server / MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
Worked on the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption with Python.
Extracted data from AWS Aurora databases for big data processing.
Developed AWS Lambda functions using Python and Step Functions to orchestrate data pipelines.
Worked on automating infrastructure setup and the launching and termination of EMR clusters (see the illustrative sketch after this section).
Created Hive external tables on top of datasets loaded into AWS S3 buckets and created various Hive scripts to produce aggregated datasets for downstream analysis.
Used Scala data pipelines to perform transformations on EMR clusters and loaded the transformed data into S3 and from S3 into Redshift.
Worked on creating Kafka producers using the Kafka Java Producer API to connect to external REST live-stream applications and produce messages to Kafka topics.

Environment: AWS S3, Lambda, SNS, EMR, Kinesis, SageMaker, DynamoDB, Elasticsearch, CloudWatch, Redshift, Aurora, Athena, Glue, Talend, Spark, Python, Java, Hive, Kafka.
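
Illustrative sketch for the EMR automation bullet above: launching a transient EMR cluster with a single Spark step via boto3. The cluster name, instance types, bucket names, and IAM role names are placeholders and would differ per environment.

# Sketch of launching a transient EMR cluster with one Spark step via boto3.
# All names, types, buckets, and roles below are hypothetical placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="nightly-spark-etl",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate the cluster after the step
    },
    Steps=[{
        "Name": "daily-aggregation",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://example-bucket/jobs/daily_aggregation.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",
)
print("Started cluster:", response["JobFlowId"])

Because KeepJobFlowAliveWhenNoSteps is False, the cluster terminates itself once the step finishes, which keeps costs down for batch workloads.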
Data Engineer
BSW Soft, Bangalore, India
May 2013 - Jan 2016

Key Responsibilities:
Configured AWS S3 buckets with policies for automatic archiving of infrequently accessed data to lower-cost storage classes.
Designed and implemented real-time data streaming solutions using Solace, utilizing its messaging capabilities to enable timely data delivery and event-driven processing.
Configured Solace message queues and topics for efficient and reliable data streaming and collaborated with the development team to ensure seamless integration of Solace within the project architecture.
Demonstrated a commitment to continuous improvement by exploring and adopting new features and capabilities of Solace and Cloudera.
Implemented DQ/data validation checks using Spark on incoming messages/beacons and stored the results in Elasticsearch; these DQ results were then displayed in Kibana through Elasticsearch integration.
Implemented CI/CD by automating Spark job builds and deployments using Jenkins and Ansible.
Developed Python scripts using Boto3 to supplement the automation provided by Terraform, for tasks such as encrypting EBS volumes backing AMIs and scheduling Lambda functions for routine AWS tasks.
Developed scalable data integration pipelines to transfer data from AWS S3 buckets to an AWS Redshift database using Python and AWS Glue.
Built Apache Airflow DAGs to export data to AWS S3 buckets by triggering an AWS Lambda function (see the illustrative sketch after this section).
Analyzed files in the S3 data lake using AWS Athena and AWS Glue without importing the data into a database.
Created ad-hoc tables to add schema and structure to data in AWS S3 buckets using Lambda functions, and performed data validation, filtering, sorting, and transformations for every data change in a DynamoDB table, loading the transformed data into a Postgres database.
Implemented the Hadoop file system (HDFS), AWS S3 storage, and big data formats including Parquet, Avro, and JSON for the enterprise data lake.
Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that loads to the Glue Catalog and can be queried from AWS Athena.
Optimized Druid query performance by tuning memory settings, caching strategies, and segment sizes.
Designed and implemented a real-time analytics system using Druid, Apache Kafka, and Apache Flink to analyze user behavior on the e-commerce platform.
Implemented AWS Elasticsearch for storing massive datasets in a single cluster for extensive log analysis.
Automated ETL operations using Apache Airflow, optimized queries, and fine-tuned performance in AWS Redshift for large dataset migrations.
Configured AWS Redshift clusters, Redshift Spectrum for querying, and data sharing for data transfer between clusters.
Viewed data collected in InfluxDB through Grafana, where the metrics can be queried and visualized easily.
Worked on analyzing Hadoop clusters and different big data analytics tools, including Hive and HBase.
Highly proficient in developing Lambda functions to automate tasks on AWS using CloudWatch triggers, S3 events, and Kinesis streams.
Knowledge of Airflow's built-in features for managing and scaling data pipelines, such as pooling and task prioritization.
Proficient in creating calculated fields, parameters, and groups to generate custom views and analyses, and in sharing reports and dashboards with colleagues and clients, including publishing to Tableau Server or Tableau Online.

ENVIRONMENT: Apache Spark, Python, SQL, AWS S3, Athena, SNS, SQS, Lambda, CloudWatch, DynamoDB, Redshift, AWS Glue, EMR, Apache Kafka, Apache Hadoop, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Flume, Shell Scripting, SFTP, Tableau.
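
Illustrative sketch for the Airflow-to-Lambda export bullet above: a small Airflow DAG whose single task asynchronously invokes an export Lambda. The DAG id, schedule, function name, and payload are hypothetical.

# Sketch of an Airflow DAG that triggers an AWS Lambda export job.
# DAG id, schedule, function name, bucket, and payload are placeholders.
import json
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_export_lambda(**_):
    """Asynchronously invoke the (hypothetical) export Lambda."""
    client = boto3.client("lambda", region_name="us-east-1")
    client.invoke(
        FunctionName="export-events-to-s3",                 # placeholder function name
        InvocationType="Event",                             # fire-and-forget invocation
        Payload=json.dumps({"target_bucket": "example-export-bucket"}),
    )


with DAG(
    dag_id="daily_s3_export",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="invoke_export_lambda",
        python_callable=trigger_export_lambda,
    )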
Big Data Engineer
Apsis Technologies, Bangalore, India
May 2012 - Apr 2013

Key Responsibilities:
Built a GCP cost optimization dashboard and automated GCP instance shutdown/startup based on Stackdriver metrics.
Created a dashboard using a Python Flask RESTful API.
Integrated the Flask application with OKTA.
Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Python API.
Used Spring AOP to implement distributed declarative transactions throughout the application.
Designed and developed Java batch programs in Spring Batch.
Proficient in writing stored procedures, user-defined functions, and views per project requirements.
Experience in database testing to ensure data accuracy and consistency.
Developed datasets using stored procedures and created reports with multi-value parameters.
Deployed SSIS packages and reports in Dev and QA environments and created deployment documentation.
Created SSRS reports and dashboards based on customer specifications, ensuring satisfaction with data visualization.
Generated technical documentation, gathered business requirements, and analyzed data for SSIS and SSRS development.
Deployed SQL databases in the cloud (Azure) and designed SSIS packages to import data from multiple sources to control upstream and downstream flows of data into the SQL Azure database.
Improved VM performance by increasing block size from 4 KB to 64 KB and moved all databases, including system databases, to another drive.
Developed data reconciliation SQL queries and reports to validate the overall data migration and supported all scheduled ETL jobs for batch processing.
Installed and configured Pig and wrote Pig Latin scripts.
Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
Developed workflows using Oozie for running MapReduce jobs and Hive queries.
Involved in loading data from the UNIX file system to HDFS.
Created Java operators to process data using DAG streams and load data to HDFS.
Assisted in exporting analyzed data to relational databases using Sqoop.
Involved in developing monitoring and performance metrics for Hadoop clusters.
Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Environment: Python, Flask, Google Cloud Platform, AWS, S3, Elasticsearch, OKTA, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle, SSIS, SSRS, MS SQL Server, Azure VM.

Education
Bachelor of Computer Science Technology, JNTU Hyderabad University, 2009.
