Senior Cloud Engineer Resume Liberty, NC
Senior Data Engineer
Name: Manali
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE

PROFESSIONAL SUMMARY:
- Senior Data Engineer with 10+ years of IT experience specializing in the analysis, design, development, implementation, and testing of Data Warehousing applications. Proficient in Data Modeling, Data Engineering, Data Extraction, Data Transformation, Data Loading, Data Analysis, and Performance Tuning techniques.
- Strong experience in the Apache Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, and Spark, for large-scale data processing using Java and Scala.
- In-depth understanding of Spark architecture and distributed, parallel processing, utilizing Spark Core, Spark SQL, Spark Streaming, the Spark DataFrame APIs, the Spark execution framework, and PySpark.
- Extensive expertise in AWS services including EC2, S3, EMR, SageMaker, RDS (Aurora), Redshift, DynamoDB, ElastiCache (Memcached and Redis), QuickSight, Athena, Glue, Lambda, and other components of the AWS ecosystem.
- Experienced in designing, developing, documenting, and testing ETL jobs and mappings using Informatica, Talend, SSIS, and Apache NiFi to populate tables in Data Warehouses and Data Marts.
- Strong knowledge of Azure services such as Azure HDInsight, Azure Data Lake, Azure Databricks, Azure Blob Storage, Azure Data Factory (ADF), Azure Synapse Analytics, Azure Stream Analytics, Azure Cosmos DB, Azure Functions, and Azure DevOps.
- Proficient with databases including Oracle, PL/SQL, MySQL, PostgreSQL, Microsoft SQL Server, Cassandra, and MongoDB, including stored procedures, triggers, functions, indexes, and packages.
- Skilled in cloud technologies such as AWS Step Functions, Lambda, QuickSight, CloudWatch, Glue, Athena, Redshift, IAM, EMR, and SNS.
- Expert in migrating big data systems built on Java, Hadoop, HDFS, MapReduce, YARN, Hive, and Pig to Spark and cloud-based systems.
- Hands-on experience with GCP, including BigQuery, GCS buckets, Cloud Functions, cloud migration, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, Dataproc, and Stackdriver.
- Experience in CI/CD pipelines and automation using Git, Jenkins, Terraform, Kubernetes, and Docker.
- Expertise in designing and implementing data warehouses and data marts with concepts such as conformed Facts and Dimensions, Slowly Changing Dimensions (SCD), Change Data Capture (CDC), Surrogate Keys, Star Schema, and Snowflake Schema.
- Experience developing Apache Kafka producers and consumers for streaming millions of events per second, along with expertise in Apache Flink.
- Proficient in performance tuning and optimization techniques, including SQL Trace, Explain Plan, indexes, hints, table partitions, global temporary tables, and materialized views.
- Strong background in relational and data warehouse data modeling, data migration using ETL tools such as Informatica, shell scripting, and data reporting.
- Experience in normalization/denormalization, data extraction, data analysis, data cleansing, data profiling, data manipulation, distributed data processing, and slowly changing dimension techniques.
- Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
- Familiar with both Waterfall and Agile methodologies, using tools such as JIRA and Confluence.
- Good understanding of data modeling (dimensional and relational) concepts such as Star Schema modeling, Snowflake Schema modeling, fact and dimension tables, and OLAP.
- Knowledge of data serialization formats including Avro, Parquet, ORC, JSON, and XML.
- Expertise in data pipeline optimization, data governance, data security, GDPR and CCPA compliance, data encryption, data ingestion, data integration, data processing, data migration, data transformation, data extraction, data loading, real-time data processing, and data modeling.
- Familiar with data profiling, data deduplication, data validation, and data cleansing.
- Experience using Tableau and Power BI for data visualization and reporting.
- Skilled in using Apache Airflow, Luigi, and Dagster for workflow orchestration and management.
EDUCATION:
Bachelors in Information Technology, Mumbai University, India, 2011

TECHNICAL SKILLS AND ABILITIES:
SQL, HDFS, Pig, PL/SQL, Hadoop Hive, Cloudera CDH, T-SQL, Impala, Spark Core, HiveQL, MapReduce, Spark SQL, SparkQL, Greenplum, Spark Streaming, GraphQL, Hadoop Sqoop, SparkR, Hadoop, Spark DataFrames, Spark performance tuning, ADF, Azure Data Lake, AWS DynamoDB, Apache Kafka, Jenkins, Hadoop HBase, Spark MLlib, Informatica, AWS Glue, Snowflake, Oracle SQL, MongoDB, AWS Elasticsearch, PySpark, Spark GraphX, SSIS, AWS Redshift, Apache Iceberg, Teradata, Apache Airflow, Python, Scala, Java

WORK SUMMARY:

LTIMindtree/Delta, Atlanta, GA                                        July 2021 - Current
Senior Data Engineer
- Developed data loading strategies and performed various transformations for analyzing datasets on the Hortonworks Distribution of the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, and Spark.
- Developed PySpark applications using Spark SQL, DataFrames, and transformations through the Python APIs to apply business requirements to Hive staging tables, and loaded the final transformed data into Hive master tables (see the sketch below this role's Environment list).
- Ingested large volumes of credit data from multiple provider data sources into AWS S3; created modular, independent components for S3 connections and data reads.
- Developed Spark code in Python to run on EMR clusters, utilizing Spark Core, Spark SQL, and Spark Streaming.
- Created user-defined functions (UDFs) in Scala to automate business logic in the applications.
- Automated jobs and data pipelines using AWS Step Functions and AWS Lambda, and configured performance metrics in AWS CloudWatch.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Kafka, Pig, and MapReduce.
- Designed AWS Glue pipelines to ingest, process, and store data, interacting with different AWS services.
- Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon EC2 and Amazon S3.
- Developed a process to migrate local logs to CloudWatch for better integration and monitoring.
- Populated database tables via AWS Kinesis Firehose and AWS Redshift.
- Developed Spark code using Python and Spark SQL for faster testing and data processing.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Developed ETL modules and data workflows for solution accelerators using PySpark and Spark SQL.
- Used Spark SQL to process large volumes of structured data.
- Extracted data from MySQL, PostgreSQL, Cassandra, and AWS Redshift into HDFS using Kinesis.
- Developed a PySpark application to create reporting tables with different masking rules in both Hive and MySQL and made them available to newly built fetch APIs.
- Wrote Spark code in Scala for information extraction, transformation, and aggregation from numerous record formats.
- Supported Kafka integrations, including topics, producers, consumers, Schema Registry, Control Center, KSQL, and streaming applications.
- Built and maintained complex statistical routines using PC SAS macros, Enterprise Guide, PL/SQL, and software written by self and others.
- Utilized Apache Airflow, Luigi, and Dagster for workflow orchestration and management.
- Developed data pipelines using Informatica, Talend, SSIS, and Apache NiFi for ETL processes and data integration.
- Performed data scripting and automation tasks using Python and shell scripting.
- Utilized Docker and Kubernetes for containerization and orchestration of applications in CI/CD pipelines.
- Created reports with data visualization tools such as Tableau and Power BI.
- Worked with data serialization formats including Avro, Parquet, ORC, JSON, and XML.
- Ensured GDPR and CCPA compliance, data encryption, data governance, data security, data cleansing, data validation, data profiling, data deduplication, data ingestion, data integration, data processing, data migration, data transformation, data extraction, data loading, real-time data processing, and data pipeline optimization.
- Designed and implemented data warehouses and data marts using Star Schema and Snowflake Schema for OLAP systems, including fact and dimension tables.
- Developed and managed data pipelines for data modeling and ETL processes.
- Used Git, SVN, JIRA, and Confluence for version control and project management.
- Utilized Terraform and Jenkins for infrastructure as code and CI/CD pipeline automation.
- Applied data modeling concepts (dimensional and relational), including normalization/denormalization, slowly changing dimension techniques, and OLAP.
- Developed and maintained data processing frameworks using Apache Flink for real-time data processing.
- Implemented and managed Snowflake as a data warehouse solution for scalable data storage and analytics.
- Leveraged Databricks Delta Lake for optimized data lake management and enhanced data reliability.

Environment: SQL, NoSQL, Python, Java, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, Spark, PySpark, Spark Core, Spark SQL, Spark Streaming, PostgreSQL, MySQL, Cassandra, Informatica, Apache Airflow, Tableau, Docker, Kubernetes, Apache Kafka, Apache Flink, Snowflake, Databricks Delta Lake, Star Schema, Snowflake Schema, OLAP, Fact and Dimension Tables, Avro, Parquet, ORC, JSON, XML, AWS S3, AWS EC2, AWS Glue, AWS Lambda, AWS Athena, AWS EMR, Git, SVN, JIRA, Confluence, Terraform, Jenkins, ETL
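Illustrative sketch of the Hive staging-to-master load pattern described above. This is a minimal, hypothetical example: the database, table, and column names are assumptions, not code from the project.

from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("credit_staging_to_master")
    .enableHiveSupport()
    .getOrCreate()
)

# Read raw credit records landed in the Hive staging table by an upstream ingest job.
staging = spark.table("staging_db.credit_events")

# Apply simple business rules: drop incomplete rows, normalize amounts,
# and derive a processing date used as the Hive partition column.
master = (
    staging
    .filter(F.col("account_id").isNotNull())
    .withColumn("amount_usd", F.round(F.col("amount").cast("double"), 2))
    .withColumn("process_dt", F.to_date(F.col("event_ts")))
)

# Overwrite only the partitions present in this run, then load the master table.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
(
    master.write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("process_dt")
    .saveAsTable("master_db.credit_events")
)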

AT&T, Dallas, TX                                                      Apr 2020 - Jun 2021
Senior Cloud Data Engineer
- Developed detailed specifications and prepared technical documentation to ensure adherence to project guidelines and successful implementation.
- Utilized Azure Databricks, Azure SQL, PostgreSQL, Microsoft SQL Server, and Oracle to design and implement ETL processes, extracting, transforming, and loading data from diverse sources into target databases.
- Migrated existing data pipelines to Azure Databricks using PySpark notebooks to enhance data processing capabilities for the analytics team.
- Designed and developed systems integration solutions and cloud architecture leveraging Azure services such as Azure Data Lake, Azure Data Factory, Azure Stream Analytics, Azure Cosmos DB, Azure Synapse Analytics, Azure Functions, and Azure DevOps.
- Developed a PySpark script to mask raw data using hashing algorithms, enhancing data security for sensitive information (see the sketch below this role's Environment list).
- Analyzed a Hadoop cluster using tools such as Pig, Hive, HBase, MapReduce, YARN, Sqoop, and Oozie, improving data processing and management.
- Performed data transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Spark for retail analytics.
- Loaded tables from Azure Data Lake to Azure Blob Storage and then into Snowflake, creating sophisticated SQL queries for financial and regulatory reporting.
- Configured PostgreSQL streaming replication and Pgpool for load balancing, and optimized performance monitoring using tools such as pgBadger, Kibana, Grafana, and Nagios for an e-commerce application.
- Processed and loaded various gold-layer tables from Delta Lake into Snowflake, optimizing data management and reliability.
- Developed ETL pipelines using SSIS (SQL Server Integration Services) and Informatica to prepare data lakes for various domains, including healthcare and finance.
- Wrote Python API programs to support Apache Spark and PySpark for real-time data processing.
- Developed Hadoop jobs for analyzing data in text files, sequence files, Parquet, Avro, ORC, JSON, and XML using Hive and Pig.
- Analyzed Hadoop clusters and components such as Pig, Hive, Spark, and Impala to improve big data analytics for telecommunications clients.
- Orchestrated and automated ETL data pipelines with Apache Airflow, using it for authoring, scheduling, and monitoring.
- Monitored PostgreSQL databases using Nagios for performance and reliability in a logistics application.
- Integrated Azure Databricks with Snowflake to streamline data processing workflows.
- Designed and implemented Oracle PL/SQL and shell scripts for data import/export, data conversions, and data cleansing for banking applications.
- Imported data from PostgreSQL to HDFS and Hive using Sqoop, facilitating efficient data migration.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats; integrated Kafka with Spark Streaming to handle real-time data processing.
- Utilized Docker and Kubernetes for containerization and orchestration of applications in CI/CD pipelines, improving deployment efficiency across projects.
- Created data visualization reports using Power BI, providing insights and supporting data governance for clients in multiple industries.
- Implemented comprehensive data security measures, ensuring compliance with GDPR and CCPA.
- Automated CI/CD pipelines using Git, SVN, Jenkins, and Terraform, enhancing project management and deployment efficiency for software development projects.
- Managed data warehouses and data marts using Star Schema and Snowflake Schema for OLAP systems, including fact and dimension tables.
- Developed real-time data processing solutions using Apache Flink, optimizing data flow and processing for IoT applications.

Environment: SQL, NoSQL, Java, Scala, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, Spark, PySpark, Spark Core, Spark SQL, Spark Streaming, Microsoft SQL Server, Oracle, MongoDB, SSIS, Python, Apache Airflow, Power BI, Docker, Kubernetes, Apache Kafka, Apache Flink, Snowflake, Azure Synapse Analytics, Databricks Delta Lake, Star Schema, Snowflake Schema, OLAP, Azure Data Lake, Azure Databricks, Azure Data Factory, Azure Stream Analytics, Azure Cosmos DB, Azure Functions, Azure DevOps, Git, SVN, JIRA, Confluence, Terraform, Jenkins, ETL, CI/CD
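Illustrative sketch of the kind of column-hashing step described above, as it might appear in an Azure Databricks PySpark notebook. The column names, mount paths, and salt handling are assumptions for illustration only.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mask_sensitive_columns").getOrCreate()

raw = spark.read.format("delta").load("/mnt/raw/customers")

SENSITIVE_COLS = ["ssn", "email", "phone"]
SALT = "example-salt"  # in practice this would come from a secret scope, not code

masked = raw
for col_name in SENSITIVE_COLS:
    # SHA-256 over salt + value: irreversible, but stable enough for joins and deduplication.
    masked = masked.withColumn(
        col_name,
        F.sha2(F.concat(F.lit(SALT), F.col(col_name).cast("string")), 256),
    )

masked.write.format("delta").mode("overwrite").save("/mnt/curated/customers_masked")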

Farmers Insurance, Boston, MA                                         Jan 2018 - Mar 2020
Data Engineer
- Involved in the development of the Hadoop system and improved multi-node Hadoop cluster performance, utilizing HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, and Oozie.
- Designed a big data analytics platform for processing customer interface preferences and comments using Hadoop technologies such as HDFS, MapReduce, and Hive.
- Contributed to the development of data translation processes, transferring data from the client's relational databases (SQL, MySQL, and Oracle) to the data warehouse.
- Worked as a Hadoop consultant on technologies such as MapReduce, Pig, and Hive to optimize data processing workflows.
- Used Spark RDDs and PySpark to convert Hive/SQL queries into Spark transformations, improving query performance and scalability.
- Installed and configured Pig and wrote Pig Latin scripts to perform data transformations and analysis.
- Developed multiple proofs of concept in Python, deployed them on the YARN cluster, and compared the performance of Spark, Hive, and SQL.
- Analyzed SQL scripts and designed Python solutions to implement them, leveraging PySpark for enhanced processing capabilities.
- Created data platforms, pipelines, and storage systems using Apache Kafka, Apache Flink, and search technologies such as Elasticsearch.
- Implemented solutions for ingesting data from various sources utilizing big data technologies such as Hadoop, Kafka, MapReduce, Hive, and Spark.
- Queried and analyzed data using Spark SQL on top of the PySpark engine to derive insights and generate reports.
- Migrated iterative MapReduce programs to Spark transformations, significantly reducing processing time and improving efficiency.
- Analyzed and processed data stored in S3 using AWS Athena and Glue crawlers, and created Glue jobs for ETL operations (see the sketch below this role's Environment list).
- Implemented AWS services such as EC2, S3, Athena, Glue, Lambda, and Redshift to deploy multi-tier advertiser applications with fault tolerance, high availability, and auto-scaling via AWS CloudFormation.
- Created Spark scripts using Python and shell commands as required for data processing tasks.
- Designed Hive-compatible table schemas on top of raw data in the data lake, partitioned by time and product dimensions, and performed ad-hoc queries using AWS Athena.
- Used Hive to analyze partitioned and bucketed data, calculating various metrics for reporting and business insights.
- Analyzed large datasets using MapReduce programs to determine the best ways to aggregate and report on the data.
- Designed data maps ranging from simple to intricate to meet business requirements.
- Utilized Talend for data integration tasks, enhancing ETL processes and ensuring seamless data flow between systems.
- Employed Docker and Kubernetes for containerization and orchestration, streamlining application deployment and management.
- Used Tableau for data visualization, creating insightful dashboards and reports for stakeholders.
- Applied data cleansing, validation, profiling, and deduplication techniques to ensure data quality and integrity.
- Ensured compliance with GDPR and CCPA regulations by implementing robust data governance and security measures, including data encryption.
- Automated CI/CD pipelines using Git, SVN, Jenkins, and Terraform, enhancing project management and deployment efficiency.
- Managed data warehouses and data marts using Star Schema and Snowflake Schema for OLAP systems, including fact and dimension tables.
- Developed real-time data processing solutions using Apache Flink, optimizing data flow and processing for various applications.
- Employed AWS Glue for ETL tasks and AWS Lambda for serverless data processing to streamline workflows and improve efficiency.
- Ensured data ingestion and integration from multiple sources, maintaining seamless data processing pipelines for continuous data flow.
- Used Luigi for workflow management, improving the orchestration and scheduling of data pipelines.
- Handled NoSQL databases such as MongoDB for flexible data storage solutions.
- Utilized Avro, Parquet, ORC, JSON, and XML serialization and storage formats to enhance data processing efficiency.

Environment: SQL, NoSQL, Python, Java, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, Spark, PySpark, Spark Core, Spark SQL, Spark Streaming, MySQL, Oracle, MongoDB, Talend, Luigi, Tableau, Docker, Kubernetes, Apache Kafka, Apache Flink, AWS Redshift, Databricks Delta Lake, Star Schema, Snowflake Schema, OLAP, Fact and Dimension Tables, Avro, Parquet, ORC, JSON, XML, AWS S3, AWS EC2, AWS Glue, AWS Lambda, AWS Athena, AWS EMR, Git, SVN, JIRA, Confluence, Terraform, Jenkins
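Illustrative skeleton of an AWS Glue ETL job of the kind described above: read a crawled table from the Glue Data Catalog, apply a field mapping, and write partitioned Parquet back to S3. The database, table, field, and S3 path names are hypothetical.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table populated by a Glue crawler over the raw S3 data.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="ad_events"
)

# Keep and rename only the fields the downstream Athena queries need.
mapped = source.apply_mapping(
    [
        ("event_id", "string", "event_id", "string"),
        ("event_ts", "string", "event_ts", "timestamp"),
        ("campaign", "string", "campaign_id", "string"),
        ("spend", "double", "spend_usd", "double"),
    ]
)

# Write curated, partitioned Parquet back to S3 for Athena to query.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={
        "path": "s3://example-bucket/curated/ad_events/",
        "partitionKeys": ["campaign_id"],
    },
    format="parquet",
)
job.commit()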

Fidelity Investments, Boston, MA                                      Sep 2015 - Dec 2017
Hadoop Big Data Engineer
- Worked with HDFS, YARN, MapReduce, Apache Pig, Hive, Sqoop, Oozie, and Kafka frameworks within the Hadoop ecosystem.
- Utilized ETL tools and Hadoop transformations to handle Hadoop data intake using MapReduce, Spark, and Apache NiFi.
- Created scalable distributed data processing solutions using Hadoop's Hive, HBase (NoSQL), and Sqoop.
- Developed several MapReduce tasks in Java for data cleansing and processing.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System).
- Configured shell scripts to monitor Hadoop daemon services, performing health checks and responding to failure or alert situations.
- Extracted useful information from raw data to aid business decision-making.
- Created, deployed, and troubleshot ETL workflows using Hive, Pig, Sqoop, and Spark, integrating with Apache NiFi and Dagster for workflow management.
- Built big data solutions with HBase, handling millions of records for trend analysis and exporting them to Hive.
- Automated scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production environments (see the sketch below this role's Environment list).
- Performed data transformations, cleaning, and filtering on imported data using Hive and MapReduce, then loaded the resulting data into HDFS.
- Created Pig UDFs to preprocess data before analysis, enhancing data quality and accuracy.
- Configured Apache Airflow workflows for managing and scheduling Hadoop jobs.
- Utilized Python for scripting and automation tasks, integrating with PySpark for enhanced data processing.
- Employed Docker and Kubernetes for containerization and orchestration, streamlining deployment processes.
- Developed real-time data processing solutions using Apache Flink, optimizing data flow for various applications.
- Used Tableau and Power BI for data visualization, creating insightful dashboards and reports for stakeholders.
- Automated CI/CD pipelines using Git, SVN, Jenkins, and Terraform, enhancing project management and deployment efficiency.
- Managed data warehouses and data marts using Star Schema and Snowflake Schema for OLAP systems, including fact and dimension tables.
- Handled NoSQL databases such as MongoDB for flexible data storage solutions.
- Utilized Avro, Parquet, ORC, JSON, and XML serialization and storage formats to enhance data processing efficiency.
- Implemented data integration and ingestion pipelines using Talend and Apache NiFi.
- Utilized Azure services such as Azure Synapse Analytics, Azure Data Lake, Azure Databricks, Azure Data Factory, Azure Stream Analytics, Azure Cosmos DB, Azure Functions, and Azure DevOps for cloud-based data solutions.
- Applied data cleansing, validation, profiling, and deduplication techniques to ensure data quality and integrity.
- Designed and implemented extract, transform, and load (ETL) processes to facilitate seamless data migration and integration.
- Developed and managed real-time data processing solutions using Spark Streaming, ensuring continuous data flow and processing.
- Optimized data pipelines for performance and scalability, ensuring efficient data processing and integration.
- Utilized Confluence and JIRA for project management and collaboration, ensuring effective communication and documentation.

Environment: SQL, NoSQL, Python, Java, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, Spark, PySpark, Spark Core, Spark SQL, Spark Streaming, PostgreSQL, Oracle, MongoDB, Apache NiFi, Dagster, Power BI, Docker, Kubernetes, Apache Kafka, Apache Flink, Azure Synapse Analytics, Databricks Delta Lake, Star Schema, Snowflake Schema, OLAP, Fact and Dimension Tables, Avro, Parquet, ORC, JSON, XML, Azure Data Lake, Azure Databricks, Azure Data Factory, Azure Stream Analytics, Azure Cosmos DB, Azure Functions, Azure DevOps, Git, SVN, JIRA, Confluence, Terraform, Jenkins, ETL
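Illustrative Airflow DAG showing the kind of daily Hadoop job scheduling described above. The DAG id, task names, paths, and shell commands are assumptions; the imports follow the Airflow 2.x layout.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_hive_refresh",
    start_date=datetime(2017, 1, 1),
    schedule_interval="0 3 * * *",   # run every day at 03:00
    catchup=False,
    default_args=default_args,
) as dag:
    # Land the previous day's files from the edge node into HDFS.
    ingest = BashOperator(
        task_id="ingest_to_hdfs",
        bash_command="hdfs dfs -put -f /data/incoming/{{ ds }}/*.csv /raw/events/{{ ds }}/",
    )

    # Run the Hive transformation script against the newly landed partition.
    transform = BashOperator(
        task_id="run_hive_transform",
        bash_command="hive -hivevar run_date={{ ds }} -f /opt/etl/transform_events.hql",
    )

    ingest >> transform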

Zensar Technologies, Pune, India                                      Mar 2012 - Oct 2014
Big Data Engineer
- Drafted and optimized SQL scripts to assess the flow of online quotes into the database, ensuring data validation, and maintained SQL and PL/SQL stored procedures, triggers, partitions, primary keys, indexes, constraints, and views.
- Created bucketed tables in Hive to optimize map-side joins and job efficiency, including data partitioning for Hive queries (see the sketch below this role's Environment list).
- Wrote MapReduce programs and Hive queries for data loading and processing within the Hadoop file system.
- Configured and maintained Apache Hadoop clusters and tools such as Hive, HBase, and Sqoop.
- Utilized Sqoop to transfer data from Oracle databases into Hive tables.
- Used AWS Redshift procedures to load aggregated data into AWS Redshift tables.
- Configured session and mapping parameters for adaptable runs with variable modifications.
- Monitored workflows using Apache Airflow and Workflow Monitor.
- Managed the metadata warehouse, establishing naming and warehouse standards for future applications.
- Enhanced mapping performance by optimizing target bottlenecks and implementing pipeline partitioning.
- Utilized Docker and Kubernetes for containerization and orchestration, streamlining deployment processes.
- Employed Apache Kafka for real-time data streaming and integration with big data tools.
- Implemented real-time data processing solutions using Apache Flink, optimizing data flow for various applications.
- Used Tableau for data visualization, creating insightful dashboards and reports for stakeholders.
- Automated CI/CD pipelines using Git, SVN, Jenkins, and Terraform, enhancing project management and deployment.
- Managed data warehouses using Star Schema and Snowflake Schema for OLAP systems, including fact and dimension tables.
- Used AWS services such as EC2, S3, Athena, Glue, Lambda, and EMR for deploying and managing applications.
- Handled NoSQL databases such as MongoDB for flexible data storage solutions.
- Implemented data integration and ingestion pipelines using Talend and Apache NiFi.
- Applied data cleansing, validation, profiling, and deduplication techniques to ensure data quality and integrity.
- Designed and implemented extract, transform, and load (ETL) processes to facilitate seamless data migration and integration.
- Developed and managed real-time data processing solutions using Spark Streaming, ensuring continuous data flow and processing.
- Optimized data pipelines for performance and scalability, ensuring efficient data processing and integration.

Environment: SQL, NoSQL, Java, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Oozie, Spark, Oracle, MongoDB, Informatica, Apache Airflow, Tableau, Docker, Kubernetes, Apache Kafka, Apache Flink, AWS Redshift, Star Schema, Snowflake Schema, OLAP, Fact and Dimension Tables, Avro, Parquet, ORC, JSON, XML, AWS S3, AWS EC2, AWS Glue, AWS Lambda, AWS Athena, AWS EMR, Git, SVN, JIRA, Confluence, Terraform, Jenkins
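Illustrative sketch of the partitioned, bucketed Hive table pattern described above, generated from Python and submitted through the Hive CLI. The database, table, column names, and the specific date are hypothetical; the bucketing settings enable bucket map-side joins on the clustered key.

import subprocess
import textwrap

hql = textwrap.dedent("""
    SET hive.enforce.bucketing=true;
    SET hive.optimize.bucketmapjoin=true;

    -- Daily-partitioned table, hashed into 32 buckets on customer_id so that
    -- joins on that key can be executed bucket-by-bucket on the map side.
    CREATE TABLE IF NOT EXISTS analytics.quotes_bucketed (
        quote_id     STRING,
        customer_id  STRING,
        premium      DOUBLE
    )
    PARTITIONED BY (quote_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- Load one day's quotes from the raw staging table into the matching partition.
    INSERT OVERWRITE TABLE analytics.quotes_bucketed
    PARTITION (quote_date = '2014-01-15')
    SELECT quote_id, customer_id, premium
    FROM staging.quotes_raw
    WHERE quote_date = '2014-01-15';
""")

# Submit the script to the Hive CLI; check=True fails the job on any Hive error.
subprocess.run(["hive", "-e", hql], check=True)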
