Candidate's Name
Senior Data Engineer
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

Professional Summary:
10+ years of IT experience with expertise in Hadoop ecosystem components such as HDFS, MapReduce, YARN, HBase, Pig, ETL (DataStage), and cloud technologies such as AWS, Azure, Snowflake, Sqoop, Spark SQL, Spark Streaming, and Hive for scalable, distributed, high-performance computing.
Experience in gathering customer requirements, writing test cases, and partnering with developers to ensure a complete understanding of internal and external customer needs.
Developed and optimized high-performing data processing frameworks leveraging Google Cloud Platform (GCP) services such as BigQuery, Dataflow, Pub/Sub, and Composer for seamless data collection, storage, processing, transformation, and aggregation.
Utilized GCP services including Cloud Composer for workflow orchestration, enabling automated scheduling and monitoring of data pipelines and enhancing operational efficiency and reliability.
Developed and maintained data pipelines on the Azure and AWS platforms, including the use of AWS Glue for data ingestion, transformation, and orchestration.
Implemented serverless workflows using AWS Step Functions to automate and orchestrate complex data processing tasks.
Experience with Spark Core, Spark SQL, and their core abstraction APIs, RDD and DataFrame.
Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS, and using Spark SQL with various data sources such as JSON, Parquet, and Hive.
Extensively used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data and used DataFrame operations to perform required data validation.
Hands-on experience with streaming technologies such as Kafka and Spark Streaming.
Expertise in documenting source-to-target data mappings and business rules associated with ETL processes.
Experience deploying agile development methodology and actively participating in daily scrum meetings.
Experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
Extensive experience in data storage and documentation using NoSQL databases such as MongoDB and HBase, as well as Snowflake.
Experience visualizing data using BI services and tools such as Power BI, Tableau, Plotly, and Matplotlib.
Experience with Azure transformation projects and implementing ETL and data solutions using Azure Data Factory (ADF) and SSIS.
Good experience in recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
Experience working with Amazon Web Services (EC2, S3, RDS, Elastic Load Balancing, Elasticsearch, Lambda, SQS, CloudWatch, and Glue).
Experience in setting up and configuring AWS EMR clusters and using Amazon IAM to grant users fine-grained access to AWS resources.
Experience working with AWS databases such as ElastiCache (Memcached & Redis) and NoSQL databases (HBase, Cassandra & MongoDB) for database performance tuning and data modelling.
Good expertise in HQL queries for data analytics on top of Big Data.
Experience in creating a private cloud using Kubernetes that supports DEV, TEST, and PROD environments.
Expertise in writing Sqoop jobs to migrate vast amounts of data from relational database systems to HDFS/Hive tables and vice versa according to client requirements.
Proficient in writing Pig scripts to generate MapReduce jobs and perform ETL procedures on data in HDFS.
Expertise in developing Pig scripts and UDFs as well as Hive scripts and UDFs to load data files.
Hands-on experience implementing many pipelines on Airflow and good expertise in optimizing and troubleshooting them.
Expertise in NoSQL databases and their integration with Hadoop clusters to store and retrieve vast amounts of data.
Good experience in writing SQL queries and creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
Experienced in development and support on Oracle, MySQL, SQL Server, and MongoDB.
Experience in data warehousing and business intelligence across various domains.
Expertise in creating Tableau dashboards designed for large data volumes sourced from SQL Server.
Extract, Transform, and Load (ETL) of source data into respective target tables to build data marts.
Conducted gap analysis and created use cases, workflows, screenshots, and PowerPoint presentations for various data applications.
Good experience in designing and implementing data structures and commonly used business intelligence tools for data analysis.
Active involvement in all scrum ceremonies (Sprint Planning, Daily Scrum, Sprint Review, and Retrospective) and assisted the Product Owner in creating and prioritizing user stories.

Technical Skills:
Hadoop Distribution: Cloudera and Hortonworks
Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow
AWS Services: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB
Azure Services: Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight
ETL/BI Tools: Informatica, SSIS, Tableau, Power BI, SSRS
CI/CD: Jenkins, Splunk, Ant, Maven, Gradle
Containerization: Docker, Kubernetes
Ticketing Tools: JIRA, Remedy
Operating Systems: Linux, Windows, Unix
Databases: Oracle, SQL Server, MySQL, Snowflake, HBase, MongoDB
Programming Languages: Python, Scala, Hibernate, PL/SQL, R, Shell Scripting, JavaScript
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Version Control: Git, Subversion, TFS
SDLC: Agile, Scrum, Waterfall, Kanban

Education:
Bachelor of Technology, Computer Science Engineering, May 2013 - Anurag Group of Institutions, Hyderabad, Telangana, India
Master of Science, Computer Science, August 2015 - Wright State University, Dayton, Ohio, United States of America

Professional Experience:

Wells Fargo, Seattle  December 2022 - Present
Lead Big Data Engineer/Data Analyst
Responsibilities:
Implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, and NoSQL DB).
Developed a detailed project plan and helped manage the data conversion migration from the legacy system to the target Snowflake database.
Implemented data engineering solutions leveraging Google Cloud components such as Dataflow, Dataproc, and BigQuery, optimizing data processing and analysis capabilities.
Managed and automated ETL pipelines using Airflow and Cloud Composer, ensuring efficient data workflows and scheduling for timely data delivery.
Wrote Hive SQL scripts to create complex tables with high-performance features such as partitioning, clustering, and skewing.
Designed and implemented scalable ETL processes to load data from various sources into the EDW, ensuring data integrity and consistency.
Applied extensive knowledge of the Capital Markets domain, including Equity and Fixed Income instruments, to develop and enhance data solutions.
Loaded data through HBase into Spark RDDs and implemented in-memory data computation to generate the output response. Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
Developed PySpark scripts to encrypt raw data by applying hashing algorithms to client-specified columns.
Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
Worked on migration of on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
Implemented Copy activity and custom Azure Data Factory pipeline activities.
Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
Moved data from Azure Data Lake to Azure Data Warehouse using PolyBase; created external tables in ADW with 4 compute nodes and scheduled the loads.
Worked with data ingestion from multiple sources into the Azure SQL data warehouse.
Transformed and loaded data into Azure SQL Database.
Maintained data storage in Azure Data Lake.
Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
Responsible for the development and maintenance of the data pipeline on the Azure Analytics platform using Azure Databricks.
Integrated machine learning models into data pipelines, ensuring that data flows seamlessly from ingestion to model prediction and output generation.
Optimized database performance by indexing, partitioning, and tuning SQL queries, resulting in faster data retrieval and improved overall system efficiency.
Implemented data governance and quality control measures, including data profiling, cleansing, and validation, to ensure high data accuracy and reliability.
Developed purging scripts and routines to purge data on Azure SQL Server and Azure Blob Storage.
Developed Azure Databricks notebooks to apply business transformations and perform data cleansing operations.
Developed complex data pipelines using Azure Databricks and Azure Data Factory (ADF) to create a consolidated and connected data lake environment.
Worked on data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
Ingested data into Hadoop from various data sources such as Oracle and MySQL using the Sqoop tool.
Created Sqoop jobs with incremental loads to populate Hive external tables.
Skilled in optimizing Thoughtspot performance through techniques such as query tuning, indexing, and data source optimization.
Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.
Worked on DirectQuery in Power BI to compare legacy data with current data and generated reports and dashboards.
Environment: Python, Hadoop, Spark, Spark SQL, Spark Streaming, PySpark, Hive, Scala, MapReduce, HDFS, Kafka, Sqoop, HBase, MS Azure, Blob Storage, Data Factory, Databricks, SQL Data Warehouse, Apache Airflow, Snowflake, Oracle, MySQL, UNIX Shell Script, Perl, PowerShell, SSIS, Power BI

Cisco, California  June 2020 - December 2022
Big Data Engineer
Responsibilities:
Developed a data pipeline using Spark, Hive, Impala, and HBase to analyse customer behavioural data and financial histories in the Hadoop cluster.
Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
Involved in data ingestion of log files from various servers using NiFi.
Worked with datasets stored in AWS S3 buckets and used Spark DataFrames to perform pre-processing in Glue.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Collaborated with business analysts and stakeholders to define data requirements, ensuring that the EDW architecture supported business needs and reporting requirements.
Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
Developed and maintained comprehensive data models, including star and snowflake schemas, to support complex analytical queries and reporting.
Designed and implemented robust data pipelines to ingest, transform, and load data into Thoughtspot from various sources (e.g., databases, data warehouses, APIs).
Developed Python scripts to manage AWS resources through API calls using the Boto3 SDK and worked with the AWS CLI.
Designed and built solutions for both real-time and batch data ingestion using Sqoop, Impala, Kafka, and Spark.
Designed and implemented NLP data pipelines using tools such as Apache Kafka, Apache Spark, and AWS Glue.
Created an end-to-end machine learning pipeline in PySpark and Python.
Developed ETL (Extract, Transform, Load) pipelines using Apache Airflow to automate data workflows, ensuring timely data availability for machine learning models.
Developed and maintained data pipelines using Snowflake, leveraging the platform's built-in support for data integration and data warehousing.
Contributed to the development of a business intelligence platform using Snowflake, used SQL to extract and transform data, and created visualizations using Tableau.
Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
Built ETL/ELT pipelines in data technologies such as PySpark, Hive, Presto, and Databricks.
Worked on data architecture best practices, integration, and data governance solutions (Data Catalog, data governance frameworks, metadata, and data quality).
Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB and deployed AWS Lambda code from Amazon S3 buckets. Set up CI/CD pipelines using Maven, GitHub, and AWS.
Wrote real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
Created UNIX shell scripts to parameterize the Sqoop and Hive jobs.
Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs (MapReduce, Hive, and Sqoop) as well as system-specific jobs such as Java programs and shell scripts.
Successfully implemented ETL solutions between OLTP and OLAP databases in support of decision support systems, with expertise in all phases of the SDLC.
Environment: Python, PySpark, Spark, Hive, MapReduce, Sqoop, Spark SQL, Kafka, Oozie, Impala, NiFi, Airflow, Snowflake, Amazon Web Services (AWS), Oracle, Maven, GitHub, Boto3, Unix Shell Scripting, Tableau

Merchants Insurance Group, Michigan  February 2018 - June 2020
Data Engineer
Responsibilities:
Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analysing and transforming the data to uncover insights into customer usage patterns.
Implemented ETL and data movement solutions using Azure Data Factory and SSIS.
Developed dashboards and visualizations to help business users analyse data and provide insight to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
Migrated data from traditional database systems to Azure SQL databases.
Utilized data warehousing tools and technologies such as Informatica, Talend, and AWS Redshift to manage and transform large volumes of data.
Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
Collaborated with stakeholders to understand requirements and translated them into effective, user-friendly Thoughtspot dashboards and reports.
Utilized tools such as MLflow and Weights & Biases to track experiments, manage model versions, and log parameters and metrics for reproducibility.
Designed and implemented streaming solutions using Kafka and Azure Stream Analytics.
Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics and integrated them with other Azure services. Used U-SQL for data transformation as part of a cloud data integration strategy.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, and U-SQL (Azure Data Lake Analytics).
Collaborated with data scientists, software engineers, and business analysts to understand business needs, design solutions, and deploy NLP models and pipelines.
Performed data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, and Azure DW) and processed the data in Azure Databricks.
Integrated Kubernetes with networking, storage, and security to provide a comprehensive infrastructure, and orchestrated Kubernetes containers across multiple hosts.
Worked with Microsoft on-prem data platforms, specifically SQL Server and related technologies such as SSIS, SSRS, and SSAS.
Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
Experience in DWH/BI project implementation using Azure Data Factory.
Involved in designing logical and physical data models for the Staging, DWH, and Data Mart layers.
Created Power BI visualizations and dashboards as per the requirements.
Used various sources to pull data into Power BI, such as SQL Server, Excel, Oracle, and SQL Azure.
Environment: Python, Azure Data Lake, Scala, Hive, HDFS, Apache Spark, Kubernetes, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git

Morgan Stanley, New York  May 2016 - February 2018
Data Engineer/Data Analyst
Responsibilities:
Loaded structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
Implemented data ingestion and cluster handling for real-time processing using Kafka.
Imported real-time web logs using Kafka as a messaging system and ingested the data into Spark Streaming.
Used Spark Streaming to collect data from Kafka in near real time, performed the necessary transformations and aggregations to build the common learner data model, and stored the data in a NoSQL store (HBase).
Worked on the Spark framework for both batch and real-time data processing.
Leveraged Thoughtspot's native data connectors and data prep capabilities to simplify and automate the ETL/ELT process.
Developed Spark Streaming programs that take data from Kafka and push it to different targets.
Built a scalable, distributed Hadoop cluster running Hortonworks Data Platform.
Built dimensional models and a data vault architecture on Snowflake.
Developed Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, and PairRDDs.
Serialized JSON data and stored it in tables using Spark SQL.
Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
Created Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project business flow. Implemented partitioning and bucketing in Hive for better organization of the data.
Created Hive UDFs to hide or abstract complex, repetitive rules.
Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables. Scheduled different Snowflake jobs using NiFi.
Used NiFi to ping Snowflake to keep the client session alive.
Developed Bash scripts to pull log files from an FTP server and process them for loading into Hive tables.
Developed MapReduce programs to apply business rules to the data.
Developed a NiFi workflow to pick up data from the data lake as well as from servers and send it to the Kafka broker.
Loaded and transformed large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow.
Implemented a receiver-based approach in Spark Streaming, linking with StreamingContext using Python, and handled proper closing and waiting stages.
Used Tableau to generate reports, graphs, and charts that offer an overview of the presented data.
Created tables and stored procedures and extracted data using T-SQL for business users whenever required.
Environment: Hadoop HDP, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Apache Spark, Git, SQL, Linux

Anthem, VA  May 2014 - May 2016
Data Developer/Data Analyst
Responsibilities:
Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities.
Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyse operational data.
Created Spark jobs and Hive jobs to summarize and transform data.
Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Used different tools for data integration with various databases and Hadoop.
Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for consumption.
Ingested Syslog messages, parsed them, and streamed the data to Kafka.
Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data into HDFS.
Exported the analysed data to relational databases using Sqoop for further visualization and report generation by the BI team.
Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Analysed the data by performing Hive queries (HiveQL) to study customer behaviour.
Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.
Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
Scheduled and executed workflows in Oozie to run various jobs.
Implemented business logic in Hive and wrote UDFs to process the data for analysis.
Addressed issues arising from the vast volume of data and transitions.
Designed and documented operational problems by following standards and procedures using JIRA.
Environment: Python, Spark, Scala, Hive, Apache NiFi, Kafka, Flume, HDFS, Oracle, HBase, MapReduce, Oozie, Sqoop

Certifications:
Azure Certified Associate
Snowflake Certified