MANOGNA
Email: EMAIL AVAILABLE
PH: PHONE NUMBER AVAILABLE
Sr. Big Data Engineer / Data Engineer

PROFESSIONAL SUMMARY
- Over 9 years of extensive hands-on Big Data experience with the Hadoop ecosystem across on-premise and cloud-based platforms.
- Expertise in cloud computing and Hadoop architecture and its various components: HDFS, MapReduce, Spark, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Strong experience using HDFS, MapReduce, Hive, Spark, Sqoop, Oozie, and HBase.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Experienced with various Hadoop distributions (Cloudera, Hortonworks, MapR, and Amazon EMR) to fully implement and leverage new Hadoop features.
- Experience developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.
- Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
- Worked on Matillion to extract and transform real-time data and to handle data integration tasks.
- Experienced in ensuring data quality and consistency within data pipelines using Matillion.
- Worked on integrating Matillion with other data tools as a source for data warehouses.
- Experience moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
- Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and developing and tuning HQL queries.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for efficient data processing.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Database design, modeling, migration, and development experience using stored procedures, triggers, cursors, constraints, and functions; worked with MySQL, MS SQL Server, DB2, and Oracle.
- Experience with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Experience with software development tools such as JIRA, Play, and Git.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premise databases to Azure Data Lake Store using Azure Data Factory.
- Good understanding of data modeling (dimensional and relational) concepts such as star-schema and snowflake-schema modeling and fact and dimension tables.
- Experience manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Experience with Google Cloud Platform technologies such as BigQuery, Dataflow, Dataproc, Pub/Sub, and Airflow.
- Designed and developed an ingestion framework over Google Cloud and Hadoop clusters.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, cloud migration, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
- Strong understanding of Java Virtual Machines and multi-threading processes.
- Experience writing complex SQL queries and creating reports and dashboards.
- Proficient in using UNIX-based command-line interfaces.
- Strong experience with ETL and/or orchestration tools (e.g., Talend, Oozie, Airflow).
- Experience setting up the AWS data platform: AWS CloudFormation, development endpoints, AWS Glue, EMR, Jupyter/SageMaker notebooks, Redshift, S3, and EC2 instances.
- Experienced in using Agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Used Informatica PowerCenter for ETL: extracting, transforming, and loading data from heterogeneous source systems into target databases.
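A minimal sketch of the Kafka-to-HDFS streaming pattern referenced in the summary: a Spark Structured Streaming job that consumes JSON events from a Kafka topic and persists them to HDFS as Parquet. The broker address, topic, schema, and paths are hypothetical placeholders, not details taken from any engagement below.

```python
# Requires the spark-sql-kafka package on the classpath; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-example").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# The checkpoint location gives the file sink restart and exactly-once semantics.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")               # placeholder path
         .option("checkpointLocation", "hdfs:///chk/events")  # placeholder path
         .start())

query.awaitTermination()
```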
TECHNICAL SKILLS
Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm
Hadoop Distributions: Cloudera and Hortonworks
Programming Languages: Scala, Hibernate, JDBC, JSON, HTML, CSS, SQL, R, Shell Scripting
Script Languages: JavaScript, jQuery, Python
Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL, HBase, MongoDB
Cloud Platforms: AWS, Azure, GCP
Distributed Messaging System: Apache Kafka
Data Visualization Tools: Tableau, Power BI, SAS, Excel, ETL
Batch Processing: Hive, MapReduce, Pig, Spark
Operating Systems: Linux (Ubuntu, Red Hat), Microsoft Windows
Reporting Tools / ETL Tools: Informatica PowerCenter, Tableau, Pentaho, SSIS, SSRS, Power BI
PROFESSIONAL EXPERIENCE

CVS Pharmacy, MA
Sr. Big Data Engineer / Data Engineer, April 2023 to Present
Responsibilities:
- Implemented Azure Data Factory (ADF) extensively for ingesting data from different source systems (relational and unstructured) to meet business functional requirements.
- Designed and developed batch-processing and real-time-processing solutions using ADF, Databricks clusters, and Stream Analytics.
- Created numerous pipelines in Azure Data Factory v2 to pull data from disparate source systems, using different Azure activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
- Maintained and supported optimal pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Automated jobs using different ADF triggers: event, schedule, and tumbling window.
- Created and provisioned different Databricks clusters, notebooks, jobs, and autoscaling.
- Performed data flow transformations using the Data Flow activity.
- Used PolyBase to load tables in Azure Synapse.
- Implemented the Azure self-hosted integration runtime in ADF.
- Improved performance by optimizing the compute time required to process streaming data and by tuning cluster run time.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Scheduled and automated business processes and workflows using Azure Logic Apps.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
- Created linked services to connect external resources to ADF.
- Worked with complex SQL views, stored procedures, triggers, and packages in large databases from various servers.
- Used Azure DevOps and Jenkins pipelines to build and deploy different resources (code and infrastructure) in Azure.
- Ensured the developed solutions were formally documented and signed off by the business.
- Worked with team members on troubleshooting, resolving technical issues, and project risk and issue identification and management.
- Worked on the cost estimation, billing, and implementation of services on the cloud.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Migrated on-premise data (Oracle/Teradata) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF v1/v2).
- Worked closely across teams (Support, Solution Architecture) and with peers to establish and follow best practices while solving customer problems.
- Created infrastructure for on-time extraction, transformation, and loading of data from a wide variety of data sources.
Environment: Azure Data Factory (ADF), Databricks, Azure Stream Analytics, Azure Service Bus Queue, Azure Logic Apps, Azure Data Lake Store (ADLS), Spark, Azure DevOps, Jenkins, SQL, Oracle, Teradata.
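The CVS work above centers on PySpark transformations in Databricks over data landed in ADLS. A minimal, hypothetical sketch of that kind of curation step follows; the storage account, container, column names, and paths are invented for illustration, and the cluster is assumed to already have credentials for the lake.

```python
# Illustrative Databricks-style PySpark step: read raw CSV files from ADLS Gen2,
# apply simple cleansing, and write a curated, partitioned Delta table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

spark = SparkSession.builder.appName("adls-curation-example").getOrCreate()

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/claims/"          # placeholder
curated_path = "abfss://curated@examplelake.dfs.core.windows.net/claims/"  # placeholder

raw_df = spark.read.option("header", "true").csv(raw_path)

curated_df = (raw_df
              .withColumn("claim_date", to_date(col("claim_date"), "yyyy-MM-dd"))
              .withColumn("member_id", trim(col("member_id")))
              .dropDuplicates(["claim_id"])
              .filter(col("claim_amount").cast("double") > 0))

# Delta is the usual table format on Databricks; partitioning by date keeps
# incremental loads and downstream reads cheap.
(curated_df.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("claim_date")
 .save(curated_path))
```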
Delta Airlines, Atlanta, GA
Sr. Big Data Engineer, July 2021 to March 2023
Responsibilities:
- Worked in UNIX and Linux environments on data files received from various clients and developed UNIX shell scripts to automate the build process and generate reports.
- Worked with AWS cloud services, i.e., EC2, EMR, and S3 buckets.
- Created and maintained tables and views in Snowflake.
- Used the Snowflake cloud data warehouse to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake.
- Loaded and unloaded data in bulk into Snowflake using the COPY command.
- Working knowledge of Python libraries such as NumPy, SciPy, matplotlib, urllib2, DataFrames, Pandas, and PyTables.
- Performed data transformations in Snowflake using Python.
- Created Python scripts to migrate data from MongoDB to Snowflake.
- Created Python scripts to parse JSON and XML documents and load them into the target database.
- Developed transformation logic using Snowpipe; hands-on experience with Snowflake utilities such as SnowSQL and Snowpipe.
- Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
- Involved in migrating objects from Teradata to Snowflake and created Snowpipe for continuous data loads.
- Developed a POC that leverages Snowpark, a Snowflake tool, to query and process data in a pipeline.
- Designed and implemented the ETL process using Talend to load data from source to target.
- Created and maintained secure data transfer pipelines, including batch data processing.
- Worked with the Autosys scheduler to schedule daily batch jobs.
- Developed ETL pipelines in and out of the data warehouse using Python.
- Handled change implementation, monitoring, and troubleshooting of AWS and Snowflake databases and cluster-related issues.
- Developed merge scripts to UPSERT data into Snowflake from an ETL source.
Environment: AWS, Snowflake, SnowSQL, Python, SQL, Big Data, MongoDB, Autosys, UNIX and Linux, shell scripting.
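A hedged sketch of the bulk-load-and-merge (UPSERT) pattern described above, using the Snowflake Python connector. The account, stage, table, and column names are placeholders rather than the actual Snowflake objects, and the raw table is assumed to hold a single VARIANT column named V loaded from JSON.

```python
import snowflake.connector

# Placeholder connection details; credentials normally come from a secrets manager.
conn = snowflake.connector.connect(
    account="xy12345",
    user="ETL_USER",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

# Bulk load staged JSON files into a raw table with one VARIANT column.
copy_sql = """
COPY INTO STAGING.ORDERS_RAW
FROM @ORDERS_STAGE/daily/
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'CONTINUE'
"""

# Upsert the flattened records into the target table.
merge_sql = """
MERGE INTO ANALYTICS.CORE.ORDERS tgt
USING (
    SELECT v:order_id::string          AS order_id,
           v:status::string            AS status,
           v:updated_at::timestamp_ntz AS updated_at
    FROM STAGING.ORDERS_RAW
) src
ON tgt.ORDER_ID = src.ORDER_ID
WHEN MATCHED THEN UPDATE SET tgt.STATUS = src.STATUS, tgt.UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS, UPDATED_AT)
    VALUES (src.ORDER_ID, src.STATUS, src.UPDATED_AT)
"""

cur = conn.cursor()
try:
    cur.execute(copy_sql)
    cur.execute(merge_sql)
finally:
    cur.close()
    conn.close()
```

For continuous loading, the COPY step is typically replaced by Snowpipe auto-ingest while the MERGE shape stays the same.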
Ford, Dearborn, MI
AWS Data Engineer, February 2019 to June 2021
Responsibilities:
- Developed real-time processing applications with Spark Structured Streaming, Kafka, and dbt (Data Build Tool) on AWS.
- Scheduled jobs in Databricks using Databricks Workflows.
- Used the Spark Streaming APIs to perform on-the-fly transformations and build a common data model that receives data from Confluent Kafka in real time and persists it to Snowflake.
- Developed a streaming application from scratch that reads data from Kafka topics and writes into Snowflake using Python/PySpark in Databricks on the AWS cloud.
- Worked with AWS cloud services, i.e., EC2, EMR, S3 buckets, and CloudWatch.
- Developed Python scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation.
- Created and maintained tables and views in Snowflake.
- Used the Snowflake cloud data warehouse to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Developed transformation logic using Snowpipe; hands-on experience with Snowflake utilities such as SnowSQL and Snowpipe.
- Developed Kafka producers and consumers, HBase clients, and Spark jobs using Python, along with components on HDFS and Hive.
- Created a Kafka producer API to send live-stream data into various Kafka topics and developed Spark Streaming applications to consume the data from the topics and insert the processed streams into HBase.
- Worked in a production environment that involved building a CI/CD pipeline using Jenkins, with stages running from code checkout in GitHub to deploying code in a specific environment.
- Performed data analysis, data profiling, data integration, migration, data governance, metadata management, master data management, and configuration management.
- Developed ETLs using PySpark, working with both the DataFrame API and the Spark SQL API.
- Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
Environment: Apache Spark, Confluent Kafka, Databricks, AWS, Snowflake, dbt, Docker, PySpark, Python, SQL, Scala, Big Data, EMR, S3.
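A minimal sketch of the Kafka-to-Databricks-to-Snowflake streaming flow described above, using Structured Streaming's foreachBatch with the Spark-Snowflake connector (assumed to be available on the cluster). The broker, topic, checkpoint location, and all Snowflake options are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-snowflake-example").getOrCreate()

# Placeholder Snowflake connection options; secrets belong in a secret scope.
sf_options = {
    "sfURL": "xy12345.snowflakecomputing.com",
    "sfUser": "ETL_USER",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
}

# Read the raw Kafka stream and keep the payload as a string column.
stream_df = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
             .option("subscribe", "telemetry")                   # placeholder topic
             .load()
             .selectExpr("CAST(key AS STRING) AS key",
                         "CAST(value AS STRING) AS payload",
                         "timestamp"))

def write_to_snowflake(batch_df, batch_id):
    # Append each micro-batch to a Snowflake table through the connector.
    (batch_df.write
     .format("snowflake")
     .options(**sf_options)
     .option("dbtable", "TELEMETRY_RAW")
     .mode("append")
     .save())

(stream_df.writeStream
 .foreachBatch(write_to_snowflake)
 .option("checkpointLocation", "s3://example-bucket/checkpoints/telemetry/")  # placeholder
 .start()
 .awaitTermination())
```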
ICICI Bank, India
ETL Developer, September 2016 to November 2018
Responsibilities:
- Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and stored the data in AWS Redshift.
- Transformed the data using AWS Glue dynamic frames with PySpark; catalogued the transformed data using crawlers and scheduled the job and crawler using the workflow feature.
- Created PySpark Glue jobs to implement data transformation logic in AWS and stored the output in a Redshift cluster.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Architected and designed serverless application CI/CD using AWS Serverless (Lambda) applications.
- Used PySpark and Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse.
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Configured Spark Streaming to receive real-time data from Apache Kafka and stored the stream data to HDFS using PySpark.
- Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Developed complex Talend ETL jobs to migrate data from flat files to the database; pulled files from the mainframe into the Talend execution server using multiple FTP components.
- Utilized Agile and Scrum methodology for team and project management.
- Created and published multiple dashboards and reports using Tableau Server.
Environment: Hadoop, Spark, Kafka, MapReduce, Hive, AWS, Lambda, EMR, EC2, S3 buckets, Redshift, Oracle, MySQL, Python, Snowflake, Talend, Teradata, Tableau, Cassandra.

Zensar Technologies, India
Data Warehouse Developer, May 2014 to August 2016
Responsibilities:
- Worked on Apache Spark, utilizing the Spark SQL and Streaming components to support intraday and real-time data processing.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.
- Created pipelines, data flows, and complex data transformations and manipulations in ADF and PySpark with Databricks.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Created pipelines in Azure Data Factory (ADF) using linked services to extract, transform, and load data from multiple sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse.
- Developed Python, PySpark, and Bash scripts to transform and load data on a cloud platform.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Installed and configured Airflow for workflow management and created workflows in Python.
- Deployed the Big Data Hadoop application using Talend on the Microsoft Azure cloud.
- Performed source analysis, tracing data back to its sources and finding its roots through Teradata, DB2, etc.
- Implemented Continuous Integration and Continuous Delivery processes using GitLab along with Python and shell scripts to automate routine jobs, including synchronizing installers, configuration modules, packages, and requirements for the applications.
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Designed, developed, and deployed code to all environments without issues.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information.
- Designed and developed data mapping procedures for ETL data extraction.
- Experience in fact/dimensional modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).
Environment: Apache Spark, Hadoop, PySpark, HDFS, Cloudera, Azure, ADF, Databricks, Kafka, Informatica, Docker, Jenkins, Kubernetes, NiFi, Teradata, DB2, SQL Server, MongoDB, Shell Scripting.
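The Zensar engagement above mentions building Airflow workflows in Python. A minimal sketch of such a DAG follows, written against Airflow 2.x-style imports; the DAG id, paths, and commands are hypothetical placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_sales_load",          # placeholder DAG id
    start_date=datetime(2016, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:

    # Stage the day's landing files into the processing area (placeholder paths).
    stage_files = BashOperator(
        task_id="stage_files",
        bash_command="hdfs dfs -put /landing/sales/*.csv /staging/sales/",
    )

    # Run the PySpark transform once staging has completed (placeholder script).
    transform = BashOperator(
        task_id="transform",
        bash_command="spark-submit /opt/jobs/transform_sales.py",
    )

    stage_files >> transform
```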