|
|
Candidate's Name
Contact: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE

Professional Summary:
- Experience with Azure services such as Azure SQL Database, Networking, Azure DNS, Azure Active Directory, Azure Blob Storage, and Azure Virtual Machines, and with administering Azure resources using the Azure Portal and Azure CLI.
- Hands-on experience in GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, and Dataproc.
- Comprehensive working experience implementing Big Data projects using Apache Hadoop, Pig, Hive, HBase, Spark, Sqoop, Flume, Zookeeper, and Oozie.
- Experience working on the Hortonworks, Cloudera, and MapR distributions.
- Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as ResourceManager, NodeManager, NameNode, DataNode, Secondary NameNode, and containers.
- In-depth understanding of Apache Spark job execution components: DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Experience working on Spark and Spark Streaming.
- Hands-on experience with major components of the Hadoop ecosystem: MapReduce, HDFS, YARN, Hive, Pig, HBase, Sqoop, Oozie, Cassandra, Impala, and Flume.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume.
- Experience with the Hadoop 2.0 YARN architecture and with developing YARN applications on it.
- Worked on performance tuning to ensure assigned systems were patched, configured, and optimized for maximum functionality and availability; implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
- Experience with distributed systems, large-scale non-relational data stores, and multi-terabyte data warehouses.
- Firm grip on data modeling, data marts, database performance tuning, and NoSQL map-reduce systems.
- Experience in managing and reviewing Hadoop log files.
- Real-time experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
- Experience in setting up Hadoop clusters on cloud platforms such as Azure.
- Customized dashboards and handled identity and access management in Azure.
- Worked on data serialization formats to convert complex objects into sequences of bits using Avro, Parquet, JSON, and CSV.
- Expertise in extending Hive and Pig core functionality by writing custom UDFs and UDAFs.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets (a brief sketch follows this summary).
- Proficient in NoSQL databases like HBase.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Built Talend and NiFi integrations for bi-directional data ingestion across different sources.
- Hands-on experience building data pipelines using Hadoop components: Sqoop, Hive, Pig, MapReduce, Spark, and Spark SQL.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in formats such as text, zip, XML, and JSON.
- Experience designing both time-driven and data-driven automated workflows using Oozie.
- Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
- Monitored MapReduce jobs and YARN applications.
- Strong experience installing and working with NoSQL databases such as HBase and Cassandra.
- Work experience with cloud infrastructure such as Azure compute services and Amazon Web Services (AWS) EC2 and S3.
- Used Git for source code and version control management.
- Experience with RDBMS and writing SQL and PL/SQL scripts used in stored procedures.
- Proficient in Java, J2EE, JDBC, the Collection Framework, JSON, XML, REST, and SOAP web services.
- Strong understanding of Agile and Waterfall SDLC methodologies.
- Experience working with small and large groups, meeting new technical challenges, and finding solutions that meet customer needs.
- Excellent problem-solving, proactive thinking, analytical, programming, and communication skills.
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
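To illustrate the Hive external-table bullet above: a minimal PySpark sketch of a partitioned external table loaded with dynamic partitioning. It assumes a Spark session with Hive support (a shared metastore rather than embedded Derby); the table, columns, and HDFS paths are hypothetical and not taken from any project described in this resume.

```python
from pyspark.sql import SparkSession

# Minimal sketch: partitioned Hive external table registered in a shared metastore,
# loaded with dynamic partitioning. All names and paths are hypothetical.
spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()   # shared Hive metastore instead of embedded Derby
         .getOrCreate())

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/warehouse/sales_ext'
""")
-- buckets, if needed, would be declared in the DDL with CLUSTERED BY (...) INTO N BUCKETS

# Dynamic partitioning: partition values come from the data, not from static literals.
staged = spark.read.parquet("hdfs:///data/staging/sales")
staged.createOrReplaceTempView("staged_sales")
spark.sql("""
    INSERT INTO TABLE sales_ext PARTITION (order_date)
    SELECT order_id, amount, order_date FROM staged_sales
""")
```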
Technical Skills:
Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra
Big Data Distributions: Cloudera, Hortonworks, Azure
Programming Languages: Core Java, Scala, Python, Shell scripting
Operating Systems: Windows, Linux (Ubuntu, CentOS)
Databases: Oracle, SQL Server, MySQL
Design Tools: UML, Visio
IDEs: Eclipse, NetBeans
Java Technologies: JSP, JDBC, Servlets, JUnit
Web Technologies: XML, HTML, JavaScript, jQuery, JSON
Linux Experience: System administration tools, Puppet
Development Methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Control: Git, CVS
Others: PuTTY, WinSCP, Data Lake, Talend, Azure, Terraform

Peer Participations and Certifications:
- Azure Data Engineer Professional
- Azure AI Engineer Associate
- Google Cloud Professional Machine Learning Engineer
- Google Cloud Professional Database Engineer
- Certified: Python Programming

Education Details:
- Bachelor's in Mechanical Engineering, India, 2014
- Master's in Computer Science, University of the Cumberlands, KY, 2018

Professional Experience:

Client: Ralph Lauren - Sterling, Virginia
Role: Azure Data Developer (March 2022 to Present)
Responsibilities:
- Building robust and scalable data integration (ETL) pipelines using SQL, Azure Data Lake, and Azure Databricks.
- Designing solutions based on needs gathered from discussions with end users and stakeholders.
- Coding and implementing solutions based on the design, adhering to department best practices and processes.
- Developing code in Python to gather and parse incoming data from multiple sources (Azure Blob Storage and Azure SQL Data Warehouse), processing it, and creating views for downstream teams to consume for further processing.
- Fine-tuning the performance of PySpark runtime jobs on Azure Databricks based on factors such as file format and tool compatibility, and incorporating techniques like salting and partitioning (see the sketch after this section).
- Testing for edge and outlier cases in the pipeline to avoid downtime.
- Testing the developed pipelines in different environments (development, staging, and production) to validate the desired outcomes.
- Worked on moving high- and low-volume data objects from Teradata and Hadoop to Azure Synapse.
- Documenting the successfully created pipelines in the PostgreSQL database so the Analytics Operations team can access the views.
- Created interactive visualizations and dashboards using Power BI that enabled business users and executives to explore product usage and customer trends.
- Merging completed code to Azure Repos, updating job details in Confluence, and ensuring all stakeholders are notified about completed jobs and resolved errors.
- Collaborating with Software Solution team members and other staff to validate desired outcomes for code before, during, and after development.
- Involved in data modeling using star and snowflake schemas.
- Cleaning raw data to make it available for visualization, leading to better business decisions.
- Developing data pipelines that are easier for other teams to transition to and work with, based on business requirements from the business development team and discussions with the MTD (developer) team.
- Experienced in loading and transforming large sets of structured and semi-structured data using the ingestion tool Talend.
- Experience with Power BI calculations and applying complex calculations to large, complex data sets.
- Worked on the Azure Synapse database, writing queries and stored procedures for normalization.
- Worked with Azure Synapse stored procedures, used procedures with the corresponding DDL statements, and used the JavaScript API to wrap and execute numerous SQL queries.
- Training technical staff to access and utilize the delivered solution.
- Responsible for scheduled maintenance of Azure File Storage, Azure Databricks clusters, and other deployed resources for the testing, staging, and production phases.
- Reviewing and debugging code to speed up production.
Environment: Hadoop, HDFS, Azure Data Factory, Azure Data Lake Analytics, Azure HDInsight, Azure Synapse Analytics, Hive, HBase (NoSQL), Shell scripting, Scala, Spark SQL, Azure SQL Database, Power BI.
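A minimal sketch of the salting technique mentioned in the PySpark tuning bullet above, assuming a skewed join between a large fact dataset and a smaller dimension on Databricks-style mount paths; dataset, column, and path names are hypothetical, not the client's actual data.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch of a salted join: spread a skewed join key across N artificial
# sub-keys so a single executor does not receive most of the data.
spark = SparkSession.builder.appName("salted-join-sketch").getOrCreate()

SALT_BUCKETS = 16

orders = spark.read.parquet("/mnt/raw/orders")        # large, skewed side (hypothetical)
customers = spark.read.parquet("/mnt/raw/customers")  # smaller dimension side (hypothetical)

# Add a random salt to the skewed side...
salted_orders = orders.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# ...and replicate the dimension side once per salt value so every row can still match.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
salted_customers = customers.crossJoin(salts)

joined = (salted_orders
          .join(salted_customers, on=["customer_id", "salt"], how="inner")
          .drop("salt"))

# Partition the output by date so downstream reads can prune partitions.
(joined.repartition("order_date")
       .write.mode("overwrite")
       .partitionBy("order_date")
       .parquet("/mnt/curated/orders_enriched"))
```

The number of salt buckets is a tuning knob: more buckets spread the hot key further but multiply the replicated dimension side.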
Client: PayPal (Remote) - Austin, Texas
Role: GCP Data Engineer (September 2021 to January 2022)
Responsibilities:
- Analyzed the scope of migrating existing data and its pipelines to GCP.
- Analyzed data pulled from EIDW (Teradata) and sent as files, on which transformations were applied before loading into SQL Server tables.
- Analyzed ML models created from the data.
- Implemented a data lake architecture in GCP, with the data ultimately loaded into BigQuery (see the sketch after this section).
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Migrated data from FS to Snowflake within the organization.
- Analyzed the Python job scripts in the existing pipeline and produced a mapping document between source and target with the corresponding transformation logic for database tables.
- Created data dictionary documents (collections of names, definitions, attributes, and table properties of databases) through SQL Server Management Studio.
- Used Hive to analyze partitioned and bucketed data and computed various metrics for reporting.
- Implemented data streaming capability using Kafka and Talend for multiple data sources.
- Developed a POC for project migration from the on-premises Hadoop MapR system to Snowflake.
- Analyzed data by writing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Implemented PySpark and Spark SQL for faster testing and processing of data.
- Developed multiple MapReduce jobs for data cleaning.
- Used JIRA to track bugs.
Environment: SQL Server, Python, SQL, GCP Cloud Storage, BigQuery, Snowflake, PySpark.
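A minimal sketch of the final load step into BigQuery referenced above, assuming Parquet files already staged in a GCS landing bucket and the google-cloud-bigquery client library with application-default credentials; the project, bucket, dataset, and table names are hypothetical.

```python
from google.cloud import bigquery

# Minimal sketch: load staged Parquet files from a GCS landing bucket into BigQuery.
client = bigquery.Client(project="example-project")  # hypothetical project id

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # full refresh of the target
)

load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/eidw/customers/*.parquet",  # hypothetical landing path
    "example-project.analytics.customers",                   # hypothetical target table
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

table = client.get_table("example-project.analytics.customers")
print(f"Loaded {table.num_rows} rows into {table.full_table_id}")
```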
Client: CVS Health - Chicago, IL
Role: Data Engineer (June 2020 to July 2021)
Responsibilities:
- Developed ETL data pipelines using Sqoop, Spark, Spark SQL, Scala, and Oozie.
- Used Spark for interactive queries and processing of streaming data, and integrated it with popular NoSQL databases.
- Experience with AWS IAM, Data Pipeline, EMR, S3, and EC2.
- Supported continuous storage in AWS using Elastic Block Storage, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
- Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Developed batch scripts to fetch data from AWS S3 storage and apply the required transformations.
- Developed Spark code using Scala and Spark SQL for faster processing of data.
- Created an Oozie workflow engine to run multiple Spark jobs.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experience with Terraform scripts that automate step execution in EMR to load data into ScyllaDB.
- Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts.
- Prepared scripts to automate the ingestion process using PySpark and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
- Implemented scheduled downtime for non-production servers to optimize AWS costs.
- De-normalized data coming from Netezza as part of transformation and loaded it into NoSQL databases and MySQL.
- Experienced in dimensional data modeling, star/snowflake schemas, and fact and dimension tables.
- Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system, in Scala.
- Implemented data quality checks using Spark Streaming and flagged records as bad or passable (see the sketch after this section).
- Good knowledge of setting up batch intervals, split intervals, and window intervals in Spark Streaming using Scala.
- Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using the Scala shell as per requirements.
Environment: HDFS, Spark, Scala, Tomcat, Netezza, EMR, Oracle, Sqoop, AWS, Terraform, ScyllaDB, Cassandra, MySQL, Oozie.
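A minimal PySpark Structured Streaming analogue of the Scala Spark Streaming data-quality job described above (the actual work used Scala and persisted to Cassandra). It assumes the spark-sql-kafka package is on the classpath; the broker, topic, schema, and S3 paths are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Minimal sketch: read events from Kafka, flag records that fail basic quality
# checks instead of dropping them, and persist the stream.
spark = SparkSession.builder.appName("kafka-quality-flags-sketch").getOrCreate()

event_schema = StructType([
    StructField("member_id", StringType()),
    StructField("claim_amount", DoubleType()),
    StructField("event_ts", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")  # hypothetical broker
       .option("subscribe", "claims-events")                # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Quality flag: mark records as 'bad' or 'passable' so nothing is silently lost.
flagged = events.withColumn(
    "dq_flag",
    F.when(F.col("member_id").isNull() | (F.col("claim_amount") <= 0), "bad")
     .otherwise("passable"),
)

query = (flagged.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/curated/claims")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/claims")
         .outputMode("append")
         .start())
query.awaitTermination()
```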
Client: Cardinal Health - Columbus, Ohio
Role: Sr. Hadoop Developer (Jan 2019 to May 2020)
Responsibilities:
- Experience with the complete SDLC process: staging, code reviews, source code management, and the build process.
- Experience building and architecting multiple data pipelines, including end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
- Implemented Big Data platforms using Cloudera CDH4 for data storage, retrieval, and processing.
- Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators.
- Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
- Experience moving data between GCP and Azure using Azure Data Factory.
- Experience building Power BI reports on Azure Analysis Services for better performance.
- Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest data into HDFS for analysis.
- Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Implemented Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data into HDFS through Sqoop.
- Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.
- Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming.
- Wrote Sqoop scripts to import large data sets from Teradata into HDFS.
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Developed Flume configurations to extract log data from different resources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables using different SerDes.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig.
- Experienced in using Pig for transformations, event joins, filtering, and pre-aggregations before storing data on HDFS.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive UDFs for functionality not available out of the box in Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this section).
- Responsible for executing Hive queries using the Hive command line, the HUE web GUI, and Impala to read, write, and query data in HBase.
- Developed and executed Hive queries for de-normalizing the data.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Experience loading and transforming structured and unstructured data into HBase, with exposure to handling automatic failover in HBase.
Environment: Cloudera, GCP, Java, Scala, Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Pig, Zookeeper, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator.
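A minimal sketch of converting a Hive aggregation query into Spark DataFrame transformations, as referenced above; the table, columns, and output path are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch: a HiveQL aggregation rewritten as DataFrame transformations.
spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL (illustrative only):
#   SELECT product_id, dt, SUM(quantity) AS total_qty
#   FROM   shipments
#   WHERE  dt >= '2019-01-01'
#   GROUP BY product_id, dt;

shipments = spark.table("shipments")  # hypothetical Hive table

daily_totals = (shipments
                .filter(F.col("dt") >= "2019-01-01")
                .groupBy("product_id", "dt")
                .agg(F.sum("quantity").alias("total_qty")))

# Write back to HDFS as partitioned Parquet for downstream Hive/Impala queries.
(daily_totals.write
 .mode("overwrite")
 .partitionBy("dt")
 .parquet("hdfs:///data/curated/shipments_daily"))
```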
Client: Options Clearing Corporation - Chicago, IL
Role: Hadoop Developer (Feb 2018 to Dec 2018)
Responsibilities:
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Implemented a real-time analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Splunk, and Greenplum.
- Working knowledge of various AWS technologies, including SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
- Responsible for all public (AWS) and private (OpenStack/VMware/DC/OS/Mesos/Marathon) cloud infrastructure.
- Designed and developed Informatica BDE applications and Hive queries to ingest the landing raw zone, transform the data with business logic into the refined zone, and load Greenplum data marts for the reporting layer consumed through Tableau.
- Installed, configured, and maintained big data technologies and systems; maintained documentation and troubleshooting playbooks.
- Automated the installation and maintenance of Kafka, Storm, Zookeeper, and Elasticsearch using SaltStack.
- Developed connectors for Elasticsearch and Greenplum to transfer data from a Kafka topic (see the sketch after this section).
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Developed Kafka Streams applications in Java for real-time data processing.
- Responded to and resolved access and performance issues.
- Used the Spark API over Hadoop to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using the Spark context, Spark SQL, DataFrames, and Spark on YARN.
- Imported and exported data into HDFS and Hive using Sqoop, and developed POCs on Apache Spark and Kafka.
- Proactively monitored performance and assisted in capacity planning.
- Worked on the Oozie workflow engine for job scheduling; imported and exported data into MapReduce and Hive using Sqoop.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Good understanding of performance tuning with NoSQL, Kafka, Storm, and SQL technologies.
- Designed and developed a framework to leverage platform capabilities using MapReduce and Hive UDFs.
- Worked on data transformation pipelines such as Storm; worked with operational analytics and log management using ELK and Splunk; assisted teams with SQL and MPP databases such as Greenplum.
- Worked on SaltStack automation tools; helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS).
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11g/10g, DB2, Teradata, MySQL, Eclipse, PL/SQL, Java, Linux, Shell scripting, SQL Developer, Solr.
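A minimal sketch of a Kafka-to-Elasticsearch connector like the one referenced above, assuming the kafka-python and elasticsearch (8.x) Python client libraries; the broker, topic, index, and field names are hypothetical.

```python
import json

from kafka import KafkaConsumer          # kafka-python
from elasticsearch import Elasticsearch  # elasticsearch-py 8.x

# Minimal sketch: consume JSON events from a Kafka topic and index them into Elasticsearch.
consumer = KafkaConsumer(
    "trade-events",                                   # hypothetical topic
    bootstrap_servers=["broker-1:9092"],              # hypothetical broker
    group_id="es-indexer",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

es = Elasticsearch("http://elastic-1:9200")           # hypothetical ES node

for message in consumer:
    doc = message.value
    # Use the event's own id as the document id so replays stay idempotent.
    es.index(index="trades", id=doc["trade_id"], document=doc)
```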
Client: Infosolz - Kolkata, India
Role: Jr. Hadoop Developer (Aug 2015 to Dec 2016)
Responsibilities:
- Worked with business teams and created Hive queries for ad hoc access.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Involved in the review of functional and non-functional requirements.
- Responsible for managing data coming from various sources.
- Loaded daily data from websites into the Hadoop cluster using Flume.
- Involved in loading data from the UNIX file system into HDFS.
- Created Hive tables and worked on them using HiveQL.
- Created complex Hive tables and executed complex Hive queries on the Hive warehouse.
- Wrote MapReduce code to convert unstructured data to semi-structured data.
- Used Pig for extract, transform, and load of semi-structured data.
- Installed and configured Hive and wrote Hive UDFs.
- Developed Hive queries for the analysts.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Designed a technical solution for real-time analytics using Kafka and HBase.
- Provided cluster coordination services through Zookeeper.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing data on HDFS.
- Supported the data analysts and the BI and Hive/Pig developers.
Environment: Apache Hadoop, HDFS, Cassandra, MapReduce, HBase, Impala, Java (JDK 1.6), Kafka, MySQL, Amazon, DbVisualizer, Linux, Sqoop, Apache Hive, Apache Pig, InfoSphere, Python, Scala, NoSQL, Flume, Oozie.

Client: Kovair Software Pvt. Ltd. - India
Role: SQL Developer (April 2013 to June 2015)
Responsibilities:
- Created and maintained databases for server inventory and performance inventory.
- Worked with SQL, T-SQL, and VBA.
- Involved in creating tables, stored procedures, and indexes.
- Created and maintained users.
- Created and ran jobs with packages.
- Designed, developed, and deployed packages with WMI queries.
- Imported data from various sources such as Excel, SQL Server, and FrontBase.
- Collected server inventory data from users using InfoPath 2003/2007 into SQL Server 2005.
- Created linked servers to other databases such as FrontBase and used them to import data.
- Ensured data consistency and analyzed the data.
- Generated dashboard reports for internal users using SQL Server 2005 Reporting Services.
- Backed up the database.
- Racked and stacked ProLiant servers and installed the base operating system.
- Deployed various reports on the SQL Server 2005 Reporting Server.
- Designed reports per user requirements.
- Involved in migrating servers (physical to virtual and virtual to virtual).
- Installed and configured SQL Server 2005 on virtual machines.
- Migrated hundreds of physical machines to virtual machines.
- Conducted system and functionality testing after virtualization.
- Monitored the migrated systems for the following 48 hours.
- Worked closely with the team.
Environment: Java/J2EE, JDK 1.7/1.8, Linux, Spring MVC, Eclipse, JUnit, Servlets, DB2, Oracle 11g/12c, Git, GitHub, JSON, RESTful, HTML5, CSS3, JavaScript, Rally, Agile/Scrum.

** References will be provided upon request **