Candidate's Name
PHONE NUMBER AVAILABLE | EMAIL AVAILABLE | LinkedIn: https://LINKEDIN LINK AVAILABLE

SUMMARY
- 10 years of experience in Data Engineering, specializing in Big Data technologies, data pipelines, SQL/NoSQL, cloud-based RDS, distributed databases, serverless architecture, data mining, web scraping, and cloud technologies such as AWS EMR, Redshift, Lambda, Step Functions, and CloudWatch.
- Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Hadoop, Spark, and Hive (see the illustrative sketch following this summary).
- Leveraged Hadoop ecosystem knowledge to design and develop solutions using Spark, Scala, Hive, Kafka, AWS, and NoSQL databases.
- Expertise in working with a wide variety of data sources, including Oracle, Teradata, Greenplum, and Hadoop databases, to extract and integrate information for key performance indicators.
- Implemented text analytics / ML programs to enhance business analytics, including training models for better prediction of conversational intents, text parsing, topic modeling, and understanding volumetric trends for specific lines of business.
- Worked on data ingestion using Sqoop and streaming tools, importing data into HDFS, Hive, and HBase from various data sources, and performed transformations using Hive and Pig.
- Proficient in SQL databases (MySQL, MS SQL, Oracle) and NoSQL databases such as MongoDB; experience writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Experience implementing dashboards, data visualization, and analytics using Tableau and Power BI.
- Designed and optimized end-to-end ETL processes to enhance data integration from diverse sources into a centralized repository.
- Designed and implemented scalable ETL architectures, ensuring adaptability to growing data volumes.
- Exposure to AWS services such as API Gateway, RDS instances, and Lambda to build serverless applications.
- Working knowledge of building Azure applications using tools such as Azure Storage, Azure SQL databases, and Azure HDInsight.
- Working experience building scripts to requirements in Shell, Bash, and Python.
- Good understanding of data modeling (dimensional and relational) concepts such as star schema modeling, snowflake schema modeling, and fact and dimension tables.
- Used Git extensively and Jenkins for CI/CD.
- Contributed to projects involving setting up data lakes, migrating applications, transitioning to the cloud, and automating processes for different clients.
- Extensively used microservices and Postman for working against Hadoop cluster endpoints.
- Experience in scheduling and monitoring jobs using Oozie, Airflow, and Zookeeper.
- Experience in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private clouds.
- Expertise in working with Linux/Unix and shell commands on the terminal.
- Hands-on experience with test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
- Good analytical and communication skills, with the ability to work in a team as well as independently with minimal supervision.
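The following is an illustrative sketch only, not code from any specific engagement below: a minimal PySpark batch job of the kind described in the summary (read raw CSV, apply basic cleansing, write partitioned Parquet). All paths and column names are hypothetical placeholders.

```python
# Minimal PySpark batch job sketch: ingest raw CSV, cleanse, and persist as
# partitioned Parquet for downstream Hive/Spark SQL queries.
# Paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("end_to_end_batch_example").getOrCreate()

# Read raw data (placeholder S3 path).
raw = spark.read.option("header", True).csv("s3a://example-bucket/raw/events/")

# Basic cleansing: drop duplicates, normalize the timestamp, derive a partition column.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write partitioned Parquet that a Hive external table could be defined over.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-bucket/curated/events/"
)

spark.stop()
```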
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, SQL, YARN, Pig Latin, MapReduce, Hive, Sqoop, Spark, Zookeeper, Oozie, Kafka, Storm, Flume
Programming Languages: Python, PySpark, Spark with Scala, JavaScript, Shell scripting
Cloud Platform: EC2, S3, EMR, Redshift, DynamoDB, Aurora, VPC, Glue, Kinesis, Boto3, Azure Data Lake, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server
Operating Systems: Linux, Windows, UNIX
Databases: Netezza, MySQL, UDB, HBase, MongoDB, Cassandra, Snowflake, Oracle, Cosmos DB, DynamoDB
IDEs: PyCharm, IntelliJ, Ambari, Jupyter Notebook
Data Visualization: Tableau, BO Reports, Splunk, Power BI
Machine Learning Techniques: Linear and logistic regression, classification and regression trees, random forest, association rules, NLP, and clustering
Others: Docker/Kubernetes, TensorFlow, RapidMiner, Elasticsearch, Kerberos, Outlook, Excel, Word, PowerPoint, SharePoint, AWS Big Data, application development, computer science, DaaS (Data as a Service), unit testing, web systems, web applications, system design, system analysis, software development life cycle, business analytics

PROFESSIONAL EXPERIENCE

Wells Fargo, Charlotte, NC    Aug 2021 - Present
Role: Sr. Data Engineer
Responsibilities:
- Executed end-to-end management of multiple projects throughout the Software Development Life Cycle (SDLC), including requirement analysis, design, development, testing, deployment, and production support, ensuring alignment with organizational objectives and adherence to best practices.
- Leveraged Big Data technologies such as Hadoop, MapReduce, HBase, Hive, and Sqoop, utilizing Python alongside Spark for processing and analyzing multi-source data.
- Imported and transformed structured data using Sqoop and Spark, with Python scripts for automated handling and storage into HDFS in CSV format.
- Implemented advanced data management techniques such as partitioning and bucketing in Informatica, and developed Hive queries for data processing and cube generation for visualization.
- Implemented data ingestion processes from various sources into Redshift, including Amazon S3 and RDS, ensuring consistent and reliable data integration.
- Designed and developed a data lake in the AWS environment, synchronizing data across multiple platforms and ensuring data integrity.
- Utilized Spark Streaming APIs and Kafka for real-time data analysis and persistence to AWS S3 or Cassandra, demonstrating real-time analytics capabilities (see the illustrative streaming sketch after this role).
- Connected Snowflake and Cassandra databases to Amazon EMR, analyzed data using CQL, and loaded data using the DataStax Spark-Cassandra connector, exhibiting deep expertise in complex data storage solutions.
- Configured the Elastic Stack (ELK) for log analytics and full-text search, integrating it with AWS Lambda and CloudWatch to enhance application monitoring.
- Maintained a data warehouse ecosystem on AWS, utilizing Redshift for storing and processing large-scale datasets for analytics and reporting purposes.
- Developed and deployed Kafka producer and consumer applications on a Kafka cluster, managing configuration through Zookeeper.
- Utilized Star and Snowflake schemas for data modeling, with hands-on experience in OLTP/OLAP systems and data modeling using Erwin.
- Developed Spark Core, Spark SQL/Streaming, and Scala scripts for efficient data processing, utilizing features such as SparkContext, DataFrames, pair RDDs, and Spark on YARN.
- Automated and scheduled Informatica jobs using UNIX shell scripting, configuring Korn shell jobs for Informatica sessions.
- Used Python to develop pre-processing jobs, flatten JSON documents to flat files, and convert distributed data into organized formats with the DataFrame API in Scala.
- Created comprehensive reports using Tableau, collaborating with the BI team and utilizing Python for data extraction and transformation to meet specific business requirements.
Environment: Python, Hadoop YARN, Spark Core, Unix, Spark Streaming, Spark SQL, HBase, Scala, Kafka, Hive, Sqoop, AWS, S3, AWS Lambda, Glue, CloudWatch, OLTP/OLAP, Cassandra, Tableau, MySQL, Linux, Shell scripting, Agile methodologies.
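An illustrative sketch only (not code from the engagement above): a minimal Spark Structured Streaming job of the kind referenced in this role, consuming JSON events from Kafka and persisting them to S3 as Parquet. Broker addresses, the topic name, the schema, and the paths are hypothetical placeholders.

```python
# Minimal Spark Structured Streaming sketch: Kafka -> parse JSON -> Parquet on S3.
# Brokers, topic, schema, and paths are hypothetical placeholders; the job assumes
# the spark-sql-kafka connector package is available on the classpath.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_to_s3_stream_example").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("txn_amount", DoubleType()),
    StructField("txn_ts", StringType()),
])

# Consume the raw Kafka stream.
raw_stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .load()
)

# Parse the JSON payload into typed columns.
events = (
    raw_stream
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("txn_ts", F.to_timestamp("txn_ts"))
)

# Persist micro-batches to S3 as Parquet, with checkpointing for fault tolerance.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/streams/transactions/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/transactions/")
    .start()
)

query.awaitTermination()
```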
Centene Corporation, Denver, CO    Apr 2019 - Aug 2021
Role: Data Engineer
Responsibilities:
- Involved in requirement gathering and business analysis, and translated business requirements into technical designs in Hadoop and Big Data. Automated and monitored the complete AWS infrastructure with Terraform.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS. Implemented security measures to protect sensitive data within Tableau, setting up user permissions and access controls.
- Utilized EC2 to launch and manage virtual machines (VMs) in the cloud and performed scaling of EC2 instances using tools such as CloudFormation and Terraform.
- Optimized Power BI reports for speed and efficiency, especially with large datasets.
- Developed data ingestion pipelines to load, process, and store large volumes of semi-structured and unstructured data in NoSQL databases. Experienced in writing real-time processing using Spark Streaming with Kafka.
- Set up MongoDB replica sets and sharded clusters to ensure high availability, scalability, and fault tolerance. Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Implemented table partitioning strategies in PostgreSQL to manage large datasets efficiently and improve query performance. Created data partitions on large datasets in S3 and DDL on the partitioned data.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the illustrative Glue sketch after this role).
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts. Implemented security measures in Power BI to control access and protect sensitive data.
- Worked extensively with Avro, Parquet, XML, and JSON files; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
- Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
- Implemented robust error handling mechanisms and logging strategies within Informatica workflows to capture and manage data integration errors effectively. Developed and executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
- Optimized queries and indexing strategies to retrieve data efficiently from NoSQL databases, considering the specific query patterns of the application. Involved in configuring the Hadoop cluster and load balancing across the nodes. Implemented data streaming solutions using AWS Kinesis for real-time processing.
- Involved in the Sqoop implementation that loads data from various RDBMS sources into Hadoop systems. Prepared and cleansed data within Tableau to ensure accuracy and consistency for reporting purposes.
- Experienced in implementing robust governance and security measures in OBIEE, ensuring data integrity and compliance.
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow. Stored and retrieved data from data warehouses using Amazon Redshift.
- Assisted in cluster maintenance, cluster monitoring, and adding and removing cluster nodes; installed and configured Hadoop, MapReduce, and HDFS and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
Environment: Redshift, ETL, Tableau, Python, Hive, SQL, SQL Server, PostgreSQL, NoSQL, Snowflake, Kafka, S3, EC2, IAM, HDFS, AWS, Terraform, Hadoop, Java, MapReduce, DAGs, Apache Airflow, RDD, Scala, Informatica, Avro.
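An illustrative sketch only (not code from the engagement above): a minimal AWS Glue job of the kind referenced in this role, reading cataloged campaign files from S3 and loading them into Redshift. The database, table, connection, bucket, and column names are hypothetical placeholders, and the script assumes the Glue runtime, where the awsglue libraries are available.

```python
# Minimal AWS Glue job sketch: S3 (via the Glue Data Catalog) -> Redshift.
# All database, table, connection, and bucket names are hypothetical placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: campaign files cataloged from S3 (Parquet/ORC/text).
source = glue_context.create_dynamic_frame.from_catalog(
    database="campaign_db", table_name="raw_campaign_events"
)

# Keep and type only the fields the warehouse needs.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("campaign_id", "string", "campaign_id", "string"),
        ("member_id", "string", "member_id", "string"),
        ("event_ts", "string", "event_ts", "timestamp"),
    ],
)

# Target: Redshift through a pre-configured Glue JDBC connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "analytics.campaign_events", "database": "dw"},
    redshift_tmp_dir="s3://example-bucket/glue-temp/",
)

job.commit()
```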
Ascena Retail Group, Mahwah, NJ    Sep 2017 - Mar 2019
Role: Data Engineer
Responsibilities:
- Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks to automate the ingestion, processing, and delivery of both structured and unstructured batch and real-time streaming data using Python.
- Worked on building data warehouse structures, creating facts, dimensions, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
- Applied transformations to data loaded into Spark DataFrames and performed in-memory data computation to generate the output response.
- Good knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Used the Spark DataFrame API to perform analytics on Hive data and DataFrame operations to perform required validations on the data.
- Built end-to-end ETL models to sort vast amounts of customer feedback and derive actionable insights and tangible business solutions.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, used the Spark engine and Spark SQL for data analysis, and provided results to data scientists for further analysis.
- Prepared scripts to automate the ingestion process using PySpark and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
- Created a business category mapping system that automatically maps customers' business category information to any source website's category system; category platforms include Google, Facebook, Yelp, Bing, etc.
- Developed a data quality control model to monitor business information changes over time; the model flags outdated customer information using different APIs for validation and updates it with correct data.
- Responsible for monitoring a sentiment prediction model for customer reviews and ensuring a high-performance ETL process.
- Performed data cleaning, pre-processing, and modeling using Spark and Python.
- Implemented real-time, data-driven, secured REST APIs for data consumption using AWS (Lambda, API Gateway, Route 53, Certificate Manager, CloudWatch, Kinesis), Swagger, Okta, and Snowflake (see the illustrative Lambda sketch after this role).
- Developed automation scripts to transfer data from on-premise clusters to Google Cloud Platform (GCP).
- Loaded file data from the ADLS server to Google Cloud Platform buckets and created Hive tables for the end users.
- Involved in performance tuning and optimization of long-running Spark jobs and queries (Hive/SQL).
- Implemented real-time streaming of AWS CloudWatch Logs to Splunk using Kinesis Firehose.
- Skilled in integrating OBIEE with various data sources and warehouses, optimizing data flow for reporting.
- Developed, using object-oriented methodology, a dashboard to monitor all network access points and network performance metrics using Django, Python, MongoDB, and JSON.
- Developed an application for monitoring, root cause analysis, and management of WLAN data by parsing logs with Python/Django, storing them in MongoDB, and producing JSON output.
Environment: Python, Scala, SQL, Maven, AWS (Redshift, S3, EC2), MongoDB, MySQL.
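An illustrative sketch only (not code from the engagement above): a minimal Python AWS Lambda handler of the kind that might sit behind API Gateway for read-only data consumption, returning a curated JSON object from S3. The bucket name, key layout, and dataset parameter are hypothetical placeholders.

```python
# Minimal AWS Lambda handler sketch for a read-only data API behind API Gateway
# (Lambda proxy integration): fetch a curated JSON object from S3 and return it.
# Bucket and key names are hypothetical placeholders.
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "example-curated-bucket"


def lambda_handler(event, context):
    # API Gateway proxy integration passes URL path parameters here.
    dataset = (event.get("pathParameters") or {}).get("dataset", "daily_summary")
    key = f"api/{dataset}.json"

    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": obj["Body"].read().decode("utf-8"),
        }
    except s3.exceptions.NoSuchKey:
        return {
            "statusCode": 404,
            "body": json.dumps({"error": f"dataset '{dataset}' not found"}),
        }
```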
Johnson & Johnson Consumer    Oct 2015 - Jul 2017
Role: Data Engineer
Responsibilities:
- Experience architecting data intelligence solutions around the Snowflake Data Warehouse and building Snowflake solutions as a developer.
- Built, created, and configured enterprise-level Snowflake environments; maintained, implemented, and monitored Snowflake environments.
- Worked on POCs using Snowflake utilities, Snowflake SQL, Snowpipe, etc.
- Developed specifications and code for high-complexity items, working independently.
- Supported and troubleshot production systems as required, optimizing performance and resolving issues.
- Performed root cause analysis on recurring problems and made recommendations to eliminate the issues.
- Tuned the performance of Informatica sessions for large data files by increasing block size, data cache size, and sequence buffer length, and by adjusting the target-based commit interval.
- Experience with many Informatica transformations, including complex lookups, stored procedures, update strategy, mapplets, etc.
Environment: Python, Linux, DB2, SQL, Snowflake, IBM InfoSphere 9, AWS services.

AMADEUS, Bangalore, India    May 2014 - Sep 2015
Role: SQL Developer
Responsibilities:
- Worked on the Reports module of the project as a developer on MS SQL Server 2005, using SSRS, T-SQL, scripts, stored procedures, and views.
- Involved in designing ETL as part of data warehousing and loaded data into fact tables using SSIS.
- Wrote T-SQL scripts for database backup jobs and daily/weekly jobs.
- Created databases, tables, indexes, and views, and coded stored procedures, functions, and triggers.
- Conducted data migration from an Oracle database and flat files into SQL Server 2008 using SSIS packages.
- Created complex stored procedures and functions to support the front-end application.
- Optimized SQL queries for improved performance and availability.
Environment: MS SQL Server 2008, SSIS 2008, SSRS 2008/2005, Windows XP, SQL Server Management Studio (SSMS), T-SQL, SQL Server Integration Services (SSIS), Microsoft Excel.

Education:
Bachelor's, Computer Science
Keshav Memorial Institute of Technology    Jul 2010 - May 2014