Candidate's Name
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

DATA SCIENTIST/DATA ENGINEER

PROFESSIONAL PROFILE
4+ years of experience in the IT and healthcare industries, including big data environments, the Hadoop ecosystem, Python, and the design, development, and maintenance of applications using Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Airflow, Snowflake, Teradata, Flume, Kafka, YARN, Oozie, Zookeeper, and machine learning.
Experience in managing, planning, and supporting business-critical solutions in the information technology and services, insurance services, oil and gas, energy, and telecom industries.
Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL).
Good experience in Tableau for data visualization and analysis on large data sets, drawing various conclusions.
Involved in data warehouse design, data integration, and data transformation using Apache Spark and Python.
Worked on downloading BigQuery data into pandas or Spark data frames for advanced ETL capabilities.
Developed highly complex Python code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries.
Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
Experience in creating and executing data pipelines on GCP and AWS platforms.
Hands-on experience with GCP: BigQuery, GCS, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, and Dataproc.
Experience in data architecture and design.
Provided Tableau dashboards and data sources to healthcare and hospital corporations to understand demand, pricing, geospatial claim concentrations, and de-identified and re-identified customer claims across multiple claims networks, as well as consultation on hospital expansion and location, as requested.
Good working experience with application and web servers like JBoss and Apache Tomcat.
Experience in using PL/SQL to write stored procedures, functions, and triggers.
Experience in the development of big data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools.
Experience in developing Spark applications using PySpark, Scala, and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Experienced in using Pig scripts to do transformations, event joins, filters, and pre-aggregations before storing the data into HDFS.
Expertise in big data architectures such as Hadoop (Hortonworks, Cloudera) distributed systems, MongoDB, and NoSQL.
Experience building Spark-based applications to load streaming data with low latency using Kafka and PySpark programming.
Hands-on experience with Hadoop/big data technologies for storage, querying, processing, and analysis of data using Scala.
Expertise in programming in different technologies, i.e., Python, Spark, and SQL.
Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries (see the sketch after this section).
Solid experience and understanding of implementing large-scale data warehousing programs and end-to-end data integration solutions on Snowflake Cloud, AWS Redshift, Informatica Intelligent Cloud Services (IICS - CDI), and Informatica PowerCenter integrated with multiple relational databases (MySQL, Teradata, Oracle, Sybase, SQL Server, DB2).
Experience in working with MapReduce programs using Apache Hadoop for working with big data.
Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
Excellent working experience in Scrum/Agile frameworks and Waterfall project execution methodologies.
Hands-on experience in the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Pig, Hive, Sqoop, Oozie, Flume, Storm, and other big data technologies.
Professional in deploying and configuring Elasticsearch, Logstash, Kibana (ELK) and AWS Kinesis for log analytics, and skilled in monitoring servers using Nagios, Splunk, AWS CloudWatch, and ELK.
Worked on Spark, Spark Streaming, and the core Spark API to explore Spark features and build data pipelines.
Experienced in maintaining CI/CD (continuous integration and deployment) pipelines and applying automation to environments and applications.
Worked on various automation tools like Git, Terraform, and Ansible.
Experienced in working with different scripting technologies like Python and UNIX shell scripts.
Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2; successfully loaded files to HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
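A minimal PySpark sketch of the kind of Spark SQL extraction and transformation described above; the paths, view name, and column names are hypothetical placeholders rather than details from any of the projects below.

```python
# Minimal sketch: extract data with PySpark, transform it with Spark SQL, write the result.
# The input/output paths, view name, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()

# Load raw claims data (assumed to be CSV with a header) into a DataFrame.
claims = spark.read.csv("s3://example-bucket/raw/claims/", header=True, inferSchema=True)

# Register a temporary view so the transformation can be expressed in Spark SQL.
claims.createOrReplaceTempView("claims")

# Aggregate claim counts and amounts per network before writing downstream.
summary = spark.sql("""
    SELECT network_id,
           COUNT(*)          AS claim_count,
           SUM(claim_amount) AS total_amount
    FROM claims
    GROUP BY network_id
""")

summary.write.mode("overwrite").parquet("s3://example-bucket/curated/claims_summary/")
```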
AREAS OF EXPERTISE
Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Sqoop, Oozie, Spark, Airflow, MongoDB, Cassandra, HBase, Storm, RAG, React
Programming Languages: Java, Python, Hibernate, JDBC, JSON, HTML, CSS, NLP, C++, .NET, LangChain, Llama 3.1
Cloud Technologies: AWS, GCP, Amazon S3, EMR, Redshift, Lambda, Athena, Composer, BigQuery, Deep Learning
Scripting Languages: Python, Shell scripting (Bash)
Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB
Version Control and Tools: Git, Maven, SBT, CBT
DevOps: Docker, Jenkins, Terraform, AWS CDK, CloudFormation, K8s

PROFESSIONAL EXPERIENCE:

Nevro, San Francisco 2023 - Present
Associate Data Scientist/Data Engineer
Responsibilities:
Designed, developed, and maintained data processing pipelines using PySpark to handle large-scale data transformations and ETL workflows.
Pulled data from Salesforce and built an AI model for one of Nevro's FDA-approved devices, based on patients' pain assessments.
Provisioned and administered AWS cloud storage (S3) cross-region replication and Elastic Load Balancer, configured Auto Scaling, set up CloudWatch alarms and a Virtual Private Cloud (VPC), and mapped multi-AZ VPC instances to RDS based on the architecture.
Worked on Amazon EC2, setting up instances, virtual private clouds (VPCs), and security groups; created AWS Route 53 records to route traffic between different regions; and used Boto3 and Fabric for launching and deploying instances in AWS.
Configured Amazon S3, Elastic Load Balancing, IAM, and security groups in public and private subnets in a VPC, and created cached volume and stored volume gateways to store data and other services in AWS.
Architected and configured a virtual data center in the AWS cloud to support Enterprise Data Warehouse hosting, including a Virtual Private Cloud (VPC), public and private subnets, security groups, and route tables.
Worked on security groups, network ACLs, internet gateways, NAT instances, and route tables to ensure a secure zone for organizations in the AWS public cloud.
Worked on migration services such as AWS Server Migration Service (SMS) to migrate on-premises workloads to AWS more easily and quickly using the rehost ("lift and shift") methodology, as well as AWS Database Migration Service (DMS), AWS Snowball for transferring large amounts of data, and Amazon S3 Transfer Acceleration.
Worked in an agile development team to deliver an end-to-end continuous integration/continuous delivery product in an open-source environment using tools like Chef.
Worked on Terraform scripts to automate AWS services including ELB, CloudFront distributions, RDS, EC2, database security groups, Route 53, VPC, subnets, security groups, and S3 buckets, and converted existing AWS infrastructure to AWS Lambda deployed via Terraform and AWS CloudFormation.
Implemented the AWS Elastic Container Service (ECS) scheduler to automate application deployment in the cloud using Docker automation techniques.
Implemented docker-maven-plugin in Maven pom.xml files to build Docker images for all microservices, later used a Dockerfile to build the Docker images from the Java jar files, and also created Docker images using a Dockerfile, worked on Docker container snapshots, removed images, and managed Docker volumes.
Wrote Python scripts using the Boto3 library to automatically spin up instances in AWS EC2 and OpsWorks stacks and integrated them with Auto Scaling using configured AMIs (see the sketch after this section).
Worked on Git version control to manage the source code, integrating Git with Jenkins to support build automation and with Jira to monitor commits.
Designed various Jenkins jobs to continuously integrate the processes and executed the CI/CD pipeline using Jenkins.
Environment: AWS, Terraform, Chef, Docker, Jenkins, Git, Jira, Kubernetes, Maven, Nagios, ELK, Java, SonarQube, Shell, Bash, Python, DynamoDB, Cassandra.
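A minimal Boto3 sketch of the kind of EC2 provisioning automation referenced above; the region, AMI ID, instance type, and tags are hypothetical placeholders, not values from the actual environment.

```python
# Minimal sketch: launch and tag an EC2 instance with Boto3, then wait for it to run.
# The region, AMI ID, instance type, and tags are hypothetical placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-west-2")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Project", "Value": "data-pipeline"}],
    }],
)

instance = instances[0]
instance.wait_until_running()  # block until the instance reaches the running state
instance.reload()              # refresh cached attributes such as state
print(instance.id, instance.state["Name"])
```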
Yana Software Private Limited 2020 - 2022
Data Analyst
Responsibilities:
Built scalable and deployable machine learning models.
Performed exploratory data analysis to find trends and clusters. Built models using techniques like regression, tree-based ensemble methods, time series forecasting, KNN, clustering, and Isolation Forest methods.
Communicated and coordinated with other departments to collect business requirements. Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms.
Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
Utilized AWS Glue crawlers to automatically discover and catalog metadata from various data sources, including databases, data lakes, and semi-structured data formats.
Designed, developed, and managed data processing workflows on the Amazon Web Services (AWS) Elastic MapReduce (EMR) platform.
Implemented data pipelines using AWS services like AWS Glue, AWS Lambda, and AWS Step Functions to automate data ingestion, transformation, and loading processes.
Designed, implemented, and managed end-to-end data pipelines using Azure Data Factory to efficiently orchestrate data movement and transformations across diverse data sources and destinations.
Implemented machine learning models (logistic regression, XGBoost) with Python scikit-learn, optimized them with stochastic gradient descent, and fine-tuned parameters manually and with automated tuning such as Bayesian optimization (see the sketch after this section).
Developed a technical brief based on the business brief, containing the detailed steps and stages of developing and delivering the project, including timelines.
Designed, developed, and maintained data pipelines on the Snowflake cloud data platform to ingest, process, and transform large-scale datasets from diverse sources.
Utilized the PySpark DataFrame API to perform data cleansing, data enrichment, and data aggregations to prepare data for downstream analytics and reporting.
Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
Developed complex SQL queries in AWS Athena to analyze and process data stored in Amazon S3, providing valuable insights to data analysts and business stakeholders.
Extensively performed large data reads and writes to and from CSV and Excel files using pandas.
Tasked with maintaining RDDs using Spark SQL.
Measured ROI based on the differences between pre-promo and post-promo KPIs. Extensively used SAS procedures like IMPORT, EXPORT, SORT, FREQ, MEANS, FORMAT, APPEND, UNIVARIATE, DATASETS, and REPORT.
Standardized the data with the help of PROC STANDARD.
Responsible for maintaining and analyzing large datasets used to analyze risk by domain experts.
Developed Hive queries that compared new incoming data against historic data. Built tables in Hive to store large volumes of data.
Used big data tools such as Spark (Spark SQL, MLlib) to conduct real-time analysis of credit card fraud on AWS.
Performed data audits, QA of SAS code/projects, and sense checks of results.
After client sign-off on the technical brief, started developing the SAS code.
Iteratively rebuilt models to deal with changes in data, refining them over time.
Extensively used SQL queries for legacy data retrieval jobs. Tasked with migrating the Django database from MySQL to PostgreSQL.
Environment: Spark, Hadoop, AWS, SAS Enterprise Guide, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/SQL, Oracle, MS Office, Python (scikit-learn, pandas, NumPy), Azure Data Factory, Snowflake, PySpark, EMR, Glue, Athena, machine learning (logistic regression, XGBoost), gradient descent, Bayesian optimization, Tableau.
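A minimal scikit-learn sketch of the kind of cost-sensitive fraud classification described in this section; the synthetic data, class ratio, and parameters are illustrative assumptions, not the project's actual features or settings.

```python
# Minimal sketch: class-weighted logistic regression on an imbalanced, synthetic
# "fraud" dataset. Data, class ratio, and parameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced fraud dataset (roughly 2% positive class).
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.98, 0.02], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# class_weight="balanced" is one cost-sensitive option for the skewed classes.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test), digits=3))
```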