Candidate's Name
Data Engineer/Scientist
EMAIL AVAILABLE PHONE NUMBER AVAILABLE LINKEDIN LINK AVAILABLE
CAREER SUMMARY
A Data Engineer with expertise in developing, implementing, and optimizing data pipeline systems and
ETL processes. Skilled in Big Data technologies including Hadoop, Impala, Sqoop, Pig, ZooKeeper, and
Hive, and in cloud platforms such as AWS, GCP, and Azure. Proficient in creating data pipelines with
AWS services for storage, analytics, and modeling. Service-oriented programmer proficient in
JavaScript, Python, PySpark, NodeJS, R, Java, and SQL. Strong in data cleaning, reshaping, ETL
pipeline building, and generating subsets with Databricks, data warehouses, data lakes, NumPy, Pandas,
and PySpark. Experienced with DevOps practices and tools including Jenkins, Docker, and Splunk.
PROFESSIONAL EXPERIENCE
Seacoast Bank, Senior Data Engineer Feb 2023 – Present | Tampa, FL
Built a data lake in Amazon S3 using Python, R, and PySpark for the client; imported data from
Snowflake tables, a CRM Postgres DB, Salesforce, MySQL Server, and Amazon RDS, and integrated
data from these multiple sources (illustrative sketch below).
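For illustration, a minimal PySpark ingestion sketch of the kind described above; the connection details, table name, and bucket path are hypothetical placeholders, and a Postgres JDBC driver is assumed to be on the Spark classpath:

    # Illustrative sketch only; names and credentials are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3-data-lake-ingest").getOrCreate()

    # Pull a table from a CRM Postgres source over JDBC (credentials would
    # normally come from configuration, not be hard-coded).
    crm_accounts = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://crm-host:5432/crm")
        .option("dbtable", "public.accounts")
        .option("user", "etl_user")
        .option("password", "********")
        .load()
    )

    # Stamp each row with its load date, then land the raw extract in the
    # S3 data lake as date-partitioned Parquet.
    crm_accounts = crm_accounts.withColumn("ingest_date", F.current_date())
    (crm_accounts
        .write.mode("overwrite")
        .partitionBy("ingest_date")
        .parquet("s3a://example-data-lake/raw/crm/accounts/"))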
Developed automated regression scripts to validate ETL processes across multiple databases,
including AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server, using Python and PySpark (sketch below).
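A minimal sketch of such a validation check, assuming both ends of the pipeline are readable as Spark DataFrames; the paths and column names are placeholders:

    # Compare source and target ends of an ETL run; illustrative only.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-regression-check").getOrCreate()

    source = spark.read.parquet("s3a://example-lake/raw/orders/")
    target = spark.read.parquet("s3a://example-lake/curated/orders/")

    # Row-count parity between the two ends of the pipeline.
    assert source.count() == target.count(), "row count mismatch"

    # Cheap content check: compare an aggregate fingerprint of a key column.
    src_sum = source.agg(F.sum("order_total")).first()[0]
    tgt_sum = target.agg(F.sum("order_total")).first()[0]
    assert src_sum == tgt_sum, "order_total checksum mismatch"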
Transformed vast sets of financial temporal data into actionable insights by leveraging AWS S3, EMR,
Athena, HDFS, Databricks, and Apache Spark, resulting in a 30% increase in data processing
efficiency.
Using Amazon Aurora databases and DynamoDB, increased storage capacity by 50% while reducing
latency by 250%; implemented real-time data processing solutions using AWS Kinesis and AWS
Lambda (handler sketch below).
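A skeleton of the kind of Kinesis-triggered Lambda handler described above, with a hypothetical DynamoDB table as the sink:

    # Sketch of a Kinesis-triggered Lambda; the table name is a placeholder.
    import base64
    import json

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("example-events")

    def handler(event, context):
        # Kinesis delivers records base64-encoded inside the event payload.
        for record in event["Records"]:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            table.put_item(Item=payload)
        return {"processed": len(event["Records"])}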
Optimized the performance of AWS-hosted applications using CloudWatch monitoring, reducing
error rates by 10%, and migrated the company's entire workload to the AWS cloud using EC2 and S3
for efficient scaling, improving efficiency by 40%.
Evaluated business requirements, performed data segmentation, integrated customer data into
emails, enforced compliance approvals, and analyzed customer engagement using Databricks,
Snowflake, PySpark, and AWS S3.
Developed and maintained cloud-based data manipulation pipelines utilizing SQL, Python scripting,
and dbt, ensuring efficient data transformation and integration across banking applications.
Implemented Data Mining and parsing techniques to extract valuable insights from large datasets,
improving decision-making processes.
Created and optimized SQL queries for robust data reporting and visualization in Looker, enhancing
the accuracy and accessibility of financial reports for stakeholders.
Aligned OLTP and OLAP processes by collaborating with cross-functional teams; achieved a 20%
increase in data consistency, enhancing strategic planning and accelerating decision-making
across the organization by 25%.
Played a vital role in implementing CI/CD pipelines using Git, Jenkins, and Maven, streamlining
development processes.
Exported analyzed data to relational databases using Amazon EMR for visualization and report
generation with Power BI and Tableau. Used Apache Airflow to automate DAGs, their dependencies,
and log handling (DAG sketch below).
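An illustrative Airflow DAG of this shape; the task callables and schedule are assumptions, not the production pipeline:

    # Minimal Airflow 2.x DAG sketch with three dependent tasks.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        ...  # pull source data

    def transform():
        ...  # clean and reshape

    def load():
        ...  # write to the warehouse

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t3 = PythonOperator(task_id="load", python_callable=load)
        t1 >> t2 >> t3  # declare task dependencies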
Thryve Digital Health LLP, Senior Data Analyst / Engineer Jun 2019 – Aug 2021 | Hyderabad, India
Developed predictive models using regression in collaboration with the healthcare analytics team,
using Python, AWS SageMaker, EC2, and S3 to analyze total charges and length of stay for patients
with COVID-19 and mental illness (regression sketch below).
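A hedged sketch of such a regression, shown with scikit-learn locally rather than SageMaker; the dataset path and feature columns are hypothetical:

    # Illustrative length-of-stay regression; inputs are placeholders.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("patient_encounters.csv")
    X = df[["age", "num_diagnoses", "is_covid", "is_mental_health"]]
    y = df["length_of_stay"]

    # Hold out a test split to sanity-check generalization.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", model.score(X_test, y_test))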
Contributed to the implementation of a medical records filing system which helped decrease
outpatient wait time by 13.2%, adhering to Agile principles and delivering projects on time.
Analyzed and transformed patients' time-series data by running batch-processing jobs in data
warehouses using HDFS, Azure Databricks, Azure Data Factory, and Apache Spark.
Managed on-premises data infrastructure including Apache Kafka, Apache Spark, Elasticsearch,
Redis, and MongoDB, using Docker containers.
Constructed MapReduce jobs to validate, clean, and access data, and built Sqoop jobs with
incremental load to populate Hive external tables (mapper sketch below).
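As one concrete form such a job can take, a Hadoop Streaming-style mapper in Python for the validate-and-clean step; the field layout is an assumption:

    # Streaming mapper: reads raw CSV lines on stdin, drops malformed rows,
    # and emits a cleaned record keyed for the shuffle phase.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        # Drop malformed rows (wrong column count or empty ID).
        if len(fields) != 5 or not fields[0]:
            continue
        # Normalize whitespace and emit record_id as the shuffle key.
        cleaned = [f.strip() for f in fields]
        print("\t".join([cleaned[0], ",".join(cleaned[1:])]))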
Leveraged statistical modeling techniques including decision trees and generalized linear models
(GLM) using SAS and MATLAB for predictive modeling in the insurance industry.
Conducted comprehensive data analysis and transformation, ensuring compliance with industry
regulations.
Experienced with data warehousing techniques including star schema, snowflake schema,
normalization, denormalization, transformations, and aggregation.
Authored detailed technical documentation to facilitate knowledge transfer and maintain high
standards of information technology practices.
Utilized advanced data analysis and risk assessment methodologies to enhance annuity and insurance
risk predictions, resulting in a 15% improvement in accuracy.
Integrated results into database applications and generated reports using spreadsheets and VBA,
facilitating data-driven decision-making.
Designed and created dashboards using ggplot, Python matplotlib, Power BI, and Tableau to analyze
important features and model performance.
Implemented version control using Git/GitHub for storing data on medical projects, ensuring better
collaboration and code management within the team.
Cliff.AI, Software Engineer – Data Crawling & Analytics Dec 2018 – May 2019 | Hyderabad, India
Implemented web scraping scripts to gather data from various websites, ensuring ethical and efficient
data extraction practices, using Python libraries, Java, R, and SQL (scraper sketch below).
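A minimal scraping sketch using the widely used requests and BeautifulSoup libraries; the URL and CSS selector are placeholders, and any real target's robots.txt and terms of service should be checked first:

    # Illustrative scraper; endpoint and selector are hypothetical.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/listings", timeout=10)
    resp.raise_for_status()

    # Parse the page and pull the text of each listing title.
    soup = BeautifulSoup(resp.text, "html.parser")
    rows = [tag.get_text(strip=True) for tag in soup.select("div.listing-title")]
    print(rows[:5])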
Utilized Pandas and NumPy to transform raw data into a usable format, improving the quality of
datasets for further analysis.
Automated ETL processes across billions of rows of data, which reduced manual workload by 29%
monthly.
Ingested data from disparate data sources using a combination of SQL, Google Analytics API, and
Salesforce API using Python to create data views to be used in BI tools like Tableau.
Designed and maintained robust data pipelines for annuity computations and risk analysis using SQL,
JSON, and XML formats.
Automated data extraction and processing workflows using VBA, ensuring seamless integration of
diverse data sources into predictive modeling frameworks.
Collaborated with cross-functional teams to develop and deploy database applications and statistical
models, leveraging skills in HTML and teamwork to enhance data analysis and decision support
systems.
Led the implementation of RESTful API integrations for seamless data exchange between domain
systems, leveraging JSON and XML formats for data transfer (request sketch below).
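An illustrative REST pull of this kind; the endpoint, token, and payload schema are assumptions:

    # Sketch of a JSON pull from a hypothetical REST endpoint.
    import requests

    API = "https://api.example.com/v1/records"
    headers = {"Authorization": "Bearer <token>", "Accept": "application/json"}

    resp = requests.get(API, headers=headers,
                        params={"since": "2019-01-01"}, timeout=30)
    resp.raise_for_status()
    for record in resp.json()["records"]:
        ...  # map fields into the downstream domain system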
Engineered data repositories and automated ETL processes to support complex data requirements in
the banking sector.
Contributed to developing BI tooling solutions, integrating Power BI dashboards with existing data
infrastructure, using Tableau to create and maintain data visualizations.
Collaborated with cross-functional teams, including data scientists & application developers to guide
the development and implementation of Cloud applications, systems, and processes using DevOps
methodologies.
EDUCATION
Master's in Computer Technology, Eastern Illinois University Aug 2021 – May 2023 | Chicago, IL
Bachelor's in Computer Science, JNTUH Apr 2015 – May 2019 | Hyderabad, India
SKILLS
Programming languages: Python, Java, SQL, PySpark, Scala, Bash
Data: Hadoop, S3, Redshift, Hive, Elasticsearch, Redis, PostgreSQL, MongoDB, MySQL
Distributed systems: Apache Spark, Databricks, Kubernetes, Kafka
AWS/Azure cloud: S3, EC2, EMR, Airflow, Lambda, Athena, Glue, IAM, Redshift, DynamoDB,
CloudWatch, SageMaker, Kinesis, Azure SQL Database, Azure Load Balancer, DevOps tool integrations
Other: Docker, Git, Kibana, Flask, PyTorch, Salesforce, Tableau, Power BI, MetaTrader 4,
Jira, Pandas, NumPy, OpenStreetMap, Terraform, Grafana