                                             Senior Data Engineer
Sai Mani Teja Reddy
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE

PROFESSIONAL SUMMARY:

      Dynamic and motivated IT professional with around 9 years of experience as a Data Engineer, with expertise in
      designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering,
      data visualization, reporting, and web application development using Python. Experience spans the Pharma,
      Telecommunications, Finance, Insurance, and Business domains.
      Developed Spark applications using Spark-SQL in Databricks, extracting, transforming, and aggregating data
      from diverse formats. My work unearths crucial insights into customer usage patterns.
      Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing,
      Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family. Crafted automated
      ingestion scripts using PySpark and Scala, seamlessly connecting with APIs, AWS S3, Teradata, and Redshift.
      Extensive hands-on experience in developing PySpark applications for distributed data processing.
      Implemented seamless integration between Python and Spark, leveraging PySpark's Python API.
      Exhibited a deep understanding of Azure Big Data technologies, proficiently navigating Azure Data Lake
      Analytics, Azure Data Lake Store, Azure Data Factory, and Azure Synapse. I've crafted Proof of Concepts for
      efficient data migration. Managed tracking tools like JIRA and ServiceNow, displaying a strong ability to build
      CI/CD pipelines through Gitlab, Jenkins, Helm, and Kubernetes.
      Proficient in designing data warehouses, utilizing Azure Synapse capabilities for seamless integration and
      analysis. Demonstrated expertise in leveraging Synapse pipelines for ETL processes, ensuring smooth data
      transformation and loading. Implemented Apache Airflow to author, schedule, and monitor data pipelines.
      Designed and implemented Snowflake cloud data platform architectures, ensuring scalability, performance, and
      reliability. Developed and maintained data models in Snowflake, ensuring alignment with business
      requirements and optimizing schema designs.
      Designed and managed big data platform using Hadoop ecosystem components such as HDFS, Oozie, Sqoop,
      Apache Spark, and Hive to enable distributed and scalable data processing.
      Hands-on experience in handling database issues and connections with SQL and NoSQL databases such as
      MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL. Experience in designing and creating RDBMS Tables,
      Views, User Created Data Types, Indexes, Stored Procedures, Cursors, Triggers and Transactions. Designed and
      built Dimensions and cubes with star schemas using SQL Server Analysis Services (SSAS).
      Exhibited proficiency in data modeling, leveraging SQL and PL/SQL queries for insightful results. I've seamlessly
      worked across Oracle, SQL Server, and MySQL databases, writing stored procedures, functions, joins, and
      triggers.
      Expert in designing ETL data flows by creating mappings/workflows to extract data from SQL Server, and in
      data migration and transformation from Oracle/Access/Excel sheets using SQL Server SSIS. Excelled in data
      quality, mapping, and filtration using ETL tools like Talend, Informatica, and DataStage.
      Navigated a range of file formats and Click Stream files, while creating intricate Tableau reports and executing
      ad-hoc reporting through Power BI. Led dashboard development and data analysis, revealing customer
      purchasing trends. Created real-time Tableau dashboards, executed A/B tests, and collaborated with marketing
      for analysis.
      Engineered scripts, utilized Spark and Hive, and implemented machine learning models for strategic insights.
      Led version control across Linux and Windows platforms using Git, GitLab, GitHub, and Subversion (SVN). My
      Unix and shell scripting proficiency drives automation.
      Developed robust and scalable back-end components using Python, Django, and Flask technologies. Managed
      code versioning, participated in Agile, Scrum, and Kanban methodologies, and employed RESTful API
      development, contributing to a positive team spirit and collaborative work environment.
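
The back-end and REST API work noted above can be illustrated with a minimal Flask sketch; the route, resource name, and in-memory store are illustrative assumptions rather than details of any specific project.

# Minimal Flask REST endpoint sketch; names and the in-memory store are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory dictionary standing in for a real RDBMS-backed model.
customers = {}

@app.route("/api/customers/<int:customer_id>", methods=["GET"])
def get_customer(customer_id):
    customer = customers.get(customer_id)
    if customer is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(customer)

@app.route("/api/customers", methods=["POST"])
def create_customer():
    payload = request.get_json(force=True)
    customer_id = len(customers) + 1
    customers[customer_id] = {"id": customer_id, **payload}
    return jsonify(customers[customer_id]), 201

if __name__ == "__main__":
    app.run(debug=True)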

TECHNICAL SKILLS:

    AWS (Amazon Web Services): EMR, S3, Redshift, EC2, IAM, CloudWatch, Glue, Snowflake, DevOps (CI/CD pipelines), RDS, Lambda, Boto3, DynamoDB, Amazon SageMaker
    Azure Services: Data Factory, Databricks, Blob Storage, ADLS Gen 2, Delta Lake, Data Lake Analytics, Functions, Synapse Analytics, Stream Analytics, SQL Database, SQL Server, Monitor, Event Hubs, Cosmos DB, Azure Cosmos Graph, Cassandra, MongoDB/Azure Cosmos
    Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, Yarn, Oozie, Zookeeper, Hue, Ambari Server
    Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos
    Programming: Python, Java, PySpark, Scala, Shell Script, Perl Script, SQL, JavaScript, HTML5, Linux
    ETL Tools: Informatica, Apache Airflow, Azure Data Factory, Talend, Sqoop, Glue
    Reporting and Visualization Tools: Tableau, Power BI, QlikView, Crystal Reports XI, SSRS (SQL Server Reporting Services), Cognos, Excel
PROFESSIONAL EXPERIENCE:

OPTUM (Health Care) | Eden Prairie, Minnesota                                             Sept 2022 - Till Date
Senior Data Engineer

Responsibilities:
  • Developed, implemented, and managed the data governance process. Provided full-fledged big-data processing within the Hadoop framework. Designed a new ETL pipeline using Spark and Hive, extracting data from numerous sources.
  • Designed a data store conforming to the OLTP archive policy, enabling reverse sharing between the ODS and OLTP systems. Used ER/Studio for the creation of conceptual, logical, and physical data models and DDL scripts.
  • Imported real-time data into Hadoop via Kafka with daily Oozie jobs. Designed MapReduce programs for analysing and cleaning data. Involved in maintaining the Hadoop cluster on AWS EMR.
  • Used AWS Glue and PySpark to load data into S3 buckets. Filtered data in Elasticsearch and loaded it into Hive external tables. Skilled in developing CI/CD pipelines using Jenkins, utilizing a tech stack of GitLab, Jenkins, Helm, and Kubernetes.
  • Actively engaged in cross-functional discussions to align mapping strategies with REST API design principles and industry best practices.
  • Implemented robust error handling mechanisms and ensured fault tolerance in PySpark applications. Managed PySpark clusters for scalability and performance, integrating with cluster management tools.
  • Collaborated on cross-service integration projects, integrating AWS Glue with Amazon S3, Redshift, Athena, and other AWS services. Implemented robust data quality checks and validation mechanisms within AWS Glue jobs.
  • Developed and implemented incremental data loads using Apache Spark SQL from source systems into Hadoop. Maintained comprehensive documentation for Spark applications, including code comments and workflow descriptions.
  • Applied expertise in HL7 and FHIR to ensure seamless communication and data exchange in healthcare environments. Demonstrated the ability to troubleshoot and resolve issues related to HL7 and FHIR data mapping, ensuring system reliability.
  • Implemented business specification logic, stored results in Hive, and performed data analysis. Used Apache Kafka to gather web log data from different servers and pass it to downstream systems for analysis.
  • Established comprehensive monitoring and logging for Amazon EMR clusters using Amazon CloudWatch.
  • Built a Scala API for backend support of the graph database user interface. Coded the Scala API to insert/delete predicates in the graph DB after transforming and mapping incoming data.
  • Collected data using Spark Streaming from an AWS S3 bucket in near-real-time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (a condensed sketch of this pattern follows this list).
  • Integrated AWS DynamoDB with AWS Lambda to back up item values, and automated regular AWS tasks such as snapshot creation using Python scripts (a boto3 sketch follows the environment summary below).
  • Optimized Tableau dashboards for performance, addressing factors such as data extracts, filters, and calculated fields.
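
A condensed sketch of the near-real-time S3-to-HDFS pattern described in the Spark Streaming bullet above, shown here with Structured Streaming; the bucket, schema, window size, and paths are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

# Event schema is an illustrative assumption.
schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read newly arriving JSON files from the S3 landing prefix (placeholder path).
events = (spark.readStream
          .schema(schema)
          .json("s3a://example-landing-bucket/events/"))

# On-the-fly aggregation: event counts per learner per 10-minute window.
learner_counts = (events
                  .withWatermark("event_time", "30 minutes")
                  .groupBy(F.window("event_time", "10 minutes"), "learner_id")
                  .agg(F.count("*").alias("event_count")))

# Persist the aggregated model to HDFS as Parquet (placeholder paths).
query = (learner_counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "hdfs:///data/learner_model/")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_model/")
         .start())

query.awaitTermination()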
Environment: AWS, Spark, SQL, ER/Studio, Hadoop 3.3, AWS EMR, S3, Snowflake, Hive, Pig, Apache Kafka, ETL,
Informatica, Sqoop, Python, PySpark, Shell scripting, Linux, MySQL, Jenkins, Git, HL7 and FHIR, Oozie, Tableau, and
Agile Methodologies.
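
A small boto3 sketch of the kind of snapshot automation mentioned above; the region, tag scheme, and function name are assumptions, not details from the role.

import boto3
from datetime import datetime, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

def snapshot_tagged_volumes(tag_key="Backup", tag_value="daily"):
    """Create snapshots for every EBS volume carrying the given backup tag."""
    volumes = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )["Volumes"]
    for volume in volumes:
        description = "automated-{}-{}".format(
            volume["VolumeId"], datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
        )
        snapshot = ec2.create_snapshot(VolumeId=volume["VolumeId"], Description=description)
        print("Started snapshot", snapshot["SnapshotId"], "for", volume["VolumeId"])

if __name__ == "__main__":
    snapshot_tagged_volumes()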


The Home Depot | Atlanta, GA                                                             May 2020 - Aug 2022
Data Engineer

Responsibilities:
  • Proficient in working across a wide range of Azure services, including HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH, and Data Storage Explorer. Designed and deployed data pipelines by leveraging Data Lake, Databricks, and Apache Airflow. These pipelines facilitated seamless data integration, transformation, and orchestration.
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks.
  • Expertise in connecting and integrating various data sources, including cloud-based applications, databases, and APIs, using StreamSets connectors.
  • Strong understanding of StreamSets' data governance and security features, including data masking, encryption, and access controls.
  • Skilfully integrated data from both on-premises (MySQL, Cassandra) and cloud sources (Blob Storage, Azure SQL DB) using Azure Data Factory. Applied transformations to load data back to Azure Synapse for enhanced insights.
  • Developed Spark Scala functions for real-time data mining, providing crucial real-time insights and generating reports. Configured Spark Streaming to receive data from Apache Flume and stored it in Azure Table storage using Scala.
  • Utilized Azure Data Lake for comprehensive data processing and analytics. Employed Databricks for processing data ingested into Azure Blob Storage, effectively utilizing Spark Scala scripts and UDFs for large-scale transformations.
  • Loaded tables from Azure Data Lake to Azure Blob Storage, facilitating smooth data movement to Snowflake for further analysis. Created complex SnowSQL scripts for reporting and business analysis in the Snowflake Cloud Data Warehouse.
  • Utilized the Spark Streaming API to ingest data from diverse sources, enhancing existing Scala code to improve cluster performance and efficiency. Leveraged Spark DataFrames and Databricks notebooks to create datasets and apply business transformations and data cleansing operations.
  • Worked with Delta tables on Databricks. Hands-on experience working with the Databricks runtime and performance tuning of Spark jobs (a Delta read/write sketch follows the environment summary below).
  • Proficiently developed ETL pipelines and Directed Acyclic Graph (DAG) workflows using Python scripts, Airflow, and Apache NiFi (a minimal DAG sketch follows this list).
  • Led the migration of critical systems from on-premises hosting to Azure Cloud Services, with a focus on SnowSQL query writing and optimization.
  • Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. Collaborated with Cosmos DB (SQL API and Mongo API) for seamless data integration.
  • Designed custom input adapters using Spark, Hive, and Sqoop for data ingestion and analysis from various sources, including Snowflake, MS SQL, and MongoDB.
  • Implemented auto-scaling and serverless computing techniques, which reduced overall cloud infrastructure expenses by 15%.
  • Possesses knowledge of SAS (Statistical Analysis System), utilizing it for advanced analytics and data processing when required. Ensured data quality and validation using SAS, implementing checks and procedures to identify and address discrepancies in datasets.
  • Developed REST APIs using Python with the Flask and Django frameworks, integrating various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Proficient in utilizing data for interactive Power BI dashboards and reporting purposes based on business requirements.
  • Extensively worked on Jenkins to implement continuous integration (CI) and continuous deployment (CD) processes. Worked in Agile methodology and used JIRA to maintain the stories about the project.
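
A minimal Airflow DAG sketch of the ETL/DAG workflows referenced above; the DAG id, schedule, and placeholder tasks are illustrative assumptions.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # Placeholder extract step: pull raw records from a source system.
    print("extracting")

def transform(**context):
    # Placeholder transform step: cleanse and reshape the extracted data.
    print("transforming")

def load(**context):
    # Placeholder load step: write curated data to the warehouse.
    print("loading")

default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_etl_example",          # DAG id is an illustrative assumption
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task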

Environment: Azure HDInsight, ADLS, Azure Synapse, Azure Data Factory, Databricks, Data Lake, Cosmos DB,
MySQL, Snowflake, MongoDB, Palantir, Teradata, Flume, SAS, Blob Storage, Data Factory, ETL, Data Storage Explorer,
Scala, Hadoop (HDFS, MapReduce, Yarn), Spark, Flink, Airflow, Hive, Sqoop, HBase, Kubernetes, Jira, Tableau, Power BI.
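
A short PySpark sketch of working with Delta tables on Databricks, as referenced above; the mount paths and column names are illustrative assumptions.

from pyspark.sql import SparkSession, functions as F

# On Databricks, `spark` is provided; building it explicitly keeps the sketch self-contained.
spark = SparkSession.builder.appName("delta-example").getOrCreate()

# Source path and columns are illustrative assumptions.
orders = spark.read.parquet("/mnt/raw/orders/")

curated = (orders
           .filter(F.col("order_status") == "COMPLETE")
           .withColumn("order_date", F.to_date("order_timestamp")))

# Write as a Delta table, partitioned by date for faster downstream queries.
(curated.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("order_date")
 .save("/mnt/curated/orders_delta/"))

# Delta tables read back like any other Spark source.
daily_totals = (spark.read.format("delta").load("/mnt/curated/orders_delta/")
                .groupBy("order_date")
                .agg(F.sum("order_amount").alias("total_amount")))
daily_totals.show()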

Wells Fargo | Minneapolis, MN                                                               Feb 2018 - April 2020
Data Engineer

Responsibilities:
  • Led the creation of reporting dashboards, conducting data mining and analysis to understand customer purchase behavior.
  • Formulated real-time dashboards in Tableau, delivering visual monitoring of crucial metrics and executing A/B test processing, leveraging both external and internal data.
  • Collaborated closely with the marketing team to dissect marketing campaign data, executing analysis encompassing segmentation and cohort analysis.
  • Orchestrated the design of MySQL table schemas, subsequently implementing stored procedures for efficient extraction and storage of customer purchase and session data.
  • Employed the Python packages Pandas and NumPy to query the MySQL database, validating and identifying inconsistent data. Directed the design of A/B tests, defining metrics, calculating sample sizes, and assessing statistical assumptions to validate new user interface features.
  • Conducted diverse statistical analyses, including hypothesis testing, regression analysis, confidence interval, and p-value calculations using R. This provided insights to enhance click-through rates and sales. Led exploratory data analysis, identifying trends using Tableau and Python libraries such as Matplotlib, Seaborn, and Plotly Dash.
  • Devised scripts for seamless data storage into Hadoop HDFS from varied sources like AWS S3, AWS RDS, Web APIs, and the NoSQL database MongoDB. Leveraged Redshift for cloud-based data warehousing, optimizing data storage and query performance.
  • Implemented security measures to protect sensitive data during ETL processes, ensuring compliance with data governance and privacy regulations.
  • Collaborated on end-to-end data processing pipelines involving Apache Spark. Implemented effective data serialization techniques in Spark applications for optimized data transfer.
  • Designed and managed data warehouses, utilizing Azure Synapse capabilities for seamless integration and analysis. Adept at monitoring and optimizing Synapse workloads to enhance data processing efficiency. Proficient in configuring access controls and permissions in Azure ADLS, ensuring data security and compliance.
  • Leveraged Spark and Hive within the big data ecosystem to analyze extensive datasets up to 2GB stored in Hadoop HDFS. This included executing filtering and aggregation using Spark SQL on Spark DataFrames (a Spark SQL sketch follows the environment summary below).
  • Engineered Python scripts for robust data pre-processing in predictive modeling, encompassing tasks such as missing value imputation, label encoding, and feature engineering.
  • Implemented machine learning models, notably Decision Trees and Logistic Regression, to predict revenue from returning customers. This informed strategic promotion decisions by the marketing team (a condensed preprocessing-and-modeling sketch follows this list).
  • Effectively communicated critical data insights to diverse stakeholders, utilizing tools such as MS PowerPoint, Tableau, and Jupyter Notebook for impactful presentations.
  • Analysed and interpreted data using Power BI to derive actionable insights. Leveraged Power BI for visualizing and communicating data science results.
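
A condensed sketch of the preprocessing-and-modeling flow described above (imputation, categorical encoding shown here as one-hot rather than label encoding, and a Logistic Regression fit); the file and column names are illustrative assumptions.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# File name, column names, and target are illustrative assumptions.
df = pd.read_csv("customer_sessions.csv")
numeric_cols = ["num_sessions", "avg_order_value"]
categorical_cols = ["acquisition_channel", "region"]
target = "returned_and_purchased"

# Impute missing values, scale numeric features, and one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

model = Pipeline([("preprocess", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df[target], test_size=0.2, random_state=42)

model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))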
Environment: SQL Server, MySQL, Python, R, Pandas, NumPy, Matplotlib, Seaborn, Plotly, Tableau, Power BI,
Excel, Hadoop, Spark, Hive, Spark SQL, AWS S3, AWS RDS, Redshift, MongoDB, Azure Synapse, ETL, ADLS,
Jupyter Notebook, Machine Learning, Predictive Modelling, Data Visualization, Statistical Analysis.
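
A short sketch of Spark SQL filtering and aggregation over a DataFrame-backed view stored in HDFS, as referenced above; the paths, columns, and target Hive table are illustrative assumptions.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hdfs-analysis").enableHiveSupport().getOrCreate()

# HDFS path and column names are illustrative assumptions.
sessions = spark.read.parquet("hdfs:///data/clickstream/sessions/")
sessions.createOrReplaceTempView("sessions")

# Filtering and aggregation expressed in Spark SQL over the DataFrame-backed view.
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS visits, AVG(duration_sec) AS avg_duration
    FROM sessions
    WHERE event_date >= '2019-01-01'
    GROUP BY page
    ORDER BY visits DESC
    LIMIT 20
""")

# Persist the result as a Hive table (database and table name are assumptions).
top_pages.write.mode("overwrite").saveAsTable("analytics.top_pages")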

TechMahindra | Hyderabad, India                                                       July 2014 - Oct 2017
Data Analyst

Responsibilities:
  • Gathered business requirements, definition, and design of the data sourcing; worked with the data warehouse architect on the development of logical data models.
  • Ability to query data in a data warehouse and prepare data for reporting and insights automation needs.
  • Extracted, cleansed, and combined data from multiple sources and systems using R and Python programming.
  • Performed exploratory and targeted analyses with a wide variety of statistical methods, including cluster, regression, decision tree/random forest, and time series, using Python programming.
  • Built reports and dashboards to monitor KPIs (Key Performance Indicators) to understand drivers of KPI changes.
  • Performed regression testing for Golden Test Cases from State (end-to-end test cases) and automated the process using Python scripts.
  • Designed and executed analytic projects, generated insights to support business decisions using advanced analytical and visualization techniques such as descriptive, predictive, and prescriptive analytics.
  • Generated graphs and reports using the ggplot package in RStudio for analytical models. Developed and implemented an R and Shiny application showcasing machine learning for business forecasting.
  • Performed K-means clustering, regression, and decision trees in R. Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and Pandas in Python.
  • Used Python NumPy and Pandas to perform data cleaning and data transformation activities (a short pandas sketch follows this list).
  • Scheduled data refreshes on Tableau Server for weekly and monthly increments based on business changes to ensure that the views and dashboards displayed the changed data accurately.
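
A short pandas sketch of the cleaning, reshaping, and segmentation steps referenced above; the file, columns, and thresholds are illustrative assumptions.

import numpy as np
import pandas as pd

# File and column names are illustrative assumptions.
raw = pd.read_csv("kpi_extract.csv")

# Basic cleaning: trim text fields, normalize missing markers, drop duplicates.
raw["segment"] = raw["segment"].str.strip().replace({"": np.nan, "N/A": np.nan})
clean = raw.dropna(subset=["segment", "revenue"]).drop_duplicates()

# Reshaping: one row per (month, segment) with total revenue, then a wide pivot for reporting.
monthly = clean.groupby(["month", "segment"], as_index=False)["revenue"].sum()
pivot = monthly.pivot(index="month", columns="segment", values="revenue").fillna(0)

# Segmented subsets, e.g. high-revenue months for a given segment (threshold is an assumption).
high_value = monthly[(monthly["segment"] == "enterprise") & (monthly["revenue"] > 1_000_000)]
print(pivot.head())
print(high_value.head())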

Environment: Python, Django, Flask, Beautiful Soup, NumPy, SciPy, matplotlib, Pandas, JavaScript, HTML5, RESTful
API, MySQL, Agile Methodologies, Scrum, Git, Power BI, Software Development Life Cycle (SDLC), Version Control
Systems.
