Data Engineer Senior Resume Denton, TX
Candidate Information
Title: Data Engineer Senior
Target Location: US-TX-Denton

Srividya.C
Senior Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE

Professional Summary:
- Seasoned Senior Data Engineer with over 10 years of experience specializing in the Hadoop ecosystem, cloud platforms, ETL processes and data ingestion, leveraging diverse technical expertise to drive complex data engineering projects that emphasize security, scalability and efficiency.
- Extensive hands-on experience with cloud platforms such as AWS (EMR, EC2, RDS, S3, Lambda, Glue, Redshift) and Google Cloud Platform (Pub/Sub, BigQuery).
- Skilled in data ingestion, pipeline design, Hadoop information architecture, data modeling, data mining and optimizing ETL workflows.
- Extensive knowledge of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Oozie, Flume, Cassandra, and Spark with Scala and PySpark (RDD, DataFrame, Spark SQL, Spark MLlib and Spark GraphX).
- Strong background in ETL methods using tools such as Microsoft Integration Services, Informatica PowerCenter and SnowSQL, across OLAP and OLTP environments.
- Experienced in designing logical and physical data models, implementing both Star Schema and Snowflake Schema concepts.
- Proficient in SQL Server, NoSQL databases such as DynamoDB and MongoDB, and complex Oracle queries with PL/SQL; experienced in using SSIS for data extraction, enhancing SSRS reporting, and working hands-on with BigQuery on GCP.
- Experienced with visualization tools such as Tableau and Power BI, and in leveraging Talend's big data capabilities to create scalable data processing pipelines.
- Created and maintained Golang-based data processing pipelines for handling large data volumes, including ingestion, transformation and loading; configured and managed Snowflake data warehouses within Azure and used Terraform across AWS, Azure and Google Cloud.
- Designed secure API endpoints with authentication and authorization mechanisms such as JWT, OAuth2 and API keys; expert in writing complex programs for various file formats, including Text, Sequence, XML and JSON.
- Adept in Python, Scala and Shell scripting, with significant experience in UNIX/Linux environments.
- Solid understanding of Agile and Scrum methodologies, emphasizing iterative and collaborative approaches; proficient in Test-Driven Development (TDD) and in tools such as Jenkins, Docker and CI/CD pipelines including Concourse and Bitbucket; familiar with version control systems such as Git, SVN and Bamboo.
- Proficient in industry-leading testing tools such as Apache JMeter, QuerySurge and Talend Data Quality to validate data transformations and ETL processes.
- Strong understanding of networking protocols, including DNS, TCP/IP and VPN, with expertise in configuration and troubleshooting to ensure secure and seamless data communication.
- Experience with version control tools such as Git and UrbanCode Deploy (UCD); adept in Python, Scala, Java and Shell scripting, with significant experience in UNIX and Linux environments.

Certifications:
- AWS Certified Solutions Architect - Associate
- Microsoft Certified: Azure Solutions Architect Expert

Technical Skills:
Cloud Computing: Amazon Web Services (EMR, EC2, RDS, S3, Lambda, Glue, Redshift), Google Cloud Platform (BigQuery)
Big Data Technologies: Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Oozie, Flume, Cassandra, Spark with Scala, PySpark, RDD, DataFrame, Spark SQL, Spark MLlib, Spark GraphX)
ETL Processes: Microsoft Integration Services, Informatica PowerCenter, SnowSQL, OLAP, OLTP, Talend
Data Modeling & Databases: SQL Server, NoSQL (DynamoDB, MongoDB), Oracle (PL/SQL), Star Schema, Snowflake Schema
Programming Languages: Python, SAS, Scala, Java, Shell scripting
Visualization Tools: Tableau, Power BI
Networking Protocols: DNS, TCP/IP, VPN
DevOps & CI/CD: Terraform, Jenkins, Docker, Concourse, Bitbucket
Version Control Systems: Git, SVN, Bamboo
Testing Tools: Apache JMeter, QuerySurge, Talend Data Quality
Methodologies: Agile, Scrum, Test-Driven Development (TDD)

Education:
- Bachelor of Technology in Computer Science, Acharya Nagarjuna University, India

Professional Experience:

Senior Data Engineer                                                          May 2023 to Present
Genesis, Beaverton, OR
Responsibilities:
- As a Data Engineer, assessed on-premises infrastructure and data for compatibility with AWS and Python.
- Configured and provisioned AWS resources, including SageMaker instances, S3 buckets and security settings.
- Converted existing SAS code to Python, ensuring it produces the same results.
- Optimized Python code for performance and scalability on AWS.
- Developed comprehensive test cases and validation procedures to ensure that the Python code replicates the SAS results accurately.
- Managed both relational (SQL Server, MySQL, PostgreSQL, Oracle) and NoSQL (DynamoDB, MongoDB, Cassandra) databases, designing and implementing scalable data models, complex queries and performance optimizations.
- Utilized Sqoop for data import/export from Snowflake, Oracle and DB2; generated SQL and PL/SQL scripts to manage database objects and gained expertise in the Snowflake database.
- Conducted testing in both the on-premises SAS environment and the AWS SageMaker environment.
- Designed, developed and maintained Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes to efficiently extract data from various sources, transform it into usable formats, and load it into data warehouses or other storage systems.
- Managed the end-to-end data processing pipeline, ensuring data quality, reliability and timeliness, and collaborated with cross-functional teams to optimize and streamline data processes.
- Optimized SQL queries to improve performance and reduce execution times; identified and resolved bottlenecks in database performance.
- Worked closely with data analysts and business intelligence teams to optimize dashboards for querying and reporting, enhancing dashboard performance and usability.
- Conducted knowledge transfer sessions to equip team members and stakeholders with AWS and Python expertise.
- Created interactive reports and visualizations using Tableau and Power BI.
- Designed and tested dimensional data models using Star and Snowflake schemas, following Ralph Kimball and Bill Inmon methodologies.
- Utilized the Boto3 library to connect S3 with SageMaker while translating the code (see the sketch below).
Environment: Bitbucket, Python, AWS, AWS SageMaker, AWS Lambda, SAS Grid Enterprise, Boto3.
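
For illustration only, a minimal sketch of the kind of Boto3-based S3 access used alongside the SAS-to-Python translation work described above; the bucket and key names are hypothetical, and on SageMaker the credentials would come from the notebook's IAM role.

import io

import boto3
import pandas as pd

s3 = boto3.client("s3")  # on SageMaker, credentials are supplied by the attached IAM role

def read_csv_from_s3(bucket: str, key: str) -> pd.DataFrame:
    """Download a CSV object from S3 and load it into a pandas DataFrame."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(io.BytesIO(obj["Body"].read()))

def write_csv_to_s3(df: pd.DataFrame, bucket: str, key: str) -> None:
    """Write a DataFrame back to S3 as CSV, e.g. for validation against SAS output."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

if __name__ == "__main__":
    # Hypothetical bucket/keys for demonstration purposes only.
    df = read_csv_from_s3("example-bucket", "input/claims.csv")
    write_csv_to_s3(df.describe(), "example-bucket", "output/claims_summary.csv")
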

Data Engineer                                                                 July 2021 to April 2023
Mayo Clinic, Rochester, MN
Responsibilities:
- Led the full lifecycle of data engineering projects, from requirement analysis and planning to deployment and maintenance, aligning with both Agile and Waterfall methodologies.
- Installed and configured multi-node clusters on EC2 instances, managed AWS monitoring tools such as CloudWatch and CloudTrail, created alarms for EBS, EC2, ELB, RDS, S3 and SNS, and implemented secure data storage practices in S3 buckets.
- Migrated on-premises databases and Informatica ETL processes to the AWS cloud, Redshift and Snowflake platforms, using asynchronous task execution with Celery, RabbitMQ and Redis.
- Integrated AWS DynamoDB with Lambda and developed Spark code for AWS Glue jobs and EMR, automating tasks using Python (see the PySpark sketch after this section).
- Designed, developed and maintained RESTful APIs and SOAP web services for integrating banking applications with third-party systems and services.
- Utilized Sqoop for data import/export from Snowflake, Oracle and DB2; generated SQL and PL/SQL scripts to manage database objects and gained expertise in the Snowflake database.
- Utilized Spark, PySpark, Hive, Hadoop, Golang and Scala for data analysis, ingestion and integrity checks, handling data formats such as JSON, CSV, Parquet and Avro.
- Designed and implemented Apache Kafka-based data streaming solutions, enabling real-time data ingestion and processing.
- Automated data ingestion from diverse sources such as APIs, AWS S3, Teradata and Redshift using PySpark and Scala; utilized Oozie for job scheduling within the SDLC.
- Created interactive reports and visualizations using Tableau and Power BI.
- Designed and tested dimensional data models using Star and Snowflake schemas, following Ralph Kimball and Bill Inmon methodologies.
- Implemented logging, monitoring and error handling mechanisms within REST APIs.
- Implemented microservices on Kubernetes clusters, using Jenkins for CI/CD and Jira for ticketing and issue tracking.
- Leveraged deep expertise in version control systems such as Git, SVN and Bitbucket to efficiently manage code repositories, ensuring a consistent and well-documented development process.
- Utilized advanced testing tools and frameworks such as Apache JMeter, QuerySurge and Talend Data Quality to ensure the accuracy and integrity of ETL processes and data migrations.
Environment: AWS, EBS, EC2, CloudWatch, CloudTrail, S3, SNS, Redshift, Snowflake, Celery, RabbitMQ, Redis, DynamoDB, Lambda, Glue, EMR, SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, Sqoop, Spark, PySpark, Hive, Hadoop, Golang, Scala, JSON, CSV, Parquet, Avro, Teradata, Oozie, Tableau, Power BI, Star Schema, Snowflake Schema, Ralph Kimball, Bill Inmon, REST APIs, Kubernetes, Jenkins, Jira, Git, SVN, Bitbucket, Apache JMeter, QuerySurge, Talend Data Quality.
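
A minimal PySpark sketch of the kind of batch ingestion job referenced above (S3 JSON to partitioned Parquet, runnable on EMR or as a Glue Spark job); the bucket paths and column names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-json-to-parquet").getOrCreate()

# Read raw JSON landed in S3 (hypothetical source path).
raw = spark.read.json("s3://example-bucket/raw/events/")

# Light cleanup: deduplicate on an assumed key column and derive a partition date.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_date").isNotNull())
)

# Write curated, date-partitioned Parquet (hypothetical target path).
(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-bucket/curated/events/"))

spark.stop()
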

Senior ETL Developer/Data Engineer                                            November 2018 to June 2021
Homesite Insurance, Boston, MA
Responsibilities:
- Developed and managed automated ETL pipelines using Python, Spark and PySpark, orchestrated with Airflow on Google Cloud Platform (GCP) for seamless data ingestion and database updates (see the Airflow sketch after this section).
- Led the migration of on-premises Hadoop systems to GCP, harnessing Cloud Storage, Dataproc, Dataflow and BigQuery; performed proof-of-concept evaluations comparing self-hosted Hadoop with Dataproc and explored Bigtable performance improvements within the GCP ecosystem.
- Architected large-scale data warehousing and integration solutions across platforms such as Snowflake Cloud, AWS Redshift and Informatica Intelligent Cloud Services (IICS).
- Devised workflows and mappings with Informatica ETL tools, integrating various relational databases, and leveraged Power BI for reporting and data visualization following Oracle-to-BigQuery migrations.
- Designed data models within Neptune, efficiently loading data into the Neptune database and employing the Gremlin query language for complex querying; utilized Grafana to create dashboards for real-time monitoring of metrics from the Cassandra database.
- Created custom User-Defined Functions (UDFs) in Pig and Hive, enriching Pig Latin and HiveQL with Python functionality, and deployed applications on servers such as GlassFish and Tomcat and via CGI for enhanced interoperability.
- Designed, implemented and maintained high-performance, scalable database systems across platforms including SQL Server, Oracle, MySQL, PostgreSQL, MongoDB and Cassandra; established database structures that reduce redundancy and increase query efficiency.
- Utilized Terraform Cloud and Terraform Enterprise for collaboration, state management and execution of Terraform configurations in a secure and centralized manner.
- Implemented RESTful APIs using Golang to expose data processing functionality to other applications and services.
- Designed and implemented robust Kafka clusters to facilitate real-time data streaming, ensuring high availability, fault tolerance and optimal performance across various data sources and applications.
- Coordinated and performed integration testing across different data platforms and applications, ensuring seamless interaction and data flow between systems, including databases, APIs and third-party tools.
Environment: Python, Spark, PySpark, Airflow, Google Cloud Platform (GCP), Hadoop, Cloud Storage, Dataproc, Dataflow, BigQuery, Bigtable, Snowflake Cloud, AWS Redshift, Informatica Intelligent Cloud Services (IICS), Power BI, Neptune, Gremlin, Grafana, Cassandra, Pig, Hive, GlassFish, Tomcat, CGI, SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Terraform Cloud, Terraform Enterprise, Golang, Kafka.
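
A minimal Airflow (2.x) sketch of the kind of GCP ingestion orchestration referenced above; the DAG id, schedule and task callables are hypothetical placeholders, not the actual production pipeline.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_to_gcs(**context):
    # Placeholder: pull data from a source system and land it in a GCS bucket.
    print("extracting to gs://example-bucket/raw/ ...")

def load_to_bigquery(**context):
    # Placeholder: load the landed files into a BigQuery staging table.
    print("loading into example_dataset.staging_events ...")

with DAG(
    dag_id="example_gcs_to_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_to_gcs", python_callable=extract_to_gcs)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)
    extract >> load  # extract must finish before the load step runs
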

Azure Data Engineer                                                           July 2015 to August 2018
IDB Bank, New York, NY
Responsibilities:
- Led the transition from legacy applications to a cloud-based data environment, ensuring seamless integration and minimal downtime.
- Managed large volumes of transactions, ensuring accurate data migration and transformation in cloud environments.
- Developed scalable data pipelines using Azure Data Factory, T-SQL, Spark SQL and U-SQL, enabling efficient data ingestion into Azure services.
- Optimized Spark clusters on Azure Databricks, resolving customer issues and enhancing overall cluster performance.
- Migrated on-premises data to Azure using Kafka and Spark Streaming, supporting real-time data processing and ingestion (see the streaming sketch after this section).
- Designed the target data architecture, ensuring alignment with business needs and technical specifications for cloud-based applications.
- Implemented and managed IaaS and PaaS solutions in Azure, leveraging the Azure Portal and PowerShell for efficient deployment.
- Managed CI/CD pipelines using Jenkins, integrating with Kubernetes for automated deployments and scaling.
- Applied machine learning algorithms using scikit-learn and MATLAB to improve fraud detection and risk assessment models in banking.
- Enhanced Hadoop performance using Spark and automated test script development in Python for continuous integration.
- Designed and tested dashboard features using CSS, JavaScript, Django and Bootstrap, enhancing user experience and interface design.
- Developed and implemented high-performance ETL processes using Spark (Scala/Python/Java), Hive and Kafka to ingest and process large volumes of financial data, supporting real-time fraud detection and risk assessment in banking.
Environment: Python, Django, SQL, JavaScript, Linux, Shell Scripting, MongoDB, HTML, AngularJS, Eclipse, jQuery, JSON, XML, CSS, MySQL, Bootstrap, Hadoop, HDFS, MapReduce, PySpark, Spark SQL, ETL, Tableau, Hive, Pig, Oozie, Databricks, Sqoop, Azure, Star Schema, NiFi, Cassandra, Power BI, Machine Learning.
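
A minimal Spark Structured Streaming sketch of the kind of Kafka-to-storage ingestion referenced above; the broker address, topic and sink paths are hypothetical, and the job assumes the spark-sql-kafka connector is available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Subscribe to a Kafka topic (hypothetical broker and topic names).
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value and keep the event timestamp.
parsed = stream.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("event_ts"),
)

# Append the parsed records to a data lake path (hypothetical sink and checkpoint).
query = (
    parsed.writeStream.format("parquet")
    .option("path", "/mnt/datalake/raw/transactions")
    .option("checkpointLocation", "/mnt/datalake/chk/transactions")
    .outputMode("append")
    .start()
)
query.awaitTermination()
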

Software Engineer                                                             June 2014 to March 2015
IBing Software Solutions Private Limited, Hyderabad, India
Responsibilities:
- Collaborated with business and IT groups to address requirements for user stories, process flows, UAT outcomes and status updates.
- Oversaw and directed teams both on-site and offshore.
- Developed test strategies, plans and cases in various environments based on business requirements.
- Gathered requirements, collaborated with offshore testing and development teams to identify automation scenarios, and evaluated test automation outcomes with test teams while providing oversight.
- Performed System Integration, UAT, Regression, Accessibility and API testing using Postman.
- Expertise in the Tableau reporting tool, in test case creation/execution tools such as JIRA and ALM, and in defect tracking and bug reporting tools such as JIRA, ALM and HP Quality Center.
- Experienced in using SQL to validate test results and pull data based on requirements.
- Identified automation scenarios to enhance the efficiency and effectiveness of the testing process.
- Experienced in the SDLC process, working with both Agile/Scrum and Waterfall methodologies.
- Actively attended daily Scrum meetings and participated in PI Planning sessions.
- Expert in test case creation, execution and bug tracking; improved product quality by developing root-cause resolution summaries for all defects.
- Led, participated in and supported post-implementation reviews, and documented testing-process lessons learned after major releases.
Tools: JIRA, ALM, Tableau, Postman, TOSCA, SQL/Oracle Developer, SDLC process, Agile methodology.
