Sr Data Engineer Resume Austin, TX
Candidate Information
Title Sr Data Engineer
Target Location US-TX-Austin

Suryanarayana
EMAIL AVAILABLE
PHONE NUMBER AVAILABLE

Professional Summary:
- Senior Data Engineer with 12+ years of IT experience, including 7+ years in the Big Data and Analytics field spanning storage, querying, processing, and analysis, developing end-to-end data pipelines on cluster and cloud platforms with a wide range of cloud services in the Retail, Telecom, and Financial domains.
- Extensive experience in analysis, design, and development in GCP data engineering using Spark, MapReduce, Hive, HBase, HDFS, Sqoop, Kafka, and GCP cloud services, with programming languages including Python, Scala, and PySpark.
- Extensive experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Dataflow, Cloud Dataproc, Google Cloud Storage, Cloud Functions, Eventarc triggers, Pub/Sub, Cloud SQL, Cloud Spanner, and Bigtable.
- Hands-on experience with databases: Teradata, Oracle, HBase, MongoDB, MySQL, and PL/SQL.
- Expertise in optimizing Teradata database performance through strategic SQL query optimization and efficient ETL process development, ensuring streamlined data management and retrieval.
- Created DataFrames and performed analysis using Spark SQL (see the sketch after this summary).
- Hands-on expertise in writing RDD (Resilient Distributed Dataset) transformations and actions using Scala and Python.
- Excellent understanding of the Spark architecture and framework: SparkContext, APIs, RDDs, Spark SQL, DataFrames, and Streaming.
- Experience working with cloud providers - GCP.
- Worked on Hive data analysis and NoSQL databases such as HBase.
- Experience migrating Hadoop data to BigQuery in the Google Cloud environment.
- Extensive experience working with structured data using HQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Experience importing and exporting data between HDFS and relational databases using Sqoop.
- Experience developing large-scale batch and real-time data pipelines.
- 4 years of experience with workflow management and scheduling tools such as Apache Airflow, Oozie, and Autosys.
- Experience in data collection, extraction, cleaning, aggregation, mining, validation, analysis, and reporting in data warehousing environments.
- Expertise in troubleshooting, debugging, performance tuning, and optimization of slow-running ETL/ELT jobs using pushdown optimization and partitioning techniques to manage large volumes of data.
- Experience in data modeling using dimensional modeling techniques such as Star Schema and Snowflake modeling.
- In-depth knowledge of database concepts, algorithm design, SDLC, OLAP, OLTP, data marts, and data lakes.
- Experience in all stages of the Software Development Lifecycle (SDLC) with Agile, Scrum, and Waterfall methodologies, from requirement analysis through development, testing, and deployment.
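
The following is a minimal, illustrative PySpark sketch of the DataFrame and Spark SQL work mentioned in the summary above; the dataset, column names, and aggregation are hypothetical placeholders, not code from any engagement listed below.

    # Illustrative only: build a DataFrame, expose it to Spark SQL, and aggregate.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-sql-sketch").getOrCreate()

    # Hypothetical order records: (region, order_amount)
    orders = spark.createDataFrame(
        [("TX", 120.0), ("TX", 80.5), ("CA", 200.0)],
        ["region", "order_amount"],
    )

    # DataFrame API aggregation
    totals = orders.groupBy("region").agg(F.sum("order_amount").alias("total_amount"))

    # Equivalent Spark SQL query against a temporary view
    orders.createOrReplaceTempView("orders")
    totals_sql = spark.sql(
        "SELECT region, SUM(order_amount) AS total_amount FROM orders GROUP BY region"
    )

    totals.show()
    totals_sql.show()
    spark.stop()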

Technical Skills
Operating Systems: UNIX, Mac OS X, Windows
Cloud: GCP (Data Pipelines, Dataflow, Dataproc, Cloud Composer, BigQuery, Airflow, Pub/Sub)
Programming Languages: Python, Scala, SQL, PL/SQL, Shell scripting
Databases: Oracle, PL/SQL, SQL Server, Teradata, MySQL, Hive, HBase, Snowflake
Big Data Ecosystem: Hadoop HDFS, Pig, Hive, Sqoop, Zookeeper, YARN, Spark, Storm, Impala, Flume, Kafka, HBase, PySpark, Airflow
Microsoft Tools: Excel, Visio, Access, PowerPoint
Methodologies: Waterfall, Agile
Version Control: GitHub, Jira

Certifications
- Google Cloud Certified Professional Data Engineer
- PCEP - Certified Entry-Level Python Programmer (PCEP-30-02)

Professional Experience

Client: Verizon, Irving, TX    Apr 2024 - Till Date
Role: Sr Data Engineer
Responsibilities:
- Designed and implemented data pipelines in GCP to ingest, process, and store large volumes of telecom data using Snowflake and SQL.
- Optimized complex SQL queries to improve performance and reduce runtime by 40% in data transformation processes.
- Developed ETL processes using Python and SQL to automate data extraction, transformation, and loading into Snowflake, ensuring data integrity and accuracy.
- Integrated Snowflake with GCP BigQuery to enable seamless data migration and cross-platform analytics (see the sketch after this role).
- Created and managed Snowflake data warehouses to support scalable, secure, and high-performance data analytics solutions.
- Built dynamic and robust dashboards in Looker using SQL and Snowflake data to provide real-time insights and analytics for healthcare claims.
- Collaborated with cross-functional teams to define data requirements and deliver custom solutions using GCP's data tools, Snowflake, and SQL.
- Implemented data governance and security best practices in Snowflake and GCP, ensuring compliance with healthcare industry standards and regulations.
Environment: Python, Snowflake, Dataflow, GCP, SQL, Pub/Sub, Apigee
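
One common shape for the BigQuery-to-Snowflake integration described in this role is to export a BigQuery table to Google Cloud Storage and then COPY it into Snowflake from an external stage. The sketch below is illustrative only: it assumes the google-cloud-bigquery and snowflake-connector-python packages and an already-configured GCS stage, and the project, dataset, bucket, stage, and table names are hypothetical.

    # Illustrative only: export from BigQuery to GCS, then load into Snowflake.
    from google.cloud import bigquery
    import snowflake.connector

    bq = bigquery.Client(project="my-project")  # hypothetical project

    # Export the source table to compressed CSV files in a GCS bucket.
    extract_config = bigquery.ExtractJobConfig()
    extract_config.destination_format = "CSV"
    extract_config.compression = "GZIP"
    extract_job = bq.extract_table(
        "my-project.analytics.claims",              # hypothetical source table
        "gs://my-bucket/exports/claims-*.csv.gz",   # hypothetical bucket
        job_config=extract_config,
    )
    extract_job.result()  # wait for the export to finish

    # Load the exported files into Snowflake from an external stage
    # (assumes a storage integration and stage named GCS_EXPORT_STAGE already exist).
    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
    )
    try:
        conn.cursor().execute(
            "COPY INTO RAW.CLAIMS "
            "FROM @GCS_EXPORT_STAGE/exports/ "
            "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP SKIP_HEADER = 1)"
        )
    finally:
        conn.close()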

Client: Tech Target, Texas, TX    Apr 2023 - Apr 2024
Role: Sr Data Engineer
Responsibilities:
- Spearheaded the implementation of Google Cloud Platform (GCP) services, including BigQuery, Dataflow, and Bigtable, to streamline data pipelines, significantly enhancing efficiency and scalability within TechTarget's infrastructure.
- Designed and developed complex data processing workflows using Apache Airflow, crafting Directed Acyclic Graphs (DAGs) to orchestrate data transformations and ensure seamless execution of tasks (see the sketch after this role).
- Played a pivotal role in leveraging Bigtable to populate data in the user interface (UI) of Priority Engine, TechTarget's flagship product, enhancing the platform's performance and enabling real-time data access for users.
- Collaborated closely with cross-functional teams to analyze requirements, architect solutions, and implement best practices for data management and orchestration, ensuring alignment with organizational objectives and industry standards.
- Deployed and managed Apache Spark clusters on Google Cloud Platform using Dataproc, ensuring optimal performance and scalability for big data processing workflows.
- Configured Dataproc clusters to leverage Spark's parallel processing capabilities, maximizing efficiency in data ingestion, transformation, and analysis.
- Optimized data ingestion processes by fine-tuning Dataflow pipelines, enhancing data quality, reducing latency, and improving overall system reliability, resulting in a more responsive and robust data ecosystem.
- Implemented monitoring and alerting for critical data pipelines and workflows, utilizing GCP's monitoring and logging capabilities to proactively identify and address potential issues, minimizing downtime and ensuring data integrity.
- Actively participated in continuous improvement efforts, conducting performance analysis, identifying bottlenecks, and implementing optimizations to enhance the overall efficiency and performance of TechTarget's data infrastructure.
Environment: Python, Spark, BigQuery, Dataflow, Dataproc, MySQL, Airflow, GCP
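
A simplified Airflow DAG of the kind described above, orchestrating a Dataproc Spark job followed by a BigQuery load, might look like the sketch below. It assumes the Airflow Google provider package is installed; the DAG name, project, region, cluster, bucket, and SQL are hypothetical placeholders rather than code from this engagement.

    # Illustrative only: a two-task DAG chaining Dataproc and BigQuery.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    PROJECT_ID = "my-project"   # hypothetical
    REGION = "us-central1"

    with DAG(
        dag_id="daily_pipeline_sketch",     # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Run a PySpark transformation on an existing Dataproc cluster.
        transform = DataprocSubmitJobOperator(
            task_id="spark_transform",
            project_id=PROJECT_ID,
            region=REGION,
            job={
                "placement": {"cluster_name": "etl-cluster"},  # hypothetical cluster
                "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/transform.py"},
            },
        )

        # Aggregate the transformed data into a reporting table in BigQuery.
        load_report = BigQueryInsertJobOperator(
            task_id="build_report_table",
            configuration={
                "query": {
                    "query": "SELECT account_id, COUNT(*) AS events "
                             "FROM `my-project.staging.events` GROUP BY account_id",
                    "destinationTable": {
                        "projectId": PROJECT_ID,
                        "datasetId": "reporting",
                        "tableId": "account_events",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )

        transform >> load_report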

Client: Bank of America, Charlotte, NC    Sept 2021 - Dec 2022
Role: Sr Data Engineer - GCP Lead
Responsibilities:
- Migrated existing on-premises databases to the Google Cloud environment for a better reporting experience.
- Extensive experience in server infrastructure development on GCP using services such as VM instances, BigQuery, Pub/Sub, Dataflow, Dataproc, Google Cloud Storage, Source Repositories, Security, log monitoring, alert monitoring, and IAM role management.
- Extensive knowledge of GCS (Google Cloud Storage) buckets and creating efficient buckets depending on the business requirement.
- Developed various shell scripts to generate the monthly billing report and user access report for BigQuery, load and query tables in BigQuery, and move data into storage.
- Strong knowledge in writing SQL queries, ranging from simple single-table statements to complex multi-table joins.
- Proficient with TOAD and equivalent developer tools.
- Created and modified SQL, PL/SQL, and SQL*Loader scripts for data conversions.
- Implemented ETL pipelines using Dataflow, Dataproc, Cloud Functions, and BigQuery.
- Automated data and ETL batch pipelines using Airflow to interact with services such as Dataproc, GCS, BigQuery, and Hive.
- Involved in orchestrating and scheduling jobs in Airflow and Autosys.
- Configured spark-submit parameters per cluster in different environments.
- Optimized Spark code using caching, repartitioning, broadcast joins, and similar techniques (see the sketch after this role).
- Implemented a retail application on the Cloudera platform using Hive and Spark with Scala.
- Created and maintained multiple projects in GCP based on business requirements.
- Generated reports on user queries and access information and used them for GCP billing and reporting.
- Performed benchmark tests to read data from databases and object stores using pandas and PySpark APIs to compare results, identify potential improvement areas, and provide recommendations.
- Read and wrote Parquet and JSON files from S3 buckets using Spark and pandas DataFrames with various configurations.
- Monitored data alerts, reporting alerts, and regression test failure alerts; acted on, escalated, and resolved issues.
- Worked closely with application customers to resolve JIRA tickets related to API issues, data issues, consumption latencies, onboarding, and publishing data.
- Created and implemented multi-sensor data fusion algorithms.
- Developed and enhanced the application software development kit to provide APIs for accessing and analyzing data in a Jupyter notebook environment for EDA, data visualization, and machine learning.
- Performed peer reviews, maintained code coverage, and automated the application using CI/CD.
Environment: Python, Spark, BigQuery, Dataproc, Dataflow, Logging and Monitoring, Security, IAM, Hadoop, Hive, Teradata, SQL, Airflow
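
The cache / repartition / broadcast-join tuning called out in this role follows a standard PySpark pattern, sketched below with hypothetical table paths and column names; it illustrates the technique rather than reproducing code from the engagement.

    # Illustrative only: reduce shuffle cost when joining a large fact table
    # to a small dimension table.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    transactions = spark.read.parquet("gs://my-bucket/facts/transactions")  # large
    branches = spark.read.parquet("gs://my-bucket/dims/branches")           # small

    # Repartition the large side on the join key to spread work evenly,
    # and cache it because several downstream aggregations reuse it.
    transactions = transactions.repartition(200, "branch_id").cache()

    # Broadcast the small dimension so the join avoids shuffling the fact table.
    enriched = transactions.join(broadcast(branches), "branch_id")

    enriched.groupBy("region").count().show()
    spark.stop()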

Client: Kohls, Milpitas, CA    Nov 2017 - Aug 2021
Role: Sr Data Engineer - GCP Lead
Description: The goal of the project was to create accurate, up-to-date, and accessible data tables, reports, and dashboards by reviewing business rules, metrics, activations, feature adds, deactivations, and upgrades.
Responsibilities:
- Designed and developed secure ETL data pipelines on the Hadoop ecosystem for diverse use cases.
- Reduced customer churn by 10% by performing critical analysis on customer data.
- Processed and cleansed the collected data using Spark.
- Created Kafka topics, helped produce the message feeds, and ensured they were subscribed to by consumers.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts.
- Developed PySpark and Spark SQL code to process data in Apache Spark on Amazon EMR and perform the necessary transformations based on the source-to-target mappings developed.
- Performed benchmark tests to read data from databases and object stores using pandas and PySpark APIs to compare results, identify potential improvement areas, and provide recommendations.
- Developed scripts for migrating data from Hive to HBase.
- Implemented orchestration and scheduling using Cloud Composer and Apache Airflow.
- Implemented a local Airflow setup for each developer for unit testing on the GCP platform.
- Read and wrote Parquet and JSON files from S3 buckets using Spark and pandas DataFrames with various configurations.
- Developed code to assign a default lifecycle policy to buckets and objects and auto-purge objects based on the default policy in Mercury, an internal implementation of GCS storage.
- Worked closely with application customers to resolve JIRA tickets related to API issues, data issues, consumption latencies, onboarding, and publishing data.
- Created and implemented multi-sensor data fusion algorithms.
- Reduced job latency through configuration changes and other performance-tuning techniques.
- Implemented Spark using Python and Spark SQL for faster testing and processing of data.
- Designed the data ingestion and integration process using Sqoop, shell scripts, and Pig with Hive.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Worked with Avro and Parquet file formats and used compression techniques to leverage storage in HDFS.
- Designed and implemented efficient data pipelines using Snowflake's Snowpipe and Streams for real-time data ingestion and processing.
- Optimized query performance through advanced techniques such as micro-partitioning, result caching, and clustering keys, resulting in significant reductions in query execution times.
- Developed and maintained scalable data models within Snowflake, ensuring optimal storage, retrieval, and analysis of large datasets for various business applications.
- Managed security configurations, including role-based access control, data masking, and encryption, to ensure data privacy and compliance with industry regulations.
Environment: HDFS, Hive, Sqoop, Oozie, Spark, Kafka, Zookeeper, Python, HBase, Flume, GCP (Data Fusion, BigQuery, Cloud Storage, Compute Engine, Cloud SQL, Cloud Dataproc), Airflow, Snowflake

Client: AmFam, Houston, TX    Jan 2015 - Oct 2017
Role: Senior Associate Consultant
Description: This application is used by customer service representatives to display billing information, services active on the customer account, devices on the account, their health status, and recommendations for the customer.
Responsibilities:
- Involved in the Enterprise Cash View project: capacity planning, development, deployment, tuning, and benchmarking of the workflow pipelines with the Operations team.
- Gathered business requirements from business partners and subject matter experts.
- Responsible for designing and managing the Sqoop jobs that uploaded data from MySQL to HDFS and Hive.
- Created Hive tables to store the processed results in a tabular format.
- Developed Unix, Hive, and Pig scripts and MapReduce programs.
- Developed scripts for pulling logs from different nodes into HDFS.
- Developed scripts for migrating data from Hive to HBase.
- Developed scripts using the Cloudera Manager API to collect metrics.
- Migrated Consumer Credit Risk applications built on Teradata and Oracle onto Hadoop; facilitated data access from Impala for the finance and security teams to build federal regulation reports.
- Built a data stability framework on Hadoop using Spark and Python to normalize text files, apply transformations, and load data into Avro and Parquet files.
- Developed Spark scripts to convert existing Oracle stored procedures for data processing and load the results into Hive tables.
- Worked on optimization and performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and memory tuning.
- Used Impala as an MPP engine for ad hoc querying.
- Developed Oozie workflows and sub-workflows to automate the ETL process, scheduled from the Autosys scheduler.
- Involved in the design, development, and testing of PL/SQL packages and stored procedures for the ETL processes.
- Solid Informatica ETL development experience, maintaining offshore and onsite teams and participating in ETL code reviews and testing of ETL processes.
- Applied Slowly Changing Dimension mappings (Type 1 and Type 2 SCD) to update slowly changing dimension tables.
- Worked with Informatica Cloud to create source/target connections and to monitor and synchronize data in the data warehouse.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Scala, Spark, Spark SQL, Cloudera Manager API, Linux, HBase, Oracle, SQL Server, Unix, Shell scripting, ETL

IGATE Global Solutions Ltd, Hyderabad    Dec 2010 - Dec 2014
Role: Senior Software Engineer
Description: The ETL Center of Excellence development services team provides cost-effective and efficient solutions to its clients using Informatica PowerCenter, Metadata Manager, and Data Quality. The group constantly evaluates data integration technologies to expand its capabilities and adopt the latest technology sets, helping clients stay competitive in their business solutions.
Responsibilities:
- Backend data processing using Perl.
- Involved in JavaScript functions for client-side validations.
- Involved in database design and backend data processing using Perl.
- Involved in designing, coding, unit testing, and implementation.
- Worked on change requests.
- Responsible for debugging code and running jobs.
- Responsible for writing code in MySQL and JavaScript.
- Implemented and maintained components of the solution (design and development).
- Developed platform components using object-oriented Perl and Unix shell scripting.
- Enhanced, developed, and delivered training materials.
- Developed complex applications using object-oriented Perl and Unix shell scripting.
- Improved Perl deployment at the firm.
- Developed Perl modules for internal use.
- Helped developers with Perl-related problems.
Environment: Perl, HTML, MySQL, Unix, Linux, Shell scripting, ETL, CSS, and JavaScript

Educational Qualification:
Andhra University, Visakhapatnam, India
Master of Technology in Computer Science and Information Technology, 2008
Bachelor of Technology in Maths, Physics and Electronics, 2006
