Senior Data Engineer Resume Frisco, TX
Name: Candidate's Name
Title: Sr. Data Engineer
Phone: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:

- Over 9 years of experience in data analysis and data engineering covering the whole data lifecycle, from data ingestion, wrangling, and modeling to data visualization and insight discovery.
- Proficient in Apache Hadoop, Apache Spark, Apache Flink, and Apache Beam for large-scale data processing, enabling efficient extract, transform, and load (ETL) tasks over massive datasets.
- Good experience handling big data frameworks including Hadoop, MapReduce, HDFS, YARN, HBase, and Hive.
- Skilled in AWS (S3, EC2, EMR, Glue, Redshift), GCP (BigQuery, Dataflow, Pub/Sub), and Azure for building scalable data pipelines, ensuring reliable data processing and management in cloud environments.
- Experienced with data warehousing platforms such as Apache Hive, Apache HBase, Cassandra, BigQuery, Snowflake, and Azure Synapse Analytics, facilitating effective storage and retrieval of structured and semi-structured data.
- Expertise in data ingestion using Apache Kafka, Apache NiFi, Apache Flume, AWS Glue, GCP Dataflow, and Azure Data Factory, ensuring seamless movement of data into the processing pipeline.
- Strong in ETL and data processing with Apache Sqoop, Apache Pig, Apache Beam, AWS Lambda, GCP Dataprep, and Azure Databricks, enabling efficient data manipulation and transformation.
- Skilled in managing databases including MySQL, PostgreSQL, Oracle, MongoDB, Bigtable, and SQL Server, ensuring effective storage and retrieval of data.
- Proficient in Git and SVN for version control, ensuring seamless code management and collaboration within development teams.
- Strong programming skills in Python, Scala, Java, and SQL, facilitating the development of robust data processing pipelines and analytical algorithms.
- Hands-on experience with RDBMS such as MySQL and PostgreSQL, as well as NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experienced in data visualization with Tableau, Power BI, Looker, and Google Data Studio, enabling the creation of insightful visualizations to communicate complex data insights effectively.
- Familiar with Docker and Kubernetes for containerization and orchestration, enabling the deployment and management of containerized applications and microservices.
- Knowledgeable in monitoring and logging with the ELK Stack, Grafana, and Prometheus, facilitating real-time monitoring and performance optimization of data processing pipelines.
- Skilled in job scheduling and workflow management using Apache Airflow, Apache Oozie, and AWS Step Functions, enabling the automation and orchestration of complex data processing tasks.
- Familiar with machine learning tools such as TensorFlow, PyTorch, scikit-learn, and Jupyter Notebooks, facilitating the development and deployment of machine learning models for data analysis tasks.

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: AWS EMR, S3, EC2-Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.1/8.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie
Languages: Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl
Cloud Environments: AWS (Amazon Web Services), Microsoft Azure, GCP (Google Cloud Platform)
Operating Systems: All versions of Windows, UNIX, Linux, macOS, Sun Solaris
Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
Public Cloud: EC2, IAM, S3, Auto Scaling, CloudWatch, Route53, EMR, Redshift
Databases: Oracle 10g/11g/12c, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, DB2, Teradata, Netezza
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools: Jenkins, Toad, SQL*Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SoapUI
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos

EDUCATION:

- Bachelor of Technology in Computer Science (2013), GITAM University, India.
- Master's in Computer Science (2019), GPA 3.5, University of Central Missouri, USA.
PROFESSIONAL EXPERIENCE:

Client: Macy's, New York, NY (Remote)                                                    Dec 2022 - Present
Sr. Data Engineer

Responsibilities:
- Designed, created, and implemented RDBMS and NoSQL databases, including building views, indexes, and stored procedures to optimize data access and manipulation.
- Architected and deployed multi-tier applications leveraging the full spectrum of AWS (EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM, CloudFormation, Glue, Lambda, Athena, EMR, Redshift) and GCP (Compute Engine, Cloud DNS, Cloud Storage, Cloud SQL, Bigtable, Pub/Sub, IAM, Dataflow, BigQuery) services, emphasizing high availability, fault tolerance, and auto-scaling using AWS CloudFormation.
- Supported continuous storage in both AWS and GCP using Elastic Block Storage, S3, and equivalent GCP services, ensuring data durability and accessibility.
- Conducted a proof-of-concept deployment in AWS S3 and Snowflake, automating ingestion processes using Python and Scala from sources such as APIs, AWS S3, and Snowflake.
- Developed Spark workflows in Scala for data extraction from AWS S3 and Snowflake, applying transformations to optimize data processing.
- Implemented ETL migration services using AWS Glue and GCP Dataflow for serverless data pipelines, integrating with the Glue Data Catalog in AWS and BigQuery in GCP for data querying.
- Designed and developed ETL processes in AWS Glue and GCP Dataflow to migrate FlightAware usage data from the S3 data source to Redshift and BigQuery respectively, optimizing data storage and processing.
- Created Databricks notebooks using SQL and Python, automated them with jobs, estimated cluster sizes, and monitored and troubleshot Spark Databricks clusters.
- Utilized AWS Glue and GCP Dataflow for data transformations, and AWS Lambda and GCP Cloud Functions to automate processes.
- Developed Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake, with equivalent services in GCP.
- Migrated data from AWS S3 to Snowflake and from GCP Cloud Storage to BigQuery by writing custom read/write utility functions in Scala.
- Worked on Snowflake schemas and data warehousing, processing batch and streaming data load pipelines using Snowpipe and Matillion from AWS S3 and equivalent services in GCP.
- Conducted data modeling for product information and customer features, building data warehouse solutions to support BI activities in both AWS Redshift and GCP BigQuery.
- Executed SQL queries on RDBMS such as MySQL and PostgreSQL and HiveQL queries on Hive tables for data extraction and preliminary analysis.
- Built data pipelines covering ingestion, transformation (aggregation, filtering, cleaning), and storage, handling SQL and NoSQL databases and multiple data formats such as XML, JSON, and CSV.
- Ingested real-time customer behavioral data into HDFS using Flume, Sqoop, and Kafka, and transformed it using Spark Streaming.
- Managed data processing and querying with Spark and handled streaming data with Kafka to ensure fast and reliable data transfers.
- Leveraged AWS S3 and GCP Cloud Storage as storage solutions for HDFS, AWS Glue and GCP Dataflow as ETL solutions, and AWS Kinesis and GCP Pub/Sub as data streaming solutions to deploy data pipelines in the cloud.
- Migrated the data warehouse from RDBMS to AWS Redshift and GCP BigQuery, analyzed log data using AWS Athena on S3 and GCP BigQuery, and maintained Hadoop clusters using AWS EMR and GCP Dataproc.
- Conducted data cleansing, manipulation, and wrangling using Python to eliminate invalid datasets and reduce prediction error.
- Conducted A/B tests on metrics such as customer retention, acquisition, sales revenue, and volume growth to assess product performance.
- Leveraged Pandas, NumPy, and Seaborn for exploratory data analysis.
- Extended Hive functionality with user-defined functions, including UDFs, UDTFs, and UDAFs.
- Developed predictive models using Python packages such as SciPy and scikit-learn, as well as mixed-effect models and time series models in R, based on business requirements.
- Conducted dimension reduction with PCA and feature engineering with Random Forest to capture key features for predicting annual sales and best-purchased products using Python and R.
- Created Hive-integrated Tableau dashboards and reports to visualize the time series of purchase value, delivering business insights to stakeholders.
- Utilized Git for version control and Maven to build, test, and deploy Java projects.
- Created SSIS packages to move data from Oracle to SQL Server, and used equivalent services in GCP, facilitating data migration and integration.

Environment: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python, Scala, Big Data Technologies (Hadoop, Hive, Pig, Spark, Kafka, Flume, Sqoop, MapReduce, Oozie, PySpark, HDFS, HBase), MySQL, PostgreSQL, Oracle, SQL Server, DynamoDB, Google BigQuery, Google Cloud SQL, IBM DB2, Kubernetes, Jenkins, Git, Airflow, Snowflake, Tableau

Client: Change Healthcare, Nashville, Tennessee (Remote)                                  Aug 2021 - Dec 2022
Azure Data Engineer

Responsibilities:
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization and analytics; assessed the impact of new implementations on existing business processes by understanding the current production state of applications.
- Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics; ingested data into multiple Azure services, including Azure Data Lake, Azure Storage, Azure SQL Database, and Azure Synapse Analytics (formerly Azure SQL Data Warehouse), and processed data in Azure Databricks.
- Performed data ingestion, transformation, and cleaning using Python, leveraging libraries such as NumPy and Pandas.
- Implemented and evaluated text punctuation as a post-processing step for speech recognition RNNs using Keras and TensorFlow.
- Conducted ETL processes using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse Analytics.
- Automated script generation using Python scripting and performed data curation with Azure Databricks.
- Designed and developed high-availability (HA) deployment models with Azure Classic and Azure Resource Manager.
- Developed data augmentation techniques for synthetic text and voice data, pre-processed raw data, and conducted data wrangling tasks such as grouping, aggregation, filtering, and replacing missing values using Python.
- Utilized Azure Data Lake Storage Gen2 to store and retrieve data in various formats, including Excel and Parquet files, using the Blob API.
- Employed tree-based ensemble algorithms such as XGBoost and AdaBoost for feature extraction and selection.
- Collaborated with machine learning teams specializing in acoustics, leveraging toolkits such as NLTK.
- Built analytical and predictive algorithms to identify correlations between features and conducted hypothesis testing to determine the significance of findings.
- Designed and implemented data pipelines using Azure Data Factory for orchestrating data workflows and integrating various data sources.
- Leveraged Azure Synapse Analytics for large-scale data warehousing and real-time analytics, optimizing performance through partitioning and indexing.
- Created interactive dashboards and reports using Power BI, integrating data from multiple Azure sources to deliver business insights.
- Implemented Azure Stream Analytics for real-time data processing and analytics, enabling timely decision-making based on streaming data.
- Employed Azure Machine Learning to build, train, and deploy machine learning models, enhancing predictive analytics capabilities.
- Utilized Azure Logic Apps and Azure Functions to automate workflows and integrate various Azure services seamlessly.

Environment: Microsoft Azure Cloud Services, Python, SQL, NumPy, Pandas, Keras, TensorFlow, NLTK, Azure CLI, Azure HDInsight, Eclipse, IntelliJ, Power BI

Client: Fifth Third Bank, Evansville, Indiana                                             May 2019 - August 2021
Data Engineer

Responsibilities:
- Collaborated with the Data Services team and business stakeholders to understand business processes and develop analytical insights.
- Worked with the source team to understand the format and delimiters of data files.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive semi-structured and unstructured data; loaded unstructured data into the Hadoop Distributed File System (HDFS) and AWS S3.
- Created a data lake by extracting data from various sources, including RDBMS, CSV, and XML formats, into HDFS and AWS S3.
- Consolidated, validated, and cleansed manufacturing, customer, and product quality data from a wide range of sources, from databases to files; data files were validated by Spark jobs written in Scala.
- Loaded the aggregated data into a relational database and AWS Redshift for reporting and analysis, revealing ways to lower operating costs, boost throughput, and improve product quality.
- Created Spark RDDs from data files stored in HDFS and AWS S3, then applied transformations and actions to derive other RDDs.
- Created Hive tables with dynamic and static partitioning and bucketing for efficiency, and created external tables in Hive for staging purposes.
- Loaded data into Hive tables, wrote Hive queries that run on MapReduce, and created customized BI tools for management teams to perform query analytics using HiveQL.
- Loaded data from UNIX file systems into HDFS and AWS S3.
- Imported data from relational databases (SQL Server, Sybase IQ) into the Hadoop and AWS ecosystems using Sqoop.
- Imported and exported data with Sqoop between HDFS, Hive tables, and AWS S3 and relational databases.
- Aggregated RDDs based on business requirements, converted RDDs into DataFrames, saved them as temporary Hive tables for intermediate processing, and stored results in HBase/Cassandra, RDBMS, and AWS DynamoDB.
- Used Spark SQL to perform analytics on large datasets stored in HDFS and AWS S3.
- Developed Spark scripts with Scala shell commands as per business requirements.
- Converted Cassandra, Hive, MySQL, and HBase queries into Spark RDDs using Spark transformations and Scala.
- Conducted POCs on migrating to Spark and Spark Streaming with Kafka to process live data streams, and compared Spark performance with Hive and SQL.
- Ingested data from RDBMS and AWS Redshift, performed data transformations in Spark, and exported the transformed data to HBase/Cassandra and AWS DynamoDB as per business requirements.
- Set up Oozie workflow jobs for Hive, Sqoop, HDFS, Spark, and AWS Glue actions.
- Created fact and dimension tables from Hive data and AWS Redshift for reporting purposes.

Environment: Big Data Technologies (Hadoop), Hive, Pig, Spark, PySpark, Kafka, MapReduce, Sqoop, Flume, Apache NiFi, ZooKeeper, HDFS, HBase, Amazon Web Services (AWS), Python, Scala, SQL, Snowflake, Airflow, Jira

Client: CenturyLink, Jacksonville, Arkansas                                               Dec 2015 - May 2019
Big Data Engineer

Responsibilities:
- Spearheaded data ingestion initiatives using Apache Spark, loading data from diverse sources including RDBMS, CSV, and XML files and ensuring efficient, scalable processing.
- Orchestrated data cleansing and transformation processes with Apache Spark and Hive, handling ETL tasks to repair data, identify sources for audit purposes, and apply filtering, resulting in enhanced data quality.
- Executed data consolidation strategies with Apache Spark and Hive, generating data in required formats, optimizing data repair, and storing results back to HDFS.
- Migrated computational code from HQL to PySpark, ensuring compatibility and efficiency in data processing workflows.
- Demonstrated extensive proficiency in Apache Spark for large-scale data processing and analytics, encompassing data ingestion, transformation, and machine learning tasks.
- Developed high-performance Spark applications in Scala and Python, optimizing performance by leveraging Spark RDDs, DataFrames, and Datasets.
- Applied deep expertise in Spark Streaming for real-time data processing, crafting streaming applications capable of handling high-volume, high-velocity data streams.
- Led initiatives in Apache Kafka for building distributed streaming platforms, overseeing data ingestion, real-time event processing, and data pipeline construction.
- Implemented robust Kafka clusters, managing topics, partitions, and consumer groups to ensure seamless data processing.
- Integrated Spark and Kafka, leveraging Spark's Kafka integration to consume and process data streams efficiently.
- Demonstrated strong proficiency in Apache Hive for distributed data warehousing and SQL-like querying on large datasets, optimizing performance through schema design and tuning techniques.
- Leveraged the Hadoop Distributed File System (HDFS) for distributed storage and data processing, ensuring efficient data ingestion, replication, and fault tolerance.
- Worked with file formats commonly used in big data ecosystems, such as Parquet, Avro, ORC, and JSON, optimizing data storage and processing efficiency.
- Troubleshot and debugged issues in Spark, Kafka, and Hive environments, implementing performance tuning strategies to optimize resource utilization and enhance system efficiency.
- Spearheaded the development of Python code to gather data from HBase (Cornerstone), designing and implementing PySpark solutions to streamline data retrieval processes.
- Led the creation of microservices to retrieve data from frontend systems, facilitating various data retrieval patterns and enhancing data accessibility.
- Developed and maintained Hive tables on top of clean data, enabling the creation of Tableau reports for comprehensive data visualization and analysis.
- Engineered an ingestion framework for file ingestion from SFTP to HDFS using Apache NiFi, streamlining the ingestion of financial data into HDFS.
- Developed and maintained scripts for loading log data using Flume, ensuring seamless data storage in HDFS.
- Managed the creation of Hive tables for data loaded into HDFS, applying context n-gram functionality to generate trigram frequencies for given datasets, facilitating data analysis and insight generation.
- Collaborated closely with data science teams, providing trigram frequency data to refine models and understand near-failure behavior of hardware, ensuring data-driven decision-making.
- Transformed Informatica ETL logic into Spark using the Spark DataFrames API, streamlining data transformations, ETL jobs, and Spark SQL processing for BI aggregations and reporting needs.

Environment: Big Data, Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, Oozie, AWS, Apache Airflow, NiFi, Tableau

Client: Kensium, Chennai, India                                                           Feb 2014 - Jan 2015
Big Data Developer

Responsibilities:
- Spearheaded the development of Spark scripts in Python on Azure HDInsight, focusing on data aggregation, validation, and performance verification over MapReduce jobs, enhancing data processing efficiency.
- Engineered robust pipelines to transfer hashed and un-hashed data from Azure Blob Storage to Data Lake, ensuring data integrity and accessibility.
- Leveraged Azure HDInsight for effective monitoring and management of the Hadoop cluster, optimizing cluster performance and resource utilization.
- Collaborated closely with data scientists, business analysts, and partners to derive actionable insights from data, facilitating informed decision-making.
- Conducted advanced text analytics and processing in Python, leveraging Spark's in-memory computing capabilities for sophisticated data analysis and interpretation.
- Developed pipelines to efficiently move data from on-premises servers to Azure Data Lake, streamlining data integration processes.
- Utilized Python Pandas for comprehensive data analysis, enabling insightful, data-driven decision-making.
- Enhanced and optimized Spark scripts to efficiently aggregate, group, and execute data mining tasks, optimizing data processing workflows.
- Implemented schema extraction for Parquet and Avro file formats, ensuring data compatibility and seamless integration within Spark applications.
- Performed performance tuning of Spark applications, setting appropriate batch interval times, parallelism levels, and memory allocation for enhanced processing efficiency.
- Developed Hive queries to process data and generate data cubes for visualization, facilitating comprehensive data analysis and reporting.
- Designed and implemented functions to ingest columns into schemas for Spark applications, optimizing data ingestion processes.
- Handled large datasets using partitioning, Spark's in-memory capabilities, and effective join and transformation techniques during ingestion, ensuring efficient data processing.
- Developed data integration programs in a Hadoop and RDBMS environment, facilitating seamless data access and analysis across traditional and non-traditional data sources.
- Analyzed SQL scripts and designed solutions implemented in PySpark, ensuring compatibility and efficiency in data processing workflows.
- Utilized reporting tools such as Power BI for generating daily data reports, facilitating comprehensive data analysis and reporting for stakeholders.
- Handled various techno-functional responsibilities including estimation, identifying functional and technical gaps, requirements gathering, solution design, development, documentation, and production support, ensuring the successful execution of data projects.

Environment: Microsoft Azure Cloud Services, Hadoop, HDFS, MapReduce, Hive, Sqoop, Oozie, Oracle, Pig, Shell, HBase, Flume, ZooKeeper, DB2
