Candidate's Name
Data Engineer | McKinney, TX
PHONE NUMBER AVAILABLE | EMAIL AVAILABLE

Professional Summary:
- Data Engineer with 5+ years of technical experience in business intelligence reporting, skilled at identifying clients' business needs, developing effective and efficient solutions, and ensuring client deliverables within committed timelines.
- Experience building ETL data pipelines, designing data models for data warehouses, and processing and transforming large volumes of batch and real-time streaming data using Apache Spark, Spark Structured Streaming, and Kafka, along with data visualization, reporting, data quality, and data virtualization.
- Proven track record as a Data Engineer on AWS and Azure cloud services, Big Data/Hadoop applications, and product development.
- Experience in data ingestion and storage techniques from source systems to target data warehouse schemas, applying data cleansing to ensure quality data for business intelligence reporting and data analysis.
- Experience in PySpark programming with applied knowledge of Spark architecture, RDDs, DataFrames, Spark SQL, and in-memory processing (a minimal sketch follows this summary).
- Strong experience building complex ETL pipelines using AWS Glue, Talend, and Informatica.
- Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hortonworks, and Flume.
- Hands-on experience with Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling, and Zookeeper for coordinating cluster resources.
- Expertise in AWS resources including EC2, S3, EBS, VPC, ELB, SNS, RDS, IAM, AWS Glue, Route 53, Auto Scaling, CloudFormation, CloudWatch, and Security Groups.
- Skilled in data analysis using SQL on Oracle, MS SQL Server, PostgreSQL, DB2, and Teradata.
- Strong experience with SQL Server and T-SQL: joins, user-defined functions, stored procedures, views, indexes, user profiles, and data integrity.
- Experience working with the Hadoop ecosystem integrated into the AWS cloud platform, including AWS EMR and EC2 instances.
- Good experience with Azure cloud services such as Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure SQL Analytics, and HDInsight/Databricks.
- Developed Kibana dashboards on Logstash data and integrated source and target systems into Elasticsearch for near real-time log analysis and end-to-end transaction monitoring.
- Designed star and snowflake schemas in data warehousing and implemented data quality checks, validation rules, and data governance practices.
- Extensively used SQL, Python, NumPy, Pandas, scikit-learn, Spark, and Hive for data analysis and model building.
- Exposed to software development methodologies including Agile and Waterfall.
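As an illustration of the PySpark and Spark SQL work described above, a minimal, hypothetical batch-ETL sketch might look like the following (bucket paths, table, and column names are placeholders, not taken from any specific project): it reads raw Parquet from S3, cleanses it with the DataFrame API, aggregates with Spark SQL, and writes a partitioned curated layer.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Hypothetical raw zone: order events landed in S3 as Parquet.
orders = spark.read.parquet("s3a://example-bucket/raw/orders/")

# DataFrame API: basic cleansing before data reaches the warehouse layer.
clean = (
    orders.dropDuplicates(["order_id"])
    .filter(F.col("amount") > 0)
    .withColumn("order_date", F.to_date("order_ts"))
)

# Spark SQL over the same data via a temporary view.
clean.createOrReplaceTempView("orders_clean")
daily = spark.sql("""
    SELECT order_date, customer_id, SUM(amount) AS daily_amount
    FROM orders_clean
    GROUP BY order_date, customer_id
""")

# Curated zone: partitioned Parquet for downstream BI and reporting.
(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3a://example-bucket/curated/daily_orders/"))
```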
TECHNICAL SKILLS:
Languages: Python, SQL, PL/SQL, PySpark, R/RStudio, Scala, Shell Scripting
Databases: Oracle, PostgreSQL, MySQL, SQL Server, Teradata
NoSQL Databases: Cassandra, MongoDB, MariaDB, HBase
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Pig, Sqoop, Spark, Impala, Cloudera and Hortonworks HDP, Spark SQL, NiFi, Kafka, Spark Streaming, Flink, Ambari
Data Orchestration Tools: Apache Airflow, Dagster, Mage
Data Ingestion Tools: Airbyte, Fivetran, Pentaho Data Integration
ETL Tools: AWS Glue, Informatica PowerCenter, Azure Data Factory, Talend
Data Monitoring Tools: Datadog, Grafana, ELK stack
Data Warehousing: Snowflake, Redshift, Azure Synapse Analytics, Hive
Analytics Tools: Alteryx, Microsoft SSIS, SSAS, and SSRS
BI Tools: Microsoft Power BI, Tableau, QlikView, Informatica 6.1
IDE/Dev Tools: Visual Studio, PyCharm, Jupyter Notebook
Cloud Platforms (AWS, Azure):
  Microsoft Azure - Azure Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, Azure Synapse Analytics, Cosmos DB, Active Directory
  Amazon AWS - EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, Kinesis

Certifications:
- Scrum Master
- AWS Certified Cloud Practitioner

Education:
Bachelor of Science in Information Technology & Systems, University of Texas at Dallas - Aug 2018

Professional Experience:

Aspiring IT LLC, Plano, TX | January 2023 - Present
Role: Data Engineer/Data Analyst
Responsibilities:
- Worked with users to identify the most appropriate source of record required to define the asset data for financing.
- Enhanced processing of large datasets of over 10 TB using Hadoop, Apache Spark, PySpark, and Kafka, achieving a 40% increase in efficiency and a 30% reduction in server load.
- Implemented and supported complex Airflow DAGs in Python on AWS Managed Workflows for Apache Airflow (MWAA), scheduling and automating ETL processes, significantly reducing manual intervention and improving data pipeline efficiency (see the DAG sketch at the end of this section).
- Streamlined data processing on AWS, performing extensive data analysis on complex datasets in AWS Redshift and optimizing SQL queries to reduce processing time and cost.
- Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Utilized Spark RDDs and DataFrames for high-performance parallel processing and used Spark Streaming to process and analyze real-time data feeds on the Hadoop platform, enabling real-time analytics and insights.
- Ingested customer data from the data lake and Apache Hive into the Oracle database, ensuring smooth data integration for the company's analytical needs.
- Handled various data formats, including YAML, JSON, Parquet, XML, and CSV, across AWS platforms, ensuring seamless integration and analysis of security-related datasets.
- Specialized in data warehousing solutions, managing both cloud-based (AWS Redshift, BigQuery, Azure SQL Data Warehouse) and on-premises (Oracle, PostgreSQL, MySQL, MongoDB) enterprise data warehouses and data marts, incorporating star and snowflake schema modeling and implementing Type 1 and Type 2 SCDs for robust historical data.
- Implemented and optimized data processing pipelines leveraging Flink's capabilities for event-time processing, state management, and fault tolerance.
- Designed and implemented stateful streaming applications with Flink's APIs, including the DataStream and DataSet APIs, to handle complex event-driven workflows.
- Utilized Flink's windowing and watermarking features to efficiently aggregate and analyze data streams based on time or other custom criteria.
- Applied strong SQL programming skills to create stored procedures, triggers, and functions, and performed SQL/PL SQL performance tuning, increasing overall system efficiency by 20% and improving query speeds by 30%.
- Good experience with Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics and Excel data extracts.
- Developed a reporting system using Python, HTML, and SQL to analyze key metrics, allowing stakeholders to make data-driven decisions, resulting in a 48% improvement in site performance and saving 18 hours of work weekly.
- Utilized Tableau to craft interactive visualizations that deliver actionable insights from live data to key stakeholders, developing 20+ Tableau dashboards and 50+ reports that aid decision-making across 10+ departments.
- Developed Bash/Shell and Python scripts in Unix/Linux environments, leveraging libraries such as Pandas and NumPy for advanced data analysis and manipulation, improving the efficiency of data processing tasks.
- Experienced with data processing and workflow management tools such as Apache Kafka, Apache Airflow, Apache NiFi, Apache Sqoop, Ab Initio, IBM DataStage, and Informatica for real-time and batch data processing.
- Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive, and Impala.
- Experienced with cloud versioning technologies such as GitHub.
- Contributed to the on-time delivery of 4 complex projects using Agile methodologies, supporting the team in achieving a 100% on-time delivery rate.
- Reduced post-deployment bugs by 30% through improved documentation practices.
- Collaborated with cross-functional teams to optimize asset data sources, reducing data errors by 30% and significantly enhancing the accuracy of financial reporting.
Environment: Python, Spark SQL, Hadoop, PySpark, Airflow, Kafka, Sqoop, AWS, Glue, Redshift, Oracle, MySQL, Linux, Tableau.
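As an illustration of the Airflow/MWAA scheduling described in this section, here is a minimal, hypothetical DAG sketch in the standard Airflow 2.x API that MWAA supports; the DAG id, task names, schedule, and callables are placeholders, not the actual pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    """Placeholder: pull the day's source extract and stage it in S3."""
    ...


def load_to_redshift(**context):
    """Placeholder: COPY the staged files from S3 into Redshift."""
    ...


default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_asset_etl",              # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load                        # load runs only after the extract succeeds
```

In an MWAA environment, a file like this is typically deployed by placing it in the S3 DAGs folder the environment is configured to watch.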
SysPlus Technologies, Plano, TX | January 2022 - May 2022
Role: Data Engineer/Data Analyst
Responsibilities:
- Led the design and implementation of a complete ETL pipeline using Azure services, ensuring data accuracy and availability for critical banking operations.
- Developed and maintained a centralized data warehouse in Azure Synapse Analytics, integrating data from various sources and enabling advanced analytics and reporting capabilities for the banking team.
- Migrated two data-intensive applications from on-premises servers to Azure App Service environments, using Azure SQL, Blob Storage, and Databricks for data storage and processing.
- Proficient in Databricks for lakehouse architecture implementation and big data processing, with experience in metadata optimization and management for enhanced data governance and reliable insights.
- Utilized Azure Functions to perform event-driven data transformations and data transfers from Oracle to Azure SQL DB.
- Utilized a range of tools (Sqoop, Spark Streaming/SQL, Kafka, Flume) for data ingestion and processing, ensuring seamless integration of structured, semi-structured, and unstructured data.
- Experienced with NoSQL databases such as HBase, as well as ecosystem tools including Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
- Designed and implemented scalable and secure data solutions using Azure Data Lake, Azure Synapse Analytics, and Azure Data Factory.
- Created and managed Hive tables, implementing partitions and buckets for efficient data loading and analysis.
- Designed database schemas in Hive, MySQL, and other systems, and processed large datasets by importing and exporting data between databases and HDFS.
- Developed Hive and Bash scripts for source data validation and transformation.
- Created ETL jobs, both design and code, to process data into target databases.
- Applied machine learning techniques on large datasets using Spark and MapReduce.
- Used Spark Streaming APIs with Kafka to build real-time data models and persisted data into Cassandra for near real-time analytics (see the streaming sketch at the end of this section).
- Performed extract, transform, and load of data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Synapse Analytics.
Environment: Azure, Azure Synapse Analytics, Hive, Kafka, Tableau, Scala, Sqoop, Spark, HBase, Teradata, MS Excel, Python Pandas API, RDBMS, HiveQL
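A minimal sketch of the Spark-plus-Kafka-to-Cassandra pattern mentioned above, assuming the spark-sql-kafka package and the Spark Cassandra Connector are on the Spark classpath; the broker, topic, keyspace, table, and column names are hypothetical. It parses JSON events from Kafka with Structured Streaming and writes each micro-batch to Cassandra via foreachBatch.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("kafka-to-cassandra-sketch")
    .config("spark.cassandra.connection.host", "cassandra-host")  # hypothetical host
    .getOrCreate()
)

# Hypothetical event schema carried as JSON in the Kafka message value.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "transactions")               # hypothetical topic
    .load()
)

events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("e"))
    .select("e.*")
)


def write_to_cassandra(batch_df, batch_id):
    # foreachBatch lets each micro-batch be written with the Cassandra connector.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(keyspace="analytics", table="transactions")  # hypothetical keyspace/table
        .save())


query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/tmp/checkpoints/kafka-to-cassandra")
    .start()
)
query.awaitTermination()
```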
Collabera, Basking Ridge, NJ | June 2019 - August 2021
Role: Data Engineer/Data Analyst
Responsibilities:
- Utilized Elasticsearch to design and implement a scalable log data storage and retrieval system, enabling efficient querying and analysis of log data for application performance testing (see the query sketch at the end of this section).
- Configured Logstash pipelines to ingest, parse, and enrich log data from trading applications and infrastructure components, ensuring seamless integration with Elasticsearch for centralized log management.
- Collaborated with cross-functional teams to define and prioritize requirements for new features and enhancements in trading applications, leveraging Elasticsearch for real-time data retrieval and visualization.
- Implemented log-based performance testing strategies using Elasticsearch and Logstash, enabling identification of performance bottlenecks, latency issues, and resource constraints in trading systems.
- Automated log data ingestion and processing workflows using Logstash input plugins and scheduled Logstash pipelines, reducing manual intervention and improving operational efficiency.
- Developed custom monitoring dashboards and alerts to track key metrics, detect anomalies, and proactively identify issues in data ingestion, processing, and storage.
- Designed and implemented extract, transform, and load (ETL) processes using ADF.
- Implemented and configured data monitoring tools such as Grafana to monitor the health, performance, and availability of data pipelines and systems.
- Improved the efficiency of data movement and transformation by building and optimizing data pipelines in Azure Data Factory, conducting performance tuning to identify bottlenecks, and implementing enhancements for higher throughput and lower latency.
- Established CI/CD pipelines for Databricks notebooks and jobs, enabling automated testing, deployment, and rollback of changes.
Environment: Python 3, Azure Data Factory, Azure Databricks, Azure Data Lake, Blob Storage, Elasticsearch, Logstash, Grafana, Kibana
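To illustrate the Elasticsearch-based log analysis described in this section, here is a small hypothetical query sketch using the official Python client (elasticsearch-py 8.x style, where body fields are passed as keyword arguments); the cluster endpoint, index pattern, and field names are placeholders.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster endpoint

# Count recent ERROR-level log lines per service: the kind of aggregation
# that backs an error/latency dashboard during performance testing.
resp = es.search(
    index="trading-logs-*",                  # hypothetical index pattern
    size=0,                                  # aggregations only, no raw hits
    query={
        "bool": {
            "must": [{"match": {"level": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],
        }
    },
    aggs={"errors_per_service": {"terms": {"field": "service.keyword"}}},
)

for bucket in resp["aggregations"]["errors_per_service"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```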
