Candidate Information
Title: Big Data Engineer
Target Location: Denton, TX (US)
Candidate's Name
Email: EMAIL AVAILABLE | Phone: PHONE NUMBER AVAILABLE

Sr. Data Engineer

PROFESSIONAL SUMMARY:
- IT professional with 9+ years of experience as a Big Data Engineer, with expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data science, data warehouse/data mart, data visualization, reporting, and data quality solutions.
- Experience with various Python and R packages such as NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, OpenCV, NLTK, Gensim, spaCy, PyTesseract, PyOCR, TensorFlow, PyTorch, Flask, dplyr, tidyr, lubridate, ggplot2, and Beautiful Soup. Comfortable with SQL, SAS, SPSS, Tableau, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
- Knowledge of various frameworks and technologies that enable capabilities within the Hadoop ecosystem, such as MapReduce, Impala, HDFS, Hive, Pig, HBase, Storm, Flume, Sqoop, Oozie, Kafka, Spark, and Zookeeper.
- Extensive experience with Amazon Web Services (AWS) cloud services such as EC2, VPC, S3, IAM, EBS, RDS, ELB, Route 53, OpsWorks, DynamoDB, Auto Scaling, CloudFront, CloudTrail, CloudWatch, CloudFormation, Elastic Beanstalk, SNS, SQS, SES, SWF, and Direct Connect.
- Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML. Good knowledge of database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase, and SQL Server databases.
- Used Talend to extract, transform, and load data into the data warehouse from various sources such as Oracle and flat files. Used various AWS data migration services and schema conversion tools along with the Matillion ETL tool.
- Engaged in the design and development of the ETL process for integrating offers and benefits data from various sources into the data warehouse. Developed Spark applications for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch after this summary).
- Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy. Experience in creating and running Docker images with multiple microservices.
- Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume. Excellent knowledge of machine learning, deep learning, mathematical modeling, and operations research.
- Experience in troubleshooting and resolving architecture problems involving databases and storage, networking, security, and applications. Experience working with both Agile and Waterfall methods in a fast-paced manner.
- Proficient in data modeling techniques using star schema, snowflake schema, fact and dimension tables, RDBMS, and physical and logical data modeling for data warehouses and data marts.
- Established database standards for operations, upgrades, migrations, and onboarding of new applications and/or customers.
- Used various sources to pull data into Power BI, such as SQL Server, Excel, Oracle, and SQL Azure. Developed and maintained multiple Power BI and Tableau dashboards/reports and content packs.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, with results-oriented problem solving and leadership. Capable of adapting to and learning new tools, techniques, and approaches.
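
The summary above mentions Spark applications that extract, transform, and aggregate data across multiple file formats. Below is a minimal PySpark sketch of that general pattern; the paths, column names, and aggregation logic are hypothetical placeholders rather than details taken from this resume.

    # Minimal PySpark sketch: read several file formats, normalize, and aggregate.
    # All paths, columns, and business logic are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-pattern-aggregation").getOrCreate()

    # Ingest raw events stored in different formats into one common shape.
    json_events = spark.read.json("s3a://example-raw/events/json/")
    csv_events = spark.read.option("header", "true").csv("s3a://example-raw/events/csv/")
    parquet_events = spark.read.parquet("s3a://example-raw/events/parquet/")

    common_cols = ["customer_id", "event_type", "event_ts"]
    events = (
        json_events.select(*common_cols)
        .unionByName(csv_events.select(*common_cols))
        .unionByName(parquet_events.select(*common_cols))
    )

    # Aggregate daily activity per customer to surface usage patterns.
    daily_usage = (
        events.withColumn("event_date", F.to_date("event_ts"))
        .groupBy("customer_id", "event_date", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    daily_usage.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3a://example-curated/daily_usage/"
    )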

SKILL SET:
Big Data Ecosystem: Hadoop MapReduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Airflow, Kafka, Camel, Spark, Flink, Presto, Luigi, and Zookeeper
Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP, Amazon EMR (EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, DynamoDB, Aurora, Redshift, ECS, QuickSight), Azure HDInsight (Databricks, Data Lake, Blob Storage, Data Factory ADF, SQL DB, SQL DWH, Cosmos DB, Azure AD)
Programming Languages: Python, R, Scala, SAS, Java, SQL, HiveQL, PL/SQL, UNIX shell scripting, Pig Latin
Databases: Snowflake, MySQL, Teradata, Oracle, MS SQL Server, MDM, PostgreSQL, DB2
NoSQL Databases: HBase, Cassandra, MongoDB, DynamoDB, and Cosmos DB
DevOps Tools: Jenkins, Docker, Kubernetes, Maven, Terraform, Azure DevOps
Cloud: AWS (EC2, VPC, EBS, SNS, RDS, S3, Auto Scaling, Lambda, Redshift, CloudWatch), Azure (Azure Data Factory (ADF v2), Azure Functions Apps, Azure Data Lake, Blob Storage, Azure Cosmos DB, Databricks)
Version Control: Git, SVN, Bitbucket
ETL/BI: Informatica, SSIS, SSRS, SSAS, Tableau, Power BI, QlikView, dbt, Airbyte, Talend, Matillion, Arcadia, and Adobe Analytics
Operating Systems: macOS, Windows 7/8/10, Unix, Linux, Ubuntu
Methodologies: RAD, JAD, UML, System Development Life Cycle (SDLC), Jira, Autosys, Apache Pulsar, Confluence, Agile, Waterfall Model

PROFESSIONAL EXPERIENCE:

Sr. Data Engineer
Lowes, Mooresville, NC | May 2023 to Present
Key Responsibilities:
- Responsible for sessions with the business, project manager, business analyst, and other key people to understand the business needs and propose a solution from a warehouse standpoint. Installed, configured, and maintained data pipelines.
- Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day. Implemented and managed ETL solutions and automated operational processes.
- Migrated data from the Amazon Redshift data warehouse to Snowflake. Involved in code migration of a quality monitoring tool from AWS EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses.
- Transformed business problems into Big Data solutions and defined the Big Data strategy and roadmap. Designed the business requirement collection approach based on the project scope and SDLC methodology.
- Developed Apache Presto and Apache Drill setups on an AWS EMR (Elastic MapReduce) cluster to combine multiple data sources such as MySQL and Hive. This enables comparison operations such as joins and inserts across various data sources, controlled through a single platform.
- The AWS Lambda functions were written in Scala with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud.
- Performed raw data ingestion into S3 from Kinesis Firehose, which triggered a Lambda function that put refined data into another S3 bucket and wrote to an SQS queue as Aurora DB topics (a minimal sketch of this flow follows this section).
- Writing to the Glue metadata catalog allows the improved data to be queried from Athena, resulting in a serverless querying environment. Created PySpark data frames to bring data from DB2 to Amazon S3. Worked on the Kafka backup index, minimized logs appended via Log4j, and pointed Ambari server logs to NAS.
- Created AWS RDS (Relational Database Service) instances to work as the Hive metastore and combined the metadata of 20 EMR clusters into a single RDS instance, which avoids data loss even when terminating the EMR clusters.
- Used AWS CodeCommit repositories to store programming logic and scripts and reuse them on new clusters. Spun up EMR clusters of 30 to 50 memory-optimized nodes (R2, R4, X1, and X1e instances) with the autoscaling feature. With Hive being the primary query engine on EMR, created external table schemas for the data being processed.
- Mounted local directory file paths to Amazon S3 using s3fs-fuse to enable KMS encryption on the data reflected in S3 buckets. Designed and implemented ETL pipelines on S3 Parquet files in the data lake using AWS Glue.
- Used the AWS Glue catalog with a crawler to get the data from S3 and perform SQL query operations, and a JSON schema to define table and column mapping from S3 data to Redshift. Developed automation scripting in Python using Ansible to deploy and manage Java applications across Linux servers.
- Applied auto-scaling techniques to scale instances in and out based on memory usage over time. This helped reduce the number of instances when the cluster was not actively in use, while still accounting for Hive's replication factor of 2 and leaving a minimum of 5 instances running.
- Designed and implemented multiple ETL solutions over various data sources through extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools. Performed data profiling and data wrangling of XML, web feeds, and files using Python, Unix, and SQL.
- Designed and implemented Sqoop for incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau via HiveServer2 for generating interactive reports. Used Sqoop to channel data between HDFS and RDBMS sources.
- Created and configured landing tables, staging tables, foreign key relationships, queries, query groups, etc. in MDM (Master Data Management). Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Python and in NoSQL databases such as HBase and Cassandra.
- Used SSIS to build automated multi-dimensional cubes. Prepared and uploaded SSRS reports and managed database and SSRS permissions. Used the .NET framework to access and manipulate data stored in relational databases and worked with .NET framework tools for debugging and testing data engineering applications.
- Used Apache NiFi to copy data from the local file system to HDP. Thorough understanding of various AML modules including Watch List Filtering, Suspicious Activity Monitoring, CTR, CDD, and EDD.
- Used SQL Server Management Studio to check the data in the database against the given requirements.
- Worked on dimensional and relational data modeling using star and snowflake schemas, OLTP/OLAP systems, and conceptual, logical, and physical data modeling using Erwin.

Environment: Redshift, AWS, Cloudera Manager (CDH5), Hadoop, PySpark, HDFS, NiFi, Pig, Hive, S3, Kafka, Scrum, Git, Sqoop, Oozie, Informatica, Tableau, OLTP, OLAP, HBase, Cassandra, SQL Server, Python, shell scripting, XML, Unix.
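
One bullet above describes an S3-triggered Lambda that refines newly landed Firehose data and hands off to SQS. The following is a minimal boto3 sketch of that flow, not the actual production code; the bucket name, queue URL, and the "refinement" step are assumptions made for illustration.

    # Minimal sketch of an S3-triggered AWS Lambda handler that refines a newly
    # landed object and notifies downstream consumers via SQS.
    # Bucket name, queue URL, and the transformation are hypothetical.
    import json
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    REFINED_BUCKET = "example-refined-bucket"  # hypothetical
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical


    def handler(event, context):
        """Triggered by S3 ObjectCreated events from the raw (Firehose) bucket."""
        for record in event.get("Records", []):
            raw_bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Read the raw object and apply a placeholder "refinement".
            body = s3.get_object(Bucket=raw_bucket, Key=key)["Body"].read()
            refined = body.decode("utf-8").upper()  # placeholder transformation

            # Write the refined data to the curated bucket.
            s3.put_object(Bucket=REFINED_BUCKET, Key=key, Body=refined.encode("utf-8"))

            # Publish a message so downstream loaders can pick the object up.
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"bucket": REFINED_BUCKET, "key": key}),
            )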
Data Engineer
Thomson Reuters, Eagan, MN | August 2021 to May 2023
Key Responsibilities:
- Held meetings with business/user groups to understand the business process; gathered requirements; and analyzed, designed, developed, and implemented according to client requirements.
- Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from different relational and non-relational source systems to meet business functional requirements.
- Designed and developed event-driven architectures using blob triggers and Data Factory. Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Automated jobs using different triggers (event, schedule, and tumbling window) in ADF. Created and provisioned different Databricks clusters, notebooks, jobs, and autoscaling.
- Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2. Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL. Joined various tables in Cassandra using Spark and ran analytics on top of them. Involved in the implementation and integration of the Cassandra database.
- Developed JSON scripts for deploying ADF pipelines that process the data using the SQL activity. Performed data flow transformations using the data flow activity. Implemented the Azure self-hosted integration runtime in ADF. Developed streaming pipelines using Apache Spark with Python (see the Structured Streaming sketch after this section).
- Improved performance by optimizing the compute time needed to process the streaming data and saved costs by optimizing cluster run time. Implemented Terraform modules for deploying various applications across multiple cloud providers and managing infrastructure.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions. Extensively used the SQL Server Import and Export Data tool. Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.
- Generated alerts on daily event metrics for the product team. Suggested fixes to complex issues by thoroughly analyzing the root cause and impact of each defect.
- Extensively used SQL queries to verify and validate database updates.
- Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable and optimized utilization of business user analytical requirements in Azure.
- Created build and release pipelines in VSTS and performed deployments using an SPN (service principal) connection to implement CI/CD.

Environment: Azure Data Factory (ADF v2), Azure SQL Database, Azure Functions Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Databricks, Python, ADLS Gen2, Azure Cosmos DB, Azure Event Hub, Azure Machine Learning.
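
The streaming work above is described only at a high level. Below is a minimal PySpark Structured Streaming sketch of a pipeline of that kind, assuming a Kafka-compatible source (the resume does not name the exact endpoint); the broker address, topic, schema, and ADLS paths are hypothetical, and the Kafka connector package must be available on the cluster.

    # Minimal PySpark Structured Streaming sketch: consume events from a
    # Kafka-compatible endpoint and land them in the data lake as Parquet.
    # Broker, topic, schema, and paths are hypothetical assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("streaming-ingest").getOrCreate()

    schema = StructType([
        StructField("user_id", StringType()),
        StructField("action", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker.example.com:9092")  # hypothetical
        .option("subscribe", "usage-events")                            # hypothetical topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers the payload as bytes; parse the JSON value into columns.
    events = raw.select(
        F.from_json(F.col("value").cast("string"), schema).alias("e")
    ).select("e.*")

    query = (
        events.writeStream.format("parquet")
        .option("path", "abfss://curated@exampleaccount.dfs.core.windows.net/usage-events/")
        .option("checkpointLocation", "abfss://curated@exampleaccount.dfs.core.windows.net/_chk/usage-events/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()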
Data Engineer
Homesite Insurance, Boston, MA | September 2018 to July 2021
Key Responsibilities:
- Worked with offshore and onsite teams for sync-ups. Developed multi-cloud strategies to make better use of GCP. Involved in loading and transforming large sets of structured and semi-structured data and analyzed them by running Hive queries.
- Designed and developed Spark jobs with Scala to implement an end-to-end data pipeline for batch processing. Processed data with Scala, Spark, and Spark SQL and loaded it into Hive partitioned tables in Parquet file format.
- Implemented solutions to run effectively in the cloud and improve the performance of big data processing and the high volume of data handled by the system.
- Worked with business process managers as a subject matter expert for transforming vast amounts of data and creating business intelligence reports using big data technologies (Hive, Spark, Sqoop, and NiFi for big data ingestion; Python/Bash scripting and Apache Airflow for scheduling jobs in GCP/Google Cloud based environments).
- Proficient in data modeling techniques using star schema, snowflake schema, fact and dimension tables, RDBMS, and physical and logical data modeling for data warehouses and data marts.
- As part of data migration, wrote many SQL scripts to detect data mismatches and worked on loading the history data from Teradata to Snowflake.
- Worked on snowflake schemas and data warehousing, and processed batch and streaming data from the data lake into load pipelines using Snowpipe and Matillion.
- Involved in the design of the data warehouse using the star schema methodology and converted data from various sources into SQL tables.
- Migrated an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Airflow jobs.
- Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP. Developed a near real-time data pipeline using Flume, Kafka, and Spark Streaming to ingest client data from their weblog servers and apply transformations.
- Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.
- Wrote shell scripts to trigger data storage jobs and assisted service developers in finding relevant content in the existing reference models.
- Used Apache Airflow in the GCP Cloud Composer environment to build data pipelines and used various Airflow operators such as the BashOperator, Hadoop operators, Python callables, and branch operators (see the DAG sketch after this section).
- Moved data between BigQuery and Azure SQL Data Warehouse using ADF.
- Built reports for monitoring data loads into GCP and driving reliability at the site level.

Environment: GCP, Airflow, Snowflake, Hadoop ecosystem (HDFS, YARN, Pig, Hive, Sqoop, Flume, Oozie, Kafka, Hive SQL, HBase, Impala, Spark, Scala), Python, BigQuery, Cassandra, Oracle SQL, Oracle Enterprise Linux, shell scripting.
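
The Airflow bullet above lists the operator types used in Cloud Composer. The following is a minimal DAG sketch combining a Bash step, a Python step, and a branch, assuming Airflow 2.x import paths; the DAG id, schedule, and task commands are hypothetical placeholders, not the actual pipeline.

    # Minimal Airflow DAG sketch of the kind run in GCP Cloud Composer.
    # Airflow 2.x import paths assumed; task logic is placeholder only.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import BranchPythonOperator, PythonOperator


    def validate_load():
        # Placeholder validation; a real job would check row counts, schemas, etc.
        print("validating the day's load")


    def choose_path():
        # Placeholder branch decision: full refresh on the 1st, incremental otherwise.
        return "full_refresh" if datetime.utcnow().day == 1 else "incremental_load"


    with DAG(
        dag_id="example_weblog_ingest",   # hypothetical
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_weblogs",
            bash_command="echo 'pull weblog files to GCS'",   # placeholder command
        )
        validate = PythonOperator(task_id="validate_load", python_callable=validate_load)
        branch = BranchPythonOperator(task_id="choose_path", python_callable=choose_path)
        full_refresh = BashOperator(task_id="full_refresh",
                                    bash_command="echo 'rebuild warehouse table'")
        incremental_load = BashOperator(task_id="incremental_load",
                                        bash_command="echo 'append daily partition'")

        extract >> validate >> branch >> [full_refresh, incremental_load]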
Python Developer
Ceequence Technologies, Hyderabad, India | December 2016 to June 2018
Key Responsibilities:
- Involved in understanding the requirements of the end users/business analysts and developed strategies for the ETL process.
- Designed and developed ETL packages using SQL Server Integration Services (SSIS) to load data from different source systems (SQL Server, Oracle, CSV files, pipe-delimited flat files, and XML files). The data loaded into SQL Server was used to generate reports and charts using SQL Server Reporting Services (SSRS).
- Created user variables, property expressions, and script tasks in SSIS. Implemented various SSIS packages with different tasks and transformations and scheduled the SSIS packages.
- Developed and designed Informatica mappings by translating the business requirements. Created and configured workflows, worklets, and sessions using Informatica Workflow Manager.
- Used Informatica PowerCenter Designer to analyze the source data and extract and transform it from various source systems, incorporating business rules using different objects and functions.
- Performed extensive SQL querying for data analysis and wrote performance-tuned SQL queries for data analysis and profiling. Extracted business rules and implemented business logic to extract and load data to the server using T-SQL (a small example of the pattern follows this section).
- Used Informatica PowerCenter to create mappings and mapplets to transform data according to the business rules. Worked as an Informatica administrator to migrate the mappings, sessions, workflows, and repositories into the new environment. Documented Informatica mappings in an Excel spreadsheet.
- Used Agile methodology for repeated testing. Worked with the Tidal scheduling tool for job scheduling.

Environment: Informatica PowerCenter 9.5/9.1, Teradata 13, MLoad, FastLoad, TPT, FastExport, TPump, MultiLoad, BTEQ, Unix, Queryman, Teradata Manager, Java, MuleSoft, SQL Assistant, Oracle 11g, SQL Server.
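
The T-SQL work above is described only in general terms. Below is a minimal Python sketch that runs a parameterized, set-based T-SQL extract against SQL Server via pyodbc; the connection string, tables, and columns are hypothetical assumptions, not artifacts from this role.

    # Minimal sketch: run a parameterized T-SQL extract from SQL Server with pyodbc.
    # Connection details, tables, and columns are hypothetical.
    import pyodbc

    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-sql-server;DATABASE=ExampleDW;"   # hypothetical
        "Trusted_Connection=yes;"
    )

    query = """
    SELECT  c.CustomerID,
            SUM(o.OrderAmount) AS TotalAmount
    FROM    dbo.Orders AS o
    JOIN    dbo.Customers AS c ON c.CustomerID = o.CustomerID
    WHERE   o.OrderDate >= ?          -- parameterized to allow plan reuse
    GROUP BY c.CustomerID;
    """

    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        cursor.execute(query, ("2017-01-01",))
        for customer_id, total_amount in cursor.fetchall():
            print(customer_id, total_amount)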
Business Intelligence Analyst
IBing Software Solutions Private Limited, Hyderabad, India | September 2014 to November 2016
Key Responsibilities:
- Created and analyzed business requirements to compose functional and implementable technical data solutions. Created and maintained Teradata databases, users, tables, views, macros, triggers, and stored procedures.
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata. Created data dictionaries, data mappings for ETL and application support, DFDs, ERDs, mapping documents, metadata, DDL, and DML as required.
- Developed mappings using Source Qualifier, Expression, Filter, Lookup, Update Strategy, Sorter, Joiner, Normalizer, and Router transformations.
- Used Excel to set up pivot tables to create various reports from SQL query data. Extensively used Python's data science packages such as Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn, and NLTK (see the pandas sketch after this section).
- Cleansed, mapped, and transformed data; created job streams; and added and deleted components on the job streams in Data Manager based on requirements.
- Reported and created dashboards for Global Services & Technical Services using SSRS, Oracle BI, and Excel.
- Deployed Excel VLOOKUP, PivotTable, and Access query functionalities to research data issues.

Environment: Python, PySpark, PowerExchange, IBM Rational Data Architect, MS SQL Server, Teradata, Teradata SQL Assistant, shell scripting, Tableau, Power BI, MS Excel, VLOOKUP, PivotTable, Access Query, Oracle BI, PL/SQL, IBM Control Center, TOAD, Microsoft Project Plan, Repository Manager, Workflow Manager, MySQL, Cassandra, Pig, Hadoop.
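
The reporting work above combines SQL extracts, pandas, and Excel-style pivot tables. The following is a minimal pandas sketch of that kind of summary; the file name and columns are hypothetical stand-ins for a real SQL extract.

    # Minimal pandas sketch: build a pivot-table style summary of the kind
    # otherwise produced in Excel. File name and columns are hypothetical.
    import pandas as pd

    # In practice this frame would come from a SQL query; a CSV stands in here.
    df = pd.read_csv("example_service_tickets.csv")   # hypothetical extract

    # Ticket counts by region and severity, mirroring an Excel PivotTable.
    summary = pd.pivot_table(
        df,
        index="region",
        columns="severity",
        values="ticket_id",
        aggfunc="count",
        fill_value=0,
    )

    print(summary)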
