Data Engineer Resume - Cross Roads, TX
Swetha K
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
Data Engineer - Azure Big Data / AWS

PROFESSIONAL SUMMARY:
- Around 10 years of total IT experience, including over 8 years as an AWS/Big Data (Hadoop) data engineer, with a background in the development and design of Java-based enterprise applications.
- Extensive working experience with Hadoop ecosystem components such as HDFS, MapReduce, Hive, Sqoop, Flume, Spark, Kafka, Oozie, and ZooKeeper.
- Implemented performance tuning techniques for Spark SQL queries (an illustrative sketch follows this summary).
- Strong knowledge of HDFS architecture and of the MapReduce (MRv1) and YARN (MRv2) frameworks.
- Strong hands-on experience publishing messages to Kafka topics using Apache NiFi and consuming them into HBase using Spark and Python.
- Experience designing and developing Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Experience with MS SQL Server Integration Services (SSIS), T-SQL, stored procedures, and triggers.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL using Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, and relational data ingestion.
- Created Spark jobs that process source files and perform various transformations on the source data using the Spark DataFrame and Spark SQL APIs.
- Developed Sqoop scripts to migrate data from Teradata and Oracle into the big data environment.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions, including YARN-based CDH 5.x.
- Experience in database design and business intelligence development using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, Analysis Services (SSAS), OLAP cubes, and star and snowflake schemas.
- Worked on large-scale data transfer across Hadoop clusters and implemented new technology stacks on Hadoop clusters using Apache Spark.
- Proficient in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
- Selected appropriate, cost-effective AWS/Azure services to design and deploy applications based on given requirements.
- Experience in managing and reviewing Hadoop log files.
- Experienced in processing big data on the Apache Hadoop framework using MapReduce programs.
- Performed data scrubbing and processing, using Oozie for workflow automation and coordination.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Hands-on experience handling file formats such as Avro, Parquet, SequenceFiles, MapFiles, CSV, XML, log, ORC, and RC.
- Experience with the NoSQL databases HBase, Cassandra, and MongoDB.
- Experience with AIX/Linux RHEL, Unix shell scripting, and SQL Server 2008.
- Worked with the data search tool Elasticsearch and the data collection tool Logstash.
- Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning, and application deployment in production clusters.
- Experienced in building automated regression scripts in Python for validation of ETL processes across databases such as Oracle, SQL Server, Hive, and MongoDB.
- Exposure to Scrum, Agile, and Waterfall methodologies.
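The Spark SQL tuning item above is easiest to picture with a short example. The following is a minimal sketch rather than a job from the resume: the paths, column names, and table sizes are assumed for illustration, and it only shows the usual shape of such tuning (early column pruning, broadcasting a small dimension, right-sizing shuffle partitions, caching a reused result).

    # Minimal Spark SQL tuning sketch; all paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast, sum as spark_sum

    spark = SparkSession.builder.appName("spark-sql-tuning-sketch").getOrCreate()

    # Fewer shuffle partitions than the 200 default can help on modest volumes.
    spark.conf.set("spark.sql.shuffle.partitions", "64")

    orders = (spark.read.parquet("/data/orders")            # placeholder path
                   .select("customer_id", "amount"))        # prune unused columns early
    customers = spark.read.parquet("/data/dim_customer")    # small lookup table

    # Broadcasting the small dimension avoids shuffling the large fact table.
    revenue_by_segment = (
        orders.join(broadcast(customers), "customer_id")
              .groupBy("segment")
              .agg(spark_sum("amount").alias("revenue"))
              .cache()                                       # reused by several reports
    )
    revenue_by_segment.show()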
TECHNICAL SKILLS:
Programming Languages: Java, Python, SQL, and C/C++
Big Data Ecosystem: Hadoop, MapReduce, Kafka, Spark, Pig, Hive, YARN, Flume, Sqoop, Oozie, ZooKeeper, Talend
Hadoop Distributions: Cloudera Enterprise, Hortonworks, EMC Pivotal
Databases: Oracle, SQL Server, PostgreSQL
Web Technologies: HTML, XML, jQuery, Ajax, CSS, JavaScript, JSON
Streaming Tools: Kafka
Testing: Hadoop testing, Hive testing, MRUnit
Operating Systems: Linux (Red Hat/Ubuntu/CentOS), Windows 10/8.1/7/XP
Cloud: AWS EMR, Azure, RDS, CloudWatch, S3, Redshift, Kinesis, DynamoDB
Technologies and Tools: Servlets, JSP, Spring (Boot, MVC, Batch, Security), Web Services, Hibernate, Maven, GitHub, Bamboo
Application Servers: Tomcat, JBoss
ETL Tools: Informatica PowerCenter, Power Exchange 9.5/9.1/8.6/8.1/7, IDQ 9.6.1/9.0.1/8.6, MDM, IBM DataStage 8.0 and 11.7, Talend 8/7.4/7.1, SSIS, SSRS
IDEs: Eclipse, NetBeans, IntelliJ

PROFESSIONAL EXPERIENCE:

American Express, AZ (Remote)    Jan 2021 - Present
Role: Sr. Data Engineer
Roles & Responsibilities:
- Used Agile methodology in developing the application, including iterative application development, weekly sprints, stand-up meetings, and customer reporting backlogs.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
- Understanding of the Azure product and service suite, primarily Databricks, Blob Storage, VPN, Azure Functions, Synapse, and Azure SQL DB.
- Analyzed and developed programs for Hadoop ingest processes, covering the extract logic and data load type, using tools such as Sqoop, Spark, Scala, Kafka, and Unix shell scripts.
- Designed the incremental and historical extract logic to load data from flat files into the Massive Event Logging Database (MELD) from various servers.
- Determined the viability of business problems for a big data solution with PySpark.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Used the ORC and Parquet file formats in Hive.
- Developed efficient Pig and Hive scripts with joins on datasets using various techniques.
- Wrote documentation of program development, subsequent revisions, and coded instructions in the project's GitHub repository.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Provided staging solutions for data validation and cleansing with PL/SQL and DataStage ETL jobs.
- Worked with continuous integration/continuous delivery using tools such as Jenkins, Git, Ant, and Maven; created workflows in Jenkins and set up the CI/CD model using Jenkins.
- Prepared release notes and validation documents for user stories deployed to production as part of each release.
- Wrote technical design documents based on the data mapping and functional details of the tables.
- Extracted batch and real-time data from DB2, Oracle, SQL Server, Teradata, and Netezza into Hadoop (HDFS) using Teradata TPT, Sqoop, Apache Kafka, and Apache Storm.
- Developed Apache Spark jobs for data cleansing and pre-processing (an illustrative sketch follows this role).
- Wrote Spark programs to improve the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Designed and built ETL workflows, leading the programming of data extraction from various sources into the Hadoop file system, and implemented end-to-end ETL workflows using Teradata, SQL, TPT, and Sqoop to load Hive data stores.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Used Scala to write programs for faster testing and processing of data.
- Wrote code and created Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
Environment: RHEL, HDFS, MapReduce, Hive, Azure, Databricks, Blob Storage, Azure Functions, Synapse, Pig, Sqoop, Oozie, Teradata, Oracle SQL, UC4, Kafka, GitHub, Hortonworks Data Platform, Spark, Scala.
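To make the data cleansing and pre-processing item above concrete, here is a minimal PySpark sketch. The input path, column names, and output table are illustrative assumptions, not details from the project described in this role.

    # Minimal cleansing/pre-processing sketch; path, columns, and table are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, to_date

    spark = (SparkSession.builder.appName("cleansing-sketch")
             .enableHiveSupport().getOrCreate())

    raw = spark.read.option("header", True).csv("/data/raw/transactions/")  # placeholder path

    cleansed = (
        raw.dropDuplicates()
           .withColumn("account_id", trim(col("account_id")))               # normalize keys
           .withColumn("amount", col("amount").cast("double"))              # enforce types
           .withColumn("txn_date", to_date(col("txn_date"), "yyyy-MM-dd"))
           .filter(col("account_id").isNotNull() & col("amount").isNotNull())
    )

    # Store as ORC in Hive for downstream Hive/Spark SQL consumers.
    cleansed.write.mode("overwrite").format("orc").saveAsTable("staging.transactions_clean")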
USAA, San Antonio, TX    Oct 2019 - Jul 2021
Role: AWS Data Engineer
Roles & Responsibilities:
- Implemented a generic, highly available ETL framework for bringing related data into Hadoop and Cassandra from various sources using Spark.
- Used the Talend Admin Console job conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis (Redwood).
- Involved in development using Java, Python, and Kafka.
- Worked in an Agile environment and used Rally to maintain user stories and tasks.
- Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Created notebooks for moving data from raw to stage and then to curated zones using Azure Databricks.
- Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
- Built pipelines to copy data from source to destination in Azure Data Factory.
- Implemented various data modeling techniques for Cassandra.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra; brought data from various sources into Hadoop and Cassandra using Kafka.
- Used SQL Azure extensively for database needs in various applications.
- Created multiple dashboards in Tableau for multiple business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Implemented Apache Sentry to restrict access to Hive tables at the group level.
- Employed the Avro format for data ingestion for faster operation and less space utilization.
- Experienced in managing and reviewing Hadoop log files.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
- Built data pipelines using Azure Data Factory and Azure Databricks, loaded data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlled and granted database access.
- Migrated complex MapReduce programs into in-memory Spark processing using transformations and actions.
- Extracted data from HDFS using Hive and Presto, performed data analysis using Spark with Scala, PySpark, and Redshift, and performed feature selection and created nonparametric models in Spark.
- Used the Avro serializer and deserializer for developing Kafka clients.
- Developed a Kafka consumer job to consume data from a Kafka topic and perform validations on the data before pushing it into Hive and Cassandra databases (an illustrative sketch follows this role).
- Applied Spark Streaming for real-time data transformation.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Created and implemented various shell scripts for automating jobs.
- Experienced in using Platfora, a data visualization tool specific to Hadoop, and created various lenses and viz boards for real-time visualization from Hive tables.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Created and deployed Azure command-line scripts to automate tasks.
- Worked with Spark using Scala and Spark SQL for faster testing and processing of data.
- Applied advanced Spark techniques such as text analytics and processing using in-memory processing.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
Environment: MapR, MapReduce, Spark, Scala, Solr, Java, Azure SQL, Azure Databricks, Azure Data Lake, HDFS, Hive, Pig, Impala, Cassandra, Python, Kafka, Tableau, Teradata, CentOS, Pentaho, ZooKeeper, Sqoop.
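The Kafka consumer item above (validate, then load into Hive) can be sketched briefly with Spark Structured Streaming. The broker address, topic, schema, table, and checkpoint path below are assumed placeholders, the Cassandra write is omitted for brevity, and the Kafka source requires the spark-sql-kafka connector on the classpath.

    # Minimal Kafka-to-Hive validation sketch; broker, topic, schema, and table are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = (SparkSession.builder.appName("kafka-to-hive-sketch")
             .enableHiveSupport().getOrCreate())

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("account_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    stream = (spark.readStream.format("kafka")
                   .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder broker
                   .option("subscribe", "payments")                       # placeholder topic
                   .load())

    parsed = (stream.select(from_json(col("value").cast("string"), schema).alias("e"))
                    .select("e.*"))

    # Basic validation: drop records missing keys or with non-positive amounts.
    valid = parsed.filter(col("event_id").isNotNull()
                          & col("account_id").isNotNull()
                          & (col("amount") > 0))

    def write_batch(batch_df, batch_id):
        # Append each micro-batch into a Hive-managed ORC table.
        batch_df.write.mode("append").format("orc").saveAsTable("curated.payments")

    (valid.writeStream
          .foreachBatch(write_batch)
          .option("checkpointLocation", "/tmp/checkpoints/payments")       # placeholder path
          .start()
          .awaitTermination())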
Walgreens, Chicago, IL    Aug 2018 - Sep 2019
Role: Data Engineer
Roles & Responsibilities:
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
- Used Azure API Management to maintain on-premises API services with policies.
- Involved in building and Azure deployment of Function Apps from Visual Studio.
- Hands-on experience with Azure PaaS; worked on areas such as Azure Active Directory, App Services, Azure SQL, and Azure storage offerings like CDN and Blob.
- Worked extensively with Azure Active Directory and on-premises Active Directory.
- Worked with continuous integration/continuous delivery using tools such as Jenkins, Git, Ant, and Maven; created workflows in Jenkins and set up the CI/CD model using Jenkins.
- Responsible for performing various transformations such as sort, join, aggregation, and filter to retrieve various datasets using Apache Spark.
- Used Python for SQL/CRUD operations in the database and for file extraction, transformation, and generation.
- Developed Spark applications in Python (PySpark) on a distributed environment to load a large number of CSV files with different schemas into Hive ORC tables (an illustrative sketch follows this role).
- Developed complex ETL mappings for stage, dimension, fact, and data mart loads; transformed and analyzed the data using PySpark and Hive based on the ETL mappings.
- Developed PySpark programs, created DataFrames, and worked on transformations.
- Analyzed existing SQL scripts and redesigned them using PySpark SQL for faster performance.
- Provided guidance to the development team working on PySpark as an ETL platform.
- Designed and implemented ETL for data loads from heterogeneous sources to SQL Server and Oracle target databases, including facts and Slowly Changing Dimensions (SCD Type 1 and Type 2).
- Provided production support and architecture for big data Hadoop and Cassandra.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
Environment: Apache Hadoop, HDFS, Spark, Kafka, Solr, Hive, DataStax Cassandra, MapReduce, Pig, Java, Flume, Cloudera CDH4, Oozie, Oracle, MySQL, AWS.
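The CSV-to-Hive-ORC item above is sketched below in PySpark. The target column list, landing paths, and table name are assumed for illustration; the idea is simply to align each file to one target schema before appending to the ORC table.

    # Minimal multi-schema CSV to Hive ORC sketch; paths, columns, and table are hypothetical.
    from functools import reduce
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, lit

    spark = (SparkSession.builder.appName("csv-to-hive-orc-sketch")
             .enableHiveSupport().getOrCreate())

    TARGET_COLUMNS = ["store_id", "sku", "qty", "price"]                 # placeholder target schema
    paths = ["/landing/sales/2019/01/", "/landing/sales/2019/02/"]       # placeholder paths

    def align(df):
        # Keep known columns, add missing ones as nulls, and fix the column order.
        return df.select([col(c) if c in df.columns else lit(None).alias(c)
                          for c in TARGET_COLUMNS])

    frames = [align(spark.read.option("header", True).csv(p)) for p in paths]
    combined = reduce(lambda a, b: a.unionByName(b), frames)

    combined.write.mode("append").format("orc").saveAsTable("retail.sales_orc")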
Tivo Inc, Boston, MA    Nov 2016 - Jul 2018
Role: Data Engineer
Roles & Responsibilities:
- Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
- Created various types of data visualizations using Python and Tableau.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Created various complex SSIS/ETL packages to extract, transform, and load data.
- Used ZooKeeper to store offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Used Kafka features such as distribution, partitioning, and the replicated commit log for messaging, maintained feeds, and created applications that monitor consumer lag within Apache Kafka clusters (an illustrative sketch follows this role).
- Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for the job at hand.
- Migrated on-premises database structures to the Confidential Synapse data warehouse.
- Responsible for ETL and data validation using SQL Server Integration Services.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Created entity relationship diagrams (ERDs), functional diagrams, and data flow diagrams, enforced referential integrity constraints, and created logical and physical models using Erwin.
- Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
- Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
Environment: Informatica, RDS, NoSQL, Snowflake Schema, Apache Kafka, Python, ZooKeeper, SQL Server, Erwin, Oracle, Redshift, MySQL, PostgreSQL.
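The consumer-lag monitoring item above can be sketched with the kafka-python client. The broker, topic, and group id are assumed placeholders; lag here is simply the latest offset minus the group's committed offset, per partition.

    # Minimal Kafka consumer-lag check; broker, topic, and group id are hypothetical.
    from kafka import KafkaConsumer, TopicPartition

    def consumer_lag(bootstrap, topic, group_id):
        """Return {partition: lag} for one consumer group on one topic."""
        consumer = KafkaConsumer(bootstrap_servers=bootstrap,
                                 group_id=group_id,
                                 enable_auto_commit=False)
        partitions = [TopicPartition(topic, p)
                      for p in sorted(consumer.partitions_for_topic(topic) or [])]
        latest = consumer.end_offsets(partitions)          # newest offset per partition
        lag = {}
        for tp in partitions:
            committed = consumer.committed(tp)             # last committed offset, or None
            lag[tp.partition] = latest[tp] - (committed or 0)
        consumer.close()
        return lag

    if __name__ == "__main__":
        for partition, behind in consumer_lag("localhost:9092", "events", "etl-consumers").items():
            print(f"partition {partition}: {behind} messages behind")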
Pike Solutions, Hyderabad, India    Apr 2013 - Oct 2016
Role: Data Engineer
Roles & Responsibilities:
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Extensively used the Spark stack to develop pre-processing jobs, using the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption.
- Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from streaming sources such as Kafka, Flume, and JMS.
- Worked on extracting and enriching HBase data across multiple tables using joins in Spark (an illustrative sketch follows this role).
- Wrote APIs to load the processed data into HBase tables.
- Replaced existing MapReduce programs with Spark applications written in Scala.
- Built on-premises data pipelines using Kafka and Spark Streaming, fed from an API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS sources.
- Good knowledge of the Kafka Streams API for data transformation.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Used Talend to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and compared the performance of StreamSets and Kafka Streams.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
- Developed Hive queries to analyze data in HDFS to identify issues and behavioral patterns.
- Involved in writing optimized Pig Latin scripts, along with developing and testing them.
- Deployed applications using Jenkins, integrating Git version control with it.
- Participated in production support on a regular basis to support the analytics platform.
- Used Rally for task/bug tracking.
- Used Git for version control.
Environment: RHEL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Mahout, HBase, Hortonworks Data Platform, Cassandra.
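The HBase enrichment item above boils down to a DataFrame join. The sketch below assumes the two tables have already been loaded as DataFrames (reading HBase directly would need a separate connector) and uses made-up column names; it keeps every event row and attaches profile attributes where available.

    # Minimal enrichment-join sketch; inputs stand in for HBase-backed tables,
    # and all column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import coalesce, col, lit

    spark = SparkSession.builder.appName("hbase-enrichment-sketch").getOrCreate()

    events = spark.createDataFrame(
        [("u1", "click"), ("u2", "view"), ("u3", "click")],
        ["user_id", "event_type"],
    )
    profiles = spark.createDataFrame(
        [("u1", "premium"), ("u2", "basic")],
        ["user_id", "plan"],
    )

    # Left join keeps every event and attaches profile attributes when present.
    enriched = (events.join(profiles, on="user_id", how="left")
                      .withColumn("plan", coalesce(col("plan"), lit("unknown"))))
    enriched.show()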
