Big Data Engineer Resume - Dallas, TX

Candidate Information
Title: Big Data Engineer
Target Location: US-TX-Dallas

Mouli
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
Senior Big Data Engineer

SUMMARY OF EXPERIENCE
- Senior Big Data Engineer with 10+ years of experience and a strong background in end-to-end enterprise data warehousing and big data projects.
- Strong working experience with the SDLC and with Agile and Waterfall methodologies.
- Good knowledge of Spark architecture and components; efficient with Spark Core, Spark SQL, and Spark Streaming, and experienced in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
- Wrote complex HiveQL queries for data extraction from Hive tables and authored Hive user-defined functions (UDFs) as required.
- Extensive experience developing Bash, T-SQL, and PL/SQL scripts.
- Proficient in converting Hive/SQL queries into Spark transformations using DataFrames and Datasets (illustrated in the sketch following this summary).
- Capable of using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on Amazon Web Services (AWS).
- Experience using Kafka and Kafka brokers to initiate the Spark context and process live streaming data.
- Extensive experience with micro-batching to ingest millions of files into Snowflake as they arrive in the staging area.
- Hands-on experience with Hadoop ecosystem components including Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, Zookeeper, Kafka, Flume, MapReduce, YARN, Scala, and Hue.
- Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data and DataFrame operations to perform required validations on the data.
- Proficient in Python scripting; worked with statistical functions in NumPy, visualization with Matplotlib, and data organization with Pandas.
- Developed Impala scripts for extraction, transformation, and loading of data into the data warehouse.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL; implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Skilled in using Kerberos, Azure AD, Sentry, and Ranger for authentication and authorization.
- Excellent knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Knowledge of job workflow scheduling and locking tools/services such as Oozie, Zookeeper, Airflow, and Apache NiFi.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Worked on HBase to load and retrieve data for real-time processing using a REST API.
- Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
- Proficient in relational databases such as Oracle, MySQL, and SQL Server.
- Extensive experience working with and integrating NoSQL databases: DynamoDB, Cosmos DB, MongoDB, Cassandra, and HBase.
- Knowledge of integrated development environments such as Eclipse, NetBeans, IntelliJ, and STS.
- Experienced in designing time-driven and data-driven automated workflows using Oozie.
- Experience configuring Zookeeper to coordinate servers in clusters and maintain the data consistency that is critical to decision making.
- Hands-on experience with visualization tools such as Tableau and Power BI.
- Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS; expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
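The Hive-to-Spark conversion mentioned above can be shown with a brief PySpark sketch. This is a minimal, hypothetical example rather than code from any project below: the sales table, its columns, and the date filter are assumed placeholders. It runs the same aggregation once through spark.sql and once through the DataFrame API.

```python
# Minimal PySpark sketch: a HiveQL aggregation rewritten with the DataFrame API.
# Table and column names (sales, region, amount, sale_date) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe-example")
         .enableHiveSupport()      # lets Spark read tables from the Hive metastore
         .getOrCreate())

# HiveQL form, executed as-is through Spark SQL:
hive_df = spark.sql(
    "SELECT region, SUM(amount) AS total_amount "
    "FROM sales WHERE sale_date >= '2024-01-01' GROUP BY region"
)

# The same logic expressed as DataFrame transformations.
df = (spark.table("sales")
      .filter(F.col("sale_date") >= "2024-01-01")
      .groupBy("region")
      .agg(F.sum("amount").alias("total_amount")))

df.show()
```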
IT SKILLS
Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server
Languages: Python, R, SQL, Java, Scala, JavaScript
NoSQL Databases: Cassandra, HBase, MongoDB
Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
Public Cloud: AWS, Azure
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools: Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza
Operating Systems: Windows (all versions), UNIX, Linux, macOS, Sun Solaris

PROJECT EXPERIENCE

Client: Apple, Sunnyvale, CA
Role: Senior Big Data Engineer (June 2023 - Present)
Responsibilities:
- Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Conducted Jenkins training sessions for team members, improving their proficiency in continuous integration and deployment practices.
- Configured Hadoop tools such as Hive, Pig, Zookeeper, Flume, Impala, and Sqoop.
- Deployed the initial Azure components, including Azure Virtual Networks, Azure Application Gateway, Azure Storage, and affinity groups.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Used Scala functions, dictionaries, and data structures (arrays, lists, maps) for better code reusability.
- Utilized the Spark SQL API in PySpark to extract and load data and run SQL queries.
- Primarily involved in the data migration process on Azure, integrating with a GitHub repository and Jenkins.
- Responsible for managing data coming from different sources through Kafka.
- Worked with big data technologies such as Spark, Scala, Hive, and a Hadoop cluster (Cloudera platform).
- Built data pipelines with Data Fabric jobs using Sqoop, Spark, Scala, and Kafka.
- Worked in parallel on the database side with Oracle and MySQL Server for source-to-target data design.
- Administered PostgreSQL databases, ensuring they were set up properly and optimized for maximum performance and reliability.
- Stored data efficiently by designing and implementing PostgreSQL data models with adequate normalization.
- Secured PostgreSQL databases by implementing authentication, authorization, and encryption to prevent unauthorized access to sensitive data.
- Designed, deployed, and managed highly available and scalable Confluent Kafka clusters to support real-time data streaming for a large-scale enterprise application.
- Designed, built, and managed ELT data pipelines leveraging Airflow, Python, and dbt (a minimal DAG sketch follows this project section).
- Proficient in Apache Cassandra, a highly scalable and distributed NoSQL database management system.
- Architected and optimized cloud-based data solutions, leveraging cloud platforms such as Snowflake to store and process large volumes of data.
- Collaborated with cross-functional teams to configure and manage Jenkins agents for distributed builds and scalability.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
- Wrote Spark programs to move data from the storage input location to the output location, performing data loading, validation, and transformation along the way.
- Designed a highly efficient data model for optimizing large-scale queries, utilizing Hive complex datatypes and the Parquet file format.
- Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Developed data pipelines using Sqoop, Pig, and Hive to ingest customer member, clinical, biometrics, lab, and claims data into HDFS for data analytics.
- Monitored the Spark cluster using Log Analytics and the Ambari Web UI; transitioned log storage from Cassandra to Azure SQL Data Warehouse and improved query performance.
- Developed data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
- Worked extensively on Azure Data Factory, including data transformations, integration runtimes, Azure Key Vault, triggers, and migrating data factory pipelines to higher environments using ARM templates.
- Developed ETL processes (DataStage Open Studio) to load data from multiple sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
Environment: Spark, Kafka, dbt (data build tool), DataStage, DB2, Snowflake, MapReduce, Python, Hadoop, Hive, Pig, PySpark, Spark SQL, Azure SQL DW, Databricks, Azure Synapse, Azure Data Lake, ARM, Azure HDInsight, Blob Storage, Oracle 12c, Cassandra, Git, Zookeeper, Oozie, Confluent Kafka.
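The Airflow/dbt ELT bullet above can be sketched as a small DAG. This is an illustrative sketch only, assuming Airflow 2.x with the Bash operator and a dbt project at a hypothetical /opt/dbt/analytics path; task names, schedule, and paths are placeholders, not details from this project.

```python
# Illustrative Airflow 2.x DAG for an ELT flow that stages raw data, then runs
# and tests dbt models. Paths, schedule, and task names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="elt_dbt_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Placeholder extract/load step; in practice this could call Spark, ADF, etc.
    load_raw = BashOperator(
        task_id="load_raw_data",
        bash_command="python /opt/pipelines/load_raw.py",
    )

    # Transform with dbt once the raw layer is loaded.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/analytics --profiles-dir /opt/dbt",
    )

    # Validate the transformed models.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/analytics --profiles-dir /opt/dbt",
    )

    load_raw >> dbt_run >> dbt_test
```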
Client: Verizon, Irving, TX
Role: Big Data Engineer (February 2020 - May 2023)
Responsibilities:
- Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores.
- Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
- Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join scripts using Hive join operations.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented a Composite server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Implemented machine learning algorithms in Python to predict the quantity a user might want to order for a specific item, enabling automatic suggestions, using Kinesis Firehose and an S3 data lake.
- Allotted permissions, policies, and roles to users and groups using AWS Identity and Access Management (IAM).
- Created various Hive external and staging tables and joined them as required; implemented static partitioning, dynamic partitioning, and bucketing.
- Developed custom Kafka producers and consumers for publishing to and subscribing from Kafka topics (a minimal sketch follows this project section).
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Designed the MapReduce and YARN flow, wrote MapReduce scripts, and performed performance tuning and debugging.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka.
- Implemented Kafka Connect connectors to integrate Kafka with various external systems, enabling seamless data ingestion and delivery.
- Developed Oozie workflows to automate loading data into NiFi and pre-processing it with Pig.
- Worked on Apache NiFi to decompress JSON files and move them from local storage to HDFS.
- Ingested data from sources such as Access, Excel, CSV, Oracle, and flat files using connectors, tasks, and transformations provided by AWS Data Pipeline.
- Worked with JSON-format files using XML and Hierarchical DataStage stages.
- Extensively used parallel stages such as Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Mentored and guided analysts on building purposeful analytics tables in dbt for cleaner schemas.
- Developed a Python script to transfer data from on-premises systems to AWS; tuned SQL queries to bring down run time by working on indexes and execution plans.
- Reduced analytical query response times and improved query speed by implementing query optimization techniques in Amazon Redshift.
- Created and executed comprehensive backup and recovery strategies for Amazon Redshift, protecting data and reducing the likelihood of loss.
- Strong understanding of AWS components such as EC2 and S3.
- Used the DataStage Director and its run-time engine to schedule the solution, test and debug its components, and monitor the resulting executables on an ad hoc or scheduled basis.
- Stored data in AWS S3 (used similarly to HDFS) and ran EMR programs on the stored data.
- Used the AWS CLI to suspend an AWS Lambda function and to automate backups of ephemeral data stores to S3 buckets and EBS.
- When required data was not available on the HDFS cluster, used Sqoop to pull it from Netezza onto the cluster.
- Transferred data from AWS S3 to AWS Redshift using Informatica.
- Worked on Hive UDFs; handed the task off midway due to security privilege restrictions.
- Implemented a continuous delivery pipeline with Docker, GitHub, and AWS.
- Wrote Flume configuration files for importing streaming log data into HBase.
- Set up the full application stack and configured and debugged Logstash to send Apache logs to AWS Elasticsearch.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Implemented a variety of AWS compute and networking services to meet application needs.
- Wrote HiveQL as per requirements, processed data in the Spark engine, and stored it in Hive tables.
- Imported existing datasets from Oracle into the Hadoop system using Sqoop.
- Brought data from various sources into Hadoop and Cassandra using Kafka.
- Used the Tidal enterprise scheduler and Oozie operational services for coordinating the cluster and scheduling workflows.
- Modeled, lifted, and shifted custom SQL and transposed LookML into dbt to materialize incremental views.
- Applied Spark Streaming for real-time data transformation.
Environment: Hadoop (HDFS, MapReduce), dbt, DataStage, Scala, Spark, DB2, Snowflake, Impala, Hive, MongoDB, Pig, DevOps, HBase, Oozie, Hue, Sqoop, Flume, Oracle, AWS services (Lambda, EMR, Auto Scaling), MySQL, Python, Spark SQL.
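The custom Kafka producer/consumer work noted above can be illustrated with a short sketch, assuming the confluent-kafka Python client; the broker address, topic, group id, and payloads are placeholders rather than project details.

```python
# Minimal Kafka producer/consumer sketch using the confluent-kafka Python client.
# Broker address, topic, group id, and payloads are hypothetical placeholders.
import json
from confluent_kafka import Producer, Consumer

BROKERS = "localhost:9092"   # hypothetical broker list
TOPIC = "orders"             # hypothetical topic

# --- Producer: publish a JSON-encoded event ---
producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, value=json.dumps({"order_id": 1, "qty": 3}).encode("utf-8"))
producer.flush()             # block until the message is delivered

# --- Consumer: read events from the same topic ---
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "orders-etl",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

try:
    while True:
        msg = consumer.poll(1.0)          # wait up to 1s for a message
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        event = json.loads(msg.value().decode("utf-8"))
        print("received:", event)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
```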
Client: State of MD, Bethesda, MD
Role: Data Engineer (April 2019 - January 2020)
Responsibilities:
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time (a minimal streaming sketch follows this project section).
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Analyzed data using Pig scripting, Hive queries, Spark (Python), and Impala.
- Wrote live real-time processing jobs using Spark Streaming with Kafka.
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Imported real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Developed a Pig program for loading and filtering the streaming data into HDFS using Flume.
- Handled data from different datasets, joined them, and pre-processed them using Pig join operations.
- Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API.
- Developed streaming pipelines using Azure Event Hubs and Stream Analytics to analyze dealer efficiency and open-table counts for data coming in from IoT-enabled poker and other pit tables.
- Helped maintain and troubleshoot UNIX and Linux environments.
- Developed custom alerts using Azure Data Factory, SQL DB, and Logic Apps.
- Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Built pipelines to move hashed and un-hashed data from XML files to the data lake.
Environment: Spark, Kafka, Hadoop, HDFS, Spark SQL, Azure, Python, MapReduce, Pig, Hive, Oracle 11g, MySQL, MongoDB, HBase, Oozie, Zookeeper, Tableau.
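The Spark Streaming with Kafka work above can be illustrated with a brief sketch. The resume describes Spark Streaming; the sketch below uses Spark Structured Streaming for brevity, and the broker address, topic, and HDFS paths are hypothetical placeholders.

```python
# Illustrative PySpark Structured Streaming job: read events from Kafka, cast
# the binary key/value to strings, and append the stream to HDFS as Parquet.
# Requires the spark-sql-kafka connector package on the classpath.
# Broker address, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-stream")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "learner-events")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the value to a string column.
parsed = events.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("event_json"),
    F.col("timestamp"),
)

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/learner_events")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_events")
         .outputMode("append")
         .start())

query.awaitTermination()
```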
Client: Xerox, Norwalk, CT
Role: Hadoop Developer (November 2016 - March 2019)
Responsibilities:
- Used Hive to implement a data warehouse and stored data in HDFS; stored data in Hadoop clusters set up on AWS EMR.
- Performed data preparation using Pig Latin to get the data into the required format.
- Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Designed the HBase row key to store text and JSON as key values, and structured the row key so rows could be retrieved and scanned in sorted order.
- Created Hive schemas using performance techniques such as partitioning and bucketing.
- Used Hadoop YARN to perform analytics on data in Hive.
- Wrote stored procedures and complex SQL queries against relational databases such as Oracle, SQL Server, and MySQL.
- Developed various mappings with collections of sources, targets, and transformations using Informatica Designer.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
Environment: Hadoop, MapReduce, HBase, JSON, Spark, Kafka, Hive, Pig, YARN, Spark Core, Spark SQL, Scala, Python, Java, Sqoop, Impala, Oracle, Linux, Oozie.

Client: HSBC Bank, Hyderabad, India
Role: Junior Hadoop Developer (August 2013 - October 2016)
Responsibilities:
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements (illustrated in the sketch following this section).
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.
- Implemented Apache Pig scripts to load data from and store data into Hive.
Environment: Hive, Hadoop, Cassandra, Pig, Sqoop, Oozie, Python, MS Office.
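The Avro and Parquet formats for Hive mentioned above, and the date partitioning named in the Xerox role, can be illustrated with a short PySpark sketch. It is a hypothetical example: the database, table, column names, and paths are placeholders, and the Avro writer requires the external spark-avro package.

```python
# Illustrative PySpark sketch: write the same dataset as a date-partitioned
# Parquet Hive table and as an Avro copy on HDFS. Database, table, column
# names, and paths are hypothetical; writing Avro needs the external
# spark-avro package (e.g. --packages org.apache.spark:spark-avro_2.12:<version>).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parquet-avro-hive-example")
         .enableHiveSupport()
         .getOrCreate())

df = spark.table("staging.transactions")   # hypothetical staging table

# Partitioned, columnar Parquet table registered in the Hive metastore;
# queries filtering on txn_date prune to the matching partitions only.
(df.write
   .mode("overwrite")
   .format("parquet")
   .partitionBy("txn_date")
   .saveAsTable("analytics.transactions_parquet"))

# Row-oriented Avro copy written to HDFS for downstream consumers.
(df.write
   .mode("overwrite")
   .format("avro")
   .save("hdfs:///data/avro/transactions"))

spark.sql("""
    SELECT COUNT(*) AS txn_count
    FROM analytics.transactions_parquet
    WHERE txn_date = '2015-03-01'
""").show()
```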
