Candidate's Name
Senior Data Engineer
Email: EMAIL AVAILABLE | Ph: PHONE NUMBER AVAILABLE | LinkedIn: https://LINKEDIN LINK AVAILABLE
PROFESSIONAL SUMMARY

- Data Engineer with 9+ years of IT industry experience and hands-on experience installing, configuring and using Hadoop ecosystem components such as Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Sqoop, Pig, Flume, Scala, Cassandra, Kafka and Spark.
- Experienced in Agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
- Extensive experience importing and exporting data using streaming platforms such as Flume.
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private clouds.
- Experience developing MapReduce programs with Apache Hadoop to analyze big data per requirements.
- Played a key role in migrating Teradata objects into a Snowflake environment.
- Excellent knowledge of cloud data warehouse systems: AWS Redshift, S3 buckets and Snowflake.
- Worked with HBase to perform quick lookups (updates, inserts and deletes) in Hadoop.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experience tuning Spark jobs for storage and processing efficiency.
- Experienced in integrating Hadoop with Kafka and uploading clickstream data to HDFS.
- Experience developing Spark applications in Scala using both functional and object-oriented programming.
- Designed, developed and implemented ETL pipelines using the Python API (PySpark) of Apache Spark on AWS EMR.
- Experienced in loading datasets into Hive for ETL (Extract, Transform and Load) operations.
- Worked with AWS Glue, AWS Data Catalog, AWS Redshift and AWS Redshift Spectrum to develop and orchestrate ETL/ELT applications using PySpark in AWS Glue.
- Experience developing big data applications and services on the Amazon Web Services (AWS) platform using EMR, S3, EC2, Lambda, CloudWatch and AWS Redshift.
- Works with various Amazon Web Services (AWS) and Microsoft Azure services to build, host and maintain data extraction, transformation and loading functions for the data warehouse.
- Experience analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Python.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Hands-on experience developing ETL jobs in the Hadoop ecosystem using Oozie and StreamSets.
- Good understanding of Azure big data technologies such as Azure Data Lake Analytics, Azure Data Lake Store, Azure Databricks and Azure Data Factory; created a POC for moving data from flat files and SQL Server using U-SQL jobs.
- Created ActiveBatch jobs to automate PySpark and SQL functions as daily runs.
- Proficient with tools such as Erwin (Data Modeler, Model Mart, Navigator), ER Studio, IBM Metadata Workbench, Oracle data profiling tools, Informatica, Oracle Forms, Reports, SQL*Plus, Toad and Crystal Reports.
- Utilized the Spark SQL API in PySpark to extract and load data and run SQL queries.
- Implemented CI/CD pipelines to automate build, test and deployment processes and accelerate delivery of data solutions.
- Wrote data transformations and data cleansing with Pig operations; good experience retrieving and processing data using Hive.
- Experience creating Spark Streaming jobs to process huge data sets in real time.
- Experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, generating data visualizations using R, SAS and Python, and creating dashboards with tools such as Tableau.
- Created Power BI reports using visualizations such as bar charts, clustered column charts, waterfall charts, gauges, pie charts and treemaps.
- Managed graph database systems and supported management of large text corpora.
- Experience installing tooling from the command line on Linux servers and, when needed, Windows servers.
- Expertise in relational database systems (RDBMS) such as MySQL, Oracle and MS SQL, and NoSQL database systems such as HBase, MongoDB and Cassandra.
- Experience with software development tools such as Jira, Git and SVN.
- Experience developing storytelling dashboards for data analytics, designing reports with visualization solutions in Tableau Desktop and publishing them to Tableau Server.
- Tested, cleaned and standardized data to meet business standards using Execute SQL Task, Conditional Split, Data Conversion and Derived Column in different environments.
- Flexible working across operating systems such as Unix/Linux (CentOS, Red Hat, Ubuntu) and Windows environments.
- Proficient with tools for developing data pipelines in a distributed computing environment (PySpark, Glue ETL).
- Developed Apache Spark jobs using Scala and Python for faster data processing, and used the Spark Core and Spark SQL libraries for querying.
- Proficient in Hive optimization techniques such as bucketing and partitioning.
- Experience importing and exporting data with Sqoop between relational database systems and HDFS.
- Engages with internal stakeholders to understand and probe business processes in order to develop hypotheses.
- Extensive experience using Maven as a build tool to produce deployable artifacts from source code.
- Collaborates with business users to gather requirements and resolve day-to-day data issues.
- Experience supporting and working with cross-functional teams in a dynamic environment.
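
A minimal sketch of the kind of AWS Glue PySpark ETL job referenced in this summary, assuming hypothetical Data Catalog database/table names, column mappings and S3 output path:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

# Hypothetical Glue job: read a Data Catalog table, rename/cast columns,
# and write Parquet output back to S3 for Redshift Spectrum / Athena.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Database, table, bucket and column names below are placeholders.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("order_ts", "string", "order_date", "timestamp"),
        ("amount", "double", "amount", "double"),
    ],
)

glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)

job.commit()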

TECHNICAL SKILLS

- Languages: SQL, PL/SQL, Python, Java, Scala, C, HTML, Unix, Linux
- ETL Tools: AWS Redshift, Matillion, Alteryx, Informatica PowerCenter, Ab Initio
- Big Data: HDFS, MapReduce, Spark, Airflow, YARN, NiFi, HBase, Hive, Pig, Flume, Sqoop, Kafka, Oozie, Hadoop, Zookeeper, Spark SQL
- RDBMS: Oracle 9i/10g/11g/12c, Teradata, MySQL, MS SQL
- NoSQL: MongoDB, HBase, Cassandra
- Cloud Platforms: Microsoft Azure, AWS (Amazon Web Services)
- Concepts and Methods: Business Intelligence, Data Warehousing, Data Modeling, Requirement Analysis
- Data Modeling Tools: ERwin, Power Designer, Embarcadero ER Studio, IBM Rational Software Architect, MS Visio, Star Schema Modeling, Snowflake Schema Modeling, Fact and Dimension Tables
- Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
- Other Tools: Azure Databricks, Azure Data Explorer, Azure HDInsight
- Operating Systems: UNIX, Windows, Linux

PROJECT EXPERIENCE

Client: USAA, San Antonio, TX                                                      Jun 2023 - Present
Role: Senior Big Data Engineer
Responsibilities:
- Synchronized unstructured and structured data using Pig and Hive according to business requirements.
- Used Pig Latin on the client-side cluster and HiveQL on the server-side cluster.
- Imported complete datasets from RDBMS into the HDFS cluster using Sqoop.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Used AWS services to manage applications in the cloud and to create or modify instances.
- Installed and configured the OpenShift platform for managing Docker containers and Kubernetes clusters.
- Created ETL jobs with PySpark to support the AWS ETL process.
- Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Created Hive base scripts for analyzing requirements and processing data, designing the cluster to handle large data volumes and cross-examine data loaded through Hive and MapReduce jobs.
- Designed AWS CloudFormation templates to create VPCs, subnets and NAT gateways to ensure successful deployment of web applications and database templates.
- Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Wrote MapReduce code in Python to eliminate certain security issues in the data.
- Created Hive tables as required, internal or external, defined with appropriate static or dynamic partitions and bucketing for efficiency.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Handled billions of log lines from several clients and analyzed them using big data technologies such as Hadoop (HDFS), Apache Kafka and Apache Storm.
- Created data pipelines for ingestion and aggregation, loading consumer response data from an AWS S3 bucket into Hive external tables in HDFS to feed Tableau dashboards.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Worked extensively with Spark and MLlib to develop a regression model for cancer data.
- Hands-on design and development of an application using Hive UDFs.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Pig; imported and exported data between MySQL, HDFS and NoSQL databases with Sqoop on a regular basis, and designed Pig scripts to process data in batches for trend analysis.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Used Kafka to load data onto the Hadoop file system and move the same data to the Cassandra NoSQL database.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
- Migrated existing MapReduce programs to Spark using Scala and Python.
- Implemented Spark SQL to connect to Hive, read the data and distribute processing for high scalability.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Consumed XML messages from Kafka and processed the XML files using Spark Streaming to capture UI updates.
- Wrote live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Developed simple to complex MapReduce streaming jobs in Python, implemented using Hive and Pig.
- Worked with architects, stakeholders and the business to design the information architecture of the Smart Data Platform for multistate deployment in a Kubernetes cluster.
- Handled file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Involved in designing and developing product feature enhancements.
- Involved in designing and developing enhancements of CSG using AWS APIs.
- Created and maintained various DevOps tools for the team, such as provisioning scripts, deployment tools, and development and staging environments on AWS, Rackspace and cloud platforms.
- Implemented Composite Server for data virtualization needs and created multiple views for restricted data access via a REST API.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.

Environment: HDFS, Hive, Scala, Sqoop, Spark, Tableau, YARN, Cloudera, SQL, Terraform, Splunk, RDBMS, Elasticsearch, Kerberos, Jira, Confluence, Shell/Perl scripting, Zookeeper, AWS (EC2, S3, EMR, Redshift, ECS, Glue, VPC, RDS, etc.), Ranger, Git, Kafka, OpenShift, CI/CD (Jenkins), Kubernetes
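
A minimal PySpark sketch of the Kafka-to-HDFS streaming ingestion described in this role, assuming the Spark Kafka integration package is available and using hypothetical broker, topic, schema and path names:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_date
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical stream: consumer-response events from Kafka, landed on HDFS as
# date-partitioned Parquet to back Hive external tables and Tableau dashboards.
spark = SparkSession.builder.appName("consumer-response-stream").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("member_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("channel", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
    .option("subscribe", "consumer-response")            # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", to_date(col("event_ts")))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/consumer_response/")       # placeholder HDFS path
    .option("checkpointLocation", "hdfs:///checkpoints/cr/")  # placeholder path
    .partitionBy("event_date")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()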

Client: Chewy, Dania Beach, FL                                                     Sep 2021 - May 2023
Role: Big Data Engineer
Responsibilities:
- Implemented Copy activities and custom Azure Data Factory pipeline activities.
- Primarily involved in data migration using SQL, Azure SQL, Azure Storage, Azure Data Factory, SSIS and PowerShell.
- Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores.
- Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling.
- Implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Implemented ad hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Analyzed the system for new enhancements and functionality, and performed impact analysis of the application when implementing ETL changes.
- Optimized the TensorFlow model for efficiency.
- Built performant, scalable ETL processes to load, cleanse and validate data.
- Implemented business use cases in Hadoop/Hive and visualized them in Tableau.
- Created data pipelines for business reports and processed streaming data using an on-premises Kafka cluster.
- Processed data from Kafka topics and displayed real-time streams in dashboards.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics).
- Extensively used Apache Kafka, Apache Spark, HDFS and Apache Impala to build near-real-time data pipelines that ingest, transform, store and analyze clickstream data to provide a better personalized user experience.
- Designed several DAGs (Directed Acyclic Graphs) to automate ETL pipelines.
- Migrated on-premises data (Oracle/SQL Server/DB2/MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF v1/v2).
- Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
- Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for development.
- Used PySpark for DataFrames, ETL, data mapping, transformation and loading in a complex, high-volume environment.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Built business applications and data marts for reporting; involved in development life cycle phases including analysis, design, coding, unit testing, integration testing, review and release per business requirements.
- Implemented Apache Airflow for authoring, scheduling and monitoring data pipelines.
- Created Spark code to process streaming data from the Kafka cluster and load it into a staging area for processing.
- Experienced in ETL concepts, building ETL solutions and data modeling.
- Architected the ETL transformation layers and wrote Spark jobs to perform the processing.
- Aggregated daily sales team updates to send reports to executives and to organize jobs running on Spark clusters.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Environment: Kafka, Impala, PySpark, Azure, HDInsight, Data Factory, Databricks, Data Lake, Apache Beam, Cloud Shell, Tableau, Cloud SQL, MySQL, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL, NoSQL, MongoDB, TensorFlow, Jira
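
A minimal sketch of an Airflow DAG of the kind used in this role to author and schedule ETL pipelines, assuming hypothetical task names and placeholder shell commands:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily pipeline: ingest a raw file drop, run a Spark transform,
# then refresh a reporting table; all script paths are placeholders.
default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="daily_sales_pipeline",         # placeholder DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 6 * * *",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_files",
        bash_command="python /opt/etl/ingest_raw.py",
    )
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/etl/transform_sales.py",
    )
    refresh = BashOperator(
        task_id="refresh_reporting_table",
        bash_command="python /opt/etl/refresh_reports.py",
    )

    # Linear dependency chain: ingest, then transform, then refresh.
    ingest >> transform >> refresh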

Client: PNC, Cleveland, OH                                                         May 2020 - Aug 2021
Role: Big Data Engineer
Responsibilities:
- Loaded data into Spark RDDs and performed advanced procedures such as text analytics and processing, using Spark's in-memory computation capabilities in Scala to generate the output response.
- Wrote scripts using Python (and Go) and worked with the following tools: AWS Lambda, AWS S3, AWS EC2, AWS Redshift and AWS Postgres.
- Developed AutoSys scripts to schedule the Kafka streaming and batch jobs.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Worked extensively with AWS components such as Airflow, Elastic MapReduce (EMR), Athena and Snowflake.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Used the DataStax Spark Cassandra Connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Developed Python code for tasks, dependencies, SLA watchers and time sensors for each job, for workflow management and automation using Airflow.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed MapReduce programs to parse raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Developed Hive scripts in HiveQL to denormalize and aggregate the data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Extracted data from Teradata into HDFS and dashboards using Spark Streaming.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Worked with the BI team to create various reports in Tableau based on the client's needs.
- Queried Parquet files by loading them into Spark DataFrames using Zeppelin notebooks.
- Wrote Sqoop scripts for importing and exporting data between RDBMS and HDFS.
- Ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra for data access and analysis.

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, AWS, Scala, Python, Kafka, Hive, Sqoop, Elasticsearch, Impala, Cassandra, Tableau, Talend, Cloudera, MySQL, Linux
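
A minimal PySpark sketch of the Kafka-to-Cassandra path described in this role, assuming the Kafka source and DataStax Spark Cassandra Connector packages are on the classpath and using hypothetical topic, host, keyspace and table names:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical learner-event stream persisted into Cassandra in micro-batches.
spark = (
    SparkSession.builder.appName("kafka-to-cassandra")
    .config("spark.cassandra.connection.host", "cassandra-host")  # placeholder host
    .getOrCreate()
)

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("score", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "learner-events")               # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("v"))
    .select("v.*")
)

def write_to_cassandra(batch_df, batch_id):
    # Append each micro-batch to a Cassandra table via the connector.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="learning", table="learner_events")  # placeholders
        .mode("append")
        .save())

query = (
    stream.writeStream.foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "hdfs:///checkpoints/learner/")  # placeholder path
    .start()
)
query.awaitTermination()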

Client: Leidos, Baltimore, MD                                                      Jan 2018 - Apr 2020
Role: Data Engineer
Responsibilities:
- Used Python programs to automate combining large SAS datasets and data files and converting them into Teradata tables for data analysis.
- Ran log aggregation, website activity tracking and commit logging for distributed systems using Apache Kafka.
- Developed interfaces in SQL for data calculations and data manipulations.
- Ingested data into the Indie data lake using an open-source Hadoop distribution, processing structured, semi-structured and unstructured datasets with open-source Apache tools such as Flume and Sqoop into the Hive environment.
- Developed Python programs to manipulate data read from Teradata data sources and convert it into CSV files.
- Worked on MicroStrategy report development and analysis, providing mentoring, guidance and troubleshooting to analysis team members in solving complex reporting and analytical problems.
- Extensively used filters, facts, consolidations, transformations and custom groups to generate reports for business analysis.
- Used MS Excel and Teradata for data pools and ad hoc reports for business analysis.
- Performed in-depth data analysis and prepared weekly, biweekly and monthly reports using SQL, MS Excel and UNIX.
- Experience in automation scripting using shell and Python.
- Assisted with the design and development of MicroStrategy dashboards and interactive documents using MicroStrategy Web and Mobile.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Used Kafka for building real-time data pipelines between clusters.

Environment: AWS, Kafka, Spark, Python, SQL, UNIX, MS Excel, Hive, Pig, Hadoop
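
A minimal Python sketch of the SAS-to-CSV consolidation step described in this role, assuming hypothetical file paths and a pandas-based workflow (the downstream Teradata load is out of scope here):

import glob
import pandas as pd

# Combine a set of SAS datasets into one frame and write a CSV extract that a
# downstream loader can push into Teradata; paths and columns are placeholders.
def combine_sas_to_csv(input_glob: str, output_csv: str) -> pd.DataFrame:
    frames = [
        pd.read_sas(path, format="sas7bdat")
        for path in sorted(glob.glob(input_glob))
    ]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_csv, index=False)
    return combined

if __name__ == "__main__":
    df = combine_sas_to_csv("/data/claims/*.sas7bdat", "/data/exports/claims_combined.csv")
    print(f"Wrote {len(df)} rows")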

Client: Infonaya Software, India                                                   Jan 2015 - Oct 2017
Role: Data Engineer
Responsibilities:
- Involved in reviewing functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
- Created consumption views on top of metrics to reduce the running time of complex queries.
- Using Nebula metadata, registered business and technical datasets for the corresponding SQL scripts.
- Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Installed and configured Pig and wrote Pig Latin scripts.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Wrote MapReduce jobs using Pig Latin; involved in ETL, data integration and migration.
- Imported and exported data between Oracle Database and HDFS using Sqoop.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet business requirements.
- Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Cloudera Hadoop distribution, Pig, HBase, Linux, XML, Tableau, Eclipse, Oracle 10g, Toad
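
A minimal PySpark sketch of the Spark SQL analysis over CSV files mentioned in this role, assuming a hypothetical HDFS path and column names:

from pyspark.sql import SparkSession

# Load a CSV extract, register it as a temp view, and run a Spark SQL aggregate
# of the same style used in Hive for reporting; names below are placeholders.
spark = SparkSession.builder.appName("csv-analysis").getOrCreate()

orders = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("hdfs:///data/exports/orders.csv")   # placeholder path
)
orders.createOrReplaceTempView("orders")

daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_totals.show(20, truncate=False)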
