Candidate Information
Title: Senior Big Data Engineer
Target Location: Herndon, VA (US)

Name: Poojitha
Senior Big Data Engineer

PROFESSIONAL SUMMARY:
Big Data Engineer/Data Engineer with 8 years of experience designing, developing, and implementing data models for enterprise-level applications and systems.
Hands-on experience with Spark Core, Spark SQL, and Spark Streaming, and with creating DataFrames in Spark with Scala (a PySpark sketch of this pattern follows the technical skills section below).
Expertise in Hadoop and Databricks platform components: HDFS, YARN, NameNode, DataNode, and Apache Spark.
Proficient in data analysis, cleansing, transformation, data migration, data integration, data import, and data export using ETL tools such as AWS Glue ETL.
Expertise includes designing and developing data pipelines, integrating Talend with AWS services such as S3, Redshift, and RDS, and optimizing ETL jobs for performance and scalability.
Implemented large-scale technical solutions using object-oriented design and programming concepts in Python and Scala.
Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and experienced in maintaining Hadoop clusters on AWS EMR.
Experience pushing data from a raw bucket to a staged Snowflake bucket using S3 event notifications, AWS SNS, and Lambda.
Experience building Snowpipe integrations with Auto-Ingest enabled.
Hands-on data transformation and loading within Snowflake using SnowSQL and Snowflake stored procedures.
Experience with NoSQL databases; worked on table row key design and on loading and retrieving data for real-time processing, with performance improvements based on data access patterns.
Hands-on with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
Hands-on experience with Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, and Hadoop MapReduce programming.
Experienced in using Python libraries such as NumPy, SciPy, Python-Twitter, and Pandas.
Cloudera Certified Developer for Apache Hadoop.
Good knowledge of Cassandra, Hive, Pig, HDFS, Sqoop, and MapReduce.
Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
Developed applications using Spark and Scala for data processing.
Experience with version control and CI/CD pipelines, ensuring consistent and seamless deployment of Talend ETL processes.
Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
Experience developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
Experience with AWS cloud services such as S3, EC2, and EMR, and with Microsoft Azure.
Expertise in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, Azure AD licenses, Office 365).
Implemented large Lambda architectures using Azure Data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers.
Excellent understanding and knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
Proficient in designing and implementing RESTful APIs for data access and integration.
Used GitHub for version control to push and pull changes and get updated code from the repository.
Worked across various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools such as PuTTY and Git.

EDUCATION:
Bachelors in Computer Science, Mohan Babu University, India

CERTIFICATIONS:
AWS Certified Data Engineer - Associate

TECHNICAL SKILLS:
Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Spark Streaming, Kafka, Hive, HBase, YARN, Oozie
Databases: Microsoft SQL Server, MySQL, Azure SQL, Oracle, DB2, Teradata
NoSQL Databases: Cassandra, HBase, MongoDB
Languages: Python, Java, and Scala
Cloud Technologies: AWS (EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Athena, Redshift, DynamoDB, Kinesis), Azure (Data Factory, Databricks, Data Lake, Blob Storage, Synapse Analytics)
Development Methodologies: Agile, SAFe Agile, Waterfall
Build Tools: Jenkins, pgAdmin, Talend, Maven, Apache Ant, Control-M
Reporting Tools: Tableau, Power BI
Version Control Tools: Git, GitHub, GitLab, Bitbucket
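
A minimal PySpark sketch of the DataFrame / Spark SQL pattern referenced above. The input path, field names, and output location are hypothetical placeholders, not taken from the resume.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Build (or reuse) a Spark session.
    spark = SparkSession.builder.appName("events-aggregation-sketch").getOrCreate()

    # Read raw JSON events from a (hypothetical) S3 landing path into a DataFrame.
    events = spark.read.json("s3://raw-bucket/events/2024/01/")

    # Register a temporary view so the same data can be queried with Spark SQL.
    events.createOrReplaceTempView("events")

    # Aggregate with Spark SQL ...
    daily_counts = spark.sql("""
        SELECT event_type, to_date(event_ts) AS event_date, COUNT(*) AS cnt
        FROM events
        GROUP BY event_type, to_date(event_ts)
    """)

    # ... or equivalently with the DataFrame API.
    daily_counts_df = (
        events.groupBy("event_type", F.to_date("event_ts").alias("event_date"))
              .agg(F.count("*").alias("cnt"))
    )

    # Write the result back out as partitioned Parquet for downstream consumers.
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://staged-bucket/daily_counts/"
    )

The same aggregation is shown twice (SQL view and DataFrame API) because the resume emphasizes working with both interfaces interchangeably.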

WORK EXPERIENCE:

Client: BCBS, Detroit, MI Oct 2022 - Present
Role: Senior Big Data Engineer
Responsibilities:
Evaluated client needs and translated business requirements into functional specifications, onboarding them onto the Hadoop ecosystem.
Extracted and updated data into HDFS using Sqoop import and export.
Developed Hive UDFs to incorporate external business logic into Hive scripts and developed join data set scripts using Hive join operations.
Deployed, configured, and maintained Apache Druid clusters on AWS EC2 instances to ensure high availability, scalability, and fault tolerance.
Designed and implemented data warehousing solutions using Snowflake to handle large-scale data storage and processing needs.
Developed ETL processes using AWS Glue and Talend to extract, transform, and load data from various sources into Snowflake.
Developed and maintained robust data models within Tableau, ensuring they are optimized for performance and scalability.
Developed SQL Server Integration Services (SSIS) packages to facilitate data migration projects, moving data from legacy systems to AWS-based data stores like Redshift and RDS.
Worked with various HDFS file formats like Parquet and JSON for serializing and deserializing.
Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Impala, Tealeaf, Pair RDDs, NiFi, and Spark on YARN.
Good experience using relational databases Oracle, SQL, and PostgreSQL.
Developed and managed data ingestion pipelines to feed data into Apache Druid and Elasticsearch from sources such as S3 and Kafka.
Invoked an AWS Lambda function from PL/SQL to process data asynchronously.
Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
Designed and developed ETL processes using AWS Glue to extract, transform, and load data from various sources into data lakes and data warehouses.
Integrated Airflow with big data tools such as Spark and Hadoop, and with data warehouses such as Snowflake, Redshift, and BigQuery.
Integrated Snowflake with AWS services such as S3, Redshift, RDS, and DynamoDB to facilitate seamless data migration and storage.
Designed SQL Server Integration Services (SSIS) workflows to transform data extracted from AWS S3, ensuring it meets the required format for downstream processing.
Strong knowledge of the architecture and components of Tealeaf; efficient in working with Spark Core and Spark SQL. Designed and developed RDD seeds using Scala and Cascading, and streamed data to Spark Streaming using Kafka.
Exposure to Spark, Spark Streaming, Snowflake, and Scala, creating the DataFrames handled in Spark with Scala.
Optimized Snowflake queries and storage using techniques such as clustering keys and partitioning to improve performance and reduce costs.
Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
Good understanding of NoSQL databases, with hands-on experience writing applications against HBase, Cassandra, and MongoDB.
Developed automation scripts using Python, SQL, and Django to manage Redshift operations and maintenance tasks.
Integrated Kubernetes with other AWS services such as Glue, EMR, and S3 for comprehensive data processing workflows.
Integrated third-party data sources into Google BigQuery using connectors and APIs, with similar experience translating to AWS Glue for data integration and cataloging.
Integrated Kinesis with other AWS services such as Lambda, S3, Redshift, and Elasticsearch to build end-to-end data processing pipelines.
Experienced with Scala and Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Pair RDDs, and Spark on YARN.
Worked on designing MapReduce and YARN flows, writing MapReduce scripts, performance tuning, and debugging.
Used Ansible to orchestrate data processing workflows across AWS Glue, EMR, and AWS Data Pipeline.
Developed Oozie workflows to run multiple Hive, Pig, Tealeaf, MongoDB, Git, Sqoop, and Spark jobs.
Created and maintained Tableau Data Extracts (TDE) and live connections to data sources such as SQL, Snowflake, and AWS Redshift.
Installed applications on AWS EC2 instances and configured storage on S3 buckets.
Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
Created Tableau dashboards to monitor data pipelines and ETL processes, providing insights into data quality, performance, and usage metrics.
Implemented and managed infrastructure using Infrastructure as Code (IaC) tools such as AWS CloudFormation and Terraform, ensuring reproducibility and scalability of data infrastructure.
Developed Terraform modules to encapsulate data processing clusters, database instances, networking setup, and IAM roles, promoting code reuse and maintainability.
Integrated Airflow workflows with CI/CD pipelines to automate testing, deployment, and updates of data pipelines.
Implemented infrastructure as code, including version control, automated testing, and CI/CD pipelines, to ensure reliability and scalability of infrastructure changes using Terraform.
Stored data in AWS S3 (similar to HDFS) and ran EMR Spark jobs on the stored data.
Worked on AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters (sketched below).
Environment: Hadoop (HDFS, MapReduce), Scala, Databricks, YARN, IAM, PostgreSQL, Spark, Impala, Hive, MongoDB, Pig, HBase, Oozie, Hue, Sqoop, Flume, Oracle, NiFi, Git, Terraform, AWS Services (Lambda, EMR, Auto Scaling).
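
A minimal boto3 sketch of the Lambda-to-EMR pattern noted above: a Lambda handler that submits a PySpark step to a running EMR cluster. The cluster ID, script location, and event wiring are hypothetical placeholders.

    import boto3

    emr = boto3.client("emr")

    def handler(event, context):
        """Submit a PySpark transformation script as a step on a running EMR cluster.

        The JobFlowId and the S3 script path below are placeholders for illustration.
        """
        response = emr.add_job_flow_steps(
            JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical EMR cluster ID
            Steps=[
                {
                    "Name": "nightly-transform",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        # command-runner.jar lets EMR run spark-submit as a step.
                        "Jar": "command-runner.jar",
                        "Args": [
                            "spark-submit",
                            "--deploy-mode", "cluster",
                            "s3://scripts-bucket/transform.py",  # hypothetical script
                        ],
                    },
                }
            ],
        )
        return {"step_ids": response["StepIds"]}

In practice such a handler would be wired to an S3 event or a schedule; the trigger configuration is omitted here.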

Client: Morgan Stanley, NY Jan 2021 - Sep 2022
Role: Sr. Data Engineer
Responsibilities:
As a Big Data Developer, worked on a Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production and up to 24 nodes in production.
Responsible for data extraction and ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
Built pipelines to move hashed and un-hashed data from XML files to the data lake.
Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance over MR jobs.
Developed and deployed data processing workflows using tools such as Azure Databricks, HDInsight, and Azure Data Factory to transform data stored in ADLS Gen2 into actionable insights.
Designed and implemented data integration pipelines using Azure Synapse Pipelines to ingest data from various sources such as databases, data lakes, streaming platforms, and external APIs.
Designed security features such as Azure Active Directory integration, Role-Based Access Control (RBAC), and data encryption to protect sensitive data stored in ADLS Gen2.
Enabled data analysts and data scientists to explore and analyze data stored in ADLS Gen2 using tools such as Azure Synapse Analytics, Azure Data Explorer, and Power BI.
Integrated Power BI with Azure Data Factory, Azure Data Lake Storage Gen2, and Azure Functions to create operational analytics dashboards for monitoring and managing data pipelines and infrastructure.
Developed SQL Server Integration Services (SSIS) packages to extract data from Azure Blob Storage, handling file formats such as CSV, JSON, and Parquet.
Implemented CI/CD pipelines in SAFe Agile to automate the deployment of artifacts such as data pipelines, models, and analytics solutions.
Utilized SAFe Agile practices to implement data pipelines, transformations, integrations, and workflows.
Extensively worked with the Spark SQL context to create DataFrames and Datasets to preprocess model data.
Data analysis: expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
Designed SQL Server Integration Services (SSIS) workflows to transform data stored in Azure Data Lake, applying necessary cleansing, filtering, and aggregation operations.
Architected and implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
Involved in designing row keys in HBase to store text and JSON as key values in HBase tables, designing the row key so data can be retrieved and scanned in sorted order.
Wrote JUnit tests and integration test cases for those microservices.
Worked heavily with Python, Spark, SQL, and Airflow.
Scripting: expertise in Hive, Pig, Impala, shell scripting, Perl scripting, and Python.
Worked with developer teams on NiFi workflows to pick up data from the REST API server, the data lake, and an SFTP server and send it to Kafka.
Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (a streaming sketch follows this section).
Created Hive schemas using performance techniques such as partitioning and bucketing.
Used Hadoop YARN to perform analytics on data in Hive.
Developed and maintained batch data flows using HiveQL and Unix scripting.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
Specified cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON file format.
Integrated Azure Synapse Analytics with Azure Data Lake Storage (ADLS) Gen2 to utilize the scalability and flexibility of Data Lake Storage for storing and processing large volumes of semi-structured and unstructured data.
Primarily involved in the data migration process on Azure by integrating with a GitHub repository and Jenkins.
Integrated Log Analytics with DevOps pipelines and Continuous Integration/Continuous Deployment (CI/CD) processes to monitor application performance and deployment activities.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
Configured Hadoop tools such as Hive, Pig, Zookeeper, Flume, Impala, and Sqoop.
Wrote Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
Queried both managed and external tables created by Hive using Impala.
Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
Developed customized Hive UDFs and UDAFs in Java, JDBC connectivity with Hive, and development and execution of Pig scripts and Pig UDFs.
Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
Environment: Hadoop, Azure, Microservices, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC, Hive, Pig, Oozie, Sqoop, Zookeeper, Flume, Impala, SQL, Scala, Python, Unix, GitHub.
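
The Kafka-to-Spark streaming work above used the Kafka Direct Stream (DStream) API; the sketch below uses PySpark Structured Streaming instead, as a rough approximation of the same pattern. The broker address, topic name, and output/checkpoint paths are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Requires the spark-sql-kafka connector package on the Spark classpath.
    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Subscribe to a (hypothetical) Kafka topic of raw events.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "trade-events")
        .load()
    )

    # Kafka delivers key/value as binary; cast the value to string and apply a
    # simple business transformation (filter + projection) before writing out.
    transformed = (
        raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
        .filter(F.col("payload").isNotNull())
    )

    # Stream the transformed records to Parquet with checkpointing for fault tolerance.
    query = (
        transformed.writeStream.format("parquet")
        .option("path", "/data/curated/trade_events")
        .option("checkpointLocation", "/checkpoints/trade_events")
        .start()
    )

    query.awaitTermination()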

Client: Unite US, New York Feb 2019 - Nov 2020
Role: Big Data Engineer
Responsibilities:
Responsibilities included gathering business requirements, developing a strategy for data cleansing and data migration, writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica.
Built APIs that allow customer service representatives to access the data and answer queries.
Designed changes to transform current Hadoop jobs to HBase.
Handled fixing of defects efficiently and worked with the QA and BA teams for clarifications.
The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports, and established a self-service reporting model in Cognos for business users.
Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
Developed story-telling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information.
Implemented partitioning, dynamic partitions, and buckets in Hive.
Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
Implemented AJAX, JSON, and JavaScript to create interactive web screens.
Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB (see the sketch after this section).
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
Processed image data through the Hadoop distributed system using Map and Reduce, then stored it in HDFS.
Created session beans and controller servlets for handling HTTP requests from Talend.
Performed data visualization and designed dashboards with Tableau, generating complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
Used Git for version control with the Data Engineer team and Data Scientist colleagues.
Involved in creating Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc. with the Show Me functionality, and created dashboards and stories as needed using Tableau Desktop and Tableau Server.
Performed statistical analysis using SQL, Python, R programming, and Excel.
Worked extensively with Excel VBA macros and Microsoft Access forms.
Imported, cleaned, filtered, and analyzed data using tools such as SQL, Hive, and Pig.
Used Python and SAS to extract, transform, and load source data from transaction systems, and generated reports, insights, and key conclusions.
Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, HDFS, Sqoop, Impala, SQL, Tableau, Python, SAS, Flume, Oozie, Linux.
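
A minimal sketch of the RDBMS-to-MongoDB ingestion described above, using cx_Oracle and pymongo; the connection strings, table, and collection names are hypothetical and for illustration only.

    import cx_Oracle
    from pymongo import MongoClient

    # Source: a (hypothetical) Oracle table of referrals.
    oracle_conn = cx_Oracle.connect("etl_user/secret@oracle-host:1521/ORCLPDB1")
    cursor = oracle_conn.cursor()
    cursor.execute("SELECT referral_id, client_id, status, updated_at FROM referrals")
    columns = [col[0].lower() for col in cursor.description]

    # Target: a (hypothetical) MongoDB collection.
    mongo = MongoClient("mongodb://mongo-host:27017")
    collection = mongo["bdw"]["referrals"]

    # Stream rows in batches and bulk-insert them as documents.
    batch = []
    for row in cursor:
        batch.append(dict(zip(columns, row)))
        if len(batch) >= 1000:
            collection.insert_many(batch)
            batch = []
    if batch:
        collection.insert_many(batch)

    cursor.close()
    oracle_conn.close()
    mongo.close()

Batching the inserts keeps memory bounded on large tables; a production version would also handle retries and incremental watermarks, which are omitted here.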

Client: UXLI, India Aug 2017 - Dec 2018
Role: Data Engineer / Hadoop Developer
Responsibilities:
Experience in big data analytics and design in the Hadoop ecosystem using MapReduce programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, and Kafka.
Built an Oozie pipeline that performs several actions, such as file move processes, Sqooping data from the source Teradata or SQL systems, exporting it into Hive staging tables, performing aggregations as per business requirements, and loading into the main tables.
Ran Apache Hadoop, CDH, and MapR distributions via Elastic MapReduce (EMR) on EC2.
Worked on different data formats such as JSON and XML and applied machine learning algorithms.
Developed a Pig script that picks data from one HDFS path, performs aggregation, and loads it into another path, which later populates another domain table; converted this script into a jar and passed it as a parameter in the Oozie script.
Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL Activity.
Built an ETL that utilizes a Spark jar which executes the business analytical model.
Hands-on experience with Git bash commands such as git pull to pull the code from source and develop against requirements, git add to stage files, git commit after the code build, and git push to the pre-production environment for code review; later used Screwdriver (screwdriver.yaml), which builds the code and generates artifacts that are released into production.
Created a logical data model from the conceptual model and converted it into a physical database design using Erwin; involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
Connected to AWS Redshift through Tableau to extract live data for real-time analysis.
Developed data mapping, transformation, and cleansing rules for data management involving OLTP and OLAP.
Involved in UNIX shell scripting, defragmentation of tables, partitioning, compressing, and indexing for improved performance and efficiency.
Developed reusable objects such as PL/SQL program units and libraries, database procedures and functions, and database triggers to be used by the team while satisfying the business rules.
Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.
Implemented big data analytics and advanced data science techniques to identify trends, patterns, and discrepancies in petabytes of data using Azure, Snowflake, Hive, Hadoop, Python, PySpark, Spark SQL, MapReduce, and Azure Machine Learning.
Rapid model creation in Python using pandas, NumPy, scikit-learn, and Plotly for data visualization (a small example follows this section); these models were then implemented in SAS, where they interfaced with MSSQL databases and were scheduled to update on a timely basis.
Environment: MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, Kafka, JSON, XML, PL/SQL, SQL, HDFS, Unix, PySpark, Azure.
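
A small, hedged example of the pandas/scikit-learn rapid-modeling workflow mentioned above; the data file, feature names, and target column are hypothetical placeholders.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # Load a (hypothetical) extract produced by the upstream ETL.
    df = pd.read_csv("customer_features.csv")
    X = df[["tenure_months", "monthly_spend", "support_tickets"]]
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # A simple scale-then-classify pipeline; other estimators can be swapped in.
    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)

    # Evaluate with AUC on the held-out split.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"holdout AUC: {auc:.3f}")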

Client: UNISON Consulting Pte. Ltd., India Jun 2016 - Jul 2017
Role: Java/Hadoop Developer
Responsibilities:
Involved in review of functional and non-functional requirements.
Installed and configured Pig and wrote Pig Latin scripts.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote MapReduce jobs using Pig Latin (a minimal Python streaming analogue is sketched at the end of this section); involved in ETL, data integration, and migration.
Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs.
Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
Utilized utilities such as Struts tag libraries, JSP, JavaScript, HTML, and CSS.
Built and deployed WAR files on a WebSphere application server.
Wrote Hive queries for data analysis to meet the business requirements.
Created Hive tables and worked on them using HiveQL; experienced in defining job flows.
Involved in frequent meetings with clients to gather business requirements and convert them into technical specifications for the development team.
Adopted Agile methodology with pair programming and addressed issues during system testing.
Involved in bug fixing and enhancement phases; used the FindBugs tool.
Imported and exported data between HDFS and the Oracle database using Sqoop.
Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
Developed a custom FileSystem plugin for Hadoop so it can access files on the data platform.
Developed applications in the Eclipse IDE.
Experience in developing Spring Boot applications for transformations.
Primarily involved in front-end UI using HTML5, CSS3, JavaScript, jQuery, and AJAX.
The custom FileSystem plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
Set up and benchmarked Hadoop/HBase clusters for internal use.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distribution of Cloudera, Pig, HBase, Linux, XML, Eclipse, Oracle, PL/SQL, MongoDB, Toad.
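
The MapReduce work in this role was written in Java and Pig; as a language-neutral illustration of the same map/reduce pattern, here is a minimal Hadoop Streaming word-count sketch in Python. The script name and mode flag are arbitrary choices for this sketch.

    # wordcount.py - a Hadoop Streaming-style mapper and reducer in one file.
    import sys

    def mapper():
        # Emit (word, 1) pairs, one per line, tab-separated.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Hadoop Streaming sorts mapper output by key before the reduce phase,
        # so equal words arrive on consecutive lines and can be summed in one pass.
        current_word, current_count = None, 0
        for line in sys.stdin:
            word, count = line.rstrip("\n").split("\t")
            if word == current_word:
                current_count += int(count)
            else:
                if current_word is not None:
                    print(f"{current_word}\t{current_count}")
                current_word, current_count = word, int(count)
        if current_word is not None:
            print(f"{current_word}\t{current_count}")

    if __name__ == "__main__":
        # Select the phase via a command-line flag, e.g. `python wordcount.py map`.
        mapper() if sys.argv[1] == "map" else reducer()

Such a script would typically be launched through the Hadoop Streaming jar, passing it as the -mapper and -reducer commands with the appropriate mode flag, alongside the HDFS input and output paths.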
