Candidate Information
Title: Big Data Engineer
Target Location: US-VA-Herndon

SUMMARY
8+ years of IT experience, including 7 years as a Data Engineer, covering all phases of requirement analysis, design, development, and maintenance of Big Data applications
Designed and developed big data applications using Python, Spark, Java, Hadoop, and Scala components in batch and real time
Proficient in Spark applications (Spark Core, DataFrames, Spark SQL, Spark ML, Spark Streaming)
Expertise in real-time data collection with Kafka and Spark Streaming (see the streaming sketch after the Certifications section)
Managed Snowflake data warehouses and ETL pipelines with SnowSQL, Snowpipe, and AWS integration
Experienced in Azure services, including Azure Data Factory and Azure Databricks
Experience in analyzing, designing, and developing ETL strategies and processes, and writing ETL specifications
Excellent understanding of NoSQL databases such as HBase, DynamoDB, and MongoDB
Strong database schema modeling and extensive use of Spark DataFrames, Spark SQL, and Spark ML
Worked with various Hadoop services and Apache Sqoop for data integration
Experience mapping transformations such as joins, aggregations, and quality checks using the Talend design interface
Developed Kafka producers and consumers for streaming data
Developed and maintained complex data models to support business requirements and data analysis
Proficient in T-SQL query optimization, distributed systems, and CI/CD with Jenkins and version control
Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend
Skilled in Tableau, Power BI, Java-based Spring Boot REST applications, and SDLC methodologies
Familiar with the Oozie workflow engine, Airflow, SonarQube, Git, JIRA, and Jenkins

TECHNICAL EXPERTISE
Big Data / Hadoop: Spark, HBase, Kafka, Hive, HDFS, Impala, Sqoop, YARN, Cloudera, MapReduce
SQL Databases: SQL Server, Oracle SQL, MySQL, PL/SQL, Teradata
NoSQL Databases: HBase, Cassandra, DynamoDB, MongoDB
AWS Cloud: S3, EC2, EFS, VPC, Route 53, Redshift, EMR, Glue, Lambda, Athena, Step Functions, CloudWatch, SNS, SQS, Kinesis
Azure Cloud: Azure Data Factory, Azure Data Lake Storage, Azure Synapse, Azure Databricks, Azure Functions
Programming Languages: Python, Java, Scala
Build and SCM Tools: Docker, Jenkins, Git, SVN, React, Maven
SDLC Methodologies: Agile, Scrum

Certifications
AWS Certified Solutions Architect - Associate (SAA-C03), March 2022
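As an illustration of the Kafka and Spark Structured Streaming skills listed in the summary, a minimal PySpark sketch follows. The broker address, topic name, and event schema are placeholders rather than details from any project below, and the job assumes the spark-sql-kafka connector package is available on the cluster.

# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and
# print the parsed events. Broker, topic, and schema are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("kafka-streaming-sketch")
         .getOrCreate())

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                       # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("console")      # swap for a real sink (Delta Lake, HBase, ...)
         .outputMode("append")
         .start())
query.awaitTermination()

In practice the console sink would be replaced with a Delta Lake, HBase, or Snowflake target, as described in the roles below.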
PROFESSIONAL EXPERIENCE

CapitalOne, August 2021 to July 2024
Sr. Data Engineer
Built data pipelines to load S3 data into a Postgres DB, using AWS Step Functions to automate the state executions needed for serverless development (see the Lambda sketch after this section)
Migrated data from Amazon Redshift data warehouses to Snowflake
Involved in code migration of quality monitoring tools from AWS EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses
Created and deployed Lambda layers for Snowflake and extracted data from Snowflake
Wrote AWS Lambda functions in Python to perform various transformations and analytics on large datasets in EMR clusters
Created AWS CloudWatch alerts for instances and used them in Auto Scaling launch configurations
Created Athena tables and integrated them with AWS Glue, a fully managed ETL service that catalogs and categorizes the data
Worked in an AWS-hosted Databricks environment and used Spark Structured Streaming to consume data from Kafka topics and perform merge operations on Delta Lake tables
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing the data to uncover insights into customer usage patterns
Optimized real-time data ingestion pipelines by integrating Kafka with AWS Lambda and DynamoDB, allowing high-throughput, low-latency processing of streaming data
Architected and implemented serverless workflows using AWS Step Functions state machines to orchestrate complex ETL processes, including automated data ingestion from S3 into the Postgres DB
Utilized Apache Airflow to create scalable and robust workflows, automating data transformations and regularly scheduled ETL jobs, leading to increased team efficiency and data quality
Scheduled individual Lambdas to run daily by adding rules in AWS EventBridge
Worked with numerous REST APIs to extract data and ingest it from S3 into the Postgres DB
Developed complex stored procedures and views to generate drill-through, parameterized, and linked reports using SSRS
Ingested large volumes of user metadata from BI dashboards built within AWS QuickSight and ThoughtSpot
Created scalable REST APIs using AWS API Gateway and Lambda, enabling streamlined data access and management
Created SSIS packages to integrate data coming from text files and Excel files
Involved in developing the Talend process that copies data from S3 to Redshift
Designed and documented operational problems following standards and procedures using JIRA
Environment: AWS Cloud, Snowflake, S3, EMR, PL/SQL, Lambda, Redshift, Athena, DynamoDB, Hadoop, Talend, Spark, Scala, Python, Java, Hive, Kafka, Terraform, Databricks, Docker
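The S3-to-Postgres ingestion described in this role could look roughly like the hypothetical Lambda handler below. The bucket, target table, environment variables, and two-column CSV layout are illustrative assumptions, and psycopg2 is assumed to be packaged in a Lambda layer (similar to the Snowflake layers mentioned above).

# Hypothetical Lambda handler: read a CSV object from S3 and load it into Postgres.
import csv
import io
import os

import boto3
import psycopg2   # assumed to be provided via a Lambda layer


def handler(event, context):
    # Triggered by an S3 event notification (or a Step Functions task
    # passing an equivalent bucket/key payload).
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    text = obj["Body"].read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))   # assumes a two-column CSV

    conn = psycopg2.connect(
        host=os.environ["PG_HOST"],              # placeholder env vars
        dbname=os.environ["PG_DATABASE"],
        user=os.environ["PG_USER"],
        password=os.environ["PG_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO staging.events (event_id, payload) VALUES (%s, %s)",
            rows,
        )
    conn.close()
    return {"loaded_rows": len(rows), "source": f"s3://{bucket}/{key}"}

A Step Functions state machine or an S3 event notification would supply the bucket/key pair that the handler reads.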
Cloudflare, Inc., Austin, TX, February 2020 to July 2021
ETL Data Engineer
Responsible for creating a data lake on the Azure cloud platform to improve business teams' use of Azure Synapse SQL for data analysis
Utilized Azure SQL as an external Hive metastore for Databricks clusters so that metadata is persisted across multiple clusters
Employed Azure Data Lake Storage as the data lake and ensured that Spark and Hive tasks immediately wrote all processed data to ADLS
Responsible for designing fact and dimension tables in a snowflake schema to store historical data and querying them using T-SQL
Strong experience working with Azure Databricks runtimes and using the Databricks API to automate launching and terminating runtimes
Experience integrating Snowflake data with Azure Blob Storage and SQL Data Warehouse using Snowpipe
Employed SQL Server Integration Services, Azure Data Factory, and other ETL tools to identify the route for transferring data from SAS reports to Azure Data Factory
Transferred data to Excel and Power BI for analysis and visualization after it was moved through Azure Data Factory and managed in Azure Databricks
Environment: Azure Cloud, Azure Data Lake Storage, Databricks, Synapse SQL Pools, Scala, Excel, Power BI, Apache Spark, Azure SQL, Hive, HDFS, Azure Data Factory, Pig, Apache Kafka

COX Communications, December 2018 to January 2020
Data Engineer
Engaged in the development of Spark applications to execute diverse data cleansing, validation, transformation, and summarization tasks as required, establishing a centralized data lake on the AWS cloud platform
Accountable for ingesting large amounts of individual consumer and advisory data into the analytics data store
Extracted, transformed, and loaded data sources to generate CSV data files using Python and SQL queries
Developed customized S3 event alerts to trigger AWS Lambda actions on events such as object creation, deletion, or restoration
Built a Kafka producer using the Kafka Java producer API to connect to an external REST live-stream application and produce messages to a Kafka topic
Wrote a Spark Streaming application to consume data from Kafka topics and write the processed stream to HBase
Developed, implemented, and optimized high-performance ETL pipelines on AWS EMR using Apache Spark's Python API (PySpark)
Utilized the Glue metastore as a common metastore between EMR clusters and the Athena query engine, with S3 as the storage layer for both
Designed SOAP-based web services with XML, XSD, and WSDL, enabling seamless data exchange and integration across platforms
Built effective data models and schemas in Snowflake to fulfill data analytics and reporting requirements
Exhibited knowledge of data visualization standards and stayed current with Tableau and Power BI changes and industry trends
Developed new techniques for orchestrating Airflow-built pipelines and used Airflow variables for defining project-level settings and encrypting passwords (see the DAG sketch after this section)
Contributed to the setup of continuous integration and continuous deployment (CI/CD) pipelines, facilitating the integration of infrastructure changes and expediting time to production
Designed and documented operational issues according to standards and procedures using JIRA
Environment: Spark, Hive, AWS S3, Tableau, Power BI, Sqoop, Snowflake, Kafka, Talend, HBase, Scala, Python, PySpark, Linux, Jira, Jenkins, Unix
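As a rough sketch of the Airflow orchestration described in this role, the following hypothetical Airflow 2.x DAG schedules a daily Python task and reads a project-level setting from an Airflow Variable. The DAG id, task name, Variable key, and default path are invented for illustration.

# Hypothetical Airflow 2.x DAG: one daily task that reads a project-level Variable.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator


def extract_and_transform(**context):
    # Project-level setting pulled from an Airflow Variable (illustrative key).
    source_path = Variable.get("project_source_path", default_var="s3://bucket/raw/")
    print(f"processing partition {context['ds']} from {source_path}")


with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    PythonOperator(
        task_id="extract_and_transform",
        python_callable=extract_and_transform,
    )

Secrets and connection details would normally live in Airflow's Connections and Variables stores rather than in the DAG file itself.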
NetxCell Limited, India, January 2017 to November 2018
Hadoop Developer
Engaged in the development of Spark applications using Scala to execute diverse data cleansing, validation, transformation, and summarization tasks based on specific requirements
Loaded data into Spark RDDs and performed in-memory computations to generate outputs tailored to specific requirements
Created data pipelines utilizing Spark, Hive, and Sqoop to ingest, transform, and analyze operational data effectively
Implemented real-time streaming of data using Spark in conjunction with Kafka, with a particular focus on managing streaming data from web server console logs
Worked with various file formats such as text, SequenceFile, Avro, Parquet, JSON, XML, and flat files by leveraging MapReduce programs
Designed, modified, and maintained database tables, indexes, views, and stored procedures using T-SQL to ensure data consistency, accuracy, and security
Oversaw Hadoop cluster operations using Cloudera Manager, interfaced with Cloudera support, logged issues in the Cloudera portal, and addressed them based on recommendations
Developed a daily process for incremental data imports from DB2 and Teradata into Hive tables using Sqoop
Crafted Pig scripts to generate MapReduce jobs and executed ETL procedures on data stored in HDFS
Utilized Tableau for dashboard creation and daily report generation
Collaborated with cross-functional consulting teams within the data science and analytics group to design, develop, and implement solutions aimed at deriving business insights and addressing client operational and strategic challenges
Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team
Managed Cloudera Hadoop upgrades, patches, and the installation of ecosystem products through Cloudera Manager, including Cloudera Manager upgrades
Worked extensively with Hive, including partitions, dynamic partitioning, and bucketed tables; designed both managed and external tables and optimized Hive queries (see the sketch at the end of this resume)
Environment: Java, Scala, Apache Spark, MySQL, CDH, IntelliJ IDEA, Hadoop, Hive, HDFS, YARN, MapReduce, Sqoop, Flume, UNIX Shell Scripting, Python, Apache Kafka

ILenSys Technologies, India, August 2015 to December 2016
Java Developer
Took part in design and implementation across all SDLC phases, encompassing development, testing, implementation, and ongoing maintenance support
Played a role in analyzing requirements and designing, developing, and testing risk workflow systems
Crafted user interfaces using JSP, HTML, CSS, and JavaScript to enhance application simplicity
Employed Java and MySQL on a daily basis to diagnose and resolve client process-related problems
Integrated stored procedures and functions into SQL statements using Java
Conducted backend testing on the database by executing SQL queries
Utilized core Java extensively, including multithreading, exception handling, and collections
Developed the database access layer using JDBC and SQL stored procedures
Managed version control of the source code using Git
Employed Apache Tomcat as the application server for application development
Utilized the JIRA tracking tool to oversee and address issues reported by the QA team, prioritizing actions based on severity
Environment: Java, JDBC, Eclipse IDE, JSP, HTML, CSS, JavaScript, Spring Boot, JIRA, Git, Apache Tomcat
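Relating to the Hive partitioning work in the Hadoop Developer role above, here is a minimal PySpark sketch of a dynamic-partition load into a partitioned Hive table. The database, table, input path, and partition column are placeholders, not details from that project.

# Minimal sketch: write a DataFrame to a partitioned Hive table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Relevant when appending into pre-existing Hive tables via insertInto;
# harmless for the saveAsTable call below.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

events = spark.read.parquet("/data/ops/events")      # placeholder HDFS path

(events
 .write
 .mode("append")
 .partitionBy("event_date")                          # placeholder partition column
 .saveAsTable("analytics.events_partitioned"))       # placeholder Hive table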
