Candidate Information
Title: Data Engineer
Target Location: US-PA-Philadelphia
OBJECTIVE
Accomplished data engineer with over six years of hands-on experience specializing in data architecture, ETL processes, and analytics. Seeking to apply my expertise to enhance data-driven initiatives through innovative and efficient solutions. Committed to optimizing data pipelines and fostering cross-functional collaboration to deliver scalable, impactful outcomes.

PROFILE SUMMARY
Over six years of experience in data engineering encompassing requirements analysis, design specification, and testing in both Waterfall and Agile methodologies.
Proficient with Scala, Apache HBase, Hive, Pig, Sqoop, Zookeeper, Spark, Spark SQL, Spark Streaming, Kinesis, Airflow, Yarn, and Hadoop (HDFS, MapReduce).
Expert in utilizing Spark SQL and Spark Streaming and in developing Spark DataFrames with Snowflake.
Expertise in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, Azure AD licenses, Office 365).
Experience with Cisco CloudCenter to more securely deploy and manage applications across multiple data center, private cloud, and public cloud environments.
Experience automating day-to-day activities using Windows PowerShell.
Deployed Docker Engines on virtualized platforms to containerize multiple applications.
Experienced with story grooming, sprint planning, daily stand-ups, and methodologies such as Agile and SAFe.
Developed DDL and DML statements for data modeling and data storage and performed performance fine-tuning.
Planned and implemented disaster recovery solutions, capacity planning, data archiving, backup/recovery strategies, and performance analysis and optimization.
Used Spark Streaming, HBase, and Kafka for real-time data integration (see the sketch after this summary).
Experience working with front-end technologies such as HTML, CSS, JavaScript, and ReactJS.
Practical knowledge in setting up and designing large-scale data lakes, pipelines, and effective ETL (extract/transform/load) procedures to collect, organize, and standardize data.
Converted an existing on-premises application to use Azure cloud databases and storage.
Experienced in building Snowpipe pipelines and migrating Teradata objects into the Snowflake environment.
Experience with Azure Marketplace to search for, deploy, and purchase a wide range of applications and services.
Worked with Matillion to leverage Snowflake's separate compute and storage resources for rapid transformation and to get the most from Snowflake-specific features such as Alter Warehouse and Flatten Variant, Object, and Array.
Proficient in building CI/CD pipelines in Jenkins using pipeline syntax and Groovy libraries.
Extensive experience in relational data modeling, dimensional data modeling, logical/physical design, ER diagrams, and OLTP and OLAP system study and analysis.
Hands-on experience with Spark, Databricks, and Delta Lake.
Highly skilled in using visualization tools such as Tableau, matplotlib, and ggplot2 to create dashboards.
Experience working with NoSQL databases such as MongoDB and AWS DynamoDB to store and retrieve JSON documents.
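To illustrate the real-time integration pattern referenced above, the following is a minimal PySpark Structured Streaming sketch; the broker address, topic name (clickstream), schema, and output paths are hypothetical placeholders, not details from any specific project, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Minimal sketch: consume JSON events from Kafka with Spark Structured Streaming
# and persist them as Parquet. Broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("realtime-integration-sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "clickstream")                  # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/clickstream")            # placeholder sink path
         .option("checkpointLocation", "/data/checkpoints/clickstream")
         .outputMode("append")
         .start())

query.awaitTermination()
```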
EDUCATION
Master's in Computer Science, Grand Valley State University, USA

TECHNICAL SKILLS
Big Data Ecosystem: HDFS, Yarn, MapReduce, Spark, Kafka, Kafka Connect, Hive, Airflow, StreamSets, Sqoop, HBase, Flume, Pig, Ambari, Oozie, Zookeeper, NiFi, Sentry
Hadoop Distributions: Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP
Cloud Environment: Amazon Web Services (AWS), Microsoft Azure
Databases: MySQL, Oracle, Teradata, MS SQL Server, PostgreSQL, DB2, MongoDB
NoSQL Databases: DynamoDB, HBase
AWS: EC2, EMR, S3, Redshift, Lambda, Kinesis, Glue, Data Pipeline
Microsoft Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Azure Active Directory
Operating Systems: Linux, Unix, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003, Mac OS
Software/Tools: Microsoft Excel, Statgraphics, Eclipse, Shell Scripting, ArcGIS, Linux, Jupyter Notebook, PyCharm, Vi/Vim, Sublime Text, Visual Studio, Postman, Ansible, Control-M
Reporting/ETL Tools: Informatica, Talend, SSIS, SSRS, SSAS, ER Studio, Tableau, Power BI, Arcadia, DataStage, Pentaho
Programming Languages: Python (Pandas, SciPy, NumPy, Scikit-Learn, Statsmodels, Matplotlib, Plotly, Seaborn, Keras, TensorFlow, PyTorch), PySpark, T-SQL/SQL, PL/SQL, HiveQL, Scala, UNIX Shell Scripting, C#
Version Control: Git, SVN, Bitbucket
Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office

WORK EXPERIENCE
Client: Wilmington Trust, Wilmington, Delaware, USA (Jul 2023 - Present)
Role: Azure Data Engineer
Description: Wilmington Trust is one of the largest American institutions by fiduciary assets. It is currently a provider of international corporate and institutional services, investment management, and private banking. I design, develop, and maintain robust data pipelines to support data ingestion, processing, and storage.
Responsibilities:
Wrote and executed various MySQL database queries from Python using the Python-MySQL connector and the MySQLdb package.
Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in reverse.
Implemented Airflow for workflow automation and task scheduling and created DAG tasks.
Stored various configurations in the NoSQL database MongoDB and manipulated them using PyMongo (see the sketch after this section).
Created Kubernetes replication controllers, clusters, and label services to deploy microservices in Docker.
Involved in the entire lifecycle of projects, including design, development, deployment, testing, implementation, and support.
Performed data wrangling to clean, transform, and reshape data using the pandas library.
Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.
Spearheaded HBase setup and utilized Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.
Used Docker for managing application environments.
Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.
Collected and aggregated large amounts of web log data from sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
Implemented navigation rules for the application and page outcomes, and wrote controllers using annotations.
Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
Designed and implemented data integration solutions using Azure Data Factory to move data between various data sources, including on-premises and cloud-based systems.
Executed the validation process through SIMICS.
Worked on migration of data from on-prem SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
Installed and automated applications using the configuration management tools Puppet and Chef.
Performed ETL to move data from source systems to destination systems and worked on the data warehouse.
Implemented data transformations and enrichment using Apache Spark Streaming to clean and structure data for analysis.
Developed triggers, stored procedures, functions, and packages with cursors for the project using PL/SQL.
Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
Implemented AJAX, JSON, and JavaScript to create interactive web screens.
Developed custom reports using HTML, Python, and MySQL.
Environment: Azure, Oracle, Kafka, Python, Informatica, SQL Server, Erwin, RDS, NoSQL, Snowflake Schema, MySQL, Bash, DynamoDB, PostgreSQL, Tableau, GitHub, Linux/Unix
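The configuration-in-MongoDB item above can be pictured with a short PyMongo sketch; the connection string, database and collection names (pipeline_meta, configs), and document fields are hypothetical placeholders rather than details from the engagement.

```python
# Minimal sketch: store and update pipeline configuration documents in MongoDB.
# Connection string, database, collection, and field names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
configs = client["pipeline_meta"]["configs"]

# Upsert a config document keyed by pipeline name.
configs.replace_one(
    {"_id": "daily_sales_load"},
    {"_id": "daily_sales_load", "source": "azure_sql", "target": "blob", "batch_size": 5000},
    upsert=True,
)

# Read the config back and tweak a single field without rewriting the whole document.
doc = configs.find_one({"_id": "daily_sales_load"})
print(doc["batch_size"])
configs.update_one({"_id": "daily_sales_load"}, {"$set": {"batch_size": 10000}})
```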
Client: ChristianaCare, Newark, Delaware, USA (Sep 2022 - Jun 2023)
Role: AWS Data Engineer
Description: ChristianaCare is a network of private, non-profit hospitals providing health care services. I implemented and managed ETL (extract, transform, load) processes to ensure data integrity and availability.
Responsibilities:
Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud.
Performed raw data ingestion, which triggered a Lambda function and put refined data into ADLS (see the sketch after this section).
Implemented RESTful web services for sending and receiving data between multiple systems.
Led migration to AWS, leveraging Amazon Redshift for data warehousing and utilizing HiveQL for reporting, reducing data retrieval and processing time by 30%.
Extracted and transformed log data files from S3 by scheduling AWS Glue jobs and loaded the transformed data into Amazon Elasticsearch.
Experience using stages such as Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, and Pivot for developing jobs.
Worked on SQL and PL/SQL for backend data transactions and validations; wrote queries in MySQL and native SQL.
Worked with Docker containers, developing the images and hosting them in Artifactory.
Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
Involved in various phases of the software development lifecycle (SDLC), including requirements gathering, design, development, deployment, and analysis of the application.
Worked on Big Data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods.
Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns and events.
Built Jenkins jobs for CI/CD infrastructure for GitHub repos.
Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
Processed image data through the Hadoop distributed system using Map and Reduce, then stored it in HDFS.
Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level.
Environment: Kafka, HBase, Docker, Kubernetes, AWS, EC2, S3, Lambda, CloudWatch, Auto Scaling, EMR, Redshift, Jenkins, ETL, Spark, Hive, Athena, Sqoop, Pig, Oozie, Spark Streaming, Hue, Scala, Python, Databricks, Git, Microservices, Unix/Linux, Snowflake
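As one way to picture the event-triggered ingestion described above, here is a minimal boto3-based Lambda handler sketch; the Glue job name (refine-raw-logs) and the assumption that the function is wired to an S3 event notification are illustrative, not taken from the project.

```python
# Minimal sketch: an S3-triggered Lambda that kicks off a Glue job for each new object.
# The Glue job name and the S3 event wiring are placeholders.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Pass the new object's location to the Glue job as job arguments.
        response = glue.start_job_run(
            JobName="refine-raw-logs",             # placeholder Glue job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"started_runs": run_ids}
```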
Client: (Accenture) CNA Insurance Company, Mumbai, India (Feb 2021 - Aug 2022)
Role: Application Developer / Data Engineer
Description: CNA Insurance Company is an American insurer of vehicles, homes, and small businesses that also provides other insurance and financial services products. I maintained and monitored database performance, ensuring data security and compliance with relevant regulations.
Responsibilities:
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
Ingested data into one or more Azure services and processed the data in Azure Databricks.
Developed scalable and reusable database processes and integrated them.
Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence/business analytics development.
Reviewed existing Java/Scala Spark processing and maintained and enhanced the jobs.
Worked with Spark Core, Spark ML, Spark Streaming, Spark SQL, and Databricks.
Created Airflow DAGs to schedule ingestions, ETL jobs, and various business reports (see the sketch after this section).
Built Docker images to run Airflow in a local environment to test the ingestion and ETL pipelines.
Created clusters to classify control and test groups.
Experience working with large datasets and machine learning using TensorFlow and Apache Spark.
Designed and implemented infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure.
Managed large datasets using pandas DataFrames and SQL.
Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization.
Understood the application's current production state and the impact of new installations on existing business processes.
Architected Python scripts for automated data extraction and loading from web server output files, reducing manual data entry and processing time by 75%.
Developed workflows using Oozie for running MapReduce jobs and Hive queries.
Developed Python code to gather data from HBase (Cornerstone) and designed the solution using PySpark.
Environment: Hortonworks, Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, NiFi, Python, SQL Server, Oracle, HBase, Hive, Impala, Pig, Tableau, NoSQL, Unix/Linux, Spark, PySpark, Notebooks, Control-M
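The Airflow scheduling item above can be illustrated with a minimal DAG sketch; the DAG id, schedule, and task callables are hypothetical placeholders rather than the actual ingestion and ETL jobs.

```python
# Minimal sketch: an Airflow DAG that runs an ingestion step and then an ETL step daily.
# DAG id, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: pull data from the source system into a landing area.
    print("ingesting raw data")

def transform():
    # Placeholder: clean the landed data and load it into the warehouse.
    print("running ETL")

with DAG(
    dag_id="daily_ingest_and_etl",          # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    etl_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> etl_task
```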
Client: Sprv Technologies Private Ltd, Hyderabad, India (Jan 2018 - Feb 2021)
Role: Data Engineer
Description: Sprv Technologies Private Ltd is a product-based company. My role was to integrate data from various sources, including internal databases, third-party services, and APIs, and to ensure seamless data flow across systems and platforms, enabling effective data utilization.
Responsibilities:
Integrated Kubernetes with cloud-native services such as AWS EKS and GCP GKE to leverage additional scalability and managed services.
Used T-SQL for MS SQL Server and ANSI SQL extensively on disparate databases.
Designed and built scalable data pipelines to ingest, translate, and analyze large sets of data.
Created job flows using Airflow in Python and automated the jobs; Airflow has a separate stack for developing DAGs and runs jobs on EMR or EC2 clusters.
Managed relational database services in which Azure SQL handles reliability, scaling, and maintenance; integrated data storage solutions.
Responsible for building and testing applications.
Experience handling database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring various Python packages (Teradata, MySQL, MySQL connector, PyMongo, and SQLAlchemy); see the sketch after this section.
Configured Spark Streaming to get ongoing information from Kafka and store the stream information in DBFS.
Used Python to write data into JSON files for testing Django websites, and created scripts for data modeling and data import/export.
Consulted leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data based analytical solutions.
Worked on CI/CD tools such as Jenkins and Docker in a DevOps team, setting up the application process end to end using deployment for lower environments and delivery for higher environments, with approvals in between.
Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
Implemented AJAX, JSON, and JavaScript to create interactive web screens.
Environment: HDFS, Hadoop, Hive, HBase, MapReduce, Spark, Sqoop, Pandas, MySQL, SQL Server, Java, Python, Tableau, Git, Linux/Unix
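To illustrate the Python database-connectivity and JSON export items above, here is a minimal sketch using SQLAlchemy and pandas; the connection URL, driver, table, query, and output path are hypothetical placeholders.

```python
# Minimal sketch: pull rows from MySQL via SQLAlchemy and export them as JSON lines.
# The connection URL, query, and output path are placeholders; a MySQL driver
# (e.g. PyMySQL) is assumed to be installed.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/sales")  # placeholder URL

# Read a result set into a DataFrame, then write one JSON object per line for downstream tests.
df = pd.read_sql("SELECT order_id, customer_id, amount FROM orders LIMIT 100", engine)
df.to_json("orders_sample.json", orient="records", lines=True)
```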