Candidate's Name
Sr. Data Engineer
Email: EMAIL AVAILABLE
Phone: PHONE NUMBER AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY
To obtain a challenging Data Engineer position that leverages my 9+ years of experience in the software industry, including 5+ years of experience in Azure cloud services, Google Cloud services, and big data technologies, and 4 years of experience in data warehouse implementations.
Experience with the Amazon Web Services (AWS) cloud platform, including EC2, Virtual Private Clouds (VPCs), storage models (EBS, S3, and instance storage), Elastic Load Balancers (ELBs), Glue, Lambda, Secrets Manager, IAM, and CloudWatch.
Strong grasp of Structured Streaming and Databricks in the context of the Spark architecture.
Experienced in configuring Databricks with AWS and Microsoft Azure, setting up Databricks workspaces for business analytics, controlling Databricks clusters, and managing the machine learning lifecycle.
Competent in setting up auto-scaling groups and working with EC2 instances, ECS, Elastic Beanstalk, Lambda, Glue, RDS, DynamoDB, CloudFront, CloudFormation, S3, Athena, SNS, SQS, X-Ray, and Elastic Load Balancing (ELB).
Experienced with Azure services, including Data Lake, Data Lake Analytics, SQL Database, Synapse, Databricks, Data Factory, Logic Apps, and SQL Data Warehouse.
Familiar with Azure Data Factory and Azure Databricks, two essential tools in the Azure cloud environment for building data pipelines and performing data transformations.
Built efficient data pipelines utilizing Azure Data Factory to manage data in a multi-cloud scenario between GCP and Azure.
Worked with Azure and GCP machine learning tools such as Azure Cognitive Services and AutoML.
Expertise in designing and implementing scalable data processing solutions on Google Cloud Platform (GCP), including data processing services such as Dataflow, Dataproc, and BigQuery.
Proficient in building data pipelines using GCP Dataflow for real-time and batch data processing, including data ingestion, migration, modeling, transformation, and enrichment.
Strong understanding of GCP BigQuery for data warehousing and analytics, including optimizing query performance, managing partitions, and working with nested and repeated data structures.
Experienced in dimensional and relational data modeling using ER/Studio, Erwin, and Sybase PowerDesigner, including Star and Snowflake schema modeling, fact and dimension tables, and conceptual, physical, and logical data models.
Expertise in working with the Hadoop ecosystem, including Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, and NiFi.
Proficient in the ETL tool Informatica PowerCenter 9.x for developing data warehouse loads, with work experience focused on data acquisition and data integration.
Hands-on experience with RESTful APIs, API lifecycle management, and consuming RESTful services.
Used Groovy for data processing and ETL, extracting data from various sources, applying business rules, and loading the results into data warehouses or databases.
Adept at using SQL Server Management Studio to create, deploy, and manage SSIS packages.
Knowledgeable about setting up data sources and jobs and scheduling packages using SQL Server Agent jobs.
Experienced with CI/CD pipelines and related tools, such as Git and Jenkins, to improve teamwork and deliver frequent code changes, increasing overall development efficiency.
Worked in both Agile and Waterfall environments. Used Git and SVN version control systems.
Extensive knowledge of designing reports, scorecards, and dashboards using Power BI. Experience in combining data from multiple sources and creating reports with interactive dashboards in Power BI.

TECHNICAL SKILLS
Big Data Technologies: Apache Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, Spark, Impala, Apache Storm, Apache Cassandra, Kafka, Snowflake, PySpark, Cloudera Manager
GCP Services: Google Kubernetes Engine, Bigtable, Cloud Spanner, BigQuery, Dataflow, Dataproc, IAM, KMS, Stackdriver, Cloud Billing
Programming & Scripting: Python, Scala, Java, SAS, R, SQL, MATLAB, HiveQL, Groovy, PowerShell, Bash scripting
Databases: Oracle, Microsoft SQL Server, MySQL, DB2, Teradata
NoSQL Databases: MongoDB, Cassandra, HBase, Amazon DynamoDB, Azure Cosmos DB
Cloud Services: Amazon Web Services (AWS), Microsoft Azure, Snowflake, Apache Kafka, Apache Airflow
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs
Data Warehousing: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics
Data Integration/ETL: Apache NiFi, Apache Kafka, Talend, Apache Airflow, Apache Beam, Apache Flink, SSIS, SSRS
Azure Services: Azure Data Factory, Azure Databricks, Azure Data Lake, Azure Key Vault, Azure Active Directory, Azure Service Bus, Azure Event Hub, Azure Virtual Machines
AWS Services: Amazon S3, Glue, Redshift, EMR, SageMaker
Containers: Docker, Kubernetes, Amazon ECS, GCP Kubernetes Engine
Reporting/Visualization: Tableau, Looker, Power BI
WORK EXPERIENCE
Client: OSF, Peoria, IL    Dec 22 - Till Date
Role: Azure/Snowflake Data Engineer
Description:
OSF aimed to implement a data-driven strategy for personalized patient care by leveraging Azure cloud infrastructure and Snowflake for scalable data warehousing.

Responsibilities:
Used Azure PaaS services to analyze, create, and develop modern data solutions that enable data visualization.
Created PySpark DataFrames in Azure Databricks to read data from Data Lake/Blob storage and manipulate it using the Spark SQL context.
Performed ETL jobs from multiple sources to Azure Data Lake Storage (ADLS) using a combination of Azure Data Factory (ADF) and Spark SQL, processing the data in Azure Databricks.
Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
Worked on a cloud POC to choose the optimal cloud vendor based on a set of success criteria.
Integrated Spark with data storage systems, particularly Azure Data Lake and Blob storage.
Created multiple Databricks Spark jobs with PySpark to perform table-to-table operations.
Migrated data from SAP and Oracle, created data marts using Cloud Composer (Airflow), and moved Hadoop jobs to Dataproc workflows.
Improved the performance of Hive and Spark jobs.
Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
Applied Hive partitioning and bucketing concepts to build both managed and external tables in Hive to maximize performance.
Created generic scripts to automate processes such as creating Hive tables and mounting ADLS in Azure Databricks.
Created JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
Used Hive queries to analyze massive data sets of structured, unstructured, and semi-structured data.
Worked with structured data in Hive to increase performance using advanced techniques such as bucketing, partitioning, and optimizing self-joins.

Environment: Azure Data Lake, SnowSQL, Azure SQL, Azure Data Factory (V2), Azure Databricks, Python 2.0, SSIS, Azure Blob Storage, Spark 2.0, Hive.
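A minimal PySpark sketch of the kind of Databricks job described above: it reads raw files from ADLS/Blob storage into a DataFrame, transforms them through a Spark SQL temporary view, and writes the curated output back to the lake. The storage account, container, paths, and column names are hypothetical placeholders, and the cluster is assumed to already have access to the storage account configured.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls_to_curated_example").getOrCreate()

# Read raw CSV files landed in ADLS Gen2 / Blob storage into a DataFrame.
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@examplestorageaccount.dfs.core.windows.net/patients/")
)

# Register a temporary view so the transformation can be expressed in Spark SQL.
raw_df.createOrReplaceTempView("patients_raw")

curated_df = spark.sql("""
    SELECT patient_id,
           UPPER(TRIM(last_name))   AS last_name,
           CAST(admit_date AS DATE) AS admit_date,
           COUNT(*) OVER (PARTITION BY patient_id) AS visit_count
    FROM patients_raw
    WHERE patient_id IS NOT NULL
""")

# Write the curated table back to the lake, partitioned by admission date.
(
    curated_df.write
    .mode("overwrite")
    .partitionBy("admit_date")
    .parquet("abfss://curated@examplestorageaccount.dfs.core.windows.net/patients/")
)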
Client: American Express, Phoenix, AZ    Oct 21 - Nov 22
Role: AWS/Snowflake Data Engineer
Description: My position as a Sr. Data Engineer at American Express involved designing, implementing, and optimizing data solutions that cater to the specific needs of financial institutions. Developed a centralized financial data warehouse to consolidate data from multiple financial systems, enabling comprehensive reporting and analytics.

Responsibilities:
Responsible for loading data into S3 buckets from the internal server and the Snowflake data warehouse.
Built the framework for efficient data extraction, transformation, and loading (ETL) from multiple data sources.
Used Amazon Web Services (Linux/Ubuntu) to launch Amazon EC2 cloud instances and configured the launched instances for specific applications.
Worked extensively on moving data from Snowflake to S3 for the TMCOMP/ESD feeds.
Wrote data pipeline definitions in JSON format for code productization.
Used AWS Athena extensively to import structured data from S3 into multiple systems, including Redshift, and to generate reports.
Used Spark Streaming APIs to perform the necessary conversions and operations to construct the common learner data model, which obtains data from Kinesis in near real time.
Developed Snowflake views to load and unload data from and to an AWS S3 bucket, and transferred the code to production.
Worked extensively on SQL, Informatica, MLoad, FLoad, and FastExport as needed to handle different scenarios.
Extracted and transformed data sources using Python programming and SQL queries, then loaded them to produce CSV data files.
Used Informatica PowerCenter Workflow Manager to create sessions, workflows, and batches to run with the logic embedded in the mappings.
Created DAGs in Airflow using Python to schedule and automate jobs.
Worked in a Hadoop and RDBMS environment, designing, developing, and maintaining data integration applications that worked with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
Performed advanced activities such as text analytics and processing using Spark's in-memory computing capabilities.
Wrote Spark SQL queries over RDDs and DataFrames, mixing Hive queries with programmatic data manipulations in Scala and Python.
Analyzed Hive data using the Spark API in conjunction with Hadoop YARN on the EMR cluster.
Enhanced existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
Assisted with the creation of Hive tables and the loading and analysis of data using Hive queries.
Conducted exploratory data analysis and data visualization using Python (matplotlib, NumPy, pandas, seaborn).

Environment: AWS S3, Hadoop YARN, SQL Server, Spark, Spark Streaming, Scala, Kinesis, Python, Hive, Linux, Sqoop, Tableau, Talend, Cassandra, Oozie, Control-M, EMR, EC2, RDS, DynamoDB, Oracle 12c.
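A hedged sketch of the Airflow scheduling mentioned above: a daily DAG with a single Python task that would hold the Snowflake-to-S3 unload logic. The DAG id, schedule, owner, and placeholder callable are hypothetical, not taken from the project.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def unload_feed_to_s3(**context):
    # Placeholder callable: in the real pipeline this would run the
    # Snowflake unload and drop the resulting files in the S3 bucket.
    print("Unloading feed for", context["ds"])


default_args = {
    "owner": "data-engineering",   # hypothetical owner
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="tmcomp_feed_snowflake_to_s3",   # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 6 * * *",          # daily at 06:00
    catchup=False,
) as dag:
    unload = PythonOperator(
        task_id="unload_feed_to_s3",
        python_callable=unload_feed_to_s3,
    )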
Client: Intetics Inc., FL    Mar 19 - Sep 21
Role: Sr Data Engineer/GCP Data Engineer
Description: At Intetics, my role as a Sr. Data Engineer involved designing and implementing a scalable data integration and analytics system. I was also responsible for building data pipelines and for the on-premises migration to Google Cloud Platform (GCP).

Responsibilities:
Built multiple data pipelines and end-to-end ETL/ELT processes for data ingestion and transformation in GCP, coordinating tasks among the team.
Designed and implemented the various layers of the data lake and designed star schemas in BigQuery.
Used Google Cloud Functions with Python to load data into BigQuery as CSV files arrive in GCS buckets.
Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
Designed pipelines using Apache Beam, Kubeflow, and Dataflow, and orchestrated jobs in GCP.
Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
Documented the inventory of modules, infrastructure, storage, and components of the existing on-prem data warehouse for analysis, identifying the technologies and strategies required for the Google Cloud migration.
Designed, developed, and implemented ETL pipelines using the Python API for Apache Spark (PySpark).
Worked on a GCP POC to migrate data and applications from on-prem to Google Cloud.
Gained exposure to IAM roles in GCP and created firewall rules to access Google Dataproc from other machines.
Set up GCP firewall rules to control ingress and egress traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.

Environment: GCP, Cloud SQL, BigQuery, Cloud Dataproc, GCS, Cloud Composer, Informatica PowerCenter 10.1, Talend 6.4 for Big Data, Hadoop, Hive, Teradata, SAS, Spark, Python, Java, SQL Server.
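The Cloud Function load pattern mentioned above could look roughly like the sketch below: a background function triggered when a CSV object lands in a GCS bucket appends it to a BigQuery table. The project, dataset, and table names are hypothetical placeholders.

from google.cloud import bigquery

BQ_TABLE = "example_project.raw_layer.events"   # hypothetical target table


def load_csv_to_bigquery(event, context):
    # Entry point for a GCS object-finalize trigger: event carries the
    # bucket and object name of the file that just arrived.
    client = bigquery.Client()

    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # Kick off the load job and block until it finishes so failures surface in logs.
    load_job = client.load_table_from_uri(uri, BQ_TABLE, job_config=job_config)
    load_job.result()

    table = client.get_table(BQ_TABLE)
    print(f"Loaded {uri}; {BQ_TABLE} now has {table.num_rows} rows.")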
Client: PLZ Corp, IL    April 17 - Feb 19
Role: AWS Data Engineer
Description: As an AWS Data Engineer at PLZ Corp, I played a critical role in designing, developing, and implementing robust and scalable data pipelines to support the organization's data-driven initiatives.
Responsibilities:
Wrote Spark applications in Scala to interact with the PostgreSQL database using the Spark SQL context and accessed Hive tables using the Hive context.
Involved in designing different components of the system, such as the big data event processing framework (Spark), the distributed messaging system (Kafka), and the SQL database (PostgreSQL).
Implemented Spark Streaming and Spark SQL using DataFrames.
Integrated product data feeds from Kafka into the Spark processing system and stored the order details in the PostgreSQL database.
Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing.
Created multiple Hive tables and implemented dynamic partitioning and buckets in Hive for efficient data access.
Designed tables and columns in Redshift for data distribution across data nodes in the cluster, keeping columnar database design in consideration.
Created, modified, and executed DDL on AWS Redshift and Snowflake tables to load data.
Involved in creating Hive external tables and used custom SerDes based on the structure of the input files so that Hive knows how to load them into Hive tables.
Managed large datasets using pandas DataFrames and MySQL.
Monitored resources and applications using AWS CloudWatch, including creating alarms to monitor metrics for EBS, EC2, ELB, RDS, S3, and SNS, and configured notifications for the alarms generated based on defined events.
Monitored system health and logs and responded accordingly to any warning or failure conditions.
Worked on scheduling all jobs using Oozie.

Environment: AWS EMR 5.0.0, EC2, S3, Oozie 4.2, Kafka, Spark, Spark SQL, PostgreSQL, Shell Script, Sqoop 1.4, Scala.
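The Kafka-to-PostgreSQL integration described above might look like the sketch below. The project code was written in Scala, so this is only the equivalent shape in PySpark; the broker, topic, schema, and JDBC details are hypothetical, and it assumes the Kafka connector and PostgreSQL JDBC driver are on the Spark classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_orders_to_postgres").getOrCreate()

# Hypothetical schema of the order events published to Kafka.
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("product_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the product/order feed from Kafka as a streaming DataFrame.
orders = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
    .option("subscribe", "orders")                        # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), order_schema).alias("o"))
    .select("o.*")
)


def write_to_postgres(batch_df, batch_id):
    # Each micro-batch is appended to the order_details table over JDBC.
    (
        batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/orders_db")   # hypothetical
        .option("dbtable", "public.order_details")
        .option("user", "etl_user")
        .option("password", "***")
        .mode("append")
        .save()
    )


query = (
    orders.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders")   # hypothetical path
    .foreachBatch(write_to_postgres)
    .start()
)
query.awaitTermination()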
Client: Sonata Software Ltd, India    Nov 14 - Feb 17
Role: Hadoop/Jr Data Engineer
Description:
I was a key player in the setup and management of the Hadoop ecosystem on Azure as a Data Engineer at Sonata Software Ltd.

Responsibilities:
Installed a Hortonworks Hadoop cluster on the Confidential Azure cloud in the US region to satisfy the customer's data locality needs.
Created data ingestion methods using SAP BODS to load data from SAP CRM and SAP ECC into HDFS.
Used Oozie workflows to query the data and extract tables into MongoDB; a MongoDB view was used as the operational data store to view the data and generate reports.
Developed a Hortonworks cluster on the Confidential Azure cloud to extract actionable insights from data collected from IoT sensors installed in excavators.
Loaded machine data collected via installed sensors and GPS equipment from an MSSQL data dump into the HDP cluster using Sqoop.
Conducted multiple workshops with the business to understand the data and determine which insights would bring the most immediate value.
Processed the data in HDP using Hive and provided simple analytics using Highcharts on a Meteor UI platform.
Built a general linear model using BigML and provided insights into the risk of failure of these excavators; the analysis was conducted by collating sensor and maintenance data from multiple sources such as engine oil temperature, pressure, maintenance history, battery level, and fuel gradient.
The project is in flight, and work is in progress to enable multiple insights for the owner, dealer, and manufacturer.

Environment: Hortonworks Data Platform (HDP), Hadoop (Hive, HDFS, MapReduce, YARN), Confidential Azure Cloud, SAP BODS, Apache Oozie, MongoDB, MSSQL, Sqoop, BigML.
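As a rough illustration of the reporting reads against the MongoDB operational data store mentioned above, the sketch below runs a small aggregation with PyMongo; the host, database, collection, and field names are hypothetical placeholders, not taken from the project.

from pymongo import MongoClient

client = MongoClient("mongodb://ods-host:27017/")          # hypothetical host
collection = client["excavator_ods"]["sensor_readings"]    # hypothetical db/collection

# Average engine-oil temperature per excavator, computed with a simple
# aggregation pipeline and sorted from hottest to coolest.
pipeline = [
    {"$match": {"metric": "engine_oil_temp"}},
    {"$group": {"_id": "$excavator_id", "avg_temp": {"$avg": "$value"}}},
    {"$sort": {"avg_temp": -1}},
]

for row in collection.aggregate(pipeline):
    print(row["_id"], round(row["avg_temp"], 2))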