Candidate's Name
Senior Data Engineer
Contact: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE | LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY:
- 9+ years of IT experience in data engineering and data pipeline design, development, and implementation across industries including hospitality and entertainment, healthcare, retail, and financial services.
- Experience designing and building the data management lifecycle, covering data ingestion, integration, consumption, and delivery, as well as reporting, analytics, and system-to-system integration.
- Proficient in Big Data environments with hands-on experience using Hadoop ecosystem components for large-scale processing of structured and semi-structured data.
- Strong experience across all project phases, including requirement analysis, design, coding, testing, support, and documentation, using Apache Spark and Scala, Python, HDFS, YARN, Sqoop, Hive, MapReduce, and Kafka.
- Extensive experience with Azure cloud technologies such as Azure Data Lake Storage, Azure Data Factory, Azure SQL, Azure SQL Data Warehouse, Azure Synapse Analytics, Azure Analysis Services, Azure HDInsight, and Databricks.
- Hands-on experience with GCP BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, and Dataproc.
- Solid knowledge of AWS services such as EMR, Redshift, S3, EC2, Lambda, and Glue, including configuring servers for auto-scaling and elastic load balancing.
- Experience monitoring web services with Hadoop and Spark to control applications and analyze their operation and performance.
- Development, implementation, deployment, and maintenance of complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, Scala, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, and HBase.
- Managed databases and Azure data platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB), along with SQL Server, Oracle, and data warehouses.
- Built multiple data lakes.
- Experienced in Python for data loading and extraction, and with Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, and SciPy for data analysis and numerical computation.
- Good knowledge of and experience with NoSQL databases such as HBase, Cassandra, and MongoDB, and SQL databases such as Teradata, Oracle, PostgreSQL, and SQL Server.
- Experience using and tuning relational databases (e.g., Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g., Amazon Redshift, Microsoft SQL Data Warehouse).
- Experience in the development and design of scalable systems using Hadoop technologies in various environments and in analyzing data with MapReduce, Hive, and Pig.
- Hands-on use of Spark and Scala to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
- Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
- Hands-on experience designing and implementing data engineering pipelines and analyzing data with Hadoop ecosystem tools such as HDFS, Spark, Sqoop, Hive, Flume, Kafka, Impala, PySpark, Oozie, and HBase.
- Experience with ETL tool environments such as SSIS and Informatica, and reporting tool environments such as SQL Server Reporting Services, Power BI, and Business Objects.
- Experience deploying applications and scripting in Unix/Linux shell.
- Solid knowledge of data marts, operational data stores, OLAP, and dimensional data modeling with star schema and snowflake schema modeling for dimension tables using Analysis Services.
- Extensive experience with databases such as Teradata, MongoDB, Cassandra, MySQL, Oracle, and SQL Server.
- Experience creating Teradata SQL scripts using OLAP functions such as RANK and RANK OVER to improve query performance when pulling data from large tables.
- Proficient in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Knowledge of and experience with CI/CD using tools such as Docker and Jenkins.

EDUCATIONAL DETAILS:
- Masters in Computer Science, California State University, Fullerton, California, 2021.
- Bachelors in Information Technology, Galgotias College of Engineering, 2015.

TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, Sqoop, YARN, Oozie, Hive, Impala, Zookeeper, Apache Flume, Apache Airflow, Cloudera, HBase
Programming Languages: Python, Java, PL/SQL, SQL, Scala, PowerShell, C, C++, T-SQL
Cloud Services: Azure Data Lake Storage Gen 2, Azure Data Factory, Blob Storage, Data Analytics, Azure Databricks, Azure SQL DB, Azure Event Hubs, AWS RDS, SaaS, Amazon Redshift, Amazon SQS, Amazon S3, AWS EMR, Glue, Lambda, AWS SNS, BigQuery, GCS Bucket, Cloud Functions, Dataflow, Pub/Sub, Cloud Shell
Databases: MySQL, SQL Server, Oracle, MS Access, Teradata, and Snowflake
NoSQL Databases: MongoDB, DynamoDB, Cassandra, HBase
Monitoring Tools: Apache Airflow
Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend, SSIS, and SSRS
Version Control & Containerization Tools: GitHub, Bitbucket, Docker, Kubernetes
Operating Systems: Unix, Linux, Windows, Mac OS

Work Experience:

Client: Ameriprise Financial, Minneapolis, MN                              Mar 2023 to Present
Azure Data Engineer
Roles & Responsibilities:
- Implemented end-to-end data pipelines using Azure Data Factory to extract, transform, and load (ETL) data from diverse sources into Snowflake.
- Worked with business and user groups to gather requirements and worked on the creation and development of pipelines.
- Migrated applications from Cassandra to Azure Data Lake Storage Gen 2 using Azure Data Factory; created tables and loaded and analyzed data in the Azure cloud.
- Used AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
- Maintained SaaS environments.
- Developed PySpark code for AWS Glue jobs and for EMR.
- Worked hands-on with Spark, Scala, HBase, and Kafka.
- Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and SQL Azure.
- Developed the process for ingesting data into the Azure cloud from web services and loading it into Azure SQL DB.
- Built Spark applications in Python to load high-volume files with different schemas into PySpark DataFrames in a distributed environment and processed them for reloading into Azure SQL DB tables.
- Designed and developed pipelines using Databricks and automated them for ETL processing and ongoing maintenance of the workloads.
- Managed security groups on AWS with a focus on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Created ETL packages using SSIS to extract data from sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data in SQL Server.
- Constructed data transformations in PySpark on Databricks to rename, drop, clean, validate, and reformat data into Parquet files and load them into an Azure Blob Storage container (a sketch of this pattern follows this section).
- Performed text analytics and generated data visualizations using R and Python, and created dashboards with Tableau and Power BI.
- Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
- Implemented invoking a pipeline in one Azure Data Factory (ADF) from another Azure Data Factory using Azure Logic Apps and the ADF Web activity POST method.
- Used AWS Athena extensively to query structured data in S3, feeding systems such as Redshift and generating reports.
- Performed ETL operations in Azure Databricks by connecting to different relational databases using Kafka, and used Informatica to create, execute, and monitor sessions and workflows.
- Automated data ingestion into the Lakehouse, transformed the data with Apache Spark, and stored it in Delta Lake.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
- Ensured data quality and integrity using Azure SQL Database and automated ETL deployment and operationalization.
- Used Databricks, Scala, and Spark to create data workflows and capture data from Delta tables in Delta Lake.
- Built streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data from data-driven workflows.
- Worked with Delta Lake for consistent unification of streaming and batch processing and handled ACID transactions using Apache Spark.
- Planned and created CRQs for deployment of new applications as needed.
- Monitored and managed hosted applications on SaaS.
- Created S3 buckets and managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Used PowerShell scripting to maintain and configure data, and automated and validated data using Apache Airflow.
- Optimized Hive queries using best practices and the right parameters, working with Hadoop, YARN, Python, and PySpark.
- Created Azure Logic Apps that trigger when a new email with an attachment is received and load the file to Blob Storage.
- Used Sqoop to extract data from Teradata into HDFS and export the analyzed patterns back to Teradata.
- Built containerized applications using tools such as Docker, Kubernetes, and Terraform.
- Designed and developed ETL jobs in AWS Glue to extract data from S3 objects and load it into a data mart in Redshift.
- Used accumulators and broadcast variables to tune Spark applications and to monitor the created analytics and jobs.
- Applied strong knowledge of the architecture and components of Tealeaf, worked efficiently with Spark Core and Spark SQL, designed and developed RDD seeds using Scala and Cascading, and streamed data into Spark Streaming using Kafka.
- Tracked Hadoop cluster job performance and capacity planning, and tuned Hadoop performance for high availability and cluster recovery.
- Generated reports with Tableau and created Tableau dashboards, pie charts, and heat maps according to business requirements.
- Worked with Azure cloud computing, mostly testing and building web services across SaaS, PaaS, and IaaS.
- Worked through all phases of the Software Development Life Cycle and used Agile methodology for development.
Environment: Python, PySpark, Matillion, Azure HDInsight, BryteFlow, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, Spark SQL, Snowflake, Cassandra, Blob Storage, Data Factory, Shell/Bash, Power BI
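The Databricks/PySpark transformation pattern referenced above (rename, clean, validate, and write Parquet to Blob Storage) can be sketched roughly as follows; the storage account, container, and column names are hypothetical placeholders, and storage credentials are assumed to be configured on the cluster.

```python
# Rough PySpark sketch: rename, clean, validate, and write curated data as
# Parquet to an Azure Blob Storage container. All names are illustrative,
# not taken from the client environment.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate_orders").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("wasbs://raw@examplestorage.blob.core.windows.net/orders/"))

curated = (raw
           .withColumnRenamed("ord_dt", "order_date")
           .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .dropna(subset=["order_id", "order_date"])   # basic validation
           .dropDuplicates(["order_id"]))

(curated.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("wasbs://curated@examplestorage.blob.core.windows.net/orders/"))
```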

Client: Sherwin-Williams, Cleveland, OH                                    May 2021 to Feb 2023
Senior Data Engineer
Roles & Responsibilities:
- Worked with business and user groups to gather requirements and worked on the creation and development of pipelines.
- Migrated applications from Cassandra to Azure Data Lake Storage Gen 2 using Azure Data Factory; created tables and loaded and analyzed data in the Azure cloud.
- Used AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
- Maintained SaaS environments.
- Developed PySpark code for AWS Glue jobs and for EMR.
- Worked hands-on with Spark, Scala, HBase, and Kafka.
- Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and SQL Azure.
- Developed the process for ingesting data into the Azure cloud from web services and loading it into Azure SQL DB.
- Built Spark applications in Python to load high-volume files with different schemas into PySpark DataFrames in a distributed environment and processed them for reloading into Azure SQL DB tables.
- Designed and developed pipelines using Databricks and automated them for ETL processing and ongoing maintenance of the workloads.
- Managed security groups on AWS with a focus on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Created ETL packages using SSIS to extract data from sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data in SQL Server.
- Constructed data transformations in PySpark on Databricks to rename, drop, clean, validate, and reformat data into Parquet files and load them into an Azure Blob Storage container.
- Performed text analytics and generated data visualizations using R and Python, and created dashboards with Tableau and Power BI.
- Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.
- Used AWS Athena extensively to query structured data in S3, feeding systems such as Redshift and generating reports.
- Performed ETL operations in Azure Databricks by connecting to different relational databases using Kafka, and used Informatica to create, execute, and monitor sessions and workflows.
- Automated data ingestion into the Lakehouse, transformed the data with Apache Spark, and stored it in Delta Lake.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
- Ensured data quality and integrity using Azure SQL Database and automated ETL deployment and operationalization.
- Used Databricks, Scala, and Spark to create data workflows and capture data from Delta tables in Delta Lake.
- Built streaming pipelines using Azure Event Hubs and Stream Analytics to analyze data from data-driven workflows.
- Worked with Delta Lake for consistent unification of streaming and batch processing and handled ACID transactions using Apache Spark.
- Planned and created CRQs for deployment of new applications as needed, and monitored and managed hosted applications on SaaS.
- Created S3 buckets and managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Used PowerShell scripting to maintain and configure data, and automated and validated data using Apache Airflow.
- Optimized Hive queries using best practices and the right parameters, working with Hadoop, YARN, Python, and PySpark.
- Used Sqoop to extract data from Teradata into HDFS and export the analyzed patterns back to Teradata.
- Built containerized applications using tools such as Docker, Kubernetes, and Terraform.
- Designed and developed ETL jobs in AWS Glue to extract data from S3 objects and load it into a data mart in Redshift (see the sketch following this section).
- Used accumulators and broadcast variables to tune Spark applications and to monitor the created analytics and jobs.
- Applied strong knowledge of the architecture and components of Tealeaf, worked efficiently with Spark Core and Spark SQL, designed and developed RDD seeds using Scala and Cascading, and streamed data into Spark Streaming using Kafka.
- Tracked Hadoop cluster job performance and capacity planning, and tuned Hadoop performance for high availability and cluster recovery.
- Generated reports with Tableau and created Tableau dashboards, pie charts, and heat maps according to business requirements.
- Worked with Azure cloud computing, mostly testing and building web services across SaaS, PaaS, and IaaS.
- Worked through all phases of the Software Development Life Cycle and used Agile methodology for development.
Environment: Python, SQL, Cassandra DB, Azure Data Lake Storage Gen 2, Power BI, SaaS, Azure Data Factory, Azure SQL DB, Spark, Databricks, SSIS, SQL Server, Kafka, Informatica, Apache Spark, Delta Lake, Azure Event Hubs, AWS Glue, Stream Analytics, Terraform, Azure Blob Storage, PowerShell, Apache Airflow, Hadoop, YARN, PySpark, Hive, Teradata, Sqoop, HDFS, Agile.
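A minimal sketch of the AWS Glue pattern mentioned above (catalogued S3 data loaded into a Redshift data mart); the catalog database, table, connection, and bucket names are assumptions for illustration, not the actual job.

```python
# Sketch of an AWS Glue PySpark job: read a Glue Data Catalog table over S3
# objects and load it into a Redshift data mart. All identifiers are
# hypothetical placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: table registered in the Glue Data Catalog over S3 objects
sales = glue_context.create_dynamic_frame.from_catalog(
    database="example_raw_db", table_name="sales_s3"
)

# Cast an ambiguous numeric column explicitly before loading the mart
sales_clean = sales.resolveChoice(specs=[("amount", "cast:double")])

# Target: Redshift via a catalogued JDBC connection; Glue stages through S3
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=sales_clean,
    catalog_connection="example-redshift-conn",
    connection_options={"dbtable": "mart.sales_fact", "database": "analytics"},
    redshift_tmp_dir="s3://example-glue-temp/redshift/",
)
job.commit()
```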

Client: Sol-Ark, Allen, TX                                                 Nov 2019 to Apr 2021
Big Data Engineer
Roles & Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), including requirement gathering, data modeling, analysis, design, and development for the project.
- Created the infrastructure needed for optimal data extraction, transformation, and loading from a wide range of data sources.
- Created optimal data pipeline architecture; in a Hadoop environment on Linux, developed Spark/Scala and Python code for a regular expression (regex) project.
- Worked hands-on with Azure cloud services (PaaS and SaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
- Designed an end-to-end ETL pipeline using Glue and Lambda with PySpark and SQL queries, leveraging Snowflake compute power.
- Extracted, transformed, and loaded data sources to generate CSV data files with Python programming and SQL queries.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Architected and implemented medium to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
- Responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets.
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS, and coordinated tasks among the team.
- Created and wrote aggregation logic on Snowflake data warehouse tables.
- Recreated and maintained existing Access database artifacts in Snowflake.
- Consumed Kafka messages, curated them using Python, and sent the data to multiple targets including Redshift, Athena, and S3 buckets (see the sketch following this section).
- Used AWS QuickSight for visualization.
- Used Python libraries such as NumPy, Pandas, SciPy, Matplotlib, Seaborn, and scikit-learn.
- Used data science models such as linear regression, logistic regression, K-neighbors classifier, random forest classifier, dummy classifier, ARIMA, and SARIMA for prediction, and used the resulting insights to drive business decisions.
- Worked with the Data Science, Marketing, and Sales teams to develop data pipelines as per their needs.
- Proficient in SQLite, MySQL, and SQL databases with Python.
- Developed iterative algorithms using Spark Streaming in Scala and Python to build near-real-time dashboards.
- Designed, built, and maintained data integration programs in Hadoop and RDBMS environments, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Used the Spark API to analyze Hive data in conjunction with Hadoop YARN on the EMR cluster.
- Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web applications and database templates.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
- Migrated Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole.
- Built Spark applications in Python to load high-volume files with different schemas into PySpark DataFrames in a distributed environment and processed them for reloading into SQL DB tables.
- Installed Kafka on a Hadoop cluster and configured producers and consumers in Java to establish connections from source to HDFS with popular hashtags.
- Loaded tables from Azure Data Lake to Azure Blob Storage for pushing into Snowflake.
- Created ETL packages using SSIS to extract data from sources such as Access databases, Excel spreadsheets, and flat files, and maintained the data in SQL Server.
- Automated data ingestion into the Lakehouse, transformed the data with Apache Spark, and stored it in Delta Lake.
- Ensured data quality and integrity using SQL Database and automated ETL deployment and operationalization.
- Analyzed logs for debugging and performance tuning.
- Wrote Maven build scripts and Jenkins and Chef scripts.
Environment: Python, SQL, Hadoop, Pig Scripts, HDFS, AWS S3, Lambda, SaaS, Azure SQL, Azure Data Lake, DynamoDB, Snowflake, Redshift, Athena, Kafka, QuickSight, EMR, RDS, ElastiCache, Jenkins.
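The Kafka-to-multiple-targets curation described above could look roughly like the following Python consumer, shown here landing curated batches in S3 only; broker, topic, and bucket names are placeholders, and kafka-python plus boto3 are assumed as the client libraries.

```python
# Illustrative consumer: read Kafka messages, do light curation, and land
# newline-delimited JSON batches in S3 (from which Redshift/Athena can load).
# All connection details and names below are hypothetical.
import json
import uuid

import boto3
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders-events",
    bootstrap_servers=["broker1:9092"],
    group_id="orders-curation",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")

batch = []
for message in consumer:
    event = message.value
    if not event.get("order_id"):          # drop malformed records
        continue
    event["amount"] = round(float(event.get("amount", 0)), 2)
    batch.append(json.dumps(event))

    if len(batch) >= 500:                  # flush in modest batches
        key = f"curated/orders/{uuid.uuid4()}.json"
        s3.put_object(Bucket="example-curated-bucket", Key=key,
                      Body="\n".join(batch).encode("utf-8"))
        batch = []
```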

Cyient, Hyderabad, India                                                   Oct 2017 to Mar 2019
Big Data Engineer
Roles & Responsibilities:
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
- Implemented and managed ETL solutions and automated operational processes.
- Designed and developed ETL integration patterns using Python on Spark.
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
- Used Stitch ETL tools to integrate data into the central data warehouse.
- Worked with GCP Dataproc, GCS, Cloud Functions, Dataprep, Data Studio, and BigQuery.
- Used Cloud Functions with Python to load on-arrival CSV files from a GCS bucket into BigQuery (see the sketch following this section).
- Used Spark SQL through its Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
- Maintained the ELK stack (Elasticsearch, Logstash, Kibana) and wrote Spark scripts using the Scala shell.
- Loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Designed a star schema in BigQuery.
- Created various types of indexes on different collections to achieve good performance in the MongoDB database.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
- Used Agile for continuous model deployment.
- Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage.
- Knowledgeable about Cloud Dataflow and Apache Beam.
- Used Snowflake for data storage and processing, which is easier and faster to use.
- Wrote a Python program to maintain raw file archival in a GCS bucket.
- Used Airflow to manage task scheduling, progress, and success status using DAGs.
- Created BigQuery authorized views for row-level security and for exposing data to other teams.
- Integrated services such as GitHub and Jenkins to create a deployment pipeline.
- Implemented a build framework for new projects using Jenkins as the build tool.
Environment: T-SQL, PL/SQL, Google Cloud, Python, BigQuery, Dataflow, Dataproc, Dataprep, Data Studio, Bigtable, Stitch ETL, PySpark, Snowflake, MySQL, Airflow, Shell Scripts, MongoDB, Git, Apache Spark, Docker
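A sketch of the GCS-triggered Cloud Function pattern noted above for loading newly arrived CSV files into BigQuery; the project, dataset, and table IDs are hypothetical, and the 1st-gen background-function signature is assumed.

```python
# Sketch of a GCS-triggered Cloud Function (Python) that loads new CSV files
# into BigQuery. The table ID is an illustrative placeholder.
from google.cloud import bigquery

bq = bigquery.Client()

def load_csv_to_bq(event, context):
    """Triggered by object finalization in the landing bucket."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV arrivals

    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = bq.load_table_from_uri(
        uri, "example-project.staging.daily_sales", job_config=job_config
    )
    load_job.result()  # block so failures surface in the function logs
```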

OpenText, Bangalore, India                                                 Feb 2015 - Sep 2017
Hadoop Developer
Roles & Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development in accordance with specifications.
- Created ETL data pipelines by combining technologies such as Hive, Spark SQL, and PySpark (a sketch of this pattern follows this section).
- Created Spark programs using Scala and batch processing using functional programming techniques.
- Added data to Power BI from a range of sources, including SQL, Excel, and Oracle.
- Wrote Spark Core programs to process and clean data before loading it into Hive or HBase for further processing.
- Used data transformation tools such as DataStage, SSIS, Informatica, and DTS.
- Proficient in using UML for use cases, activity diagrams, sequence diagrams, data flow diagrams, collaboration diagrams, and class diagrams.
- In charge of building ETL pipelines with Pig and Hive to extract data from various data sources and import it into the Hadoop data lake.
- Worked with several data formats, including JSON and XML, and ran Python machine learning algorithms.
- Created reusable items such as PL/SQL program units and libraries, database functions and procedures, and database triggers that the team could use to implement business rules.
- Used SQL Server Integration Services (SSIS) to extract, manipulate, and load data from a variety of sources into the target system.
- Created data mapping, transformation, and cleaning rules for OLTP and OLAP data management.
- Used Tableau for data visualization during rapid model construction in Python; these models were then put into practice in SAS, connected to MSSQL databases, and given timely update schedules.
- Created numerous DataFrames and datasets using the Spark SQL context to pre-process the model data.
- Designed the HBase row key to store text and JSON as key values in the database and to get/scan them in sorted order.
- In charge of ETL design: identifying the source systems, designing source-to-target relationships, data cleansing, data quality, creating source specifications, and ETL design documents.
- Strong knowledge of and experience in creating Jenkins CI pipelines.
- Implemented querying using Airflow and Presto, as well as reporting in PySpark, Zeppelin, and Jupyter.
- Installed and set up Airflow for managing workflows and built workflows in Python.
Environment: Hive, Spark SQL, PySpark, Oracle, HBase, DataStage, Power BI, SSIS, Informatica, Pig, Jenkins, Airflow, Presto, Zeppelin, Jupyter.
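The Hive/Spark SQL/PySpark pipeline pattern referenced above can be sketched as follows; the HDFS path, database, and table names are illustrative assumptions, and the target Hive database is assumed to exist.

```python
# Small PySpark/Spark SQL sketch: read raw files from HDFS, clean them, and
# persist the result as a Hive-managed table. Names are placeholders only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("clean_and_load_hive")
         .enableHiveSupport()
         .getOrCreate())

raw = (spark.read
       .option("header", "true")
       .csv("hdfs:///data/raw/transactions/"))

clean = (raw
         .filter(F.col("txn_id").isNotNull())
         .withColumn("txn_ts", F.to_timestamp("txn_ts"))
         .dropDuplicates(["txn_id"]))

# Register for ad-hoc Spark SQL checks, then persist to Hive
clean.createOrReplaceTempView("transactions_clean")
spark.sql("SELECT COUNT(*) AS row_count FROM transactions_clean").show()

clean.write.mode("overwrite").saveAsTable("example_db.transactions")
```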