Candidate's Name
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
Sr. Data Engineer

PROFESSIONAL SUMMARY
- Over 6 years of IT experience analyzing, developing, and implementing Data Warehousing and system integration solutions.
- Worked across all phases of the Software Development Life Cycle (SDLC), from requirements gathering through development, unit testing, UAT, and production deployment.
- Professional experience with Azure cloud services, including Azure Data Lake, Azure SQL Database, Azure Data Factory, Azure Storage, Azure Synapse, Azure Event Hub, Azure Logic Apps, and Azure Databricks, as well as migrating on-premises data lakes to Azure Data Lake.
- Experience creating Spark applications on Databricks with PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats, uncovering insights into customer usage patterns (a minimal sketch follows this summary).
- Extensive experience with DataStage, including migrating data from DataStage to modern data platforms such as Databricks and Azure Data Factory.
- Proficient in implementing Continuous Integration/Continuous Deployment (CI/CD) processes using GitHub and other tools, ensuring streamlined and automated deployments.
- Extensively worked on DataStage Job Sequences to control job flow with activities and triggers (conditional and unconditional) such as Job Activity, Email Notification, Sequencer, Routine Activity, and Exec Command.
- Experienced in Unix shell scripting for triggering DataStage jobs, automation, file manipulation, count matching, scheduling, and text processing.
- Working experience designing and implementing end-to-end Hadoop infrastructure using MapReduce, Hive, Pig, Sqoop, Oozie, Flume, Spark, HBase, and ZooKeeper.
- Extensively worked on Spark with Python and Scala for cluster-based analytics, installed on top of Hadoop.
- Expert in PySpark, Python, PL/SQL, and data design techniques, with experience working across large environments and multiple operating systems.
- Hands-on experience automating Sqoop incremental imports and scheduling jobs with Oozie.
- Excellent understanding of Hadoop architecture and ecosystem components such as HDFS, dbt (data build tool), JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Extensive experience using SQL and PL/SQL to write stored procedures, functions, packages, snapshots, and triggers, with query optimization against Oracle, DB2, and MySQL databases.
- Knowledge of building and architecting data pipelines, including end-to-end ETL and ELT processes for data import and transformation in Azure.
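As a brief illustration of the multi-format extraction and aggregation described above, the following PySpark sketch reads several file formats, aligns their schemas, and aggregates usage per customer. The paths, column names, and aggregation are hypothetical placeholders, not details of any project listed in this resume.

```python
# Minimal PySpark sketch of multi-format extraction and aggregation on Databricks.
# Paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Extract from multiple file formats (CSV, JSON, Parquet).
csv_df = spark.read.option("header", "true").csv("/mnt/raw/events_csv/")
json_df = spark.read.json("/mnt/raw/events_json/")
parquet_df = spark.read.parquet("/mnt/raw/events_parquet/")

# Align the sources on a common set of columns and union them.
events = (
    csv_df.select("customer_id", "event_type", "event_ts")
    .unionByName(json_df.select("customer_id", "event_type", "event_ts"))
    .unionByName(parquet_df.select("customer_id", "event_type", "event_ts"))
)

# Aggregate usage per customer and event type.
usage = events.groupBy("customer_id", "event_type").agg(
    F.count("*").alias("event_count"),
    F.max("event_ts").alias("last_seen"),
)

usage.write.mode("overwrite").parquet("/mnt/curated/customer_usage/")
```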
TECHNICAL SKILLS
Languages: Python, R, C/C++, HTML/CSS (CSS3)
Frameworks: Apache Spark, Apache Hive, TensorFlow, PyTorch, scikit-learn, FlutterFlow, MySQL, MongoDB, Firebase
Tools: Azure cloud services (Azure SQL DB, Key Vault, Blob Storage, ADF, SQL Server, ADLS, Storage accounts, Azure Databricks, Synapse Analytics), AWS (SageMaker, EC2, S3, Lambda, CloudFormation), Hortonworks Data Platform, GCP, Tidal, Talend, SSIS, SSAS, SSRS, Jira, Confluence, DbVisualizer, WinSCP, PuTTY, DBeaver, MS Storage Explorer, MS SharePoint Designer, Postman, Heroku, NoSQL, Docker, Git/GitHub, Kubernetes, FastAPI, Flask, Airflow, Terraform, Jenkins, GitHub Actions, Unix shell, Microsoft Office
Skills: Backend development, Agile, Scrum, open source, unit testing, statistics, risk management and analysis, data science, Software Development Life Cycle, enterprise data, machine learning, deep learning, predictive analytics, data collection and preprocessing, exploratory data analysis, model optimization, CI/CD, containerization, ML infrastructure, API development

WORK EXPERIENCE

Senior Software Developer, Wipro Technologies Ltd, Bangalore, KA | July 2019 - August 2022
- Assessed and translated business requirements into data warehouse solutions; revamped data models and developed data pipelines in Azure Data Factory integrated with Snowflake, sourcing data from 15+ systems to meet specific business needs.
- Built data pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load (ETL) data from diverse sources, including Azure SQL, Blob Storage, and Azure Synapse Data Warehouse.
- Implemented a data quality framework using Azure SQL, Apache Spark, and computational methods, leading to 25% fewer production defects and consolidated KPI reporting.
- Directed financial analysis and reporting initiatives using Apache Hive, optimizing pricing strategies and increasing profit margins by 12% within the first year of the project.
- Created mounts, service principals, and key vaults for ADF/Databricks so that PySpark scripts could connect to Azure Key Vault.
- Created and configured Spark clusters in Azure Databricks to speed up preparation of high-quality data.
- Used Spark SQL on Databricks to merge incremental data into Delta tables (a minimal sketch follows this role).
- Created Hive tables, loaded data into them, and wrote Hive queries.
- Configured Continuous Integration (CI) and Continuous Delivery (CD) processes using Harness, with shell scripts to automate routine jobs.
- Deployed and optimized machine learning pipelines using Spark and Databricks with CI/CD tools such as Jenkins and GitHub Actions.
- Utilized Docker and Kubernetes for containerization and deployment of machine learning models.
- Managed and orchestrated workflows using Airflow on Kubernetes (EKS) and Databricks.
- Managed DEV, SIT, QA, and UAT environments for various releases and designed instance strategies.
- Led a cross-functional team to identify and implement data integration strategies, producing consolidated KPI reports that provided actionable insights to stakeholders and improved business performance with a 40% reduction in manual effort.
- Managed a cross-functional team of 10 across 3 locations (Bangalore, Pune, and Beijing), ranging from entry-level to senior analysts, and collaborated closely with business development, data analysis, operations, and marketing teams.
- Provided critical production support and executed enhancements for ongoing projects; contributed to seamless integration, conducted thorough end-user testing, and operated within an Agile development framework.
Technologies: Azure, SQL, Apache Spark, Hortonworks, Hive, data modeling, stored procedures, Azure data pipelines, DevOps, Azure cloud administration, shell scripting, Linux, Azure Data Factory, Databricks, Spark clusters, Agile development, linked services, Blob Storage, Azure Synapse Data Warehouse, Snowflake
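In rough outline, the incremental Delta table merge mentioned in this role could look like the sketch below; the table names, join key, and source path are hypothetical placeholders rather than the actual project schema.

```python
# Minimal sketch: merge incremental data into a Delta table with Spark SQL on Databricks.
# Table names, the join key, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks this returns the provided session

# Stage the incremental batch and expose it to Spark SQL.
incremental = spark.read.parquet("/mnt/staging/orders_incremental/")
incremental.createOrReplaceTempView("orders_incremental")

# Upsert: update rows that match on the key, insert the rest.
spark.sql("""
    MERGE INTO curated.orders AS target
    USING orders_incremental AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```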
Software Developer, Wipro Technologies Ltd, Bangalore, KA | June 2018 - July 2019
- Created robust PySpark scripts to process streaming data ingested from data lakes using Spark Streaming, handling over 10 TB of data monthly.
- Designed and implemented data processing pipelines with PySpark for reading, merging, enriching, and loading data into data warehouses.
- Developed PySpark applications in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats, leading to a 30% increase in data processing efficiency.
- Developed and managed data processing workflows within the Azure ecosystem, leveraging Azure Data Lake, Azure SQL, and Azure Synapse to support large-scale data operations.
- Worked extensively with SQL Server to manage and query large datasets, optimize queries for performance, and integrate SQL Server with cloud-based data solutions.
- Spearheaded the migration of complex ETL processes from DataStage to Azure Data Factory and Databricks, ensuring data integrity and performance during the transition.
- Built robust pipelines in Azure Data Factory to transfer data from on-premises systems to Azure SQL Data Warehouse, enabling faster data availability.
- Developed an automated process using Azure Functions to ingest daily data from web services into Azure SQL DB, reducing manual data handling by 50%.
- Implemented Spark optimizations for efficient processing in Azure Databricks.
- Utilized managed clusters in Databricks, cutting cloud computing costs through efficient resource management.
- Consumed real-time events from Kafka streams and persisted them as Parquet on HDFS, enabling near real-time analytics with a 35% latency reduction (a minimal sketch follows this role).
- Extended PySpark DataFrames with UDFs to perform custom transformations, enhancing data processing flexibility.
- Collected and aggregated large volumes of log data using Kafka, staging it in HDFS for further analysis and improving log processing efficiency.
- Created interactive reports and dashboards with Power BI, incorporating customized parameters to produce actionable insights.
Technologies: Python, SQL, Apache Spark, PySpark, Spark Streaming, Spark SQL, Hadoop Distributed File System (HDFS), Azure Databricks, Azure Data Factory, Apache Kafka, Hive, Azure Data Warehouse, Azure SQL DB, Azure Functions, Parquet, Power BI, Git
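One common way to shape the Kafka-to-HDFS ingestion described in this role is with Spark Structured Streaming, sketched below. The broker address, topic name, and paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is available on the cluster.

```python
# Minimal sketch: consume Kafka events and persist them as Parquet on HDFS
# with Spark Structured Streaming. Broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read the Kafka topic as a streaming DataFrame and decode key/value.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
)

# Persist micro-batches as Parquet on HDFS, checkpointing for fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/raw/events/")
    .option("checkpointLocation", "hdfs:///checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```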
ETL Developer Intern, Welfare Infotech Pvt Ltd, Raipur, CG | Mar 2016 - May 2018
- Designed and implemented a terabyte-scale data warehouse on Redshift from inception to completion, supporting management of millions of records across the entire data lifecycle.
- Gained data integration and data warehousing experience with multiple ETL tools, including Informatica PowerCenter, SQL Server Integration Services (SSIS), and Talend.
- Developed Oozie workflows tailored to business specifications, optimizing data extraction processes via Sqoop.
- Used Hive SQL, Presto SQL, and Spark SQL for ETL tasks, applying the appropriate engine to each workload.
- Loaded and transformed large sets of structured and semi-structured data using Talend ingestion tools.
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
- Developed PL/SQL procedures and packages to kick off SQL*Loader control files and load data into Oracle.
- Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
- Optimized Apache MapReduce (MR) job configurations to increase parallelism and meet project requirements.
- Worked on application development in the Linux environment and its command-line tooling.
- Leveraged FastAPI and Flask for API development to support machine learning models.
- Participated in weekly code review meetings with the technical teams.
- Implemented data preprocessing pipelines for machine learning tasks using TensorFlow and PyTorch.
- Conducted regular consultations with internal teams to provide guidance on best practices in ML and DevOps.
Technologies: Redshift, Oozie, Sqoop, Hive, SQL, Spark, Presto, Flume, Pig, HBase, PL/SQL, Oracle, PySpark, Linux, Unix, Informatica, SSIS, Talend

EDUCATION
Master's in Computer Science, Portland State University, OR, US (GPA 3.82)
Bachelor of Technology in Computer Science and Engineering, National Institute of Technology Raipur, India