Candidate's Name
Cloud Data Engineer
Texas, USA | PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE

SUMMARY
- Data Engineer with 7+ years of experience in building and optimizing scalable data pipelines, warehouses, and dashboards across cloud platforms.
- Proficient in utilizing Azure services such as Azure Data Factory, Azure Databricks, Azure SQL Database, and Azure Synapse Analytics for comprehensive data engineering solutions.
- Skilled in AWS technologies including Amazon S3, Amazon Redshift, AWS Glue, and Amazon EMR for building scalable and efficient data pipelines.
- Extensive experience in Python programming for data manipulation, analysis, and machine learning model development.
- Proficient in big data technologies such as Apache Spark for distributed data processing and analytics.
- Proficient in utilizing Snowflake for cloud-based data warehousing, enabling scalable and flexible data storage and analysis.
- Hands-on experience in data warehousing concepts, ETL processes, and SQL across various databases including MySQL, PostgreSQL, and SQL Server.
- Expertise in using tools like SSIS (SQL Server Integration Services) and SSRS (SQL Server Reporting Services) for ETL and reporting tasks.
- Skilled in collaborative data science projects using Databricks notebooks and their collaboration features.
- Experienced in designing and orchestrating data pipelines using Apache Airflow for workflow management.
- Proficient in data visualization techniques using Python libraries like Matplotlib and Seaborn, and interactive tools like Power BI and Tableau.
- Practiced Agile methodologies for efficient project management and delivery.
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast joins, and effective, efficient joins and transformations applied during the ingestion process itself (see the sketch following this summary).
- Implemented Azure Data Lake Storage for efficient storage and management of big data.
- Designed and implemented data governance policies to ensure data quality and compliance.
- Automated data ingestion processes using Azure Data Factory, reducing manual effort and errors.
- Leveraged AWS Lambda for serverless computing tasks, enhancing scalability and cost-effectiveness.
- Integrated machine learning models into data pipelines for predictive analytics using Azure Machine Learning.
- Managed data pipelines for real-time analytics using Azure Stream Analytics.
- Conducted performance tuning and optimization of data processes to improve efficiency and reduce costs.
- Implemented CI/CD pipelines for automated deployment and testing of data solutions.
- Provided technical leadership and mentorship to junior team members, fostering their professional growth and development.
- Experience in using different file formats such as text files, CSV, Parquet, and JSON.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra, and HBase.
- Experienced in version control systems like Git and Bitbucket for efficient code management and collaboration in a team environment.
- Skilled in data modeling techniques to design robust and efficient databases for various applications.
- Proficient in data analysis methodologies and tools to derive actionable insights and drive informed decision-making.
- Experienced in leveraging Azure Functions for serverless computing tasks, enabling efficient and cost-effective application development.
- Implemented Snowflake's secure data-sharing feature to collaborate with external partners and securely exchange data without the need for complex data pipelines or transfers.
- Excellent communication and presentation skills, with solid experience communicating and working with various stakeholders.
- Team player who is also able to work independently with minimal supervision; innovative, efficient, good at debugging, and driven to keep pace with the latest technologies.
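Illustrative sketch (not part of the original resume): a minimal PySpark example of the partitioning and broadcast-join pattern referenced above. The dataset paths and columns (customer_id, amount, region) are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Large fact table: repartition on the join key to limit shuffle skew.
orders = spark.read.parquet("/data/orders").repartition(200, "customer_id")

# Small dimension table: broadcast it so each executor holds a local copy
# and Spark avoids a shuffle-heavy sort-merge join.
customers = spark.read.parquet("/data/customers")

enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

# Cache when several downstream aggregations reuse the enriched frame.
enriched.cache()
enriched.groupBy("region").agg(F.sum("amount").alias("total_amount")).show()
```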
TECHNICAL SKILLS
Cloud Technologies: Microsoft Azure, Amazon Web Services (AWS)
Programming Languages: Python, Spark, PL/SQL, SQL, Scala, Java, T-SQL, PowerShell scripting, JavaScript, Pig
Databases: Snowflake, AWS RDS, Oracle, MySQL, Microsoft SQL Server, PostgreSQL
NoSQL Databases: MongoDB, Hadoop HBase, Apache Cassandra
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL
Workflow Management Tools: Apache Airflow
Scalable Data Tools: Databricks, Hadoop, Hive, Apache Spark, Pig, MapReduce, Sqoop
Operating Systems: Red Hat Linux, Unix, Windows, macOS
Reporting & Visualization: Power BI, Tableau, Matplotlib
Data Formats: CSV, JSON, XML
Version Control Systems: Git, SVN
IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ

WORK EXPERIENCE

Walmart | Remote | January 2024 – Present
Data Engineer
- Designed and implemented an enterprise data integration platform using Azure Data Factory to streamline ETL processes.
- Developed complex data pipelines integrating data from Oracle DB and SQL Server into Azure Synapse Analytics, enhancing data accessibility.
- Architected and optimized an Azure Synapse data warehouse to support high-performance queries and analytics.
- Utilized Azure Databricks and Apache Spark for large-scale data processing, ensuring high performance and scalability.
- Led the design and implementation of data integration solutions using Azure Data Factory (ADF), improving data flow and transformation across multiple sources.
- Created and optimized ETL pipelines using PySpark in Azure Databricks notebooks, increasing processing efficiency and scalability.
- Utilized Spark and Python libraries for advanced data manipulation, supporting complex data analytics projects.
- Implemented scalable and high-performance data models in Cosmos DB for various applications, optimizing read and write operations.
- Managed deployment and version control using Azure DevOps and Databricks Repos, ensuring consistent and reliable application delivery.
- Implemented CI/CD pipelines, reducing deployment times and improving system reliability.
- Handled project management tasks and scheduling using JIRA, enhancing collaboration and meeting project milestones.
- Developed Databricks PySpark notebooks to ingest semi-structured data (XML, JSON, pipe-delimited, YAML) into Databricks Delta Lake (see the sketch after this section).
- Used Azure Data Explorer for real-time analytics on large data sets, improving operational decision-making.
- Managed the migration of legacy SSIS packages to Azure, minimizing downtime and ensuring continuity of data integration.
- Managed IAM policies and Azure service principals to secure access and automate authentication processes across Azure services.
- Enhanced reporting capabilities with SQL Server Reporting Services (SSRS) and implemented multidimensional data models using SQL Server Analysis Services (SSAS).
- Utilized ADF's integration runtime to connect and migrate data from on-premises SQL Server to Azure, maintaining data integrity and transition accuracy.
- Implemented parameterized pipelines in ADF for dynamic data loading, replacing static SSIS packages and enhancing operational flexibility.
- Maintained bronze, silver, and gold data layers for efficient data curation and implemented regular data updates to keep systems current and accurate.
- Integrated Azure Synapse Analytics with Power BI to develop comprehensive dashboards and reports, enabling real-time business intelligence and actionable insights.
- Utilized Power BI to create dynamic visualizations and data models, significantly improving data presentation and stakeholder understanding of complex datasets.
- Enhanced data exploration and ad-hoc analysis capabilities by connecting Power BI directly to Azure Synapse Analytics, facilitating seamless data queries and visual analytics.
Environment: Azure Data Factory, Azure Synapse Analytics, Oracle DB, SQL Server, Azure Databricks, Apache Spark, PySpark, Cosmos DB, Azure DevOps, Databricks Repos, JIRA, Databricks Delta Lake, Azure Data Explorer, SSIS, Azure service principals, SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), Power BI, Python, ADF integration runtime.
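Illustrative sketch (not part of the original resume): a minimal Databricks-style PySpark example of landing semi-structured JSON in a Delta Lake bronze table, as described in the Walmart section above. Paths, column names, and the partition column are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delta-ingest-sketch").getOrCreate()

# Read raw semi-structured JSON; multiLine handles records spanning lines.
raw = (spark.read
       .option("multiLine", "true")
       .json("/mnt/landing/events/*.json"))

# Light standardization before landing in the bronze layer:
# derive a partition date and drop duplicate events.
bronze = (raw
          .withColumn("event_date", F.to_date("event_ts"))
          .dropDuplicates(["event_id"]))

# Append into a Delta table; Delta provides ACID writes and time travel.
(bronze.write
 .format("delta")
 .mode("append")
 .partitionBy("event_date")
 .save("/mnt/bronze/events"))
```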
CSG Systems International | Bengaluru, Karnataka, India | August 2021 – December 2022
Information Specialist II
- Increased data processing efficiency by 66% through pipelines built with Databricks and Azure Data Factory using Apache Spark and Python, integrating diverse sources into the Snowflake data warehouse.
- Enhanced data transformation and analysis efficiency by 30% with complex SQL queries across Microsoft SQL Server, PostgreSQL, and Azure SQL DB.
- Implemented Azure Data Lake Storage Gen2 and Blob Storage for unified data storage, optimizing scalability and reducing costs by 30% through efficient data lifecycle management.
- Implemented data workflows on Azure Synapse Analytics for large-scale data processing and warehousing.
- Designed dashboards with Power BI, Tableau, and SSRS, achieving a 25% enhancement in data visualization and analytics.
- Authored PySpark transformations, implementing data aggregation, optimization, and enrichment methods for a 50% efficiency boost.
- Utilized Cosmos DB's multi-model capabilities, including the SQL, MongoDB, and Gremlin APIs, to support diverse data access patterns.
- Developed Azure Data Factory pipelines to move files between on-prem file servers and the database, and to move files and Databricks Delta table data to on-prem systems and Azure SQL DB.
- Streamlined ETL processes, reporting, and analysis using SSIS, SSRS, and SSAS, improving efficiency by 40%.
- Created DAX measures and calculated columns in Power BI, enhancing data model performance by 30%.
- Utilized Scala's features to write efficient, high-performance data processing applications.
- Collaborated effectively in Agile development, optimizing project delivery and development processes.
- Leveraged Python and Spark for data processing tasks, implementing complex transformations and analytics.
- Designed and optimized SQL queries for data analysis and reporting across PostgreSQL and SQL Server databases.
- Automated data integration processes using SSIS, reducing manual effort and improving data accuracy.
- Utilized Databricks notebooks for collaborative data analysis and model development, leveraging their version control and collaboration features.
- Orchestrated data workflows using Apache Airflow, scheduling and monitoring tasks for efficient pipeline execution (see the sketch after this section).
- Developed scripts and applications in Python and Java to automate data workflows and processes.
- Created visualizations using Python libraries like Matplotlib and Seaborn to communicate insights effectively.
- Implemented Agile methodologies for project management, participating in sprint planning and retrospective meetings.
- Developed and implemented robust data models using relational and dimensional modeling techniques to meet complex business requirements.
- Designed and normalized database schemas, ensuring scalability, performance, and data integrity across multiple business units.
- Implemented Snowflake's time-travel feature to easily recover and access historical data, providing valuable insights for trend analysis and compliance purposes.
- Collaborated with cross-functional teams to design and implement data solutions meeting business requirements.
- Implemented data monitoring and alerting systems to ensure data quality and integrity, proactively identifying issues and anomalies.
- Implemented Flask-based web applications for data visualization and reporting, providing intuitive user interfaces for stakeholders.
- Utilized Confluence for project documentation and knowledge sharing, ensuring alignment and transparency across teams.
- Implemented Azure Functions for serverless data processing tasks, improving efficiency and reducing operational overhead.
- Leveraged Power BI DAX for advanced data analysis and modeling, enabling deeper insights and actionable intelligence.
- Utilized Snowflake's multi-cluster warehouses to parallelize data processing and maximize throughput, enabling faster query execution and analytics.
- Implemented Parquet's columnar storage format in big data environments, facilitating faster analytical queries and data processing.
Environment: Scala, Databricks, Azure Data Factory, Apache Spark, Python, Snowflake, SQL Server, PostgreSQL, Azure SQL Database, Azure Data Lake Storage Gen2, Blob Storage, Azure Synapse Analytics, Power BI, Tableau, SSRS, PySpark, Cosmos DB, SSIS, SSAS, DAX, Agile development, Apache Airflow, Java, Matplotlib, Seaborn, Flask, Confluence, Azure Functions, Parquet.
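Illustrative sketch (not part of the original resume): a minimal Apache Airflow DAG of the extract-transform-load orchestration described in the CSG section above. The DAG id, schedule, and task callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from source systems")

def transform():
    print("run PySpark/SQL transformations")

def load():
    print("load curated data into the warehouse")

with DAG(
    dag_id="daily_warehouse_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```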
Cognizant Technology Solutions | Chennai, Tamil Nadu, India | December 2018 – July 2021
Associate
- Designed and implemented a robust data ingestion and processing pipeline using Apache Spark and Apache NiFi on AWS EMR clusters, optimizing for performance and scalability.
- Utilized Python for data transformation and automation scripts, enhancing data processing efficiency and reliability.
- Designed and implemented complex data transformations and aggregations in Spark SQL and the DataFrame API within Databricks notebooks.
- Configured and managed AWS Redshift for data warehousing, achieving optimized query performance and data storage solutions.
- Developed real-time data streaming solutions integrating AWS services with Apache Kafka, ensuring seamless data flow and immediate availability for analytics (see the sketch after this section).
- Automated data pipeline workflows using Apache Airflow, significantly reducing manual oversight and improving operational efficiency.
- Engineered data storage solutions using AWS S3 and Cassandra, ensuring high availability and disaster recovery readiness.
- Developed and optimized data pipelines using Databricks and PySpark, leveraging distributed computing capabilities for efficient processing of large datasets.
- Utilized Amazon Athena to perform ad-hoc queries on large datasets stored in Amazon S3, enabling quick and cost-effective data analysis without the need for complex ETL processes.
- Implemented a comprehensive data lake architecture using AWS S3, facilitating centralized storage of structured and unstructured data.
- Leveraged Kubernetes to orchestrate containerized Python applications, enhancing scalability across AWS instances.
- Established a DevOps CI/CD pipeline using Jenkins and GitLab, significantly reducing deployment times and human errors.
- Enhanced streaming data processing using Apache Spark Structured Streaming, implementing stateful computations for time-sensitive data streams.
- Optimized batch data processing within Apache Spark, improving ETL jobs by managing resource allocation and job partitioning.
- Integrated Power BI with AWS data sources (e.g., Amazon Redshift, Amazon RDS, Amazon S3) to create interactive dashboards and reports for business intelligence.
- Developed custom ETL scripts using AWS Glue's built-in Python and Scala libraries, optimizing data transformation processes for performance and scalability.
- Developed and maintained scalable and efficient data pipelines to populate and update Delta tables, facilitating real-time analytics and data versioning for rollback and audit purposes.
Environment: AWS (EMR, Redshift, S3, Glue, Athena), Apache Spark, Databricks, Apache NiFi, Apache Kafka, Apache Airflow, Python, Kubernetes, Jenkins, GitLab, Cassandra, Delta tables.
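Illustrative sketch (not part of the original resume): a minimal PySpark Structured Streaming example of the Kafka-based real-time pipeline described in the Cognizant section above. Broker address, topic, schema, and output paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical schema for the JSON payload carried in each Kafka message.
schema = (StructType()
          .add("event_id", StringType())
          .add("amount", DoubleType())
          .add("event_ts", StringType()))

# Read the raw Kafka stream; the value column arrives as bytes.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to a Delta path; the checkpoint directory lets
# the query recover exactly where it left off after a restart.
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/chk/orders")
         .outputMode("append")
         .start("/delta/orders"))
```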
HTC Global Services India Private Ltd | India | June 2016 – November 2018
Data Engineer
- Implemented a multi-tier data processing system utilizing Apache Spark and Apache NiFi, integrating Snowflake to leverage cloud-native elastic scaling for real-time and batch data workloads.
- Implemented Python automation scripts for dynamic schema management in Snowflake, drastically reducing manual configuration and accelerating data pipeline modifications.
- Expanded data warehousing capabilities using AWS Redshift, integrating with Snowflake for cross-platform data consistency checks and synchronization tasks.
- Developed complex Apache Airflow workflows to automate data transformations across Snowflake and AWS Redshift, ensuring data integrity and timeliness.
- Orchestrated data replication and recovery strategies between AWS S3 and Snowflake, ensuring business continuity through fault-tolerant design.
- Designed and optimized SQL queries in AWS Athena to analyze structured and semi-structured data, including JSON, CSV, and Parquet formats, achieving significant performance improvements.
- Optimized EMR configurations to handle large-scale data transformations, employing best practices in data partitioning and parallel processing.
- Utilized Databricks Delta Lake for managing structured and semi-structured data, ensuring data reliability, versioning, and ACID transactions.
- Deployed Kubernetes to manage containerized data applications, optimizing resource use across cloud environments and improving deployment cycles.
- Implemented a DevOps CI/CD pipeline integrating Docker, Jenkins, and GitHub Actions for continuous integration and deployment of applications and data pipelines, enhancing code quality and collaboration.
- Designed and implemented complex data transformations and aggregations using the PySpark DataFrame and SQL APIs within Databricks notebooks.
- Developed Power BI data models using Power Query Editor and DAX (Data Analysis Expressions) for efficient data transformation and calculation operations.
- Managed data storage costs, achieving a 25% reduction through efficient partitioning and compression techniques in AWS Glue and Redshift, demonstrating cost awareness and efficient resource utilization.
- Developed a Spark-based batch processing system to aggregate and analyze time-series data from multiple sources, employing advanced data windowing techniques for performance optimization.
- Implemented a real-time anomaly detection system in Spark Streaming, integrating with Snowflake to immediately identify and respond to critical events in data streams.
- Enhanced Apache Spark job performance by optimizing data serialization and compression techniques, significantly speeding up data processing tasks.
- Automated ETL processes using AWS Lambda functions triggered by Amazon S3 events, transforming and loading data into Amazon Redshift and other data stores (see the sketch after this section).
- Automated the scaling of Snowflake compute resources using Apache Airflow, tailoring resource use based on workload demands.
Environment: AWS (Redshift, S3, EMR, Glue, RDS, Athena), Apache Spark, Apache NiFi, Snowflake, Databricks, Python, Apache Airflow, Kubernetes, Docker, Jenkins, Power BI, GitHub Actions.
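Illustrative sketch (not part of the original resume): a minimal Python Lambda handler for the S3-event-to-Redshift pattern described in the HTC section above, using the Redshift Data API. The cluster identifier, database, user, IAM role ARN, and target table are hypothetical.

```python
import boto3

# Redshift Data API client; lets Lambda run SQL without managing connections.
client = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Pull the bucket and key of the object that fired the S3 event.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    copy_sql = (
        f"COPY analytics.raw_events "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
        f"FORMAT AS PARQUET;"
    )

    # Submit the COPY asynchronously; Redshift loads the new object.
    client.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="prod",
        DbUser="etl_user",
        Sql=copy_sql,
    )
```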