Candidate's Name
Lead Data Engineer
Email ID: EMAIL AVAILABLE
Contact Details: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

PROFESSIONAL SUMMARY
Lead Data Engineer with over 10 years of experience specializing in the design and deployment of scalable data infrastructure across the GCP, AWS, and Azure cloud platforms.
Proficient in managing and optimizing BigQuery, Cloud SQL, and GCS for high-performance data analytics and storage solutions.
Demonstrated expertise in GCP Dataproc and Azure Data Factory, streamlining data processing and ETL workflows.
Advanced skills in Informatica PowerCenter and Talend for Big Data, enhancing data integration and quality across diverse data sources.
Extensive hands-on experience in GCP development and migration, specializing in BigQuery, Looker, and Snowflake.
Highly proficient in SQL and data analysis, with a proven track record of handling large-scale data processing using tools such as Hive, PySpark, and Kafka.
Proficient in Python, Scala, Java, and PowerShell, developing robust, scalable data processing scripts and applications.
Experienced in Airflow and Cloud Composer for orchestrating complex data workflows, ensuring efficient and reliable data operations.
Leveraged Power BI, Tableau, and SSRS for dynamic data visualization and reporting, driving insights and decision-making.
Proficient with modeling and mapping tools such as Visio and Erwin, and adept with the Microsoft Office Suite, including Word, Excel, PowerPoint, and MS Project.
Strong problem-solving skills with a focus on optimizing data processes and ensuring data integrity.
Expertise in database management using Teradata, Cloud Spanner, and SQL Server, ensuring optimal performance and data integrity.
Advanced knowledge of SAS analytics, employing statistical methods to analyze and derive valuable insights from large datasets.
Designed and executed data migrations and transformations using AWS Data Pipeline, GCP Dataflow, and Azure Synapse Analytics, enhancing data agility and accessibility.
Managed VM instances, EC2, RDS, and DynamoDB for cloud computing and database services, optimizing resource allocation and performance.
Proven ability to manage and model complex data structures using various technologies.
Eager to learn and adapt to new technologies such as NoSQL databases (e.g., MongoDB) to enhance data storage and retrieval capabilities.
Developed and maintained secure data exchange protocols using SFTP, ensuring the integrity and confidentiality of data transfers.
Skilled in Azure Analysis Services and Azure SQL for cloud-based analytics and database services, supporting advanced analytics initiatives.
Developed and maintained Jupyter Notebooks for data analysis, visualization, and machine learning tasks.
Managed tasks and sprint planning using Jira, ensuring timely completion of deliverables and tracking of project milestones.
Utilized Apache Beam and Spark Streaming for real-time data processing, enabling timely insights and responses to dynamic data streams.
Expertise in SSIS, SSDT, and BIDS for SQL Server data integration and business intelligence development, enhancing data-driven strategies.
Proficient in Salesforce SOQL and Dell Boomi, integrating CRM data and cloud data integration platforms for comprehensive data insights.

TECHNICAL SKILLS:
Big Data Technologies: Hadoop, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, ZooKeeper, Ambari, Oozie, MongoDB, Cassandra, Mahout, Puppet, Avro, Parquet, Snappy, Falcon, HDFS, DataFrames, RDDs
NoSQL Databases: Postgres, HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, MapR, Apache
Programming Languages: Scala, Python, Java, SQL, PL/SQL, HiveQL, Unix Shell Scripting
Cloud Computing Tools: GCP (BigQuery, GCS, Dataflow, Apache Beam, Cloud SQL, Cloud Composer, Cloud Functions, Cloud Storage, Looker); AWS (S3, EMR, EC2, Lambda, VPC, CloudWatch, CloudFront, AWS Glue, Step Functions); Microsoft Azure (Data Lake, Azure Data Factory, Azure Synapse Analytics, Blob Storage, Azure SQL, Azure Analysis Services)
Databases & Warehousing: MySQL, DB2, Cloud SQL, BigQuery, Snowflake, Teradata, SQL Server (2012, 2016, SSDT 2012 & 2015), Oracle (10g/9i), Netezza Mako 7.2, DynamoDB, PL/SQL Developer
Business Intelligence Tools: Tableau, Power BI, Looker
Data Integration & ETL Tools: Talend, Pentaho, Informatica, Ab Initio, SSIS, IBM DataStage, Jupyter Notebooks
Development Methodologies: Agile, Scrum, Waterfall, Jira

EMPLOYMENT BACKGROUND

HCA Healthcare, Nashville, Tennessee  July 2023 to Present
Data Engineer / Big Data - Lead
Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, coordinating tasks across the team.
Set up GCP firewall rules to control ingress and egress traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
Designed and implemented a real-time data processing pipeline using Kafka to ingest and process streaming data from various healthcare systems.
Configured Kafka topics, producers, and consumers to ensure efficient data flow and minimal latency.
Integrated Snowflake with Kafka for real-time data ingestion and with BigQuery for advanced analytics and reporting.
Developed and deployed interactive dashboards and reports using Looker to provide business insights and drive data-driven decision-making.
Integrated Looker with BigQuery to create robust data models and explore datasets effectively.
Developed and deployed solutions using Spark and Scala code on the Hadoop cluster running on GCP.
Designed and implemented the various layers of the data lake and designed star schemas in BigQuery.
Used Google Cloud Functions with Python to load arriving CSV files from the GCS bucket into BigQuery (see the sketch after this section).
Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
Developed Spark applications in Scala and Java and implemented an Apache Spark data processing project to handle data from RDBMS sources.
Used Scala components to implement the credit line policy based on conditions applied to Spark DataFrames.
Designed and created datasets in BigQuery to facilitate efficient querying and analysis.
Worked with Secure File Transfer Protocol (SFTP) to transfer data between data sources.
Participated in migrating the on-premises Hadoop system to GCP (Google Cloud Platform).
Migrated previously written cron jobs to Airflow/Cloud Composer in GCP.
Created BigQuery authorized views for row-level security and for exposing data to other teams.
Extensive experience in IT data analytics projects, with hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
Designed and implemented a data pipeline to extract and transform customer data from various sources.
Utilized data modeling techniques to structure the data for efficient storage and retrieval in a relational database.
Designed and implemented ETL processes using Informatica to extract, transform, and load data from on-premises sources to the cloud.
Designed pipelines with Apache Beam and Dataflow and orchestrated the jobs in GCP.
Developed and demonstrated a POC to migrate on-premises workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
Documented the inventory of modules, infrastructure, storage, and components of the existing on-premises data warehouse to analyze and identify the technologies and strategies required for the Google Cloud migration.
Designed and implemented data models and schemas using Erwin and Visio to support data warehousing and business intelligence solutions.
Generated reports and dashboards using Excel and PowerPoint to present data insights to business users.
Utilized the Microsoft Office Suite to document data processes, create project plans, and present findings to stakeholders.
Monitored and maintained data warehouse performance, ensuring optimal data retrieval and storage.
Worked closely with stakeholders to understand their needs and deliver actionable insights through customized Power BI reports.
Utilized PowerShell scripting to automate various aspects of the cloud migration process, including VM provisioning, configuration management, and data transfer tasks.
Developed custom PowerShell modules and cmdlets to streamline repetitive tasks and enhance operational efficiency during the migration project.
Provided training and support to team members on using Jupyter Notebooks and associated tools.
Developed and maintained reusable custom Jira workflows, scripts, and plugins to support business processes, enhance productivity, and streamline team operations.
Development Stack: GCP, Cloud SQL, BigQuery, Kafka, Looker, Cloud Dataproc, GCS, Cloud Composer, Informatica PowerCenter, Scala, SFTP, PowerShell, Talend for Big Data, Power BI, Jira, Airflow, Hadoop, Hive, Teradata, SAS, Spark, Snowflake, Python, Java, SQL Server.
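As an illustration of the GCS-to-BigQuery loading pattern described above, the following is a minimal sketch of a first-generation, GCS-triggered Cloud Function that loads a newly arrived CSV file into a BigQuery table. The dataset and table names are hypothetical placeholders, and schema autodetection is assumed rather than taken from the original project.

```python
# Hypothetical sketch of a GCS-triggered Cloud Function (1st gen) that loads
# a newly arrived CSV object into BigQuery. Bucket, dataset, and table names
# are illustrative placeholders, not the actual project configuration.
from google.cloud import bigquery

bq_client = bigquery.Client()
TARGET_TABLE = "analytics_dataset.raw_arrivals"  # placeholder dataset.table

def load_csv_to_bigquery(event, context):
    """Triggered by a google.storage.object.finalize event on the bucket."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # assume a header row
        autodetect=True,              # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = bq_client.load_table_from_uri(uri, TARGET_TABLE, job_config=job_config)
    load_job.result()  # block until the load finishes so failures surface in the logs
```

Because `result()` waits for the load job, any load failure is raised inside the function and appears directly in Cloud Functions logs.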
Macy's, New York, NY  May 2022 to June 2023
Senior/Lead Data Engineer
Designed and deployed scalable data processing pipelines using GCP Dataflow and PySpark, enhancing data analysis capabilities.
Utilized SAS and Hive for advanced analytics and reporting, providing insights into customer behavior and sales trends.
Integrated Sqoop and Teradata for efficient data transfer and warehousing, optimizing data storage and accessibility.
Managed GCP Dataproc and BigQuery environments to support high-volume data analytics, improving decision-making processes.
Implemented data pipelines using Kafka to stream data into Snowflake in real time, ensuring up-to-date customer profiles.
Used Looker to build interactive, real-time dashboards and reports on top of Snowflake, providing deep insights into customer behavior, purchase patterns, and preferences.
Implemented PowerShell scripts to automate data pipeline workflows and scheduled ETL jobs using Apache Airflow on GCP (see the sketch after this section).
Integrated PowerShell scripts with Apache Airflow DAGs to trigger data processing tasks, monitor job status, and handle error scenarios.
Applied complex data structure, data transformation, and data preparation skills that transfer readily to NoSQL databases such as MongoDB.
Conducted performance tuning on Looker queries to ensure quick and efficient data retrieval from BigQuery.
Enabled real-time monitoring of key performance metrics, leading to improved customer engagement strategies.
Implemented Hadoop ecosystem components for distributed data processing, increasing data processing efficiency and scalability.
Orchestrated data storage solutions using GCS, ensuring data security and high availability for retail operations.
Developed and maintained data models in Python and Snowflake, enabling sophisticated data analysis and reporting.
Integrated data from multiple sources, including Snowflake and BigQuery, into Power BI to provide a unified view of retail operations.
Used Power BI to create visually compelling, interactive dashboards and reports that enhanced data visualization and enabled data-driven decision-making across the organization.
Utilized Dataflow and SQL databases for real-time data processing and analysis, supporting operational and strategic needs.
Maintained data models using dbt, including dimensional modeling for data warehouses.
Implemented complex data transformations and business logic within dbt models using SQL.
Orchestrated data pipelines and workflows using dbt scheduling, version control, and dependency management features.
Managed and optimized BigQuery for complex queries and analytics, delivering timely insights into inventory and sales performance.
Automated data integration processes with GCP Dataprep, improving data quality and preparation efficiency.
Orchestrated complex data workflows using Cloud Composer, facilitating seamless data operations across multiple sources.
Implemented Cloud Pub/Sub for event-driven data integration, enhancing real-time data availability and analysis.
Utilized Cloud Storage Transfer Service for efficient data migration, ensuring data consistency and reliability.
Implemented globally distributed databases using Cloud Spanner, supporting high-throughput, scalable applications.
Administered relational and non-relational databases using Cloud SQL, ensuring optimized data storage and retrieval.
Cataloged and managed data assets with Data Catalog, improving data governance and compliance.
Deployed Databricks on GCP for scalable data analytics and machine learning, driving innovation in retail analytics.
Analyzed customer interaction data to tailor marketing strategies and enhance customer engagement and loyalty programs.
Optimized supply chain analytics using BigQuery, reducing operational costs and improving inventory management.
Designed and implemented custom Jira dashboards and reports to provide visibility into project status and metrics.
Assisted in the migration of Jira projects and data during company mergers and acquisitions.
Development Stack: GCP, PySpark, SAS, Hive, Looker, Sqoop, Kafka, PowerShell, Teradata, dbt, Hadoop, GCS, Python, Snowflake, Power BI, SQL Database, BigQuery, GCP Dataprep, GCP Dataflow, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Jira, Data Catalog, GCP Databricks.
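A hedged sketch of the Airflow-plus-PowerShell scheduling pattern described in this role: an Airflow 2.x DAG whose task shells out to a PowerShell script through BashOperator. The DAG id, schedule, and script path are illustrative assumptions rather than the actual project configuration.

```python
# Illustrative Airflow 2.x DAG: schedules a nightly ETL step that runs a
# PowerShell script. The dag_id, schedule, and script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_powershell_etl",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",        # run daily at 02:00
    catchup=False,
) as dag:
    run_etl_script = BashOperator(
        task_id="run_etl_script",
        # pwsh must be available on the worker; the script path is a placeholder
        bash_command="pwsh -File /opt/etl/scripts/load_sales.ps1",
        retries=2,                        # simple retry on transient failures
    )
```

The same pattern extends naturally to status monitoring and error handling, since Airflow's retries, SLAs, and failure callbacks wrap the PowerShell step without any change to the script itself.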
Homesite Insurance, Boston, MA  November 2019 to March 2022
Data Engineer
Developed scalable data pipelines using GCP Dataflow and Apache Beam, significantly improving data processing efficiency.
Managed and optimized BigQuery for advanced data analytics, supporting decision-making with actionable insights.
Implemented secure data storage solutions using GCS buckets, ensuring data integrity and accessibility.
Automated data integration processes with Google Cloud Functions, enhancing system responsiveness and efficiency.
Utilized Cloud Shell and gsutil for efficient management of cloud resources, streamlining administrative tasks.
Leveraged the bq command-line utility to manage BigQuery resources effectively, optimizing query performance.
Orchestrated complex data workflows using Cloud Composer, improving operational reliability and automation.
Integrated Cloud Pub/Sub for real-time data messaging, facilitating seamless inter-service communication.
Utilized Cloud Storage Transfer Service for efficient data migration and synchronization across cloud environments.
Implemented Cloud Spanner for globally distributed database management, ensuring high availability and consistency.
Designed and implemented ETL processes to load data into Snowflake from various sources, ensuring data quality and consistency.
Leveraged Snowflake's advanced features, such as time travel and zero-copy cloning, to support audit and compliance requirements (see the sketch after this section).
Integrated Snowflake with PySpark for advanced data transformations and with Kafka for real-time data ingestion.
Maintained Looker dashboards for monitoring key performance indicators and operational metrics.
Managed relational databases using Cloud SQL, enhancing data storage and retrieval processes.
Cataloged data assets with Data Catalog, improving data discoverability and governance.
Deployed Databricks on GCP for collaborative data science and analytics, accelerating innovation.
Migrated large-scale data processing workflows from Azure Data Lake to GCP Dataproc, ensuring a seamless transition and scalability.
Leveraged Azure Data Factory for data movement and transformation prior to the migration to GCP, ensuring workflow continuity.
Architected a financial reporting platform using Power BI to provide transparent and accurate financial data to stakeholders.
Developed Power BI reports that tracked key financial metrics, compliance status, and audit trails, ensuring regulatory adherence.
Managed VM instances for flexible computing needs, optimizing performance and cost.
Administered databases in MySQL and PostgreSQL, ensuring data security and high performance.
Utilized Salesforce SOQL for efficient querying of Salesforce data, enhancing customer data analysis.
Wrote and optimized data processing scripts in Python and Scala, supporting complex data analysis tasks.
Developed and maintained Spark applications for large-scale data processing, improving throughput and reducing latency.
Implemented data warehouses using Hive, enhancing data organization and accessibility for analytics.
Employed Sqoop for efficient data transfer between Hadoop and relational databases, streamlining data integration.
Utilized Spark SQL for querying and analyzing data in Spark, enhancing data analysis capabilities.
Migrated critical insurance data processing pipelines from Azure to GCP, ensuring data integrity and minimizing downtime.
Led the transition of data warehousing solutions from Azure to GCP, leveraging BigQuery for improved analytics and storage efficiency.
Development Stack: BigQuery, Looker, GCS buckets, Kafka, Apache Beam, Cloud Dataflow, Cloud Shell, GCP Dataproc, Snowflake, Power BI, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, GCP Databricks, Azure Cloud, Azure Data Lake, Azure Data Factory, MySQL, PostgreSQL, Salesforce SOQL, Python, Scala, Spark, Hive, Sqoop, Spark SQL.
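A minimal sketch of the Snowflake time-travel and zero-copy-cloning usage mentioned above, issued through the snowflake-connector-python package. The connection parameters and table names are hypothetical placeholders, not the actual environment.

```python
# Hypothetical sketch: Snowflake zero-copy cloning and time travel for audit
# support. Connection parameters and table names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # placeholder account identifier
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="CLAIMS_DB",
    schema="PUBLIC",
)
cur = conn.cursor()
try:
    # Zero-copy clone: an instant, storage-efficient snapshot for audit review.
    cur.execute("CREATE OR REPLACE TABLE CLAIMS_AUDIT_SNAPSHOT CLONE CLAIMS")

    # Time travel: query the table as it existed one hour ago.
    cur.execute("SELECT COUNT(*) FROM CLAIMS AT(OFFSET => -3600)")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```

The clone shares storage with the source table until either side changes, which is what makes point-in-time audit snapshots cheap to keep.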
Charter Communications, Negaunee, MI  June 2017 to October 2019
Data Engineer
Leveraged AWS Redshift for efficient data warehousing, enabling scalable analytics solutions across the organization.
Managed data storage and retrieval using AWS S3, optimizing data accessibility and security for various business applications.
Designed and implemented AWS Data Pipeline jobs to automate data movement and transformation, enhancing operational efficiency.
Utilized AWS Glue for serverless data integration, facilitating seamless data preparation for analytics.
Developed and maintained Hadoop YARN applications to process large data sets efficiently, ensuring high availability and performance.
Managed the end-to-end migration of the retail data warehouse from on-premises servers to Amazon Redshift.
Created ETL pipelines using Informatica to facilitate data extraction, transformation, and loading into the cloud environment.
Administered SQL Server databases, implementing best practices for data management and security.
Employed Spark and Spark Streaming for real-time data processing, supporting timely insights and decision-making.
Wrote robust, scalable code in Scala and Python to build and optimize data processing workflows.
Managed streaming data ingestion and processing using Kinesis, improving data flow and accessibility (see the sketch after this section).
Designed and implemented data warehouses in Hive, optimizing data storage for fast querying and analysis.
Administered Linux servers hosting data applications, ensuring system stability and performance.
Utilized Sqoop for efficient data transfer between Hadoop and relational databases, enhancing data integration capabilities.
Leveraged Informatica for enterprise data integration, supporting complex data management needs.
Developed interactive dashboards and reports in Tableau, providing actionable insights to business users.
Employed Talend for data integration and transformation tasks, improving data quality and accessibility.
Designed and managed NoSQL databases in Cassandra, ensuring scalability and performance for large-scale applications.
Automated data workflow management using Oozie and Control-M, streamlining operations and reducing manual overhead.
Integrated Fivetran for seamless data integration from various sources, enhancing data completeness and reliability.
Managed EMR clusters for distributed data processing, optimizing computing resources and reducing processing times.
Configured and maintained EC2 instances for various data applications, ensuring optimal performance and reliability.
Administered RDS and DynamoDB databases, implementing best practices for scalability and data security.
Managed Oracle 12c databases, ensuring high availability and performance for critical business applications.
Utilized advanced features of AWS Glue for data cataloging and ETL processes, enhancing data discoverability and usability.
Development Stack: AWS Redshift, AWS S3, AWS Data Pipeline, AWS Glue, Hadoop YARN, SQL Server, Spark, Spark Streaming, Scala, Kinesis, Python, Hive, Linux, Sqoop, Informatica, Tableau, Talend, Cassandra, Oozie, Control-M, Fivetran, EMR, EC2, RDS, DynamoDB, Oracle 12c.
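A brief, hypothetical sketch of the Kinesis ingestion pattern referenced in this role: a small boto3 producer that pushes JSON events onto a stream for downstream Spark Streaming consumers. The stream name, region, and record shape are assumptions, not details from the original project.

```python
# Hypothetical Kinesis producer sketch: pushes JSON events onto a stream that
# a Spark Streaming job consumes downstream. Stream name and region are placeholders.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

def publish_event(event: dict, stream_name: str = "viewer-activity-stream") -> str:
    """Serialize an event and put it on the stream, partitioned by device id."""
    response = kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event.get("device_id", "unknown")),
    )
    return response["SequenceNumber"]

if __name__ == "__main__":
    # Example usage with a made-up event payload.
    seq = publish_event({"device_id": 42, "channel": "news", "action": "tune_in"})
    print(f"Published record with sequence number {seq}")
```

Partitioning by device id keeps each device's events ordered within a shard, which simplifies downstream stream processing.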
Amigos Software Solutions, Hyderabad, India  November 2015 to March 2017
Data Engineer
Designed and configured relational servers and databases on the Azure cloud, carefully analyzing both current and future business requirements.
Played a key role in migrating data from on-premises SQL Server to cloud databases, specifically Azure Synapse Analytics (DW) and Azure SQL DB.
Created pipeline jobs, scheduled triggers, and mapped data flows using Azure Data Factory (V2), securely storing credentials in Key Vault.
Created elastic pool databases and scheduled elastic jobs to execute T-SQL procedures.
Developed ETL jobs to load, serve, and transport data into buckets, facilitating the transfer of S3 data to the data warehouse.
Used Kusto Explorer for log analytics and improved query response times, and crafted alerts using Kusto Query Language.
Created tabular models within Azure Analysis Services to meet specific business reporting requirements.
Worked with Azure Blob and Data Lake storage, efficiently loading data into Azure Synapse Analytics (DW).
Tackled complex business queries involving multiple tables from different databases by creating both correlated and non-correlated sub-queries.
Designed and implemented business intelligence solutions using SQL Server Data Tools 2015 and 2017, loading data into both SQL and Azure cloud databases.
Crafted reports in Tableau for data visualization and tested native Drill, Impala, and Spark connectors.
Developed various Python scripts for vulnerability assessment, including SQL injection testing, permission checks, and performance analysis.
Managed the import of data from various sources into HDFS using Sqoop, executed transformations using Hive and MapReduce, and loaded the results back into HDFS.
Development Stack: Microsoft SQL Server 2012/2016, Python, SSDT 2012 & 2015, Azure Synapse Analytics, Azure Data Lake & Blob Storage, Azure SQL, Azure Data Factory, Azure Analysis Services, BIDS.

Yana Software Private Limited, Hyderabad, India  August 2014 to October 2015
Data Analyst
Actively participated in various transformation and data cleansing tasks within SSIS packages during data migration.
Applied a variety of data transformations, including Lookup, Aggregate, Sort, Multicast, Conditional Split, Derived Column, and more.
Developed mappings, sessions, and workflows to extract, validate, and transform data in accordance with business rules using Informatica.
Designed target tables based on the reporting team's requirements and devised Extraction, Transformation, and Loading (ETL) processes using Talend.
Worked with Netezza SQL scripts to transfer data between Netezza tables.
Scheduled Talend jobs using Job Conductor, the scheduling tool available in Talend.
Created stored procedures, wrote complex queries, and used T-SQL joins to address various reporting operations and ad-hoc data requests.
Focused on performance monitoring and index optimization using tools such as Performance Monitor, SQL Profiler, Database Tuning Advisor, and Index Tuning Wizard.
Served as the point of contact for resolving locking, blocking, and performance issues.
Authored scripts and devised indexing strategies for migrating data to Amazon Redshift from SQL Server and MySQL databases.
Employed AWS Data Pipeline to configure data loads from S3 into Redshift.
Used JSON schemas to define table and column mappings from S3 data to Redshift, and devised indexing and data distribution strategies optimized for sub-second query response (see the sketch after this section).
Hands-on experience with Dell Boomi connectors, including FTP, Mail, Database, Salesforce, Web Services Listener, HTTP Client, Web Services SOAP Client, SuccessFactors, and Trading Partner.
Developed database, flat-file, JSON, and XML profiles, Boomi mappings, and processes using different connectors and logic shapes between application profiles and various trading partners within Dell Boomi.
Development Stack: Amazon Redshift, AWS Data Pipeline, Talend Platform for Big Data, MS SQL Server 2008 R2/2012, Oracle 10g/9i, Dell Boomi, Netezza Mako 7.2, S3, SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), SharePoint, TFS, MS Project, MS Access, and Informatica.
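A hedged illustration of the S3-to-Redshift JSON mapping pattern mentioned in the Data Analyst role: a COPY command that uses a JSONPaths file to map JSON fields in S3 to table columns, issued through psycopg2. The cluster endpoint, credentials, IAM role, bucket paths, and table name are all placeholders.

```python
# Hypothetical sketch: loading JSON data from S3 into Redshift with a JSONPaths
# mapping file. Endpoint, credentials, IAM role, and S3 paths are placeholders.
import psycopg2

COPY_SQL = """
    COPY analytics.web_events
    FROM 's3://example-bucket/events/2015/08/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS JSON 's3://example-bucket/jsonpaths/web_events_paths.json'
    TIMEFORMAT 'auto'
    TRUNCATECOLUMNS;
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="warehouse",
    user="etl_user",
    password="********",
)
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute(COPY_SQL)  # Redshift reads the S3 files directly, in parallel

conn.close()
```

The client only issues the statement; the cluster itself pulls the files from S3, so the distribution and sort keys defined on the target table do the work of keeping query response times low.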