Solomon Itany
Big Data/Cloud Engineer
Centreville, VA
Phone: PHONE NUMBER AVAILABLE; Email: EMAIL AVAILABLE

Professional Summary

Accomplished big data professional with 7+ years of experience in Big Data development and ETL technologies. I specialize in building innovative solutions through a rigorous analytical approach and am recognized for contributions to large-scale systems, consistently delivering high-quality results.

My expertise centers on Amazon Web Services (AWS), with a versatile understanding of cloud technologies that transfers readily to Google Cloud Platform (GCP) and Microsoft Azure, ensuring adaptability across diverse cloud environments.

Key highlights of my skill set include:
- Managing data lakes on GCP, implementing best practices for robust data lake architecture.
- Using Google Cloud Dataprep for efficient data preparation and transformation.
- Using Terraform to manage and provision GCP resources, improving the efficiency of data pipelines.
- Designing workflows with Google Cloud Composer to orchestrate complex data processing tasks.
- Using Google Cloud Pub/Sub for scalable, event-driven messaging (see the sketch after this list).
- Implementing Google Cloud Audit Logging for auditing and compliance monitoring.
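The Pub/Sub work above typically comes down to a small publisher component. Below is a minimal, illustrative sketch using the google-cloud-pubsub Python client; the project ID, topic name, and event fields are hypothetical placeholders rather than details from any specific engagement.

```python
# Minimal Pub/Sub publisher sketch; project and topic names are hypothetical.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "pipeline-events")

def publish_event(event: dict) -> str:
    """Serialize an event and publish it; returns the server-assigned message ID."""
    data = json.dumps(event).encode("utf-8")  # Pub/Sub payloads are bytes
    future = publisher.publish(topic_path, data=data)
    return future.result()  # blocks until the broker acknowledges the publish

if __name__ == "__main__":
    message_id = publish_event({"source": "ingest", "status": "file_landed"})
    print(f"Published message {message_id}")
```

In practice the returned futures would usually be collected and resolved in batches rather than awaited one at a time.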
In Azure, I design scalable data architectures using services such as Azure Data Lake, Azure Synapse Analytics, and Azure SQL, and I have built and managed data pipelines with Azure Data Factory.

Within the AWS ecosystem, I am proficient in AWS EMR and AWS Lambda for high-volume data processing and analysis. Notable achievements include designing and implementing a data warehousing solution with AWS Redshift and Athena for streamlined data querying and analysis.

Additional areas of expertise include:
- In-depth knowledge of AWS CloudWatch for monitoring and managing AWS resources.
- Hands-on implementation of data security and access controls with AWS Identity and Access Management (IAM).
- Development of workflows with AWS Step Functions to handle complex data processing tasks.
- Formulation of technical strategies and scalable CloudFormation scripts for efficient project execution.

My experience also covers Spark performance optimization across Databricks, Glue, EMR, and on-premises systems, and I have participated in migrating on-premises data into the cloud. Software development within the Big Data and Hadoop ecosystems, using Apache Spark, Kafka, and a range of ETL technologies, has been a cornerstone of my career. I have hands-on expertise with the major components of the Hadoop ecosystem, enabling the development of robust data solutions.

Furthermore, I bring:
- Proficiency in Apache NiFi, integrated with Apache Kafka for streamlined data flow and processing.
- Expertise in designing and managing data transformation and filtration patterns using Spark, Hive, and Python (see the sketch after this list).
- Development of scalable, reliable data solutions for real-time and batch data movement across multiple systems.
- A proven ability to validate the technical and operational feasibility of Hadoop solutions.
- Direct engagement with business stakeholders to understand project needs and align technical solutions with business goals.
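As a concrete illustration of that transformation-and-filtration pattern, here is a minimal PySpark sketch that reads a Hive table, filters out malformed rows, and writes a daily aggregate; the database, table, column names, and output path are all hypothetical.

```python
# Minimal transform-and-filter sketch over a Hive table; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("transform-filter-example")
    .enableHiveSupport()  # allows Spark to read Hive-managed tables
    .getOrCreate()
)

orders = spark.table("sales.orders")  # hypothetical database.table

# Drop rows with missing or non-positive amounts, then derive a date column.
clean = (
    orders
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate per customer per day and land the result as Parquet.
daily_totals = (
    clean.groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("daily_amount"),
         F.count("*").alias("order_count"))
)
daily_totals.write.mode("overwrite").parquet("/warehouse/curated/daily_totals")
```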
Technical Skills

Programming Languages: Python, Scala, PySpark, SQL
Scripting: Hive, SQL, Spark SQL, Shell Scripting
IDEs: Jupyter Notebooks, Eclipse, IntelliJ, PyCharm, VS Code
Databases & Tools: Redshift, DynamoDB, Synapse DB, Bigtable, AWS RDS, SQL Server, PostgreSQL, Oracle, MongoDB, Cassandra, HBase
Hadoop Distributions: Hadoop, Cloudera Hadoop, Hortonworks Hadoop
ETL Tools: Spark, NiFi, AWS Glue, AWS EMR, Databricks, Azure Data Factory, Google Dataflow
File Formats & Compression: CSV, JSON, Avro, Parquet, ORC
File Systems: HDFS, S3, Google Cloud Storage, Azure Data Lake
Cloud Platforms: Amazon AWS, Google Cloud (GCP), Microsoft Azure
Orchestration Tools: Apache Airflow, Step Functions, Oozie
CI/CD: Jenkins, CodePipeline, Docker, Kubernetes, Terraform
Versioning: Git, GitHub
Programming Methodologies: Object-Oriented Programming, Functional Programming
Project Methods: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development, Functional Testing, Design Thinking, Lean Six Sigma
Data Visualization Tools: Tableau, Power BI, AWS QuickSight
Search Tools: Apache Lucene, Elasticsearch
Security: Kerberos, Ranger, IAM

Professional Experience

Big Data Engineer
Tech Consulting, Atlanta, Georgia | Jan 2024 - Present
- Orchestrated the deployment of Amazon EC2 instances and managed S3 storage, customizing instances for specific applications and Linux distributions.
- Managed critical AWS services including S3, Athena, Glue, EMR, Kinesis, Redshift, IAM, VPC, EC2, ELB, CodeDeploy, RDS, ASG, and CloudWatch, building a resilient, high-performance cloud architecture.
- Advised on AWS system architecture and implemented rigorous security measures and protocols to safeguard vital assets and ensure compliance.
- Optimized data processing with Amazon EMR and EC2, improving system performance and scalability.
- Designed and tuned data processing workflows using AWS services such as Amazon EMR and Kinesis for efficient data analysis.
- Used AWS Glue for data cleaning and preprocessing, performed real-time analysis with Amazon Kinesis Data Analytics, and leveraged EMR, Redshift, DynamoDB, and Lambda for scalable data processing.
- Implemented data governance and security through AWS IAM and Amazon Macie to protect sensitive data.
- Used AWS Glue and Amazon Managed Streaming for Apache Kafka (MSK) for efficient data streaming, transformation, and preparation.
- Established and managed data storage on Amazon S3 and Redshift, keeping data accessible and well organized.
- Integrated Snowflake into the data processing workflow, strengthening data warehousing capabilities.
- Evaluated on-premises data infrastructure for migration to AWS, orchestrated data pipelines via AWS Step Functions, and used Amazon Kinesis for event-driven processing.
- Used Apache Airflow for workflow automation, orchestrating complex data pipelines and automating tasks to boost operational efficiency.
- Implemented AWS CloudFormation to automate provisioning of AWS resources, ensuring consistent, repeatable deployments across environments.
- Used tools such as Scala, Hive, and Sqoop within the Hadoop ecosystem for data cleaning, transformation, analysis, and querying.
- Fine-tuned Kafka configurations to optimize performance, throughput, and latency.
- Developed efficient Spark jobs in Scala and Python, using Spark SQL for fast data processing and analysis.
- Implemented data quality frameworks to ensure accuracy, consistency, and reliability across large datasets, with thorough profiling, cleansing, and validation.
- Built streaming data ingestion processes with PySpark, improving data acquisition capabilities.
- Delivered transformed data to databases, data warehouses, flat files, and Excel spreadsheets, with clean mapping between source and destination fields.
- Integrated Kafka with other components of the Big Data ecosystem, such as Hadoop, Spark, and Flink (see the sketch after this list).
- Designed and implemented data models tailored to NoSQL databases, based on the specific data and application.
- Collaborated cross-functionally to establish data quality best practices and governance frameworks.
- Used AWS Lambda and serverless architecture to run code without provisioning servers, significantly reducing operational cost and complexity.
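To illustrate the Kafka-and-Spark integration above, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic and maintains a per-minute event count. Broker addresses, the topic name, and the checkpoint path are placeholders, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Minimal Structured Streaming sketch reading from Kafka; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-example").getOrCreate()

# Kafka delivers key/value as binary columns plus metadata such as timestamp.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
    .option("subscribe", "clickstream")                 # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

# Watermarked one-minute windows keep streaming state bounded.
counts = (
    events
    .withWatermark("timestamp", "2 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("update")
    .format("console")  # a real job would write to a sink such as Kafka or a table
    .option("checkpointLocation", "/tmp/checkpoints/clickstream")
    .start()
)
query.awaitTermination()
```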
Data Scientist
CVS Health, Cranberry Twp, Pennsylvania | Apr 2023 - May 2023
- Cleaned and manipulated a 40M+ Medicare member dataset using Python, increasing data accuracy by 30%.
- Developed machine learning models using techniques such as XGBoost and ensemble learning to predict clients' medication-adherence risk levels with roughly 90% accuracy.
- Applied advanced data analysis to identify key factors affecting CVS's STAR ratings, leading to targeted interventions that raised overall rating scores by 10%.
- Conducted data mining and statistical analysis to identify key drivers of medication adherence, informing targeted outreach methods predicted to improve patient compliance by 15%.
- Published Power BI reports to dashboards on the Power BI server.
- Worked with both live and imported data in Power BI to create reports.
- Developed reports summarizing patient and call data trends, providing actionable insights to senior leadership.

Data Engineer
Printboda Limited, Kampala, Uganda | Jun 2017 - July 2022
- Used PySpark to ingest structured and unstructured financial data from various sources (see the sketch after this list).
- Used Python libraries including NumPy, SciPy, scikit-learn, and Pandas within the GCP ecosystem, along with PySpark DataFrames and RDDs, to strengthen the analytics components of the data analysis framework.
- Built scalable, high-performance data processing applications with PySpark libraries.
- Used Python and Scala for data manipulation, analysis, and modeling within Spark and other data processing frameworks.
- Designed and optimized Hive data models and schemas for querying structured data stored in the Hadoop Distributed File System (HDFS).
- Integrated Apache HBase for storing and accessing large-scale semi-structured data, optimizing retrieval performance for analytical queries.
- Orchestrated workflow scheduling and automation with Apache Oozie, ensuring timely execution of data processing tasks and job dependencies.
- Used Impala for interactive querying and analysis of data in Hadoop clusters, enabling ad hoc analytics and exploration.
- Collaborated with business intelligence teams to build Power BI dashboards and reports, providing actionable insights to stakeholders.
- Contributed to data engineering best practices and standards, including code review, documentation, and performance optimization.
- Worked in cross-functional teams to design and implement data solutions aligned with business requirements and scalability needs.
- Assessed existing on-premises data infrastructure and implemented CI/CD pipelines in Jenkins for streamlined ETL code deployment, with version control in Git.
- Ensured data quality through automated testing in CI/CD pipelines and automated ETL pipelines with Jenkins and Git.
- Managed and worked with NoSQL databases such as MongoDB and HBase.
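As an illustration of the PySpark ingestion described in the first bullet above, here is a minimal sketch that reads a structured CSV extract and a semi-structured JSON feed, applies light normalization, and lands both as Parquet on HDFS; every path, column, and key name is hypothetical.

```python
# Minimal PySpark ingestion sketch; all paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-example").getOrCreate()

# Structured source: delimited transaction extracts with a header row.
transactions = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("hdfs:///landing/transactions/*.csv")
)

# Semi-structured source: line-delimited JSON event dumps.
events = spark.read.json("hdfs:///landing/events/*.json")

# Light normalization before landing in the curated zone.
curated = (
    transactions
    .withColumn("ingest_date", F.current_date())
    .dropDuplicates(["transaction_id"])  # hypothetical business key
)

(curated.write.mode("append")
    .partitionBy("ingest_date")
    .parquet("hdfs:///curated/transactions"))
events.write.mode("append").parquet("hdfs:///curated/events")
```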

Education

Master of Science, Business Analytics
William & Mary, Mason School of Business, Williamsburg, VA