Candidate's Name
Big Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE
LinkedIn: https://LINKEDIN LINK AVAILABLE

PROFILE SUMMARY
- With over 5 years of progressive IT experience, I specialize in the software development life cycle, covering analysis, design, development, deployment, testing, documentation, implementation, and maintenance of web-based and client/server applications.
- My expertise spans AWS EMR, Spark deployments backed by S3, and designing end-to-end Hadoop infrastructures using MapReduce, Hive, Pig, and related tools. Proficient in Spark with Python and Scala, I have optimized Hadoop algorithms and improved data processing performance.
- My background includes global search with Elasticsearch, working across various Linux server environments, and employing cloud strategies with AWS and Azure.
- I have collaborated with technical and non-technical teams to develop scalable platform designs using containerization tools such as Rancher, Kubernetes, and Docker.
- My experience includes migrating SQL databases to Azure Data Lake and Azure SQL Data Warehouse using Azure Data Factory.
- Skilled in Java and Spring Boot microservice design, I deliver high-quality, maintainable code and have worked with Software as a Service (SaaS) models.
- I have extensive data integration and data warehousing experience with ETL tools such as Informatica PowerCenter, AWS Glue, SSIS, Talend, and Ab Initio.
- I analyze and resolve performance and scalability bottlenecks, and I have built and maintained automated deployment pipelines, implemented data security measures, and developed RESTful APIs and microservices with Node.js and Express.js.
- My expertise includes designing scalable cloud architectures with AWS, Azure, GCP, and Snowflake, and working with big data technologies such as Apache Kafka, Flink, NiFi, Spark, and PySpark. Versatile in Agile and Waterfall methodologies, I excel both independently and collaboratively, with strong communication and interpersonal skills.

TECHNICAL PROFICIENCY
Operating Systems: Windows 98/2000/XP/7/8, macOS, Linux (CentOS, Debian, Ubuntu)
Programming Languages: Python, R, C, C++
Web Technologies: HTML/HTML5, CSS/CSS3, XML, jQuery, JSON, Bootstrap, AngularJS
Python Libraries/Packages: NumPy, SciPy, Boto, Pickle, PySide, PyTables, pandas (DataFrames), Matplotlib, SQLAlchemy, httplib2, urllib2, Beautiful Soup, PyQuery
Statistical Analysis Skills: A/B Testing, Time Series Analysis, Markov Models
IDEs: PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, Sublime Text, Visual Studio Code
Machine Learning and Analytical Tools: Supervised Learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, Classification), Unsupervised Learning (Clustering, KNN, Factor Analysis, PCA), Natural Language Processing, Google Analytics, Fiddler, Tableau
Cloud Computing: AWS, Azure, Rackspace, OpenStack
AWS Services: Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon MQ, Amazon ECS, AWS Lambda, Amazon SageMaker, Amazon RDS, Elastic Load Balancing, Elasticsearch, Amazon SQS, AWS Identity and Access Management (IAM), Amazon CloudWatch, Amazon EBS, AWS CloudFormation
Databases/Servers: MySQL, SQLite3, Cassandra, Redis, PostgreSQL, CouchDB, MongoDB, Teradata, Apache HTTP Server 2.0, Nginx, Tomcat, JBoss, WebLogic
ETL: Informatica 9.6, DataStage, SSIS
Web Services/Protocols: TCP/IP, UDP, FTP, HTTP/HTTPS, SOAP, REST, RESTful
Build and CI Tools: Docker, Kubernetes, Maven, Gradle, Jenkins, Hudson, Bamboo
SDLC/Testing Methodologies: Agile, Waterfall, Scrum, TDD

WORK EXPERIENCE

Sunrise Senior Living    Feb 2024 - Till Date
Sr. Data Engineer / Big Data Engineer
Roles & Responsibilities:
- Proficient in leveraging ETL tools to craft efficient data pipelines, ensuring integrity and consistency in data processing.
- Demonstrated mastery of Apache Druid for real-time data ingestion, enabling low-latency querying and analytics on extensive datasets.
- Skilled in deploying and managing Druid clusters for high-throughput data processing, ensuring performance and scalability in petabyte-scale environments.
- Adept at integrating Druid with Apache Kafka, Hadoop, and Spark to create comprehensive, high-performance data solutions.
- Designed, implemented, and managed end-to-end data pipelines using Azure Data Factory for seamless integration.
- Utilized Python and PySpark extensively for advanced data manipulation, complex transformations, and machine learning tasks.
- Employed SQL for efficient querying and manipulation within ETL pipelines, leveraging its declarative syntax for data retrieval and aggregation.
- Integrated ETL workflows with relational database management systems for efficient data storage, querying, and management.
- Incorporated data quality and cleansing techniques to ensure accuracy and reliability, handling missing values, standardizing formats, and deduplicating records.
- Administered and optimized Azure SQL databases, improving performance through indexing, partitioning, and query optimization.
- Leveraged PySpark's parallel processing capabilities for scalable, high-performance ETL processing across machine clusters.
- Integrated ETL processes with cloud-based storage services for scalable and cost-effective data management.
- Applied real-time data processing with Apache Kafka and Spark Streaming, enabling ETL processes to handle data streams and deliver insights promptly (see the sketch after this list).
- Integrated ETL workflows with data warehousing solutions, providing optimized storage structures that support complex queries on large datasets.
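Illustrative sketch (not taken from the project): a minimal PySpark Structured Streaming job of the kind described above, reading JSON events from a Kafka topic, aggregating them, and landing the result as Parquet. The broker address, topic name, schema, and S3 paths are assumptions for illustration, and the job expects the spark-sql-kafka connector on the classpath.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-etl-sketch").getOrCreate()

# Hypothetical event schema, for illustration only
schema = StructType([
    StructField("source_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a stream from an assumed Kafka topic and parse the JSON payload
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker address
          .option("subscribe", "events")                      # assumed topic name
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Hourly aggregation with a watermark so late-arriving data stays bounded
hourly = (events
          .withWatermark("event_time", "1 hour")
          .groupBy(F.window("event_time", "1 hour"), "metric")
          .agg(F.avg("value").alias("avg_value")))

# Append completed windows to a Parquet sink (assumed S3 locations)
query = (hourly.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "s3a://example-bucket/curated/metrics/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/metrics/")
         .start())

query.awaitTermination()
```

The same pattern extends to warehouse sinks; only the output format and options change.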
Viral Fission    March 2021 - Jan 2024
Data Modeler
Roles & Responsibilities:
- Orchestrated meetings with stakeholders to gather business requirements, ensuring alignment between development teams and analysts.
- Designed cloud-based solutions using AWS Redshift for analytics and Oracle on AWS RDS for reference and master data.
- Led the design and development of data warehouse environments, liaising between business users and technical teams to identify data sources and targets.
- Developed logical and physical data models for OLTP systems, ensuring adherence to third normal form and enforcing referential integrity.
- Utilized various ER modeling techniques to develop data models for claim systems, including parent-child and associative relationships.
- Identified and analyzed facts from source systems and business requirements, following the Kimball approach to data warehousing.
- Generated DDL scripts for Redshift and RDS Oracle using the ADS ER Modeler.
- Provided day-to-day data administration and security support to the ETL team for AWS Redshift and Oracle on AWS RDS.
- Translated business requirements into detailed system specifications, developed use cases, and created business process flow diagrams using UML.
- Ensured adherence to modeling standards and documentation completeness, including naming conventions and entity relationships.
- Created physical data models with specifications for keys, constraints, indexes, and other attributes.
- Developed and maintained data dictionaries, naming conventions, standards, and class words documents.
- Managed the Embarcadero ER/Studio repository, performing check-out/check-in of models and reverse engineering data models from databases.
- Followed enterprise standards for design and quality management of project data models.
- Led full data warehouse lifecycle implementations, including upgrading legacy systems to enterprise data warehouses using Kimball's four-step approach.
Environment: Embarcadero ER/Studio 16.0, XMLSpy, Oracle 11g, DB2 z/OS, JSON, IBM DB2 UDB, AWS Redshift, OBIEE, AWS, SAP, SQL Server 2012/14, Informatica PowerCenter 9.6, Toad, PL/SQL, XML files, Windows, MS Office tools.

SIXPEP TECHNOLOGY, Hyderabad, India    2019 - Feb 2021
Title: Data Engineer
Roles & Responsibilities:
- Leveraged IBM InfoSphere DataStage as the core component for building and managing data pipelines within the ETL process, ensuring seamless extraction, transformation, and loading of data.
- Proficient in utilizing Azure Synapse Analytics for scalable data storage, processing, and analytics, supporting unified data management on a single platform.
- Utilized DataStage's extensive set of connectors and transformations to integrate with diverse data sources and destinations, ensuring efficient data movement across platforms.
- Enriched the ETL process with robust data quality mechanisms embedded within DataStage, ensuring precision, uniformity, and reliability of the data.
- Capitalized on DataStage's parallel processing to optimize data pipelines, expediting transformation and loading of substantial data volumes.
- Designed and implemented effective data models for Azure SQL databases, prioritizing data integrity and business alignment.
- Integrated DataStage with IBM Db2, ensuring reliability and scalability for structured data storage and retrieval.
- Used DataStage's data lineage and metadata management features to trace and visualize data flows, improving clarity and comprehension.
- Leveraged DataStage's integration with IBM Cognos Analytics for data visualization and reporting, empowering users to create interactive insights.
- Combined DataStage with IBM Watson Studio to bring advanced analytics into data workflows and support collaborative deployment of machine learning models.
- Utilized DataStage for data archiving and lifecycle management, optimizing storage costs and ensuring compliance with data retention mandates.
- Extended DataStage's capabilities through custom integration plugins and connectors, improving interconnectivity between systems and services.
- Used DataStage's scheduling and automation tools to run complex data workflows with precision and efficiency.
- Leveraged DataStage's scalability and resilience for processing substantial data volumes in clustered environments, ensuring operational continuity and performance.
Environment: Hadoop, Kafka, Spark, Sqoop, Spark SQL, Spark Streaming, Hive, Impala, Scala, Pig, NoSQL, Oozie, HBase, Data Lake, Python, Azure, Databricks, AWS (Glue, Lambda, Step Functions, SQS, CodeBuild, CodePipeline, EventBridge, Athena), Unix/Linux shell scripting, Informatica PowerCenter.
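Illustrative sketch only: the environment above lists AWS Glue alongside Spark, so here is a minimal Glue ETL job skeleton in Python showing the read-clean-write pattern. The catalog database, table name, and S3 path are hypothetical placeholders, not details from the engagement.

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglue.transforms import DropNullFields

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from a hypothetical Glue Data Catalog table
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Remove all-null fields, then land the data as Parquet for querying with Athena
cleaned = DropNullFields.apply(frame=source)
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```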
Weblink Solutions, Hyderabad, India    Jan 2018 - Jan 2019
Title: GCP Data Engineer
Roles & Responsibilities:
- Proficient in using IBM InfoSphere DataStage for data integration, transformation, and loading, with a strong foundation in data warehousing and data integration principles.
- Developed and optimized ETL jobs in DataStage to extract data from diverse sources, perform intricate transformations, and load it into target systems, including data warehouses and cloud platforms such as Hadoop and GCP.
- Specialized in designing and implementing data warehousing solutions in Azure, incorporating dimensional modeling and adhering to best practices.
- Leveraged DataStage's parallel processing capabilities to improve job performance, efficiently handle substantial data volumes, and speed up data processing tasks.
- Designed and orchestrated complex data integration workflows within DataStage, harmonizing data flow across multiple stages, systems, and environments.
- Utilized various DataStage stages, including Source, Transformer, Lookup, Aggregator, Join, Filter, and Target, to execute precise data operations and transformations.
- Applied data quality checks and data cleansing routines within DataStage, ensuring data accuracy, integrity, and adherence to standards (a comparable PySpark sketch follows this section).
- Hands-on experience with Azure DevOps for version control, continuous integration, and deployment, streamlining the development lifecycle for Azure and SQL projects.
- Collaborated closely with business analysts and stakeholders, translating data requirements into actionable DataStage job specifications aligned with organizational goals.
- Performed data profiling and mapping exercises, identifying optimal data sources, transformations, and mappings within DataStage to streamline integration processes.
- Integrated DataStage with a variety of tools and platforms, including data warehouses, databases, and reporting tools, supporting data flow and synchronization.
- Implemented robust error handling and logging mechanisms within DataStage jobs, enabling accurate capture and management of data processing errors.
- Tuned and optimized DataStage jobs, using execution statistics to pinpoint bottlenecks and apply targeted optimizations.
- Developed reusable DataStage job templates and components, promoting reusability and consistency across the ETL landscape.
- Conducted comprehensive unit testing and debugging of DataStage jobs, ensuring correct functionality and data integrity.
- Worked with version control systems and deployment tools to manage the end-to-end lifecycle of DataStage jobs.
- In-depth understanding of data security best practices and compliance requirements in Azure, ensuring robust data privacy and adherence to standards.
- Provided proactive support and troubleshooting for DataStage job execution and data integration issues, minimizing downtime and ensuring continuity.
- Documented DataStage job designs, configurations, and processes, establishing a valuable knowledge repository.
- Applied a strong Python skillset to implement custom data transformations and integrations within DataStage-driven workflows.
- Familiar with Databricks and Informatica Intelligent Cloud Services (IICS), augmenting DataStage's capabilities for cloud-based analytics and big data processing.
- Comprehensive understanding of data warehousing concepts, optimizing data storage for reporting and analytics in DataStage-driven solutions.
- Strong grasp of data integration methodologies and the interplay between data sources, transformations, and destinations.
Environment: Hadoop YARN, Azure, Databricks, Spark 1.6, Spark Streaming, Spark SQL, Scala, Kafka, Python, Hive, Impala, Sqoop 1.4.6, Tableau, Talend, Oozie, Control-M, Java, AWS S3, Oracle 12c, Linux
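A minimal sketch of the kind of data-cleansing step referenced above, written as a standalone PySpark job rather than a DataStage stage. The bucket paths, column names, and the assumption that a GCS connector is configured are illustrative, not taken from the project.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

# Hypothetical landing-zone file on GCS (requires the GCS connector to be configured)
raw = spark.read.option("header", True).csv("gs://example-bucket/landing/customers.csv")

cleansed = (raw
            .withColumn("email", F.lower(F.trim(F.col("email"))))   # standardize formats
            .na.fill({"country": "UNKNOWN"})                        # handle missing values
            .dropDuplicates(["customer_id"]))                       # deduplicate records

# Write the curated output as Parquet for downstream reporting
cleansed.write.mode("overwrite").parquet("gs://example-bucket/curated/customers/")
```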
EDUCATION AND CERTIFICATION
Master of Science (MS) in Information Technology    Jan 2024
FRANKLIN UNIVERSITY, Columbus, OH    GPA: 3.49/4.0
Bachelor of Technology (BTech) in Computer Science    June 2021
ICFAI UNIVERSITY, Hyderabad, India    GPA: 7.2/10.0
Crash Course on Python    July 2020
Google    Grade: 92%
Managing Employee Compensation    July 2020
University of Minnesota    Grade: 86%
Python Data Structures    June 2020
University of Michigan    Grade: 96%