Data Engineer Resume Indianapolis, IN


Candidate Information
Title	Data Engineer
Target Location	US-IN-Indianapolis


Candidate's Name
Indianapolis, Indiana, USA
Mobile: PHONE NUMBER AVAILABLE
Email: EMAIL AVAILABLE
LinkedIn: Candidate's Name .dev

EDUCATION:

Master of Science in Data Science
Indiana University, Bloomington, Indiana, USA
August [year redacted] - May 2023
Coursework: Statistics, Machine Learning, Cloud Computing, Advanced Database Concepts, High-Performance Computing, Signal Processing, Bioengineering.
Achievements:
- Secretary, Data Science Club
- Google Advanced Data Analytics Professional Certificate
- Winner, AWS Game Day challenge

Bachelor of Technology in Computer Science
Shivaji University, Kolhapur, India
June [year redacted] - March 2020
Coursework: Distributed Systems, Operating System, Computer Networking, Database Management Systems, Data Mining, Algorithms, Microprocessors.

SKILLS:

Programming Languages: Python, SQL, C++, JavaScript, R, Shell Scripting.
Fundamentals: Data modelling, Data quality, Query Optimization, Automation, Custom ETL, CI/CD, Data Warehousing.
Databases: MySQL, PostgreSQL, Hadoop, Spark, BigQuery, MongoDB, Firebase, Google Cloud Storage, DynamoDB.
Visualization Tools: Tableau, plotly, ggplot, Matplotlib, Seaborn, PowerBI, Excel.
Machine Learning Tools: SciPy, Scikit, Pandas, NumPy, PyTorch, Regression, Classification, Clustering, Decision Trees, Neural Networks.
Cloud Technologies: Linux/Unix, AWS (S3, EC2, Lambda), Google Cloud Platform, Cloud native technologies, Docker, Kubernetes.
Generative AI: LangChain, Transformers (Hugging Face), OpenAI API, Google Gemini API, Streamlit.
Miscellaneous: Informatica, Git, Apache Spark, Apache Kafka, Apache Tomcat, Snowflake, Hadoop, MapReduce, Hive, Yarn.

EXPERIENCE:

Data Engineer, CVS Health
Bloomington, Indiana, USA
October 2023 - Present
- Led the development of scalable data pipelines and optimized custom ETL processes using Python and Apache Spark, increasing data processing efficiency by 60%.
- Automated CI/CD pipelines to handle up to 4TB of data daily from diverse sources.
- Migrated legacy data systems to Snowflake, managing a secure data lake infrastructure that cut query times by 40% and ensuring data security with robust quality checks using Apache Airflow. Performed performance tuning and query optimization to enhance database efficiency.
- Authored and updated comprehensive documentation for ETL processes, data pipelines, and system architecture; facilitated seamless handovers while enhancing team collaboration that contributed to a 30% reduction in onboarding time for new engineers.
- Integrated five new data sources into the data ecosystem, providing insights that led to a 15% increase in actionable recommendations.

Data Engineer, Indiana University - Kelley School of Business
Bloomington, Indiana, USA
October 2021 - May 2023
- Automated the digitization of invoice data from PDFs into a centralized database using SQL and Python, reducing processing time by 30%.
- Enhanced data accuracy and integrity via robust validation and cleansing; presented analysis results via Tableau and Excel.
- Collaborated with cross-functional teams to redesign database architecture, develop data templates, and improve data scraping methodologies.
- Executed SQL queries to extract and analyze 1M+ financial transactions from multiple tables. Collaborated with stakeholders to identify data sources, improving accuracy by 20%. Utilized Excel (Pivot Tables, VLOOKUP, Macros) to ensure data compliance and integrity.
- Developed interactive Tableau dashboards with calculated fields and KPIs, reducing manual data analysis efforts by 30%.

Software Engineer, Data Platform, Tata Consultancy Services
Pune, India
May 2019 - August 2021
- Designed and maintained scalable database solutions for mission-critical applications, ensuring high availability and optimal performance.
- Optimized SQL queries, achieving a 20% reduction in query execution time and improving overall database performance by 12%.
- Integrated RESTful API web services for precise data retrieval and storage, optimizing external data source interactions.
- Collaborated on developing web applications for a local grocery store and a hotel inventory management system using Django and MySQL. Implemented seamless e-commerce features including payment gateway integration, order tracking, and inventory management.
- Architected a Python-based data pipeline using Selenium to automate data scraping, preprocessing, and modeling of utility data.

PROJECTS:

PragyaYantra: A Generative AI web application
- Launched a sophisticated AI web application showcasing a range of cutting-edge AI capabilities, powered by the Google Gemini API. This application provides users with intuitive tools for text generation, intelligent dialogues, file handling, and more.
- Utilized HTML, JavaScript, CSS and NodeJS to create an intuitive user interface for seamless interaction across multiple AI modules.
- Integrated the Google Gemini API to power real-time text generation, conversation simulation, document analysis, and code generation.
- Focused on user experience and responsiveness, featuring interactive elements to enhance engagement and streamline content generation.

Parallel K-means Accelerator for multidimensional data
- Architected K-Means Accelerator: a high-performance parallel K-means clustering solution for multidimensional data using C++.
- Achieved dramatic speedups for K-means clustering of high-dimensional datasets by harnessing efficient multithreaded (OpenMP) and distributed-memory (MPI) parallelization on a supercomputer.
- Scaled the solution to a massive 256-node 64-core supercomputer, enabling ultrafast processing of colossal, multidimensional datasets.
- Slashed K-means clustering computation time, facilitating potential large-scale deployments on more than 1000-node supercomputers.

Distributed Search Engine: MapReduce, Cloud Integration, and ETL Pipelines
- Engineered a sophisticated MapReduce-based search engine for over 1000 textbooks, integrating ETL pipelines for data acquisition.
- Applied GCP, Node.js and Google Cloud Functions to deploy Mapper and Reducer components, optimizing scalability.
- Built an innovative web interface featuring rapid sub-second search results and advanced batch search via file links, streamlining efficiency.
- Showcased versatility in merging cloud deployment, ETL architecture, user-centric interface design, distributed computing, and data engineering.
