Senior Data Engineer Resume - Elmhurst, IL
Candidate's Name - Data Engineer
Email: EMAIL AVAILABLE | Phone: PHONE NUMBER AVAILABLE

Professional Summary:
- 9+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
- Proficient in troubleshooting and problem-solving, finding solutions to challenging technical problems; conduct ad-hoc and deep-dive analysis, including funnel, lifecycle, and cohort analysis, to drive growth.
- Troubleshoot production support issues post-deployment and come up with solutions as required.
- Experience in managing and reviewing Hadoop log files.
- Experience in object-oriented analysis, design, and development of software using UML methodology.
- Designed and implemented a Spark test-bench application to evaluate the quality of recommendations made by the engine.
- Implemented a machine learning model to classify purchase orders/invoices into categories based on a small set of features for spend data analytics.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Experience in NoSQL databases such as HBase and Cassandra.
- Designed and developed data quality and batch frameworks; the batch framework runs and monitors all batch processes and was built with NoSQL stores, Oracle GoldenGate, Korn shell scripts, and Informatica, and both frameworks are used extensively in production and development environments.
- Familiarity with data encryption and data masking.
- Developed data pipelines using Flume, Pig, and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked closely with a cross-functional team to help the FDA CVM architect a major infrastructure upgrade to their legacy environment through collaborative planning sessions.
- Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark (see the sketch at the end of this summary).
- Developed stored procedures to extract data from different sources and load it into the data warehouse.
- Experienced in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Expertise in the design, development, and implementation of enterprise data warehouse solutions using DigitalRoute MediationZone and the Talend ETL Big Data Integration suite version 6.2.
- Designed and implemented streaming solutions using Kafka and AWS stream data analytics services.
- Designed and developed Tableau content to give users and co-workers a better understanding of each concept.
- Used IAM to create roles, users, and groups and implemented MFA to provide additional security to the AWS account and its resources; used AWS Batch, ECS, and EKS for Docker image storage and deployment.
- Experienced in performance tuning by identifying bottlenecks in sources, mappings, targets, and partitioning.
- Executed process improvements in data workflows using the Alteryx processing engine and PL/SQL.
- Knowledge of existing HL7 standards and the evolving Fast Healthcare Interoperability Resources (FHIR) standard, along with JavaScript, jQuery, XML, DOM, CSS, HTML, and RESTful web services.
- Review code and provide feedback on best practices, performance improvements, and related concerns.
- Expertise with big data technologies (HBase, Hive, MapR, Pig, and Talend).
- Experience in extracting source data from sequential files, XML files, and CSV files, then transforming and loading it into the target data warehouse.
- Worked with ELT tools including Talend Data Integration, Talend Big Data, Pentaho Data Integration, and Informatica.
- Experience in integrating Talend Open Studio with Hadoop, Hive, Spark, and NoSQL stores.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Azure Data Factory cluster.
- Experience working with Titan/JanusGraph DB, creating vertex/edge/search commands to fetch claim and member details for a given person or group in support of real-time services.
- Used AWS Glue for data transformation, validation, and cleansing.
- Designed and configured topics in the new Kafka cluster across all environments.
- Exposure to Spark architecture and how RDDs work internally, processing data from local files, HDFS, and RDBMS sources by creating RDDs and optimizing for performance.
- Familiar with data science work including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
- Imported data from sources such as HDFS and HBase into a graph DB.
- Experienced in deploying, managing, and developing MongoDB clusters.
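A minimal PySpark sketch of the kind of job described in the summary for building on-demand tables over S3 files; the bucket, paths, column names, and table name below are hypothetical placeholders rather than details taken from this resume.

    # Hypothetical sketch: register an on-demand table over Parquet files in S3.
    # Bucket, prefix, column, and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3-on-demand-table")  # assumed job name
        .getOrCreate()
    )

    # Read raw files from S3 (assumes S3 credentials and the right URI scheme
    # are already configured on the cluster).
    raw = spark.read.parquet("s3://example-bucket/raw/orders/")

    # Light cleanup before exposing the data to analysts.
    cleaned = (
        raw.dropDuplicates(["order_id"])          # hypothetical key column
           .filter("order_status IS NOT NULL")    # hypothetical quality rule
    )

    # Write back as a partitioned table that Hive/Glue-backed engines can query.
    (
        cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")                # hypothetical partition column
        .saveAsTable("analytics.orders_on_demand")  # assumes this database exists
    )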
Technical Skills:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Programming Languages: Python, PySpark, Scala, SQL, PL/SQL, Spark
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, Apache
Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Principal Component Analysis
Version Control: GitHub, Jenkins, Bitbucket, CI/CD
IDE & Tools, Design: Eclipse, Visual Studio, NetBeans, MySQL, Power BI, Tableau
Databases: Oracle, SQL Server, MySQL, DynamoDB, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL databases (HBase, MongoDB)
Cloud Technologies: MS Azure, Amazon Web Services (AWS), Google Cloud

Professional Experience:
_______________________________________________________________________________________
Client: Magellan Health, Eagan, MN                                    Mar 2022 - Present
Role: Sr. Data Engineer
Responsibilities:
- Built highly scalable, distributed data science models that generate high-quality datasets.
- Worked with teams of 2-10 members across various projects.
- Strong understanding of data modeling (relational, dimensional, star, and snowflake schemas), data analysis, Palantir Foundry, and data warehousing implementations on Windows and UNIX.
- Worked on cluster maintenance, data migration from one server to another, and upgrading the ELK stack.
- Proficient in designing, developing, and maintaining ETL processes using Matillion ETL for data integration, transformation, and loading tasks.
- Experience in building efficient pipelines for moving data between GCP and Azure using Azure Data Factory.
- Familiarity with standards such as CCD/CCDA, DIRECT, HIE, MDM, and FHIR.
- Orchestrated the deployment of Trino clusters, ensuring optimal performance and resource utilization for distributed data processing.
- Installed and configured Oracle GoldenGate for fast real-time replication.
- Developed solutions using Alteryx to provide dashboard data in formats including JSON, CSV, and Excel.
- Worked with all TOSCA modules, including Modules, Requirements, and Test Case Design & Execution Lists, alongside data lake, DBA database, and Azure Data Factory components.
- Performed data science work using the scikit-learn package in Python.
- Developed robust data pipelines for healthcare applications, leveraging HL7 FHIR standards.
- Deep experience designing data warehouse/data lake and ETL architectures with the Hadoop ecosystem, including Spark, Hive, Elasticsearch, Kibana, Presto, Kafka, Synapse, HBase, SQL databases, and HDFS.
- Implemented best practices for data quality, validation, and error handling within Matillion ETL workflows to ensure the accuracy and reliability of data.
- Performed change impact analysis on data processes and systems, anticipating potential issues and conducting root cause analysis for any unforeseen problems.
- Implemented and automated DV package daily/weekly/monthly dashboards using Tableau to track key metrics.
- Created and developed data load and scheduling processes for ETL jobs using the Matillion ETL package.
- Used Python Boto3 to configure AWS services including Glue, EC2, S3, and the data lake (see the Boto3 sketch at the end of this section).
- Worked with NoSQL databases like HBase; imported data from MySQL, processed it using Hadoop tools, and exported it to the Cassandra NoSQL database.
- Adept with various distributions such as Cloudera Hadoop, Hortonworks, Elastic Cloud, and Elasticsearch.
- Designed and maintained data models to ensure efficient storage and retrieval of structured and unstructured data.
- Engineered robust data pipelines to integrate diverse customer data sources, ensuring a unified and comprehensive view of customer information.
- Created use-case-specific implementation guides per HL7 standards, including FHIR.
- Implemented data ingestion processes to extract, transform, and load HL7 FHIR data from diverse sources, ensuring data integrity and compliance with regulatory standards.
- Owned setup, configuration, and security for Hadoop clusters using Kerberos.
- Worked on dimensional data modeling in star and snowflake schemas and on slowly changing dimensions (SCD).
- Extensive experience with UNIX shell scripting for automating ELT workflows, data cleansing, and file manipulation tasks.
- Extensive use of the Cloud Shell SDK in GCP to configure and deploy services such as Dataproc, Cloud Storage, and BigQuery.
- Utilized BigID's machine learning capabilities to classify and tag sensitive data, enhancing data protection and privacy measures.
- Used Python (NumPy, SciPy, pandas, scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.
- Set up AWS repositories and pipelines for CI/CD deployment of objects.
- Implemented data aggregation scripts using Elasticsearch and/or Spark to evaluate backend services from both functional and performance points of view.
- Implemented security measures and access controls within OEM 13, safeguarding sensitive database information and ensuring compliance with security policies.
- Created AWS notebooks using PL/SQL and Python, and automated notebooks using jobs.
- Leveraged the MongoDB Aggregation Framework for complex data analysis, generating insights and metrics and optimizing data processing efficiency (see the aggregation sketch at the end of this section); designed and maintained MongoDB database systems.
- Implemented Spark using Scala and MSSQL for faster testing and processing of data.
- Demonstrated substantial depth of knowledge and experience in a specific area of Big Data and development.
- Provided technical guidance and support to team members on HL7 FHIR implementation and related technologies.
- Expertise in advanced Snowflake concepts such as setting up resource monitors and RBAC.
- Designed and implemented end-to-end data pipelines within Palantir Foundry, ensuring efficient extraction, transformation, and loading (ETL) processes for diverse datasets.
- Created a Python framework for AWS Batch cloud automation, with multiprocessing for end-of-day and intraday uploads and extracts using the AWS SDK and CDK.
- Integrated and optimized customer data, ensuring seamless data flow across systems and providing a centralized hub for customer data.
- Implemented real-time monitoring and alerting mechanisms within OEM 13, enabling proactive identification and resolution of database issues.
- Utilized pod affinity and anti-affinity rules in Kubernetes for workload placement, optimizing data locality and minimizing communication latency.
- Implemented an ETL framework providing features such as master data management, ETL restart capability, a security model, and version control.
- Installed, configured, and managed Oracle GoldenGate 11g/12c.
- Worked on migrating data from Teradata to AWS using Python and BI tools like Alteryx.
- Implemented pipelines using PySpark and also used Talend Spark components.
- Developed workflows in Oozie and scheduled jobs on mainframes, preparing the data refresh strategy and capacity planning documents required for project development and support.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Set up and configured MS SQL Server installations, user security, replication, and linked servers.
- Collaborated with Power BI and Salesforce teams to integrate MongoDB data into business processes, providing data insights and analytics for sales and marketing.
- Implemented security measures and access controls within MDM, ensuring that only authorized users have access to sensitive master data information.
- Worked with Snowflake, AWS Redshift, S3, Redshift Spectrum, RDS, Glue, Athena, Lambda, CloudWatch, Hive, and REST.
- Integrated external data sources and APIs into Google Looker to enrich existing datasets and enhance reporting capabilities.
- Knowledge of AWS pricing models, IaaS and PaaS models, and the Synapse data lake; implemented an ELT data pipeline for marketing data.
- Analyzed data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin).
- Developed and optimized complex PL/SQL queries and scripts for data extraction and transformation within ETL processes.
- Built dashboards and reports using Qualtrics survey data to provide insights and support decision-making processes.
- Created Terraform scripts to automate the deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots, and a Jenkins server.
- Maintained and enhanced legacy Oracle databases in support of production break fixes and application enhancements.
- Extensive experience working with HDFS, Pig, Sqoop, Flume, Hive, Oracle DB, Phoenix, and the ELK stack.
- Ingested data from disparate sources using APIs, connectors, and cloud functions, and transformed it using dbt (data build tool), Google BigQuery, and PL/SQL for further analysis in Snowflake.
- Conducted data quality assessments and implemented data cleansing processes using BigID's data discovery and classification tools.
- Used Python Boto3 to configure services such as AWS Glue, EC2, S3, and Looker NLP.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Provided training and support to end users on utilizing Google Looker effectively for self-service reporting and analysis.
- Developed multi-cloud strategies that make better use of GCP (for its PaaS) and Azure Databricks (for its SaaS).
- Involved in cube partitioning, refresh strategy and planning, and dimensional data visualization in Analysis Services (SSAS).
- Hands-on use of Spark and Scala APIs to compare the performance of Spark with Hive and MySQL, and of Spark SQL to manipulate DataFrames in Scala.
- Responsible for setting up and maintaining Prod, Dev, and UAT environments for development and testing teams.
- Used Domo to create data pipelines and provide data to the marketing and analyst teams for further client dashboard reporting.
- Migrated SQL 2017 databases from on-prem to an AWS IaaS server; handled backups, restores, imports/exports, and user creation and management.
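A minimal Boto3 sketch of the AWS automation mentioned in this section (configuring S3 and Glue from Python); the bucket, prefix, and job names below are hypothetical placeholders.

    # Hypothetical sketch: list newly landed S3 objects and kick off a Glue job.
    # Bucket, prefix, and job names are placeholders, not values from this resume.
    import boto3

    s3 = boto3.client("s3")
    glue = boto3.client("glue")

    # Find raw files that still need processing.
    response = s3.list_objects_v2(Bucket="example-raw-bucket", Prefix="incoming/")
    pending_keys = [obj["Key"] for obj in response.get("Contents", [])]

    if pending_keys:
        # Start the (assumed) Glue ETL job and pass the file list as an argument.
        run = glue.start_job_run(
            JobName="example-curation-job",
            Arguments={"--input_keys": ",".join(pending_keys)},
        )
        print("Started Glue run:", run["JobRunId"])
    else:
        print("No new files to process.")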
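A minimal PyMongo sketch of the kind of MongoDB aggregation referenced above; the connection string, database, collection, and field names are hypothetical placeholders.

    # Hypothetical sketch: a MongoDB aggregation that rolls up spend by category.
    # Connection string, database, collection, and field names are placeholders.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # assumed local instance
    collection = client["example_db"]["purchase_orders"]

    pipeline = [
        {"$match": {"status": "approved"}},              # filter stage
        {"$group": {                                     # roll-up stage
            "_id": "$category",
            "total_spend": {"$sum": "$amount"},
            "order_count": {"$sum": 1},
        }},
        {"$sort": {"total_spend": -1}},                  # largest first
    ]

    for row in collection.aggregate(pipeline):
        print(row["_id"], row["total_spend"], row["order_count"])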
_______________________________________________________________________________________
Client: First American, Chicago, IL                                   Oct 2019 - Feb 2022
Role: Sr. Data Engineer
Responsibilities:
- Collaborated with business analysts and stakeholders to gather requirements and translate them into technical solutions.
- Tuned and applied optimization techniques to improve report/dashboard performance.
- Created and implemented a data pipeline using PySpark, Python, PyCharm, Hive, AWS S3, EMR, HDFS, Redshift, and Airflow to process product and sales data.
- Worked on Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy and pandas for data analysis.
- Worked on importing data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive metastore with MySQL and Databricks to store the metadata for Hive tables.
- Created documentation on Confluence and updated Jira stories on a timely basis.
- Responsible for handling design review calls with architects and demos with the business; extensively involved in design and code reviews and performance tuning to ensure the delivery of high-quality applications and products.
- Experimented with predictive models, including Logistic Regression, Support Vector Machine (SVM), Gradient Boosting, and Random Forest, using Python scikit-learn to predict whether a patient might be readmitted (see the scikit-learn sketch at the end of this section).
- Optimized storage and processing capabilities within the data lakehouse.
- Designed and implemented data pipelines to extract survey data from Qualtrics using APIs.
- Experienced in building scalable and efficient data pipelines to extract, transform, and load large volumes of data from various sources into data warehouses or data lakes.
- Implemented data validation and quality checks to ensure the accuracy and reliability of Qualtrics survey data.
- Ensured effective data governance and security measures in the data lakehouse framework.
- Worked closely with data stewards and data owners to resolve data governance issues and ensure alignment with business objectives.
- Designed and implemented data pipelines to extract, transform, and load (ETL) data into Google Looker for streamlined reporting processes.
- Developed a comprehensive understanding of modern enterprise data management concepts such as data lakehouse architecture.
- Developed and deployed the outcome using Spark and Scala code on the Hadoop cluster running on GCP.
- Used clustering algorithms like k-means and hierarchical clustering to cluster the data, devised recommendation models using different machine learning algorithms, and validated the models.
- Used an ensemble of models to improve prediction accuracy by up to 75%; conducted studies and rapid plots, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
- Optimized MDM systems for scalability, ensuring the platform can handle growing volumes of master data while maintaining optimal performance.
- Reviewed all Kafka unit test case documents and Palantir Foundry, data lake, Talend, and EQA documents completed by the team against a proper review checklist, and contributed development for the same.
- Experience monitoring overall infrastructure security and availability, as well as space and capacity usage, including Hadoop clusters and Hadoop APIs.
- Strong experience in migrating other databases to Snowflake.
- Extracted data from various source systems such as databases and flat files, and wrote MySQL queries or utilized Informatica connectors to retrieve data from different sources.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- Delivered the components that take text data from the raw layer for each source all the way to model input for the machine learning models.
- Worked with business stakeholders to create and groom product and sprint backlog items in Jira.
- Developed interactive dashboards and reports using Google Looker to visualize key performance indicators (KPIs), trends, and metrics for business stakeholders.
- Extensively worked with XML technologies such as XSD, XSLT, and XQuery, along with data lake, Kubernetes, and PySpark, for data transformations in MuleSoft.
- Designed a visual dashboard in Tableau using data extracted from MES Oi, visually transcribing 153 KPIs.
- Created JSON output files using Alteryx and merged them using JavaScript and Lua to replicate the data on the front-end executive dashboard.
- Utilized various supervised and unsupervised machine learning algorithms and tools to perform NLP tasks and compare performance.
- Worked on implementing Hive-HBase integration by creating Hive external tables and using an HBase storage handler.
- Engineered data pipelines to integrate and consolidate customer data from various sources, creating a schema to replace a non-performant legacy datastore for one of Palantir's largest commercial customers.
- Used monitoring and logging tools such as the ELK stack (Elasticsearch and Kibana) along with Mirth Connect.
- Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data visualizations.
- Conducted performance tuning exercises on Palantir Foundry data processes to ensure scalability and efficient resource utilization.
- Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling.
- Experience in using Snowflake Clone and Time Travel.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
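A minimal scikit-learn sketch of the model comparison described in this section for readmission prediction; the feature matrix and labels below are synthetic placeholders, not real patient data.

    # Hypothetical sketch: compare several classifiers for readmission prediction.
    # X and y are synthetic placeholders, not real patient data.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 12))          # 500 synthetic patients, 12 features
    y = rng.integers(0, 2, size=500)        # synthetic readmission labels

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(),
        "gradient_boosting": GradientBoostingClassifier(),
        "random_forest": RandomForestClassifier(n_estimators=200),
    }

    # 5-fold cross-validated accuracy for each candidate model.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: mean accuracy {scores.mean():.3f}")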
_______________________________________________________________________________________
Client: Comcast, Kansas City, MO                                      Nov 2016 - Sep 2019
Role: Data Engineer
Responsibilities:
- Collaborated with development and operations teams to address complex issues and ensure timely incident resolution.
- Used Tableau to produce dashboards comparing results before and after adopting the solution in a staging environment.
- Implemented real-time data processing solutions, enabling in-the-moment analysis of customer interactions and facilitating timely decision-making.
- Developed and implemented data pipelines for extracting, transforming, and loading (ETL) large volumes of data using the BigID platform.
- Implemented and maintained Control-M scheduling solutions to automate and orchestrate ETL workflows, ensuring timely and efficient data processing.
- Wrote Hive queries to transform data for further downstream processing.
- Implemented a CI/CD process for Apache Airflow DAGs by building Airflow Docker images with Docker Compose and deploying them on AWS Batch and a PySpark ECS cluster; automated DV package result scripts and workflows using Apache Airflow and shell scripting to manage cron jobs in production (see the Airflow sketch at the end of this section).
- Worked with Informatica developers to troubleshoot and resolve data quality issues, ensuring data integrity and reliability across enterprise data platforms.
- Implemented and maintained data governance policies, standards, and procedures to ensure data quality and integrity across multiple projects.
- Implemented advanced features such as filters, drill-downs, and parameterized reports in Google Data Studio to empower users to explore data dynamically.
- Executed process improvements in data engineering using the Alteryx processing engine and MySQL.
- Utilized Palantir Foundry's data modeling capabilities to design and implement effective data schemas, supporting the organization's analytical and reporting needs.
- Maintained regular communication with the customer.
- Conducted regular performance tuning and capacity planning for Control-M environments to handle growing data volumes and workload demands effectively.
- Experienced with Vue, TypeScript, Node.js, Postgres, Synapse, and PL/SQL, and in building data pipelines in AWS, including ELT (Spark, Glue, Azure Data Factory, Qualtrics, etc.).
- Defined virtual warehouse sizing in Snowflake for different types of workloads.
- Experience in conducting several digital transformations.
- Developed and maintained Informatica mappings, transformations, and workflows to support complex data integration requirements across various source systems and data warehouses.
- Worked on the backend using Scala and Spark to implement several aggregation logics.
- Worked very closely with the architecture group and drove solutions.
- Wrote PL/SQL (DDL and DML) queries and stored procedures, used them to build packages, and handled slowly changing dimensions to maintain data history.
- Managed ZooKeeper configurations and znodes to ensure high availability on the Hadoop cluster.
- Experienced with setting up, configuring, and maintaining the ELK stack (Elasticsearch, Logstash, and Kibana) and OpenGrok source code search (SCM).
- Developed custom scripts and solutions using Control-M's API and CLI to extend its functionalities and meet specific business requirements.
- In-depth knowledge of Snowflake database, schema, and table structures.
- Worked on submitting Spark jobs that produce data metrics used for data quality checking.
- Designed various Jenkins jobs to continuously integrate the processes and executed CI/CD pipelines using Jenkins.
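A minimal Apache Airflow sketch of the kind of scheduled DAG described in this section; the DAG id, schedule, and task logic are hypothetical placeholders.

    # Hypothetical sketch: a small Airflow DAG that runs a daily ETL step.
    # DAG id, schedule, and task logic are placeholders, not production values.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_and_load():
        # Placeholder for the real extract/transform/load logic.
        print("running daily ETL step")


    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 2 * * *",   # run at 02:00 every day (cron syntax)
        catchup=False,
    ) as dag:
        etl_task = PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
        )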
_______________________________________________________________________________________
Client: Absolut Data Analytics, Gurgaon, HR                           Oct 2014 - Jun 2016
Role: Data Engineer
Responsibilities:
- Involved in Agile methodologies, daily Scrum meetings, and sprint planning.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Integrated Bitbucket with Jenkins for automated code deployments, ensuring smooth and efficient delivery of software artifacts.
- Collaborated with cross-functional teams to troubleshoot and resolve build and deployment issues, ensuring minimal downtime and maximum reliability.
- Utilized dbt's modeling capabilities to create structured, reliable, and maintainable data models for analytics and reporting purposes.
- Implemented best practices for data governance and metadata management within Informatica PowerCenter, ensuring compliance with organizational data standards and policies.
- Worked as a Spark expert and performance optimizer.
- Member of the Spark COE (Center of Excellence) on the Data Simplification project at Cisco.
- Experienced with SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Handled data skewness in Spark SQL (see the salting sketch at the end of this section).
- Conducted performance tuning and optimization of Neo4j queries and database operations to improve overall system efficiency and scalability.
- Implemented Spark using Python and Java, utilizing DataFrames and the Spark SQL API for faster data processing.
- Collaborated with cross-functional teams to understand data requirements and implement Snowpipe solutions tailored to specific business needs.
- Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy project requirements, cost, and schedule.
- Configured and managed Bitbucket repositories, branches, and permissions for version control and collaboration within the development team.
- Designed and developed Snowpipe integration workflows, ensuring efficient and reliable data ingestion from various sources into Snowflake.
- Worked on DirectQuery in Power BI to compare legacy data with current data, and generated reports and stored dashboards.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).
- Experience extracting data from heterogeneous sources such as Oracle, MS SQL Server, flat files, and XML, and loading it into SAP ECC, SAP HANA, and data warehouses.
- Optimized Snowpipe configurations to enhance data ingestion performance, minimizing latency and improving overall system throughput.
- Created and formatted cross-tab, conditional, drill-down, top-N, summary, form, OLAP, sub-report, ad-hoc, parameterized, interactive, and custom reports.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
- Developed visualizations and dashboards using Power BI.
- Used ETL to implement slowly changing dimension transformations to maintain historical data in the data warehouse.
- Designed and maintained Jenkins jobs for continuous integration and continuous deployment of data pipelines and applications.
- Designed and implemented data transformation pipelines using dbt (data build tool) to streamline data processing workflows.
- Implemented custom Spark transformations and actions to address specific business requirements and data transformations.
- Compiled data from various sources to perform complex analysis for actionable results.
- Measured the efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
- Optimized the TensorFlow model for efficiency.
- Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
- Implemented a continuous delivery pipeline with Docker, GitHub, and AWS.
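A minimal PySpark sketch of one common way to handle data skew in a Spark SQL join by salting the hot key, as referenced above; the table and column names are hypothetical placeholders.

    # Hypothetical sketch: salt a skewed join key so one hot key does not land
    # entirely on a single partition. Table and column names are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("skew-salting-demo").getOrCreate()

    num_salts = 16  # number of buckets to spread the hot key across

    events = spark.table("example_db.events")          # large, skewed side
    customers = spark.table("example_db.customers")    # small side

    # Add a random salt to each event row.
    salted_events = events.withColumn(
        "salt", (F.rand() * num_salts).cast("int")
    )

    # Explode the small side so every salt value has a matching row.
    salted_customers = customers.crossJoin(
        spark.range(num_salts).withColumnRenamed("id", "salt")
    )

    # Join on the original key plus the salt, then drop the helper column.
    joined = salted_events.join(
        salted_customers, on=["customer_id", "salt"], how="inner"
    ).drop("salt")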

Education:
- Bachelor in Computer Science, Osmania University (2014)