Sleep Analysis
Overview
This project analyzes sleep data recorded by the Sleep Cycle iOS app and published on Kaggle in order to determine what contributes to good sleep. The factors examined are caffeine consumption, physical activity, heart rate, and length of time sleeping. We trained a linear regression model on the data so we could predict the quality of a person’s sleep from these factors.
Questions to answer with the data
- Which behaviors contribute to quality sleep?
- How does one quantify “good sleep”?
- What will my sleep quality be?
- How well rested will I be?
Data discovery and selection
Initially, we browsed topics of interest on Kaggle to see what data was available. We came upon the sleep data of an individual who tracked their sleep patterns for just over 3 years. The dataset started with 8 columns, one of which was titled “Sleep Notes” and combined several features in a single field. Part of the initial cleanup of this data was to break those features out into separate columns.
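The step of breaking “Sleep Notes” out into separate columns can be sketched with pandas. This is a minimal illustration, not the project's actual cleanup code: it assumes the notes in one cell are joined by a colon, and the sample rows and note names are made up for the example.

```python
import pandas as pd


def split_sleep_notes(df, notes_col="Sleep Notes", sep=":"):
    """Replace a combined notes column with one 0/1 column per note.

    The separator is an assumption; adjust it to match the raw export.
    """
    # str.get_dummies splits each cell on `sep` and builds indicator columns.
    # Rows with no notes (missing values) become all zeros.
    indicators = df[notes_col].str.get_dummies(sep=sep)
    return pd.concat([df.drop(columns=[notes_col]), indicators], axis=1)


# Hypothetical sample rows, not taken from the dataset.
sample = pd.DataFrame(
    {
        "Sleep quality": [80, 65, 90],
        "Sleep Notes": ["Drank coffee:Worked out", "Drank coffee", None],
    }
)
expanded = split_sleep_notes(sample)
print(expanded)
```

Each distinct note becomes its own column, so later steps can treat every behavior as a separate feature.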
Original Kaggle Data Set - Sleep Cycle iOS App
Database
Initially, we considered other web-based database applications. However, because of their potential cost and our preference for free tools, we selected MongoDB Atlas and Compass to store and connect to our dataset.
- Free
- Simple
- High Speed
- Flexible
Database Schema
MongoDB
Starting Data
Modified Data
Data Cleaning
Initial preprocessing and cleaning reduced our dataset to 162 usable data points. The original plan was to predict which emoji would result, so we selected a classifier as our machine learning model. After running this initial model, we determined that there was not enough usable data and that a classifier was not appropriate.
Pivot Direction
We decided to make the Sleep Quality feature our target, as it has more outcomes to predict. This change in plan required more preprocessing. We generated df_tranformed.csv, in which every feature is encoded as a 1 or 0 so the model can use it, and which keeps 887 complete rows of data.
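Keeping only complete rows and confirming that every feature is a 1 or 0 might look like the sketch below. The helper name, the column names, and the sample values are hypothetical; how the project actually produced its encoded file is not shown here.

```python
import pandas as pd


def keep_complete_binary_rows(df, feature_cols, target_col):
    """Drop rows with missing values and verify the features are 0/1."""
    # Keep only rows where every feature and the target are present.
    complete = df.dropna(subset=feature_cols + [target_col]).copy()

    # Check before casting so a value such as 0.5 is reported, not truncated.
    non_binary = [c for c in feature_cols if not complete[c].isin([0, 1]).all()]
    if non_binary:
        raise ValueError(f"Columns are not 0/1 encoded: {non_binary}")

    complete[feature_cols] = complete[feature_cols].astype(int)
    return complete


# Hypothetical rows; the second one is incomplete and gets dropped.
raw = pd.DataFrame(
    {
        "Drank coffee": [1, None, 0],
        "Worked out": [0, 1, 1],
        "Sleep quality": [72, 80, 91],
    }
)
cleaned = keep_complete_binary_rows(
    raw, ["Drank coffee", "Worked out"], "Sleep quality"
)
print(len(cleaned), "complete rows kept")
```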
Linear Regression Model
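As a rough illustration of fitting a linear regression to 0/1 features, the sketch below uses ordinary least squares in plain NumPy. It stands in for whatever library the project used, and the toy data is invented so that the fit is exact; it is not drawn from the sleep dataset.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit y ~ intercept + X @ weights by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def predict(X, intercept, weights):
    """Predict a target value for each row of X."""
    return intercept + np.asarray(X, dtype=float) @ weights


# Toy data: columns are hypothetical flags (drank coffee, worked out).
# The targets follow 70 - 10 * coffee + 15 * workout exactly.
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
y_toy = [70, 60, 85, 75, 60]

intercept, weights = fit_linear_regression(X_toy, y_toy)
prediction = predict([[0, 1]], intercept, weights)
print(intercept, weights, prediction)
```

With binary features, each weight reads as the estimated change in sleep quality when that behavior is present, which is what makes this kind of model easy to interpret.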
Model Testing
Flask Assembly
Project Challenges
- Dataset shortcomings
- Casting user inputs to integers
- Achieving a high-accuracy model
- Shaping variables
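The challenge of casting user inputs to integers can be sketched with a small helper that works on any mapping of field names to strings, such as a submitted web form. The function name, field names, and allowed ranges below are hypothetical and only illustrate the idea.

```python
def parse_int_field(form, name, minimum, maximum):
    """Read form[name] as an integer within [minimum, maximum].

    Raises ValueError with a readable message when the value is missing,
    not a whole number, or out of range.
    """
    raw = form.get(name, "")
    try:
        value = int(str(raw).strip())
    except ValueError:
        raise ValueError(f"{name!r} must be a whole number, got {raw!r}") from None
    if not minimum <= value <= maximum:
        raise ValueError(
            f"{name!r} must be between {minimum} and {maximum}, got {value}"
        )
    return value


# Hypothetical form submission; web form values arrive as strings.
submitted = {"heart_rate": " 62 ", "coffee": "1"}
heart_rate = parse_int_field(submitted, "heart_rate", 30, 220)
coffee = parse_int_field(submitted, "coffee", 0, 1)
print(heart_rate, coffee)
```

Raising one error type with a clear message lets the web page show the user what to correct instead of failing inside the model.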
Final Product
Finished Model
Web Page
Dashboard
Visualizing the Data with Tableau:
Improvement Ideas
- Save user inputs and retrain model as more data is gathered.
- Format user inputs to adhere to needed numbers or trigger an error.
- Suggest ways to improve sleep quality.
- Build marketing database of users for various sponsors.