Learning Based Portion Estimation with Application in Dietary Assessment and Evaluations
Accurate dietary assessment is essential for monitoring health, managing chronic diseases, and guiding nutritional interventions. Traditional self-report methods, such as 24-hour recalls, are often tedious, time-consuming, and susceptible to bias and error due to reliance on individual memory and estimation. Image-based dietary assessment methods offer a promising alternative by directly estimating nutrient intake from food images, paving the way for automated and unobtrusive dietary monitoring.
This thesis addresses the challenges of accurately estimating food portion sizes from images, leveraging advances in deep learning and computer vision. A significant challenge in image-based portion estimation is the inherent loss of depth and 3D information when capturing real-world objects in 2D images. Traditional methods often require additional inputs like depth sensors or multiple images from different angles, which can be impractical and burdensome for users.
To overcome these limitations, we develop a deep regression model that combines features from both the RGB domain and learned energy distribution maps. By supervising the training process on intermediate energy distribution outputs, the model significantly outperforms approaches that rely solely on RGB inputs. Extensive evaluations of normalization techniques for cross-domain feature adaptation are conducted to mitigate imbalances in feature spaces, further enhancing estimation accuracy. Our method achieves state-of-the-art performance on challenging real-world food image datasets, surpassing non-expert human estimates and demonstrating the potential for practical applications in automated dietary and health monitoring.
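The intermediate supervision described above can be sketched as a multi-task loss: the network is penalized both on its final energy prediction and on the intermediate energy distribution map it produces. The function names, toy values, and the weighting factor `lam` below are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch of a multi-task loss that supervises an
# intermediate energy distribution map alongside the final energy
# regression output. Shapes and the weight `lam` are assumptions.

def mse(pred, target):
    """Mean squared error over flat lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def combined_loss(energy_pred, energy_true, map_pred, map_true, lam=0.5):
    """Final-energy regression loss plus a weighted intermediate
    loss on the predicted energy distribution map."""
    regression_loss = mse([energy_pred], [energy_true])
    map_loss = mse(map_pred, map_true)  # pixel-wise map supervision
    return regression_loss + lam * map_loss

# Toy example: a scalar energy prediction plus a 4-pixel "map"
loss = combined_loss(500.0, 520.0,
                     [0.1, 0.2, 0.4, 0.3],
                     [0.1, 0.3, 0.4, 0.2])
```

Supervising the intermediate map in this way constrains the learned features to stay physically meaningful, which is the reason the combined model outperforms RGB-only regression.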
Recognizing the data-intensive nature of deep learning methods and the scarcity of high-quality, annotated datasets for food portion estimation, we introduce two new datasets to support the development and evaluation of our methods: (1) Food Replica Image Dataset: This dataset comprises images of food replicas along with corresponding depth maps and ground-truth volume measurements obtained through water displacement methods. The controlled environment allows for precise volume estimation, providing a valuable resource for supervised training and validation of volume estimation models. (2) Real-world Eating Occasion Dataset: Collected during a nutrition study, this dataset includes images of actual meals with ground-truth energy values provided by registered dietitians. It captures the complexity and variability of real-world eating scenarios, facilitating the evaluation of our proposed methods in realistic settings.
Building upon this, we propose a novel end-to-end deep learning framework that estimates food energy directly from a single monocular RGB image by reconstructing the 3D shape of food items. By employing a generative model to reconstruct voxel representations of food objects, the framework effectively recovers the missing 3D information necessary for accurate portion size estimation. This approach eliminates the need for additional external physical references, making it more practical for real-world applications.
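The last step of such a pipeline, going from a reconstructed voxel grid to an energy value, reduces to simple geometry. The sketch below is illustrative only: the voxel size and per-volume energy density are made-up values, whereas the thesis framework learns this mapping end to end rather than applying a fixed conversion.

```python
# Illustrative conversion from a reconstructed voxel occupancy grid
# to an energy estimate. Voxel size and energy density are assumed
# values for the sake of the example.

def voxels_to_energy(grid, voxel_side_cm, energy_density_kcal_per_cm3):
    """Count occupied voxels, convert the count to a volume, then
    scale by an assumed per-volume energy density for the food."""
    occupied = sum(v for layer in grid for row in layer for v in row)
    volume_cm3 = occupied * voxel_side_cm ** 3
    return volume_cm3 * energy_density_kcal_per_cm3

# Toy 2x2x2 occupancy grid with 5 occupied voxels, 0.5 cm voxels,
# and an assumed 1.2 kcal/cm^3 energy density
grid = [[[1, 1], [1, 0]],
        [[1, 1], [0, 0]]]
energy = voxels_to_energy(grid, 0.5, 1.2)
```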
In addition to methodological advancements, we address the critical need for high-quality annotated data by designing and implementing a semi-automatic system for online food image collection and annotation. This system includes: (1) Web Crawler: Automatically retrieves large collections of food images based on specific food labels from the internet; (2) Automatic Food Detection Tool: Filters out irrelevant images, reducing the burden on human annotators and improving dataset quality; (3) Web-based Crowdsourcing Platform: Enables efficient human annotation of food items within images.
To further facilitate fine-grained food classification necessary for accurate nutrient estimation, we design a protocol for linking food images to nutrient databases using USDA food codes. We enhance our web-based annotation tool by introducing features to systematically match generic food labels to USDA food codes, accommodating the hierarchical structure of food categorization.
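The hierarchical matching described above can be sketched as a prefix lookup: FNDDS food codes are 8-digit numbers whose leading digits encode progressively finer food groups, so a generic label can be narrowed to candidate codes by group prefix. The sample codes and labels below are illustrative stand-ins, not entries from the actual database.

```python
# Hypothetical sketch of narrowing a generic food label to candidate
# USDA food codes via the hierarchical prefix structure. The codes
# and descriptions here are illustrative, not real FNDDS entries.

FOOD_CODES = {
    "11111000": "Milk, whole",
    "11112110": "Milk, reduced fat (2%)",
    "24122120": "Chicken breast, grilled",
}

def candidates_for_prefix(prefix):
    """Return all food codes under a hierarchical group prefix, so an
    annotator can refine a generic label (e.g. 'milk') step by step."""
    return sorted(code for code in FOOD_CODES if code.startswith(prefix))

milk_codes = candidates_for_prefix("111")  # all codes in the milk subgroup
```

In the annotation tool, each refinement step shrinks this candidate list until a single food code (and its FNDDS nutrient profile) remains.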
Using this system, we create a comprehensive Nutrition and Food Group-Based Image Database containing 16,114 food images representing the 74 most frequently consumed food sub-categories in the United States. Each image is linked to one or more of the 1,865 USDA food codes, providing detailed nutrient information from the USDA Food and Nutrient Database for Dietary Studies (FNDDS). This database bridges the gap between visual data and nutritional analysis.
This work advances the field of image-based dietary assessment by addressing critical challenges in portion size estimation and data scarcity. The proposed methods and tools have the potential to significantly improve the accuracy of dietary assessments, reduce the burden on individuals and professionals, and contribute to better health outcomes through more precise monitoring of dietary intake.
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette