Revision as of 11:52, 30 July 2020
This page is for writing down ideas for grants.
List of potential grants and deadlines
Organization | Category | Deadline | Funding Aims | Amount |
---|---|---|---|---|
NSF | Information Integration and Informatics (III) under CISE | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects | "The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include: | up to $500,000 total budget with durations up to three years |
Project Aims
Food, nutrition, and health are among the topics with the highest engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open, structured database for global FCD, and Wikimedia, especially Wikidata, is well suited to accommodate some of these data. Because existing FCDs are diverse and complex, it would be helpful to have a holding place that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull the information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.
We believe that Wikimedia is an appropriate venue for this project. Existing FCDs come in a variety of formats (e.g. PDF, CSV) and include varying degrees of detail. The nutrient content of an unprocessed food item (e.g. apples) can also vary across regions and over time because of changing characteristics such as climate and terroir, yet current FCDs are not well suited to reflecting these changes. Research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Developing and maintaining such a database is difficult when contributors are limited to small, closed groups of researchers and employees in this field. Importantly, although the foods that are commonly consumed vary widely by region, some places lack access to FCD that is regionally appropriate, up to date, or available in their own languages. This leads to disparities in data availability and accessibility and, ultimately, in the scientific evidence available for health research. We need a more open and collaborative system.
First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users, from WikiProject participants and Wikipedia editors and readers to academic researchers and public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.
Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also make it possible to explore research questions that require more nuanced nutrition data (e.g. changes in the nutrient content of the same product depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.
Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.
Outputs
- list products and other outputs
- Wikibase instance with FCD data from multiple sources
- SPARQL query code to combine this data with subsets of Wikidata data
- Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas
Methods
- Database Design and Population
We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.
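As a rough illustration of the mapping step, the sketch below normalizes rows from differently shaped tables into one record type. It is not the project's actual data model: the `NutrientValue` fields, the `map_row` helper, the column names, and the choice of grams per 100 g as the common unit are all assumptions made for the example, and the sample values are invented.

```python
from dataclasses import dataclass

# Conversion factors to grams; extend as the source tables require.
UNIT_TO_GRAMS = {"g": 1.0, "mg": 1e-3, "ug": 1e-6}


@dataclass(frozen=True)
class NutrientValue:
    """One nutrient measurement for one food item (hypothetical model)."""
    food_name: str         # food label as given by the source table
    nutrient: str          # normalized nutrient key, e.g. "protein"
    grams_per_100g: float  # amount converted to a common unit
    source: str            # identifier of the originating table


def map_row(row, column_map, food_column, source):
    """Map one source row to normalized NutrientValue records.

    column_map maps a source column name to (nutrient key, unit).
    Empty or missing cells are skipped rather than treated as zero.
    """
    records = []
    for column, (nutrient, unit) in column_map.items():
        raw = row.get(column)
        if raw is None or str(raw).strip() == "":
            continue
        grams = float(raw) * UNIT_TO_GRAMS[unit]
        records.append(NutrientValue(row[food_column], nutrient, grams, source))
    return records


# Illustrative row and column mapping; values are not from any real table.
example_row = {"Food": "apple", "PROT_g": "0.3", "VITC_mg": "4.6", "FE_mg": ""}
example_map = {
    "PROT_g": ("protein", "g"),
    "VITC_mg": ("vitamin_c", "mg"),
    "FE_mg": ("iron", "mg"),
}
example_records = map_row(example_row, example_map, "Food", "table_a")
```

Each source table would get its own `column_map`, so adding a table means adding a mapping rather than changing the model. Keeping `source` on every record preserves provenance when the same food appears in several tables.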
Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. This query (https://tinyurl.com/y99qtk7p) returns all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.
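One way such an image lookup could be written is as a federated SPARQL query that follows a link from each local item to its Wikidata counterpart and asks Wikidata for the image. The sketch below only builds the query text. It is not a reconstruction of the linked query: the mapping property IRI is hypothetical, and it assumes the local property stores the IRI of the matching Wikidata entity. The Wikidata endpoint and the `P18` (image) property are real.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
# Direct-claim IRI of the Wikidata "image" property (P18).
WIKIDATA_IMAGE = "http://www.wikidata.org/prop/direct/P18"


def build_image_query(mapping_property_iri, limit=100):
    """Return a federated SPARQL query for local items with a Commons image.

    mapping_property_iri is the direct-claim IRI of a local property that is
    assumed to hold the IRI of the matching Wikidata entity (hypothetical;
    the real property depends on the project Wikibase's data model).
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"""
SELECT ?food ?wikidataItem ?image WHERE {{
  ?food <{mapping_property_iri}> ?wikidataItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem <{WIKIDATA_IMAGE}> ?image .
  }}
}}
LIMIT {limit}
""".strip()


# Hypothetical property IRI, used only to show the generated query.
example_query = build_image_query("https://example.org/prop/direct/P1", limit=5)
```

Full IRIs are used instead of the `wdt:` prefix because, on a separate Wikibase instance, that prefix usually resolves to the local instance rather than to Wikidata.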
Impact
This project will provide a knowledge graph that will support queries across multiple food composition tables with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data.
This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names in as many human languages as possible.
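Showing results in the language of the user's choice usually requires a fallback rule for items that lack a label in that language. The sketch below shows one simple rule, assuming labels are available as a mapping from language code to text. The `pick_label` helper and the sample labels are illustrative and are not part of Wikibase.

```python
def pick_label(labels, preferred_languages, default=None):
    """Return the first available label, trying languages in order.

    labels maps a language code (e.g. "en") to a label string.
    Empty strings are treated the same as missing labels.
    """
    for language in preferred_languages:
        label = labels.get(language)
        if label:
            return label
    return default


# Illustrative labels for a single food item.
example_labels = {"en": "apple", "ja": "リンゴ", "de": ""}
example_choice = pick_label(example_labels, ["fr", "ja", "en"])
```

Here the user prefers French, then Japanese, then English. Because no French label exists, the Japanese label is returned.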
Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created lets us ask questions of this data that could not be asked before.
We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.
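For readers who want to query such an endpoint from a script, the sketch below sends a query using the standard SPARQL protocol (an HTTP GET with a `query` parameter) and flattens the standard SPARQL JSON results format. It uses only the Python standard library. The endpoint URL is a placeholder, since the project endpoint's address is not given here, and the helper names are our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_request(endpoint_url, query):
    """Build a GET request that asks the endpoint for JSON results."""
    url = f"{endpoint_url}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [
        {variable: cell["value"] for variable, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint_url, query, timeout=30):
    """Send the query and return flattened rows (performs a network call)."""
    request = build_sparql_request(endpoint_url, query)
    with urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(response.read().decode("utf-8"))


# Placeholder endpoint; replace with the address of the project endpoint.
example_request = build_sparql_request(
    "https://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
)
```

Separating request construction and result parsing from the network call keeps both steps easy to check without contacting a live endpoint.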
People
- Project manager/nutritional epidemiologist (volunteer) - Mika Matsuzaki
- Data scientist - Kat Thornton
- Software engineer - Kenneth Seals-Nutt
- Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage