Editing Mika/Temp/WikiFCD

From WikiDotMako

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then publish the changes below to finish undoing the edit.

Latest revision Your text
Line 50: Line 50:


<!-- Please write your response below -->
<!-- Please write your response below -->
[[File:Marketvegetables.jpg|thumb|200px]]
Food composition data (FCD) are an essential part of nutrition research. FCD provide nutrient data for both processed (e.g. veggie burger) and unprocessed (e.g. apples) foods. This information is often compiled by a governmental agency (e.g. the U.S. Department of Agriculture), based on lab measurements by the agency or by companies. Many FCDs are available online, although they come in many different formats (e.g. PDF, CSV).


Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. [[Food_composition_data|Food Composition Data (FCD)]] is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important, as the needs within Wikimedia projects can change over time.
There are vast variations in the foods that are commonly consumed across regions. Additionally, the nutrient content of unprocessed food can vary even for the same item because of variations in terroir. Area- and time-specific data are key to understanding nutrition and health. Moreover, some countries lack these data or only have data that are out of date, leading to disparities in data and, ultimately, in scientific evidence in health research.


We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. The nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system.
Despite several attempts by research institutes and intergovernmental agencies to create a global FCD in the past, none has succeeded in developing a universally accessible, up-to-date, and comprehensive global FCD. The wiki system has the potential to offer a better solution to this problem. We propose WikiFCD to compile nutrition data for unprocessed foods that are published and already available online. These existing databases come from diverse settings, so a combined database becomes costly and difficult to maintain if the contributors are limited to small sets of researchers and employees in this field. The need for diverse participants is very much in line with the missions of projects supported by the Wikimedia Foundation, and we hope to show how peer production can help reduce knowledge disparities in global nutrition.


===What is your solution to this problem?===
===What is your solution to this problem?===
Line 63: Line 63:
<!-- Please write your response below -->
<!-- Please write your response below -->


'''1. What is the solution to this problem?'''
# We will create a wikibase for food composition data from around the world. We will populate the wikibase with nutrition information. We will write schemas to describe our data model. We will map our properties to Wikidata properties.


'''We propose a Wikibase instance, WikiFCD, to create a global nutrient FCD. This wiki-based system will engage participants from diverse wiki communities to make this database universally accessible, up-to-date, and comprehensive.'''
# Why is this a good idea?
First, this database can contribute to a significant advancement in nutritional research, as this wikibase system will improve the usability of FCD from different sources, identify and borrow the most appropriate data in places where up-to-date FCD are not readily available, and open up new research questions that explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year).


Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The equity and global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data and knowledge disparities in global nutrition.
Secondly, it is important to put this data into Wikibase because this dataset is relevant for people from many language communities. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.
We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see [[Mika/Temp/WikiFCD#Project_plan|Project Plan]] section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.
 
'''2. Why is this a good idea?'''
 
* First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.
 
: Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.
 
* Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way, the data will be readily available for incorporation into Wikidata if desired.
 
* Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.


==Project goals==
==Project goals==
Line 86: Line 75:


<!-- Please write your response below -->
<!-- Please write your response below -->
''' Goal #1: We will build and test the wikibase in which food composition data from diverse settings can be entered, maintained, and retrieved. '''
''' Goal #1: We will build a wikibase where food composition data from around the world and from different time points can be entered, maintained, and retrieved. '''


''' Goal #2: We will translate and link the data into other languages.'''
''' Goal #2: We will involve participants from diverse communities to make sure that all available data are accommodated in this database.'''


''' Goal #3: We will involve participants from diverse communities to make sure that all available data are accommodated and made available in this database.'''
''' Goal #3: We will make it possible for Wikidata to incorporate the data in this database, if the community finds any of the data to be relevant.'''


== Project impact ==
== Project impact ==
Line 104: Line 93:
<!-- Please write your response below -->
<!-- Please write your response below -->


1. Proof-of-concept  
# Proof-of-concept  
::a. We will use 5 databases mentioned below to test if our schema is appropriate to accommodate various information included in databases from different places.
## We will use databases listed in [http://www.fao.org/infoods/infoods/tables-and-databases/en/ FAO/INFOODS] to test if our schema is appropriate to accommodate various information included in databases from different places.
::b. Once the project is over, other databases can be entered, following the examples we develop in this project.
## Once the project is over, other databases not included on FAO/INFOODS can be entered, following the examples we develop in this project.


2. Methodology
# Methodology
::a. We will develop a tutorial and documentation for edit-a-thon participants to follow.
## We will develop a tutorial and documentation for editathon participants to follow.
::b. Once the project is over, these tutorials and documentation can be used by future participants to enter and maintain the database.
## Once the project is over, these tutorials and documentation can be used by future participants to enter and maintain the database.
 
3. Alignment with WMF strategy
::a. One of the elements of Wikimedia’s strategy focuses on “Knowledge equity”, which includes “communities that have been left out by structures of power and privilege”.
::b. Supporting multiple language communities serves this purpose, as food composition databases are more common in English and languages spoken in the EU.


=== Do you have any goals around participation or content? ===
=== Do you have any goals around participation or content? ===
Line 121: Line 106:


<!-- Please write your response below -->
<!-- Please write your response below -->
# A Wikibase instance with good documentation, a narrative, and an example that can be used for diverse needs (e.g. Wikidata and Wikipedia editors and viewers, future Wikibase users).
# n new
# 50 participants covering at least 10 languages.
# n new data sources  
# 10 new participants from the nutrition communities who have never been involved in wiki-based projects.
# 5 new data sources.


==Project plan==
==Project plan==
Line 134: Line 117:


;System development
;System development
# Description - We will use the Docker image of Wikibase created by WMDE [https://github.com/wmde/wikibase-docker].  We will use QuickStatements as well as custom bots developed using the [https://github.com/SuLab/WikidataIntegrator WikidataIntegrator Python library] to populate the Wikibase.
# Description -  
# Outputs - A wikibase instance named WikiFCD hosted on a university server.
# Outputs -  
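
As an illustration of the bulk-import step described above, the following is a minimal sketch of how one row from a source table could be turned into QuickStatements V1 commands. It is not the project's actual importer. The property ID <code>P1</code>, the unit item <code>Q1</code>, and the <code>NutrientRow</code> helper are hypothetical placeholders, and the tab-separated command layout follows the V1 syntax as documented for QuickStatements, which should be verified against the deployed version.

```python
from dataclasses import dataclass

# Hypothetical IDs: the real ones would be assigned when the properties
# and unit items are created in the WikiFCD Wikibase.
PROP_PROTEIN = "P1"        # assumed property: protein content
UNIT_GRAM_ITEM = "Q1"      # assumed unit item: gram


@dataclass
class NutrientRow:
    """One food item with a single nutrient value (illustrative only)."""
    label_en: str
    protein_g: float


def to_quickstatements(row: NutrientRow) -> str:
    """Render one row as tab-separated QuickStatements V1 commands."""
    unit_number = UNIT_GRAM_ITEM.lstrip("Q")  # V1 units are written as U<number>
    commands = [
        "CREATE",                                        # create a new item
        f'LAST\tLen\t"{row.label_en}"',                  # English label
        f"LAST\t{PROP_PROTEIN}\t{row.protein_g}U{unit_number}",  # quantity with unit
    ]
    return "\n".join(commands)
```

The generated text can be pasted into a QuickStatements batch; a bot-based import would replace the string rendering with API calls but could keep the same row structure.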
; Data Modeling & bulk data import
;Implementation
# Description - We will use [http://shex.io/shex-semantics/index.html ShEx] to express the schemas for our data models. We will align the properties in our Wikibase with relevant Wikidata properties.
# Description -  
::We will first create a wikibase, based on our analyses of 2 large food composition databases as the starting examples:
# Outputs -  
:: 1) [http://www.fao.org/fileadmin/templates/food_composition/documents/AnFooD2.0.xlsx FAO/INFOODS Analytical Food Composition Database Version 2.0 (AnFooD2.0)]
:: 2) [https://fdc.nal.usda.gov/download-datasets.html USDA Foundation Foods database December 2019]
 
: 2. Outputs- ShEx schemas and SPARQL query code
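
To make the data-modeling and property-alignment idea above more concrete, here is a small sketch of one possible in-memory model. All class names, field names, and property IDs are hypothetical, and the "per 100 g" basis is an assumption rather than something fixed by the proposal. The actual model would be expressed in ShEx.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NutrientValue:
    """A single measurement, kept with its provenance (illustrative model)."""
    nutrient: str   # e.g. "protein"
    amount: float   # assumed basis: per 100 g edible portion
    unit: str       # e.g. "g"
    source: str     # name of the source database
    year: int       # publication year of the source


@dataclass
class FoodItem:
    """A food item with multilingual labels and values from several sources."""
    labels: Dict[str, str]                      # language code -> label
    values: List[NutrientValue] = field(default_factory=list)


def unaligned_properties(alignment: Dict[str, Optional[str]]) -> List[str]:
    """Return local property IDs that have no Wikidata counterpart yet.

    `alignment` maps a local WikiFCD property ID to a Wikidata property ID,
    or to None when no equivalent has been identified.
    """
    return sorted(local for local, wd in alignment.items() if wd is None)
```

Keeping the source and year on every value is one way to represent the area- and time-specific variation the proposal emphasizes; listing unaligned properties gives a simple worklist for the Wikidata mapping.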
 
;Editathon (Data entry, cross-reference checking, and translation)
# Description - We will host five edit-a-thon events in Seattle (2), in Boston (1), in Baltimore (1), and in Kerala (1) to check "automated" data entries against the original databases, check global vs individual databases, and translate data in this Wikibase into Japanese, Spanish, and Malayalam. Appropriate communities (universities and Wikidata communities) have been contacted about hosting these events.
 
::We will check if overlap exists between these larger databases and individual databases listed below. Then we will add any information that was not cited in the larger databases above.
:: 3) [https://drive.google.com/file/d/1eqQ578gHiPoIaHaVYjQa_3sFe_LzGhm1/view Indian Food Composition Tables 2017]
:: 4) [http://www.fao.org/3/I8897EN/i8897en.pdf Kenya Food Composition Tables 2018].
:: 5) [https://www.mext.go.jp/en/policy/science_technology/policy/title01/detail01/sdetail01/sdetail01/1385122.htm STANDARD TABLES OF FOOD COMPOSITION IN JAPAN - 2015 - (Seventh Revised Edition)]
: 2. Outputs - Linked FCD in Wikibase.
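
The check of "automated" entries against the original databases could be supported by a small comparison script, so that edit-a-thon participants only review the entries that disagree. The sketch below is one way to do that; the key layout (food identifier plus nutrient name) and the tolerance are assumptions.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (food identifier, nutrient name); assumed layout


def find_mismatches(
    imported: Dict[Key, float],
    original: Dict[Key, float],
    rel_tol: float = 1e-6,
) -> List[Key]:
    """Return keys whose imported value is missing or differs from the source."""
    mismatches = []
    for key, expected in original.items():
        actual = imported.get(key)
        # A missing entry and a numerically different entry both need review.
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            mismatches.append(key)
    return sorted(mismatches)
```

The returned list can be split among participants as a review queue; entries that match the source need no manual attention.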
 
;Documentation
;Documentation
#Description - The process of finding datasets, identifying meta-data (e.g. copyright, year of publication), entering data, translating data, and using data for analyses will be documented.
#Description -  
#Outputs - We will generate multiple ShEx schemas that will help us communicate our data model to stakeholders. We will write a tutorial for users of the system. We will write federated SPARQL queries that others may reuse that demonstrate how to combine WikiFCD data with data from Wikidata.
#Outputs -  
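
A reusable federated query of the kind mentioned above could look roughly like the following sketch, which builds the SPARQL text in Python. The local property prefix and the mapping property are hypothetical, and the sketch assumes that the mapping property stores the Wikidata entity IRI directly. Only the <code>SERVICE</code> clause and the public Wikidata endpoint URL are standard.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(local_prop_prefix: str, mapping_prop: str, limit: int = 10) -> str:
    """Build a SPARQL query joining WikiFCD items to Wikidata labels.

    `local_prop_prefix` is the assumed direct-property namespace of the
    WikiFCD Wikibase; `mapping_prop` is a hypothetical property whose value
    is the IRI of the matching Wikidata item.
    """
    return f"""PREFIX wfp: <{local_prop_prefix}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?food ?wikidataItem ?wikidataLabel WHERE {{
  ?food wfp:{mapping_prop} ?wikidataItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem rdfs:label ?wikidataLabel .
    FILTER(LANG(?wikidataLabel) = "en")
  }}
}}
LIMIT {limit}
"""
```

The query would be run against the WikiFCD query service, which then contacts Wikidata for the part inside <code>SERVICE</code>.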
 
;Communication
;Communication
#Description - Promotion of project outputs, feedback gathering, presentation at Wikimania and nutrition workshops, tutoring of interested volunteers
#Description -  
#Outputs - Blog posts, feedback reports, ShEx schemas
#Outputs -  
 
;Project management
;Project management
#Description - We will report our progress twice in 12 months.
#Description -  
#Outputs - Mid-term report (6 months), Final report (12 months).
#Outputs -  


{| class="wikitable"
{| class="wikitable"
! WP/Month !! 1 !! 2 !! 3 !! 4 !! 5 !! 6 !! 7 !! 8 !! 9 !! 10 !! 11 !! 12  
! WP/Month !! 1 !! 2 !! 3 !! 4 !! 5 !! 6 !! 7 !! 8 !! 9 !! 10 !! 11 !! 12 !!
|-
|-
| WP1 - System development || X || X || X ||X || X ||X ||X ||X ||X ||X ||X ||   
| WP1 - System development || X || X || X || || || || ||  
|-
|-
| WP2 - Data Modeling || X || X || X || X || X || X || X || X || || || ||  
| WP2 - Implementation || || X || X || X || X || X || ||  
|-
|-
| WP3 - Edit-a-thon ||  ||  ||  ||  ||  || X || X ||X ||X ||X ||  || 
| WP3 - Documentation ||  ||  ||  ||  || X || X || X || X  
|-
|-
| WP4 - Documentation ||  ||  || ||  || X || X || X || X   || X  || X  || X || X  
| WP4 - Communication ||  ||  || X || X || X || X || X || X  
|-
|-
| WP5 - Communication ||  ||  || X || X || X || X || X || X  || X || X || X || X
| WP5 - Project management || X || X || X || X || X || X || X || X  
|-
| WP6 - Project management || X || X || X || X || X || X || X || X ||X ||X ||X ||X  
|}
|}


Line 185: Line 149:
''How you will use the funds you are requesting? List bullet points for each expense.  (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!''<br/><br/>
''How you will use the funds you are requesting? List bullet points for each expense.  (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!''<br/><br/>
<!-- Please write your response below -->
<!-- Please write your response below -->
Our budget includes the costs of two main developers who will be involved with the project data models and software engineering, for the duration of the project. Engineering/scientist positions will be filled by the grantees. The positions are split as follows: One person working on data models and model alignment with Wikidata and one person working on maintaining the Wikibase and leading bot development work. The community outreach intern (to be recruited) will work with wiki communities, academic researchers, and students to organize edit-a-thon events.
For dissemination, we are planning to attend Wikimania to talk with editors in person about their needs and wishes for contributing food composition data. Wikimania is the right venue for this, as it will have a large pool of editors from different Wikipedia language versions.
{| class="wikitable"
{| class="wikitable"
! Item !! Budget
! Item !! Budget
|-
|-
| Data scientist (10 hours per week for 8 months (34 weeks)) || $30x10x34 = $10,200
| Senior Data scientist (x hours per week for 8 months) || $
|-
| Software engineer (10 hours per week for 8 months)|| $30x10x34 = $10,200
|-
| Community outreach/Communication intern (8 hours per week for 8 months) || $25x8x34 = $6,800
|-
|-
| Server hosting (12 months) || $22x12 = $264
| Community outreach intern (x hours per week for 8 months) || $
|-
|-
| Edit-a-thon event costs || $1000 x 5 = $5,000
| Server hosting || $
|-
|-
| Travel (Wikimania 2020, 2 people) || $ 4,000
| Travel (Wikimania 2020, 2 people) || $ 4,000
|-
|-
| Total || $ 36,464
| Total || $
|}
|}
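
The line items in the completed budget table can be re-added with a short script to confirm the total. The figures below are copied from the table; the variable names are only illustrative.

```python
# Each entry: (item, cost in US dollars), using the rates and hours from the table.
BUDGET_ITEMS = [
    ("Data scientist", 30 * 10 * 34),             # $30/h x 10 h/week x 34 weeks
    ("Software engineer", 30 * 10 * 34),          # $30/h x 10 h/week x 34 weeks
    ("Community outreach intern", 25 * 8 * 34),   # $25/h x 8 h/week x 34 weeks
    ("Server hosting", 22 * 12),                  # $22/month x 12 months
    ("Edit-a-thon event costs", 1000 * 5),        # $1,000 x 5 events
    ("Travel (Wikimania 2020, 2 people)", 4000),
]

total = sum(cost for _, cost in BUDGET_ITEMS)
print(f"Total: ${total:,}")
```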


Line 211: Line 166:
''How will you let others in your community know about your project? Why are you targeting a specific audience?  How will you engage the community you’re aiming to serve at various points during your project?  Community input and participation helps make projects successful.''<br/><br/>
''How will you let others in your community know about your project? Why are you targeting a specific audience?  How will you engage the community you’re aiming to serve at various points during your project?  Community input and participation helps make projects successful.''<br/><br/>
<!-- Please write your response below -->
<!-- Please write your response below -->
* Wikipedian communities in Seattle and in India
* Facebook?
* Academic nutrition communities in Boston and in Baltimore.
* Blog post?
* We will host a workshop at Wikimania 2020.
* Workshop at Wikimania?
* We will share our data models via ShEx schemas
* We will share SPARQL queries that others can use to combine WikiFCD data with Wikidata


==Get involved==
==Get involved==
Line 223: Line 176:
''Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.''<br/><br/>
''Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.''<br/><br/>
<!-- Please write your response below -->
<!-- Please write your response below -->
* Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]
* Project manager (volunteer) - Mika Matsuzaki
* Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]
* Data scientist - Kat Thornton  
* Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]
* Advisor -
* Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage
* Volunteer -
* Volunteer -


===Community notification===
===Community notification===