Mika/Temp/WikiFCD

Taken from an example project for this grant.

Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Food composition data (FCD) are an essential part of nutrition research. FCD provide nutrient data for processed or cooked foods (e.g. veggie burgers, hard-boiled eggs) and unprocessed foods (e.g. apples). Many food composition databases are available online, but they come in different formats (e.g. PDF, CSV) and vary in their level of detail. The nutrient content of the same unprocessed food can also vary with factors such as climate and terroir, so area- and time-specific data are key to understanding nutrition and health. Commonly consumed foods also vary widely by region, yet some places lack regionally suitable FCD, lack FCD in their own languages, or have only older FCD. These gaps lead to disparities in data availability and accessibility and, ultimately, in the scientific evidence available for health research.

Despite several past attempts by research institutes and intergovernmental agencies, none has succeeded in developing a universally accessible, up-to-date, easy-to-use, and comprehensive global FCD. These databases are difficult to develop and maintain when contributors are limited to small groups of researchers and employees in the field. A wiki system has the potential to offer a better solution. We propose WikiFCD to compile detailed food composition data that are already available online. The need for diverse participants in this project is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data and knowledge disparities in global nutrition.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.


1. What is the solution to this problem?

We will use an iterative process to test several automated and manual methods for populating the Wikibase with nutrient data from 5 food composition databases from around the world (see the Project plan section for details). We will write schemas to describe our data model and map our properties to Wikidata properties.

2. Why is this a good idea?

First, this Wikibase system will significantly improve the usability of FCD from different sources for diverse users, from health-conscious individuals to academic researchers to public health workers. Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. The new database will also make it possible to pursue research questions that require more nuanced nutrition data (e.g. changes in the nutrient content of the same product depending on the climate conditions of the year), which could substantially advance nutrition and health research.

Second, by creating an instance of Wikibase for this project, we will be able to design our own data models to incorporate data from heterogeneous sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare the data for other systems. In this way, the data will be readily available for incorporation into Wikidata if desired.

Finally, this project will reach diverse communities from around the world as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

Goal #1: We will build and test a Wikibase in which food composition data from diverse settings can be entered, maintained, and retrieved.

Goal #2: We will translate and link the data into other languages.

Goal #3: We will involve participants from diverse communities to make sure that all available data are accommodated in this database.

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.


1. Proof-of-concept

a. We will use the 5 databases listed in the Project plan section to test whether our schema can accommodate the various kinds of information included in databases from different places.
b. Once the project is over, other databases can be entered by following the examples we develop in this project.

2. Methodology

a. We will develop a tutorial and documentation for edit-a-thon participants to follow.
b. Once the project is over, the tutorial and documentation can be used by future participants to enter data and maintain the database.

3. Alignment with WMF strategy

a. One of the elements of Wikimedia’s strategy focuses on “Knowledge equity”, which includes “communities that have been left out by structures of power and privilege”.
b. Supporting multiple language communities serves this purpose, as food composition databases are more common in English and languages spoken in the EU.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

  1. 50 participants covering at least 3 languages.
  2. 3 new data sources.

Project plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

System development
  1. Description - We will use the Docker image of Wikibase created by WMDE [1]. We will use QuickStatements as well as custom bots developed with the WikidataIntegrator Python library to populate the Wikibase.
  2. Outputs - A Wikibase instance named WikiFCD hosted on a university server.
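
As a rough illustration of the QuickStatements route, the minimal sketch below turns one food record into QuickStatements V1 commands (tab-separated, one command per line). It is not the project's actual import code. The property IDs P1 and P2, the unit item Q10, and the sample values are hypothetical placeholders; a real WikiFCD instance would assign its own IDs, and the values are not taken from any of the source databases.

```python
# Sketch: build QuickStatements V1 commands for one new food item.
# All IDs (P1, P2, Q10) and values are hypothetical placeholders.


def quantity(amount, unit_qid=None):
    """Format a quantity, optionally with a unit item (e.g. Q10 -> 'U10')."""
    text = str(amount)
    if unit_qid is not None:
        text += "U" + unit_qid.lstrip("Q")
    return text


def food_to_quickstatements(label, lang, statements):
    """Return tab-separated QuickStatements V1 lines that create one item.

    statements is a list of (property_id, value) pairs whose values are
    already formatted for QuickStatements, e.g. '"text"' or '0.26U10'.
    """
    lines = ["CREATE", f'LAST\tL{lang}\t"{label}"']
    for prop, value in statements:
        lines.append(f"LAST\t{prop}\t{value}")
    return "\n".join(lines)


batch = food_to_quickstatements(
    "Apple, raw",
    "en",
    [
        ("P1", '"example source table"'),  # hypothetical "source" string property
        ("P2", quantity(0.26, "Q10")),     # hypothetical "protein per 100 g", unit Q10
    ],
)
print(batch)
```

The printed batch could then be pasted into QuickStatements for review before it is run, which keeps a human check between the source table and the Wikibase.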
Data Modeling & bulk data import
  1. Description - We will use ShEx to express the schemas for our data models. We will align the properties in our Wikibase with relevant Wikidata properties. We will first create a Wikibase based on our analyses of 2 distinct food composition databases as starting examples:
     1) FAO/INFOODS Analytical Food Composition Database Version 2.0 (AnFooD2.0)
     2) USDA Foundation Foods database, December 2019
     Then we will check how much overlap exists between these "global" databases and the individual databases listed below. If any information is omitted from the global databases, we will add it to our system.
     3) Indian Food Composition Tables 2017
     4) Kenya Food Composition Tables 2018
     5) Standard Tables of Food Composition in Japan - 2015 - (Seventh Revised Edition)
  2. Outputs - ShEx schemas and SPARQL query code
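
The overlap check between the "global" and individual databases could start from something as simple as the sketch below. It assumes each database has already been reduced to a set of component names; the names used here are illustrative placeholders, not the actual contents of any of the five databases.

```python
# Sketch: list components that a national table reports but that none of the
# "global" reference tables cover. All component names are placeholders.


def missing_components(global_tables, national_table):
    """Return components found in national_table but in no global table."""
    covered = set().union(*global_tables)
    return sorted(set(national_table) - covered)


global_a = {"energy", "protein", "fat", "iron"}
global_b = {"energy", "protein", "vitamin C"}
national = {"energy", "protein", "iron", "iodine", "vitamin C"}

print(missing_components([global_a, global_b], national))  # ['iodine']
```

Anything returned by such a check would be a candidate for a new property in the WikiFCD data model.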
Edit-a-thon (data entry, cross-reference checking, and translation)
  1. Description - We will host five edit-a-thon events, two in Seattle and one each in Boston, Baltimore, and Kerala, to check "automated" data entries against the original databases, compare the global and individual databases, and translate data in this Wikibase into Japanese, Spanish, and Malayalam. Appropriate communities (universities and Wikidata communities) have been contacted about hosting these events.
  2. Outputs - Linked FCD in Wikibase.
Documentation
  1. Description - We will document the process of finding datasets, identifying metadata (e.g. copyright, year of publication), entering data, translating data, and using data for analyses.
  2. Outputs - We will generate multiple ShEx schemas that will help us communicate our data model to stakeholders. We will write a tutorial for users of the system. We will write reusable federated SPARQL queries that demonstrate how to combine WikiFCD data with data from Wikidata.
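
To give a sense of what a reusable federated query might look like, the sketch below builds one as a string. It assumes a hypothetical WikiFCD property P3 that stores the matching Wikidata item as a URL, and a hypothetical base URL (wikifcd.example.org); only the Wikidata endpoint address is real. The query asks Wikidata for labels in a chosen language for each linked food item.

```python
# Sketch: build a federated SPARQL query that joins WikiFCD items to Wikidata.
# WIKIFCD_PROP and property P3 are hypothetical; only the Wikidata endpoint is real.

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
WIKIFCD_PROP = "https://wikifcd.example.org/prop/direct/"  # hypothetical base URL


def build_label_query(lang, limit=10):
    """Return a query fetching Wikidata labels in `lang` for linked foods."""
    return f"""PREFIX fcdt: <{WIKIFCD_PROP}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?food ?wikidataItem ?label WHERE {{
  ?food fcdt:P3 ?wikidataItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem rdfs:label ?label .
    FILTER(LANG(?label) = "{lang}")
  }}
}}
LIMIT {int(limit)}"""


print(build_label_query("ja"))
```

The resulting string would be run against the WikiFCD query service, which may need to be configured to allow federation with the Wikidata endpoint.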
Communication
  1. Description - Promotion of project outputs, feedback gathering, presentations at Wikimania and nutrition workshops, and tutoring of interested volunteers.
  2. Outputs - Blog posts, feedback reports, ShEx schemas
Project management
  1. Description - We will report our progress twice in 12 months.
  2. Outputs - Mid-term report (6 months), Final report (12 months).
WP/Month 1 2 3 4 5 6 7 8 9 10 11 12
WP1 - System development X X X X X X X X X X X
WP2 - Data Modeling X X X X X X X X
WP3 - Edit-a-thon X X X X X
WP4 - Documentation X X X X X X X X
WP5 - Communication X X X X X X X X X X
WP6 - Project management X X X X X X X X X X X X

Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

Our budget covers two main developers, who will work on the project's data models and software engineering for the duration of the project. These engineering and scientist positions will be filled by the grantees and are split as follows: one person will work on data models and model alignment with Wikidata, and one person will maintain the Wikibase and lead bot development. The community outreach intern will work with wiki communities, academic researchers, and students to organize edit-a-thon events.

For dissemination, we plan to attend Wikimania to talk with editors in person about their needs and wishes for contributing food composition data. Wikimania is the right venue for this because it brings together a large pool of editors from different Wikipedia language versions.


Item Budget
Data scientist (10 hours per week for 8 months (34 weeks)) $30 x 10 x 34 = $10,200
Software engineer (10 hours per week for 8 months (34 weeks)) $30 x 10 x 34 = $10,200
Community outreach intern (5 hours per week for 8 months (34 weeks)) $25 x 5 x 34 = $4,250
Server hosting (12 months at Johns Hopkins School of Public Health) $22 x 12 = $264
Event costs $1,000 x 5 = $5,000
Travel (Wikimania 2020, 2 people) $4,000
Total $33,914
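
The line items and total can be rechecked with a few lines of Python. This is only a sanity check on the arithmetic, using the rates, hours, and counts exactly as stated in the table above.

```python
# Recompute each budget line from the figures stated in the table.
budget = {
    "Data scientist": 30 * 10 * 34,             # $30/hour x 10 hours/week x 34 weeks
    "Software engineer": 30 * 10 * 34,          # $30/hour x 10 hours/week x 34 weeks
    "Community outreach intern": 25 * 5 * 34,   # $25/hour x 5 hours/week x 34 weeks
    "Server hosting": 22 * 12,                  # $22/month x 12 months
    "Event costs": 1000 * 5,                    # $1,000 x 5 events
    "Travel (Wikimania 2020, 2 people)": 4000,  # flat amount
}

total = sum(budget.values())
for item, cost in budget.items():
    print(f"{item}: ${cost:,}")
print(f"Total: ${total:,}")  # Total: $33,914
```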

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

  • Wikipedian communities in Seattle and in India
  • Academic nutrition communities in Boston and in Baltimore.
  • We will host a workshop at Wikimania 2020.
  • We will share our data models via ShEx schemas
  • We will share SPARQL queries that others can use to combine WikiFCD data with Wikidata

Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

  • Project manager/nutritional epidemiologist (volunteer) - Mika Matsuzaki
  • Data scientist - Kat Thornton
  • Software engineer - Kenneth Seals-Nutt
  • Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage
  • Volunteer -

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Discussions during drafting of the proposal
  • Email correspondence and phone calls with senior Wikidata editors.
  • Feedback on draft proposal from members of Wikimedia Cascadia and WikiPathways.
Additional notifications
  • WikiProject Food
  • Open Food Facts

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).