https://wiki.mako.cc/api.php?action=feedcontributions&user=Hweyl&feedformat=atom WikiDotMako - User contributions [en] 2024-03-28T17:36:46Z User contributions MediaWiki 1.38.4 https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57252 Talk:Mika/Temp/WikiFCD 2020-11-03T13:30:30Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings, members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. 
Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data], which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. 
Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. 
According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As with any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. 
We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. 
As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. 
Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase, but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF), who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. 
With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing or duplicating effort with OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. 
She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD, as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. 
While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. 
There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. 
We believe that Wikidata is a great way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to pursue new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with these issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || 
[http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || [http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |-<br /> ||| || || Fructose (P34) || [http://wikifcd.wiki.opencura.com/prop/P34]<br /> |-<br /> ||| || || Lactose (P35) || [http://wikifcd.wiki.opencura.com/prop/P35]<br /> |-<br /> ||| || || Maltose (P36) || [http://wikifcd.wiki.opencura.com/prop/P36]<br /> |-<br /> ||| || || Galactose (P37) || 
[http://wikifcd.wiki.opencura.com/prop/P37]<br /> |- <br /> ||| || || Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> ||| || || Folic acid (P40) || [http://wikifcd.wiki.opencura.com/prop/P40]<br /> |-<br /> ||| || || Folate, food (P41) || [http://wikifcd.wiki.opencura.com/prop/P41]<br /> |-<br /> ||| || || Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> ||| || || Vitamin A, RAE (P45) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> ||| || || Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49]<br /> |-<br /> ||| || || Vitamin K (phylloquinone) (P52) || [http://wikifcd.wiki.opencura.com/prop/P52]<br /> |-<br /> ||| || || Vitamin K (Dihydrophylloquinone) (P53) || [http://wikifcd.wiki.opencura.com/prop/P53]<br /> |-<br /> ||| || || Retinol (P75) || [http://wikifcd.wiki.opencura.com/prop/P75]<br /> |-<br /> ||| || || USDA Food Data Central fcdid (P80) || [http://wikifcd.wiki.opencura.com/prop/P80]<br /> |-<br /> ||| || || Fatty acids, total saturated (P86) || [http://wikifcd.wiki.opencura.com/prop/P86]<br /> |- <br /> ||| || || Fatty acids, total polyunsaturated (P88) || [http://wikifcd.wiki.opencura.com/prop/P88]<br /> |-<br /> ||| || || Carbohydrate, by difference (P89) || [http://wikifcd.wiki.opencura.com/prop/P89]<br /> |-<br /> ||| || || Vitamin B-12 (P90) || [http://wikifcd.wiki.opencura.com/prop/P90]<br /> |-<br /> ||| || || Cholesterol (P99) || [http://wikifcd.wiki.opencura.com/prop/P99]<br /> |-<br /> ||| || || Tocopherol, delta (P100) || [http://wikifcd.wiki.opencura.com/prop/P100]<br /> |-<br /> ||| || || Tocotrienol, gamma (P101) || [http://wikifcd.wiki.opencura.com/prop/P101]<br /> |-<br /> ||| || || Tocotrienol, delta (P102) || [http://wikifcd.wiki.opencura.com/prop/P102]<br /> |-<br /> ||| || || Sugars, total including NLEA (P104) || [http://wikifcd.wiki.opencura.com/prop/P104]<br /> |-<br /> ||| || || Vitamin E (alpha-tocopherol) (P105) || 
[http://wikifcd.wiki.opencura.com/prop/P105]<br /> |-<br /> ||| || || Tocopherol, beta (P106) || [http://wikifcd.wiki.opencura.com/prop/P106]<br /> |-<br /> ||| || || Tocopherol, gamma (P107) || [http://wikifcd.wiki.opencura.com/prop/P107]<br /> |-<br /> ||| || || Tocotrienol, alpha (P108) || [http://wikifcd.wiki.opencura.com/prop/P108]<br /> |-<br /> ||| || || Tocotrienol, beta (P109) || [http://wikifcd.wiki.opencura.com/prop/P109]<br /> |-<br /> ||| || || Fatty acids, total monounsaturated (P119) || [http://wikifcd.wiki.opencura.com/prop/P119]<br /> |-<br /> ||| || || 8:0 (P124) || [http://wikifcd.wiki.opencura.com/prop/P124]<br /> |-<br /> ||| || || 10:0 (P125) || [http://wikifcd.wiki.opencura.com/prop/P125]<br /> |-<br /> ||| || || 12:0 (P126) || [http://wikifcd.wiki.opencura.com/prop/P126]<br /> |-<br /> ||| || || 14:0 (P127) || [http://wikifcd.wiki.opencura.com/prop/P127]<br /> |-<br /> ||| || || 16:0 (P128) || [http://wikifcd.wiki.opencura.com/prop/P128]<br /> |-<br /> ||| || || 18:0 (P129) || [http://wikifcd.wiki.opencura.com/prop/P129]<br /> |-<br /> ||| || || 20:0 (P130) ||[http://wikifcd.wiki.opencura.com/prop/P130]<br /> |-<br /> ||| || || 18:1 (P131) || [http://wikifcd.wiki.opencura.com/prop/P131]<br /> |-<br /> ||| || || 18:2 (P132) || [http://wikifcd.wiki.opencura.com/prop/P132]<br /> |-<br /> ||| || || 18:3 (P133) || [http://wikifcd.wiki.opencura.com/prop/P133]<br /> |-<br /> ||| || || 20:4 (P134) || [http://wikifcd.wiki.opencura.com/prop/P134]<br /> |-<br /> ||| || || 22:0 (P136) || [http://wikifcd.wiki.opencura.com/prop/P136]<br /> |-<br /> ||| || || 14:1 (P137) || [http://wikifcd.wiki.opencura.com/prop/P137]<br /> |- <br /> ||| || || 16:1 (P138) || [http://wikifcd.wiki.opencura.com/prop/P138]<br /> |-<br /> ||| || || 20:1 (P140) || [http://wikifcd.wiki.opencura.com/prop/P140]<br /> |-<br /> ||| || || 15:0 (P154) || [http://wikifcd.wiki.opencura.com/prop/P154]<br /> |-<br /> ||| || || 17:0 (P155) || [http://wikifcd.wiki.opencura.com/prop/P155]<br 
/> |-<br /> ||| || || 20:2 n-6 c,c (P165) || [http://wikifcd.wiki.opencura.com/prop/P165]<br /> |-<br /> ||| || || 18:3 n-6 c,c,c (P170) || [http://wikifcd.wiki.opencura.com/prop/P170]<br /> |-<br /> ||| || || 17:1 (P171) || [http://wikifcd.wiki.opencura.com/prop/P171]<br /> |-<br /> ||| || || 20:3 (P172) || [http://wikifcd.wiki.opencura.com/prop/P172]<br /> |-<br /> ||| || || 15:1 (P177) || [http://wikifcd.wiki.opencura.com/prop/P177]<br /> |-<br /> ||| || || Starch (P218) || [http://wikifcd.wiki.opencura.com/prop/P218]<br /> |-<br /> ||| || || Fatty acids, total trans (P271) || [http://wikifcd.wiki.opencura.com/prop/P271]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. 
Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57251 Talk:Mika/Temp/WikiFCD 2020-11-03T13:24:49Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food WikiProject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. 
We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way, the data models that we are discussing can also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! 
<br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will be using existing databases and compiling them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like the USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to actually keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in the reviews.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing or duplicating effort with OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden for other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || 
[http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |-<br /> ||| || || Fructose (P34) || [http://wikifcd.wiki.opencura.com/prop/P34]<br /> |-<br /> ||| || || Lactose (P35) || [http://wikifcd.wiki.opencura.com/prop/P35]<br /> |-<br /> ||| || || Maltose (P36) || [http://wikifcd.wiki.opencura.com/prop/P36]<br /> |-<br /> ||| || || Galactose (P37) || [http://wikifcd.wiki.opencura.com/prop/P37]<br /> |- <br /> ||| || || Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> ||| || || Folic acid (P40) || [http://wikifcd.wiki.opencura.com/prop/P40]<br /> |-<br /> ||| || || Folate, food (P41) || [http://wikifcd.wiki.opencura.com/prop/P41]<br /> |-<br /> ||| || || Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> ||| || || Vitamin A, RAE (P45) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> ||| || || Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49]<br /> |-<br /> ||| || || Vitamin K (phylloquinone) (P52) || [http://wikifcd.wiki.opencura.com/prop/P52]<br /> |-<br /> ||| || || Vitamin K (Dihydrophylloquinone) (P53) || [http://wikifcd.wiki.opencura.com/prop/P53]<br /> |-<br /> ||| || || Retinol (P75) || [http://wikifcd.wiki.opencura.com/prop/P75]<br /> |-<br /> ||| || || USDA Food Data Central fcdid (P80) || [http://wikifcd.wiki.opencura.com/prop/P80]<br /> |-<br /> ||| || || Fatty acids, total saturated (P86) || [http://wikifcd.wiki.opencura.com/prop/P86]<br /> |- <br /> ||| || || Fatty acids, total polyunsaturated (P88) || [http://wikifcd.wiki.opencura.com/prop/P88]<br /> |-<br /> ||| || || Carbohydrate, by difference (P89) || [http://wikifcd.wiki.opencura.com/prop/P89]<br /> |-<br /> ||| || || Vitamin B-12 (P90) || [http://wikifcd.wiki.opencura.com/prop/P90]<br /> |-<br /> ||| || || Cholesterol (P99) || [http://wikifcd.wiki.opencura.com/prop/P99]<br /> |-<br /> ||| || || Tocopherol, delta (P100) || 
[http://wikifcd.wiki.opencura.com/prop/P100]<br /> |-<br /> ||| || || Tocotrienol, gamma (P101) || [http://wikifcd.wiki.opencura.com/prop/P101]<br /> |-<br /> ||| || || Tocotrienol, delta (P102) || [http://wikifcd.wiki.opencura.com/prop/P102]<br /> |-<br /> ||| || || Sugars, total including NLEA (P104) || [http://wikifcd.wiki.opencura.com/prop/P104]<br /> |-<br /> ||| || || Vitamin E (alpha-tocopherol) (P105) || [http://wikifcd.wiki.opencura.com/prop/P105]<br /> |-<br /> ||| || || Tocopherol, beta (P106) || [http://wikifcd.wiki.opencura.com/prop/P106]<br /> |-<br /> ||| || || Tocopherol, gamma (P107) || [http://wikifcd.wiki.opencura.com/prop/P107]<br /> |-<br /> ||| || || Tocotrienol, alpha (P108) || [http://wikifcd.wiki.opencura.com/prop/P108]<br /> |-<br /> ||| || || Tocotrienol, beta (P109) || [http://wikifcd.wiki.opencura.com/prop/P109]<br /> |-<br /> ||| || || Fatty acids, total monounsaturated (P119) || [http://wikifcd.wiki.opencura.com/prop/P119]<br /> |-<br /> ||| || || 8:0 (P124) || [http://wikifcd.wiki.opencura.com/prop/P124]<br /> |-<br /> ||| || || 10:0 (P125) || [http://wikifcd.wiki.opencura.com/prop/P125]<br /> |-<br /> ||| || || 12:0 (P126) || [http://wikifcd.wiki.opencura.com/prop/P126]<br /> |-<br /> ||| || || 14:0 (P127) || [http://wikifcd.wiki.opencura.com/prop/P127]<br /> |-<br /> ||| || || 16:0 (P128) || [http://wikifcd.wiki.opencura.com/prop/P128]<br /> |-<br /> ||| || || 18:0 (P129) || [http://wikifcd.wiki.opencura.com/prop/P129]<br /> |-<br /> ||| || || 20:0 (P130) ||[http://wikifcd.wiki.opencura.com/prop/P130]<br /> |-<br /> ||| || || 18:1 (P131) || [http://wikifcd.wiki.opencura.com/prop/P131]<br /> |-<br /> ||| || || 18:2 (P132) || [http://wikifcd.wiki.opencura.com/prop/P132]<br /> |-<br /> ||| || || 18:3 (P133) || [http://wikifcd.wiki.opencura.com/prop/P133]<br /> |-<br /> ||| || || 20:4 (P134) || [http://wikifcd.wiki.opencura.com/prop/P134]<br /> |-<br /> ||| || || 22:0 (P136) || [http://wikifcd.wiki.opencura.com/prop/P136]<br /> |-<br 
/> ||| || || 14:1 (P137) || [http://wikifcd.wiki.opencura.com/prop/P137]<br /> |- <br /> ||| || || 16:1 (P138) || [http://wikifcd.wiki.opencura.com/prop/P138]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translations by the organizations and those in Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases or multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. 
<br /> * In Wikidata, we can add values without adding references, but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57250 Talk:Mika/Temp/WikiFCD 2020-11-03T12:59:01Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! 
We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. 
Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I end up using another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many more new questions and solutions, not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from individual contributions of nutrient data taken from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database. Will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We want to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete with or duplicate the efforts of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden for other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || 
[http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |-<br /> ||| || || Fructose (P34) || [http://wikifcd.wiki.opencura.com/prop/P34]<br /> |-<br /> ||| || || Lactose (P35) || [http://wikifcd.wiki.opencura.com/prop/P35]<br /> |-<br /> ||| || || Maltose (P36) || [http://wikifcd.wiki.opencura.com/prop/P36]<br /> |-<br /> ||| || || Galactose (P37) || [http://wikifcd.wiki.opencura.com/prop/P37]<br /> |- <br /> ||| || || Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> ||| || || Folic acid (P40) || [http://wikifcd.wiki.opencura.com/prop/P40]<br /> |-<br /> ||| || || Folate, food (P41) || [http://wikifcd.wiki.opencura.com/prop/P41]<br /> |-<br /> ||| || || Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> ||| || || Vitamin A, RAE (P45) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> ||| || || Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49]<br /> |-<br /> ||| || || Vitamin K (phylloquinone) (P52) || [http://wikifcd.wiki.opencura.com/prop/P52]<br /> |-<br /> ||| || || Vitamin K (Dihydrophylloquinone) (P53) || [http://wikifcd.wiki.opencura.com/prop/P53]<br /> |-<br /> ||| || || Retinol (P75) || [http://wikifcd.wiki.opencura.com/prop/P75]<br /> |-<br /> ||| || || USDA Food Data Central fcdid (P80) || [http://wikifcd.wiki.opencura.com/prop/P80]<br /> |-<br /> ||| || || Fatty acids, total saturated (P86) || [http://wikifcd.wiki.opencura.com/prop/P86]<br /> |- <br /> ||| || || Fatty acids, total polyunsaturated (P88) || [http://wikifcd.wiki.opencura.com/prop/P88]<br /> |-<br /> ||| || || Carbohydrate, by difference (P89) || [http://wikifcd.wiki.opencura.com/prop/P89]<br /> |-<br /> ||| || || Vitamin B-12 (P90) || [http://wikifcd.wiki.opencura.com/prop/P90]<br /> |-<br /> ||| || || Cholesterol (P99) || [http://wikifcd.wiki.opencura.com/prop/P99]<br /> |-<br /> ||| || || Tocopherol, delta (P100) || 
[http://wikifcd.wiki.opencura.com/prop/P100]<br /> |-<br /> ||| || || Tocotrienol, gamma (P101) || [http://wikifcd.wiki.opencura.com/prop/P101]<br /> |-<br /> ||| || || Tocotrienol, delta (P102) || [http://wikifcd.wiki.opencura.com/prop/P102]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translations by the organizations and by Wikidata? (e.g., Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases or multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group determines that something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references, but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57249 Talk:Mika/Temp/WikiFCD 2020-11-03T12:56:48Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as meta data. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''', and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not or cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is far from ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]] (sea grapes).<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import data, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database. Will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlapped with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as an &quot;upstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete or duplicate effort with OFF but that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden for other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data and knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and, likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central data (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! SMILING Vietnam !! link !! SMILING Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5) || water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) || energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7) || [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7) || Protein (P7) || [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8) || total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] || Ash (P9) || Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10) || [http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14) || [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13) || [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14) || [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || 
[http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |-<br /> ||| || || Fructose (P34) || [http://wikifcd.wiki.opencura.com/prop/P34]<br /> |-<br /> ||| || || Lactose (P35) || [http://wikifcd.wiki.opencura.com/prop/P35]<br /> |-<br /> ||| || || Maltose (P36) || [http://wikifcd.wiki.opencura.com/prop/P36]<br /> |-<br /> ||| || || Galactose (P37) || [http://wikifcd.wiki.opencura.com/prop/P37]<br /> |- <br /> ||| || || Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> ||| || || Folic acid (P40) || [http://wikifcd.wiki.opencura.com/prop/P40]<br /> |-<br /> ||| || || Folate, food (P41) || [http://wikifcd.wiki.opencura.com/prop/P41]<br /> |-<br /> ||| || || Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> ||| || || Vitamin A, RAE (P45) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> ||| || || Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49]<br /> |-<br /> ||| || || Vitamin K (phylloquinone) (P52) || [http://wikifcd.wiki.opencura.com/prop/P52]<br /> |-<br /> ||| || || Vitamin K (Dihydrophylloquinone) (P53) || [http://wikifcd.wiki.opencura.com/prop/P53]<br /> |-<br /> ||| || || Retinol (P75) || [http://wikifcd.wiki.opencura.com/prop/P75]<br /> |-<br /> ||| || || USDA Food Data Central fcdid (P80) || [http://wikifcd.wiki.opencura.com/prop/P80]<br /> |-<br /> ||| || || Fatty acids, total saturated (P86) || [http://wikifcd.wiki.opencura.com/prop/P86]<br /> |- <br /> ||| || || Fatty acids, total polyunsaturated (P88) || [http://wikifcd.wiki.opencura.com/prop/P88]<br /> |-<br /> ||| || || Carbohydrate, by difference (P89) || [http://wikifcd.wiki.opencura.com/prop/P89]<br /> |-<br /> ||| || || Vitamin B-12 (P90) || [http://wikifcd.wiki.opencura.com/prop/P90]<br /> |-<br /> ||| || || Cholesterol (P99) || [http://wikifcd.wiki.opencura.com/prop/P99]<br /> |-<br /> ||| || || <br /> |-<br /> ||| || || <br /> 
|}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. 
Apple)?<br /> ** We create a single item for each food, and statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57248 Talk:Mika/Temp/WikiFCD 2020-11-03T12:50:27Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. 
We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! 
<br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from nutrient data that contributors add individually from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. By combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import data, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes with OFF nor duplicates its effort, and that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships that could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces peer production, WM projects, and ways to contribute to wiki-based projects in general. We aim to employ a student interested in working with online communities who can support us one day a week for two semesters, so that the role does not burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || 
[http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |-<br /> ||| || || Fructose (P34) || [http://wikifcd.wiki.opencura.com/prop/P34]<br /> |-<br /> ||| || || Lactose (P35) || [http://wikifcd.wiki.opencura.com/prop/P35]<br /> |-<br /> ||| || || Maltose (P36) || [http://wikifcd.wiki.opencura.com/prop/P36]<br /> |-<br /> ||| || || Galactose (P37) || [http://wikifcd.wiki.opencura.com/prop/P37]<br /> |-<br /> ||| || || Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> ||| || || Folic acid (P40) || [http://wikifcd.wiki.opencura.com/prop/P40]<br /> |-<br /> ||| || || Folate, food (P41) || [http://wikifcd.wiki.opencura.com/prop/P41]<br /> |-<br /> ||| || || Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> ||| || || Vitamin A, RAE (P45) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> ||| || || Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49]<br /> |-<br /> ||| || || Vitamin K (phylloquinone) (P52) || [http://wikifcd.wiki.opencura.com/prop/P52]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for each food, and statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57247 Talk:Mika/Temp/WikiFCD 2020-11-03T12:35:58Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as one place to collect data from all the databases that are available on the Internet, including the ones you mentioned, so that users can pull, combine, and analyze the data they need more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]] (sea grapes).<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from individual contributions of nutrient data from food product labels. Our project's approach is different. We will be using existing databases and compiling them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. Like any Wikimedia project, peer production has the potential to actually keep information more up-to-date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete or duplicate effort with OFF but that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will reduce the burden for other communities such as Wikidata as well so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to csv or other formats that can be used in R, Stata etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || 
[http://wikifcd.wiki.opencura.com/prop/P32]<br /> |-<br /> ||| || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57246 Talk:Mika/Temp/WikiFCD 2020-11-03T12:35:23Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planing related to food composition data. We are planing to create a Wikibase just for food composition data. To support this work we are proposing a Grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as meta data. We believe that it is important have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas] we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through contributors who individually add nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. Like any Wikimedia project, peer production has the potential to keep information more up-to-date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. 
Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. 
With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating the effort of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. 
She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. 
While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. 
There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. 
We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with these issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || 
[http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75) || Pantothenic acid (P28) || [http://wikifcd.wiki.opencura.com/prop/P28]<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) || Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78) || Sucrose (P32) || [http://wikifcd.wiki.opencura.com/prop/P32]<br /> || || || Glucose (P33) || [http://wikifcd.wiki.opencura.com/prop/P33]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? 
(e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for each food item, and statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57245 Talk:Mika/Temp/WikiFCD 2020-11-03T12:31:22Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database by having contributors individually add nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As with any Wikimedia project, peer production has the potential to keep information more up-to-date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ISLI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ISLI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in our proposal.<br /> <br /> 1. 
Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. 
With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating OFF's efforts but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will reduce the burden for other communities such as Wikidata as well so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. 
She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. 
While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. 
There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. 
We believe that WikiData is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which could enable substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || 
[http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39) || Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42) || Thiamin (P25) || [http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45) || Riboflavin (P26) || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46) || Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for each food item, and statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57244 Talk:Mika/Temp/WikiFCD 2020-11-03T12:06:59Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite you to review the description of the project and share any feedback you may have.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including the ones you mentioned, into one place, so that users can pull, combine, and analyze the data they need more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is far from ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through contributors who individually add nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. As with any Wikimedia project, peer production has the potential to keep information more up-to-date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ISLI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ISLI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete or duplicate effort with OFF but that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to pursue new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26) || Copper (P20) || [http://wikifcd.wiki.opencura.com/prop/P20]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27) || Manganese (P21) || [http://wikifcd.wiki.opencura.com/prop/P21]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29) || Selenium (P23) || [http://wikifcd.wiki.opencura.com/prop/P23]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57243 Talk:Mika/Temp/WikiFCD 2020-11-03T12:03:26Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite you to review the description of the project and share any feedback you may have.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]; we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including the ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I end up using another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from individually contributed nutrient data from food product labels. Our project's approach is different. We will be using existing databases and compiling them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. Like any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing or duplicating effort with OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. We believe that WikiData is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) || Potassium (P17) || [http://wikifcd.wiki.opencura.com/prop/P17]<br 
/> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24) || Sodium (P18) || [http://wikifcd.wiki.opencura.com/prop/P18]<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25) || Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57242 Talk:Mika/Temp/WikiFCD 2020-11-03T12:01:08Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a Grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]; we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including the ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]] (sea grapes).<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I end up using another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from nutrient data that individuals contribute from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating the effort of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the improvement in data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13) || Magnesium (P15) || [http://wikifcd.wiki.opencura.com/prop/P15]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14) || Phosphorus (P16) || [http://wikifcd.wiki.opencura.com/prop/P16]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || 
[http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. 
I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57241 Talk:Mika/Temp/WikiFCD 2020-11-03T11:57:28Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! 
Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. 
We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can gather and sort data from all the databases that are available on the Internet, including the ones you mentioned, in one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is far from ideal for meeting the needs of diverse communities and individuals. 
For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]] (sea grapes).<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I end up using another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from nutrient data that individuals contribute from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. 
<br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based on another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in the reviews of our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as an &quot;upstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes with nor duplicates the efforts of OFF and instead provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces peer production, WM projects, and ways to contribute to wiki-based projects in general. We aim to employ a student interested in working with online communities who can support us one day a week for two semesters, a schedule chosen so as not to overburden them or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be interested only in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with these issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> 
|-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? 
It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57240 Talk:Mika/Temp/WikiFCD 2020-11-03T11:56:54Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Given how variable the types and depths of information are in FCDs, we may include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. 
This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data for this item are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in the reviews of our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF), who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes with OFF nor duplicates its effort, and that it provides a useful resource that OFF can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata, so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern works on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces peer production, WM projects, and ways to contribute to wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so that the role neither overburdens them nor interferes with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be interested only in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an excellent way to build connections among free-culture nutrition projects, such as [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts], that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. A separate project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | 
Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? 
It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57205 Talk:Mika/Temp/WikiFCD 2020-10-29T14:14:08Z <p>Hweyl: /* Useful links */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. 
This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds its database from nutrient data that individuals contribute from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. By combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has its advantage over smaller working groups in compiling these databases into one place and maintain the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ Metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/RCE/rceALL.html Bibliography related to variation in mineral composition of vegetables]<br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Nutritional%20Quality%20of%20Organically-Grown%20Food.html Bibliography related to Nutritional Quality of Organically Grown Food] <br /> * [https://soilandhealth.org/wp-content/uploads/06clipfile/Organically%20Produced%20Foods%20Nutritive%20Content.htm Bibliography related to Organically Produced Foods: Nutritive Content]<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We want to respond to the three major lines of criticism raised in the reviews of our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase, but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF), who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes with OFF nor duplicates its effort, and that it provides a useful resource that OFF can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata, so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern works on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to the reduction of data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with these issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11)<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| 
[http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? 
It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57188 Talk:Mika/Temp/WikiFCD 2020-10-27T18:01:30Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. 
This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, although the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will be using existing databases and compiling them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. Like any Wikimedia project, peer production has the potential to actually keep information more up-to-date than any working group with a limited number of participants. 
Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has its advantage over smaller working groups in compiling these databases into one place and maintain the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this clarifies your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. 
Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete or duplicate effort with OFF but that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will reduce the burden for other communities such as Wikidata as well so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships that could lead to the expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. 
This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create a burden on their workload or interfere with their academic work. 
We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. 
However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. 
'''The focus on equity and the global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing for a food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN ID (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA FoodData Central data (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10) || dietary fiber (P11) || [http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11)<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE 
(P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translations provided by the organizations and those in Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group determines that something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references, but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. 
Apple)?<br /> ** We create a single item for a food item and then statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57187 Talk:Mika/Temp/WikiFCD 2020-10-27T17:58:58Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. 
We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Given how variable the types and depths of information in FCDs are, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! 
<br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. By combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import data, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ISLI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ISLI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. 
Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating OFF's efforts but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden for other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. 
This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to overburden them or interfere with their academic work. 
We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. 
However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. 
'''The focus on equity and the global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing a food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN ID (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA FoodData Central data (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10)<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11)<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| 
[http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. 
<br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57186 Talk:Mika/Temp/WikiFCD 2020-10-27T17:58:27Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. 
We invite you to review the description of the project and welcome any feedback you are willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! 
<br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individuals contributing nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. By combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. Like any Wikimedia project, peer production has the potential to keep information more up-to-date than any working group with a limited number of participants. 
Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to Alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. 
Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating OFF's efforts but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will reduce the burden for other communities such as Wikidata as well so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. 
This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern works on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces peer production, WM projects, and ways to contribute to wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to overburden them or interfere with their academic work. 
We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. 
However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. 
'''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, dealing with diverse and changing formats is burdensome - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raise in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing a food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN ID (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA FoodData Central data (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! 😊 Vietnam !! link !! 😊 Indonesia !! FDC !! 
link<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)|| water (P5)|| [http://wikifcd.wiki.opencura.com/prop/P5]<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6) ||energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)|| Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)|| total lipid (P8)|| [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)||Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10)<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11)<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| 
[http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Food composition table for Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Food composition table for Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. 
Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57185 Talk:Mika/Temp/WikiFCD 2020-10-27T17:33:52Z <p>Hweyl: /* Properties used by FCT */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. 
We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]; we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! 
<br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from individual contributions of nutrient data taken from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to actually keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. 
Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete with or duplicate the effort of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. 
This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. 
We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. 
However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. 
'''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data/knowledge disparities in global nutrition. We believe that WikiData is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they discussed that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with these issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! SMILING Vietnam !! link !! 
SMILING Indonesia<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] || water (P5)<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6] || energy (P6)<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7] || Protein (P7)<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8] || total lipid (P8)<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9] ||Ash (P9)<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10] || carbohydrate (P10)<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11] || dietary fiber (P11)<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13] || Calcium (P13)<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14] || Iron (P14)<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19] || Zinc (P19) <br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] || Vitamin C (P24)<br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25] || Thiamin (P25)<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26] || Riboflavin (P26)<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27] || Niacin (P27)<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29] || Vitamin B-6 (P29)<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39] || Folate, total (P39)<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42] || Folate, DFE (P42)<br /> |-<br /> | Vitamin A, RAE (P45)|| [http://wikifcd.wiki.opencura.com/prop/P45] || Vitamin A, RAE (P45)<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] || Carotene, beta (P46)<br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75] || Retinol (P75)<br /> |-<br /> | common name (P76) || 
[http://wikifcd.wiki.opencura.com/prop/P76] || common name (P76) <br /> |-<br /> | SMILING Food composition table for Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77] || SMILING Food composition table for Indonesia food code (P78)<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57184 Talk:Mika/Temp/WikiFCD 2020-10-27T17:25:43Z <p>Hweyl: /* Properties used by SMILING Vietnam */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will take existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlapped with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes nor duplicates effort with OFF and that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by FCT==<br /> {| class=&quot;wikitable&quot;<br /> |-<br /> ! SMILING Vietnam !! link !!<br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] ||<br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Protein (P7)|| [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | total lipid (P8) || [http://wikifcd.wiki.opencura.com/prop/P8]<br /> |-<br /> | Ash (P9) || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | carbohydrate (P10)||[http://wikifcd.wiki.opencura.com/prop/P10]<br /> |-<br /> | dietary fiber (P11) ||[http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | Calcium (P13)|| [http://wikifcd.wiki.opencura.com/prop/P13]<br /> |-<br /> | Iron (P14)|| [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Zinc (P19) || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Vitamin C (P24) || [http://wikifcd.wiki.opencura.com/prop/P24] <br /> |-<br /> | Thiamin (P25) ||[http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Riboflavin (P26)|| [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Niacin (P27) || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Vitamin B-6 (P29) || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | Folate, total (P39) || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> | Folate, DFE (P42) || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> | Vitamin A, IU (P49) || [http://wikifcd.wiki.opencura.com/prop/P49] <br /> |-<br /> | Retinol (P75) ||[http://wikifcd.wiki.opencura.com/prop/P75]<br /> 
|-<br /> | common name (P76) || [http://wikifcd.wiki.opencura.com/prop/P76]<br /> |-<br /> | SMILING Food composition table for Vietnam food code (P77)|| [http://wikifcd.wiki.opencura.com/prop/P77]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57183 Talk:Mika/Temp/WikiFCD 2020-10-27T17:20:46Z <p>Hweyl: /* Kat flag */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will take existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than a working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ISLI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ISLI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only does not compete or duplicate effort with OFF but that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement project will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The focus on equity and the global nature of the project require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free culture related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November, 2020 we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN id (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA Food Data Central (sourced from FDC's API)<br /> <br /> <br /> ==Properties used by SMILING Vietnam==<br /> {| class=&quot;wikitable&quot;<br /> |+ Smiling Vietnam properties<br /> |-<br /> ! English Label !! link <br /> |-<br /> | water (P5) || [http://wikifcd.wiki.opencura.com/prop/P5] <br /> |-<br /> | energy (P6) || [http://wikifcd.wiki.opencura.com/prop/P6]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P7]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P9]<br /> |-<br /> | Example ||[http://wikifcd.wiki.opencura.com/prop/P10]<br /> |-<br /> | Example ||[http://wikifcd.wiki.opencura.com/prop/P11]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P13]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P14]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P19]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P24] <br /> |-<br /> | Example ||[http://wikifcd.wiki.opencura.com/prop/P25]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P26]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P27]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P29]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P39]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P42]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P45]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P49] <br /> |-<br /> | Example ||[http://wikifcd.wiki.opencura.com/prop/P75]<br /> |-<br /> | Example || [http://wikifcd.wiki.opencura.com/prop/P76]<br /> |-<br /> | Example 
|| [http://wikifcd.wiki.opencura.com/prop/P77]<br /> |}<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/ multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using stated in for the FCT where we found the value and based on for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57177 Talk:Mika/Temp/WikiFCD 2020-10-26T20:07:35Z <p>Hweyl: /* Kat flag */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Because the types and depth of information in FCDs vary so widely, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will take existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than a working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase, but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with OFF or duplicating its efforts but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities, such as Wikidata, so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this is the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern works on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will work with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces them to peer production, WM projects, and ways to contribute to wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on in-depth discussions with both WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we built a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be interested only in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data and knowledge disparities in global nutrition. We believe that Wikidata is an excellent way to build connections among a range of free-culture nutrition projects, like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts], that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, members told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome to deal with diverse and dynamic formats - they mentioned three separate format changes in the USDA data since they started looking at this. Having another project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe a USDA API example or a SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing a food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a source and then create statements in the Wikibase according to our data models. 
As of November 2020, we have written bots to:<br /> # add countries to the Wikibase (sourced from Wikidata)<br /> # add taxon names that have a GRIN ID (sourced from Wikidata)<br /> # add human languages (sourced from Wikidata)<br /> # add USDA FoodData Central data (sourced from FDC's API)<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases or multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57176 Talk:Mika/Temp/WikiFCD 2020-10-26T20:04:54Z <p>Hweyl: /* Kat flag */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Given how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can bring data from all the databases that are available on the Internet, including the ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not or cannot offer, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]] (sea grapes).<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I end up using another algae item as a substitute in my record, even though the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through contributors who individually add nutrient data from food product labels. Our project's approach is different. We will take existing databases and compile them into a single standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is its ability to include product nutrient data that are not yet in larger public databases like the USDA databases. By combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As with any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to a decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive or contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase, but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with OFF or duplicating its efforts but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities, such as Wikidata, so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this is the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to give domain experts experience with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern works on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator python module [https://github.com/SuLab/WikidataIntegrator]. These bots can be used to read in data from a CSV file and then create statements in the Wikibase according to our data models.<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. 
How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for each food item, and statements from all the different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57175 Talk:Mika/Temp/WikiFCD 2020-10-26T20:03:33Z <p>Hweyl: /* Brown Bag Seminar */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data], which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas ([https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database from nutrient data that individuals contribute from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in the reviews of our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to try to explain in more detail how this project and OFF are complementary and to try to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating the efforts of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships that could lead to the expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe a USDA API example or the SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing the food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> ==Kat flag==<br /> *Bots: We wrote a series of bots using the WikidataIntegrator Python module [https://github.com/SuLab/WikidataIntegrator].<br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translations by the organizations and those in Wikidata? 
(e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for Wikibases yet. 
I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=57019 Talk:Mika/Temp/WikiFCD 2020-10-06T17:47:59Z <p>Hweyl: /* Questions */</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best way to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to incorporate into Wikidata if they choose to do so in the future? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! 
We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Given how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data], which include aggregate phytate data. And this is just one nutrient; there are data on many more nutrients, as well as metadata. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]); we will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. 
We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. 
We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individuals contributing nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. Any methods we employ to import, check for updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. 
Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our Wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. <br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our Wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed in the WikiFCD Wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. 
[[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ metadata on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised about our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to clarify the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). 
Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF), who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. Part of the reason Gigandet is excited about the proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD not only avoids competing with or duplicating the efforts of OFF but also provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden for other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. 
Although we agree that this reflects the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than to call upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships, which could lead to the expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences with Wikimedia systems and tooling that they find valuable. This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. 
We have edited the proposal to clarify that the intern will be working with the project manager to create online learning tools and seminars to develop and deliver curriculum focused on teaching nutrition professionals to use and contribute to WikiFCD as well as about peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities who can support us one day a week for two semesters so as not to create burden on their workload and interfere with their academic work. We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. 
This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. 
However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. '''The project's focus on equity and its global nature require diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can help reduce data/knowledge disparities in global nutrition. We believe that Wikidata is an awesome way to build connections between a range of free-culture-related nutrition projects like [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts] that might do the same.<br /> <br /> We will test several automated and manual methods to populate the Wikibase with nutrient data from five food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. 
[https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions with more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome having to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on their main project goals instead of each having to deal with the issues we raised in this proposal. 
This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe a USDA API example or the SMILING Excel sheet example<br /> * slides on connecting to other Wikibase instances - Wikidata example<br /> * demo on how a contributor can easily identify which languages are missing the food item in Wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item, etc.)<br /> * demo on how to convert data to CSV or other formats that can be used in R, Stata, etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translations by the organizations and those in Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> ** I think we will be able to present multiple aliases/multiple values for names. Some of these may conflict, but each will have a reference back to the source. 
If our group can determine something is incorrect, we can deprecate it.<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using &quot;stated in&quot; for the FCT where we found the value and &quot;based on&quot; for the source they note. How does this seem to you? It is very tricky because sometimes we don't have enough information to decide what to do here. <br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> ** Yes, this is possible.<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> ** We create a single item for a food item and then statements from all different databases are placed on that item. <br /> * Is it easy to have Recoin on WikiFCD?<br /> ** Not sure about this. I haven't seen it available for Wikibases yet. I'll keep looking.<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?<br /> ** My current plan is to get this data via federated SPARQL queries with Wikidata.</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Course&diff=56771 Mika/Temp/WikiFCD/Course 2020-09-10T11:53:25Z <p>Hweyl: /* Tools */ adding a second tool to extract tables from PDF</p> <hr /> <div>This page is for planning a course on how to contribute to and use data from WikiFCD.<br /> <br /> = About the course=<br /> <br /> This course aims to introduce students to 1) the issues surrounding FCDs in global nutrition research and 2) the concept of peer production. 
Students will gain knowledge and practical skills to engage directly in building a global food composition database. <br /> <br /> The course will have two parts. During the first term, the students will learn about peer production and how to contribute to Wiki-based projects. In the second term, students will learn to use the data from WikiFCD to explore their own research questions while continuing to contribute various FCDs to WikiFCD.<br /> <br /> =Schedule=<br /> <br /> * JHSPH has 4 terms (7 weeks each; 39-40 instruction days) + 1 summer term (8 weeks) and 1 winter intersession term (2 weeks):<br /> : Summer Term: Wednesday, July 1 - Wednesday, August 26 (39 class days)<br /> : 1st Term: Monday, August 31 - Monday, October 26 (40 class days)<br /> : 2nd Term: Tuesday, October 27 - Wednesday, December 23 (39 class days)<br /> : Winter Intersession: Monday, January 4 - Friday, January 15<br /> : 3rd Term: Monday, January 25 - Friday, March 19 (39 class days)<br /> : 4th Term: Monday, March 29 - Friday, May 21 (40 class days)<br /> <br /> * If we can teach in the 1st and/or 2nd term, that would be nice for the grant application to be submitted in November.<br /> * It seems it's about 1 hour of instruction time per credit. Students register for 16 credits per semester (2 terms).<br /> * Aim to meet once a week for 2 hours = 2 credits?<br /> <br /> =Course Aims=<br /> <br /> # To learn about issues with FCDs in global nutrition research<br /> # To learn about peer production<br /> # To learn about Wikidata and Wikibase<br /> # To learn how to contribute to WikiFCD<br /> # To learn how to use data from WikiFCD<br /> <br /> =Structure of the course=<br /> <br /> * Lecture<br /> * Lab<br /> * Individual projects<br /> <br /> =Course schedule=<br /> <br /> == 1st term: Aim 1 through 4 ==<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Week !! Aim/Learning objectives !! Content !! Assignment !! 
Assignment Due date<br /> |-<br /> | Week 1 <br /> | Aim 1: Learn about FCDs<br /> | <br /> # Learn about FCDs<br /> # Learn about issues related to FCDs and the history of global databases<br /> # Discuss potential solutions<br /> | <br /> # Readings on peer production<br /> # Create an account on Wikimedia<br /> # Wiki Education Wikidata Professional Development Training Modules: Orientation, Introduction to Wikidata<br /> | Week 2<br /> |-<br /> | Week 2<br /> | Aim 2/3:<br /> # Learn about peer production<br /> # Learn about Wikimedia projects, Wikidata, and Wikibase<br /> # Learn how to participate in wiki-based projects<br /> |<br /> # Lecture on peer production<br /> # Discussion on pros and cons of peer production for global FCD development<br /> # Lecture on Wikimedia projects, Wikidata, and Wikibase - pull materials from Wiki Education on Wikidata?<br /> #* [https://commons.wikimedia.org/wiki/File:WikidataHumanists.pdf Example of material we could adapt that I created for another event]<br /> # Edit a Wikipedia article<br /> |<br /> # Readings<br /> # Databases and Linked Data<br /> # Make edits on Wikipedia or Wikidata<br /> # Create an account on WikiFCD<br /> |<br /> |-<br /> | Week 3<br /> | Aim 4:<br /> # Learn more about wiki editing<br /> # Learn about structured data<br /> |<br /> # Lecture on Wiki markup<br /> # Edit Wikidata <br /> |<br /> # Readings<br /> # Edit food/nutrition-related pages on Wikidata<br /> |<br /> |-<br /> | Week 4<br /> | Aim 4:<br /> # Learn about WikiFCD and Open Food Facts<br /> # Learn about different ways to import data from elsewhere<br /> # Learn about licensing<br /> |<br /> # Lecture on structured data and how this could help our research<br /> # Examples of good and bad Wikidata pages<br /> # Edit metadata (FCD information)<br /> | <br /> # Readings<br /> # Search for and add missing FCD metadata<br /> |<br /> |-<br /> | Week 5<br /> | Aim 4:<br /> * Learn about online communities and collaboration<br /> * OR<br /> * 
Learn how nutrient data are produced (field trip to USDA chemical analysis lab)<br /> |<br /> * Lecture on successful online communities<br /> * Discussion on how to build a successful global WikiFCD community<br /> * OR<br /> * Field trip.<br /> |<br /> # Readings<br /> # Learn a script to import an example FCD CSV (to be developed for this class)<br /> |<br /> |-<br /> | Week 6<br /> | Aim 4: <br /> # Explore ways to connect knowledge bases<br /> | <br /> # Learn about other Wikimedia projects and Wikibase-based projects<br /> # Discuss potential research questions that can be answered by connecting these projects<br /> |<br /> # Choose your research proposal topic<br /> |<br /> |-<br /> | Week 7<br /> | Individual project consultation<br /> |<br /> |<br /> |<br /> |-<br /> | Week 8<br /> | Presentation on research proposals<br /> |<br /> |<br /> |<br /> |}<br /> <br /> == 2nd Term: Aim 4/5 ==<br /> <br /> * Students will select their own research questions and try to answer them by analyzing data from WikiFCD.<br /> * Skills: SPARQL<br /> * Optional skills: statistical software<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Week !! Aim !! Content !! Assignment !! 
Assignment Due date<br /> |-<br /> | Week 1 <br /> | Aim 4/5<br /> | <br /> |<br /> |<br /> |-<br /> | Week 2<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 3<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 4<br /> | individual project consultation<br /> |<br /> |<br /> |<br /> |-<br /> | Week 5<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 6<br /> | Aim 4/5<br /> | <br /> |<br /> |<br /> |-<br /> | Week 7<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 8<br /> | Presentation<br /> |<br /> |<br /> |<br /> |}<br /> <br /> =Evaluation=<br /> <br /> =Tools=<br /> * Software to extract tabular data from PDF files: [https://www.wikidata.org/wiki/Q96774878] and another [https://vinayak.io/2018/10/03/camelot-python-library-extract-tables-pdf/]<br /> * WikidataIntegrator: [https://github.com/SuLab/WikidataIntegrator]<br /> * OpenRefine: [https://openrefine.org/]</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD&diff=56645 Talk:Mika/Temp/WikiFCD 2020-08-26T14:52:57Z <p>Hweyl: /* Questions */ Reply to question</p> <hr /> <div>=Related Work=<br /> This paper from ISWC 2019 describes a knowledge graph that includes nutrient data.<br /> [http://www.cs.rpi.edu/~zaki/PaperDir/ISWC19.pdf]<br /> Data:<br /> [https://foodkg.github.io/foodkg.html]<br /> <br /> =Wikiprojects to notify=<br /> * [https://www.wikidata.org/wiki/Wikidata:WikiProject_Food Wikiproject Food]<br /> :: What's the best to notify them? [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 12:33, 20 January 2020 (EST)<br /> ::: I think there is a ping project template that we can use. [https://www.wikidata.org/wiki/Template:Ping_project Ping Project]. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 19:51, 21 January 2020 (CET)<br /> :::: Great! Is the plan to say that we'll build this in a way that'd be easy to be incorporated into WikiData if they choose to do so in the future? 
[[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:17, 21 January 2020 (CET)<br /> ::::: Draft text: Greetings members of WikiProject Food! We'd like to let you know about a complementary set of work we are planning related to food composition data. We are planning to create a Wikibase just for food composition data. To support this work we are proposing a grant. We invite your review of the description of the project so we can learn of any feedback you may be willing to share with us.<br /> <br /> = People to review =<br /> <br /> == Gene Wiki ==<br /> *[http://sulab.org/research/crowdbio/gene-wiki/]<br /> <br /> == About WD ==<br /> <br /> * draft response<br /> <br /> * Thank you for the feedback! We have been thinking about this for a while as we'd initially planned to do this in Wikidata. Knowing how variable the types and depths of information are in FCDs, it is possible that we will include data that may never be appropriate for Wikidata. We decided that it'd make sense to build a Wikibase where we can hold all the details. <br /> <br /> Example datasets may look like this FAO data on [http://www.fao.org/fileadmin/templates/food_composition/documents/PhyFoodComp_1.0.xlsx detailed information on phytate] or [https://cdn1.sph.harvard.edu/wp-content/uploads/sites/30/2012/10/tanzania-food-composition-tables.pdf more standard data] which includes aggregate phytate data. And this is just one nutrient; there are many more nutrient data as well as meta data. We believe that it is important to have these discussions in WD. We also feel that the discussion on WD and this Wikibase database could develop simultaneously.<br /> <br /> Our approach includes the creation of ShEx schemas (see [https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas Wikidata:WikiProject Schemas]). We will publish these schemas in Wikidata's E [https://www.wikidata.org/wiki/Help:Namespaces namespace] for entity schemas. 
This way the data models that we are discussing could also be shared and discussed on Wikidata. Our approach is to prepare data in a Wikibase and then help coordinate getting the data into Wikidata once the Wikidata community has built consensus on how much nutrient data will be appropriate for Wikidata.<br /> <br /> == Response to other project ==<br /> ::: Happy to clarify these points! <br /> <br /> ::: First, we want to emphasize that our focus in this project is the '''re-organization''' and '''standardization''' of the '''existing databases''' and we will do our best to classify and store every bit of information from each database. We will not limit the amount of data to be included in our Wikibase instance, with the hope that different communities, such as Wikidata, can pick and choose what to include in their own databases. Our Wikibase instance can serve as a place to sort data from all the databases that are available on the Internet, including ones you mentioned, into one place, so that users can pull, combine, and analyze necessary data more easily. We believe that this project will be able to offer something these FCDs currently do not/cannot do, by harnessing the power of peer production. <br /> <br /> ::: I use many of these FCDs as a nutritional epidemiologist for research, and also as a migrant individual who records dietary information related to foods from my native country as well as other countries. The current situation of having multiple incomplete databases in various formats is much less than ideal for meeting the needs of diverse communities and individuals. For instance:<br /> <br /> :::: * I keep my daily dietary records in English. The software I'm using primarily uses the data from USDA.<br /> :::: * I sometimes eat foods more commonly consumed in Japan like [[:ja:クビレズタ|海ぶどう]].<br /> :::: * I look for this item in the USDA databases, using several keywords including its scientific name, but I can't find the information. 
Perhaps this data exists in another database but I'd need to check each database one by one.<br /> :::: * I use another algae item as a substitute in my record but the nutrient data are available in the [https://fooddb.mext.go.jp/result/result_top.pl?USER_ID=15560 Japanese database].<br /> <br /> ::: This is just one example of the kinds of problems people may run into because of the incomplete connectivity among the existing databases. A global FCD can open up ways to explore many, many more new questions and solutions not just in food and nutrition but in many other topics that Wikimedians may be interested in. We believe that having this placeholder for all FCD information is a good way to contribute to different Wikimedia projects.<br /> <br /> ::: I believe OFF and our Wikibase instance take distinct approaches. According to their website:<br /> <br /> :::: &quot;Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.&quot;<br /> <br /> ::: OFF builds up its database through individual contributions of nutrient data from food product labels. Our project's approach is different. We will use existing databases and compile them into a standardized and structured database. As I mentioned before, OFF and our Wikibase instance are complementary. One of the strengths of OFF is the ability to have product nutrient data that are not yet in larger public databases like USDA databases. Combining OFF and our Wikibase instance, we can have a more comprehensive FCD than any single database. <br /> <br /> ::: Great point about the plan beyond data importation. As in any Wikimedia project, peer production has the potential to keep information more up to date than any working group with a limited number of participants. 
Any methods we employ to import, check updates, and maintain this database will be documented, so that future participants can easily learn how to do each of these activities and start contributing to the project. We will engage in outreach activities to involve diverse participants and we hope that documentation/tutorials and community engagement will increase the chance of frequent updates of the data. <br /> <br /> ::: In short, there is an enormous amount of food composition data on the Internet already. There have been several attempts to combine some of these databases but none has succeeded in creating a comprehensive and easy-to-use global database. Open and collaborative peer production has an advantage over smaller working groups in compiling these databases into one place and maintaining the data. We believe that this is a powerful solution to the decades-old problem in nutrition and will benefit a wide range of audiences, from Wikidata users to FCD developers. <br /> <br /> ::: We hope this answers your questions. Thanks for engaging in this topic!<br /> <br /> == Response to alex ==<br /> ::: Thank you so much for the comments and suggestions! I love the idea of digital community engagement through a content drive/contest. We will work with the community engagement intern to plan and incorporate this idea. We will edit the proposal to add this point. <br /> <br /> ::: We hope that our wikibase will be useful to OFF and we'll be able to demonstrate seamless data exchanges between the two. I completely agree that this is an important knowledge base for several SDGs and we look forward to working with diverse communities to improve it. 
<br /> <br /> ::: Thanks again for the feedback!<br /> <br /> =Wikibase for prototyping wikiFCD=<br /> <br /> =Testing Wikibase=<br /> * Main page [https://wikifcd.wiki.opencura.com/wiki/Main_Page]<br /> * List of properties [https://wikifcd.wiki.opencura.com/wiki/Special:ListProperties]<br /> =Inventory of FCTs=<br /> * We may want to consider using our wikibase to inventory the FCTs of interest.<br /> * example item for FCT [https://wikifcd.wiki.opencura.com/wiki/Item:Q13]<br /> :: YES!!! [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 22:18, 12 March 2020 (CET)<br /> * Here is a [https://tinyurl.com/y7rlxbn4 Query for all food composition tables listed the wikifcd wikibase]. I also created some additional properties as we discussed in our last meeting. Properties 55-58 mentioned on [https://wikifcd.wiki.opencura.com/w/index.php?title=Special:ListProperties/&amp;limit=500&amp;offset=0 this page] can also be used now. [[User:Hweyl|Hweyl]] ([[User talk:Hweyl|talk]]) 00:45, 21 April 2020 (CEST)<br /> :: Great! Thank you. As I work through the list on FAO/INFOODS, I'm discovering more and more about the complexity of these databases. Regional databases are tricky as they combine existing databases from different countries and sometimes add new information. The easiest thing to do may be to stick with single-country databases that are not based off another database...will keep you posted. [[User:Mika|Mika]] ([[User talk:Mika|talk]]) 03:12, 21 April 2020 (CEST)<br /> <br /> =Useful links=<br /> *[https://www.nutrientdirectory.org/indd/index.cfm? 
International Nutrition Databases] at Harvard; not clear if it's still maintained but it has information on some databases (overlapping information with FAO/INFOODS).<br /> * [https://www.cropcomposition.org/query/sources.html Crop Composition Database] by ILSI/Agriculture and Food Systems Institute.<br /> * [https://foodsystems.org/ meta data on nutrition databases] by ILSI/Agriculture and Food Systems Institute.<br /> * [http://www.langual.org/Default.asp Framework for food description].<br /> <br /> =response to the committee=<br /> <br /> Thank you so much for reviewing our proposal! We wanted to respond to the three major lines of criticism raised in the reviews of our proposal.<br /> <br /> 1. Relationships with existing initiatives<br /> <br /> The most important issue raised by reviewers was the concern that our project overlaps with existing initiatives, [https://world.openfoodfacts.org/ Open Food Facts] (OFF) in particular. This issue was raised on the [[Talk:FIXME|talk page for our proposal]] during the discussion phase, but reviewers felt that our response there was not convincing.<br /> <br /> We have made major changes to the text of our proposal to explain in more detail how this project and OFF are complementary and to explain the difference between nutrition-label data (OFF's domain) and food composition data (WikiFCD's). Although they are related, they are different data with different sources, different audiences, different challenges, and different uses. Food composition data is best understood as a &quot;downstream&quot; source of granular data for projects like OFF as well as for Wikibase instances like Wikidata. <br /> <br /> We have spoken in depth with Stéphane Gigandet (the founder and leader of OFF) who has in turn spoken about our project with the OFF board. Gigandet is excited about our project and, with the support of the OFF board, has graciously added his name to the list of advisors for our proposal. 
Part of the reason Gigandet is excited about our proposal is that OFF has previously attempted to incorporate some fruit and vegetable FCD from [[:wikipedia:USDA|USDA]] and [[:wikipedia:CIQUAL|CIQUAL]]. After running into some of the issues our team discusses in this proposal, like extremely granular data and divergent and shifting formats, OFF decided ''not'' to move forward with supporting the types of FCD our proposal targets. With Gigandet's advice, we will work closely with OFF to ensure that WikiFCD neither competes with nor duplicates the efforts of OFF, and that it provides a useful resource that they can draw from.<br /> <br /> Although it was not raised in the reviews, a new Wikibase instance like WikiFCD will also reduce the burden on other communities such as Wikidata so that they can focus on their main project aims. We have updated our proposal to make it clear that we will work closely with Gigandet to integrate our work with OFF as a way of demonstrating how other Wikibase instances can incorporate data from WikiFCD. <br /> <br /> 2. Community engagement <br /> <br /> A number of reviewers raised concerns about our ability to build and engage a community. Although we agree that this is the biggest challenge and risk for this project, we believe that, with a WMF grant, we will have the resources we need to succeed.<br /> <br /> If our project has less in the way of existing community support than some other grant proposals, it is because our goal is to engage ''new'' groups of experts in the WMF ecosystem rather than calling upon already overtaxed Wikimedia volunteers. The type of outreach we are proposing will involve building partnerships which could lead to expansion and diversification of Wikimedia contributors. Our approach is to provide domain experts with experiences of Wikimedia systems and tooling that they find valuable. 
This strategy for engaging domain experts is consistent with the findings and recommendations of the [https://tools.wmflabs.org/scholia/topic/Q5531528 GeneWiki program of work].<br /> <br /> [[User:Hackfish]] is an established academic expert in global health and nutrition. She is currently working at both [[:wikipedia:San Francisco State University|San Francisco State University]] and [[:wikipedia:Harvard School of Public Health|Harvard School of Public Health]] and will be starting as an Assistant Professor at the [[:wikipedia:Johns Hopkins University|Johns Hopkins University]] in September 2020. [[User:Hackfish]] is well positioned to use her deep connections in the academic nutrition community to help this project succeed, and this engagement work will be a large part of what the intern will work on. We have already garnered strong interest in this project among the nutritionists at both Harvard and Johns Hopkins and will be working with teams at both places to contribute to WikiFCD and to engage broader communities.<br /> <br /> 3. Budget Question<br /> <br /> There was a lack of clarity in our proposal about what the intern would be doing for 8 hours each week. We have edited the proposal to clarify that the intern will work with the project manager to create online learning tools and seminars, and to develop and deliver a curriculum that teaches nutrition professionals to use and contribute to WikiFCD and introduces them to peer production, WM projects, and ways to contribute to Wiki-based projects in general. We aim to employ a student who is interested in working with online communities and who can support us one day a week for two semesters, so as not to burden their workload or interfere with their academic work. 
We are confident that we will be able to identify such an individual.<br /> <br /> ------<br /> <br /> <br /> Integrate into the proposal:<br /> <br /> Based on the in-depth discussion both with various WM members and nutrition researchers over the past year, we decided that it would be most useful to '''diverse communities''' with '''diverse needs''' if we build a comprehensive Wikibase instance for all existing FCDs from around the world. While some communities may be only interested in the most common nutrient information (e.g. total fat, trans fat, calories, sugar), other communities may want more specific information (e.g. Phytic acid (by HPLC/HPAE) : Zinc ratio) or other information related to the food item (e.g. scientific names, varieties of fruits, geo-locations). <br /> <br /> WikiFCD can make significant contributions to various WM communities with interests in nutritional data. The Wikibase we are creating will be an expert-curated data set that is mapped to Wikidata. The Wikidata community as well as any other wikibase community will be able to reuse entity schemas or entity data (or both!) from our system. Data will flow back into Wikidata if desired. We strongly believe that WikiFCD will make a positive impact on WM and other communities by accommodating their varying needs and bringing more equitable access to an easily usable database. This Wikibase instance is a new and innovative approach that addresses problems that need to be, and yet have not been, solved. <br /> <br /> <br /> <br /> <br /> -----<br /> <br /> Comments from reviewers<br /> <br /> 1. Relationships with existing initiatives<br /> * I am encouraged to see this project proposal suggesting looking into another aspect of knowledge gaps. I am concerned there are already existing initiatives and wondering why the proposers have chosen to not to align with those.<br /> * It does fit with Wikimedia's strategic priorities. 
However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different.<br /> * I do not find this project to be critical to the current state of knowledge. There are existing resources they could join to do this work.<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food Facts&quot;, another free software and open data but the concept of both projects is different (from the proposal &amp; answers to questions on the proposal talk page). <br /> * A lot of concerns about the creation of another instance. The answers in the discussion page don't convince me.<br /> <br /> 2. Community engagement <br /> * A new instance of Wikibase? Without a community?<br /> * It does fit with Wikimedia's strategic priorities and the budget is reasonable but not enough community engagement. There is a need for a wider or major community discussion to ascertain whether this could be hosted on Wikidata or not. However, it looks like a competitor to &quot;Open Food <br /> * Interesting concept but some concerns about the methodology.<br /> * They really need to become more involved since they were not able to get any endorsements.<br /> * The proposal has very little community engagement with current Wikipedia communities.<br /> <br /> * This seems iterative, but minimally so.<br /> <br /> =Grant proposal edits=<br /> <br /> Through this pilot project, we will write schemas to describe our data model based on five large food composition datasets that are already available online and develop good documentation for both project development and use. 
'''The focus on equity and global nature of the project requires diverse participants''', which is very much in line with the missions of projects supported by the Wikimedia Foundation. Through this pilot project, we hope to show how peer production can contribute to reducing data/knowledge disparities in global nutrition. We believe that Wikidata is an excellent way to build connections among a range of free-culture nutrition projects, such as [https://fosdem.org/2020/schedule/event/open_food_facts/ Open Food Facts], that might do the same.<br /> <br /> We will test several automated and manual methods to populate the wikibase with nutrient data from 5 food composition databases from around the world (see the Project Plan section for details). We will write schemas to describe our data model. We will map our properties to Wikidata properties.<br /> <br /> '''2. Why is this a good idea?'''<br /> <br /> * First, this Wikibase instance will '''significantly improve the usability of FCD from different sources for diverse users''' - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Food_and_drink WikiProject Food and Drink] on English Wikipedia and [https://www.wikidata.org/wiki/Q8485990#sitelinks-wikipedia its equivalents in other languages] are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> : Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. 
changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> * Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to '''incorporate data from heterogeneous data sources'''. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> ; '''UPDATE''': In our recent communication with the Open Food Facts community, they told us that OFF is in fact interested in using data from USDA (USA) and CIQUAL (France). However, it is burdensome to deal with diverse and dynamic formats - they mentioned three separate format changes in USDA since they started looking at this. Having another project like WikiFCD can help each community focus on its main project goals instead of each having to deal with the issues we raised in this proposal. This conversation with OFF reinforces our belief that WikiFCD will be helpful to diverse peer production communities.<br /> <br /> * Finally, we will '''complete this project with diverse communities from around the world''' as these FCD can be translated into/from many languages. 
The design of Wikibase will allow us to more easily support additional languages in the data itself, as well as in user interfaces.<br /> <br /> = Brown Bag Seminar =<br /> <br /> Some ideas for things to present:<br /> <br /> * slides on the project idea, our vision for the overall process<br /> * slides on the process of &quot;automated&quot; reading of databases - maybe USDA API example or SMILING excel sheet example<br /> * slides on connecting to other Wikibase instances - wikidata example<br /> * demo on how a contributor can easily identify which language is missing the food item in wikidata, which makes it easier for contributors to see where they can make contributions in terms of translation. (I forgot the name of this tool...)<br /> * demo on making a query based on research questions (e.g. nutrient content differences across databases; comparison of numbers of nutrients included in different databases; Wikipedia articles mentioning a particular food item etc etc)<br /> * demo on how to convert data to csv or other formats that can be used in R, STATA etc.<br /> * slides on our plans for the community expansion and engagement (what we will do initially ourselves; what peer contributors will do once the system is ready)<br /> <br /> <br /> = Questions =<br /> <br /> * Some databases are published in multiple languages. How should we deal with any discrepancies between the translation by the organizations and by Wikidata? (e.g. Japanese databases are available in English and Japanese)<br /> * Some databases state that they borrow data from other databases but do not specify exactly which items were borrowed. How should we deal with this? (e.g. Bangladesh FCT 2013)<br /> ** This is my current working model [https://wikifcd.wiki.opencura.com/wiki/Item:Q135079]. I am using ''stated in'' for the FCT where we found the value and ''based on'' for the source they note. 
How does this seem to you?<br /> * In Wikidata, we can add values without adding references but is it possible to make it a requirement to have at least one reference on WikiFCD?<br /> * We discussed this before but I forgot the conclusion - do we create different items for the same fruits from different databases (e.g. Apple)?<br /> * Is it easy to have Recoin on WikiFCD?<br /> * Perhaps we can add some of the identifier properties from [https://www.wikidata.org/wiki/Wikidata:WikiProject_Medicine/Properties WikiProject_Medicine]?</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/FCTtemplate&diff=56635 Mika/Temp/WikiFCD/FCTtemplate 2020-08-24T17:10:50Z <p>Hweyl: Describe purpose of page</p> <hr /> <div>=Describing FCTs in our Wikibase=<br /> When we learn of an FCT that we'd like to keep track of we can add them to our spreadsheet of FCTs.<br /> <br /> Here is some guidance on how to structure the data to make it easy to import into our Wikibase.</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56598 Mika/Temp/WikiFCD/Grants 2020-08-21T11:34:53Z <p>Hweyl: /* Outputs */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. 
III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> The primary output of this work is a knowledge graph of structured data published in a publicly available instance of the Wikibase platform. Wikibase is a set of extensions to the MediaWiki software platform and is developed primarily by Wikimedia Deutschland as free software [https://www.mediawiki.org/wiki/Wikibase]. Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. The Query Service is a SPARQL endpoint that supports querying the data in the knowledge graph via the SPARQL query language [https://www.w3.org/TR/sparql11-query/]. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. 
We will develop an algorithm for use in our Wikibase that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. 
We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform to create an instance of Wikibase for testing [https://www.wbstack.com/]. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. 
Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can simply review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. 
We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term. These aliases serve several disambiguation functions. They allow the use of common names as well as scientific names and they allow multilingual indexing. They also allow us to store historic names, whether scientific or common, that are no longer used but may be found in the literature or in historical sources. For example, this is the record for Juglans regia in our knowledge graph [https://wikifcd.wiki.opencura.com/wiki/Item:Q82650]. In addition to the species name we also support the aliases 'common walnut', 'Old World Walnut', 'Walnut', 'Persian Walnut' and 'Juglans fallax' for this item. A more extensive example is Vaccinium vitis-idaea [https://wikifcd.wiki.opencura.com/wiki/Item:Q117098], for which we provide 13 aliases beyond the species name. <br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that could not previously be asked. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. 
We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009. 
&quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and MATLAB. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56597 Mika/Temp/WikiFCD/Grants 2020-08-21T11:31:27Z <p>Hweyl: /* Outputs */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are among the topics that attract the most engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> The primary output of this work is a knowledge graph of structured data published in a publicly available instance of the Wikibase platform. Wikibase is a set of extensions to the MediaWiki software platform and is developed primarily by Wikimedia Deutschland as free software [https://www.mediawiki.org/wiki/Wikibase]. Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. 
We will develop an algorithm for use in our Wikibase that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. 
We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform to create an instance of Wikibase for testing [https://www.wbstack.com/]. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. 
Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can simply review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. 
We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term. These aliases serve several disambiguation functions. They allow the use of common names as well as scientific names and they allow multilingual indexing. They also allow us to store historic names, whether scientific or common, that are no longer used but may be found in the literature or in historical sources. For example, this is the record for Juglans regia in our knowledge graph [https://wikifcd.wiki.opencura.com/wiki/Item:Q82650]. In addition to the species name we also support the aliases 'common walnut', 'Old World Walnut', 'Walnut', 'Persian Walnut' and 'Juglans fallax' for this item. A more extensive example is Vaccinium vitis-idaea [https://wikifcd.wiki.opencura.com/wiki/Item:Q117098], for which we provide 13 aliases beyond the species name. <br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that could not previously be asked. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. 
We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009. 
&quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and MATLAB. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software Engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56583 Mika/Temp/WikiFCD/Grants 2020-08-20T11:41:47Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to this date. Development and maintenance of such database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA p:P1{<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, if researchers would like to explore our data, rather than writing exploratory SPARQL queries to find out what data can be found and the details of our data models, they can simply review our ShEx schemas to quickly understand our data models. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models, the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Due to the fact that many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs, food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term. These aliases serve several disambiguation functions. They allow the use of common names as well as scientific names and they allow multilingual indexing. They also allow us to store historic names, whether scientific or common, that are no longer used, but may be found in the literature or in historical sources. For example, this is the record for Juglans regia in our knowledge graph [https://wikifcd.wiki.opencura.com/wiki/Item:Q82650]. In addition to the species name we also support the aliases 'common walnut', 'Old World Walnut', 'Walnut', 'Persian Walnut' and 'Juglans fallax' for this item. A more extensive example is Vaccinium vitis-idaea [https://wikifcd.wiki.opencura.com/wiki/Item:Q117098], for which we provide 13 aliases beyond the species name. <br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were not previously possible. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. 
We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009. 
&quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and MATLAB. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software Engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56582 Mika/Temp/WikiFCD/Grants 2020-08-20T11:41:32Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to this date. Development and maintenance of such database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then obtained copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV files or as tabular data embedded in PDFs. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, if researchers would like to explore our data, rather than writing exploratory SPARQL queries to find out what data can be found and the details of our data models, they can simply review our ShEx schemas to quickly understand our data models. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models, because the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were originally published as PDFs, converting the data into a structured, more readily accessible format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term. These aliases serve several disambiguation functions. They allow the use of common names as well as scientific names and they allow multilingual indexing. They also allow us to store historic names, whether scientific or common, that are no longer used, but may be found in the literature or in historical sources. For example, this is the record for Juglans regia in our knowledge graph [https://wikifcd.wiki.opencura.com/wiki/Item:Q82650]. In addition to the species name we also support the aliases 'common walnut', 'Old World Walnut', 'Walnut', 'Persian Walnut' and 'Juglans fallax' for this item. A more extensive example is Vaccinium vitis-idaea [https://wikifcd.wiki.opencura.com/wiki/Item:Q117098], for which we provide 13 aliases beyond the species name. <br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. 
We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009. 
&quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software Engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56581 Mika/Temp/WikiFCD/Grants 2020-08-20T11:28:55Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the topics with the highest engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. The nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, but none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in the nutrient content of the same product depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then obtained copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV files or as tabular data embedded in PDFs. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, if researchers would like to explore our data, rather than writing exploratory SPARQL queries to find out what data can be found and the details of our data models, they can simply review our ShEx schemas to quickly understand our data models. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models, because the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were originally published as PDFs, converting the data into a structured, more readily accessible format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term. These aliases serve several disambiguation functions. They allow the use of common names as well as scientific names and they allow multilingual indexing. They also allow us to store historic names, whether scientific or common, that are no longer used, but may be found in the literature or in historical sources. <br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. 
Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). 
The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software Engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56580 Mika/Temp/WikiFCD/Grants 2020-08-20T11:24:45Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2020/nsf20591/nsf20591.htm Information Integration and Informatics (III)] under CISE<br /> | NEW: no deadlines for SMALL projects (submit anytime after Oct 1, 2020); September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition data at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these FCTs where possible and extracted the data from these tables. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is structured. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, which will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this data with subsets of Wikidata. 
We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa from which our food items are derived. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009). The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier, is linked to other resources by properties, and all of the data is machine actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. 
Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56568 Mika/Temp/WikiFCD/Grants 2020-08-18T19:24:53Z <p>Hweyl: </p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to this date. Development and maintenance of such database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition data at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these FCTs where possible and extracted the data from these tables. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is structured. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a structured format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> In many FCTs food items are identified with a single label. Our approach supports searching across multiple aliases for a single resource. This broadens search options so that lookups are not constrained to a single search term.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. 
We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. <br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009).<br /> <br /> The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine-actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. 
Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56564 Mika/Temp/WikiFCD/Grants 2020-08-18T12:15:11Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these FCTs where possible and extracted the data from these tables. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, if researchers would like to explore our data, rather than writing exploratory SPARQL queries to find out what data can be found and the details of our data models, they can simply review our ShEx schemas to quickly understand our data models. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a structured format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. 
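Combining our data with subsets of Wikidata, as described above, could be done with a federated SPARQL query. The following Python function is a minimal sketch of how such a query might be assembled, not the project's actual code. The property ID <code>P999</code> (a local property holding the IRI of the matching Wikidata item) and the function name <code>build_federated_query</code> are hypothetical. The local prefix is taken from the ShEx schema in the Methods section, and the Wikidata endpoint is the public query service.

```python
import re

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
# Direct-property prefix of the project Wikibase, as used in the ShEx schema.
LOCAL_DIRECT_PREFIX = "http://wikifcd.wiki.opencura.com/prop/direct/"
# Hypothetical: local property that stores the IRI of the matching Wikidata item.
MAPPING_PROPERTY = "P999"

_PROPERTY_ID = re.compile(r"P[1-9]\d*")


def build_federated_query(wikidata_property: str, limit: int = 10) -> str:
    """Return a SPARQL query that joins local food items with one Wikidata property."""
    if not _PROPERTY_ID.fullmatch(wikidata_property):
        raise ValueError(f"not a property ID: {wikidata_property!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        f"PREFIX wbt: <{LOCAL_DIRECT_PREFIX}>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?food ?wikidataItem ?value WHERE {\n"
        # Local triple pattern: each food item points at its Wikidata counterpart.
        f"  ?food wbt:{MAPPING_PROPERTY} ?wikidataItem .\n"
        # The SERVICE clause is evaluated remotely by the Wikidata endpoint.
        f"  SERVICE <{WIKIDATA_ENDPOINT}> {{\n"
        f"    ?wikidataItem wdt:{wikidata_property} ?value .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # P18 is Wikidata's "image" property.
    print(build_federated_query("P18", limit=5))
```

The function only builds the query text. Sending it to the project's SPARQL endpoint is left out, because it depends on how that endpoint is configured for federation.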
<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009).<br /> <br /> The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine-actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. 
These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56563 Mika/Temp/WikiFCD/Grants 2020-08-18T12:14:21Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm, for use in our Wikibase, that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables (FCTs) [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can review our ShEx schemas to understand our data models quickly, rather than writing exploratory SPARQL queries to discover what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a more readily accessible structured format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were not previously possible. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this data with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. 
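<br /> <br /> As a rough illustration of the recipe question mentioned above, the following Python sketch builds a SPARQL query string for recipes that have at least one ingredient containing a given compound. The two prefixes are taken from the ShEx schema in this proposal; the property IDs P9001 and P9002 and the item ID Q9003 are hypothetical placeholders, not identifiers from our Wikibase, and the function name is likewise only illustrative.

```python
import re

# Prefixes copied from the ShEx schema published for this Wikibase.
PREFIXES = (
    "PREFIX wb: <http://wikifcd.wiki.opencura.com/entity/>\n"
    "PREFIX wbt: <http://wikifcd.wiki.opencura.com/prop/direct/>\n"
)

# Hypothetical property IDs, used only for illustration; the real
# identifiers would have to be looked up in the Wikibase.
HAS_INGREDIENT = "P9001"  # assumed: recipe -> ingredient (food item)
CONTAINS = "P9002"        # assumed: food item -> chemical compound

# Accept only plain item (Q...) or property (P...) identifiers so that
# arbitrary text cannot be spliced into the query.
_ID = re.compile(r"[PQ][1-9][0-9]*")


def recipes_containing(compound_qid, ingredient_pid=HAS_INGREDIENT,
                       contains_pid=CONTAINS):
    """Return a SPARQL query for recipes with an ingredient containing the compound."""
    for value in (compound_qid, ingredient_pid, contains_pid):
        if not _ID.fullmatch(value):
            raise ValueError(f"not a Wikibase identifier: {value!r}")
    return (
        PREFIXES
        + "SELECT DISTINCT ?recipe WHERE {\n"
        + f"  ?recipe wbt:{ingredient_pid} ?ingredient .\n"
        + f"  ?ingredient wbt:{contains_pid} wb:{compound_qid} .\n"
        + "}\n"
    )


if __name__ == "__main__":
    # Q9003 stands in for the proanthocyanidins item (placeholder ID).
    print(recipes_containing("Q9003"))
```

The resulting string could be pasted into the query service or sent with any HTTP client; the sketch deliberately stops short of making a network request.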
<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009)<br /> <br /> The Wikidata knowledge base fulfils the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine-actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. 
These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56562 Mika/Temp/WikiFCD/Grants 2020-08-18T12:14:03Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database is difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers and public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm, for use in our Wikibase, that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine it with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables (FCTs) [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata (Waagmeester et al., 2020). WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can review our ShEx schemas to understand our data models quickly, rather than writing exploratory SPARQL queries to discover what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, allowing researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting the data into a more readily accessible structured format makes it easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were not previously possible. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already contains items for roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this data with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. 
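<br /> <br /> The multilingual behaviour described earlier in this section (results shown in the language of the user's choice) can be pictured as a label lookup with a fallback language. The Python sketch below is only an assumption about how a client application might choose a label; the function name, the fallback to English, and the sample labels are illustrative and do not describe how Wikibase itself resolves languages.

```python
def pick_label(labels, preferred_languages, fallback="en"):
    """Return (language, label) for the first available language, or None.

    labels maps language codes to names, in the way a Wikibase item
    carries one label per language.
    """
    for lang in [*preferred_languages, fallback]:
        if lang in labels:
            return lang, labels[lang]
    return None


if __name__ == "__main__":
    # Illustrative labels for a single food item.
    labels = {"en": "apple", "fr": "pomme", "ja": "リンゴ"}
    print(pick_label(labels, ["de", "fr"]))  # German missing, French found
    print(pick_label(labels, ["sw"]))        # falls back to English
```

Keeping the fallback explicit means an item with few labels still shows a name, while users who do have a label in their own language always see it first.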
<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet. The World Wide Web Consortium (W3C) published the following definition of the Semantic Web in 2009: &quot;Semantic Web is the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications.&quot; (W3C Semantic Web Activity, 2009)<br /> <br /> The Wikidata knowledge base fulfills the requirements outlined by the W3C in that each resource has a unique identifier and is linked to other resources by properties, and all of the data is machine-actionable as well as editable by both humans and machines. <br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and Matlab. 
These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56552 Mika/Temp/WikiFCD/Grants 2020-08-17T11:59:30Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a natural place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers and public health workers. WikiProject Food and drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with food composition data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can simply review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually, because the provenance metadata is machine actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were not previously possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already describes roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this data with subsets of Wikidata. We could also combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. 
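<br /> <br /> Combining our data with subsets of Wikidata, as described above, would rely on SPARQL federation. The sketch below shows one way such a query might be assembled; it is an illustration under stated assumptions rather than the project's code. It assumes a mapping property (the placeholder P999 is hypothetical) whose value is the IRI of the corresponding Wikidata item. The Wikidata endpoint URL and the wdt: prefix are the standard public ones, and P18 is Wikidata's image property.

```python
# Public Wikidata SPARQL endpoint, used as the remote side of the federation.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def federated_query(mapping_prop, wikidata_prop, limit=50):
    """Build a federated SPARQL query that starts in the WikiFCD Wikibase
    and fetches one property value per item from Wikidata.

    mapping_prop is assumed (hypothetically) to hold the Wikidata item IRI.
    """
    return "\n".join([
        "PREFIX wbt: <http://wikifcd.wiki.opencura.com/prop/direct/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "SELECT ?item ?wikidataItem ?value WHERE {",
        # Local triple: our item and the Wikidata item it is aligned with.
        f"  ?item wbt:{mapping_prop} ?wikidataItem .",
        # Remote triple, evaluated by the Wikidata Query Service.
        f"  SERVICE <{WIKIDATA_ENDPOINT}> {{",
        f"    ?wikidataItem wdt:{wikidata_prop} ?value .",
        "  }",
        "}",
        f"LIMIT {int(limit)}",
    ])


# Usage: P999 is a placeholder for the mapping property; P18 is Wikidata's image property.
print(federated_query("P999", "P18"))
```

Keeping the local pattern outside the SERVICE block means only the aligned Wikidata items are looked up remotely, which tends to keep federated queries small.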
<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> Our decision to build this knowledge base using the infrastructure of the Wikimedia Foundation means that other researchers will be able to access this data for reuse in their own projects in a variety of formats. Results from our SPARQL endpoint are available for download as JSON, TSV, CSV and HTML. Preformatted code snippets for making requests to our SPARQL endpoint are available in PHP, jQuery, JavaScript, Java, Perl, Python, Ruby, R and MATLAB. 
These options allow researchers to more quickly integrate data from our knowledge base into their existing projects using the tools of their choice.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56518 Mika/Temp/WikiFCD/Grants 2020-08-14T10:43:03Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a natural place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers and public health workers. WikiProject Food and drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially lead to substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with food composition data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. To our knowledge, this is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can simply review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, which will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata contains roughly 50,000,000 scientific publications. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The knowledge graph approach also facilitates expansion of this project into related domains. We could look at food chemistry and metabolic processes by combining this with subsets of Wikidata. We could combine this data with research literature about health benefits of plant-derived medicines and extend our data models to include plant components that have been tested for medicinal efficacy. 
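<br /> <br /> The combination of our data with subsets of Wikidata could be sketched as a federated SPARQL query. The Python snippet below is only an illustration under stated assumptions, not code from the project: the function name and the mapping property ID (<code>P9999</code> in the usage line) are hypothetical placeholders, and we assume that property holds the IRI of the matching Wikidata item. The local prefix is taken from the ShEx schema in the Methods section, and the Wikidata endpoint URL and image property (P18) are the public ones. The function only builds the query text and does not contact any endpoint.<br />

```python
# Public Wikidata SPARQL endpoint, used as the remote side of the federation.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"
# Direct-property prefix of the project Wikibase (the wbt: prefix in the ShEx schema).
LOCAL_DIRECT_PREFIX = "http://wikifcd.wiki.opencura.com/prop/direct/"


def build_federated_image_query(mapping_property: str, limit: int = 10) -> str:
    """Build a SPARQL query that joins local food items to Wikidata images.

    mapping_property is a HYPOTHETICAL local property ID (e.g. "P9999")
    assumed to link a local item to the IRI of its Wikidata counterpart.
    """
    # Accept only IDs shaped like "P123" so nothing else is spliced into the query.
    if not (mapping_property.startswith("P") and mapping_property[1:].isdigit()):
        raise ValueError(f"not a property ID: {mapping_property!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    return f"""PREFIX wbt: <{LOCAL_DIRECT_PREFIX}>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?food ?wikidataItem ?image WHERE {{
  ?food wbt:{mapping_property} ?wikidataItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataItem wdt:P18 ?image .
  }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # P9999 is a placeholder, not a real property of the project Wikibase.
    print(build_federated_image_query("P9999", limit=5))
```

The resulting text could be pasted into the query service of the project Wikibase; whether the <code>SERVICE</code> call succeeds depends on the federation settings of that query service.<br />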
<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56517 Mika/Temp/WikiFCD/Grants 2020-08-14T10:38:21Z <p>Hweyl: /* Methods */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition at the level of each individual statement. Each harvesting episode for which we have nutrient composition data will be modeled individually. As we acquire additional data for wild harvests, we will be able to compare the nutritional information across space as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these FCTs where possible and extracted the data from these tables. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform [https://www.wbstack.com/] to create an instance of Wikibase for testing. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software used to support Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab of Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> <br /> These ShEx schemas will also reduce work for anyone looking to combine data from our knowledge graph with other data sets. For example, researchers who would like to explore our data can review our ShEx schemas to quickly understand our data models, rather than writing exploratory SPARQL queries to find out what data is available and how it is modeled. <br /> <br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. 
The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, which will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for foods items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, thus our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Talk:Mika/Temp/WikiFCD/Grants&diff=56513 Talk:Mika/Temp/WikiFCD/Grants 2020-08-13T11:52:17Z <p>Hweyl: </p> <hr /> <div>* moved the content to [[Mika/Temp/WikiFCD/Grants|the main page]]!<br /> =Sources to Consider=<br /> The Rodale Institute has conducted research into how farming practices impact nutrient composition. Ex: Hepperly, Paul, et al. &quot;Compost, manure and synthetic fertilizer influences crop yields, soil properties, nitrate leaching and crop nutrient content.&quot; Compost Science &amp; Utilization 17.2 (2009): 117-126. [https://www.tandfonline.com/doi/abs/10.1080/1065657X.2009.10702410]<br /> <br /> We may want to see if anyone has doe a literature review of their research.</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56512 Mika/Temp/WikiFCD/Grants 2020-08-13T11:46:09Z <p>Hweyl: /* Outputs */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details are important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various different formats (e.g. PDF, CSV) - include varying degrees of details. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to this date. Development and maintenance of such database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods are determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources. In this way we will be able to document the location of harvest and combine that with the nutrient composition way at the level of the statement of each fact. Each harvesting episode for which we have nutrient composition data will be modeled individually, as we acquire additional data for wild harvests, we will be able to compare the nutritional information across spaces as well as time.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these FCTs where possible. We then extracted the data from these tables. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> *<br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia foundation. 
Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform to create an instance of Wikibase for testing [https://www.wbstack.com/]. The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software that powers Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab at the Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
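<br /> <br />
For readers new to ShEx, the cardinality markers used in schemas like ours can be read as follows: no marker means exactly one value, "?" means zero or one, and "*" means zero or more. The Python sketch below illustrates only that idea. It is not a ShEx implementation and not part of our tooling: it checks value counts and nothing else (no value types, no referenced shapes). The property IDs are taken from the schema, while the record layout, the `CARDINALITY` table and the `check_cardinality` function are hypothetical names introduced for illustration.

```python
# Hypothetical illustration only: each ShEx cardinality marker becomes a small
# predicate over the number of values a record holds for a property.

def exactly_one(count):
    """No marker in ShEx: the property must appear exactly once."""
    return count == 1


def zero_or_one(count):
    """The "?" marker: the property is optional and may not repeat."""
    return count in (0, 1)


def zero_or_more(count):
    """The "*" marker: any number of values, including none."""
    return True


# Property IDs follow the schema; the mapping itself is an assumption of this sketch.
CARDINALITY = {
    "P1": exactly_one,
    "P22": zero_or_one,
    "P58": zero_or_one,
    "P68": zero_or_more,
}


def check_cardinality(record):
    """Return a list of problems; an empty list means every count is allowed."""
    problems = []
    for prop, allowed in CARDINALITY.items():
        count = len(record.get(prop, []))
        if not allowed(count):
            problems.append(f"{prop}: {count} value(s) not allowed by {allowed.__name__}")
    return problems


if __name__ == "__main__":
    # A record is assumed to map property IDs to lists of values.
    print(check_cardinality({"P1": ["Q12"], "P68": ["value a", "value b"]}))
    print(check_cardinality({"P22": ["value a", "value b"]}))
```

A real validator would also check the value constraints (for example that P1 points to Q12) and follow shape references such as the country shape; the sketch leaves those out to keep the cardinality rules visible.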
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1{<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution workflow will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources, we can explore them using our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. 
In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. 
For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, thus our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56501 Mika/Temp/WikiFCD/Grants 2020-08-12T11:54:20Z <p>Hweyl: /* Methods */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database is difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species that is grown on the same farm and re-analyzed yearly for nutrient content. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> We used the wbstack platform to create an instance of Wikibase for testing [https://www.wbstack.com/]. 
The wbstack service provides a hosted version of Wikibase that users can load with their own data. Wikibase is the software that powers Wikidata itself. <br /> <br /> WikidataIntegrator (WDI) is a Python library for interacting with data from Wikidata \cite{waagmeester2020science}. WDI was created by the Su Lab at the Scripps Research Institute and shared under an open-source software license via GitHub [https://github.com/SuLab/WikidataIntegrator]. Using WDI as a framework, we wrote bots to transfer data from FCTs to our Wikibase.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1{<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? 
;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution workflow will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources, we can explore them using our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. 
Researchers will not need to separate data manually; the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. 
Many of the publications in PubMed are already represented in Wikidata, thus our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multi-media interactions in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56498 Mika/Temp/WikiFCD/Grants 2020-08-11T12:09:22Z <p>Hweyl: /* Outputs */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database is difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. 
In this way the data will be readily-available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> * Visualizations of this data<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> Multiple data visualization options are available via the Query Service of our Wikibase instance. Graphs, charts, network diagrams, and maps are some of the visualizations we will be able to offer end-users of this knowledge base. <br /> <br /> * Case Study One: Fermented foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. 
For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these food composition tables (FCTs) where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. 
These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models because the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. 
We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually because the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, which will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, converting them into a structured format makes the data easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. 
Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56497 Mika/Temp/WikiFCD/Grants 2020-08-11T12:05:25Z <p>Hweyl: /* Methods */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. 
This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these food composition tables (FCTs) where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. 
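<br /> <br /> As a rough illustration of the extraction step for tables published as CSV, the sketch below reshapes one wide table (one row per food, one column per nutrient) into per-value records that could later become individual statements. It is only a sketch under stated assumptions: the column names, the sample rows, and the function name <code>rows_to_statements</code> are hypothetical and do not come from any of the source tables.

```python
import csv
import io

# Hypothetical sample data; real food composition tables use their own
# column names, units, and conventions for missing values.
SAMPLE_CSV = """food_name,energy_kcal,protein_g,phytate_mg
Example food A,100,2.5,
Example food B,200,4,30
"""


def rows_to_statements(csv_text, id_column="food_name"):
    """Reshape a wide CSV table into (food, nutrient, value) records.

    Blank cells are skipped rather than treated as zero, since a missing
    value in a source table does not mean the nutrient is absent.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        food = row[id_column].strip()
        for column, raw in row.items():
            if column == id_column or raw is None or not raw.strip():
                continue
            statements.append((food, column, float(raw)))
    return statements


statements = rows_to_statements(SAMPLE_CSV)
for food, nutrient, value in statements:
    print(food, nutrient, value)
```

Keeping one record per food and nutrient, instead of one row per food, matches the statement-level structure described elsewhere in this proposal, where each value can carry its own reference.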
<br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models because the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. 
In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> Using the SPARQL query language, we can also write tailored queries to extract subgraphs supported by a single source. In this way we support views of the data across multiple sources as well as views of the data drawn from individual sources. Researchers will not need to separate data manually because the provenance metadata is machine-actionable and stored at the level of individual statements in the graph.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints, which will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, converting them into a structured format makes the data easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. 
For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata, so our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56496 Mika/Temp/WikiFCD/Grants 2020-08-11T12:02:15Z <p>Hweyl: /* Methods */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well-suited for reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors; likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. 
This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase for converting a set of food items into a fermented food recipe that will result in accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. For example, we have designed our knowledge base to accommodate nutrient composition data for a single varietal of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then found copies of these food composition tables (FCTs) where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. 
<br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt;<br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources, we can explore them with our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can connect our sources to individual statements of fact in our knowledge graph. 
In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting them into a more readily accessible structured format makes the data easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already contains items for roughly 50,000,000 scientific publications. 
Many of the publications in PubMed are already represented in Wikidata, so our domain is well covered. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56495 Mika/Temp/WikiFCD/Grants 2020-08-11T12:00:57Z <p>Hweyl: /* Outputs */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a well-suited place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important, as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. The nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across regions and over time because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. 
This is the first application built on Wikibase tailored to the needs of the epidemiological community. The output of this research will be a knowledge graph of structured data in the form of a Wikibase instance populated with data from heterogeneous food composition tables. <br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the resulting dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. For example, we have designed our knowledge base to accommodate nutrient composition data for a single variety of a species grown on the same farm that is re-analyzed yearly for nutrition information. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then obtained copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. 
<br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. 
<br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt;<br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; * ;<br /> wbt:P56 xsd:string * ;<br /> wbt:P69 xsd:string * ;<br /> wbt:P70 xsd:string * ;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interface will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources, we can explore them with our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can assign provenance to individual statements of fact in our knowledge graph. 
In this way we can always be sure of where data was originally found should we need to communicate that to others or follow up with the reference material.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured. Because many of these tables were published as PDFs, converting them into a more readily accessible structured format makes the data easier to reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that were previously not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because Wikidata already contains items for roughly 50,000,000 scientific publications. 
Many of the publications in PubMed are already represented in Wikidata, so our domain is well covered. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. 
Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton ]<br /> * Software Engineer- [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt ]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56491 Mika/Temp/WikiFCD/Grants 2020-08-10T12:21:05Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. 
Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the most highly engaged topics in the Wikimedia ecosystem, and around the world. 
Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a well-suited place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important, as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. The nutrient content of unprocessed food items (e.g. apples) can also vary for the same item across regions and over time because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Development and maintenance of such a database are difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. 
<br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors, and likewise many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up new research questions that draw on more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.<br /> <br /> Secondly, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. 
This is the first application built on Wikibase tailored to the needs of the epidemiological community.<br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the resulting dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then obtained copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. 
We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may like to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt;<br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? 
;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. Simply put, we can assign provenance to individual statements of fact in our knowledge graph. In this way, we can always trace where data was originally found, should we need to communicate that to others or follow up with the reference material.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. 
Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. Many of the publications in PubMed are already represented in Wikidata; thus our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. 
Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The ability to federate SPARQL queries between our Wikibase and Wikidata allows us to combine our data with resources from the media repository of the Wikimedia Foundation, Wikimedia Commons. The ability to quickly locate images, videos and sound files related to the resources in our Wikibase allows us to provide interactive multimedia experiences in applications we build on top of our Wikibase. Wikimedia Commons has images of many of the taxa of which our food items are products. <br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Grants&diff=56448 Mika/Temp/WikiFCD/Grants 2020-08-07T12:25:41Z <p>Hweyl: /* Impact */</p> <hr /> <div>This page is for writing down ideas for grants.<br /> <br /> =List of potential grants and deadlines=<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Organization !! Category !! Deadline !! Funding Aims !! 
Amount<br /> |-<br /> | NSF<br /> | [https://www.nsf.gov/pubs/2019/nsf19589/nsf19589.htm: Information Integration and Informatics (III)] under CISE<br /> | October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects<br /> | &quot;The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:<br /> <br /> * General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.<br /> <br /> * Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.<br /> <br /> * Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.<br /> <br /> * Data management: Research on databases, data processing algorithms and novel information architectures. 
This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.<br /> <br /> * Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques.&quot;<br /> <br /> | up to $500,000 total budget with durations up to three years<br /> |}<br /> <br /> =Project Aims=<br /> Food, nutrition, and health are some of the topics that draw the most engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open and structured database for a global FCD, and Wikimedia - especially Wikidata - is a perfect place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a placeholder that can accommodate all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important as the needs within Wikimedia projects can change over time.<br /> <br /> We believe that Wikimedia is an appropriate venue to pursue for this project. Many FCDs - which currently come in various formats (e.g. PDF, CSV) - include varying degrees of detail. Nutrient content of unprocessed food items (e.g. apples) can also vary for the same item from different areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. 
In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past and none has succeeded to date. Development and maintenance of such a database is difficult if the contributors are limited to small/closed groups of researchers and employees in this field. Importantly, even though there are also wide regional variations in foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages, leading to disparities in data availability and accessibility and, ultimately, in scientific evidence in health research. We need a more open and collaborative system. <br /> <br /> First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users - from WikiProjects and Wikipedia editors and viewers to academic researchers to public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are universally popular WikiProjects among editors and, likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.<br /> <br /> Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up ways to explore new research questions using more nuanced nutrition data (e.g. changes in nutrient content of the same product, depending on the climate conditions of the year), which can potentially make substantial advances in nutrition and health research.<br /> <br /> Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. 
If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.<br /> <br /> =Outputs=<br /> * list products and other outputs<br /> * Wikibase instance with FCD data from multiple sources<br /> * SPARQL query code to combine this data with subsets of Wikidata data<br /> * Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas<br /> <br /> Wikibase is a novel infrastructural platform for data management suitable for data from many domains. This is the first application built on Wikibase tailored to the needs of the epidemiological community.<br /> <br /> * Case Study One: Fermented Foods<br /> The nutrient composition of fermented foods commonly changes as the fermentation process progresses. We will select 15 fermented foods to use in a case study of modeling nutrient composition that changes over time. We will develop an algorithm for use in our Wikibase that converts a set of food items into a fermented food recipe and yields accurate nutrient information for the dish. <br /> <br /> * Case Study Two: Time Series Data<br /> Agricultural practices, local conditions, and global weather patterns all influence nutrient density in food crops. Designing a data model to represent time series data will allow us to track changes in nutrient density over time. <br /> <br /> * Case Study Three: Georeferenced Data<br /> Wild food is food that is gathered from the environment rather than cultivated agriculturally. The nutrient composition of wild foods is determined by the ecology of their location. 
Building a data model for georeferenced data will allow us to track the coordinate locations of wild food item sources.<br /> <br /> =Methods=<br /> * Data Acquisition<br /> We worked from the FAO's list of food composition tables [http://www.fao.org/infoods/infoods/tables-and-databases/en/] to identify existing FCDs that we could add to our Wikibase. We then located copies of these FCTs where possible and extracted the data from them. The FCDs were originally published as CSV or as tabular data encoded in a PDF. <br /> <br /> * Database Design and Population<br /> We will create a database model that can represent heterogeneous food composition tables. We will use this model to map multiple food composition tables so that we can then import them into a Wikibase instance.<br /> <br /> Our alignment of food composition table data with Wikidata will allow us to leverage the sum of knowledge in the projects of the Wikimedia Foundation. Because Wikimedia Commons, the media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able to easily reuse images of food items, molecular structure models, and food dishes alongside our projects. <br /> This query from our SPARQL endpoint [https://tinyurl.com/y99qtk7p] lists all of the food items in our project Wikibase that have an associated image in Wikimedia Commons.<br /> <br /> * Ontology Engineering<br /> We will write schemas for the data models related to food composition data and food items. These schemas will serve as the ontology for our knowledge graph. Our Wikibase has a schema namespace that supports the Shape Expressions (ShEx) language [http://shex.io/shex-semantics/index.html]. ShEx is a data modeling and data validation language for RDF graphs. We provide an example below of a ShEx schema describing how food composition tables are modeled in our Wikibase. 
Defining ShEx schemas for our data models allows us to communicate the expected structure of data for a food composition table to others who may wish to contribute data to our public Wikibase. We have published the schema in the Schema namespace [https://wikifcd.wiki.opencura.com/wiki/EntitySchema:E1]. <br /> <br /> &lt;code&gt;PREFIX wd: &lt;http://www.wikidata.org/entity/&gt;<br /> PREFIX wbt: &lt;http://wikifcd.wiki.opencura.com/prop/direct/&gt;<br /> PREFIX wb: &lt;http://wikifcd.wiki.opencura.com/entity/&gt;<br /> PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;<br /> start = @&lt;#food_composition_table&gt; <br /> &lt;#food_composition_table&gt; EXTRA wbt:P1 {<br /> wbt:P1 [wb:Q12] ;<br /> wbt:P22 IRI ? ;<br /> wbt:P58 xsd:string ? ;<br /> wbt:P68 xsd:string * ;<br /> wbt:P65 @&lt;#P65_country&gt; *;<br /> wbt:P56 xsd:string *;<br /> wbt:P69 xsd:string *;<br /> wbt:P70 xsd:string *;<br /> }<br /> &lt;#P65_country&gt; {<br /> wbt:P31 [wb:Q127865]<br /> }&lt;/code&gt;<br /> * Validating RDF Graphs<br /> ShEx can be used to validate RDF graphs for conformance to a schema. This allows us to create forms for data contributors that will ensure data consistency. Data contributors will not need to familiarize themselves with our data models; the form-based contribution interaction will guide curation.<br /> <br /> Our ShEx schemas will also be useful when integrating additional RDF data sets as the project matures. When we encounter new RDF data sources we can explore them with the use of our ShEx schemas to determine where they overlap with our existing data models. We will also be able to extend our schemas as the need for greater expressivity or complexity arises. <br /> <br /> * Data Provenance<br /> Our emphasis on reusing data from multiple published sources requires precision in data provenance. The structure of references in the Wikibase data model allows us to assert provenance at the level of the statement. 
Simply put, we can assign provenance to individual statements of fact in our knowledge graph. In this way, we can always trace where data was originally found, should we need to communicate that to others or follow up with the reference material.<br /> <br /> =Impact=<br /> This project will provide a new FCD knowledge graph that will support queries across multiple FCDs with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data. This knowledge graph will support federated queries with Wikidata and other public SPARQL endpoints that will allow researchers to ask questions of this data in combination with other linked open datasets. The data in the knowledge graph is structured data. Because many of these tables were published as PDFs, getting the data into a more readily accessible structured format increases ease of reuse. <br /> <br /> This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names for food items and plant and animal species in as many human languages as possible.<br /> <br /> Our choice to use Wikibase allows us to access the data serialized as RDF. The SPARQL endpoint we have created allows us to ask questions of this data that previously were not possible to ask. For example, we can now ask questions such as &quot;show me all recipes that call for one or more ingredients containing proanthocyanidins&quot;. <br /> <br /> We will connect scientific publications about the nutritional components of foods with the food items. This is possible because of the existence of roughly 50,000,000 scientific publications in Wikidata. 
Many of the publications in PubMed are already represented in Wikidata; thus our domain is adequately represented. We will create new Wikidata items for publications we would like to reference if they do not yet exist. Connecting publications with food items in our knowledge graph will allow us to provide additional evidence for researchers to reuse, investigate, and extend.<br /> <br /> The knowledge graph approach allows us to combine food composition data and recipes in the same database, which will enable us to create novel user interfaces for people interested in the nutritional components of home-cooked dishes.<br /> <br /> The Wikibase infrastructure supports both human and algorithmic curation. Thus we can programmatically ingest data from external sources and also support crowdsourced recipes from anyone with access to the internet.<br /> <br /> =People=<br /> * Project manager/nutritional epidemiologist (volunteer) - [https://en.wikipedia.org/wiki/User:Hackfish Mika Matsuzaki]<br /> * Data scientist - [https://www.wikidata.org/wiki/User:YULdigitalpreservation Kat Thornton]<br /> * Software engineer - [https://www.wikidata.org/wiki/User:KSN72 Kenneth Seals-Nutt]<br /> * Food composition advisor/nutritional epidemiologist (volunteer) - Sabri Bromage</div> Hweyl https://wiki.mako.cc/index.php?title=Mika/Temp/WikiFCD/Course&diff=56447 Mika/Temp/WikiFCD/Course 2020-08-06T12:06:37Z <p>Hweyl: /* Tools */</p> <hr /> <div>This page is for planning a course on how to contribute to and use data from WikiFCD.<br /> <br /> =About the course=<br /> <br /> This course aims to introduce students to 1) the issues surrounding FCDs in global nutrition research and 2) the concept of peer production. Students will gain knowledge and practical skills to engage directly in building a global food composition database. <br /> <br /> The course will have two parts. During the first term, the students will learn about peer production and how to contribute to wiki-based projects. 
In the second term, students will learn to use the data from WikiFCD to explore their own research questions while continuing to contribute various FCDs to WikiFCD.<br /> <br /> =Schedule=<br /> <br /> * JHSPH has 4 terms (7 weeks each; 39-40 instruction days) + 1 summer term (8 weeks) and 1 winter intersession term (2 weeks):<br /> : Summer Term: Wednesday, July 1 - Wednesday, August 26 (39 class days)<br /> : 1st Term: Monday, August 31 - Monday, October 26 (40 class days)<br /> : 2nd Term: Tuesday, October 27 - Wednesday, December 23 (39 class days)<br /> : Winter Intersession: Monday, January 4 - Friday, January 15<br /> : 3rd Term: Monday, January 25 - Friday, March 19 (39 class days)<br /> : 4th Term: Monday, March 29 - Friday, May 21 (40 class days)<br /> <br /> * If we can teach in the 1st and/or 2nd term, that would be nice for the grant application to be submitted in November.<br /> * It seems it's about 1 hour of instruction time per credit. Students register for 16 credits per semester (2 terms).<br /> * Aim to meet once a week for 2 hours = 2 credits?<br /> <br /> =Course Aims=<br /> <br /> # To learn about issues with FCDs in global nutrition research<br /> # To learn about peer production<br /> # To learn about Wikidata and Wikibase<br /> # To learn how to contribute to WikiFCD<br /> # To learn how to use data from WikiFCD<br /> <br /> =Structure of the course=<br /> <br /> * Lecture<br /> * Lab<br /> * Individual projects<br /> <br /> =Course schedule=<br /> <br /> == 1st term: Aims 1 through 4 ==<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Week !! Aim/Learning objectives !! Content !! Assignment !! 
Assignment Due date<br /> |-<br /> | Week 1 <br /> | Aim 1: Learn about FCDs<br /> | <br /> # Learn about FCDs<br /> # Learn about issues related to FCDs and history of global databases<br /> # Discuss potential solutions<br /> | <br /> # Readings on peer production<br /> # Create an account on Wikimedia<br /> # Wiki Education Wikidata Professional Development Training Modules: Orientation, Introduction to Wikidata<br /> | Week 2<br /> |-<br /> | Week 2<br /> | Aim 2/3:<br /> # Learn about peer production<br /> # Learn about Wikimedia projects, Wikidata, and Wikibase<br /> # Learn how to participate in wiki-based projects<br /> |<br /> # Lecture on peer production<br /> # Discussion on pros and cons of peer production for global FCD development<br /> # Lecture on Wikimedia projects, Wikidata, and Wikibase - pull materials from Wiki Education on Wikidata?<br /> * [https://commons.wikimedia.org/wiki/File:WikidataHumanists.pdf Example of material we could adapt that I created for another event]<br /> # Edit a Wikipedia article<br /> |<br /> # Readings<br /> # Databases and Linked Data<br /> # Make edits on Wikipedia or Wikidata<br /> # Create an account on WikiFCD<br /> |<br /> |-<br /> | Week 3<br /> | Aim 4:<br /> # Learn more about wiki editing<br /> # Learn about structured data<br /> |<br /> # Lecture on Wiki markup<br /> # Edit Wikidata <br /> |<br /> # Readings<br /> # Edit food/nutrition-related pages on Wikidata<br /> |<br /> |-<br /> | Week 4<br /> | Aim 4:<br /> # Learn about WikiFCD and Open Food Facts<br /> # Learn about different ways to import data from elsewhere<br /> # Learn about licensing<br /> |<br /> # Lecture on structured data and how this could help our research<br /> # Examples of good and bad Wikidata pages<br /> # Edit metadata (FCD information)<br /> | <br /> # Readings<br /> # Search for and add missing FCD metadata<br /> |<br /> |-<br /> | Week 5<br /> | Aim 4:<br /> * Learn about online communities and collaboration<br /> * OR<br /> * 
Learn how nutrient data are produced (field trip to USDA chemical analysis lab)<br /> |<br /> * Lecture on successful online communities<br /> * Discussion on how to build a successful global WikiFCD community<br /> * OR<br /> * Field trip.<br /> |<br /> # Readings<br /> # Learn a script to import example FCD CSV (to be developed for this class)<br /> |<br /> |-<br /> | Week 6<br /> | Aim 4: <br /> # Explore ways to connect knowledge bases<br /> | <br /> # Learn about other Wikimedia projects and Wikibase-based projects<br /> # Discuss potential research questions that can be answered by connecting these projects<br /> |<br /> # Choose your research proposal topic<br /> |<br /> |-<br /> | Week 7<br /> | individual project consultation<br /> |<br /> |<br /> |<br /> |-<br /> | Week 8<br /> | Presentation on research proposals<br /> |<br /> |<br /> |<br /> |}<br /> <br /> == 2nd Term: Aim 4/5 ==<br /> <br /> * Students will select their own research questions and try to answer them by analyzing data from WikiFCD.<br /> * Skills: SPARQL<br /> * Optional skills: statistical software<br /> <br /> {| class=&quot;wikitable&quot;<br /> ! Week !! Aim !! Content !! Assignment !! 
Assignment Due date<br /> |-<br /> | Week 1 <br /> | Aim 4/5<br /> | <br /> |<br /> |<br /> |-<br /> | Week 2<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 3<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 4<br /> | individual project consultation<br /> |<br /> |<br /> |<br /> |-<br /> | Week 5<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 6<br /> | Aim 4/5<br /> | <br /> |<br /> |<br /> |-<br /> | Week 7<br /> | Aim 4/5<br /> |<br /> |<br /> |<br /> |-<br /> | Week 8<br /> | Presentation<br /> |<br /> |<br /> |<br /> |}<br /> <br /> =Evaluation=<br /> <br /> =Tools=<br /> * Software to extract tabular data from PDF files: [https://www.wikidata.org/wiki/Q96774878]<br /> * WikidataIntegrator: [https://github.com/SuLab/WikidataIntegrator]<br /> * OpenRefine: [https://openrefine.org/]</div> Hweyl