Mika/Temp/WikiFCD/Grants

From WikiDotMako

Revision as of 21:29, 20 July 2020

This page is for writing down ideas for grants.

List of potential grants and deadlines

Organization: NSF
Category: Information Integration and Informatics (III) under CISE
Deadline: October 29, 2020 - November 12, 2020 for SMALL projects; September 7, 2020 - September 14, 2020 for MEDIUM projects
Funding Aims: "The III program supports innovative research on computational methods for the full data lifecycle, from collection through archiving and knowledge discovery, to maximize the utility of information resources to science and engineering and broadly to society. III projects range from formal theoretical research to those that advance data-intensive applications of scientific, engineering or societal importance. Research areas within III include:
  • General methods for data acquisition, exploration, analysis and explanation: Innovative methods for collecting and analyzing data as part of a scalable computational system.
  • Domain-specific methods for data acquisition, exploration, analysis and explanation: Work that advances III research while leveraging properties of specific application domains, such as health, education, science or work. Note that projects that simply apply existing III techniques to particular domains of science and engineering are more appropriate for funding opportunities issued by the NSF directorates cognizant for those domains.
  • Advanced analytics: Novel machine learning, data mining, and prediction methods applicable to large, high-velocity, complex, and/or heterogenous datasets. This area includes data visualization, search, information filtering, knowledge extraction and recommender systems.
  • Data management: Research on databases, data processing algorithms and novel information architectures. This topic includes representations for scalable handling of various types of data, such as images, matrices or graphs; methods for integrating heterogenous and distributed data; probabilistic databases and other approaches to handling uncertainty in data; ways to ensure data privacy, security and provenance; and novel methods for data archiving.
  • Knowledge bases: Includes ontology construction, knowledge sharing, methods for handling inconsistent knowledge bases and methods for constructing open knowledge networks through expert knowledge acquisition, crowdsourcing, machine learning or a combination of techniques."
Amount: up to $500,000 total budget with durations up to three years

Project Aims

Food, nutrition, and health are among the topics that draw the most engagement in the Wikimedia ecosystem and around the world. Food Composition Data (FCD) is a key piece connecting those three topics, providing nutrient data for each food item. There is a need for an open, structured database for global FCD, and Wikimedia - especially Wikidata - is a natural place to accommodate some of these data. Due to the diversity and complexity of the existing FCDs, it would be helpful to have a repository that can hold all the details from the existing FCDs, from which Wikidata project editors can pull information deemed appropriate for Wikimedia. Accommodating all the details is important because the needs within Wikimedia projects can change over time.

We believe that Wikimedia is an appropriate venue for this project. Many FCDs, which currently come in various formats (e.g. PDF, CSV), include varying degrees of detail. The nutrient content of an unprocessed food item (e.g. apples) can also vary across areas and times because of changing characteristics such as climate and terroir. However, the current FCDs are not well suited to reflecting these changes. In fact, research institutes and intergovernmental agencies have attempted to create a global FCD in the past, and none has succeeded to date. Developing and maintaining such a database is difficult if the contributors are limited to small, closed groups of researchers and employees in this field. Importantly, even though there are wide regional variations in the foods that are commonly consumed, some places lack access to regionally appropriate FCD, up-to-date FCD, or FCD in their own languages. This leads to disparities in data availability and accessibility and, ultimately, in the scientific evidence available to health research. We need a more open and collaborative system.

First, this Wikibase instance will significantly improve the usability of FCD from different sources for diverse users, from WikiProjects and Wikipedia editors and viewers to academic researchers and public health workers. WikiProject Food and Drink on English Wikipedia and its equivalents in other languages are popular WikiProjects among editors. Likewise, many articles on food and drink are within the top 10% of any Wikipedia's articles by pageviews. This new project can contribute to a topic that is of high interest to many people.

Building a structured dataset is also a key step in identifying the most appropriate data to borrow in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not readily available. This new database will also open up research questions that depend on more nuanced nutrition data (e.g. changes in the nutrient content of the same product depending on the climate conditions of the year), which could lead to substantial advances in nutrition and health research.

Second, by creating an instance of Wikibase for this project, we will be able to design our own data models, with input from Wikidata, to incorporate data from heterogeneous data sources. If subsets of the data are appropriate for Wikidata, we will be able to provide machine-actionable ShEx schemas that will help us prepare data for other systems. In this way the data will be readily available for incorporation into Wikidata if desired.

Outputs

  • list products and other outputs
  • Wikibase instance with FCD data from multiple sources
  • SPARQL query code to combine this data with subsets of Wikidata data
  • Data models for food items, food composition tables, recipes, and other resources encoded as ShEx schemas
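As a rough illustration of the SPARQL output listed above, the following Python sketch builds a federated query that joins food items in the local Wikibase to their labels on Wikidata. Only the Wikidata Query Service endpoint is real. The property namespace `https://wikifcd.example/prop/direct/` and the property ID `P1` are hypothetical placeholders, and the sketch assumes the mapping property stores full Wikidata entity IRIs; the actual data model may differ.

```python
import re
from string import Template

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical values: the real WikiFCD property namespace and the ID of the
# property that stores the matching Wikidata item are not given on this page.
EXAMPLE_PROP_NS = "https://wikifcd.example/prop/direct/"
EXAMPLE_MAPPING_PROP = "P1"

_QUERY = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fcd: <$prop_ns>

SELECT ?food ?wikidataItem ?label WHERE {
  # Local triples: each food item points at its Wikidata counterpart.
  ?food fcd:$mapping_prop ?wikidataItem .
  # Remote triples: fetch that item's label from Wikidata.
  SERVICE <$wikidata> {
    ?wikidataItem rdfs:label ?label .
    FILTER(LANG(?label) = "$lang")
  }
}
LIMIT $limit
""")


def build_federated_query(prop_ns, mapping_prop, lang="en", limit=10):
    """Return a SPARQL query joining local food items to Wikidata labels."""
    # Reject anything that is not a plain language tag or property ID, so
    # caller input cannot change the shape of the query.
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    if not re.fullmatch(r"P[0-9]+", mapping_prop):
        raise ValueError(f"unexpected property ID: {mapping_prop!r}")
    return _QUERY.substitute(
        prop_ns=prop_ns,
        mapping_prop=mapping_prop,
        wikidata=WIKIDATA_ENDPOINT,
        lang=lang,
        limit=int(limit),
    )
```

Calling `build_federated_query(EXAMPLE_PROP_NS, EXAMPLE_MAPPING_PROP, lang="fr")` returns query text that could be pasted into the local query interface once the placeholders are replaced with real identifiers.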

Methods

Impact

This project will provide a knowledge graph that will support queries across multiple food composition tables with a single search. This will reduce the time that epidemiologists, nutritionists and other researchers spend searching for food composition data.

This project will support multilingual data, reducing barriers to data reuse for speakers of many languages beyond English. Users will be able to query using any of the supported human languages, and see results in the language of their choice. Through the reuse of data from Wikidata, a multilingual knowledge base, we will add common names as well as scientific names in as many human languages as possible.
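One way to picture "results in the language of their choice" is a label lookup with fallback. The Python sketch below is illustrative only: the function name `pick_label` and the fallback order are assumptions, not a description of how Wikibase itself resolves labels.

```python
def pick_label(labels, preferred, fallback="en"):
    """Return (language, label) for the first available preferred language.

    labels    -- mapping of language code to label, e.g. {"en": "apple"}
    preferred -- language codes in the user's order of preference
    fallback  -- language tried last when no preferred language matches
    """
    for lang in (*preferred, fallback):
        if lang in labels:
            return lang, labels[lang]
    # No label in any requested language or in the fallback.
    return None


apple = {"en": "apple", "fr": "pomme", "ja": "リンゴ"}
print(pick_label(apple, ["de", "fr"]))  # German is missing, so French is used
```

The more languages an item carries labels for, the less often a user falls through to the fallback, which is why reusing Wikidata's multilingual labels matters here.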

Our choice of Wikibase gives us access to the data serialized as RDF. The SPARQL endpoint we have created lets us ask questions of this data that were not previously possible to ask.
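For readers unfamiliar with SPARQL endpoints, here is a minimal Python sketch of how a client might talk to one using only the standard library. It follows the standard SPARQL protocol (a `query` parameter on a GET request) and the standard SPARQL JSON results format. The endpoint URL `https://wikifcd.example/sparql` is a hypothetical placeholder, since the address of the project's endpoint is not given here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL, used only for illustration.
EXAMPLE_ENDPOINT = "https://wikifcd.example/sparql"


def build_request(endpoint, query):
    """Build a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    # A variable can be unbound in a given row, so skip missing keys.
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in doc["results"]["bindings"]
    ]


request = build_request(EXAMPLE_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
# Sending it would look like this once a real endpoint URL is substituted:
# with urllib.request.urlopen(request) as response:
#     rows = rows_from_results(response.read())
```

Each row comes back as a dictionary keyed by the query's variable names, which is convenient for loading into a table or data frame.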

People