Wikidata
10 hours
Keywords
- Wikidata
- Database
- Knowledge Graph
- Wikipedia
- Community
- CC0
- Free licences
- Collaboration
- SPARQL
Learning Objectives
- Gain a solid contextual background to understand the benefits of Wikidata, how it relates to the other Wikimedia projects, and how it relates to other databases available on the Internet
- Acquire the digital skills needed to add content to Wikidata and to write SPARQL queries that answer difficult questions
- Be able to present Wikidata to others, with enough background to identify how it could be used in a professional context
- Better understand which quality processes and evaluation systems favor the creation of quality data that every person and every machine can use
Materials
- PC with internet connection
- Internet connection
- Projector (beamer)
- Paper and pen
- Flipchart
- Wikidata
Introduction
Wikidata is an open data platform that belongs to the Wikimedia family of websites. It hosted 57 million items as of June 2019. Wikidata is a free and open knowledge base that contains various data types (e.g. text, images, quantities, coordinates, geographical shapes, dates). The basic entity in Wikidata is an item. An item can be a thing, a place, a person, an idea, or anything else. What is specific to Wikidata is that the information is stored in a rigidly structured way that makes it possible for both humans and machines to process it. Although it is now a knowledge base fairly widely used by machines, it is still largely unknown to the public and little understood, including by its potential instructors. The intent of this module is to provide a Train the Trainer course that gives those instructors a solid overview of Wikidata and of the opportunities attached to this open source project.
Context
The module aims to ensure, through practical demonstration and learner engagement, that participants:
- Gain a solid contextual background to understand the benefits of Wikidata, how it relates to the other Wikimedia projects, and how it relates to other databases available on the Internet.
- Acquire the digital skills needed to add content to Wikidata and to write SPARQL queries that answer difficult questions.
- Are able to present Wikidata to others, with enough background to identify how it could be used in a professional context.
- Better understand which quality processes and evaluation systems favor the creation of quality data that every person and every machine can use.
Sessions
First session: Discovering and understanding Wikidata
This session will provide an overview of what Wikidata is. It will cover its history and help participants understand why it exists and what benefits it brings, for the Wikimedia projects in particular and for the Internet in general. The session will cover Wikidata basics, data structure, elements of vocabulary, and essential design features, as well as an overview of the Wikidata community and its governance, so that participants grasp that Wikidata, as technical as it seems, is primarily a social construction. Participants will explore Wikidata by themselves and create an account.
Second session: Editing Wikidata
The second session will aim to get participants to add content to Wikidata on their own and thus discover some of the fundamental editing processes. It will also take a deep dive into data quality and data evaluation systems, and introduce the most popular tools for filling gaps in depth and breadth, as well as tools for mass editing and import.
Third session: Querying Wikidata
This session is meant to get participants to discover and experiment with the query service for simple queries, then discover the SPARQL query language for more complex, in-depth requests through a step-by-step methodology that incrementally increases the difficulty. The session will include hands-on activities.
Fourth session: Applying Wikidata
This session will help participants understand the applied value of Wikidata, particularly for applications like visualization of complex data, and identify potential applications for Wikidata in their own work, whether through visualization, embedding Wikidata in their own software, or using Wikidata on Wikimedia projects. The second part of the session will teach them how to connect with the Wikidata community for more effective collaboration.
Presentation
Discovering and understanding Wikidata
180 minutes
Learning Objectives
- Provide an overview of what Wikidata is
- Explain how the data is stored in the project and why
- Discover the Wikidata community and some key elements of community governance
- Understand the interface and create an account
Introduction
Group discussion
The trainer will start the introduction to the module by asking participants about their current and previous experience with Wikidata or any other Wikimedia project. Participants will also be invited to share information about the context of their participation in the session.
Query opening
The goal is to attract participants' attention by helping them see how Wikidata may practically affect their daily lives. Show how Wikidata can help “answer” questions that would previously have been very hard to answer using either domain-specific resources or non-Wikimedia projects. Think of sharing some examples of questions that Wikipedia could theoretically answer, but only by reading large amounts of Wikipedia (or other sources) to collect the information. Adapt the example queries to the audience.
The results of one or two queries will be shown.
- First demo. Ask the audience about the mayor of their city: is the mayor a woman? If they do not know, where can they find the information? Mention Google, but also personal assistants such as Siri. Next, ask about cities in their country with women mayors: where or how can they find that information? Finally, ask for a list of countries ordered by the number of their cities with a female mayor: where or how can they find it? Run the query "list of countries ordered by the number of their cities with female mayor" to demonstrate the use of data hosted in Wikidata.
- Second demo (targeted at a professional audience). Run the query "list of scholarly articles with the word 'zika' in the title". Use it to demonstrate the links between Wikidata and other databases (here, the PubMed ID). Show third-party reuse on Scholia.
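The first demo query could be written along the following lines. This is a minimal sketch, not the query used in the original course: it assumes the commonly used identifiers P31 (instance of), P279 (subclass of), Q515 (city), P17 (country), P6 (head of government), P21 (sex or gender), Q6581072 (female) and P582 (end time), all of which should be checked on wikidata.org before the session. The helper name `female_mayor_query` is hypothetical.

```python
def female_mayor_query(language: str = "en") -> str:
    """Return a SPARQL query ranking countries by number of cities with a female mayor."""
    # Doubled braces are literal braces inside an f-string.
    return f"""
SELECT ?country ?countryLabel (COUNT(DISTINCT ?city) AS ?cities)
WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P17 ?country ;            # country
        p:P6 ?statement .             # head-of-government statement
  ?statement ps:P6 ?mayor .
  ?mayor wdt:P21 wd:Q6581072 .        # sex or gender: female
  FILTER NOT EXISTS {{ ?statement pq:P582 ?end }}  # no end time: still in office
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?cities)
""".strip()


if __name__ == "__main__":
    # Print the query so it can be pasted into the Wikidata Query Service.
    print(female_mayor_query())
```

The printed text can be pasted into the Wikidata Query Service. A query of this size may be slow or time out on the public endpoint, so it is worth running it once before the session.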
Overview of Wikidata
Introduction to Wikidata
This section typically ensures that participants will be able to:
- Understand the goals of Wikidata
- Understand how Wikidata relates to other Wikimedia Projects, especially Wikipedia
- Understand why they should be motivated to contribute or use Wikidata, especially in the context of their own work
Elements it would include are:
- Key concepts: the basic characteristics of Wikidata (free under CC0, collaborative, multilingual and language-agnostic, interconnected) and its goals (repository of human knowledge, support for the Wikimedia projects and third parties, hub in the linked open data web), plus general content statistics and general access statistics.
- How Wikidata serves Wikimedia projects, and how it fills important gaps in infrastructure for the community, e.g. interlanguage links, or helping to prevent Wikipedia infoboxes from going out of date in local-language wikis, especially for knowledge not “local” to the language (e.g. a politician from another language or culture). This should include the history of Wikidata's creation, and examples of impacts on Wikipedia and Wikimedia Commons (structured data for Commons).
- The Wikidata community (Wikimedia Deutschland (WMDE), who runs the project, brief governance overview).
- Why Wikidata is important to the Internet (examples: the CC0 licence allows Wikidata to be the knowledge base for many AI assistants such as Siri or Alexa; data from cultural institutions and civic information become more accessible; the process is transparent and neutral).
Quiz and debrief
A few multiple-choice questions. Can be done in pairs. After completion, the group reviews the answers together, with further explanations if required.
Data structure and how data is stored in Wikidata
Presentation of the data structure in Wikidata
This section typically ensures that participants will be able to:
- Understand the basic semantic data model of the Resource Description Framework (RDF) and how it compares with the full Wikidata model.
- Understand the different concepts relevant to editing Wikidata and how to figure out what content belongs in those fields, including: label, description, statement, property, qualifier and reference.
- Understand the relationship between “human readable information” (i.e. labels and descriptions) and “machine readable data” (i.e. Q#s and P#s) in the Wikidata interface.
Elements covered
- Vocabulary (types of databases, identifiers, linked data, open data).
- The benefits and disadvantages of linked data.
- Presentation of the different elements of Wikidata (items, properties, statements, identifiers, qualifiers, references and rank).
- Reading Wikidata. Compare a Wikipedia entry, a Wikidata entry and a Reasonator entry. Understanding the Wikidata interface.
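As a rough illustration of how these elements fit together, the sketch below models an item with one statement in plain Python. The class and field names are hypothetical and far simpler than the real Wikidata data model. The identifiers Q2 (Earth), P610 (highest point) and Q513 (Mount Everest) come from this module; P854 is assumed to be the "reference URL" property, and the URL itself is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    property_id: str                      # e.g. "P610" (highest point)
    value: str                            # an item ID or a literal value
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[Dict[str, str]] = field(default_factory=list)
    rank: str = "normal"                  # "preferred", "normal" or "deprecated"


@dataclass
class Item:
    item_id: str                          # machine-readable identifier, e.g. "Q2"
    labels: Dict[str, str] = field(default_factory=dict)        # language -> label
    descriptions: Dict[str, str] = field(default_factory=dict)  # language -> description
    statements: List[Statement] = field(default_factory=list)

    def unreferenced(self) -> List[Statement]:
        """Return the statements that have no reference yet."""
        return [s for s in self.statements if not s.references]


earth = Item(
    item_id="Q2",
    labels={"en": "Earth", "fr": "Terre"},
    descriptions={"en": "third planet from the Sun in the Solar System"},
    statements=[
        Statement(
            property_id="P610",
            value="Q513",
            # Placeholder source, for illustration only.
            references=[{"P854": "https://example.org/source"}],
        )
    ],
)
```

Calling `earth.unreferenced()` returns an empty list here. Adding a statement without a reference and calling it again is a quick way to show why sourcing matters.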
Activity (on laptop)
Individually, participants are asked to:
- Go to Q4859840. What is it? What could it be confused with?
- Now go to Q1492. In how many places is the information about Q1492 used?
- Use the search bar to find a Wikidata item you are interested in. Explore its properties and references. Note how it is organized, what could be missing, whether there are sources, etc. Think about how to improve it.
Tips
- One of the best ways to introduce the RDF/triple data model is outside of the Wikidata interface. For example, start from statements that describe the world in plain language (e.g. Earth → highest-point → Mount Everest), transition to statements with labels (e.g. Earth (Q2) → highest-point (P610) → Mount Everest (Q513)) and finish with the final machine-readable statements (Q2 → P610 → Q513). Adapt to the audience.
- Introduce the data model with a quality item (many presentations use Q42 for the geek culture joke associated with the item; other quality items can be found in Showcase_items).
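The three stages described in the first tip can be mirrored in a few lines of code. This is a small sketch built on a hand-written label table, not a call to any Wikidata API. The identifiers come from the example above, and the function names are hypothetical.

```python
# Hand-written lookup table for this example; a real tool would fetch labels from Wikidata.
LABELS = {"Q2": "Earth", "P610": "highest point", "Q513": "Mount Everest"}

# Machine-readable statement: (subject, property, value).
triple = ("Q2", "P610", "Q513")


def plain_language(t):
    """Stage 1: the statement as people would say it."""
    return " -> ".join(LABELS[i] for i in t)


def with_labels(t):
    """Stage 2: labels shown next to their identifiers."""
    return " -> ".join(f"{LABELS[i]} ({i})" for i in t)


def machine_readable(t):
    """Stage 3: identifiers only, as a machine stores them."""
    return " -> ".join(t)


if __name__ == "__main__":
    for render in (plain_language, with_labels, machine_readable):
        print(render(triple))
```

Swapping the label table for another language, while leaving the triple untouched, shows why the identifiers are language-agnostic.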
Sources
- https://www.wikidata.org/wiki/Wikidata:In_one_page
- https://www.wikidata.org/wiki/Wikidata:Showcase_items
Wikidata is built through collaboration
Wikidata community
This section typically ensures that participants will be able to understand that Wikidata is primarily a social construction.
- Who adds content to Wikidata (humans, machines run by humans, other sources)
- Who defines rules and how. The different roles of the community.
- Behavior and social rules.
- Editorial rules. What gets in and what does not. References.
- Benefits of having an account
Tips
- If your audience is interested in contributing to Wikidata at scale or for professional purposes, it is important to explain how Wikidata decides what the data model, properties and items are. For example, it is important to explain the social dynamic of “anyone can create a Q number, but P numbers undergo a tightly controlled community consultation process”.
First editing steps (on laptop)
Get students to create their own account (or plan ahead: there is a limit on daily account creations from the same IP address). Add a message on their talk page, with a link to the "training session page" set up in advance. Make sure to collect all usernames created during the session. Suggestions to users:
- Create an account. Introduce yourself on your user page.
- Change the preferences to set your preferred language.
- Add gadgets in preferences: "Recoin", "Reasonator", and "labelLister"
- Drop a message on the training page
- Use the search bar to find a Wikidata item you are interested in. Check its completeness with Recoin.
Debrief and questions / introduction to next session
To wrap up the session, the trainer will facilitate a debriefing moment where participants are encouraged to express their questions, doubts, ideas and feelings toward the topics discussed. The next session will be about learning how to add content to Wikidata.
Homework
Before the second session, participants will be asked to think about which entries they would like to improve and to search for information, including references and sources, to present briefly to the group at the beginning of the next session.
References
- https://www.wikidata.org/wiki/Wikidata:Planning_a_Wikidata_workshop
- https://dashboard.wikiedu.org/training/wikidata-professional/evaluating-wikidata
- https://www.wikidata.org/wiki/Help:Contents
- https://www.wikidata.org/wiki/Wikidata:Tools
- https://www.wikidata.org/wiki/Wikidata:Wikidata_educational_resources
Editing Wikidata
180 min
Learning Objectives
- Understand how Wikidata content is collected and added to the database
- Learn how to add content to Wikidata, and how to do mass additions to Wikidata
- Discover how the Wikidata community collaborates to build the database
- Understand how quality processes are built into the project
- Get introduced to the wonders of getting answers to complicated questions with query systems
Questions about last session and discussion about homework
Introduction to editing
Editing overview and review of data structure
The “Introduction to editing” section typically ensures that participants will be able to:
- Find the editing button and contribute to the label and description of a Wikidata item.
- Find the editing button for an existing property, modify the property with a qualifier and add reference to a Wikidata item.
Review of the fundamental parts of a Wikidata item (label, identifiers, statements, qualifiers, references, properties and values). Demonstration of how to edit. Discovery of the Recoin tool, which identifies missing information on an item.
Editing activity (on laptop)
- Take the Wikidata tutorials: create an item and add a statement
- Then create or improve an item
- Analyze a few items with Recoin
- Further populate items (students should also add references and sources)
- Feedback: collective report by each student about what was done, difficulties met, questions, thoughts.
Tips
- You should demonstrate editing on an item that you have identified before the event, including adding a label, description and common property for that kind of item. Remember to indicate where the edit buttons are for each type of edit.
- You should demonstrate how to add a reference to an existing statement. Much of the value of Wikidata over the long term comes from the ability to reference statements to reliable source material. As when demonstrating referencing during Wikipedia editing workshops, it is important to prepare the reference ahead of time. Remember to describe the common fields for references on Wikidata (if you have custom gadgets for copying references or filling in references using Citoid, make sure to describe how they change the interface).
- If you have custom gadgets or tools installed on your Wikidata account, consider disabling them or discussing how your interface is different from the standard interface for Wikidata.
Wikidata editing processes
The goal of this section is to better understand Wikidata editing processes, with particular emphasis on transparency and the ability to track every edit.
Editing and collaboration features presentation
Take a tour of the Wikidata features useful to the editor: user page, talk page, watchlist, own contributions. Then take a tour of the features that help the community track activity on the site: recent changes, the history of an item page, other users' pages, interaction with users on their talk pages, the list of other participants' contributions. Present the Wikidata strategy based on SoftSecurity and HardSecurity. Demonstrate all of this on the contributions made by the trainees during the previous activity.
Activity (on laptop)
Each student takes the time to explore what has just been described. Trainer can ask each participant to write a message on the talk page of another participant etc.
Data quality and Data evaluation
Quality and evaluation presentation
- Provide an overview of content already available on Wikidata.
- How to evaluate the quality of an item.
- Using showcase items to build an item.
- Further exploring sources and citations (how to deal with conflicting facts, what is a good source of data, what is not).
- Discovery of tools to help fill in gaps, in depth and breadth (examples can include TABernacle, Mix'n'Match, the Wikidata Game, The Distributed Game, WikiShootMe, Terminator; to be chosen depending on the audience).
- Introduction to a tool for mass editing: QuickStatements.
- Presentation will include discussion, questions and answers.
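To give a feel for what mass editing looks like, the sketch below turns a small table into tab-separated commands in what is assumed here to be the QuickStatements version 1 text format (item, property, value, then optional source pairs prefixed with S). The rows are invented for illustration and target Q4115189, assumed to be the Wikidata Sandbox item. The helper name `to_quickstatements` is hypothetical, and the exact syntax should be checked against the QuickStatements help page before running anything.

```python
def to_quickstatements(rows):
    """Build one tab-separated command per (item, property, value, source_url) row."""
    lines = []
    for item, prop, value, source_url in rows:
        parts = [item, prop, value]
        if source_url:
            # Assumption: S854 attaches a reference URL (P854) as a source,
            # and string values are wrapped in double quotes.
            parts += ["S854", f'"{source_url}"']
        lines.append("\t".join(parts))
    return "\n".join(lines)


# Invented practice rows on the sandbox item; not real data to upload.
rows = [
    ("Q4115189", "P31", "Q5", "https://example.org/source"),
    ("Q4115189", "P610", "Q513", None),
]

if __name__ == "__main__":
    print(to_quickstatements(rows))
```

Generating the commands from a spreadsheet export, reviewing them by eye, and only then pasting them into the tool is a reasonable workflow to demonstrate.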
Tips
Showcase or demonstrations can typically include:
- Mix'n'Match: this tool lists entries of some external databases and allows users to match them against Wikidata items.
- WikiShootMe: this tool shows Wikidata items, Wikipedia articles, and Commons images with coordinates, all on the same map. It facilitates the addition of images to Wikimedia projects.
- Wikidata games : A set of games to quickly add statements to Wikidata.
Basic introduction to Wikidata queries using Query Helper
Participants will be able to:
- Understand how to read and modify a basic Wikidata query using both SPARQL and the Query Helper.
- Understand how to construct very basic queries that return multiple variables and labels.
- Understand what kinds of visualizations are available for Wikidata queries.
- Explain what queries are and what SPARQL is.
Discovery of the Wikidata Query Service
Demonstration with a simple query from examples and modification of the initial query. Show different visualizations.
Activity (on laptop, if time permits)
Participants play with the Query Service: exploring the examples with different visualization options and, for the most daring, modifying the examples.
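Outside the browser interface, the same queries can be sent to the query service over HTTP. The sketch below only builds the request URL, so it runs offline. It assumes the public endpoint https://query.wikidata.org/sparql with `query` and `format` parameters, and uses the familiar cats example (P31 instance of, Q146 house cat). The function name `query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

CATS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
""".strip()


def query_url(sparql: str, fmt: str = "json") -> str:
    """Return a GET URL for the endpoint with the query safely percent-encoded."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": fmt})


if __name__ == "__main__":
    print(query_url(CATS_QUERY))
```

Opening the printed URL in a browser returns the raw results, which makes a useful contrast with the tables and visualizations of the graphical interface.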
Debrief and questions
To wrap up the session, the trainer will facilitate a debriefing moment where participants are encouraged to express their questions, doubts, ideas and feelings toward the topics discussed. Introduction to next session. The following session will be dedicated to queries.
Homework
- Come up with 3 ideas for query questions to submit at the next session
- Read the Wikidata Glossary
References for the trainer
Querying Wikidata
120 min
Learning Objectives
- Examine the value of query languages and tools for getting answers to complicated questions
- Motivate participants to learn how to do basic queries or even advanced queries
- Learn how to do queries in SPARQL
Queries
Understand the Query Service
The query service will have been very briefly introduced during session 2. It will now be looked at in more depth.
Discover SPARQL in depth and know how to do queries
With many audiences in professional situations, it is necessary to provide a separate environment for better understanding the SPARQL language and how to build strong queries.
Participants create their own queries (on laptop)
Let the audience write their own queries with the help of the Query Service
Tips
- Remember, for many audiences, it will be the first time that they work with a query language (SQL, SPARQL, etc.). It is therefore very important that you start from the most basic elements and incrementally increase the complexity of the information. A good tactic for doing this is to build several queries of different complexity in front of the audience. For example queries, see Wikidata:SPARQL query service/Building a query.
- Make sure to demonstrate how to take an example query and modify it to suit a separate need. Many individuals will not immediately be writing Wikidata queries from scratch, but will instead modify existing queries. The Wikidata Query Helper is particularly useful for audiences modifying queries.
- Make sure to include both guided query writing, where the instructor guides the writing of the query, and a window of time for the audience to write their own queries. Having participants write their own query reinforces the process and skills of modifying or writing a query, so that they can confidently do so in the future.
- If the audience is mostly Wikimedians, make sure to spend time showing them how to apply Wikidata queries to various tools used by the Wikimedia community, including PetScan, Listeria bot, and Histropedia.
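The incremental approach from the first tip can be prepared ahead of the session. The sketch below assembles three versions of the same query, each adding one idea to the previous one. It assumes the identifiers P31 (instance of), Q146 (house cat) and P18 (image); the helper `build_query` is hypothetical and only concatenates text, it does not validate SPARQL.

```python
def build_query(variables, patterns, limit=None):
    """Assemble a SELECT query from variable names and graph patterns."""
    lines = ["SELECT " + " ".join("?" + v for v in variables) + " WHERE {"]
    lines += ["  " + p for p in patterns]
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


IS_A_CAT = "?item wdt:P31 wd:Q146 ."
LABEL_SERVICE = 'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }'

# Step 1: a single triple pattern.
step_1 = build_query(["item"], [IS_A_CAT])

# Step 2: the same pattern, plus human-readable labels.
step_2 = build_query(["item", "itemLabel"], [IS_A_CAT, LABEL_SERVICE])

# Step 3: an optional second property and a limit on the number of rows.
step_3 = build_query(
    ["item", "itemLabel", "image"],
    [IS_A_CAT, "OPTIONAL { ?item wdt:P18 ?image . }", LABEL_SERVICE],
    limit=20,
)

if __name__ == "__main__":
    for step in (step_1, step_2, step_3):
        print(step, end="\n\n")
```

Pasting each step into the Query Service in turn lets the audience see what a single added line changes in the results.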
Debrief and questions
To wrap up the session, the trainer will facilitate a debriefing moment where participants are encouraged to express their questions, doubts, ideas and feelings toward the topics discussed.
Homework
Set up a campaign on the ISA tool and ask the participants to take part in it before the next session.
References
Applying Wikidata
120 min
Learning Objectives
- Identify the applied value of Wikidata, particularly for applications like visualization of complex data
- Identify potential applications for Wikidata in their own work, whether through visualization, embedding Wikidata in their own software
- Know how to contact and connect to members of the Wikidata community
Questions about last session and discussion about homework
Feedback on SPARQL and queries. Results of the ISA campaign and its relationship with the session of the day (use of Wikidata information to improve image descriptions with structured data).
(Potential) applied value of Wikidata
Overview and showcases. This section can be divided in 3 parts:
- The first could provide an overview of content already available on Wikidata: figures and statistics per theme.
- The second could provide examples of how Wikidata is used by Wikimedia projects.
Suggested examples include (to be selected based on the audience):
- a tool to explore the content gender gap on Wikipedia, such as Denelezh.
- the use of Listeria bot to create red-link lists of missing articles. See Category:Lists_based_on_Wikidata.
- the use of Wikidata to add authority control records to Wikipedia articles. Example of Doris Day.
- the use of Wikidata to keep infoboxes on Wikipedia up to date. Example of Jack Spicer.
- the use of the parser function to directly display information from Wikidata in the articles (see How_to_use_data_on_Wikimedia_projects).
- the use of magic infoboxes on Wikipedia (such as the use of {{infobox fromage}} or {{infobox biographies2}} on the French Wikipedia).
- The third would provide examples of how Wikidata is used by third parties.
Suggested examples include (to be selected based on the audience):
- various visualization tools, such as the Wikidata Graph Builder or Histropedia (a timeline application based on Wikidata)
- Crotos, a search and display engine for visual artworks based on Wikidata and using Wikimedia Commons files
- Scholia, a search and display engine for academics based on Wikidata
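For participants who want to embed Wikidata in their own software, the sketch below shows one possible starting point: reading a label from the JSON document that Wikidata publishes for each item. It assumes the Special:EntityData URL pattern and the `entities` / `labels` layout of that JSON. The sample document is a hand-trimmed illustration, not a full download, and the function names are hypothetical.

```python
def entity_data_url(item_id: str) -> str:
    """URL of the JSON export for one item (assumed Special:EntityData pattern)."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"


def label_of(entity_json: dict, item_id: str, language: str = "en", fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback, else the ID."""
    labels = entity_json["entities"][item_id].get("labels", {})
    for lang in (language, fallback):
        if lang in labels:
            return labels[lang]["value"]
    return item_id  # no label available: fall back to the identifier


# Hand-trimmed sample in the assumed layout; a real document contains far more fields.
SAMPLE = {
    "entities": {
        "Q42": {
            "labels": {
                "en": {"language": "en", "value": "Douglas Adams"},
            }
        }
    }
}

if __name__ == "__main__":
    print(entity_data_url("Q42"))
    print(label_of(SAMPLE, "Q42", language="fr"))  # no French label in the sample
```

The fallback to another language, and finally to the bare identifier, mirrors the way multilingual labels sit on top of language-agnostic identifiers.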
Read
- Denelezh
- Lists based on Wikidata
- Doris Day
- Jack Spicer
- How to use data on Wikimedia projects
- Showcases
- Wikidata knowledge as a service
- Histropedia
- Angryloki
- Crotos
- Scholia
Activity (on laptop)
Provide participants with the list of selected tools and let them explore.
Connect with the Wikidata community
How to connect
The goal of this section is to learn how to contact and connect with members of the Wikidata community, how to stay in touch with the latest news of the project, and where to find resources. Review the options for talking with a Wikidata editor on a talk page. Introduce WikiProjects and explain how to join them or interact with their members. Explain the benefits of those WikiProjects, but also their governance (freedom to join, etc.). Ideally, show a WikiProject relevant to the audience. Suggest joining the Wikidata mailing list, technical mailing lists, IRC channel, Telegram group, etc., all listed on the Wikidata main page. Interact with the audience to see what would be most appropriate for them. An interesting entry point for showing how the community interacts is the Requests for comment page. Suggest contacting the local chapter, if there is one in the country, and the Wikidata Community User Group. Point to the annual Wikidata conference, navigate its program and point to the presentations provided (at least slides, sometimes recordings), and more generally to events.
Read
- Wikidata WikiProjects
- Requests for comment
- WikidataCon 2019
- Wikidata Events
- Wikidata movement affiliates
- Wikidata Community User Group
Homework
Search in the Zenodo open repository and find your three favourite papers on FLOSS and education. Document and/or post a comment on them.
Debriefing
To wrap up the session, the trainer will facilitate a debriefing moment where participants are encouraged to express their questions, doubts, ideas and feelings toward the topics discussed.
References