Skip to main content

Wikipedia

Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.

This notebook shows how to load wiki pages from wikipedia.org into the Document format that we use downstream.

Installationโ€‹

First, you need to install wikipedia python package.

%pip install --upgrade --quiet  wikipedia

Examplesโ€‹

WikipediaLoader has these arguments:

  • query: free text which used to find documents in Wikipedia
  • optional lang: default="en". Use it to search in a specific language part of Wikipedia
  • optional load_max_docs: default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments. There is a hard limit of 300 for now.
  • optional load_all_available_meta: default=False. By default only the most important fields downloaded: Published (date when document was published/last updated), title, Summary. If True, other fields also downloaded.
from langchain_community.document_loaders import WikipediaLoader
API Reference:WikipediaLoader
docs = WikipediaLoader(query="HUNTER X HUNTER", load_max_docs=2).load()
len(docs)
docs[0].metadata  # meta-information of the Document
docs[0].page_content[:400]  # a content of the Document

Was this page helpful?


You can also leave detailed feedback on GitHub.