Finletix

New project makes Wikipedia data more accessible to AI

By arthursheikin@gmail.com, October 1, 2025

On Wednesday, Wikimedia Deutschland announced a new database that will make Wikipedia’s wealth of knowledge more accessible to AI models.

Called the Wikidata Embedding Project, the system applies vector-based semantic search, a technique that helps computers understand the meaning of words and the relationships between them, to the existing data on Wikipedia and its sister platforms, which comprises nearly 120 million entries.
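
To make the idea of vector-based semantic search concrete, here is a minimal Python sketch. It assumes each term has already been mapped to a numeric vector (an "embedding") and ranks terms by cosine similarity. The three-dimensional vectors and the `nearest` helper are invented for illustration and are not taken from the Wikidata Embedding Project, which would rely on learned embeddings with far more dimensions.

```python
import math

# Toy 3-dimensional embeddings. The numbers are invented for illustration;
# real systems use learned vectors with hundreds of dimensions.
EMBEDDINGS = {
    "scientist": [0.9, 0.8, 0.1],
    "researcher": [0.85, 0.75, 0.2],
    "scholar": [0.7, 0.9, 0.15],
    "banana": [0.05, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, k=2):
    """Return the k terms whose vectors are most similar to the query term."""
    q = EMBEDDINGS[query]
    scored = [
        (cosine_similarity(q, vec), term)
        for term, vec in EMBEDDINGS.items()
        if term != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:k]]


print(nearest("scientist"))
```

With these toy vectors, "researcher" and "scholar" rank above "banana" even though none of the words share any letters with the query in a way a keyword search could use, which is the property a semantic search relies on.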

Combined with new support for the Model Context Protocol (MCP), a standard that helps AI systems communicate with data sources, the project makes the data more accessible to natural language queries from LLMs.
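
As a rough illustration of what MCP traffic looks like, the sketch below builds a tool-call request in Python. MCP messages follow JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `search_items` and its `query` argument are hypothetical placeholders; the article does not describe which tools the project's MCP server actually exposes.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_items" and its "query" argument are hypothetical, for illustration only.
message = build_tool_call(1, "search_items", {"query": "scientist"})
print(json.dumps(message, indent=2))
```

An MCP client would send a message of this shape to the server and receive the tool's result in the matching JSON-RPC response.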

The project was undertaken by Wikimedia’s German branch in collaboration with the neural search company Jina.AI and DataStax, a real-time training-data company owned by IBM.

Wikidata has offered machine-readable data from Wikimedia properties for years, but the pre-existing tools only allowed for keyword searches and queries written in SPARQL, a specialized query language. The new system will work better with retrieval-augmented generation (RAG) systems that allow AI models to pull in external information, giving developers a chance to ground their models in knowledge verified by Wikipedia editors.
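
The retrieval-augmented generation pattern can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: the `retrieve` function is a stand-in that ranks passages by shared words, where a real system would query a vector search service, and the passages and function names are assumptions made for the example.

```python
def retrieve(question, documents, k=2):
    """Stand-in retriever: rank documents by how many lowercase words they
    share with the question. A real RAG system would call a vector search
    service here instead."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question so the model can
    ground its answer in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Example passages, chosen only for illustration.
DOCUMENTS = [
    "Marie Curie won Nobel Prizes in physics and chemistry.",
    "Claude Shannon worked at Bell Labs.",
    "Berlin is the capital of Germany.",
]

question = "Who worked at Bell Labs?"
passages = retrieve(question, DOCUMENTS)
print(build_prompt(question, passages))
```

The resulting prompt would then be sent to a language model, which answers from the supplied passages rather than relying only on what it memorized during training.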

The data is also structured to provide crucial semantic context. Querying the database for the word “scientist,” for instance, will produce lists of prominent nuclear scientists as well as scientists who worked at Bell Labs. There are also translations of the word “scientist” into different languages, a Wikimedia-cleared image of scientists at work, and extrapolations to related concepts like “researcher” and “scholar.”

The database is publicly accessible on Toolforge. Wikidata is also hosting a webinar for interested developers on October 9th.

The new project comes as AI developers are scrambling for high-quality data sources that can be used to fine-tune models. The training systems themselves have become more sophisticated (often assembled as complex training environments rather than simple datasets), but they still require closely curated data to function well. For deployments that require high accuracy, the need for reliable data is particularly urgent. While some might look down on Wikipedia, its data is significantly more fact-oriented than catchall datasets like Common Crawl, a massive collection of web pages scraped from across the internet.

In some cases, the push for high-quality data can have expensive consequences for AI labs. In August, Anthropic offered to settle a lawsuit brought by a group of authors whose works had been used as training material, agreeing to pay $1.5 billion to end any claims of wrongdoing.

In a statement to the press, Wikidata AI project manager Philippe Saadé emphasized his project’s independence from major AI labs or large tech companies. “This Embedding Project launch shows that powerful AI doesn’t have to be controlled by a handful of companies,” Saadé told reporters. “It can be open, collaborative, and built to serve everyone.”
