How Predexy Matches Markets Across Platforms

Prediction market platforms each write their own question titles. Polymarket might list “Will BTC reach $100k by 2026?” while Limitless lists “Bitcoin to hit $100,000 before 2026?” and Manifold phrases it differently again. Before Predexy can compare prices or detect arbitrage, it must first establish that these markets refer to the same outcome. That is the job of the matching system.

Why title-matching alone fails

Simple text matching — comparing words directly — breaks down quickly in practice. Synonyms, date formats, capitalization choices, and varying levels of specificity all produce titles that mean the same thing but look different to a string-comparison algorithm. Naive keyword matching also creates false positives: two questions that share words but ask about different events can appear more similar than they actually are.
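To make both failure modes concrete, here is a minimal sketch of naive word-overlap matching (Jaccard similarity over word tokens). The `tokens` and `jaccard` helpers are illustrative only and are not part of Predexy; the ETH title is a made-up example of a different event.

```python
import re


def tokens(title: str) -> set[str]:
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that two titles have in common (0 to 1)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


polymarket = "Will BTC reach $100k by 2026?"
limitless = "Bitcoin to hit $100,000 before 2026?"
unrelated = "Will ETH reach $10k by 2026?"  # hypothetical title, different event

same_event = jaccard(polymarket, limitless)
different_event = jaccard(polymarket, unrelated)

print(f"same event:      {same_event:.2f}")       # 0.08
print(f"different event: {different_event:.2f}")  # 0.50
```

The two Bitcoin titles share only the token `2026`, so they score far lower than the pair that asks about different assets. That covers a false negative and a false positive in one run.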

How semantic matching works

Predexy uses semantic embedding vectors to compare market titles. When a new market arrives, its title is converted into a vector representation and compared against existing canonical questions. Semantically similar titles cluster together even when the exact words differ. Vector similarity is then combined with lexical and structural signals — entity hints, time window alignment, and category — to produce a composite confidence score. This hybrid approach reduces both false positives (unrelated markets incorrectly matched) and false negatives (related markets missed because the text looks different).
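The combination step can be sketched as follows. This is an illustration under stated assumptions, not Predexy's actual formula: the source does not say how the signals are combined, so the weighted average, the `WEIGHTS` values, and the signal names used as dictionary keys are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Hypothetical weights, chosen only so that they sum to 1.
WEIGHTS = {
    "semantic": 0.6,     # embedding similarity
    "entity": 0.2,       # entity hints agree
    "time_window": 0.1,  # time windows align
    "category": 0.1,     # same category
}


def composite_confidence(signals: dict[str, float]) -> float:
    """Weighted average of per-signal scores, clamped to the range 0 to 1."""
    score = sum(weight * signals[name] for name, weight in WEIGHTS.items())
    return max(0.0, min(1.0, score))


signals = {
    "semantic": cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]),
    "entity": 1.0,
    "time_window": 1.0,
    "category": 1.0,
}
print(round(composite_confidence(signals), 3))
```

A weighted average keeps the result in the same 0 to 1 range as its inputs, which is why the sketch uses it; other combinations (for example, a learned classifier) would fit the description equally well.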

Three match methods

Every linked market-to-question pair carries a `match_method` that tells you how the connection was established:

Method	How it works
`semantic`	Matched automatically using embedding similarity and lexical signals
`manual`	Confirmed or corrected by a human reviewer
`exact`	Matched by an identical platform market ID or title string

Confidence scores

The matching engine assigns a confidence score between 0 and 1 to every proposed link. Three bands determine what happens next:

Confidence	Action
> 0.85	Auto-accepted — the link is created without human review
0.70–0.84	Queued for manual review — a human confirms or rejects
< 0.70	Auto-rejected — the markets are not linked
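The three bands map onto a small decision function. The sketch below is illustrative: the function name and the returned labels are hypothetical, and because the table does not place scores between 0.84 and 0.85 (inclusive) in any band, the code assumes they go to manual review.

```python
def triage(confidence: float) -> str:
    """Map a match confidence score to the action taken on the proposed link."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.85:
        return "auto_accept"    # link created without human review
    if confidence >= 0.70:
        # Assumption: the gap between 0.84 and 0.85 is treated as review.
        return "manual_review"  # a human confirms or rejects
    return "auto_reject"        # markets are not linked


for score in (0.92, 0.78, 0.55):
    print(score, triage(score))
```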

You can see the confidence score for each linked market in the `markets[]` array returned by `GET /api/v1/questions/{id}`. Use it to gauge how certain Predexy is that two listings represent the same event.

QuestionMarket fields

Each entry in the `markets[]` array of a question detail response includes matching metadata alongside pricing data:

Field	Type	Description
`confidence`	number (0–1)	Composite match confidence score
`match_method`	`semantic` | `manual` | `exact`	How the match was produced
`semantic_similarity`	number (0–1)	Raw cosine similarity from the embedding comparison

A `semantic_similarity` close to 1.0 means the market titles are nearly identical in meaning; a value closer to 0.7 indicates a borderline match that may have required human review.
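As a rough sketch, the matching fields can be pulled out of a question detail response like this. The sample body is hand-written for illustration and contains only the three fields documented above; the top-level `markets` key, the presence of all three fields on every entry, and the `MatchMetadata` and `parse_match_metadata` names are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchMetadata:
    confidence: float
    match_method: str
    semantic_similarity: float


def parse_match_metadata(body: str) -> list[MatchMetadata]:
    """Extract the matching fields from each entry of markets[]."""
    payload = json.loads(body)
    return [
        MatchMetadata(
            confidence=float(market["confidence"]),
            match_method=market["match_method"],
            semantic_similarity=float(market["semantic_similarity"]),
        )
        for market in payload.get("markets", [])
    ]


# Hand-written sample, not real API output.
sample_body = """
{
  "markets": [
    {"confidence": 0.93, "match_method": "semantic", "semantic_similarity": 0.91},
    {"confidence": 0.78, "match_method": "manual", "semantic_similarity": 0.72}
  ]
}
"""

for entry in parse_match_metadata(sample_body):
    print(entry.match_method, entry.confidence, entry.semantic_similarity)
```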

Why this matters for arbitrage

Arbitrage detection runs only on questions where at least two platforms are matched. A false match, which links two markets that actually refer to different events, would create a phantom arbitrage opportunity. Because the two markets can resolve differently (one Yes, the other No), the positions no longer offset each other and both legs can lose. Acting on a false match is not a risk-free trade; it is an uncovered bet.

Always check the `confidence` and `match_method` on both legs of an arbitrage opportunity. A semantic match with confidence near 0.70 is borderline, so you may want to verify the market titles manually before committing capital.
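One way to apply this check in client code is sketched below. The `legs_to_verify` helper is hypothetical, each leg is assumed to be a `markets[]` entry carrying `confidence` and `match_method`, and the default threshold of 0.85 is borrowed from the auto-accept band rather than prescribed by Predexy.

```python
def legs_to_verify(legs: list[dict], min_confidence: float = 0.85) -> list[dict]:
    """Return the legs that deserve a manual look before trading.

    A leg is flagged when it was matched automatically ("semantic")
    and its confidence is below the chosen threshold.
    """
    return [
        leg
        for leg in legs
        if leg["match_method"] == "semantic" and leg["confidence"] < min_confidence
    ]


# Hand-written example legs, not real API output.
opportunity = [
    {"match_method": "semantic", "confidence": 0.94},
    {"match_method": "semantic", "confidence": 0.72},
]

flagged = legs_to_verify(opportunity)
if flagged:
    print(f"{len(flagged)} leg(s) need a manual title check before trading")
```

Legs matched by `manual` or `exact` are not flagged here because a human or an identical identifier already established the link; tighten the rule if your risk tolerance is lower.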

Strict matching thresholds are a risk control as much as a data-quality feature. Predexy deliberately errs on the side of rejecting uncertain matches rather than passing them downstream to the arbitrage scanner.
