Prediction market platforms each write their own question titles. Polymarket might list “Will BTC reach $100k by 2026?” while Limitless lists “Bitcoin to hit $100,000 before 2026?” and Manifold phrases it differently again. Before Predexy can compare prices or detect arbitrage, it must first establish that these markets refer to the same outcome. That is the job of the matching system.
Why title-matching alone fails
Simple text matching — comparing words directly — breaks down quickly in practice. Synonyms, date formats, capitalization choices, and varying levels of specificity all produce titles that mean the same thing but look different to a string-comparison algorithm. Naive keyword matching also creates false positives: two questions that share words but ask about different events can appear more similar than they actually are.
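To make the failure mode concrete, here is a minimal sketch of naive word-overlap scoring (Jaccard similarity over tokens). It is an illustration of the problem only, not a description of anything Predexy runs; the function names and the second ETH title are made up for the example.

```python
import re


def token_set(title: str) -> set[str]:
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Fraction of distinct tokens that two titles share (0 = none, 1 = all)."""
    tokens_a, tokens_b = token_set(a), token_set(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Two titles about the same event, worded differently.
same_event = jaccard(
    "Will BTC reach $100k by 2026?",
    "Bitcoin to hit $100,000 before 2026?",
)

# Two titles about different events (hypothetical ETH market) that share words.
different_event = jaccard(
    "Will BTC reach $100k by 2026?",
    "Will ETH reach $10k by 2026?",
)

print(f"same event:      {same_event:.2f}")
print(f"different event: {different_event:.2f}")
```

Under this scoring the two Bitcoin titles share only the token `2026`, while the BTC and ETH titles share four of eight distinct tokens, so the unrelated pair scores higher than the genuine match.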
How semantic matching works
Predexy uses semantic embedding vectors to compare market titles. When a new market arrives, its title is converted into a vector representation and compared against existing canonical questions. Semantically similar titles cluster together even when the exact words differ.
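The comparison step can be sketched as a nearest-neighbour lookup using cosine similarity. The three-dimensional vectors and question IDs below are toy values invented for illustration; real embeddings have many more dimensions, and the embedding model Predexy uses is not specified here.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(new_vec: list[float], canonical: dict[str, list[float]]):
    """Return (question_id, similarity) for the closest canonical question, or None."""
    if not canonical:
        return None
    scored = ((qid, cosine_similarity(new_vec, vec)) for qid, vec in canonical.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical canonical questions with toy embeddings.
canonical_questions = {
    "q_btc_100k": [0.9, 0.1, 0.0],
    "q_eth_10k": [0.1, 0.9, 0.0],
}

# Toy embedding for a newly arrived market title.
new_market_vec = [0.8, 0.2, 0.1]

match = best_match(new_market_vec, canonical_questions)
print(match)
```

With these toy values the new market lands closest to `q_btc_100k`, with a similarity of roughly 0.98.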
Vector similarity is then combined with lexical and structural signals — entity hints, time window alignment, and category — to produce a composite confidence score. This hybrid approach reduces both false positives (unrelated markets incorrectly matched) and false negatives (related markets missed because the text looks different).
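One plausible way to combine these signals is a weighted sum, sketched below. The weights, field names, and the linear form itself are assumptions made for illustration; the actual formula Predexy uses is not documented here.

```python
from dataclasses import dataclass


@dataclass
class MatchSignals:
    semantic_similarity: float    # embedding cosine similarity, 0 to 1
    entity_overlap: float         # hypothetical: share of entity hints in common, 0 to 1
    time_window_alignment: float  # hypothetical: how well the deadlines line up, 0 to 1
    same_category: bool           # whether both markets sit in the same category


# Illustrative weights only; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"semantic": 0.60, "entity": 0.20, "time": 0.15, "category": 0.05}


def composite_confidence(s: MatchSignals) -> float:
    """Weighted sum of the signals, clamped to the 0-1 range."""
    score = (
        WEIGHTS["semantic"] * s.semantic_similarity
        + WEIGHTS["entity"] * s.entity_overlap
        + WEIGHTS["time"] * s.time_window_alignment
        + WEIGHTS["category"] * (1.0 if s.same_category else 0.0)
    )
    return max(0.0, min(1.0, score))


# Similar wording, same entity, same deadline.
strong = composite_confidence(MatchSignals(0.92, 1.0, 1.0, True))

# Similar wording, but different entity and deadline.
lookalike = composite_confidence(MatchSignals(0.90, 0.0, 0.0, True))

print(f"strong: {strong:.3f}, lookalike: {lookalike:.3f}")
```

In this sketch the look-alike pair scores 0.59 despite a raw similarity of 0.90, which shows how structural signals can pull a textually similar but unrelated market below the link threshold.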
Three match methods
Every linked market-to-question pair carries a match_method that tells you how the connection was established:
| Method | How it works |
|---|---|
| semantic | Matched automatically using embedding similarity and lexical signals |
| manual | Confirmed or corrected by a human reviewer |
| exact | Matched by an identical platform market ID or title string |
Confidence scores
The matching engine assigns a confidence score between 0 and 1 to every proposed link. Three bands determine what happens next:
| Confidence | Action |
|---|---|
| > 0.85 | Auto-accepted — the link is created without human review |
| 0.70–0.84 | Queued for manual review — a human confirms or rejects |
| < 0.70 | Auto-rejected — the markets are not linked |
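The bands translate into a simple triage function, sketched below. The table does not say what happens at exactly 0.85 or between 0.84 and 0.85, so this sketch assumes those values fall into the manual review band; the function name and return labels are also illustrative.

```python
def triage(confidence: float) -> str:
    """Map a match confidence score to the action taken on the proposed link."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence > 0.85:
        return "auto_accept"    # link created without human review
    if confidence >= 0.70:
        return "manual_review"  # assumption: covers everything from 0.70 up to 0.85
    return "auto_reject"        # markets are not linked


for score in (0.93, 0.85, 0.77, 0.42):
    print(score, triage(score))
```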
You can see the confidence score for each linked market in the markets[]
array returned by GET /api/v1/questions/{id}. Use it to gauge how certain
Predexy is that two listings represent the same event.
QuestionMarket fields
Each entry in the markets[] array of a question detail response includes matching metadata alongside pricing data:
| Field | Type | Description |
|---|---|---|
| confidence | number (0–1) | Composite match confidence score |
| match_method | semantic \| manual \| exact | How the match was produced |
| semantic_similarity | number (0–1) | Raw cosine similarity from the embedding comparison |
A semantic_similarity close to 1.0 means the market titles are nearly identical in meaning; a value closer to 0.7 indicates a borderline match that may have required human review.
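A client might read these fields into a small typed record, as in the sketch below. The sample payload is invented for illustration: only `markets`, `confidence`, `match_method`, and `semantic_similarity` come from the field table above, while the `id` and `platform` keys and all values are hypothetical. The sketch also assumes `semantic_similarity` may be absent or null for non-semantic matches, which is not confirmed here.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_METHODS = {"semantic", "manual", "exact"}


@dataclass(frozen=True)
class MatchInfo:
    confidence: float
    match_method: str
    semantic_similarity: Optional[float]


def parse_match_info(market: dict) -> MatchInfo:
    """Extract the matching metadata from one entry of markets[]."""
    method = market["match_method"]
    if method not in VALID_METHODS:
        raise ValueError(f"unexpected match_method: {method!r}")
    similarity = market.get("semantic_similarity")
    return MatchInfo(
        confidence=float(market["confidence"]),
        match_method=method,
        semantic_similarity=None if similarity is None else float(similarity),
    )


# Hypothetical question detail payload, trimmed to the matching fields.
SAMPLE_QUESTION = json.loads("""
{
  "id": "q_123",
  "markets": [
    {"platform": "polymarket", "confidence": 0.91,
     "match_method": "semantic", "semantic_similarity": 0.94},
    {"platform": "limitless", "confidence": 1.0,
     "match_method": "exact", "semantic_similarity": null}
  ]
}
""")

matches = [parse_match_info(m) for m in SAMPLE_QUESTION["markets"]]
for info in matches:
    print(info)
```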
Why this matters for arbitrage
Arbitrage detection runs only on questions where at least two platforms are matched. A false match, one that links two markets that actually refer to different events, would create a phantom arbitrage opportunity. Because the two markets can resolve differently (one Yes while the other resolves No), the legs no longer hedge each other. Acting on a false match is therefore not a risk-free trade; it is an uncovered bet.
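The arithmetic behind this can be worked through with a small example. The sketch assumes binary markets where each winning share pays 1, ignores fees, and uses made-up prices: Yes on market A at 0.40 and No on market B at 0.55.

```python
from itertools import product


def payoff(yes_price_a: float, no_price_b: float,
           a_resolves_yes: bool, b_resolves_yes: bool) -> float:
    """Profit from holding one Yes share on A and one No share on B."""
    cost = yes_price_a + no_price_b
    payout = (1.0 if a_resolves_yes else 0.0) + (0.0 if b_resolves_yes else 1.0)
    return payout - cost


YES_A, NO_B = 0.40, 0.55  # hypothetical prices; total cost 0.95

# True match: A and B are the same event, so they always resolve identically.
true_match = [payoff(YES_A, NO_B, outcome, outcome) for outcome in (True, False)]

# False match: A and B are different events, so all four combinations can occur.
false_match = {
    (a, b): payoff(YES_A, NO_B, a, b)
    for a, b in product((True, False), repeat=2)
}

print("true match payoffs: ", [round(p, 2) for p in true_match])
print("false match payoffs:", {k: round(v, 2) for k, v in false_match.items()})
```

With a true match, exactly one leg pays out in every scenario, so the profit is 0.05 either way. With a false match, the case where A resolves No and B resolves Yes pays nothing and loses the full 0.95 stake.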
Always check the confidence and match_method on both legs of an arbitrage
opportunity. A semantic match with confidence near 0.70 is borderline —
you may want to verify the market titles manually before committing capital.
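That check is easy to automate as a pre-trade guard, sketched below. The leg dictionaries use only the `confidence` and `match_method` fields described earlier; the 0.85 floor is a choice made for this example (it mirrors the auto-accept line) rather than a documented recommendation, and the function names are hypothetical.

```python
def leg_needs_review(leg: dict, min_confidence: float = 0.85) -> bool:
    """Flag a leg whose link was made automatically with confidence below the floor."""
    return leg["match_method"] == "semantic" and leg["confidence"] < min_confidence


def opportunity_needs_review(legs: list[dict], min_confidence: float = 0.85) -> bool:
    """An opportunity is only as trustworthy as its weakest leg."""
    return any(leg_needs_review(leg, min_confidence) for leg in legs)


# Hypothetical legs of one arbitrage opportunity.
legs = [
    {"confidence": 0.96, "match_method": "semantic"},
    {"confidence": 0.72, "match_method": "semantic"},  # borderline
]

if opportunity_needs_review(legs):
    print("Verify the market titles manually before trading.")
```

Legs matched by `manual` or `exact` are not flagged in this sketch, on the assumption that a human or an identical ID already confirmed them.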
Strict matching thresholds are a risk control as much as a data-quality feature. Predexy deliberately errs on the side of rejecting uncertain matches rather than passing them downstream to the arbitrage scanner.