21 Jun 2021 · 7 min read

Confidence Intervals for Market Data Oracles

  • In different markets, uncertainties will vary due to market structure.
  • Pyth data providers stream a price along with the confidence (uncertainty) that they estimate using information available to them.
  • For example, a data provider focused on a particular market could represent confidence as the bid / ask spread on that venue.
  • Alternatively, a data provider covering multiple venues could represent confidence as a function of where they expect trades to occur.

Measurement Uncertainty

In scientific and engineering fields, an observation or measurement is almost always accompanied by a measurement uncertainty.

The distance between two points might be measured to be 10.12m ± 0.01m. The time it took for a car to travel that distance might be measured to be 1.23s ± 0.05s. The measurement uncertainty is the observer’s best estimate of how far off from the “true” value their measurement is likely to be, given the precision of their measuring equipment, the difficulty in making the measurement, and potentially other factors the observer deems important.

Why Do Measurement Uncertainties Matter?

1. They indicate how much precision or uncertainty there is in a single observation.

2. They allow one to combine multiple independent observations of the same thing in a more “optimal” way, giving more weight to observations with smaller uncertainty and less weight to observations with greater uncertainty.

3. When measurements are combined to make derived quantities (distance and time measurements in the example above combined to derive a speed estimate, for example), uncertainties in the raw measured values can be propagated to make uncertainties in the derived quantities (speed estimate in this example would be 8.2 ± 0.3 m/s).

4. Independent measurements of the same thing that do not agree with each other to within their respective uncertainties are an indication of a problem with one or more of the measurements.
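As a rough illustration of point 3, the speed estimate can be reproduced with the standard first-order propagation rule for a ratio of two independent measurements (relative uncertainties add in quadrature). The function name below is hypothetical, and the independence of the distance and time errors is an assumption, not something stated above.

```python
import math

def ratio_with_uncertainty(x, dx, y, dy):
    """Return (x / y, uncertainty), assuming x and y have independent errors.

    Uses the first-order rule: relative uncertainties add in quadrature.
    """
    value = x / y
    relative = math.sqrt((dx / x) ** 2 + (dy / y) ** 2)
    return value, abs(value) * relative

# Distance 10.12 m ± 0.01 m, time 1.23 s ± 0.05 s (the example above).
speed, speed_err = ratio_with_uncertainty(10.12, 0.01, 1.23, 0.05)
print(f"{speed:.1f} ± {speed_err:.1f} m/s")  # prints "8.2 ± 0.3 m/s"
```

Almost all of the uncertainty in the speed comes from the time measurement, whose relative uncertainty (about 4%) dwarfs that of the distance (about 0.1%).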

Market participants are typically not accustomed to thinking about measurement uncertainty when it comes to price in financial markets, but the concept can easily be extended there.

Multiple data providers on the Pyth network are reporting a measurement of the price of an asset. Each of them is considering different data sources (which they are legally entitled to use for these purposes), and reporting at most once per Solana slot (~400ms) and asynchronously (so quoter A and quoter B might report prices they observed 200ms apart from each other in the same Solana slot).

Let’s look at some examples of different kinds of markets and what kind of uncertainties different quoters might report.

US Equities

Consider a single stock (say TSLA) at a single stock market. “What is the price of TSLA?” is a seemingly simple question, but there are subtleties even here. At any given moment in time, there is no “one price” for TSLA. There is the best bid price (the price you could currently sell TSLA at) and the best ask price (the price you could currently buy TSLA at). They are typically close together, but the difference between them (the so-called bid-ask spread) could certainly be viewed as an uncertainty of what the “price” of TSLA currently is, with the midpoint (halfway between the best bid and best ask) being an estimate of the price itself.

Another candidate for the proper “price” is the actual transaction price. “Last traded price” is often reported and, especially in a thinly traded stock, might be even more representative of the true price than the current best bid and best ask prices (where the bid-ask spread might be very wide).

Many people don’t realize that the US stock market includes 13 different exchanges (such as NYSE, Nasdaq, BATS, IEX, and BSX) and a larger number of alternative trading venues. An individual stock like TSLA might be “listed” on Nasdaq, but it trades on all of these different venues. Each exchange has its own order book where buy and sell orders are placed, and trades can take place on any of them. US regulations (Reg NMS) prohibit display of locking or crossing orders (an order that would immediately match against an equal or better priced order at another trading venue). This generally keeps the various exchanges from having differing prices, though some of the smaller exchanges will often have a wider bid-ask spread than the larger more liquid exchanges.

Price and Uncertainty by Pyth Data Providers

Suppose quoter A exclusively uses data from one of the US exchanges, and they are basing their Pyth quote on the order book of that exchange. They may well decide that their best estimate of TSLA price is the midpoint of that exchange’s bid and ask price, with an uncertainty of half the bid-ask spread. If the best bid on that exchange was $10.25 and the best ask was $10.35, they would report TSLA price as $10.30 ± $0.05.

Suppose quoter B is using data exclusively from a different exchange, where TSLA is more heavily traded and the bid-ask spread is narrower. If the best bid there was $10.26 and the best ask was $10.28, they would report the TSLA price as $10.27 ± $0.01.
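The arithmetic behind quoter A's and quoter B's reports can be sketched in a few lines. This is only one possible convention (midpoint as price, half the spread as uncertainty), and the function name is made up for illustration; it is not part of any Pyth API.

```python
def quote_from_book(best_bid, best_ask):
    """Return (price, uncertainty) as (midpoint, half the bid-ask spread)."""
    if best_ask < best_bid:
        raise ValueError("best ask is below best bid on a single venue")
    mid = (best_bid + best_ask) / 2
    half_spread = (best_ask - best_bid) / 2
    return mid, half_spread

# Quoter A's and quoter B's order books from the examples above.
price_a, conf_a = quote_from_book(10.25, 10.35)
price_b, conf_b = quote_from_book(10.26, 10.28)
print(f"A: ${price_a:.2f} ± ${conf_a:.2f}")  # A: $10.30 ± $0.05
print(f"B: ${price_b:.2f} ± ${conf_b:.2f}")  # B: $10.27 ± $0.01
```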

Suppose quoter C is a proprietary trading firm that actively trades TSLA on many different trading venues. They might construct their estimate of the TSLA price from the prices of their own trades across all venues. For example, they may be buying at a price of $10.27 on an exchange that is not itself a quoter and selling at a price of $10.265 (via hidden midpoint orders on a dark pool that is also not a quoter), and so report a TSLA price of $10.2675 ± $0.0025.

Suppose quoter D is also a proprietary trading firm, but they are not as active in TSLA. If their last trade in TSLA was 15s ago at a price of $10.19, they might take their estimate of TSLA’s price volatility and scale their quote’s uncertainty with the square root of the time since their last trade (treating price like a random walk). They might report a TSLA price of $10.19 ± $0.03.
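A minimal sketch of that square-root-of-time scaling follows. The volatility figure is a hypothetical value chosen so that 15 seconds of staleness gives roughly the $0.03 in the example; it is not a real estimate of TSLA volatility.

```python
import math

def stale_trade_uncertainty(vol_per_sqrt_second, seconds_since_trade):
    """Uncertainty of a last-trade price under a random-walk assumption.

    For a random walk, the standard deviation of the price change grows
    with the square root of elapsed time.
    """
    return vol_per_sqrt_second * math.sqrt(seconds_since_trade)

# Hypothetical volatility: about $0.0078 per square-root second.
ASSUMED_VOL = 0.0078

conf_d = stale_trade_uncertainty(ASSUMED_VOL, 15.0)
print(f"$10.19 ± ${conf_d:.2f}")  # $10.19 ± $0.03
```

Under this model, waiting four times as long only doubles the uncertainty, so an old trade loses its information value gradually rather than all at once.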

All of these examples show how individual quoters might derive both a price and an uncertainty from the particular data that they have access to in the case of US equities. You will notice as well that the quotes are all consistent with each other (i.e. the difference between quotes is not big compared to the uncertainties of the quotes). The Pyth aggregation algorithm combines these quotes on-chain and gives more weight to the quotes with smaller uncertainties.
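To get a feel for what "more weight to smaller uncertainties" means, here is the textbook inverse-variance weighted mean applied to the four example quotes. This is an illustration of the general idea only, assuming independent quotes; it is not claimed to be the aggregation algorithm Pyth runs on-chain.

```python
import math

def inverse_variance_mean(quotes):
    """Combine (price, uncertainty) pairs, weighting each by 1 / uncertainty**2.

    Assumes the quotes are independent measurements of the same quantity.
    """
    weights = [1.0 / conf ** 2 for _, conf in quotes]
    total = sum(weights)
    mean = sum(w * price for w, (price, _) in zip(weights, quotes)) / total
    return mean, 1.0 / math.sqrt(total)

# Quoters A, B, C and D from the examples above.
quotes = [(10.30, 0.05), (10.27, 0.01), (10.2675, 0.0025), (10.19, 0.03)]
combined_price, combined_err = inverse_variance_mean(quotes)
print(f"{combined_price:.4f} ± {combined_err:.4f}")
```

The result sits very close to quoter C's price, because C's uncertainty is by far the smallest and therefore dominates the weights.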

FX Markets

Unlike US equity markets (which are “all to all markets” meaning that everyone’s orders can match with everyone else’s orders and there is at least a single notion of “best price”), many FX markets are bilateral or multilateral markets, where participants can specify which counterparties they are willing to trade with (or which ones they are not willing to trade with). This results in an even more complex “fragmented market” where there is not an accepted measurement of “best price”.

Crypto Markets

Unlike the highly regulated US equity market, global crypto markets are not required to route orders to better priced markets. Different exchanges can and do “cross” each other often, with the best bid price on exchange A being higher than the best ask price on exchange B.

Furthermore, because trading fees can be quite significant, these price differences cannot always be arbitraged away, so they can and do persist for substantial periods of time. If Bitcoin is trading (bid, ask) at (30,000.00, 30,000.05) on market A, but simultaneously (and for an extended period) trading roughly 1.7 bps higher on market B with a (bid, ask) at (30,005.00, 30,005.05), then what is “the price of Bitcoin”? In a very real sense, there is a ~$5 uncertainty in the price of Bitcoin, because it is trading simultaneously at two different exchanges at prices that are $5 apart from each other!
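The two Bitcoin books above can be checked for crossing, and the gap expressed in basis points, with a short sketch. The helper names are hypothetical, and fees are deliberately left out of the calculation.

```python
def is_crossed(bid_here, ask_there):
    """True when one venue's best bid exceeds another venue's best ask."""
    return bid_here > ask_there

def gap_in_bps(low_price, high_price):
    """Price gap in basis points (1 bp = 0.01%), relative to the lower price."""
    return (high_price - low_price) / low_price * 10_000

# (bid, ask) on each market, from the example above.
market_a = (30_000.00, 30_000.05)
market_b = (30_005.00, 30_005.05)

crossed = is_crossed(market_b[0], market_a[1])  # B's bid vs. A's ask
mid_a = sum(market_a) / 2
mid_b = sum(market_b) / 2
gap = gap_in_bps(mid_a, mid_b)
print(crossed, f"{gap:.2f} bps")  # True 1.67 bps
```

A gap of this size survives whenever the round-trip cost of buying on A and selling on B (fees, transfer costs, funding) exceeds it.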

Suppose quoter A was just looking at market A. They might report 30,000.025 ± 0.025.

Suppose quoter B was just looking at market B. They might report 30,005.025 ± 0.025.

Suppose quoter C was looking at both books. They might report 30,002.5 ± 2.5, taking the midpoint between the two markets as their estimate of the price, half the spread between the two markets as the uncertainty.

Note in this case that the three quoters are no longer consistent with each other. Quoter A’s and Quoter B’s prices differ by 5, while each claims an uncertainty of 0.025 (a difference of 200 times either quoter’s stated uncertainty, i.e. a 200 sigma difference). The Pyth aggregation algorithm will automatically combine these quotes differently than in the US equity case, resulting in an aggregate quote uncertainty that reflects this inconsistency.

As an extreme example of price “uncertainty” in crypto markets, consider the price of ETC when it became difficult to transfer as a result of so-called 51% attacks. In response to the attacks, many exchanges dramatically increased the number of blockchain confirmations required before they would credit ETC deposits to traders’ accounts. This meant that it took weeks to move ETC between exchanges, effectively making arbitrage of the price differences between exchanges impossible. It was not uncommon to see prices that were as much as 10% different from each other for days or weeks at a time. In such a situation, the notion of price uncertainty becomes painfully apparent.

Beyond this dramatic ETC example, smaller and shorter-lived price dislocations between venues are quite common in the crypto space, for example when an exchange has to halt withdrawals or deposits of a particular coin for a reason like a wallet upgrade or moving inventory out of cold storage. The Pyth Network empowers data consumers by continuously publishing a consolidated estimate of these important dislocations.

We can’t wait to hear what you think! You can join the Pyth Discord and Telegram, and follow us on Twitter. You can also learn more about Pyth here.
