16 Jun 2023 · 14 min read

What is a Blockchain Oracle?

A blockchain oracle serves as a bridge between blockchains and the outside world, enabling smart contracts to access off-chain data to power their services.

Oracles are typically third-party services that source, verify, and transmit external information to smart contracts running on the blockchain. They expand the functionality of smart contracts by providing a mechanism to interact with off-chain data to perform valuable tasks and services. Without oracles, smart contracts would be limited to on-chain data and unable to access external information.

Oracles serve as a bridge between the external world and the world of smart contracts.

To provide a rudimentary example: in a scenario where Alice and Bob place a bet on the outcome of a horse race, both players can lock their funds in a smart contract which releases the funds to the winner based on the real-world outcome of the race. Although the smart contract cannot directly interact with the external world, a third-party oracle can query a trusted API to retrieve the result, and transmit the result to the smart contract to determine the winner, and enable the contract to distribute the funds accordingly.
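
The betting scenario above can be sketched in a few lines of Python. This is a toy model rather than real contract code: the `BetContract` class, the `fetch_race_result` helper, the horse names, and the stake amounts are all hypothetical, and the "trusted API" is replaced by a hard-coded return value.

```python
from dataclasses import dataclass, field


@dataclass
class BetContract:
    """Toy stand-in for a smart contract holding an escrowed bet."""
    oracle: str                                   # only this address may report the result
    stakes: dict = field(default_factory=dict)    # player -> (pick, amount)
    settled: bool = False

    def place_bet(self, player: str, pick: str, amount: int) -> None:
        if self.settled:
            raise RuntimeError("bet already settled")
        self.stakes[player] = (pick, amount)

    def report_result(self, sender: str, winning_horse: str) -> dict:
        """Called by the oracle; pays the pot to whoever picked the winner."""
        if sender != self.oracle:
            raise PermissionError("only the registered oracle may report")
        if self.settled:
            raise RuntimeError("bet already settled")
        pot = sum(amount for _, amount in self.stakes.values())
        winners = [p for p, (pick, _) in self.stakes.items() if pick == winning_horse]
        self.settled = True
        if not winners:
            # Nobody picked the winner: refund each stake.
            return {p: amount for p, (_, amount) in self.stakes.items()}
        share = pot // len(winners)
        return {p: share for p in winners}


def fetch_race_result() -> str:
    """Hypothetical off-chain lookup; a real oracle would query a trusted API."""
    return "Thunderbolt"


contract = BetContract(oracle="oracle-1")
contract.place_bet("alice", "Thunderbolt", 100)
contract.place_bet("bob", "Comet", 100)
payouts = contract.report_result("oracle-1", fetch_race_result())
print(payouts)  # {'alice': 200}
```

Note that the contract never reaches outside itself. It only checks who is calling `report_result`, which is why the trustworthiness of that single caller matters so much.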

Note that oracles are not the data source itself but rather a layer that retrieves, verifies, and relays external data to smart contracts. They can transmit various types of information, such as price data, payment confirmations, or sensor measurements.

Moreover, oracles must transmit this data while preserving the characteristics inherent to smart contracts: trustlessness and decentralization. This is essentially the oracle problem: ensuring the reliability, authenticity, and trustworthiness of off-chain data served to smart contracts while also eliminating single points of failure and centralization.

Types of Oracles

There are many types of blockchain oracles for different purposes. We can categorize oracles based on the source of data (hardware or software), the direction of information (inbound or outbound), and the trust model (centralized or decentralized). Each oracle type brings unique capabilities and benefits.

  • Hardware oracles: gather data from the physical world, such as information from motion sensors or RFID sensors.
  • Software oracles: collect data from digital sources like websites, servers, or databases. Commonly used to provide real-time data like exchange rates or price variations.
  • Inbound oracles: mainly deliver off-chain or real-world data to the blockchain. Can be used to trigger specific actions based on off-chain events.
  • Outbound oracles: send blockchain data to the external world. Can provide updates on on-chain events to external systems.
  • Centralized oracles: managed by a single entity and rely on a single source of information. Can pose risks as they introduce a single point of failure, making contracts vulnerable to attacks.
  • Decentralized oracles: leverage multiple sources of information and consensus mechanisms to provide more reliable and tamper-resistant data. Can minimize counterparty risk and enhance the trustworthiness of the information used by smart contracts.
  • Human oracles: individuals with specialized knowledge who act as a source of data. They can gather information, verify its legitimacy, and translate it into inputs that smart contracts can use. Human oracles can use cryptographic techniques to verify their identity and provide trustworthy data.
  • Contract-specific oracles: designed for specific smart contracts and serve their individual needs. However, they require additional effort to maintain and may not be suitable for general-purpose use.
  • Computation oracles: perform complex calculations and return calculated results that are difficult or costly to perform on-chain. Can be particularly valuable in scenarios where network gas constraints and high computation costs pose limitations.

Oracles for Decentralized Finance

Blockchain oracles are essential for any sophisticated and valuable smart contract service. Oracle use cases span numerous industries, tracking geolocation data (supply chain analysis, IoT), sports results (prediction markets), weather (travel, farming), time and interval data (automation) and, as our main research focus, financial and capital markets data.

The decentralized finance (DeFi) industry promises to unlock more efficient, more transparent, and fairer markets for the global community. In order to do so, DeFi applications will need dependable, trustless access to a wide gamut of data: asset prices (from cryptocurrencies to real estate), benchmark reference data (interest rates, funding rates), volatility and market impact data, and more.

Indeed, the rapid expansion of the industry since the “DeFi Summer” of 2020 has highlighted the critical need for oracle market data that is comprehensive, available, and robust. Furthermore, oracle infrastructure needs to provide high-quality data, be seamlessly integrable with any L1/L2 blockchain, and be ready to scale with the growing demands of an increasingly sophisticated DeFi ecosystem.

Price feed oracles remain the primary and most discussed oracles in DeFi. The history of price feed oracle design is almost as long as that of smart contracts, but existing architectures are now showing their limitations.

For the remainder of this discussion, we will cover a few topical questions:

  • Why do we need blockchain and price feed oracles, and why are they important?
  • What do current oracle designs entail, and are they effective?
  • What alternative designs might overcome existing limitations?

It has become clear that oracles will continue playing a critical role within blockchain, but that existing oracle networks have fallen short and are insufficient in scaling DeFi to where it needs to be. Legacy solutions often depend on intermediary parties (nodes) corroborating and aggregating data, leading to time delays, opaque data sourcing, and knock-on scaling issues due to costs.

A new oracle network architecture is emerging, which focuses on pull instead of push models, and incentivizing highly credible data owners and creators to share their data.


Why Do We Need Price Oracles?

The leading category of oracles is the price feed oracle, which provides pricing data on cryptocurrency assets, equities, commodities, and more.

To help illustrate their importance, let’s consider various examples:

  • Derivative Protocols: must provide traders with accurate asset prices, and facilitate liquidations in a timely manner once positions become under-collateralized.
  • DEX Aggregators: liquidity is sourced from various decentralized exchanges, meaning that accurate oracle pricing is needed for identifying the best prices and executing trades with minimal slippage.
  • Stablecoins: crypto-collateralized stablecoins require oracle data to ensure that positions are adequately collateralized, and that they properly maintain their peg.
  • Borrowing/Lending Protocols: these often rely upon dynamic borrowing and lending rates, which are a function of current asset prices. Delayed or inaccurate prices can harm liquidity health as a whole, particularly during volatile periods.
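
To make the dependence on price data concrete, here is a minimal sketch of the collateral check that a lending or stablecoin protocol might run against an oracle price. The `Position` type, the 1.5x minimum ratio, and the prices are illustrative assumptions, not parameters of any particular protocol.

```python
from dataclasses import dataclass


@dataclass
class Position:
    collateral_units: float   # e.g. amount of ETH posted as collateral
    debt_usd: float           # amount borrowed, in USD terms


def collateral_ratio(position: Position, price_usd: float) -> float:
    """Collateral value divided by debt at the oracle-reported price."""
    return position.collateral_units * price_usd / position.debt_usd


def is_liquidatable(position: Position, price_usd: float,
                    min_ratio: float = 1.5) -> bool:
    """True once the position falls below the (hypothetical) minimum ratio."""
    return collateral_ratio(position, price_usd) < min_ratio


position = Position(collateral_units=10.0, debt_usd=10_000.0)

# At $2,000 per unit the position holds $20,000 against $10,000 of debt.
print(collateral_ratio(position, 2_000.0), is_liquidatable(position, 2_000.0))
# After a drop to $1,400 it holds $14,000, which is below the 1.5x minimum.
print(collateral_ratio(position, 1_400.0), is_liquidatable(position, 1_400.0))
```

If the oracle still reported $2,000 after the market had moved to $1,400, this position would look healthy when it is not, which is the failure mode the bullets above describe.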

We can’t solely rely on a single party to provide this data, as that would represent a centralized point of failure which contrasts with the ethos of DeFi. Instead, we require data that is immune to tampering and yet timely.

This is easier said than done, as their importance within DeFi frequently makes oracles major targets for exploits. Yet, having trustworthy and robust data sources is crucial for any project in DeFi. This is why oracles are commonly referred to as the backbone of DeFi. As the DeFi space continues to evolve and expand at an accelerating pace, the need for quick and reliable access to data that’s immune to exploits will become increasingly paramount.

Now that we have a background on oracles, let’s examine existing architectures.


The Current State of Price Oracles

A common oracle network design is known as a Reporter Oracle Network, which relies upon multiple independent nodes acting as intermediaries between data sources and blockchain-based applications (the end users).

In Reporter Networks, intermediary nodes are responsible for retrieving data from off-chain sources, such as market data specialists or public APIs, and then relaying that information “over the last mile” to its final destination—the blockchain. These nodes would also be responsible for performing data aggregation, validation, and verification.

For example, let’s imagine that 100 nodes are tasked with retrieving the price of BTC at a given point in time. They would retrieve the price from various data sources (for example, they might use 30 data sources each on average) and then aggregate their responses to generate a single average or median output. The majority of nodes might end up with the correct price, whereas a fraction of the nodes may have used poor data sources and provided an incorrect response. The oracle network would then aggregate the responses of the majority, and submit that as the correct response.
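
The two-stage aggregation described above can be sketched as follows. The example is scaled down to five nodes with three sources each, all prices are made up, and the median is used at both stages as one plausible choice of aggregate.

```python
from statistics import median


def node_report(source_prices: list[float]) -> float:
    """Each node reduces its own data sources to a single answer."""
    return median(source_prices)


def network_price(node_reports: list[float]) -> float:
    """The network reduces all node answers to one published value."""
    return median(node_reports)


# Hypothetical round: five nodes, three sources each. Node 4 uses poor sources.
sources_by_node = [
    [30_010.0, 30_000.0, 29_990.0],
    [30_005.0, 30_000.0, 30_020.0],
    [29_995.0, 30_000.0, 30_015.0],
    [27_000.0, 26_500.0, 30_000.0],   # two stale sources drag this node down
    [30_000.0, 30_010.0, 30_030.0],
]

reports = [node_report(s) for s in sources_by_node]
print(reports)                 # [30000.0, 30005.0, 30000.0, 27000.0, 30010.0]
print(network_price(reports))  # 30000.0
```

The one node with poor sources reports 27,000, but the published value is unaffected because the other four nodes agree closely.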

Economic incentives are often put in place to keep these nodes working and honest. Nodes that are accurate could receive rewards in the form of token incentives, whereas those that were inaccurate could be penalized through mechanisms like slashing.
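
As a rough illustration of such an incentive scheme, the sketch below rewards nodes whose report lands close to the round's median and slashes the rest. The tolerance, reward size, and slash fraction are invented for the example; real networks define their own rules.

```python
from statistics import median


def settle_round(reports: dict[str, float], stakes: dict[str, float],
                 tolerance: float = 0.01, reward: float = 1.0,
                 slash_fraction: float = 0.10) -> dict[str, float]:
    """Return updated stakes after comparing each report with the round median.

    All parameters are hypothetical: `tolerance` is the allowed relative
    deviation, `reward` is a flat payment, and `slash_fraction` is the share
    of stake removed from a node whose report falls outside the tolerance.
    """
    reference = median(reports.values())
    updated = {}
    for node, price in reports.items():
        deviation = abs(price - reference) / reference
        if deviation <= tolerance:
            updated[node] = stakes[node] + reward
        else:
            updated[node] = stakes[node] * (1 - slash_fraction)
    return updated


reports = {"node-a": 30_000.0, "node-b": 30_050.0, "node-c": 27_000.0}
stakes = {"node-a": 100.0, "node-b": 100.0, "node-c": 100.0}
print(settle_round(reports, stakes))
# {'node-a': 101.0, 'node-b': 101.0, 'node-c': 90.0}
```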

This oracle design has a few key advantages:

  1. Security: having various data sources and intermediary nodes means it is challenging for any given party to manipulate the network and influence the final price output
  2. Data Sources: drawing on a wide range of data sources gives the oracle a broad view of the market, often increasing accuracy and reliability
  3. Blockchain Agnostic: any blockchain network may incorporate this design, as they already rely upon nodes for block validation

However, this design also has several drawbacks.

It is inefficient to have many nodes corroborating data with one another, then aggregating the data and reaching consensus. This process can be slow: existing deployments may update their data only about every 15 minutes, which is too slow for scaling blockchain applications on a global level. Associated network costs (e.g. ETH gas fees) can also quickly add up for frequent updates across many asset pairs, which limits the number of asset pairs that can be supported without extremely high subsidies and workarounds for network congestion.

The increasing gas fees necessary to support a growing network of nodes need to be passed down to the end user or subsidized. This limitation inhibits the scalability of a Reporter Network in terms of supporting more data or users.
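
A back-of-the-envelope calculation shows how quickly these costs grow. The 15-minute interval comes from the discussion above; the number of feeds, the gas used per update, and the gas price are hypothetical inputs chosen only to make the arithmetic concrete.

```python
def daily_push_cost_eth(feeds: int, update_interval_s: int,
                        gas_per_update: int, gas_price_gwei: float) -> float:
    """ETH spent per day pushing every feed on a fixed schedule."""
    updates_per_feed = 86_400 // update_interval_s      # seconds per day / interval
    total_gas = feeds * updates_per_feed * gas_per_update
    return total_gas * gas_price_gwei / 1e9             # 1 ETH = 1e9 gwei


# Hypothetical inputs: 50 feeds, 100k gas per update, 30 gwei gas price.
every_15_min = daily_push_cost_eth(feeds=50, update_interval_s=900,
                                   gas_per_update=100_000, gas_price_gwei=30)
every_second = daily_push_cost_eth(feeds=50, update_interval_s=1,
                                   gas_per_update=100_000, gas_price_gwei=30)
print(every_15_min)   # 14.4
print(every_second)   # 12960.0
```

Under these assumptions, moving from 15-minute to one-second updates multiplies the daily cost by 900, and the cost also scales linearly with the number of feeds.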

Furthermore, data sources in Reporter Networks are often opaque. In these networks, data is usually aggregated off-chain and brought on-chain in a non-transparent manner—in direct contrast to blockchain’s goal of transparency. So while the node entity providing the data is known, their ultimate data sources are not.

This is particularly concerning during highly volatile times when various data sources may update infrequently or with a lack of granularity. In fact, the data sources upstream may not even be aware that their data is being used to secure smart contract value, resulting in further issues of data quality and accountability. This does not even broach the issue of data legality: some data suppliers do not permit their data to be reported to public ledgers, as they wish to limit distribution to subscribers.

The Reporter Network design is specialized for publicly obtainable data on-chain—and such solutions have played a significant role in advancing DeFi to its current state. However, as we strive to bring DeFi to billions of users on a global scale, it is essential to address the limitations of legacy oracle architectures.

In a previous article comparing Reporter Oracle Networks with a newer architecture, we highlighted the need for oracle solutions which are more transparent, cost-efficient, and scalable. Price oracles of the future need to be ready to scale to all the trading pairs we’re used to in the TradFi realm and support the blockchains developers choose to build on.

The Pyth oracle network introduces a Publisher Oracle Network design which rethinks the type of data a price oracle should retrieve, the data sources it selects, and the relationship between data owners and data users. Let’s examine this alternative.


Rethinking Price Oracles

The financial data industry is massive. The largest US exchanges turn over several billion dollars in revenue from selling market data alone. Given this observation, it may be prudent to change some of our fundamental assumptions on where price oracles should source their data.

There is, for example, price data publicly available on the internet, reported by free price aggregator services such as Yahoo! Finance or Google Finance. This data has no need to be extremely granular and, in the case of US equities prices for example, is often delayed by 15 minutes or more due to regulation.

There is also valuable data guarded closely by various institutions: accurate and timely information holds immense value. Exchanges and data terminal services like Bloomberg or Refinitiv know this and charge substantial subscriptions for this edge.

Reporter Oracle Networks operate under the implicit assumption that all the data required by blockchains is freely available on the internet. By incentivizing intermediary nodes to collect, verify, and transmit this data, DeFi can track the markets and movements of the world.

In reality, valuable financial data is restricted to a few privileged parties and is not easily accessible. Compensating nodes for retrieving and passing down data works for certain types of data, but not for capital markets data where speed matters and information is an edge. This approach also suffers under the quality, gas efficiency, and even legal constraints of supporting a larger node network.

The Pyth Network takes a fundamentally different approach: an oracle network can incentivize highly credible parties—owners and creators of valuable data—to voluntarily and directly share their data to the network. An on-chain program performs price aggregation to remove outlier influence, while a multi-chain bridge signs and verifies all prices sent to their destination blockchains.

In this Publisher Oracle Network, the data provider parties spin up their own nodes to publish data directly on-chain. This design eliminates the reliance on middlemen nodes, resulting in higher quality data, greater gas efficiency, and ultimately, more scalability for the oracle network to expand to thousands of symbols.

First-Party Data Sourcing

The credible institutions who contribute data to the Pyth Network are known as data providers or publishers. Pyth data providers are typically well-established institutions who possess a wealth of high-quality data, including global exchanges, market makers, and trading firms. Some of the most recognizable names include Cboe, Jane Street, Optiver, Binance, OKX, QCP Capital, Two Sigma, Wintermute and CMS. There are currently more than 100 data providers in the network.

All of these data providers are first-party sources: they create and therefore own the price data they contribute, as they are either trading venues receiving orders (price at which traders intend to trade) or are traders themselves (and executing trades at specified prices). In the Reporter Network, the nodes must scrape or purchase data from other middlemen or first-party sources; that makes them third-party sources.

First-party data means assurance in data quality and network security. Because every data provider's contribution to a Pyth data feed is identifiable, individual data sources can be held accountable for the quality of their inputs. Furthermore, these data providers have reputations at stake, and they know that a malicious attack on their end would have a detrimental impact on their business as a whole. This is a powerful additional layer of deterrence against traditional oracle attack vectors.

There is also the obvious point that these institutions possess much higher quality data than is available from simple web scraping or collecting from public aggregators and service providers. Furthermore, because these data sources are the owners of their data, the data can be distributed to blockchain applications without intellectual property concerns.

Deep Dive: How Pyth Works

The Pyth Network protocol allows first-party data providers to publish their proprietary price information on-chain for the public to use.

The protocol is an interaction among three parties:

  1. Data Providers: Reputable institutions submit price data directly to the Pyth on-chain oracle program. For any price feed product (e.g. BTC/USD), there are multiple providers publishing towards it to ensure accuracy and robustness.
  2. Pyth Oracle Program: The Pyth oracle program runs on the Pythnet appchain. The program securely and transparently aggregates submitted data to produce an aggregate output.
  3. Users: Pyth’s data users consume the aggregate price data. Users are typically decentralized applications, such as Synthetix, Aevo, or Ethena.
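
The provider, oracle program, and user roles can be sketched as below. The article does not specify the aggregation algorithm, so the scheme here is an assumption chosen for illustration: each publisher submits a price plus a confidence interval and contributes three "votes" (price, price minus confidence, price plus confidence), and the aggregate is the median of all votes. Publisher names and numbers are made up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Quote:
    publisher: str
    price: float
    confidence: float   # publisher's own uncertainty, in the same units as price


def aggregate(quotes: list[Quote]) -> float:
    """Median over three votes per publisher: price, price - conf, price + conf.

    The vote scheme is an illustrative assumption, not taken from this article.
    """
    votes = []
    for q in quotes:
        votes.extend([q.price - q.confidence, q.price, q.price + q.confidence])
    return median(votes)


# Step 1: data providers publish quotes for one feed (hypothetical values).
quotes = [
    Quote("exchange-a", 30_000.0, 5.0),
    Quote("market-maker-b", 30_002.0, 4.0),
    Quote("trading-firm-c", 29_998.0, 6.0),
    Quote("faulty-d", 35_000.0, 5.0),     # far from the rest, tight confidence
]

# Step 2: the oracle program aggregates; step 3: users read the result.
print(aggregate(quotes))  # 30003.0
```

Even with one publisher 5,000 away from the others, the aggregate stays within a few units of the three publishers that agree.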

Pythnet Appchain

In August 2022, Pyth Network released Pythnet, an application-specific blockchain which enables Pyth data to be aggregated and shared to other chains via the Wormhole Bridge.

Pythnet is built on Solana technology but is ultimately separate from Solana mainnet. Data providers submit data to Pythnet for aggregation; the aggregate price outputs can then reach more than 20 blockchains thanks to Wormhole. There are incredible scalability benefits that emerge from this architectural choice.

New price feeds which launch on the Pyth Network are instantaneously available on all 20+ Pyth-supported blockchains. This is beneficial to builders looking to expand their applications to new blockchains while offering the same markets and asset support as on their original blockchain.

Furthermore, Pyth’s unique architecture allows it to deploy quickly onto new networks which are supported by Wormhole, at a rate of approximately one new network per month. In comparison, competing oracle networks often suffer from technical delays that limit their expansion into new networks. For example, one oracle network’s launch on Solana came two years after its original announcement.

Pull, Don’t Push

Pyth Network operates on a "pull" oracle model, where users can actively request or “pull” the data they need from Pyth to their native blockchain environment.

In contrast, traditional oracle solutions operate a “push” model, where price data is automatically “pushed” at a set frequency on-chain, even if no one is using those price updates.

The Pyth pull oracle design brings the following benefits:

  • Gas Efficiencies: users only pay for data when needed or “on demand”. Gas is not wasted on unused price updates. Furthermore, if another entity pulls a Pyth price on-chain, everyone in that chain can utilize that new price.
  • High Update Frequencies: Pyth price feeds update faster than once per second—faster than most block times. Updates this frequent would be impossible if every single price had to be pushed on-chain.
  • Low Latency: Users can rely on the most recently pulled price instead of being forced to rely on the last pushed price.
  • Reliability: During market volatility, push updates may compete with other transactions for network bandwidth. Pyth’s pull updates are incorporated into the user’s valuable transactions themselves.
  • Scalability: Pyth can scale out to thousands of new price feeds without added gas costs. Costs are only incurred when users pull data.
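
A minimal sketch of the pull flow follows: a signed update is produced off-chain, the user carries it on-chain inside their own transaction, and the contract verifies it and rejects stale prices. Everything here is a stand-in. The HMAC key replaces the bridge's real signature scheme, and `sign_update`, `PriceStore`, and `read_price` are hypothetical names rather than part of any Pyth SDK.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key"   # stand-in for the bridge's real signature scheme


def sign_update(feed: str, price: float, publish_time: int) -> dict:
    """Off-chain side: produce a signed price update the user can fetch."""
    payload = json.dumps({"feed": feed, "price": price, "t": publish_time},
                         sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}


class PriceStore:
    """Toy on-chain contract that only stores prices users bring to it."""

    def __init__(self) -> None:
        self.latest: dict[str, tuple[float, int]] = {}

    def update(self, signed: dict) -> None:
        expected = hmac.new(SIGNING_KEY, signed["payload"], hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signed["sig"]):
            raise ValueError("bad signature")
        data = json.loads(signed["payload"])
        current = self.latest.get(data["feed"])
        if current is None or data["t"] > current[1]:      # keep only newer prices
            self.latest[data["feed"]] = (data["price"], data["t"])

    def read_price(self, feed: str, now: int, max_age_s: int) -> float:
        price, publish_time = self.latest[feed]
        if now - publish_time > max_age_s:
            raise ValueError("stale price")
        return price


store = PriceStore()
now = int(time.time())
# The user pulls a fresh update and submits it alongside their own transaction.
store.update(sign_update("BTC/USD", 30_000.0, now))
print(store.read_price("BTC/USD", now, max_age_s=60))  # 30000.0
```

No gas is spent until someone calls `update`, and once a price is stored, any other caller on the same chain can read it, which mirrors the gas-efficiency point above.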

The list of benefits is extensive, and it has become clear that the pull oracle (on-demand update) model brings the scalability benefits needed for the future of DeFi.


Addressing Concerns

In spite of Pyth’s proven ability to consistently provide high-quality data to over 20 blockchain networks, a recurring criticism notes that the described architecture is potentially too centralized due to its reliance on institutional data sources.

It’s important to note that Pyth uses a wide variety of data providers, meaning that failure on behalf of any given data provider would have a minimal impact on any price feed. In order for the price feed to be manipulated, a substantial majority of the contributing data providers would need to be wrong. The whitepaper discusses the network’s resiliency against data provider collusion in greater detail.
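
The majority requirement can be illustrated with a plain median, used here as a simplified stand-in for the network's actual aggregation (which the whitepaper describes). The honest prices and the attack price are made up.

```python
from statistics import median


def aggregate_with_attackers(honest: list[float], attackers: int,
                             attack_price: float) -> float:
    """Median when `attackers` publishers all report `attack_price`."""
    return median(honest + [attack_price] * attackers)


# Seven honest publishers clustered around 30,000 (hypothetical values).
honest_prices = [29_990.0, 29_995.0, 30_000.0, 30_005.0,
                 30_010.0, 30_015.0, 30_020.0]

for attackers in (1, 3, 6, 8):
    print(attackers, aggregate_with_attackers(honest_prices, attackers, 1.0))
```

With one, three, or six colluding publishers reporting a price of 1.0, the result stays inside the honest range of 29,990 to 30,020. Only when the attackers outnumber the seven honest publishers (eight in this run) does the aggregate collapse to the attack price.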

While it is a valid observation that Pyth Network relies on “trusted” institutions, Pyth’s approach brings major advantages to DeFi while protecting against oracle manipulation or collusion by data sources. The Pyth contributors continue to advocate for oracle solutions that keep innovating and improving on performance, security, and decentralization. Striking this balance is no easy task, and the contributors hope to continue leading the charge on this initiative.


The Path Forward

Price feed oracles are the backbone of DeFi, responsible for providing accurate and timely data to enable mission-critical applications to transact, secure, and transfer value securely and precisely. Past designs are built on the premise that one can incentivize intermediary nodes to collect and agree upon public information in a trustless manner and submit an aggregated result. This approach has its merits, but also presents tradeoffs such as transmission delays, opaque data sourcing, concerns over distribution rights, and overall, constraints on the oracle network’s ability to scale.

Decentralized finance continues to innovate (even if it takes time for the general public to realize what the industry has cultivated). DeFi infrastructure especially has come a long way. The Pyth Network introduces a faster, more reliable, and more secure method of sourcing financial data that is otherwise unavailable to most blockchain developers. Already, the Pyth Network has experienced substantial growth with:

  • 500+ price feeds available
  • 25M+ price updates per day
  • 300B+ dollars in volumes secured
  • 300+ active data user apps
  • 50+ blockchains supported

Pyth Price Feeds are permissionless. Developers can begin integrating by starting with the docs and exploring case studies such as how Synthetix Perps utilize Pyth prices. Other notable users of Pyth include Ribbon Finance, Venus, and CAP Finance.

As the DeFi ecosystem continues to evolve, the Pyth Network’s role in providing trusted and real-time data becomes increasingly vital for ensuring the security and stability of these networks and scaling the industry as a whole.


We want to hear your feedback. Join the Pyth Discord and Telegram, and follow Pyth on X and LinkedIn. You can also learn more about Pyth here.
