Extended Providers: Updating IPNI without re-advertising
This is the fourth blog post in a series of posts on IPNI. The first post introduced IPNI as a concept, the second dived deeper into how it works internally to enable finding content providers for billions of pieces of addressable content, and the third explained how to become an Index Provider.
Building on the previous articles, this blog post discusses a feature that has recently been added to IPNI: Extended Providers.
How does data get into IPNI?
IPNI builds its index by processing Advertisements. The Advertisement construct allows a Storage Provider to offer their CIDs to IPNI, which makes these CIDs available for fast lookups once the Advertisement has been processed. Apart from CIDs, Advertisements also contain the provider’s peer id, their multiaddresses and a protocol that the data can be fetched over. Anyone who looks up a CID in IPNI can instantly see all the fields required for establishing a connection and downloading the data.
Advertisements are IPLD objects that are linked into a chain. When IPNI sees an Advertisement, it can walk back the chain to the last known entry and index all the new data starting from there.
This workflow is depicted in the diagram below.
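The walk-back described above can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the `Advertisement` type, the in-memory `store` map and the `newAdsSince` helper are hypothetical stand-ins and not the real IPNI types, which link IPLD objects by CID and fetch them over the network.

```go
package main

import "fmt"

// Advertisement is a simplified, hypothetical stand-in for an IPNI advertisement.
type Advertisement struct {
	ID         string   // stand-in for the advertisement CID
	PreviousID string   // link to the previous advertisement; empty at the start of the chain
	Provider   string   // peer id of the provider
	CIDs       []string // stand-in for the advertised entries
}

// newAdsSince walks back from head until it reaches lastKnown (or the start of
// the chain) and returns the advertisements not yet seen, oldest first.
func newAdsSince(store map[string]Advertisement, head, lastKnown string) ([]Advertisement, error) {
	var pending []Advertisement
	for id := head; id != "" && id != lastKnown; {
		ad, ok := store[id]
		if !ok {
			return nil, fmt.Errorf("advertisement %q not found", id)
		}
		pending = append(pending, ad)
		id = ad.PreviousID
	}
	// The walk collected newest first; reverse so that indexing follows chain order.
	for i, j := 0, len(pending)-1; i < j; i, j = i+1, j-1 {
		pending[i], pending[j] = pending[j], pending[i]
	}
	return pending, nil
}

func main() {
	store := map[string]Advertisement{
		"ad1": {ID: "ad1", Provider: "A", CIDs: []string{"cid1"}},
		"ad2": {ID: "ad2", PreviousID: "ad1", Provider: "A", CIDs: []string{"cid2"}},
		"ad3": {ID: "ad3", PreviousID: "ad2", Provider: "A", CIDs: []string{"cid3"}},
	}

	// The indexer has already processed ad1 and now sees ad3 as the new head.
	ads, err := newAdsSince(store, "ad3", "ad1")
	if err != nil {
		panic(err)
	}

	// Index only the new entries: CID -> providers that serve it.
	index := map[string][]string{}
	for _, ad := range ads {
		for _, c := range ad.CIDs {
			index[c] = append(index[c], ad.Provider)
		}
	}
	fmt.Println(len(ads), index)
}
```

Because only the advertisements after the last known entry are visited, the cost of an update is proportional to what is new and not to the size of the whole chain.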
The challenge
Building a full index for a large Storage Provider like web3.storage or nft.storage is hard. It consumes a lot of time and computational resources. For example, at the time of writing, cid.contact - one of a few IPNI deployments - had about 1.3 trillion CIDs indexed. Rebuilding such an index from scratch would take a few weeks and a beefy server running 24/7.
Now what if a Storage Provider wants to scale out horizontally? A typical way would be to add a new node with a new libp2p identity, maybe a different transfer protocol, and start serving the same data from there. But how would IPNI know that all the data of provider “A” is now also available from provider “B”? Re-advertising all CIDs for a new identity would be an extremely inefficient way of achieving that! Extended Providers is the answer.
Extended Providers
The Extended Providers feature allows Storage Providers to add extra information to all their past and future Advertisements or to a single Advertisement with a specific ContextID. More importantly, that can be done by sending just a single Advertisement, without having to re-publish the whole Advertisement chain.
Use cases
- Scale out data retrievals by adding new nodes with their own libp2p identities that all serve the same dataset. This use case was driven by Estuary - a large platform for onboarding data onto Filecoin, which will use Extended Providers to improve its data retrieval capabilities;
- Offer new data transfer protocols on a new set of multiaddresses that, as you might have guessed, serve the same dataset too. This use case was driven by Boost - a replacement for the go-fil-markets package in lotus.
How does it work?
Extended Providers (EPs) is a backward compatible extension to the Advertisement protocol. It is defined as a new ExtendedProvider field that can optionally be added to an Advertisement.
type Advertisement struct {
    ...
    ExtendedProvider optional ExtendedProvider
    ...
}

type ExtendedProvider struct {
    Providers [Provider]
    Override bool
}

type Provider struct {
    ID String
    Addresses [String]
    Metadata optional Bytes
    Signature Bytes
}
Extended Providers can be chain-level or contextual. Chain-level Extended Providers are applied to all past and future Advertisements of the provider. Contextual Extended Providers are applied only to one Advertisement with a specific Context ID.
When encountered, IPNI would interpret the ExtendedProvider field as follows:
- If an Advertisement has no ContextID, those Providers will be considered chain-level. Otherwise they will be considered contextual and returned for that ContextID only. Some additional rules are:
  - If Override is set on an ExtendedProvider entry on an advertisement with a ContextID, it indicates that any specified chain-level set of providers should not be returned for that ContextID. Providers will be returned instead.
  - If Override is not set on an entry for an advertisement with a ContextID, it will be combined as a union with any chain-level ExtendedProviders (Addresses, Metadata).
For the full set of rules please refer to the specification.
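To make these rules more concrete, here is a minimal Go sketch of how an indexer could combine chain-level and contextual Extended Providers at lookup time. It is not the IPNI implementation: `providerState`, `apply` and `resolve` are hypothetical names, signatures are left out, and the union is simplified to de-duplication by peer id. The specification remains the authoritative source.

```go
package main

import "fmt"

// ProviderInfo mirrors the Provider entry of the schema, without the signature.
type ProviderInfo struct {
	ID        string
	Addresses []string
	Metadata  []byte
}

// ExtendedProviders mirrors the ExtendedProvider field of an Advertisement.
type ExtendedProviders struct {
	Providers []ProviderInfo
	Override  bool
}

// providerState is a hypothetical record of what an indexer remembers
// about the Extended Providers of a single publisher.
type providerState struct {
	ChainLevel *ExtendedProviders           // set by an advertisement without a ContextID
	Contextual map[string]ExtendedProviders // keyed by ContextID
}

// apply records the ExtendedProvider field of a processed advertisement.
func (s *providerState) apply(contextID string, xp ExtendedProviders) {
	if contextID == "" {
		s.ChainLevel = &xp
		return
	}
	if s.Contextual == nil {
		s.Contextual = map[string]ExtendedProviders{}
	}
	s.Contextual[contextID] = xp
}

// resolve returns the extended providers to add to a lookup result that
// matched the given ContextID.
func (s *providerState) resolve(contextID string) []ProviderInfo {
	var out []ProviderInfo
	ctxXP, hasCtx := s.Contextual[contextID]
	if hasCtx {
		out = append(out, ctxXP.Providers...)
	}
	// Chain-level providers are skipped only when the contextual entry overrides them.
	if s.ChainLevel != nil && !(hasCtx && ctxXP.Override) {
		seen := map[string]bool{}
		for _, p := range out {
			seen[p.ID] = true
		}
		for _, p := range s.ChainLevel.Providers {
			if !seen[p.ID] { // simplification: the contextual entry wins on duplicate ids
				out = append(out, p)
			}
		}
	}
	return out
}

func main() {
	s := &providerState{}
	s.apply("", ExtendedProviders{Providers: []ProviderInfo{{ID: "B"}}})
	s.apply("ctx1", ExtendedProviders{Providers: []ProviderInfo{{ID: "C"}}})
	s.apply("ctx2", ExtendedProviders{Providers: []ProviderInfo{{ID: "D"}}, Override: true})

	for _, ctxID := range []string{"other", "ctx1", "ctx2"} {
		fmt.Println(ctxID, len(s.resolve(ctxID)))
	}
}
```

In this sketch a lookup under an unrelated ContextID sees only the chain-level provider, "ctx1" sees the union of both sets, and "ctx2" sees only its own provider because Override is set.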
For example, if a provider Max would like to start serving his data from his new peer id 12D3KooWB1b3qZxWJanuhtseF3DmPggHCtG36KZ9ixkqHtdKasdfh over the Bitswap protocol, he could publish the following Advertisement:
{
    Provider: "12D3KooWHHzSeKaY8xuZVzkLbKFfvNgPPeKhFBGrMbNzbfwwkpqu", // Max's original peer id
    Addresses: ["/ip4/224.96.85.246/tcp/1481", "/ip4/23.49.80.75/tcp/3339"], // Max's original multiaddresses
    ContextID: "", // Empty ContextID so that Extended Providers are chain-level
    ExtendedProvider: {
        Providers: [
            {
                ID: "12D3KooWB1b3qZxWJanuhtseF3DmPggHCtG36KZ9ixkqHtdKasdfh", // Max's new peer id that he wants to use for Bitswap transfers
                Addresses: ["/ip4/224.96.85.246/tcp/1481"], // Max's new addresses that he wants to serve Bitswap over
                Metadata: "gBI=" // Metadata that tells that this is a Bitswap protocol
            }
        ]
    }
}
Once that Advertisement has been processed, Max’s new provider info will additionally be returned in all lookups for any of his CIDs. From the API point of view, Extended Provider results are indistinguishable from regular provider records.
Advertisements with Extended Providers also have to be signed in a special way, which is defined by the specification.
Go SDK Example
Support for Extended Providers has been added to the latest version of the index-provider library. Such Advertisements can be constructed using xproviders.AdBuilder and then published using the familiar Engine interface.
// Build and sign an Advertisement that carries Extended Providers.
adv, err := xproviders.NewAdBuilder(providerID, priv, addrs).
	WithContextID(contextID).
	WithMetadata(metadata).
	WithOverride(override).
	WithExtendedProviders(extendedProviders).
	WithLastAdID(lastAdId).
	BuildAndSign()
if err != nil {
	//...
}

// Publish the signed Advertisement through the Engine.
engine.Publish(ctx, *adv)
Resources
If you are interested in participating in IPNI or learning more about it, you may find these resources helpful:
- cid.contact (one of the managed IPNI deployments)
- storetheindex (Filecoin slack channel)
- IPNI implementation
- IPNI Specification