Record Versioning

The people, they want post editing!

AT has always supported record updates: the clearest example is updating bsky profile records. And "strong refs" are a way to detect if a record has changed, by including both an AT URI and the hash (CID) of the exact version being referenced.

But these are different from proper record versioning, which would mean keeping multiple versions of a record around in the repo. This would let you do things like inspect changes or the history of a record.

The Proposal

Add another path segment in both AT URIs and repo record "keys" for version CIDs.

Eg, a full versioned AT URI would look like:

at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.actor.profile/self/bafyreiarlrgo3wgrpetjottkvjepio7nt2x6yc4jtb3f56kif7r4nmm7q4
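To make the proposed syntax concrete, here is a minimal sketch of splitting such a URI into its parts. The function and type names are made up for illustration, and it only handles the simple authority/collection/rkey[/cid] shape shown above (no query strings or fragments).

```python
from typing import NamedTuple, Optional


class VersionedAtUri(NamedTuple):
    authority: str            # DID or handle
    collection: str           # NSID of the record type
    rkey: str                 # record key
    version: Optional[str]    # CID of a specific version, or None


def parse_versioned_at_uri(uri: str) -> VersionedAtUri:
    """Split an AT URI with an optional trailing version segment (hypothetical syntax)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT URI: " + uri)
    parts = uri[len(prefix):].split("/")
    # Require exactly three or four non-empty path segments.
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError("expected authority/collection/rkey[/cid]: " + uri)
    authority, collection, rkey = parts[:3]
    version = parts[3] if len(parts) == 4 else None
    return VersionedAtUri(authority, collection, rkey, version)
```

An unversioned URI parses with version set to None, so existing three-segment URIs would keep working under this sketch.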

In the repo (MST), the current version of the record would still be stored at <collection>/<rkey>, pointing to the current CID. If the record is updated, the old version could be retained under a new repo key like <collection>/<rkey>/<cid>. These would be stored in the MST just like other keys. Fetching a record with a specific version (CID) that happens to be the current primary version would "just work", even though the CID would not be part of the repo path.
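The key layout can be sketched with a flat dictionary standing in for the MST. This is only an illustration: ToyRepo, put, resolve, and the retain_old flag are hypothetical names, and the CIDs in any usage would be placeholders rather than real hashes.

```python
from typing import Dict, Optional


class ToyRepo:
    """Flat key -> CID map standing in for the MST (illustration only)."""

    def __init__(self) -> None:
        self.keys: Dict[str, str] = {}

    def put(self, collection: str, rkey: str, cid: str, retain_old: bool = True) -> None:
        path = f"{collection}/{rkey}"
        old = self.keys.get(path)
        if old is not None and old != cid and retain_old:
            # Keep the superseded version under <collection>/<rkey>/<cid>.
            self.keys[f"{path}/{old}"] = old
        # The unversioned path always points at the current CID.
        self.keys[path] = cid

    def resolve(self, collection: str, rkey: str, version: Optional[str] = None) -> Optional[str]:
        path = f"{collection}/{rkey}"
        current = self.keys.get(path)
        if version is None or version == current:
            # The current version lives at the unversioned path, so asking
            # for it by CID works without a versioned key existing.
            return current
        return self.keys.get(f"{path}/{version}")
```

After two writes to the same record, the map holds one unversioned key for the current CID and one versioned key for the old CID; resolving either CID succeeds, and an unknown CID returns None.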

To make all this work, the com.atproto.repo.* lexicons would need to be updated to work with versioned records. Eg, when updating or deleting a record, there would be a flag to indicate whether the old version should be retained, or fully removed. Maybe a default could be indicated in record lexicon declarations (eg, that records of a given type should be versioned by default, or not). Getters would need support for fetching a specific version, and listers would need the ability to list only current records, or also enumerate all versions. We'd probably also want a method to fetch all versions of a given record.
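As a rough sketch of the lister behavior, assume repo keys are plain strings where a version key has one more path segment than a current-record key. The include_versions parameter and both function names are hypothetical; they are not existing com.atproto.repo.* parameters or methods.

```python
from typing import Dict, List


def list_record_keys(keys: Dict[str, str], collection: str,
                     include_versions: bool = False) -> List[str]:
    """List repo keys in a collection; version keys have one extra path segment."""
    out = []
    for key in sorted(keys):
        parts = key.split("/")
        if parts[0] != collection:
            continue
        # Two segments: current record. Three segments: retained version.
        if len(parts) == 2 or (include_versions and len(parts) == 3):
            out.append(key)
    return out


def list_versions(keys: Dict[str, str], collection: str, rkey: str) -> List[str]:
    """Return every known CID for one record: the current one first, then retained ones."""
    path = f"{collection}/{rkey}"
    current = keys.get(path)
    # Retained CIDs come back in string-sorted order, which says nothing
    # about when each version was written.
    retained = sorted(cid for key, cid in keys.items() if key.startswith(path + "/"))
    return ([current] if current is not None else []) + retained
```

Because version keys share a prefix with their current record, a sorted key listing naturally groups a record with its retained versions.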

Thoughts

Regardless of whether we support record versioning in this way, I think it would be nice to support versioned AT URIs. It would remove the need for "strong refs" as a typed object. Perhaps we would want a new lexicon string format to clarify things: at-uri-versioned or something like that.

This system would still require some work from lexicon designers and app devs. The CIDs are just CIDs, with basically random ordering. If you wanted to track ordering of history or timestamps, those would need to be stored in the records themselves. They would also be user-controlled: a basic AT principle is that users have full control over their repositories, which means they could fake any ordering or timestamps in the data itself. Integrity in the network ends up being enforced by other mechanisms, such as strong refs from other accounts, or services that remember "indexed at" timestamps and flag anomalies.
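One way an app could recover ordering is to have each version point at its predecessor. The sketch below assumes a hypothetical prev field holding the previous version's CID; nothing in the proposal defines such a field, and the repo owner could still rewrite the whole chain.

```python
from typing import Dict, List, Optional


def history_from_prev_links(records: Dict[str, dict], current_cid: str) -> List[str]:
    """Walk hypothetical `prev` fields from the current version back to the first."""
    order: List[str] = []
    seen = set()
    cid: Optional[str] = current_cid
    # Stop at a missing link, an unknown CID, or a repeated CID (bogus data).
    while cid is not None and cid in records and cid not in seen:
        seen.add(cid)
        order.append(cid)
        cid = records[cid].get("prev")
    return list(reversed(order))  # oldest first
```

The input maps CID to record contents. If an older version was not retained, the walk simply stops there and returns the part of the history that is still available.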

It probably is worth considering alternative designs and syntax. Versions could instead be indicated by an incrementing integer, or a timestamp (TID).
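To show why a TID would give versions a natural order where CIDs do not, here is a small encoder. It follows my understanding of the TID layout in the atproto record key spec (a 64-bit value with a zero top bit, 53 bits of microseconds since the UNIX epoch, and a 10-bit clock identifier, written as 13 base32-sortable characters); treat the details as an assumption to check against the spec.

```python
B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"


def encode_tid(timestamp_us: int, clock_id: int) -> str:
    """Encode microseconds-since-epoch plus a 10-bit clock id as a 13-char TID."""
    if not 0 <= timestamp_us < (1 << 53):
        raise ValueError("timestamp must fit in 53 bits")
    if not 0 <= clock_id < (1 << 10):
        raise ValueError("clock id must fit in 10 bits")
    value = (timestamp_us << 10) | clock_id
    # 13 characters of 5 bits each, most significant group first.
    return "".join(B32_SORTABLE[(value >> (5 * i)) & 31] for i in range(12, -1, -1))
```

Because the alphabet is in ascending ASCII order and the width is fixed, comparing two TIDs as strings gives the same result as comparing their timestamps, so version keys built this way would sort chronologically in the MST. The timestamps would still be chosen by the repo owner.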

This proposal would require changes to the repo specification, core XRPC methods, and client SDKs. It would probably also require an increment of the repo version number (indicated in commits), and would take some time. Are there simpler things that could be done sooner? Folks could encode versions in record keys. Eg, app.bsky.actor.profile/self.v3 or app.bsky.actor.profile/self-bafyreiarlrgo3wgrpetjottkvjepio7nt2x6yc4jtb3f56kif7r4nmm7q4. In many cases this would make the records "invalid" from a lexicon validation standpoint (because the record key would not match the declared key format), but that might actually be desirable, to prevent duplicate processing.
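The validation point can be illustrated with a simplified record key check. This sketch covers only the any, tid, and literal:<value> key types as I understand them from the lexicon spec, and assumes the profile record declares its key as literal:self; the function name is made up.

```python
import re

TID_RE = re.compile(r"[234567abcdefghij][234567abcdefghijklmnopqrstuvwxyz]{12}")
ANY_RE = re.compile(r"[a-zA-Z0-9_~.:-]{1,512}")


def rkey_matches(key_type: str, rkey: str) -> bool:
    """Check a record key against a lexicon-declared key type (simplified)."""
    # Every record key must satisfy the general syntax first.
    if not ANY_RE.fullmatch(rkey) or rkey in (".", ".."):
        return False
    if key_type == "any":
        return True
    if key_type == "tid":
        return bool(TID_RE.fullmatch(rkey))
    if key_type.startswith("literal:"):
        return rkey == key_type[len("literal:"):]
    raise ValueError("unsupported key type: " + key_type)
```

Under these assumptions, self passes for a literal:self collection while self.v3 and self-<cid> do not, even though both are syntactically legal record keys. A consumer that validates strictly would skip the version-suffixed records, which is the "prevent duplicate processing" effect described above.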

at:// pizza thoughts
@bnewbold.net
Across the Atmosphere: Discussions
fig (aka:[phil]) @bad-example.com

oh wow so this would be protocol-level support, vs today where you could have multiple records at separate paths to keep multiple versions (whether in an aux lexicon or whatever)

re ordering: multi-record (app-level) versioning would strongly enforce ordering if they link to the prev version

1 reply on Bluesky
Eli Mallon @iame.li

if i follow the proposal correctly, i think your first example is missing an rkey segment in the uri?

1 reply on Bluesky
Breadbrowser @handle.invalid

Bluesky used to have post editing and then they removed it

1 reply on Bluesky
Ryan @snarfed.org

interesting!

first though let me shed a tear for non-null prev. big "we have versioning at home" energy from the old 2023 commit design. real deletes are important, but also, 🫗

1 reply on Bluesky
verdverm @verdverm.com

Two questions

1. version ordering is important, how does that work?
2. thoughts on version info in query parameters? It's a really small change to the at-uri and breaks less code

1 reply on Bluesky
Everett Bogue 🔵 @evbogue.com

"the old version could be retained under a new repo key" why not just append mutations and deletes to the repo and sort out the final view later? this would make notifying the view about cleanups/rerenders easier i imagine.

0 replies on Bluesky
Paul Rohr @pevohr.bsky.social

To be clear, this 🍕 thought is about versioning the contents of a record, correct?

( new versions of a lexicon schema are expressed via a different mechanism )

1 reply on Bluesky
akshay oppiliappan @oppi.li

yesss! this is cool. when PRs are revised on tangled, we update the PR record. however, it's not possible to enumerate all versions of a PR by looking at a CAR export, which may require a rework of the lexicon itself to support multiple revisions.

1 reply on Bluesky
Emelia @thisismissem.social

This seems like a reasonable proposal: bnewbold.leaflet.pub/3m5jsx7qrws2n

I'd say the lexicon should define the default for a given collection (versioned or not) and then just like we have validate today, we'd have a versioned boolean on com.atproto.repo.createRecord

cc @bnewbold.net

1 reply on Bluesky
Laurens @laurenshof.online

Bluesky protocol engineer Bryan Newbold talks about record versioning, and some technical considerations on how this could be implemented.

0 replies on Semble
Roscoe Rubin-Rottenberg @knotbin.com

Found it!

0 replies on Bluesky
bryan newbold (⛱️ sabbatical mode) @bnewbold.net

but for that matter, the current full AT-URI string syntax allows a whole bunch of weird stuff, like at://did:plc:abc123/com.example.record (just an NSID, no record key).

a change like the blob one could happen at the same time as record versioning. I think the IETF process could be an opportunity

0 replies on Bluesky
山貂 @yamarten.bsky.social

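I had assumed this would simply end with the existing hack being promoted into the spec, but maybe the chance that the record history discussion comes into play here isn't zero...?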

0 replies on Bluesky