Expanding the Lexicon Language

Proposes some small additions to lexicon schema language.

[Image: pizza with apple slices cooking in an oven]

I've got a bunch of protocol pizza thoughts bottled up. Time to get writing!

This one is about some simple additions to the lexicon schema language to make it more useful. These would not impact the data model, just schema validation. I think these are probably uncontroversial and would not be a heavy lift to get in. On the other hand, they aren't aligned with current roadmap priorities (permissioned data and account experience), so I don't know when these would actually get consensus and make their way into specs and reference implementations.

map Schema Definition Type

The lexicon language can currently define objects with specific field names, and has an escape hatch for unknown objects with arbitrary nested fields. It also allows defining arrays with elements of a known type. But there isn't a way to define an object with arbitrary fields but a constrained value type.

Eg, to have internationalized variants of a string value, you could today define an array of objects:

[
  { "lang": "ja", "text": "「こんにちは世界」" },
  { "lang": "pt-BR", "text": "Olá, Mundo!" },
  { "lang": "en", "text": "Hello, World!" }
]

That's a bit messy because you could end up with multiple elements with the same lang field. It would be nice to instead have the lang code be the key:

{
  "ja": { "text": "「こんにちは世界」" },
  "pt-BR": { "text": "Olá, Mundo!" },
  "en": { "text": "Hello, World!" }
}

The value could even be simple strings:

{
  "ja": "「こんにちは世界」",
  "pt-BR": "Olá, Mundo!",
  "en": "Hello, World!"
}
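To make the difference concrete, here is a small Python sketch (Python is just an illustration language; `array_to_map` is a hypothetical helper, not part of any SDK). It collapses the array form into the keyed form and shows that the duplicate-lang case has to be handled explicitly when using arrays, while the map form rules it out by construction.

```python
def array_to_map(entries):
    """Collapse [{"lang": ..., "text": ...}, ...] into {lang: text}.

    Raises ValueError on a repeated lang: the array form allows that
    ambiguity, while the map form cannot express it at all.
    """
    result = {}
    for entry in entries:
        lang = entry["lang"]
        if lang in result:
            raise ValueError(f"duplicate lang: {lang}")
        result[lang] = entry["text"]
    return result


salutations = [
    {"lang": "ja", "text": "「こんにちは世界」"},
    {"lang": "pt-BR", "text": "Olá, Mundo!"},
    {"lang": "en", "text": "Hello, World!"},
]
by_lang = array_to_map(salutations)
```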

To support this, I propose a new map definition type:

{
  "type": "map",
  "description": "Internationalized salutations",
  "keys": {
    "type": "string",
    "format": "language"
  },
  "values": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string"
      }
    }
  }
}

Keys would always need to have a string representation, and would usually be type: string with optional constraints (format, size, known values, etc).

Values could be almost any lexicon definition type: object, string, boolean, unknown, union, etc.

The map itself could also have min/max size restrictions, which would apply to the number of fields. There could be a flag to specify whether values are nullable or not.
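A rough sketch of what validating against such a definition might look like, in Python. Everything here is an assumption for illustration: the `minSize`, `maxSize`, and `nullableValues` field names are made up (the proposal does not name them), the language check is a simplified pattern rather than a real BCP-47 parser, and only string values are handled.

```python
import re

# Simplified language-tag pattern (assumption): a 2-3 letter primary subtag
# followed by optional alphanumeric subtags. Not a full RFC 5646 parser.
LANGUAGE_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")

# Hypothetical map definition with simple string values.
SALUTATIONS_SCHEMA = {
    "type": "map",
    "keys": {"type": "string", "format": "language"},
    "values": {"type": "string"},
    "maxSize": 3,  # hypothetical field name
}


def validate_map(schema, data):
    """Return a list of error strings; an empty list means valid."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = []
    # Size limits count the number of fields in the map.
    if "minSize" in schema and len(data) < schema["minSize"]:
        errors.append("too few fields")
    if "maxSize" in schema and len(data) > schema["maxSize"]:
        errors.append("too many fields")
    key_format = schema.get("keys", {}).get("format")
    value_type = schema.get("values", {}).get("type")
    for key, value in data.items():
        if key_format == "language" and not LANGUAGE_RE.fullmatch(key):
            errors.append(f"bad language key: {key!r}")
        if value is None:
            # Hypothetical flag; null values are rejected unless it is set.
            if not schema.get("nullableValues", False):
                errors.append(f"null value for key: {key!r}")
            continue
        if value_type == "string" and not isinstance(value, str):
            errors.append(f"non-string value for key: {key!r}")
    return errors
```

For example, `validate_map(SALUTATIONS_SCHEMA, {"en": "Hello, World!"})` returns an empty list, while a key like `"not a lang"` produces an error.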

at-uri Schema Definition Type

AT URIs (at://) can already be represented in lexicons using the string format: at-uri constraint. But there is a fair amount of optionality in AT URI syntax, and all of these are considered valid:

at://handle.example.com
at://handle.example.com/com.example.blog.profile
at://handle.example.com/com.example.blog.profile/self
at://did:plc:abc123/com.example.blog.profile/self
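To show how much is optional, here is a minimal Python sketch that splits an AT URI into its parts. `parse_at_uri` is a hypothetical helper: it ignores query and fragment parts and does not validate the handle, DID, NSID, or record key syntax, so treat it as an illustration rather than a spec-complete parser.

```python
def parse_at_uri(uri):
    """Split an AT URI into authority, collection, and rkey (illustrative only)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("missing at:// scheme")
    parts = uri[len(prefix):].split("/")
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError("unexpected path structure")
    # Pad with None so that missing collection/rkey segments are explicit.
    authority, collection, rkey = (parts + [None, None])[:3]
    return {
        "authority": authority,
        # DIDs start with "did:"; anything else is treated as a handle here.
        "authority_kind": "did" if authority.startswith("did:") else "handle",
        "collection": collection,
        "rkey": rkey,
    }
```

All four URIs above parse successfully with this helper; they differ only in the authority kind and in which of `collection` and `rkey` come back as `None`.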

If we adopt my record versioning proposal, there might also be "strong refs" with the record CID as an extra path segment:

at://did:plc:abc123/com.example.blog.profile/self/bafyreiarlrgo3wgrpetjottkvjepio7nt2x6yc4jtb3f56kif7r4nmm7q4

The strong norm for AT URIs inside records, referencing other records, is to use a DID in the authority place, and have a full record reference (including collection and rkey). But this is not enforced by lexicon validation! At the same time, in XRPC endpoint parameters, it can be helpful to keep things flexible and allow handles in the authority section (so that the calling client does not need to do handle-to-DID resolution locally).

Sometimes you also only want to allow references to specific record types (eg, a specific collection NSID), or to at least hint which collection types are expected.

To support all this flexibility, I propose a new at-uri lexicon definition type. These would get represented in the data model as strings, and there is a minorly-breaking transition path where existing string definitions with format: at-uri could be switched to type: at-uri.

{
  "type": "at-uri",
  "description": "Reference to parent of a bsky post",
  "allowAuthorityHandle": false,
  "specificity": "record",
  "collections": [
    "app.bsky.feed.post"
  ]
}

I'm not sure if the default should be as flexible as the current string format, or more conservative (require DID in authority, and require collection and rkey).

Having the collection array be fixed and closed might be too brittle: maybe there will be an app.bsky.feed.postV2 in the future, and it should be allowed in this place. Maybe it should be "open" by default, or called knownCollections.
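A sketch of how a validator might apply these constraints, in Python. The assumptions are loud ones: `check_at_uri` is hypothetical, it defaults `allowAuthorityHandle` to true (one of the two default options discussed above, picked arbitrarily), it only understands the `"record"` specificity value from the example, and it treats `collections` as a closed list.

```python
def check_at_uri(schema, uri):
    """Return a list of error strings for an at-uri definition (illustrative)."""
    if not uri.startswith("at://"):
        return ["not an AT URI"]
    parts = uri[len("at://"):].split("/")
    authority = parts[0]
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None

    errors = []
    # Assumption: handles are allowed unless the schema says otherwise.
    if not authority.startswith("did:") and not schema.get("allowAuthorityHandle", True):
        errors.append("authority must be a DID")
    # Only the "record" specificity from the example is handled here.
    if schema.get("specificity") == "record" and not (collection and rkey):
        errors.append("must reference a full record")
    # Assumption: the collections list is closed, not a hint.
    allowed = schema.get("collections")
    if allowed is not None and collection is not None and collection not in allowed:
        errors.append(f"collection not allowed: {collection}")
    return errors


PARENT_REF_SCHEMA = {
    "type": "at-uri",
    "allowAuthorityHandle": False,
    "specificity": "record",
    "collections": ["app.bsky.feed.post"],
}
```

With an "open" or knownCollections reading, the last check would become a warning or a hint instead of an error.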

More Record Key Types

Record keys can currently have the following format types:

  • tid
  • nsid
  • literal:<value>
  • any

We could extend that with some other string formats:

  • did
  • cid
  • language

Would be good to double-check that the record key generic syntax is compatible with all these first.

The motivation is to allow more flexibility in the design of record key-spaces. For example, if "follow" graph relationships used the did record key format, and there was a requirement to have the subject DID match the record key, then the "double follow" constraint would be much easier to enforce.
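A small Python sketch of both points. The record key pattern reflects my reading of the current generic syntax (1 to 512 characters from `A-Za-z0-9` plus `.`, `-`, `_`, `:`, `~`, excluding `.` and `..`) and should be checked against the spec. `put_follow` and the dict-backed `repo` are hypothetical stand-ins for a real repository.

```python
import re

# Generic record key syntax as I understand it (assumption, verify against spec).
RKEY_RE = re.compile(r"[A-Za-z0-9._:~-]{1,512}")


def is_valid_rkey(value):
    """Check a candidate record key against the generic syntax."""
    return value not in (".", "..") and RKEY_RE.fullmatch(value) is not None


def put_follow(repo, subject_did, record):
    """Store a follow record keyed by the subject DID.

    Returns True if this is a new follow, False if it overwrote an
    existing one. Keying by DID makes a "double follow" impossible.
    """
    if not is_valid_rkey(subject_did):
        raise ValueError("DID is not usable as a record key")
    if record["subject"] != subject_did:
        raise ValueError("subject must match the record key")
    is_new = subject_did not in repo
    repo[subject_did] = record
    return is_new
```

Under this pattern `did:plc:abc123` works as a record key, but a DID containing percent-encoding (such as `did:web:example.com%3A3000`) does not, which is the kind of compatibility question worth settling first.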

at:// pizza thoughts
@bnewbold.net
Across the Atmosphere: Discussions
Ryan
Ryan@snarfed.org

Interesting!

Do you have real world example use cases for maps, where the fields are arbitrary but the values are all the same type? (And maybe have additional consistent constraints?)

2 replies on Bluesky
Bumblefudge
Bumblefudge@bumblefudge.com

Sure but also floats 😏

1 reply on Bluesky
Nelind
Nelind@nel.pet

oh gosh yes please! quite literally all of these are things ive been wanting and have been meaning to make proposals for

0 replies on Bluesky
Orual
Orual@nonbinary.computer

omg yes plz these are very nice changes.

0 replies on Bluesky
山貂
山貂@yamarten.bsky.social
1 reply on Bluesky
Paul Rohr
Paul Rohr@pevohr.bsky.social

language --> lang

to match familiar usage in HTML, unless you're planning to support values which violate RFC 5646

(Presumably you'd support a sane subset of 5646, but not require 4647's entire matching mechanism)

0 replies on Bluesky
Aram Zucker-Scharff
Aram Zucker-Scharff@chronotope.aramzs.xyz

👍

0 replies on Bluesky
manoo
manoo@manoo.dev

yeah these would be great additions

0 replies on Bluesky
fig (aka:[phil])
fig (aka:[phil])@bad-example.com

oof map type is going to complicate lexicon-unaware canonical-link-source defining, or i guess just make it a small step worse

(i accept that people want this instead of arrays but)

0 replies on Bluesky
Matthieu 🦋
Matthieu 🦋@matthieu.bsky.team

I'd rather allow { type: string, format: at-uri } to allow format specific options than introducing a new type. First because it will lead to better compatibility. Second because it paves a better future if other format also need specific options (e.g dates beyond year 9999, cid version, etc.)

0 replies on Bluesky
Matthieu 🦋
Matthieu 🦋@matthieu.bsky.team

Semantically, I personally feel that a query param (<at-uri>?cid=<cid>) would be better suited to describe a record at a particular version, rather than a sub path. First because the version is not really a sub resource of the record. Second because this locks this syntax for any future use.

2 replies on Bluesky
Tim Burks
Tim Burks@timburks.me

I was just looking for a map type to represent localized strings

0 replies on Bluesky
山貂
山貂@yamarten.bsky.social
1 reply on Bluesky
Faine Greenwood
Faine Greenwood@faineg.bsky.social

pizza thoughts is a pretty charming way to describe these snippets I must say

0 replies on Bluesky
Matthieu 🦋
Matthieu 🦋@matthieu.bsky.team

pie(zza) thought

0 replies on Bluesky
Eli Mallon
Eli Mallon@iame.li

@bnewbold.net disagrees with you 🤷‍♂️ bnewbold.leaflet.pub/3mimokt4o5c2f

I've also defended tid-based rather than did-based follow rkeys ("it makes the merkle tree structure better or whatever!") but maybe having primary keys with uniqueness constraints is good actually

1 reply on Bluesky