I've got a bunch of protocol pizza thoughts bottled up. Time to get writing!
This one is about some simple additions to the lexicon schema language to make it more useful. These would not impact the data model, just schema validation. I think these are probably uncontroversial and would not be a heavy lift to get in. On the other hand, they aren't aligned with current roadmap priorities (permissioned data and account experience), so I don't know when they would actually get consensus and make their way into specs and reference implementations.
map Schema Definition Type
The lexicon language can currently define objects with specific field names, and has an escape hatch for unknown objects with arbitrary nested fields. It also allows defining arrays with elements of a known type. But there isn't a way to define an object with arbitrary fields but a constrained value type.
Eg, to have internationalized variants of a string value, you could today define an array of objects:
[
{ "lang": "ja", "text": "「こんにちは世界」" },
{ "lang": "pt-BR", "text": "Olá, Mundo!" },
{ "lang": "en", "text": "Hello, World!" }
]
That's a bit messy because you could end up with multiple elements with the same lang field. It would be nice to instead have the lang code be the key:
{
"ja": { "text": "「こんにちは世界」" },
"pt-BR": { "text": "Olá, Mundo!" },
"en": { "text": "Hello, World!" }
}
The values could even be simple strings:
{
"ja": "「こんにちは世界」",
"pt-BR": "Olá, Mundo!",
"en": "Hello, World!"
}
To support this, I propose a new map definition type:
{
"type": "map",
"description": "Internationalized salutations",
"keys": {
"type": "string",
"format": "language"
},
"values": {
"type": "object",
"properties": {
"text": {
"type": "string"
}
}
}
}
Keys would always need to have a string representation, and would usually be type: string with optional constraints (format, size, known values, etc).
Values could be almost any lexicon definition type: object, string, boolean, unknown, union, etc.
The map itself could also have min/max size restrictions, which would apply to the number of fields. There could be a flag to specify whether values are nullable or not.
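A rough Python sketch of how a validator might check a value against the proposed map definition. The minSize, maxSize, and nullable field names are made up for illustration (the proposal doesn't name them), the language regex is a simplification, and only string and object value types are handled:

```python
import re

# Simplified language-tag check, an assumption for illustration only;
# the real "language" string format follows BCP 47 and is more involved.
LANGUAGE_RE = re.compile(r"[a-zA-Z]{2,3}(-[a-zA-Z0-9]{2,8})*")


def validate_map(schema, value):
    """Return a list of error strings for a proposed "map" definition."""
    if not isinstance(value, dict):
        return ["value is not an object"]
    errors = []
    # Hypothetical size limits, applied to the number of fields.
    if "minSize" in schema and len(value) < schema["minSize"]:
        errors.append("too few fields")
    if "maxSize" in schema and len(value) > schema["maxSize"]:
        errors.append("too many fields")
    key_def = schema.get("keys", {"type": "string"})
    val_def = schema["values"]
    for key, item in value.items():
        if key_def.get("format") == "language" and not LANGUAGE_RE.fullmatch(key):
            errors.append(f"key {key!r} is not a language tag")
        if item is None:
            # Hypothetical "nullable" flag on the map definition.
            if not schema.get("nullable", False):
                errors.append(f"value for {key!r} is null")
            continue
        if val_def["type"] == "string" and not isinstance(item, str):
            errors.append(f"value for {key!r} is not a string")
        if val_def["type"] == "object" and not isinstance(item, dict):
            errors.append(f"value for {key!r} is not an object")
    return errors


salutations_def = {
    "type": "map",
    "keys": {"type": "string", "format": "language"},
    "values": {"type": "string"},
    "maxSize": 10,
}

good = {"ja": "「こんにちは世界」", "pt-BR": "Olá, Mundo!", "en": "Hello, World!"}
bad = {"not a language": "Hi", "en": 42}

print(validate_map(salutations_def, good))  # []
print(validate_map(salutations_def, bad))   # one key error, one value error
```

Because the language code is the key, the "two entries with the same lang" problem from the array version can't come up at all.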
at-uri Schema Definition Type
AT URIs (at://) can already be represented in lexicons using the string format: at-uri constraint. But there is a fair amount of optionality in AT URI syntax, and all of these are considered valid:
at://handle.example.com
at://handle.example.com/com.example.blog.profile
at://handle.example.com/com.example.blog.profile/self
at://did:plc:abc123/com.example.blog.profile/self
If we adopt my record versioning proposal, there might also be "strong refs" with the record CID as an extra field:
at://did:plc:abc123/com.example.blog.profile/self/bafyreiarlrgo3wgrpetjottkvjepio7nt2x6yc4jtb3f56kif7r4nmm7q4
The strong norm for AT URIs inside records, referencing other records, is to use a DID in the authority place, and have a full record reference (including collection and rkey). But this is not enforced by lexicon validation! At the same time, in XRPC endpoint parameters, it can be helpful to keep things flexible and allow handles in the authority section (so that the calling client does not need to do handle-to-DID resolution locally).
Sometimes you also only want to allow references to specific record types (eg, a specific collection NSID), or to at least hint which collection types are expected.
To support all this flexibility, I propose a new at-uri lexicon definition type. These would get represented in the data model as strings, and there is a mildly breaking transition path where existing format: at-uri string definitions could be switched to the new at-uri type.
{
"type": "at-uri",
"description": "Reference to parent of a bsky post",
"allowAuthorityHandle": false,
"specificity": "record",
"collections": [
"app.bsky.feed.post"
]
}
I'm not sure if the default should be as flexible as the current string format, or more conservative (require DID in authority, and require collection and rkey).
Having the collection array be fixed and closed might be too brittle: maybe there will be an app.bsky.feed.postV2 in the future, and it would be allowed in this place. Maybe it should be "open" by default, or called knownCollections.
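To make the semantics concrete, here is a minimal Python sketch of validating a string against the proposed at-uri definition. It makes several assumptions: the regexes are loose stand-ins for the real DID, handle, and NSID grammars; the flexible default (handles allowed when allowAuthorityHandle is absent) is just one of the two options discussed above; only the "record" specificity value is handled; and collections is treated as a closed list.

```python
import re

# Loose patterns for illustration; the grammars in the atproto specs are stricter.
DID_RE = re.compile(r"did:[a-z]+:[A-Za-z0-9._:%-]+")
HANDLE_RE = re.compile(r"([A-Za-z0-9-]+\.)+[A-Za-z][A-Za-z0-9-]*")
NSID_RE = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+){2,}")


def validate_at_uri(schema, uri):
    """Return a list of error strings for a proposed "at-uri" definition."""
    if not uri.startswith("at://"):
        return ["missing at:// scheme"]
    # Extra segments (such as a CID "strong ref") are ignored in this sketch.
    parts = uri[len("at://"):].split("/")
    authority = parts[0]
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    errors = []
    if DID_RE.fullmatch(authority):
        pass
    elif HANDLE_RE.fullmatch(authority):
        # Assumed default: handles are allowed unless the flag says otherwise.
        if not schema.get("allowAuthorityHandle", True):
            errors.append("handle not allowed in authority")
    else:
        errors.append("authority is neither a DID nor a handle")
    if collection is not None and not NSID_RE.fullmatch(collection):
        errors.append("collection is not an NSID")
    if schema.get("specificity") == "record" and (not collection or not rkey):
        errors.append("expected a full record reference")
    allowed = schema.get("collections")
    if allowed and collection is not None and collection not in allowed:
        errors.append("collection not allowed")
    return errors


parent_def = {
    "type": "at-uri",
    "allowAuthorityHandle": False,
    "specificity": "record",
    "collections": ["app.bsky.feed.post"],
}

# The record key in the first URI is a made-up placeholder.
print(validate_at_uri(parent_def, "at://did:plc:abc123/app.bsky.feed.post/3jui7kd54zh2y"))
print(validate_at_uri(parent_def, "at://handle.example.com/com.example.blog.profile/self"))
print(validate_at_uri(parent_def, "at://did:plc:abc123"))
```

Switching to an "open" knownCollections hint would just mean dropping the last check, or downgrading it to a warning.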
More Record Key Types
Record keys can currently have the following format types:
tid
nsid
literal:<value>
any
We could extend that with some other string formats:
did
cid
language
Would be good to double-check that the record key generic syntax is compatible with all these first.
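A quick way to run that check, sketched in Python. The regex reflects my understanding of the generic record key rules (1 to 512 characters drawn from A-Z, a-z, 0-9 and . - _ : ~, excluding the values "." and ".."), and should be verified against the spec; the sample values are illustrative.

```python
import re

# Generic record key syntax as understood here; verify against the spec.
RKEY_RE = re.compile(r"[A-Za-z0-9._:~-]{1,512}")


def is_valid_rkey(value):
    """True if `value` fits the generic record key syntax."""
    return value not in (".", "..") and RKEY_RE.fullmatch(value) is not None


candidates = {
    "did:plc": "did:plc:abc123",
    # Generic DID syntax permits percent-encoding, eg a did:web with a port.
    "did:web with port": "did:web:example.com%3A3000",
    "cid": "bafyreiarlrgo3wgrpetjottkvjepio7nt2x6yc4jtb3f56kif7r4nmm7q4",
    "language": "pt-BR",
}

for label, value in candidates.items():
    print(label, is_valid_rkey(value))
```

Under these assumptions the only mismatch is the percent sign: DIDs that contain "%" would not fit as record keys unless either the did key type excludes them or the record key syntax is widened.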
The motivation is to allow more flexibility in the design of record key-spaces. For example, if "follow" graph relationships had a did record key format, and there was a requirement that the subject DID match the record key, then the "no double follow" constraint would be much easier to enforce: a repo can only hold one record per key, so a second follow of the same account would replace the first instead of duplicating it.
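The follow example can be sketched with a toy in-memory repo in Python. Everything here is hypothetical: the com.example.graph.follow collection, the put_record helper, and the record fields exist only to show why keying by subject DID rules out duplicates.

```python
COLLECTION = "com.example.graph.follow"  # hypothetical collection NSID


def check_follow(rkey, record):
    """Proposed rule: the record key must equal the subject DID."""
    return record.get("subject") == rkey


def put_record(repo, collection, rkey, record):
    """Toy repo: a dict keyed by (collection, rkey), so a repeat write replaces."""
    if not check_follow(rkey, record):
        raise ValueError("record key does not match subject DID")
    repo[(collection, rkey)] = record


repo = {}
subject = "did:plc:abc123"

# Following the same account twice lands on the same key both times.
put_record(repo, COLLECTION, subject, {"subject": subject, "createdAt": "2024-01-01T00:00:00Z"})
put_record(repo, COLLECTION, subject, {"subject": subject, "createdAt": "2024-02-01T00:00:00Z"})

print(len(repo))  # 1
```

With tid record keys, the same two writes would produce two separate records, and de-duplication would fall to every consumer of the data.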