
Simplify abstract data model to be more concrete #855

Open
msporny opened this issue Jul 1, 2024 · 10 comments
Labels
class 2: Changes that do not functionally affect interpretation of the document
discuss: Needs further discussion before a pull request can be created

Comments

@msporny
Member

msporny commented Jul 1, 2024

It has been suggested that the abstract data model in DID Core creates unnecessary complexity and that a more concrete data model should be selected, based on implementation experience over the past two years. This issue is to track the discussion of how that simplification might occur.

@msporny msporny added the class 2 Changes that do not functionally affect interpretation of the document label Jul 1, 2024
@decentralgabe
Contributor

chair hat off
I am in favor of a concrete data model but I am interested in maintaining compatibility among DID methods that do not currently make use of JSON-LD for extensibility.

DID DHT does not use JSON-LD for extensibility for a few reasons:

  • To save space, since we must stay within 1000 bytes.
  • To make sure all terms are defined, and that all terms have DNS-record mappings, ahead of time.

I believe DID DHT could potentially be adjusted to add processing rules to transform the document to one with a context, and register LD term definitions alongside registered properties. That said, it would be a breaking change.

I am curious how other DID Methods leverage the abstract data model, and it would be good to get a sense of the variety of implementations out there before seeing if it's feasible to define a concrete representation.

Separately, I am not sure this type of change is permitted, as it might fall under the Class 4 definition:

Changes that add new functionality, such as new elements, new APIs, new rules, etc.

I believe this could be considered a "new feature," since it introduces new rules for representing DID Documents.

@peacekeeper
Contributor

I generally agree with the direction of simplifying the specification by removing the abstract data model and replacing it with a concrete one (which can then be converted to different representations like YAML, CBOR, etc.).

@msporny
Member Author

msporny commented Aug 1, 2024

I also agree that it is possible to remove the abstract data model in a way that does not affect existing implementation conformance and that we should make an attempt at doing this. To provide a concrete proposal, this would entail:

  1. Noting that the core data model and serialization for DID Core is JSON-LD.
  2. Enable the ability NOT to specify a context by noting that DID Method specifications MAY provide rules on the proper way to inject a context for their DID Method if one is missing and JSON-LD processing is desired (this enables did:dht to keep doing its thing; the only modification would be specifying how to inject a context if one is desired).
  3. Note that any other serialization is allowed as long as it can be losslessly converted to/from the base data model.

To be clear, if any of the steps above would result in a conforming DID Method becoming non-conformant, we'd clearly have to figure out how to fix the spec text so that doesn't happen. The goal here is to simplify the specification while not invalidating any currently conforming DID Methods.
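A minimal sketch of the context-injection rule in step 2 might look like the following. The context URL is the published DID Core context, but the helper function itself is an illustrative assumption, not proposed spec text:

```python
# Illustrative sketch only: inject a JSON-LD context into a DID document
# that was produced without one (e.g., by a method like did:dht).
# DID_CORE_CONTEXT is the published DID Core context URL; the function
# is a hypothetical helper, not part of any specification.

DID_CORE_CONTEXT = "https://www.w3.org/ns/did/v1"

def inject_context(did_document: dict) -> dict:
    """Return a copy of the document with an @context added if missing,
    enabling JSON-LD processing without changing the stored bytes."""
    if "@context" in did_document:
        return did_document  # already JSON-LD; nothing to do
    doc = dict(did_document)
    doc["@context"] = [DID_CORE_CONTEXT]
    return doc
```

Because injection happens at resolution time, a method's compact storage format never carries the context string itself.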

@msporny
Member Author

msporny commented Aug 1, 2024

@decentralgabe wrote:

To save space, since we must stay within 1000 bytes.

Hmm, the DID Core URL is 28 characters, and a did:dht one would be maybe twice to three times that? Trading 75 characters for no deterministic way to do extensibility doesn't seem like a good trade-off to me.

To make sure all terms are defined, and all terms have DNS-record mappings ahead of time

I don't understand these statements?

IOW, the approach ensures that NO terms are defined (except for maybe in did:dht, and who knows if those definitions are going to conflict with definitions in other DID Methods). It feels like a recipe for guaranteed term conflicts in the future. I also don't understand "all terms have DNS-record mappings ahead of time" -- what does that mean?

To be clear, I think did:dht could continue to do what it's doing after a change that removes the abstract data model, but I am interested in learning more from the above.

@OR13
Contributor

OR13 commented Aug 1, 2024

I fully support a normative requirement that the core data model be JSON-LD only, and eliminating the JSON and abstract data models from the next version of the technical recommendation.

We've seen substantial confusion caused by this, and needless complexity and interoperability problems are created by having an abstract data model that is, for the most part, just RDF... sometimes broken RDF.

I think the W3C VCWG did the right thing by clarifying that W3C VCs are always JSON-LD, and by allowing alternative serializations of digital credentials, such as ISO mDoc, OAuth SD-JWTs, attribute certs, and other formats, to be developed elsewhere.

I would recommend that the DID WG take a similar approach.

Do JSON-LD based DIDs as well as they can be done at W3C.

Do not attempt to define multiple serializations of the data model.

Provide concrete resolution guidance based on the JSON-LD ecosystem, such as document loaders, which can handle either URNs or URLs and are already well supported in JSON-LD tooling.

Address the @vocab issue up front, decide if the core data model is really RDF, and make sure that JSON-LD productions can always be converted to RDF with normative text, if that is a desired property of the core data model.

If people want to do "did like things" in CBOR or YAML, let them do that... but make it clear that DIDs are JSON-LD, just like its now clear that W3C VCs are JSON-LD.

@decentralgabe
Contributor

@msporny it gets into the specifics of how did:dht works and there is more detail here, but the short version is: as a size-saving mechanism, the spec leverages a DID Document -> DNS packet mapping, and the result, after DNS packet compression, is saved on the DHT. We analyzed a number of compression formats (plain bytes, JSON, CBOR, a custom binary serialization, and DNS) and found that DNS best balanced efficiency against reuse of existing software.

Without a known mapping (and reverse mapping) between a property in the DID Document and its packet representation, we cannot effectively store the record on the DHT, so these mappings must be registered in the spec or a well-known registry to reduce inconsistencies across implementations. The spec itself has a registry for this purpose.

Leveraging the existing DID registry is likely the best approach: noting the properties supported by did:dht and linking them to their DID registry references.

This is the approach we've taken so far, but we are open to other alternatives as long as they maintain the goal of saving as many bytes as possible.
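As a rough illustration of the kind of property-to-DNS-record mapping described above, a service entry might round-trip through a TXT-style record as sketched below. The `_sN._did` record name and the `id`/`t`/`se` field keys are assumptions for this sketch, not the actual did:dht registry:

```python
# Illustrative only: flatten a DID Document service entry into a DNS
# TXT-style (name, value) pair and parse it back. The "_sN._did" record
# name and the id/t/se field abbreviations are invented for this sketch;
# did:dht's own registry defines the real mapping.

def service_to_txt(index: int, service: dict) -> tuple[str, str]:
    """Encode a service entry as a compact (record name, TXT value) pair."""
    name = f"_s{index}._did"
    value = ";".join([
        f"id={service['id']}",
        f"t={service['type']}",
        f"se={service['serviceEndpoint']}",
    ])
    return name, value

def txt_to_service(value: str) -> dict:
    """Reverse the mapping: parse the TXT value back into a service entry."""
    fields = dict(pair.split("=", 1) for pair in value.split(";"))
    return {
        "id": fields["id"],
        "type": fields["t"],
        "serviceEndpoint": fields["se"],
    }
```

The point of registering such mappings ahead of time is that both directions must be deterministic: any two implementations must flatten and reconstruct the same document identically.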

@msporny
Member Author

msporny commented Aug 3, 2024

@msporny it gets into the specifics of how did:dht works and there is more detail here, but the short version is: as a size-saving mechanism, the spec leverages a DID Document -> DNS packet mapping, and the result, after DNS packet compression, is saved on the DHT. We analyzed a number of compression formats (plain bytes, JSON, CBOR, a custom binary serialization, and DNS) and found that DNS best balanced efficiency against reuse of existing software.

Ah, I see. I skimmed those sections and haven't tried to put the whole problem in my head to think about it more deeply. My gut reaction is that the "custom Domain-Specific Language for DNS encoding of DID Documents" thing feels a bit fraught, but that's a completely orthogonal issue.

Based on what I saw in the spec, however, it feels like it would be fairly trivial for the DID Resolution process for did:dht to have a step in there where you inject a context value to continue to be conformant with whatever change we make to remove the abstract data model while not requiring those details to be encoded in the DNS Packet mapping. For example, you could key off of the v=M value to figure out what context to inject.

This is the approach we've taken so far, but are open to other alternatives while maintaining the goal of saving as many bytes as possible.

I would imagine that CBOR-LD applied to a did:dht document would result in considerably better compression (though I understand that the community developing did:dht probably has no desire to go that route). It also seems like you're double-base-encoding values if you're using JWK? I don't remember seeing this "DNS" section when I last read about did:dht... I had thought did:dht was a pure mainline DHT implementation with no requirement for DNS records. The section about the DNS records doesn't explain why it's used and/or necessary (but I did skim the document, so I probably missed the justification for the DSL).

In any case, with respect to changes to the abstract data model, I would expect that there wouldn't be an issue for did:dht as it exists today. All you would need to do is add some text to the spec to inject a context when resolving and that will cost you zero bytes to do in the storage format in DNS.

@iherman
Member

iherman commented Aug 4, 2024

Just to make it clear: this comment is with my W3C staff member's hat put down.

TL;DR: my preference is to keep the abstract data model (ADM) as is.

I have several reasons:

  • The problems that led us to introduce the ADM are still valid: communities need to create DID documents that are not in JSON-LD. The creation of the ADM was the result of long, and sometimes acrimonious, discussions. If we roll it back, we may reopen wounds, and we may open the door to formal objections both within and outside the group. This WG has already had more than its share of formal objections😒; I do not think we need new ones…

  • The proposed alternative is, essentially, to do what was done in the VC WG. That was not a walk in the park either; it alienated a lot of people in that WG who decided to walk away. I would prefer not to see the same happening in this WG.

  • I remember that changing the DID Core specification to introduce the ADM was major editorial work. Rolling that back would probably require major document surgery again. This WG has major specification/editorial work ahead for DID Resolution, as well as rethinking the Registry. Also, let us not forget that we have major personnel overlap with the VC WG (including in terms of document editors), and that WG still has a busy few months ahead. All this makes the manpower situation for this Working Group challenging. Let us spend our resources wisely; I do not think this work is the best use of our time and energy.

  • It is debatable whether we are allowed to make such a change in the first place. At the moment, the issue is labeled as a class 2 change, but I am not sure I agree. First of all, as @decentralgabe put it in Simplify abstract data model to be more concrete #855 (comment), we probably do not know how methods out there have made use of the ADM; any change to it might be a breaking change for them (which means class 3). Moreover, while I realize that the definition of class 4 changes (which we are not allowed to make) in the W3C Process Document is fairly terse, and that we might get by with claiming we are making an editorial change only, I think such a change might be o.k. by the letter of the “law”, but not by its spirit. It is a major conceptual change to the specification.

    Bottom line: If we do such a change we may be facing a series of disagreeable questions by the W3C Management as well as the AC members.

  • My last comment/question may be the most important one: Why? What do we want to achieve? Is it really worth getting into possible arguments with the community, the AC, W3C Management, etc.? The only argument I have seen is “simplification”. First of all, it may not be all that bad that we have 100+ methods registered, all of which claim, I presume, to conform to the model. Also, the discussions in the VC group have shown that putting JSON-LD at the center does not make things simple for those who are not experts in JSON-LD, so we may end up trading one type of complication for another. I.e., I cannot really buy that “simplification” argument. Let alone the fact that if we agree to refer to the VC Controller Document (see Normatively reference Controller Document #854), many things will be taken out of the DID Core specification, which, by itself, will make the specification much simpler without any controversy…

@decentralgabe decentralgabe added the discuss Needs further discussion before a pull request can be created label Aug 5, 2024
@jandrieu
Contributor

@wip-abramson Asked me to clarify my take on the abstract data model problem. He had suggested a different issue, but after re-reading this one, I think this is the better place to continue the thread.

I have two main points.

First, the ADM has proven to be a nightmare for interoperability. Instead of a concrete over-the-wire standard, we have a vague, confusing standard that requires dangerous warning signs like "Any interpretations of the specification (or its profiling specifications) that lead to different interpretations of JSON and JSON-LD, is either an implementation bug or a specification bug." The only reason we need this kind of guidance is because we failed to standardize a single over-the-wire representation for DID Documents.

Second, the ADM should never have been in the last specification--and shouldn't be in this one--because there were not multiple implementations that tested it. As an untested feature, it should not have been allowed through to Recommendation. Rather than upset the apple cart at the time, I held my tongue and awaited the inevitable, expected objections from the <1% of the organization that opposes our work.

The fact is, an abstract data model is not testable. It must be made concrete to test it.

There is some legitimate question about whether or not this is allowed under our current charter. Making the ADM testable by requiring a normative serialization would clarify a point of ambiguity in the existing standard, making this a legitimate Class 3 change. It would also let us stop calling it an Abstract Data Model and just call it a data model, with a normative serialization that we can actually test.

It may be worth noting that during the re-chartering process, and in particular, in the resolution of the council in part due to my formal objection, I was assured that, if the WG wanted to clarify the ambiguity of the abstract data model, it falls within our scope.

@decentralgabe
Contributor

@jandrieu thanks for your comments.

First, the ADM has proven to be a nightmare for interoperability

I am curious how this is substantiated. I have generally seen no problems with interop and have myself been part of a few different efforts across companies using various DID methods without issue. That's not to say there aren't issues; I just want to be clear about what they are.

The only reason we need this kind of guidance is because we failed to standardize a single over-the-wire representation for DID Documents.

It is your opinion that this is a failure. I have heard from a number of those involved in the ADM effort that it was a hard-fought compromise—and a success. Many of those involved have since left the group because they feel their work was done. Revising it without their presence is a dangerous practice.

As an untested feature, it should not have been allowed through to Recommendation. Rather than upset the apple cart at the time, I held my tongue to await the inevitable, expected objections from those < 1% of the organization that opposes our work.

The existing specification, as a published Recommendation with a test suite, has demonstrated sufficient testability to meet W3C requirements.

I was assured that, if the WG wanted to clarify the ambiguity of the abstract data model, it falls within our scope.

I would appreciate references here, as I was unaware of this assurance. Regardless, I am less concerned with debating whether this is a Class 3 or Class 4 change, and more concerned with understanding whether this is a worthwhile use of the group's time and effort, especially given the substantial and inevitable knock-on effects.


Unequivocally: I am strongly against opening this can of worms. Our chartered work on DID Resolution provides a more appropriate venue for addressing testability and interoperability concerns.

I will heavily +1 @iherman's comment:

My last comment/question may be the most important one: Why? What do we want to achieve? Is it really worth getting into possible arguments with the community, the AC, W3C Management, etc.? The only argument I have seen is “simplification”. First of all, it may not be all that bad that we have 100+ methods registered, all of which claim, I presume, to conform to the model. Also, the discussions in the VC group have shown that putting JSON-LD at the center does not make things simple for those who are not experts in JSON-LD, so we may end up trading one type of complication for another. I.e., I cannot really buy that “simplification” argument.

I suggest we focus our group's limited resources on our chartered deliverables. We can explore ways to improve interoperability through DID Resolution specification work and some implementation guidance such as the context injection approach @msporny mentioned.

Many have left the W3C work around VCs and DIDs because of our constant infighting. Spending substantial amounts of time and effort arguing about data formats is not fun, nor is it what standards work should be. The effect of this infighting has been to push others out to different standards bodies. This isn't something to celebrate; it fractures our industry and gives our work, and our way of working, an incredibly negative reputation. We have an opportunity to overcome that reputation, but doing so requires closing this issue and moving forward.
