My first thought for persistent IDs was a hash of a semantic object. Such IDs are common in systems such as git, which generates large hashes whose short prefixes are usually unique. These IDs are one-way: they are easy to generate from a semantic representation, but the reverse is not true. We cannot derive the semantics from a git identifier.
Then I started thinking about why git identifiers are so useful, and realized it is because they can be shortened into something a person can hold in their head. One often needs only 5-7 characters to refer uniquely to a single change in a typical git repository.
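To make the abbreviation idea concrete, here is a minimal sketch in Python. It assumes a persistent ID is simply the SHA-1 hex digest of some text form of the semantic object (git itself hashes its own object format, so these are not real git IDs), and the names `persistent_id` and `shortest_unique_prefix` and the sample references are hypothetical, made up for illustration.

```python
import hashlib


def persistent_id(semantic_repr: str) -> str:
    """Hash a semantic representation into a 40-character hex ID.

    Assumption: plain SHA-1 over the UTF-8 text, not git's object format.
    """
    return hashlib.sha1(semantic_repr.encode("utf-8")).hexdigest()


def shortest_unique_prefix(target: str, all_ids: list[str], min_len: int = 4) -> str:
    """Return the shortest prefix of `target` that no other ID shares."""
    others = [i for i in all_ids if i != target]
    for n in range(min_len, len(target) + 1):
        prefix = target[:n]
        if not any(o.startswith(prefix) for o in others):
            return prefix
    # Only reached if another ID begins with the full target string.
    return target


# Hypothetical semantic references, purely for illustration.
refs = ["person/ada-lovelace", "person/alan-turing", "place/bletchley-park"]
ids = [persistent_id(r) for r in refs]
for ref, full in zip(refs, ids):
    print(ref, "->", shortest_unique_prefix(full, ids))
```

With only a handful of objects the prefix stays at the minimum length; it grows slowly as the set of IDs grows, which is why a few characters are usually enough.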
We could certainly emulate git identifiers. They are effectively globally unique, since they are hashes of content. But what I realized on further thought is that their human value lies in abbreviation, and that matters a great deal for transcription. Persistent IDs, like driver's license numbers, need to be transcribable by voice. Abbreviated git identifiers are.
That realization came from tinkering with Base64. I had expected the encoding to be shorter than the semantic reference, but it turned out to be longer. That, in fact, was what clued me in to the importance of transcription. I’ll admit the insight was idiosyncratic, but I think the requirement of transcription is still relevant and critical.
It’s just awful, isn’t it?
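The Base64 surprise is easy to reproduce. Base64 maps every 3 bytes of input to 4 output characters, so the encoded form is always about a third longer than what went in. A small sketch follows; the sample reference string and the function name `base64_length_report` are hypothetical.

```python
import base64


def base64_length_report(semantic_ref: str) -> tuple[int, int]:
    """Return (raw byte length, Base64-encoded length) for a reference."""
    raw = semantic_ref.encode("utf-8")
    encoded = base64.b64encode(raw)
    return len(raw), len(encoded)


# Hypothetical semantic reference, purely for illustration.
ref = "person/ada-lovelace/1815"
raw_len, enc_len = base64_length_report(ref)
print(ref, raw_len)
print(base64.b64encode(ref.encode("utf-8")).decode("ascii"), enc_len)
```

Beyond the extra length, the output mixes upper and lower case with digits, and the standard alphabet also includes `+`, `/`, and `=` padding, none of which is pleasant to read aloud.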