Lev Walkin → ASN.1 Exposed → Question: Extensibility and dropping fields

Question: Extensibility and dropping fields

Date: 2010-September-21 @ 16:42
Tags: extensibility, support

Today's email was asking me about an interesting case of ASN.1 extensibility. Usually, extensibility means adding new members to existing structures. In this case, the user was asking whether ASN.1 supports removing structure members.

Hello, Mr. Walkin.
I am using the ASN.1 compiler for an application I am developing. I am using it to store a data structure in a file, for persistence. I am trying to find out whether forward compatibility is supported. By this I mean the following.
In the version 1 I have the following structure:
Struct a {
	Int x;
	Int y;
	Int z;
}
I store the data x = 111; y = 222 and z = 333;
Now I upgrade the software to the version 2 where in the new struct is:
Struct a {
	Int x;
	// int y is removed
	Int z;
}
Now when I read the existing file (generated with the older version of ASN.1 specification) with the newer version of the software, I get the following data back: x = 111, and z = 222.
Can you please tell me whether it is possible to get x = 111 and z = 333. If yes, then what ASN syntax should I follow? I am using der_encode and ber_decode.

ASN.1 has fair support for backward-compatibility and forward-compatibility.

If you change the ASN.1 type definition in the ASN.1-prescribed way, you end up with the new versions of the software compatible with whatever was generated by the old versions. That's backward-compatibility.

However, here's a catch: the ASN.1 type must be marked as extensible in the first place. If it isn't, that would mean that future extensions are prohibited, and essentially the structure is fixed, or sealed. Simply put, there is no backward-compatibility for you if you change a non-extensible structure.

Also, if the ASN.1 type is not marked as extensible, the decoder of the older version of the software will not be prepared to recognize future extensions, possibly returning an error during decoding. Here dies your forward-compatibility as well. (The forward-compatibility for non-extensible structures is theoretically possible for BER and XER, but it really depends on a particular ASN.1 compiler's implementation. It is not possible for PER.)

The ASN.1 type may be explicitly marked as extensible by inserting an extension marker, ..., as shown:

Non-extensible type Struct:

TestModule DEFINITIONS ::= BEGIN

Struct ::= SEQUENCE {
	x	INTEGER,
	y	INTEGER,
	z	INTEGER
}


END

Extensible type Struct:

TestModule DEFINITIONS ::= BEGIN

Struct ::= SEQUENCE {
	x	INTEGER,
	y	INTEGER,
	z	INTEGER,
	...	-- Extension marker
}

END

Also, there's a standard way to specify that the compiler should treat all structures in a given module as extensible, even if they don't contain an explicit extension marker. This is enabled by the EXTENSIBILITY IMPLIED module option:

Extensible type Struct:

TestModule DEFINITIONS EXTENSIBILITY IMPLIED ::= BEGIN

Struct ::= SEQUENCE {
	x	INTEGER,
	y	INTEGER,
	z	INTEGER -- Extension marker is implied after `z`
}

END

If the ASN.1 type is not marked as extensible one way or another, there is no guarantee that the older software will be able to cope with files containing future extensions. This is a compiler-dependent thing, and I urge you not to count on it.

So, my advice is to consider enabling EXTENSIBILITY IMPLIED, or using an explicit extension marker (...) in the structures you have the faintest chance of extending in the future. That is the first step to ensuring both backward-compatibility and forward-compatibility.

Now, let's get back to the case described in the email. Suppose we have created an extensible ASN.1 type Struct, as shown:

Struct ::= SEQUENCE {
    x INTEGER,
    y INTEGER,
    z INTEGER,
    ...
}

Would it work if we removed a field from the “extension root” (the set of Struct members above the ellipsis mark)? The ASN.1 standard says “no”: you can only change members that are part of “extension additions” (the members after the ellipsis mark).

The way it works in BER, the x, y and z are encoded using the same tag (a tag of an INTEGER type) and a corresponding value (111, 222, 333). It can be schematically described as I<111>I<222>I<333>, where I is a specific tag denoting the INTEGER value. You see, there is no way to determine what's x, y and z, other than by looking at the integer value's position in the encoding.

I<111>	I<222>	I<333>
x	y	z

If you remove y from the subsequent Struct type definition, the new decoder will have no way of knowing that the value 222 (of I<111>I<222>I<333>) still corresponds to the removed y rather than to z. It looks just like the second INTEGER in the sequence, and in the new definition the second INTEGER in the sequence is z...

I<111>	I<222>	I<333>
x	z	[ignored]

This is why you were seeing z = 222 instead of z = 333 in your example.
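The positional mismatch can be reproduced without any ASN.1 compiler. What follows is a minimal sketch, not asn1c's actual decoder: the helper names (put_integer, get_integer, encode_old, decode_new_positional) are made up for illustration, the outer SEQUENCE header is omitted, and only small non-negative values are supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one INTEGER TLV (tag 0x02) for 0 <= v < 32768. */
static size_t put_integer(uint8_t *out, long v) {
    size_t n = 0;
    assert(v >= 0 && v < 32768);
    out[n++] = 0x02;                    /* universal INTEGER tag */
    if (v < 128) {
        out[n++] = 1;                   /* length: one content byte */
        out[n++] = (uint8_t)v;
    } else {
        out[n++] = 2;                   /* length: two content bytes */
        out[n++] = (uint8_t)(v >> 8);
        out[n++] = (uint8_t)(v & 0xff);
    }
    return n;
}

/* Hypothetical helper: read one INTEGER TLV.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t get_integer(const uint8_t *in, size_t size, long *v) {
    if (size < 2 || in[0] != 0x02) return 0;
    size_t len = in[1];
    if (len < 1 || len > 2 || size - 2 < len) return 0;
    if (in[2] & 0x80) return 0;         /* negative values not handled here */
    *v = 0;
    for (size_t i = 0; i < len; i++) *v = (*v << 8) | in[2 + i];
    return 2 + len;
}

/* Version 1 content: x, y, z as three consecutive INTEGER TLVs. */
static size_t encode_old(uint8_t *out, long x, long y, long z) {
    size_t n = put_integer(out, x);
    n += put_integer(out + n, y);
    n += put_integer(out + n, z);
    return n;
}

struct new_struct { long x, z; };       /* version 2: y was removed */

/* Positional decoding: the first INTEGER becomes x, the second becomes z.
 * Anything after the second INTEGER is ignored. Returns 0 on success. */
static int decode_new_positional(const uint8_t *in, size_t size,
                                 struct new_struct *s) {
    size_t n = get_integer(in, size, &s->x);
    if (n == 0) return -1;
    if (get_integer(in + n, size - n, &s->z) == 0) return -1;
    return 0;
}
```

Encoding 111, 222, 333 with encode_old and feeding the bytes to decode_new_positional leaves z holding 222, because nothing in the byte stream distinguishes the second INTEGER from the third.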

So, what is the proper way to produce a structure which you can subsequently delete members from?

The general approach is to make sure that the members are encoded using distinct tags. (If they are, there's even a possibility to recover from not describing the structure properly in the first place, but let's not go down this road yet.) This way, the decoder will be able to associate the encoded data with a particular member in the ASN.1 structure.

One way of tagging members is to specify tags directly:

Struct ::= SEQUENCE {
    x [0] INTEGER,
    y [1] INTEGER,
    z [2] INTEGER,
    ...
}

This will bloat the BER encoding a little bit, but at least the values can be associated with the structure members using the member tags, rather than making the decoder infer the correct field by looking at the value's position in the stream. Here's how it would look to the decoder:

[0]I<111>

[1]I<222>

[2]I<333>

However, this new Struct type definition is not ready for dropping the members yet.

Struct ::= SEQUENCE {
    x [0] INTEGER,
    z [2] INTEGER,
    ...
}

If you drop the member y now and try to decode the old file, the new BER decoder might balk when it sees the unexpected tag [1] (and a value 222) in the middle of decoding of an “extension root”, where it expects tags [0] immediately followed by [2].

In short, ASN.1 expects all members of the “extension root” to be present in the blob being decoded before extensions start, and if you want to treat the superfluous value of the deceased y as an unused extension, it should have appeared after the ellipsis in the first place.

This suggests that we should put y after the ellipsis in the first version of the structure. Actually, you should put everything you expect to subsequently drop after the ellipsis! If you can't predict what you are going to drop, you can put everything after the ellipsis:

Struct ::= SEQUENCE {
    ...,
    x [0] INTEGER,
    y [1] INTEGER,
    z [2] INTEGER
}

However, there lies one last problem. The ASN.1 specification is a bit hard to parse in that respect, so I'll just describe what happens in asn1c: asn1c's BER decoder is constructed in a way which expects extensions of a SEQUENCE to appear in the order of definition. Thus, removing y will cause tag [1] to break BER decoding of a new type which expects [2] instead. One way to solve that would be to add an OPTIONAL marker to each extension member, but a better way is to just switch the semantic gear and replace the SEQUENCE with a SET altogether. The order of elements in a SET is not important. Consequently, a BER decoder is prepared to handle them in any order, sacrificing a bit of SEQUENCE's efficiency. Let's use that.
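To make the SET idea concrete, here is a rough sketch of tag-driven decoding. It is an illustration under stated assumptions rather than asn1c's implementation: the names get_value and decode_new_by_tag are hypothetical, the members are assumed to carry the primitive context tags [0] and [2] (bytes 0x80 and 0x82), the outer SET header is omitted, and only short-form lengths and small non-negative values are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct new_struct {
    long x, z;
    int has_x, has_z;                   /* set to 1 once the member is seen */
};

/* Hypothetical helper: big-endian value of a short, non-negative
 * INTEGER body. Returns 0 on success, -1 otherwise. */
static int get_value(const uint8_t *p, size_t len, long *v) {
    if (len < 1 || len > 4 || (p[0] & 0x80)) return -1;
    *v = 0;
    for (size_t i = 0; i < len; i++) *v = (*v << 8) | p[i];
    return 0;
}

/* Walk the TLVs in whatever order they arrive and dispatch on the tag.
 * Tags the new definition does not know (such as 0x81, the old y) are
 * skipped as unused extensions. Returns 0 on success, -1 on bad input. */
static int decode_new_by_tag(const uint8_t *in, size_t size,
                             struct new_struct *s) {
    assert(in != NULL && s != NULL);
    s->has_x = s->has_z = 0;
    while (size > 0) {
        if (size < 2) return -1;
        uint8_t tag = in[0];
        size_t len = in[1];
        if (len > 127 || size - 2 < len) return -1;  /* short form only */
        switch (tag) {
        case 0x80:                      /* [0] -> x */
            if (get_value(in + 2, len, &s->x) != 0) return -1;
            s->has_x = 1;
            break;
        case 0x82:                      /* [2] -> z */
            if (get_value(in + 2, len, &s->z) != 0) return -1;
            s->has_z = 1;
            break;
        default:                        /* unknown tag: skip its content */
            break;
        }
        in += 2 + len;
        size -= 2 + len;
    }
    return 0;
}
```

Because the loop never relies on position, the same function accepts the three old members in any order and still recovers x and z, which is the property the switch from SEQUENCE to SET is meant to buy.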

Here's the final definition you should be using to be able to subsequently drop any field you like. It also includes an anti-bloat magic (IMPLICIT), to keep the encoded size down.

Struct ::= SET {
    ...,
    x [0] IMPLICIT INTEGER,
    y [1] IMPLICIT INTEGER,
    z [2] IMPLICIT INTEGER
}

In the final definition, we use SET instead of SEQUENCE, place the extension marker at the beginning of the type, use manual tagging to make elements distinct (you won't be able to get it past compiler checks without it, anyway), and mark tags IMPLICIT to save a couple of bytes per field.
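The "couple of bytes" can be counted by hand. The sketch below builds both encodings of a context-tagged INTEGER under BER's short-form rules; put_tlv, put_explicit and put_implicit are hypothetical helper names, and only tag numbers below 31 and values from 0 to 32767 are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a primitive TLV with the given tag byte
 * holding a small non-negative integer (0 <= v < 32768). */
static size_t put_tlv(uint8_t *out, uint8_t tag, long v) {
    size_t n = 0;
    assert(v >= 0 && v < 32768);
    out[n++] = tag;
    if (v < 128) {
        out[n++] = 1;
        out[n++] = (uint8_t)v;
    } else {
        out[n++] = 2;
        out[n++] = (uint8_t)(v >> 8);
        out[n++] = (uint8_t)(v & 0xff);
    }
    return n;
}

/* [n] EXPLICIT INTEGER: a constructed context tag (0xA0 | n) wraps the
 * complete universal INTEGER TLV, costing two extra bytes. */
static size_t put_explicit(uint8_t *out, unsigned n, long v) {
    assert(n < 31);
    size_t inner = put_tlv(out + 2, 0x02, v);
    out[0] = (uint8_t)(0xA0 | n);
    out[1] = (uint8_t)inner;
    return 2 + inner;
}

/* [n] IMPLICIT INTEGER: a primitive context tag (0x80 | n) replaces the
 * universal INTEGER tag, so no wrapper is needed. */
static size_t put_implicit(uint8_t *out, unsigned n, long v) {
    assert(n < 31);
    return put_tlv(out, (uint8_t)(0x80 | n), v);
}
```

For x = 111 under tag [0], the explicit form is A0 03 02 01 6F (five bytes) and the implicit form is 80 01 6F (three bytes).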

Testing the suggestion

The module used in the test follows. Compile it with asn1c -fnative-types.

Test DEFINITIONS IMPLICIT TAGS ::= BEGIN

OldStruct ::= SET {
	...,
	x [0] INTEGER,
	y [1] INTEGER,
	z [2] INTEGER
}

NewStruct ::= SET {
	...,
	x [0] INTEGER,
	z [2] INTEGER
}

END

The test code:

#include <stdio.h>
#include <stdlib.h>	/* calloc(), malloc() */
#include <assert.h>
#include "OldStruct.h"
#include "NewStruct.h"

int
main(void) {
    char buf[42];
    OldStruct_t *a;
    NewStruct_t *b = 0;
    asn_enc_rval_t er;
    asn_dec_rval_t dr;

    /* Allocate the whole structure, not just a pointer's worth of bytes. */
    a = calloc(1, sizeof *a);
    assert(a);

    /* Members after the extension marker are pointers in the generated structures. */
    *(a->x = malloc(sizeof *a->x)) = 111;
    *(a->y = malloc(sizeof *a->y)) = 222;
    *(a->z = malloc(sizeof *a->z)) = 333;

    /* Encode using the old definition... */
    er = der_encode_to_buffer(&asn_DEF_OldStruct, a, buf, sizeof(buf));
    assert(er.encoded > 0);

    /* ...and decode the same bytes using the new definition. */
    dr = ber_decode(0, &asn_DEF_NewStruct, (void **)&b, buf, er.encoded);
    assert(dr.code == RC_OK);
    assert(dr.consumed == er.encoded);

    printf("a = {x = %ld, y = %ld, z = %ld}\n", *a->x, *a->y, *a->z);
    printf("b = {x = %ld,          z = %ld}\n", *b->x,        *b->z);

    return 0;
}

The test run:

> ./progname 
a = {x = 111, y = 222, z = 333}
b = {x = 111,          z = 333}
>
