Desc File Format draft 2

Created Sunday 01/07/2007

This document describes the desc file format. A desc file describes another file, especially during transfers such as e-mail attachments or any other kind of transfer. It was written by Mildred <mildred593(at)online.fr>, who was inspired by the application/applefile files attached by Apple Mail. Another file format with the same goal may already be standardized; this document is only a draft and presents an idea.

This is draft number 1 of this specification, released on 1 July 2007.

Description

A desc file holds the description of a file, which may or may not be included in the desc file itself. If the file is included, the content type is application/descfile. If the file is not included, the content type can also be text/descfile.

The first part of the file is always present and is textual data. It is called the headers. There must not be any blank line in the headers: the first blank line begins the second part, which is the included file.

The headers part

The headers hold metadata about the file. They are composed of multiple physical lines separated by the LF character (\n). They end when two consecutive LF characters are found or at the end of the file, so blank lines are not allowed.

Logical lines are defined as follows. A logical line is generally the same as a physical line, except when multiple physical lines are used inside a single logical line. A logical line is composed of one or more physical lines. The first physical line must not begin with a whitespace character (defined as the ASCII space or the ASCII tabulation \t), and each following physical line must begin with a whitespace character.

The complete headers part is then split into logical lines. If the first lines of the headers start with a whitespace character, these first lines are completely ignored.

Each logical line must be either a header or a comment. A logical line includes the line feeds between the physical lines that compose it but does not include the last line feed.
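
The grouping of physical lines into logical lines can be sketched in Python. This is only an illustration, not part of the specification: the function name logical_lines is hypothetical, and dropping an empty last line (left by a single LF at the very end of the headers) is an assumption.

```python
def logical_lines(headers: bytes) -> list[bytes]:
    """Group the LF-separated physical lines of the headers part into logical lines."""
    physical_lines = headers.split(b"\n")
    # Assumption: a final LF at the end of the headers leaves one empty
    # trailing element, which is not a line of its own.
    if physical_lines and physical_lines[-1] == b"":
        physical_lines.pop()

    result: list[bytes] = []
    for physical in physical_lines:
        if physical[:1] in (b" ", b"\t"):
            # Continuation line: attach it to the current logical line,
            # keeping the line feed that separated the two physical lines.
            # Leading continuation lines (no logical line yet) are ignored.
            if result:
                result[-1] += b"\n" + physical
        else:
            result.append(physical)
    return result
```

Each returned logical line keeps its inner line feeds and has no final line feed, so it can be classified as a comment or a header afterwards.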

A Comment

A comment is a logical line that starts with the character '#'. It is completely ignored. Note that comments are logical lines, so they can span multiple physical lines.

A Header

A header is a logical line that starts with a header name and continues with a content. The header name must contain only characters from A to Z, from a to z, from 0 to 9, the character '-' and the character '.'. The header name and the content are separated by a colon ':' followed by optional whitespace characters (space or tabulation) that are not part of the content.

The content can span multiple physical lines. If there are non-whitespace characters after the colon on the first physical line, the content is defined as the binary data from the first non-whitespace character after the colon following the header name until the end of the logical line. If the content does not start on the first physical line, it starts after the first whitespace character on the second line. This is a change from the first draft and makes it possible to store in a header binary data that starts with a whitespace character.

We can also define the stripped content, which is the same as the content except that the first whitespace character after every line feed is removed.
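
The rules for the header name, the content and the stripped content can be sketched in Python as follows. This is one possible reading of the rules, given as an illustration; the names HEADER_RE and parse_header are hypothetical, and the input is assumed to be a single logical line that is not a comment.

```python
import re

# Header name, colon, optional spaces/tabs, then the rest of the logical line.
HEADER_RE = re.compile(rb"([A-Za-z0-9.-]+):[ \t]*(.*)", re.DOTALL)


def parse_header(logical_line: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (name, content, stripped content) for one header logical line."""
    match = HEADER_RE.fullmatch(logical_line)
    if match is None:
        raise ValueError("not a valid header line")
    name, rest = match.group(1), match.group(2)

    if rest.startswith(b"\n"):
        # Nothing but whitespace on the first physical line: the content
        # starts after the first whitespace character of the second line.
        content = rest[2:]
    else:
        content = rest

    # Stripped content: drop the first whitespace character after each LF.
    stripped = re.sub(rb"\n[ \t]", b"\n", content)
    return name, content, stripped
```

For instance, parse_header(b"Data:\n \tbin") yields the content b"\tbin", which shows how a content that begins with a whitespace character can be stored by starting it on the second physical line.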

The included file part

The included file part is optional and is defined as the binary data that follows the first occurrence of two consecutive LF characters.

Valid files

A desc file is valid only if it matches the description given above and if all the headers are valid and if all required headers are present.

A header is valid only if either:

If a header is specified as allowed only once and it is present more than once, the file is invalid.

If the name contains dots, the header name must follow a reverse DNS notation, which prevents conflicts between header names. For example, someone who owns the example.com DNS domain could use the name com.example.attribute.

Valid headers

The valid headers are described below:

Version

Not required, only once. It defines the version of the file specification that must be followed. If not present, the version to assume is version 1.0, or the latest draft if version 1.0 of this specification does not exist.

Filename

Required sometimes, only once. The associated stripped content must contain a file path (relative to the location of the desc file) that points to the file(s) described by this desc file. If the desc file contains the included file, this header is forbidden. If the desc file is linked with the target file externally, this header is optional. Otherwise, this header is required. If the desc file is linked with a file externally and the Filename header is present, there may be a conflict. In case of conflict, the external link is the one to trust.

Content-Type

Not required, only once. The Content-Type header is case-insensitive and holds the content type of the file described.
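
The occurrence rules of the three headers above can be checked with a short Python sketch. It is an illustration only: the names ONCE_ONLY and check_headers are hypothetical, header names are compared case-sensitively (an assumption, since this draft does not settle the question), and the caller is assumed to know whether the desc file is linked externally to its target.

```python
from collections import Counter

# Headers that this draft marks as "only once".
ONCE_ONLY = {"Version", "Filename", "Content-Type"}


def check_headers(names, has_included_file, linked_externally=False):
    """Return a list of error messages; an empty list means no rule is broken."""
    errors = []
    counts = Counter(names)

    for name in sorted(ONCE_ONLY):
        if counts[name] > 1:
            errors.append(f"{name} appears {counts[name]} times")

    if has_included_file:
        # Filename is forbidden when the file is included.
        if counts["Filename"]:
            errors.append("Filename is forbidden when the file is included")
    elif not linked_externally and not counts["Filename"]:
        # Neither included nor linked externally: Filename is required.
        errors.append("Filename is required")

    return errors
```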

Interaction with extended attributes

Some filesystems support extended attributes. On such systems it is recommended that the desc file be included in the extended attributes. The desc file can be included in two ways:

Be careful, because this specification allows multiple headers to have the same name. If the system does not allow multiple extended attributes with the same name, care must be taken in the translation process: if multiple headers with the same name are found, a way must be specified to include them in the extended attributes. This can be done by including the whole desc file in one attribute, or by using a header name to attribute name conversion that gives multiple headers with the same name different attribute names.

For example, if the system allows the # character in extended attribute names, which the desc file does not allow in header names, two headers with the same name could be converted to two attribute names, one with #1 appended and the other with #2 appended.

The system must take care that the extended attributes are kept when the file is copied or moved.

This is system dependent and must be specified separately.
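
The #1/#2 naming scheme could look like the following Python sketch. It only computes attribute names and does not touch the filesystem; the function name attribute_names is hypothetical, and leaving unique header names unchanged is an assumption of this example.

```python
from collections import Counter


def attribute_names(header_names):
    """Map header names to attribute names, numbering repeated names with #N."""
    totals = Counter(header_names)
    seen = Counter()
    result = []
    for name in header_names:
        if totals[name] == 1:
            # Unique header: the attribute keeps the header name as is.
            result.append(name)
        else:
            # Repeated header: append #1, #2, ... in order of appearance.
            seen[name] += 1
            result.append(f"{name}#{seen[name]}")
    return result
```

Because '#' cannot appear in a header name, a generated name such as X-Tag#2 can never collide with the name of a real header.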

Examples of desc files

First example:

Content-Type: text/plain; charset=utf-8
X-Summary: summary ...
 continue across multiple lines
 as many as you want
  even with space at beginning
 of lines
com.example.attribute: value
X-Creation-Date: 2007/07/01

This is a text file
encoded in UTF-8
Here the desc file is useful because it can specify the encoding of the text file

In that example the stripped content of the X-Summary header is exactly:

summary ...
continue across multiple lines
as many as you want
 even with space at beginning
of lines

Notice that the line feeds are kept but that the first whitespace character of each continuation line is removed.

Examples of use

The desc file may be used to transfer metadata along with a file in a protocol that does not usually permit it. For example, we can imagine a mail client that creates a desc file for each file attached to an electronic mail. The desc file holds the metadata and references the file by name using the Filename header.
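
A program producing desc files, such as the mail client imagined here, has to do the reverse of the stripping step: put a whitespace character back after every line feed of a value. A minimal Python sketch follows; the function name make_desc and the file name report.pdf in the usage note are hypothetical, and values that begin with a whitespace character are not handled by this simplified writer.

```python
def make_desc(headers, included=None):
    """Serialise (name, stripped content) pairs and an optional included file."""
    lines = []
    for name, value in headers:
        # Insert one space after each line feed so that every continuation
        # line starts with whitespace and stays in the same logical line.
        lines.append(name + b": " + value.replace(b"\n", b"\n "))
    output = b"\n".join(lines)
    if included is not None:
        # Two LF characters end the headers and start the included file.
        output += b"\n\n" + included
    return output
```

Calling make_desc([(b"Filename", b"report.pdf"), (b"X-Summary", b"line one\nline two")]) produces a desc file without an included file that references its target by name.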

It can also be useful for storing files and keeping information such as their content type on filesystems that do not store this information.

We can imagine that on such systems, applications open desc files and read the included content. It would then be possible to store files without losing any metadata such as the content type, and without being forced to rely on magic numbers (note that magic numbers are not always reliable, for example with text files, or with ZIP files that can also be Jar archives or OpenDocument files).
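
As an illustration of such an application, the Python sketch below decodes an included text file using the charset announced in the Content-Type header. It is a simplified reader, not a full parser: the function name read_text is hypothetical, only a single-line Content-Type header with that exact spelling is recognised, and falling back to UTF-8 when no charset is given is an assumption.

```python
import re
from email.message import Message


def read_text(desc: bytes) -> str:
    """Decode the included file with the charset named in Content-Type."""
    headers, separator, included = desc.partition(b"\n\n")
    if not separator:
        raise ValueError("this desc file has no included file")

    charset = "utf-8"  # assumption: default when no charset is declared
    found = re.search(rb"^Content-Type:[ \t]*([^\n]*)$", headers, re.MULTILINE)
    if found:
        # Reuse the standard library's parser for "type/subtype; param=value".
        message = Message()
        message["Content-Type"] = found.group(1).decode("ascii")
        charset = message.get_content_charset() or charset

    return included.decode(charset)
```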

