Created Sunday 01/07/2007
This document describes the desc file format. A desc file describes another file, especially during transfers such as e-mail attachments or any other kind of transfer. The format was designed by Mildred <mildred593(at)online.fr>, who was inspired by the application/applefile files attached by Apple Mail. Another standardized file format may already exist with the same goal; this document is only a draft and presents an idea.
This is draft number 1 of this specification, released on 1 July 2007.
A desc file holds the description of a file, which may or may not be included in the desc file itself. If the file is included, the content type is application/descfile. If the file is not included, the content type can also be text/descfile.
The first part of the file is always present and is textual data. It is called the headers. The headers must not contain any blank line: the first blank line begins the second part, which is the included file.
The headers hold metadata about the file. They are composed of multiple physical lines separated by the character LF (\n). They end when two consecutive LF characters are found, or at the end of the file. Blank lines are therefore not allowed in the headers.
Logical lines are defined as follows. A logical line is generally the same as a physical line, except when multiple physical lines are used inside a single logical line. A logical line is composed of one or more physical lines. The first physical line must not begin with a whitespace character (defined as the ASCII space or the ASCII tabulation \t), and each following physical line must begin with a whitespace character.
The complete headers are then split into logical lines. If the first lines of the headers start with a whitespace character, these first lines are completely ignored.
Each logical line must be either a header or a comment. A logical line includes the line feeds between the physical lines that compose it but does not include the last line feed.
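The grouping of physical lines into logical lines can be sketched as follows. This is only an illustration under assumptions: the function name split_logical_lines is hypothetical, and the input is assumed to be the headers part only (already separated from any included file), given as bytes.

```python
from typing import List


def split_logical_lines(headers: bytes) -> List[bytes]:
    """Group LF-separated physical lines into logical lines."""
    if not headers:
        return []
    logical: List[bytes] = []
    for physical in headers.split(b"\n"):
        if physical[:1] in (b" ", b"\t"):
            # Continuation line: it belongs to the current logical line,
            # and the line feed between the physical lines is kept.
            # Continuation lines found before any logical line are ignored.
            if logical:
                logical[-1] += b"\n" + physical
        else:
            # A line that does not start with whitespace opens a new
            # logical line.
            logical.append(physical)
    return logical
```

For example, the four physical lines "A: 1", "  more", "# note" and "B: 2" give three logical lines, the first of which spans two physical lines.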
A comment is a logical line that starts with the character '#'. It is completely ignored. Note that comments are logical lines, so they can span multiple physical lines.
A header is a logical line that starts with a header name and continues with a content. The header name must contain only characters from A to Z, from a to z, from 0 to 9, the character '-' and the character '.'. The header name and the content are separated by a colon ':' followed by optional whitespace characters (space or tabulation) that are not part of the content.
The content can span multiple physical lines. The content is defined as the binary data from the first non-whitespace character after the colon following the header name until the end of the logical line.
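As an illustration, a single logical line could be classified as a comment or a header with a regular expression. This is a minimal sketch: the names HEADER_RE and parse_logical_line are hypothetical, and requiring a non-empty header name is an assumption, since the text does not say whether an empty name is allowed.

```python
import re

# Header name (A-Z, a-z, 0-9, '-' and '.'), a colon, optional spaces or
# tabs that are not part of the content, then the content up to the end
# of the logical line. DOTALL lets the content span physical lines.
# Assumption: the header name is not empty.
HEADER_RE = re.compile(rb"([A-Za-z0-9.-]+):[ \t]*(.*)", re.DOTALL)


def parse_logical_line(line: bytes):
    """Return None for a comment, or (name, content) for a header."""
    if line.startswith(b"#"):
        return None  # comments are completely ignored
    match = HEADER_RE.fullmatch(line)
    if match is None:
        raise ValueError("logical line is neither a header nor a comment")
    return match.group(1), match.group(2)
```

Note that a line feed is not whitespace in the sense used here (only space and tabulation are), so a content may begin with a line feed when the value starts on a continuation line.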
We also define the stripped content, which is the same as the content except that the first whitespace character after each line feed is removed.
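The stripped content can be computed with a single substitution. A minimal sketch, assuming the content is available as bytes; the function name stripped_content is hypothetical.

```python
import re


def stripped_content(content: bytes) -> bytes:
    """Remove the first space or tab that follows each line feed."""
    return re.sub(rb"\n[ \t]", b"\n", content)
```

Only one whitespace character is removed per line feed, so a continuation line indented with two spaces keeps one of them.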
The included file part is optional and is defined as the binary data that follows the first occurrence of two consecutive LF characters.
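Separating the headers from the included file then amounts to splitting on the first pair of LF characters. The sketch below assumes the whole desc file is held in memory as bytes; the function name split_desc_file is hypothetical.

```python
from typing import Optional, Tuple


def split_desc_file(data: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (headers, included_file); included_file is None if absent."""
    # partition() splits on the first occurrence only, so any later
    # blank line stays inside the included file.
    headers, separator, included = data.partition(b"\n\n")
    if not separator:
        return headers, None
    return headers, included
```

An empty included file (the data ends right after the two LF characters) is returned as empty bytes, which keeps it distinct from a desc file that has no included part at all.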
A desc file is valid only if it matches the description given above and if all the headers are valid and if all required headers are present.
A header is valid only if either :
If a header specifies that it must be present only once, then the file is invalid if that header is present more than once.
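The "present only once" rule can be checked by counting header names. This is a sketch under assumptions: the function name find_repeated_headers is hypothetical, the set of once-only names is supplied by the caller, and names are compared byte for byte because the text does not state whether header names are case-sensitive.

```python
from collections import Counter
from typing import Iterable, List, Set


def find_repeated_headers(names: Iterable[bytes], once_only: Set[bytes]) -> List[bytes]:
    """Return the once-only header names that appear more than once."""
    counts = Counter(names)
    # Counter returns 0 for names that never appear.
    return sorted(name for name in once_only if counts[name] > 1)
```

A desc file would be rejected as invalid whenever the returned list is not empty.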
The valid headers are described below :
This header is not required. It defines the version of the file specification that must be followed. If it is not present, the version to use is version 1.0, or the latest draft if version 1.0 of this specification does not exist. This header must be present only once.
This header is required only if the desc file does not hold the included file part. It contains a relative path to the file that the desc file describes. The path is relative to the desc file location. This header must be present only once.
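Resolving such a relative path against the desc file location could look like the sketch below. It assumes POSIX-style paths and a header value already decoded to a string; the function name resolve_described_file and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_described_file(desc_path: str, relative_path: str) -> PurePosixPath:
    """Resolve the described file relative to the desc file's directory."""
    # The path is relative to the location of the desc file, that is,
    # to the directory that contains it.
    return PurePosixPath(desc_path).parent / relative_path
```

For instance, a desc file stored as /mail/attachments/photo.desc with the relative path photo.jpg would describe /mail/attachments/photo.jpg.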
The Content-Type header is case-insensitive and holds the content-type of the file described. This header must be present only once and is not required.
---------------------------------
Content-Type: text/plain; encoding=utf-8

This is a text file encoded in UTF-8
Here the desc file is useful because it can specify the encoding of the text file
---------------------------------
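A desc file like the example above could be produced by a small writer. This is a hedged sketch, not a reference implementation: the function name build_desc_file is hypothetical, header names are assumed to be valid already, and writing a single space after the colon is a choice, since any amount of space or tabulation is allowed there.

```python
import re
from typing import Iterable, Optional, Tuple


def build_desc_file(
    headers: Iterable[Tuple[bytes, bytes]], included: Optional[bytes] = None
) -> bytes:
    """Serialize (name, content) pairs and an optional included file."""
    lines = []
    for name, content in headers:
        # Every line feed inside a content must be followed by a space or
        # a tab; otherwise it would start a new logical line or create a
        # blank line that ends the headers too early.
        if re.search(rb"\n(?![ \t])", content):
            raise ValueError("line feeds in a content must be followed by whitespace")
        lines.append(name + b": " + content)
    data = b"\n".join(lines)
    if included is not None:
        # Two LF characters separate the headers from the included file.
        data += b"\n\n" + included
    return data
```

Calling it with the pair (Content-Type, text/plain; encoding=utf-8) and the text of the file yields the header line, a blank line, then the file content.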
The desc file may be used to transfer metadata along with a file in a protocol that does not usually permit it. For example, a mail client could create a desc file for each file attached to an e-mail. The desc file would hold the metadata and reference the file by name using the Filename header.
It can also be useful for storing files and keeping information such as their content type on filesystems that do not store this information.
On such systems, applications could open desc files and read the included content. It would then be possible to store files without losing metadata such as the content type, and without being forced to rely on magic numbers. Note that magic numbers are not always reliable, for example with text files, or with ZIP files that can also be Jar archives or OpenDocument files.