gavo.formats.common module

Common code for generation of various data formats.

The main function here is formatData. It receives a string format id, a data instance and a destination file (binary mode). It dispatches this to formatters previously registered using registerDataWriter.

Data writers must accept a data instance and a file instance. They must write a serialized representation of the data to the file instance or, if the format cannot represent a full data instance, the data’s primary table.
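The dispatch described above can be pictured with a small stand-alone sketch. It does not import gavo: _writers, register_writer, format_data and write_tsv are hypothetical stand-ins for the real registry and its writers, and the "data" is just a list of tuples rather than a DaCHS data instance.

```python
import io

# Hypothetical stand-in for the formats registry: format id -> writer.
_writers = {}


class CannotSerializeIn(Exception):
    """Stand-in for the exception raised for unknown formats."""


def register_writer(key, writer):
    # A writer takes (data, outputFile) and returns None.
    _writers[key] = writer


def format_data(format_name, data, output_file):
    # Look up the writer registered for format_name and let it
    # serialize data into the binary-mode output_file.
    try:
        writer = _writers[format_name]
    except KeyError:
        raise CannotSerializeIn(format_name)
    writer(data, output_file)


def write_tsv(data, output_file):
    # Toy writer: one tab-separated line per row, encoded to bytes.
    for row in data:
        line = "\t".join(str(value) for value in row) + "\n"
        output_file.write(line.encode("utf-8"))


register_writer("tsv", write_tsv)

buf = io.BytesIO()
format_data("tsv", [(1, "a"), (2, "b")], buf)
result = buf.getvalue()
```

The destination is a binary-mode file, so the toy writer encodes its text before writing.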

exception gavo.formats.common.CannotSerializeIn(format)

Bases: Error

class gavo.formats.common.FORMATS_REGISTRY

Bases: object

a registry for data formats that can be produced by DaCHS.

This works by self-registration of the respective modules on import; hence, if you want to rely on some entry here, be sure there’s an import somewhere.

extensionToKey = {'.csv': 'csv_bare', '.geojson': 'geojson', '.html': 'html', '.json': 'json', '.tsv': 'tsv', '.txt': 'txt', '.vot': 'votable', '.vot1': 'votabletd1.1', '.vot2': 'votabletd1.2', '.vot5': 'vodml', '.votb2': 'votableb2', '.vottd': 'votabletd'}
formatToLabel = {'csv': 'CSV with column labels', 'csv_bare': 'CSV without column labels', 'geojson': 'GeoJSON', 'html': 'HTML', 'json': 'JSON', 'tsv': 'Tab separated values', 'txt': 'Fixed-column plain text', 'vodml': 'VOTable version 1.5, tabledata', 'vodmlb': 'VOTable version 1.5', 'votable': 'Default VOTable', 'votable1.1': 'Tabledata VOTable version 1.1', 'votableb2': 'Binary2 VOTable', 'votabletd': 'Tabledata VOTable', 'votabletd1.1': 'Tabledata VOTable version 1.1', 'votabletd1.2': 'Tabledata VOTable version 1.2'}
formatToMIME = {'csv': 'text/csv;header=present', 'csv_bare': 'text/csv', 'geojson': 'application/geo+json', 'html': 'text/html', 'json': 'application/json', 'tsv': 'text/tab-separated-values', 'txt': 'text/plain', 'vodml': 'application/x-votable+xml;serialization=TABLEDATA;version=1.5', 'vodmlb': 'application/x-votable+xml;version=1.5', 'votable': 'application/x-votable+xml', 'votable1.1': 'application/x-votable+xml;version=1.1', 'votableb2': 'application/x-votable+xml;serialization=BINARY2', 'votabletd': 'application/x-votable+xml;serialization=TABLEDATA', 'votabletd1.1': 'application/x-votable+xml;serialization=TABLEDATA;version=1.1', 'votabletd1.2': 'application/x-votable+xml;serialization=TABLEDATA;version=1.2'}
classmethod getAliasesFor(formatName)

returns alternate names for a DaCHS format key.

Don’t modify what you get back. This will return the DaCHS format key if it is not the mime itself.

classmethod getKeyFor(formatName)

returns a DaCHS format key for formatName (DaCHS key or MIME).

If formatName is a mime type with parameters, we’ll also try to get a format with the parameters stripped and silently succeed if that works.
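The fallback to a parameter-free match might look roughly like the following sketch. MIME_TO_KEY is a hypothetical excerpt modelled on the mimeToKey attribute, and key_for is an illustrative helper that takes an already parsed triple rather than a string; the real method also accepts DaCHS keys, which this sketch ignores.

```python
# Hypothetical excerpt of a MIME-key-to-format-key mapping.
MIME_TO_KEY = {
    ("text", "csv", frozenset()): "csv_bare",
    ("text", "csv", frozenset({("header", "present")})): "csv",
    ("text", "html", frozenset()): "html",
}


def key_for(major, minor, params):
    # Try the exact match first, parameters included.
    exact = (major, minor, frozenset(params))
    if exact in MIME_TO_KEY:
        return MIME_TO_KEY[exact]
    # Otherwise retry with the parameters stripped.
    return MIME_TO_KEY[(major, minor, frozenset())]


with_header = key_for("text", "csv", {("header", "present")})
stripped = key_for("text", "html", {("charset", "utf-8")})
```

Here the first lookup matches exactly and yields csv, while the second only succeeds after the charset parameter is dropped and yields html.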

classmethod getLabelFor(formatName)

returns a label for formatName (DaCHS key or MIME type).

classmethod getMIMEFor(formatName, orderedFormat=None)

returns a simple MIME type for our formatName (some incoming MIME or an alias).

Some magic, reserved mimes that need to be preserved from the input are recognised and returned in orderedFormat. This is for TAP and related DALI hacks.

classmethod getTAPIdFor(formatName)

returns a TAPRegExt ivoid for a DaCHS format key.

This will return None if TAPRegExt does not prescribe such a key.

classmethod getTypeForExtension(extension)

returns the media type first registered for extension.

extension must begin with a dot. None is returned for extensions no format has (yet) claimed.

classmethod getWriterFor(formatName)

returns a writer for formatName.

writers are what’s registered via registerDataWriter; formatName is a MIME type or a format alias. This raises CannotSerializeIn if no writer is available.

classmethod iterFormats()

iterates over the short names of the available formats.

keyToAliases = {'csv': ['csv'], 'csv_bare': ['csv_bare'], 'geojson': ['geojson'], 'html': ['html'], 'json': ['json'], 'tsv': ['tsv'], 'txt': ['txt'], 'vodml': ['vodml'], 'vodmlb': ['vodmlb'], 'votable': ['votable'], 'votable1.1': ['text/xml', 'votable1.1'], 'votableb2': ['votable/b2', 'votableb2'], 'votabletd': ['text/xml', 'votable/td', 'votabletd'], 'votabletd1.1': ['text/xml', 'votabletd1.1'], 'votabletd1.2': ['text/xml', 'votabletd1.2']}
keyToExtension = {'csv_bare': '.csv', 'geojson': '.geojson', 'html': '.html', 'json': '.json', 'tsv': '.tsv', 'txt': '.txt', 'vodml': '.vot5', 'votable': '.vot', 'votableb2': '.votb2', 'votabletd': '.vottd', 'votabletd1.1': '.vot1', 'votabletd1.2': '.vot2'}
keyToTAPId = {'votable': 'ivo://ivoa.net/std/TAPRegExt#output-votable-binary', 'votableb2': 'ivo://ivoa.net/std/TAPRegExt#output-votable-binary2', 'votabletd': 'ivo://ivoa.net/std/TAPRegExt#output-votable-td'}
mimeToKey = {('application', 'geo+json', frozenset()): 'geojson', ('application', 'json', frozenset()): 'json', ('application', 'x-votable+xml', frozenset()): 'votable', ('application', 'x-votable+xml', frozenset({('serialization', 'binary2')})): 'votableb2', ('application', 'x-votable+xml', frozenset({('serialization', 'tabledata')})): 'votabletd', ('application', 'x-votable+xml', frozenset({('version', '1.1')})): 'votable1.1', ('application', 'x-votable+xml', frozenset({('version', '1.1'), ('serialization', 'tabledata')})): 'votabletd1.1', ('application', 'x-votable+xml', frozenset({('version', '1.2'), ('serialization', 'tabledata')})): 'votabletd1.2', ('application', 'x-votable+xml', frozenset({('version', '1.5')})): 'vodmlb', ('application', 'x-votable+xml', frozenset({('version', '1.5'), ('serialization', 'tabledata')})): 'vodml', ('text', 'csv', frozenset()): 'csv_bare', ('text', 'csv', frozenset({('header', 'present')})): 'csv', ('text', 'html', frozenset()): 'html', ('text', 'plain', frozenset()): 'txt', ('text', 'tab-separated-values', frozenset()): 'tsv', ('text', 'xml', frozenset()): 'votabletd1.2', ('votable', 'b2', frozenset()): 'votableb2', ('votable', 'td', frozenset()): 'votabletd'}
classmethod registerDataWriter(key, writer, mainMime, label, extension, *aliases, tapId=None)

adds a writer to the formats registry.

Key is a short, unique handle for the format, writer is a writer function(data, outputFile) -> None (where data can be an rsc.Data or an rsc.Table instance), mainMime is the preferred media type, label is a human-readable designation for the format (shown in selection widgets and the like), extension is a suggested extension for the format (lower-case only), and aliases are other strings that can be used to select the format in DALI FORMAT or similar.

Where keys, mainMime, and aliases clash, previous entries are silently overwritten. For extensions, the first registered format wins.
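The two clash rules can be illustrated with plain dictionaries. This is only a sketch of the stated semantics, not the registry's code: register, key_to_label and extension_to_key are hypothetical names, and the registration order shown is made up for the example.

```python
key_to_label = {}
extension_to_key = {}


def register(key, label, extension):
    # Keys clash: the later registration silently replaces the earlier.
    key_to_label[key] = label
    # Extensions clash: setdefault keeps the first registered format.
    extension_to_key.setdefault(extension, key)


register("csv_bare", "CSV without column labels", ".csv")
register("csv", "CSV with column labels", ".csv")
register("csv", "CSV (relabelled)", ".csv")
```

After these calls, .csv still maps to csv_bare, while the label stored for csv is the one from the last registration.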

writerRegistry = {'csv': <function <lambda>>, 'csv_bare': <function writeDataAsCSV>, 'geojson': <function writeTableAsGeoJSON>, 'html': <function writeDataAsHTML>, 'json': <function writeTableAsJSON>, 'tsv': <function renderAsText>, 'txt': <function renderAsColumns>, 'vodml': functools.partial(<function format>, tablecoding='td', version=(1, 5)), 'vodmlb': functools.partial(<function format>, version=(1, 5)), 'votable': <function format>, 'votable1.1': functools.partial(<function format>, tablecoding='binary', version=(1, 1)), 'votableb2': functools.partial(<function format>, tablecoding='binary2'), 'votabletd': functools.partial(<function format>, tablecoding='td'), 'votabletd1.1': functools.partial(<function format>, tablecoding='td', version=(1, 1)), 'votabletd1.2': functools.partial(<function format>, tablecoding='td', version=(1, 2))}
gavo.formats.common.formatData(formatName, table, outputFile, acquireSamples=True, **moreFormatterArgs)

writes a table to outputFile in the format given by formatName.

Table may be a table or a Data instance. formatName is a format shortcut (formats.iterFormats() gives keys available) or a media type. If you pass None, the default VOTable format will be selected.

This raises a CannotSerializeIn exception if formatName is not recognized. Note that you have to import the serialising modules from the formats package to make the formats available (fitstable, csvtable, geojson, jsontable, texttable, votable; api itself already imports the more popular of these).

If a client knows a certain formatter understands additional arguments, it can hand them in as keyword arguments. This will raise an error if another formatter that doesn’t understand the argument is being used.
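A minimal sketch of this pass-through behaviour, using hypothetical writers (write_plain, write_padded) and a hypothetical dispatcher (format_with) rather than anything from gavo. In plain Python the failure shows up as a TypeError; which exception DaCHS itself raises is not stated here, so treat that part as an assumption.

```python
import io


def write_plain(data, output_file):
    # A writer that understands no extra arguments.
    output_file.write(repr(data).encode("utf-8"))


def write_padded(data, output_file, width=0):
    # A writer that understands one extra keyword argument.
    output_file.write(repr(data).rjust(width).encode("utf-8"))


def format_with(writer, data, output_file, **more_formatter_args):
    # Extra keyword arguments are passed through to the writer untouched.
    writer(data, output_file, **more_formatter_args)


padded = io.BytesIO()
format_with(write_padded, 42, padded, width=5)

try:
    format_with(write_plain, 42, io.BytesIO(), width=5)
    error = None
except TypeError as exc:
    # Plain Python raises TypeError for an unexpected keyword argument.
    error = exc
```

Passing formatter-specific arguments is therefore only safe when the caller controls which format ends up being selected.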

gavo.formats.common.getAliasesFor(formatName)

returns alternate names for a DaCHS format key.

Don’t modify what you get back. This will return the DaCHS format key if it is not the mime itself.

gavo.formats.common.getExtensionFor(mediaType)

returns a suggested extension for files of mediaType.

mediaType can be an RFC 2045 media type, or one of DaCHS’ internal format codes.

As a fallback, .dat will be returned.

gavo.formats.common.getFormatted(formatName, table, acquireSamples=False)

returns a string containing a representation of table in the format given by formatName.

This just wraps the function formatData; see there for formatName. This function will use large amounts of memory for large data.

gavo.formats.common.getKeyFor(formatName)

returns a DaCHS format key for formatName (DaCHS key or MIME).

If formatName is a mime type with parameters, we’ll also try to get a format with the parameters stripped and silently succeed if that works.

gavo.formats.common.getLabelFor(formatName)

returns a label for formatName (DaCHS key or MIME type).

gavo.formats.common.getMIMEFor(formatName, orderedFormat=None)

returns a simple MIME type for our formatName (some incoming MIME or an alias).

Some magic, reserved mimes that need to be preserved from the input are recognised and returned in orderedFormat. This is for TAP and related DALI hacks.

gavo.formats.common.getMIMEKey(contentType)

makes a DaCHS mime key from a content-type string.

The key is used for retrieving matching mime types; it is a triple of major mime type, minor mime type, and a set of parameter pairs.

contentType is a string-serialized mime type.

We also normalise everything to lower case. I don’t think that’s quite standards-compliant, but with all the other case-insensitivity nonsense, anything else will get really ugly.
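To make the shape of such a key concrete, here is a rough stand-alone approximation. mime_key is a hypothetical helper, not the DaCHS implementation; it uses naive string splitting and ignores details such as quoted parameter values.

```python
def mime_key(content_type):
    """Return (major, minor, frozenset of (name, value) pairs), lower-cased."""
    media_type, *params = content_type.lower().split(";")
    major, _, minor = media_type.strip().partition("/")
    pairs = set()
    for param in params:
        name, _, value = param.partition("=")
        if name.strip():
            pairs.add((name.strip(), value.strip()))
    return major, minor, frozenset(pairs)


key = mime_key("application/x-votable+xml;serialization=TABLEDATA;version=1.5")
```

Because the parameters end up in a frozenset, their order in the incoming string does not matter, and the resulting triple is hashable and can serve as a dictionary key like those shown in mimeToKey.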

gavo.formats.common.getTAPIdFor(formatName)

returns a TAPRegExt ivoid for a DaCHS format key.

This will return None if TAPRegExt does not prescribe such a key.

gavo.formats.common.getWriterFor(formatName)

returns a writer for formatName.

writers are what’s registered via registerDataWriter; formatName is a MIME type or a format alias. This raises CannotSerializeIn if no writer is available.

gavo.formats.common.guessMediaType(fName)

returns a media type plausible for a file named fName.

This first uses the extension map inferred by our formats registry, has some built-in safety catches in case the formatters haven’t been imported, and then falls back to Python’s built-in mimetypes.guess_type. If nothing matches, it returns application/octet-stream.

Extensions are used case-insensitively. We don’t do any encoding inference (yet). We may, though, so by all means shout if you’re using this in DaCHS-external code.
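The lookup order can be sketched with the standard library alone. EXTENSION_TO_TYPE is a hypothetical excerpt of an extension map and guess_media_type is an illustrative helper; the built-in safety catches mentioned above are left out because their details are not given here.

```python
import mimetypes
import os

# Hypothetical excerpt of an extension map such as a registry might hold.
EXTENSION_TO_TYPE = {
    ".vot": "application/x-votable+xml",
    ".geojson": "application/geo+json",
}


def guess_media_type(f_name):
    # Extensions are compared case-insensitively.
    extension = os.path.splitext(f_name)[1].lower()
    if extension in EXTENSION_TO_TYPE:
        return EXTENSION_TO_TYPE[extension]
    # Fall back to the standard library's guess.
    guessed, _encoding = mimetypes.guess_type(f_name)
    # Last resort when nothing matches.
    return guessed or "application/octet-stream"


from_registry = guess_media_type("result.VOT")
fallback = guess_media_type("noextension")
```

The encoding that mimetypes.guess_type reports is discarded in this sketch, in line with the note that no encoding inference is done.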

gavo.formats.common.iterFormats()

iterates over the short names of the available formats.

gavo.formats.common.registerDataWriter(key, writer, mainMime, label, extension, *aliases, tapId=None)

adds a writer to the formats registry.

Key is a short, unique handle for the format, writer is a writer function(data, outputFile) -> None (where data can be an rsc.Data or an rsc.Table instance), mainMime is the preferred media type, label is a human-readable designation for the format (shown in selection widgets and the like), extension is a suggested extension for the format (lower-case only), and aliases are other strings that can be used to select the format in DALI FORMAT or similar.

Where keys, mainMime, and aliases clash, previous entries are silently overwritten. For extensions, the first registered format wins.