gavo.grammars.directhdf5 module

Helpers for generating boosters for HDF5 data.

HDF5 is fairly complex, and directgrammar is too long as is. Also, I don’t want to require h5py as a fixed dependency of DaCHS; if it’s not there, you should still be able to use other sorts of direct grammars.

TODO: Most of the HDF5 interface functions return 0 on success and non-zero on failure; we’d like a macro to catch these.
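
A hedged sketch of what such a macro could look like; since the boosters are C programs assembled from Python strings, it is given here as a Python constant. The name CHECK_H5 and its error handling are illustrative only, not part of the module.

    # hypothetical helper, not part of this module: a C macro kept as a
    # Python string, so that preamble-generating code could emit it
    H5_CHECK_MACRO = """
    #define CHECK_H5(call) \\
      if ((call) != 0) { \\
        fprintf(stderr, "HDF5 call failed: %s\\n", #call); \\
        abort(); \\
      }
    """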

class gavo.grammars.directhdf5.BaseHDF5CodeGenerator(grammar, tableDef)[source]

Bases: _NumpyMetaCodeGenerator

An abstract base for generating boosters for various sorts of HDF5 files.

Our strategy when parsing from them is to read CHUNK_SIZE items at a time into the corresponding arrays and then to iterate over these chunks, building the records.

The difference between the various flavours is where the column metadata is taken from and where the data is then pulled from.
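
The boosters themselves are generated C, but the chunking strategy is perhaps easiest to see in a Python sketch using h5py; CHUNK_SIZE, the file layout (columns as datasets below the root group, as in the rootarr flavour below), and all names here are illustrative:

    import h5py

    CHUNK_SIZE = 5000  # illustrative; the generated C uses its own constant

    def iter_records(path, colnames):
        """Yield one tuple per row, reading CHUNK_SIZE rows per column at a time."""
        with h5py.File(path, "r") as hdf:
            nrows = hdf[colnames[0]].shape[0]
            for start in range(0, nrows, CHUNK_SIZE):
                # read one chunk per column into the corresponding arrays...
                chunks = [hdf[name][start:start + CHUNK_SIZE]
                    for name in colnames]
                # ...then iterate over the chunks, building the records
                for row in zip(*chunks):
                    yield row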

getColumns()[source]

has to return a sequence of (name, type) pairs for the columns of the relation encoded in the HDF5 file.
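
For a table with, say, a double-precision position and an integer flag, the return value would have this shape (the exact type vocabulary is whatever the numpy-based machinery expects; the strings here are just for illustration):

    [("ra", "float64"), ("dec", "float64"), ("flag", "int32")]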

getFooter()[source]

returns the code for the createDumpfile method.

You want to use the C fragments above for that.

The default returns something that bombs out.
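
Purely as an illustration of the shape of such an override; the createDumpfile signature and everything inside it are assumptions here, as the real code comes from the C fragments mentioned above:

    def getFooter(self):
        # hypothetical sketch; fragment names and the createDumpfile
        # signature are fixed by the surrounding booster machinery
        return "\n".join([
            "void createDumpfile(int argc, char **argv)",
            "{",
            "  hid_t file = H5Fopen(argv[1], H5F_ACC_RDONLY, H5P_DEFAULT);",
            "  /* ... read chunks and emit records ... */",
            "  H5Fclose(file);",
            "}",
        ])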

getItemParser(item, index)[source]

returns code that parses item (a Column instance) at column index index.

You’re free to ignore index.
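
A hypothetical override might map the column to a C assignment like the following; the buffer naming scheme (data_<index>) and the vals array are made up for illustration:

    def getItemParser(self, item, index):
        # hypothetical: assume the current chunk of column number index
        # lives in a C array data_<index>, indexed by rowIndex
        return [
            f"/* column {item.name} */",
            f"vals[{index}] = data_{index}[rowIndex];",
        ]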

getPreamble()[source]

returns a list of lines that make up the top of the booster.
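
An override will typically extend the inherited preamble with the headers the generated C needs; a minimal sketch (assuming the standard HDF5 C headers, which this module’s docs do not name):

    def getPreamble(self):
        return super().getPreamble() + [
            "#include <hdf5.h>",     # core HDF5 C API
            "#include <hdf5_hl.h>",  # high-level convenience functions
        ]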

getPrototype()[source]

returns the prototype of the getTuple function.
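
As a sketch of the sort of string this returns; the actual signature of getTuple is dictated by the booster skeleton and may well differ:

    def getPrototype(self):
        # hypothetical signature, for illustration only
        return "Field *getTuple(hid_t hdf, long rowIndex)"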

class gavo.grammars.directhdf5.HDF5rootarrCodeGenerator(grammar, tableDef)[source]

Bases: BaseHDF5CodeGenerator

A code generator for boosters importing HDF5 files that have their columns in arrays in the root dataset.

getColumns(hdf)[source]

has to return a sequence of (name, type) pairs for the columns of the relation encoded in the HDF5 file.
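
In this layout each column is a dataset directly below the root group, so a standalone h5py illustration of the discovery logic (a sketch, not the method’s actual code) might read:

    import h5py

    def getRootColumns(path):
        # one (name, dtype name) pair per dataset below the root group
        with h5py.File(path, "r") as hdf:
            return [(name, hdf[name].dtype.name)
                for name in hdf
                if isinstance(hdf[name], h5py.Dataset)]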

class gavo.grammars.directhdf5.HDF5vaexCodeGenerator(grammar, tableDef)[source]

Bases: BaseHDF5CodeGenerator

A code generator for boosters importing HDF5 files in VAEX convention.

These have one array per column in a “columns” group; the actual data is in a “data” group.

Our strategy when parsing from them is to read CHUNK_SIZE items at a time into the corresponding arrays and then to iterate over these chunks, building the records.

getColumns(hdf)[source]

has to return a sequence of (name, type) pairs for the columns of the relation encoded in the HDF5 file.
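
Under one plausible reading of the convention described above (a group per column below “columns”, with the values in a “data” array), a standalone h5py illustration could look like this; again a sketch under stated assumptions, not the method’s actual implementation:

    import h5py

    def getVaexColumns(path):
        # each child of the "columns" group holds its values
        # in an array called "data"
        with h5py.File(path, "r") as hdf:
            columns = hdf["columns"]
            return [(name, columns[name]["data"].dtype.name)
                for name in columns]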