gavo.grammars.common module

Base classes and common code for grammars.

NOTE: If you add grammars, you have to enter them manually in rscdef.builtingrammars.GRAMMAR_REGISTRY (we don’t want to import all the mess in this package just to build that registry).

class gavo.grammars.common.FileRowAttributes[source]

Bases: StructCallbacks

A mixin for grammars with FileRowIterators.

This provides some attributes that FileRowIterators interpret, e.g., preFilter.

completeElement(ctx)[source]
class gavo.grammars.common.FileRowIterator(grammar, sourceToken, **kwargs)[source]

Bases: RowIterator

is a RowIterator base for RowIterators reading files.

It analyzes the sourceToken to see if it’s a string, in which case it opens it as a file name and leaves the file object in self.inputFile.

Otherwise, it assumes sourceToken already is a file object and binds it to self.inputFile. It then tries to come up with a sensible designation for sourceToken.

It also inspects the parent grammar for a gunzip attribute. If it is present and true, the input file will be unzipped transparently. Don’t add more features like this; preFilter is a lot more flexible.

Classes that use this to read binary data will want to set fileMode to rb. If they don’t, reads return strings.
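The sourceToken handling described above could be sketched as follows. This is a standalone illustration, not the FileRowIterator code: openSource is a hypothetical helper, and the way a designation is derived for file objects is an assumption.

```python
import gzip
import io


def openSource(sourceToken, fileMode="r", gunzip=False):
    """Returns (inputFile, designation) for sourceToken.

    Hypothetical helper illustrating the dispatch; not part of gavo.
    """
    if isinstance(sourceToken, str):
        # A string is taken to be a file name.
        if gunzip:
            # Decompress transparently, honouring text vs. binary mode.
            mode = "rb" if "b" in fileMode else "rt"
            inputFile = gzip.open(sourceToken, mode)
        else:
            inputFile = open(sourceToken, fileMode)
        designation = sourceToken
    else:
        # Anything else is assumed to be an open file object already.
        inputFile = sourceToken
        # Assumption: use the file's name if it has one, else its repr.
        designation = getattr(sourceToken, "name", repr(sourceToken))
    return inputFile, designation


# Usage with an already open file object:
inputFile, designation = openSource(io.StringIO("a b\n"))
```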

fileMode = 'r'
finalize()[source]
class gavo.grammars.common.FilteredInputFile(filterCommand, origFile, silent=False)[source]

Bases: object

a pseudo-file that allows piping data through a shell command.

It supports read, readline, and close. Close closes the original file, too.

Warning: the command passed in will be shell-expanded (which is fair since you can pass in any command you want anyway).

If you pass silent=True, stderr will be redirected to /dev/null. This is probably only attractive for unit tests and such.
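As an illustration of the idea, a pseudo-file of this kind might be built on subprocess as sketched below. PipedFile is a hypothetical stand-in, not the FilteredInputFile implementation; it assumes a POSIX shell and an origFile backed by a real file descriptor.

```python
import subprocess


class PipedFile:
    """A minimal pseudo-file piping origFile through a shell command.

    Hypothetical stand-in for illustration; not part of gavo.
    """

    def __init__(self, filterCommand, origFile, silent=False):
        self.origFile = origFile
        self.process = subprocess.Popen(
            filterCommand,
            shell=True,  # the command is shell-expanded
            stdin=origFile,
            stdout=subprocess.PIPE,
            # silent=True discards the command's stderr.
            stderr=subprocess.DEVNULL if silent else None,
        )

    def read(self, nBytes=None):
        if nBytes is None:
            return self.process.stdout.read()
        return self.process.stdout.read(nBytes)

    def readline(self):
        return self.process.stdout.readline()

    def close(self):
        self.process.stdout.close()
        self.process.wait()
        # Closing the pseudo-file closes the original file, too.
        self.origFile.close()
```

With this sketch, PipedFile("zcat", open("data.gz", "rb")) would deliver the decompressed bytes; data.gz is a made-up file name.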

close()[source]
read(nBytes=None)[source]
readline()[source]
class gavo.grammars.common.Grammar(parent, **kwargs)[source]

Bases: Structure, GrammarMacroMixin

An abstract grammar.

Grammars are configured via their structure parameters. Their parse(sourceToken) method returns an object that iterates over rawdicts (dictionaries mapping keys to (typically) strings) that can then be fed through rowmakers; it also has a method getParameters that returns global properties of the whole document (like parameters in VOTables; this will be empty for many kinds of grammars).

RowIterators will return a reference to themselves in the raw dicts in the parser_ key unless you override their _iterRowsProcessed method (which you shouldn’t). This is used by rowmaker macros.

What exactly sourceToken is is up to the concrete grammar. While typically it’s a file name, it might be a sequence of dictionaries, a twisted web request, or whatever.

To derive a concrete Grammar, define a RowIterator for your source and set the rowIterator class attribute to it.

attrSeq = [<gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.complexattrs.StructAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.base.complexattrs.PropertyAttribute object>, <gavo.rscdef.common.RDAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.complexattrs.StructAttribute object>]
clearProperty(name)
completedCallbacks = []
getFullId()
getProperty(name, default=<Undefined>)
getSourceFields(sourceToken, data)[source]

returns a dict containing user-defined fields to be added to all results.

hasProperty(name)
isDispatching = False
managedAttrs = {'enc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'ignoreOn': <gavo.base.complexattrs.StructAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'properties': <gavo.base.complexattrs.PropertyAttribute object>, 'property': <gavo.base.complexattrs.PropertyAttribute object>, 'rd': <gavo.rscdef.common.RDAttribute object>, 'rowfilter': <gavo.base.complexattrs.StructListAttribute object>, 'rowfilters': <gavo.base.complexattrs.StructListAttribute object>, 'sourceFields': <gavo.base.complexattrs.StructAttribute object>}
name_ = 'grammar'
parse(sourceToken, targetData=None)[source]
property rd
rowIterator

alias of RowIterator

setProperty(name, value)
class gavo.grammars.common.GrammarMacroMixin[source]

Bases: StandardMacroMixin

A collection of macros available to rowfilters.

NOTE: All macros should return only one single physical python line, or they will mess up the calculation of what constructs caused errors.

macro_colNames(tableRef)[source]

returns a comma-separated list of column names for a table reference.

This is convenient if an input file matches the table structure; you can then simply say things like <reGrammar names="\colNames{someTable}"/>.

macro_dlMetaURI(dlService)[source]

like fullDLURL, except it points to the datalink metadata.

This is intended for binding to //products#define’s datalink parameter.

If you need the value in a rowmaker, grab it from @prodtblDatalink.

macro_fullDLURL(dlService)[source]

returns a python expression giving a link to the full current data set retrieved through the datalink service.

You would write \fullDLURL{dlsvc} here, and the macro will expand into something like http://yourserver/currd/dlsvc/dlget?ID=ivo://whatever.

dlService is the id of the datalink service in the current RD.

This is intended for “virtual” data where the dataset is generated on the fly through datalink.

macro_inputRelativePath(liberalChars='True')[source]

returns an expression giving the current source’s path relative to inputsDir

liberalChars can be a boolean literal (True, False, etc); if false, a value error is raised if characters that will result in trouble with the product mixin are within the result path.

In rowmakers fed by grammars with //products#define, better use @prodtblAccref.

macro_inputSize()[source]

returns an expression giving the size of the current source.

macro_property(property)[source]

returns the value of property on the parent DD.

macro_rootlessPath()[source]

returns an expression giving the current source’s path with the resource descriptor’s root removed.

macro_sourceDate()[source]

returns an expression giving the timestamp of the current source.

macro_splitPreviewPath(ext)[source]

returns an expression for the split standard path for a custom preview.

Like standardPreviewPath, except that the directory hierarchy of the data files will be reproduced in previews. For ext, you should typically pass the extension appropriate for the preview (like {.png} or {.jpeg}).

See the introduction to custom previews for details.

macro_srcstem()[source]

returns python code for the stem of the source file currently parsed in a rowmaker.

Example: if you’re currently parsing /tmp/foo.bar, the stem is foo.
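In plain python, a stem in this sense can be computed as in the sketch below. sourceStem is a hypothetical name, and the code the macro actually expands to may differ, in particular for names with several extensions.

```python
import os


def sourceStem(path):
    # Drop the directory part, then the last extension.
    return os.path.splitext(os.path.basename(path))[0]


print(sourceStem("/tmp/foo.bar"))  # prints foo
```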

macro_standardPreviewPath()[source]

returns an expression for the standard path for a custom preview.

This consists of resdir, the name of the previewDir property on the embedding DD, and the flat name of the accref (which this macro assumes to see in its namespace as accref; this is usually the case in //products#define, which is where this macro would typically be used).

As an alternative, there is the splitPreviewPath macro, which does not mogrify the file name. In particular, do not use standardPreviewPath when you have more than a few 1e4 files, as it will have all these files in a single, flat directory, and that can become a chore.

See the introduction to custom previews for details.

class gavo.grammars.common.MapKeys(parent, **kwargs)[source]

Bases: Structure

Mapping of names, specified in long or short forms.

mapKeys is necessary in grammars like keyValueGrammar or fitsProdGrammar. In these, the source files themselves give key names. Within the GAVO DC, keys are required to be valid python identifiers (i.e., match [A-Za-z_][A-Za-z_0-9]*). If keys coming in do not have this form, mapping can force proper names.

mapKeys could also be used to make incoming names more suitable for matching with shell patterns (like in rowmaker idmaps).
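The effect of such a mapping can be sketched in a few lines. The function renameKeys and the direction of the maps dictionary (incoming name to desired name) are assumptions made for illustration; they are not taken from the MapKeys implementation.

```python
import re

# The identifier pattern quoted above.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def renameKeys(aDict, maps):
    """Returns a copy of aDict with keys renamed according to maps.

    Keys without an entry in maps are passed through unchanged.
    """
    return {maps.get(key, key): value for key, value in aDict.items()}


# A FITS-style header key that is not a valid python identifier:
raw = {"DATE-OBS": "2020-01-01", "EXPTIME": "30"}
mapped = renameKeys(raw, {"DATE-OBS": "dateObs"})
allValid = all(IDENTIFIER_PATTERN.fullmatch(key) for key in mapped)
```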

attrSeq = [<gavo.base.structure.DataContent object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.complexattrs.DictAttribute object>]
completedCallbacks = []
doMap(aDict)[source]

returns dict with the keys mapped according to the defined mappings.

managedAttrs = {'content_': <gavo.base.structure.DataContent object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'map': <gavo.base.complexattrs.DictAttribute object>, 'maps': <gavo.base.complexattrs.DictAttribute object>}
name_ = 'mapKeys'
onElementComplete()[source]
class gavo.grammars.common.NullGrammar(parent, **kwargs)[source]

Bases: Grammar

A grammar that never returns any rows.

attrSeq = [<gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.complexattrs.StructAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.base.complexattrs.PropertyAttribute object>, <gavo.rscdef.common.RDAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.complexattrs.StructAttribute object>]
clearProperty(name)
completedCallbacks = []
getFullId()
getProperty(name, default=<Undefined>)
hasProperty(name)
managedAttrs = {'enc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'ignoreOn': <gavo.base.complexattrs.StructAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'properties': <gavo.base.complexattrs.PropertyAttribute object>, 'property': <gavo.base.complexattrs.PropertyAttribute object>, 'rd': <gavo.rscdef.common.RDAttribute object>, 'rowfilter': <gavo.base.complexattrs.StructListAttribute object>, 'rowfilters': <gavo.base.complexattrs.StructListAttribute object>, 'sourceFields': <gavo.base.complexattrs.StructAttribute object>}
name_ = 'nullGrammar'
property rd
setProperty(name, value)
class gavo.grammars.common.REAttribute(name, **kwargs)[source]

Bases: UnicodeAttribute

is an attribute containing (compiled) RE

parse(value)[source]

returns a typed python value for the string representation value.

value can be expected to be a unicode string.

unparse(value)[source]

returns a string representation of the typed python value value.

This is the inverse of parse.

class gavo.grammars.common.RowIterator(grammar, sourceToken, sourceRow=None)[source]

Bases: object

An object that encapsulates a source being parsed by a grammar.

RowIterators are returned by Grammars’ parse methods. Iterate over them to retrieve the rows contained in the source.

You can also call getParameters on them to retrieve document-global values (e.g., the parameters of a VOTable, a global header of a FITS table).

The getLocator method should return some string that aids the user in finding out why something went wrong (file name, line number, etc.)

This default implementation works when the source is a sequence of dictionaries. You will, in general, want to override _iterRows and getLocator, plus probably __init__ (to prepare external resources) and getParameters (if you have them; make sure to update any parameters you have with self.sourceRow as shown in the default getParameters implementation).

RowIterators are supposed to be self-destructing, i.e., they should release any external resources they hold when _iterRows runs out of items.

_iterRows should arrange for the instance variable recNo to be incremented by one for each item returned.
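The contract described above (iterate over rows, count them in recNo, give a locator, release resources at the end) is illustrated by the following stand-in. ListRowIterator is hypothetical and deliberately not derived from gavo's RowIterator, so it runs on its own; a real implementation would inherit from RowIterator and override _iterRows and getLocator.

```python
class ListRowIterator:
    """A stand-in following the RowIterator contract for a list of dicts.

    Hypothetical class for illustration; not part of gavo.
    """

    def __init__(self, sourceToken):
        self.sourceToken = sourceToken
        self.recNo = 0

    def __iter__(self):
        return self._iterRows()

    def _iterRows(self):
        try:
            for row in self.sourceToken:
                # One increment per row handed out.
                self.recNo += 1
                yield row
        finally:
            # "Self-destruct": let go of the source once rows run out.
            self.sourceToken = None

    def getLocator(self):
        # Something that helps a user find the offending input.
        return "record %d" % self.recNo
```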

finalize()[source]
getLocator()[source]
getParameters()[source]
notify = True
class gavo.grammars.common.Rowfilter(parent, **kwargs)[source]

Bases: ProcApp

A generator for rows coming from a grammar.

Rowfilters receive rows (i.e., dictionaries) as yielded by a grammar under the name row. Additionally, the embedding row iterator is available under the name rowIter.

Macros are expanded within the embedding grammar.

The procedure definition must result in a generator, i.e., there must be at least one yield; typically, this will be a yield row, but a rowfilter may swallow or create as many rows as desired.

If you forget to have a yield in the rowfilter source, you’ll get a “NoneType is not iterable” error that’s a bit hard to understand.

Here, you can only access whatever comes from the grammar. You can access grammar keys in late parameters as row[key] or, if key is like an identifier, as @key.
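What a rowfilter body amounts to in python terms can be sketched with ordinary generator functions taking row and rowIter. The function names and the value key are made up for this example; in an RD, only the body would be written, inside a rowfilter's code element.

```python
def splitValues(row, rowIter):
    """Swallows comment rows and creates one row per comma-separated item."""
    if row["value"].startswith("#"):
        return  # no row is yielded for this input row
    for part in row["value"].split(","):
        newRow = dict(row)
        newRow["value"] = part
        yield newRow


def forgotYield(row, rowIter):
    # Without a yield, this is a plain function returning None.
    row["value"] = row["value"].strip()


produced = list(splitValues({"value": "1,2"}, None))
swallowed = list(splitValues({"value": "# comment"}, None))

try:
    for _ in forgotYield({"value": " 1 "}, None):
        pass
except TypeError as exc:
    errorMessage = str(exc)  # mentions that NoneType is not iterable
```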

attrSeq = [<gavo.base.complexattrs.StructListAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.base.parsecontext.ReferenceAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.attrdef.EnumeratedUnicodeAttribute object>]
completedCallbacks = []
formalArgs = 'row, rowIter'
getFuncCode()[source]

returns a function definition for this proc application.

This includes bindings of late parameters.

Locally defined code overrides code defined in a procDef.

managedAttrs = {'bind': <gavo.base.complexattrs.StructListAttribute object>, 'bindings': <gavo.base.complexattrs.StructListAttribute object>, 'code': <gavo.base.attrdef.UnicodeAttribute object>, 'deprecated': <gavo.base.attrdef.UnicodeAttribute object>, 'doc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'name': <gavo.base.attrdef.UnicodeAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'procDef': <gavo.base.parsecontext.ReferenceAttribute object>, 'setup': <gavo.base.complexattrs.StructListAttribute object>, 'setups': <gavo.base.complexattrs.StructListAttribute object>, 'type': <gavo.base.attrdef.EnumeratedUnicodeAttribute object>}
name_ = 'rowfilter'
requiredType = 'rowfilter'
class gavo.grammars.common.SourceFieldApp(parent, **kwargs)[source]

Bases: ProcApp

A procedure application that returns a dictionary added to all incoming rows.

Use this to programmatically provide information that can be computed once but that is then added to all rows coming from a single source, usually a file. This could be useful to add information on the source of a record or the like.

The code must return a dictionary. The source that is about to be parsed is passed in as sourceToken. When parsing from files, this simply is the file name. The data the rows will be delivered to is available as “data”, which is useful for adding or retrieving meta information.
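The compute-once, add-everywhere pattern might look like the sketch below. Both function names and the srcName key are hypothetical, and letting a row's own keys win over source fields on a clash is an arbitrary choice made for the sketch, not a statement about how gavo merges them.

```python
import os


def computeSourceFields(sourceToken, data):
    # Computed once per source; here sourceToken is taken to be a file name.
    return {"srcName": os.path.basename(sourceToken)}


def addSourceFields(rows, sourceFields):
    # Merge the per-source dictionary into every row from that source.
    for row in rows:
        merged = dict(sourceFields)
        merged.update(row)
        yield merged


fields = computeSourceFields("/data/run1/obs.txt", None)
rows = list(addSourceFields([{"a": "1"}, {"a": "2"}], fields))
```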

attrSeq = [<gavo.base.complexattrs.StructListAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.base.parsecontext.ReferenceAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.attrdef.EnumeratedUnicodeAttribute object>]
completedCallbacks = []
formalArgs = 'sourceToken, data'
managedAttrs = {'bind': <gavo.base.complexattrs.StructListAttribute object>, 'bindings': <gavo.base.complexattrs.StructListAttribute object>, 'code': <gavo.base.attrdef.UnicodeAttribute object>, 'deprecated': <gavo.base.attrdef.UnicodeAttribute object>, 'doc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'name': <gavo.base.attrdef.UnicodeAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'procDef': <gavo.base.parsecontext.ReferenceAttribute object>, 'setup': <gavo.base.complexattrs.StructListAttribute object>, 'setups': <gavo.base.complexattrs.StructListAttribute object>, 'type': <gavo.base.attrdef.EnumeratedUnicodeAttribute object>}
name_ = 'sourceFields'
requiredType = 'sourceFields'
class gavo.grammars.common.TransparentGrammar(parent, **kwargs)[source]

Bases: Grammar

A grammar that returns its sourceToken as the row iterator.

This only makes sense in extreme situations and never without custom code. If you’re not sure you need this, you don’t want to know about it.

attrSeq = [<gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.complexattrs.StructAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.base.complexattrs.PropertyAttribute object>, <gavo.rscdef.common.RDAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.complexattrs.StructAttribute object>]
clearProperty(name)
completedCallbacks = []
getFullId()
getProperty(name, default=<Undefined>)
hasProperty(name)
managedAttrs = {'enc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'ignoreOn': <gavo.base.complexattrs.StructAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'properties': <gavo.base.complexattrs.PropertyAttribute object>, 'property': <gavo.base.complexattrs.PropertyAttribute object>, 'rd': <gavo.rscdef.common.RDAttribute object>, 'rowfilter': <gavo.base.complexattrs.StructListAttribute object>, 'rowfilters': <gavo.base.complexattrs.StructListAttribute object>, 'sourceFields': <gavo.base.complexattrs.StructAttribute object>}
name_ = 'transparentGrammar'
parse(sourceToken, targetData=None)[source]
property rd
setProperty(name, value)
gavo.grammars.common.compileRowfilter(filters)[source]

returns an iterator that “pipes” the rowfilters in filters.

This means that each row produced by filters[0] is fed as input to filters[1], and so on.

If filters is empty, None is returned.
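The piping can be pictured with plain generator functions as in the following sketch. pipeFilters is a hypothetical stand-in operating on python callables; compileRowfilter itself works on Rowfilter elements, and its return value may be shaped differently.

```python
def _applyToAll(rowfilter, rows, rowIter):
    # Feed every incoming row through one filter, flattening its output.
    for row in rows:
        yield from rowfilter(row, rowIter)


def pipeFilters(filters):
    """Returns a function (row, rowIter) -> iterator of rows, or None."""
    if not filters:
        return None

    def iterFiltered(row, rowIter):
        rows = [row]
        for rowfilter in filters:
            rows = _applyToAll(rowfilter, rows, rowIter)
        return rows

    return iterFiltered


def duplicate(row, rowIter):
    yield row
    yield dict(row)


def dropOdd(row, rowIter):
    if row["n"] % 2 == 0:
        yield row


piped = pipeFilters([duplicate, dropOdd])
evenResult = list(piped({"n": 2}, None))
oddResult = list(piped({"n": 1}, None))
```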

gavo.grammars.common.wrapFileFor(fileobj, desiredMode, enc)[source]

wraps or unwraps fileobj so that it matches the open mode desiredMode.

If there’s a “b” in desiredMode, this will return fileobj.raw if it’s there. Otherwise, it’ll wrap it into a codecs.getreader for enc.
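A minimal sketch of this behaviour, using only the standard library, is given below. wrapFile is a hypothetical name, and returning fileobj unchanged when binary mode is requested but no raw attribute exists is an assumption about the case the description leaves open.

```python
import codecs
import io


def wrapFile(fileobj, desiredMode, enc):
    if "b" in desiredMode:
        # Unwrap to the underlying raw stream if there is one.
        return getattr(fileobj, "raw", fileobj)
    # Text mode: decode bytes into strings on the fly.
    return codecs.getreader(enc)(fileobj)


# Bytes in, strings out:
reader = wrapFile(io.BytesIO("caf\u00e9\n".encode("utf-8")), "r", "utf-8")
text = reader.read()

# A buffered reader is unwrapped to its raw stream:
buffered = io.BufferedReader(io.BytesIO(b"abc"))
rawStream = wrapFile(buffered, "rb", "utf-8")
```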