gavo.grammars.freeregrammar module

A grammar based on repeated application of REs

class gavo.grammars.freeregrammar.FreeREGrammar(parent, **kwargs)

Bases: Grammar

A grammar allowing “free” regular expressions to parse a document.

You give a rowProduction, a regular expression that matches individual records in the document. Each match of rowProduction is then matched against parseRE, which must contain named groups. The dictionary mapping group names to the text they matched makes up the input row.
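The two-stage scheme can be sketched with Python's re module. This is a minimal illustration under stated assumptions, not the grammar's actual implementation: the BEGIN/END record format, the patterns ROW_PRODUCTION and PARSE_RE, and the helper iter_rows are all hypothetical.

```python
import re

# Hypothetical stand-in for rowProduction: one record runs from BEGIN to the
# next END. (?s) lets a record span lines; the non-greedy .*? stops at the
# first END so one match never swallows several records.
ROW_PRODUCTION = re.compile(r"(?s)BEGIN.*?END")

# Hypothetical stand-in for parseRE: its named groups become the row's keys.
PARSE_RE = re.compile(r"BEGIN\s+id=(?P<id>\d+)\s+mag=(?P<mag>[\d.]+)\s+END")


def iter_rows(document, row_production=ROW_PRODUCTION, parse_re=PARSE_RE):
    """Yield one dict per record: row_production finds records, parse_re splits them."""
    for record in row_production.finditer(document):
        match = parse_re.match(record.group(0))
        if match is None:
            raise ValueError(f"parseRE did not match record: {record.group(0)!r}")
        # The mapping from group names to matched text is the input row.
        yield match.groupdict()


doc = """
BEGIN id=1 mag=12.5 END
BEGIN id=2 mag=9.75 END
"""
rows = list(iter_rows(doc))
print(rows)
# [{'id': '1', 'mag': '12.5'}, {'id': '2', 'mag': '9.75'}]
```

Note that all values come out as strings; converting them to numbers is left to whatever consumes the rows.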

When writing parseRE, we recommend giving it as an element with a CDATA section and taking advantage of Python’s “verbose” regular expressions. Here’s an example:

<parseRE><![CDATA[(?xsm)^name::(?P<name>.*)
        ^query::(?P<query>.*)
        ^description::(?P<description>.*)\.\.
]]></parseRE>
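To see what the example pattern extracts, it can be compiled directly with Python's re module. The record below is made up for illustration; only the pattern is taken from the example. The inline flags do the work here: x ignores the layout whitespace and newlines inside the pattern, s lets "." match newlines, and m makes "^" match at the start of every line.

```python
import re

# The parseRE from the example above, copied verbatim. Inline flags:
#   x (verbose):   unescaped whitespace and newlines in the pattern are ignored
#   s (dotall):    "." also matches newlines
#   m (multiline): "^" matches at the start of every line
PARSE_RE = re.compile(r"""(?xsm)^name::(?P<name>.*)
        ^query::(?P<query>.*)
        ^description::(?P<description>.*)\.\.
""")

# A made-up record in the format the example expects.
record = (
    "name::ppmx\n"
    "query::SELECT * FROM ppmx.data\n"
    "description::The PPMX catalogue..\n"
)

match = PARSE_RE.match(record)
# Because of the s flag, name and query capture their trailing newline,
# so this sketch strips each value before using it as a row field.
row = {key: value.strip() for key, value in match.groupdict().items()}
print(row)
# {'name': 'ppmx', 'query': 'SELECT * FROM ppmx.data', 'description': 'The PPMX catalogue'}
```

The greedy `.*` under the s flag backtracks to the next `^query::` or `^description::` line start, which is why each field stops where the next one begins; the trailing `\.\.` marks the end of the description.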
attrSeq = [<gavo.base.attrdef.UnicodeAttribute object>, <gavo.base.parsecontext.IdAttribute object>, <gavo.base.attrdef.BooleanAttribute object>, <gavo.base.complexattrs.StructAttribute object>, <gavo.base.parsecontext.OriginalAttribute object>, <gavo.grammars.common.REAttribute object>, <gavo.base.complexattrs.PropertyAttribute object>, <gavo.rscdef.common.RDAttribute object>, <gavo.grammars.common.REAttribute object>, <gavo.base.complexattrs.StructListAttribute object>, <gavo.base.complexattrs.StructAttribute object>, <gavo.base.attrdef.BooleanAttribute object>]
clearProperty(name)
completedCallbacks = []
getFullId()
getProperty(name, default=<Undefined>)
hasProperty(name)
managedAttrs = {'enc': <gavo.base.attrdef.UnicodeAttribute object>, 'id': <gavo.base.parsecontext.IdAttribute object>, 'ignoreJunk': <gavo.base.attrdef.BooleanAttribute object>, 'ignoreOn': <gavo.base.complexattrs.StructAttribute object>, 'original': <gavo.base.parsecontext.OriginalAttribute object>, 'parseRE': <gavo.grammars.common.REAttribute object>, 'properties': <gavo.base.complexattrs.PropertyAttribute object>, 'property': <gavo.base.complexattrs.PropertyAttribute object>, 'rd': <gavo.rscdef.common.RDAttribute object>, 'rowProduction': <gavo.grammars.common.REAttribute object>, 'rowfilter': <gavo.base.complexattrs.StructListAttribute object>, 'rowfilters': <gavo.base.complexattrs.StructListAttribute object>, 'sourceFields': <gavo.base.complexattrs.StructAttribute object>, 'stripTokens': <gavo.base.attrdef.BooleanAttribute object>}
name_ = 'freeREGrammar'
property rd
rowIterator

alias of RowIterator

setProperty(name, value)
class gavo.grammars.freeregrammar.RowIterator(grammar, sourceToken, **kwargs)

Bases: FileRowIterator

chunkSize = 8192
getLocator()