gavo.utils.plainxml module

Some XML hacks.

StartEndHandler simplifies the creation of SAX parsers; it is intended for client code or for non-DC XML parsing.

iterparse is a thin, ElementTree-inspired layer over expat; both VOTable parsing and base.structure parsing build on it.

class gavo.utils.plainxml.ErrorPosition(fName, line, column)

Bases: object

A wrapper for an error position.

Construct it with file name, line number, and column. Use None for missing or unknown values.

fName = None
class gavo.utils.plainxml.StartEndHandler

Bases: ContentHandler

This class provides startElement, endElement and characters methods that translate events into method calls.

When an opening tag is seen, we look for a _start_<element name> method and, if present, call it with the name and the attributes. When a closing tag is seen, we try to call _end_<element name> with the name, the attributes, and the contents. If the _end_xxx method returns a string (or similar), this value is added to the content of the enclosing element.

Rather than overriding __init__, you probably want to override the _initialize() method to create the data structures you want to fill from XML.

StartEndHandlers strip namespace prefixes from element names and otherwise ignore namespaces entirely. If you need namespaces, use a different interface.
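
The dispatch convention described above can be sketched with the standard library's xml.sax alone. The following is a simplified, hypothetical re-creation (the names MiniStartEndHandler and TitleCollector are invented for illustration), not GAVO's actual StartEndHandler; it assumes a prefix is everything up to the last colon of a name and leaves out namespace-mode events and processing instructions.

```python
import xml.sax


class MiniStartEndHandler(xml.sax.ContentHandler):
    """Hypothetical, simplified sketch of the _start_/_end_ dispatch.

    Not GAVO's implementation; it only illustrates the convention.
    """

    def __init__(self):
        super().__init__()
        self.elementStack = []     # (name, attrs) of currently open elements
        self.contentsStack = [[]]  # collected text chunks per open element
        self._initialize()

    def _initialize(self):
        """Override this (rather than __init__) to set up result structures."""

    @staticmethod
    def cleanupName(name):
        # Drop any namespace prefix, e.g. "vo:TABLE" -> "TABLE".
        return name.split(":")[-1]

    def startElement(self, name, attrs):
        name = self.cleanupName(name)
        self.elementStack.append((name, attrs))
        self.contentsStack.append([])
        handler = getattr(self, "_start_" + name, None)
        if handler is not None:
            handler(name, attrs)

    def characters(self, chars):
        self.contentsStack[-1].append(chars)

    def endElement(self, name):
        name = self.cleanupName(name)
        _, attrs = self.elementStack.pop()
        contents = "".join(self.contentsStack.pop())
        handler = getattr(self, "_end_" + name, None)
        if handler is not None:
            res = handler(name, attrs, contents)
            # A string returned by an end handler becomes parent content.
            if isinstance(res, str):
                self.contentsStack[-1].append(res)


class TitleCollector(MiniStartEndHandler):
    def _initialize(self):
        self.titles = []

    def _start_doc(self, name, attrs):
        self.version = attrs.get("version")

    def _end_title(self, name, attrs, contents):
        self.titles.append(contents.strip())


handler = TitleCollector()
xml.sax.parseString(
    b'<x:doc xmlns:x="urn:example" version="1.1">'
    b"<x:title> First </x:title><x:title>Second</x:title></x:doc>",
    handler)
print(handler.titles, handler.version)  # ['First', 'Second'] 1.1
```

Because xml.sax runs without namespace processing by default, the handler sees raw names such as "x:title", and the prefix stripping is what lets _end_title match them.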

characters(chars)

Receive notification of character data.

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

cleanupName(name)
endElement(name, suppress=False)

Signals the end of an element in non-namespace mode.

The name parameter contains the name of the element type, just as with the startElement event.

endElementNS(namePair, qName)

Signals the end of an element in namespace mode.

The namePair parameter contains the name of the element type, just as with the startElementNS event.

getAttrsAsDict(attrs)

Returns attrs, as received from SAX, as a dictionary.

The main selling point is that any namespace prefixes are removed from the attribute names. Prefixes within the attribute values remain, though.
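
A minimal sketch of the attribute handling described for getAttrsAsDict, assuming a prefix is everything up to the last colon in an attribute name. The function name attrs_as_dict is hypothetical, and the standard library's AttributesImpl stands in for the attributes object SAX delivers.

```python
from xml.sax.xmlreader import AttributesImpl


def attrs_as_dict(attrs):
    """Hypothetical stand-in for getAttrsAsDict.

    Strips prefixes from attribute names only; values are left untouched.
    """
    return {name.split(":")[-1]: value for name, value in attrs.items()}


attrs = AttributesImpl({"xlink:href": "http://example.org/a", "type": "ivo:std"})
print(attrs_as_dict(attrs))
# {'href': 'http://example.org/a', 'type': 'ivo:std'}
```

Note that the value "ivo:std" keeps its prefix; only the name "xlink:href" is shortened.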

getParentTag(depth=1)

Returns the name of the parent element.

This only works as written here in end handlers. In start handlers, you have to pass depth=2 (since their own tag is already on the stack).
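
To illustrate the depth rule, here is a hypothetical helper working on a plain list of open element names (innermost last). It assumes, as the description implies, that an element's own name is already on the stack while its start handler runs and has been removed by the time its end handler runs.

```python
def get_parent_tag(element_stack, depth=1):
    """Hypothetical helper mirroring the getParentTag depth rule.

    element_stack lists the names of the currently open elements,
    innermost last. Returns None if the stack is not deep enough.
    """
    if 0 < depth <= len(element_stack):
        return element_stack[-depth]
    return None


# In the start handler for <row> inside <doc><table>, "row" is already on
# the stack, so the parent sits one level further down.
print(get_parent_tag(["doc", "table", "row"], depth=2))  # table

# In the end handler for <row>, "row" has already been popped.
print(get_parent_tag(["doc", "table"]))  # table
```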

getResult()
parse(stream)
parseBytes(string)
parseString(string)
processingInstruction(target, data)

Receive notification of a processing instruction.

The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.

A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.

setDocumentLocator(locator)

Called by the parser to give the application a locator for locating the origin of document events.

SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if they do so, they must supply the locator to the application by invoking this method before invoking any of the other methods in the DocumentHandler interface.

The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application’s business rules). The information returned by the locator is probably not sufficient for use with a search engine.

Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.

startElement(name, attrs)

Signals the start of an element in non-namespace mode.

The name parameter contains the raw XML 1.0 name of the element type as a string and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.

startElementNS(namePair, qName, attrs)

Signals the start of an element in namespace mode.

The namePair parameter contains the name of the element type as a (uri, localname) tuple, the qName parameter contains the raw XML 1.0 name used in the source document, and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.

The uri part of the namePair tuple is None for elements which have no namespace.

class gavo.utils.plainxml.iterparse(source, parseErrorClass=<class 'gavo.utils.excs.StructureError'>)

Bases: object

Iterates over start, data, and end events in source.

To keep things simple downstream, we swallow all namespace prefixes, if present.

iterparse is constructed with a source (any object with a read method, such as an open file) and, optionally, a custom error class. This error class must accept the message as its first constructor argument. Since expat error messages usually contain the line number and column, no extra pos attribute is supported.

Since the parser is typically far ahead of the events seen, we do our own bookkeeping by storing the parser position with each event. The end of the construct that caused an event can be retrieved using pos.
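
The bookkeeping idea can be sketched directly on the standard library's pyexpat bindings: each handler records the position expat reports at the moment of the callback, so buffered events keep usable positions even though the parser has already consumed the whole chunk. This is a hypothetical generator (iter_events and its tuple layout are invented here), not GAVO's iterparse. In particular, the position expat reports inside a callback generally refers to where the triggering construct begins, whereas pos is documented above as giving its end, and pushBack is not modelled. The default chunk size mirrors the documented chunkSize.

```python
import io
import xml.parsers.expat


def iter_events(source, chunk_size=1 << 20, error_class=ValueError):
    """Hypothetical minimal expat event iterator (not GAVO's iterparse).

    Yields (type, name, payload, pos) tuples: type is "start", "data" or
    "end"; pos is the (line, column) expat reported inside the callback.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing

    def here():
        return (parser.CurrentLineNumber, parser.CurrentColumnNumber)

    def on_start(name, attrs):
        # Swallow namespace prefixes: "v:item" -> "item".
        events.append(("start", name.split(":")[-1], attrs, here()))

    def on_data(text):
        events.append(("data", None, text, here()))

    def on_end(name):
        events.append(("end", name.split(":")[-1], None, here()))

    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_data
    parser.EndElementHandler = on_end

    while True:
        chunk = source.read(chunk_size)
        try:
            # An empty read signals end of input, so tell expat this is final.
            parser.Parse(chunk, not chunk)
        except xml.parsers.expat.ExpatError as ex:
            # expat's message already includes line and column.
            raise error_class(str(ex)) from None
        # expat has consumed the whole chunk by now; hand out the buffered
        # events, each carrying the position recorded when it was produced.
        yield from events
        events.clear()
        if not chunk:
            break


doc = io.BytesIO(
    b'<v:doc xmlns:v="urn:x">\n  <v:item n="1">hi</v:item>\n</v:doc>')
for kind, name, payload, pos in iter_events(doc, chunk_size=16):
    print(kind, name, repr(payload), pos)
```

A small chunk size is used in the usage example only to show that event positions stay correct across chunk boundaries; character data may arrive split over several "data" events.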

chunkSize = 1048576

The number of bytes handed to expat from iterparse at one go.

close()
getParseError(msg)
property pos
pushBack(type, name, payload)
gavo.utils.plainxml.traverseETree(eTree)

Iterates over the elements of an ElementTree in postorder.
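
A recursive sketch of postorder traversal with xml.etree.ElementTree, in which every element is yielded only after all of its descendants. traverse_postorder is a hypothetical name; it takes an Element (call getroot() on an ElementTree first), and whether traverseETree itself expects an Element or an ElementTree is not stated here.

```python
import xml.etree.ElementTree as ET


def traverse_postorder(element):
    """Hypothetical sketch: yield all descendants first, then the element."""
    for child in element:
        yield from traverse_postorder(child)
    yield element


root = ET.fromstring("<a><b><c/></b><d/></a>")
print([el.tag for el in traverse_postorder(root)])  # ['c', 'b', 'd', 'a']
```

Postorder is convenient when an element's processing depends on its already-processed children, for instance when building objects bottom-up.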