gavo.protocols.oaiclient module

A simple client of OAI-http.

This includes both some high-level functions and rudimentary parsers that can serve as bases for more specialized parsers.

class gavo.protocols.oaiclient.CanonicalPrefixes(pickleName)[source]

Bases: object

a self-persisting dictionary of the prefixes we use in our OAI interface.

CanonicalPrefixes objects are constructed with the name of a pickle file containing a list of (prefix, uri) pairs.

This reproduces some code from stanxml.NSRegistry, but we want that functionality as instance methods here, not as class methods.

getNSForPrefix(prefix)[source]
getPrefixForNS(ns)[source]
haveNS(ns)[source]
iterNS()[source]
registerPrefix(prefix, ns, save=True)[source]
registerPrefixOrMakeUp(prefix, ns)[source]

registers prefix for ns or, if prefix is already taken, makes up a new prefix for the namespace URI ns.
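
The fallback behaviour described for registerPrefixOrMakeUp can be illustrated with a small stand-in. This is only a sketch: the class PrefixRegistrySketch, its in-memory storage, and the ns0, ns1, ... scheme for made-up prefixes are assumptions for illustration, not the actual gavo implementation (which also persists itself to a pickle file).

```python
class PrefixIsTaken(Exception):
    """Stand-in: raised when a prefix is already bound to another namespace."""


class PrefixRegistrySketch:
    """Hypothetical in-memory stand-in for a prefix/namespace registry."""

    def __init__(self, pairs=()):
        self._ns_for_prefix = {}
        self._prefix_for_ns = {}
        for prefix, ns in pairs:
            self.register_prefix(prefix, ns)

    def register_prefix(self, prefix, ns):
        # Re-registering an identical binding is harmless;
        # a conflicting binding is an error.
        known = self._ns_for_prefix.get(prefix)
        if known is not None and known != ns:
            raise PrefixIsTaken(prefix)
        self._ns_for_prefix[prefix] = ns
        self._prefix_for_ns[ns] = prefix

    def register_prefix_or_make_up(self, prefix, ns):
        # Assumption: fall back to ns0, ns1, ... when the wanted prefix is taken.
        try:
            self.register_prefix(prefix, ns)
        except PrefixIsTaken:
            counter = 0
            while f"ns{counter}" in self._ns_for_prefix:
                counter += 1
            self.register_prefix(f"ns{counter}", ns)
        return self._prefix_for_ns[ns]
```

The sketch returns the prefix that ended up bound to ns, so callers can tell whether their preferred prefix was accepted.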

exception gavo.protocols.oaiclient.FailedQuery(msg, code='?', value='?')[source]

Bases: Exception

class gavo.protocols.oaiclient.IdParser(initRecs=None)[source]

Bases: StartEndHandler, OAIErrorMixin

A parser for simple OAI-PMH headers.

Records end up as a list of dictionaries in the recs attribute.
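
To give an idea of what "a list of dictionaries" built from OAI-PMH headers can look like, here is a minimal sketch. It uses xml.etree.ElementTree rather than the SAX-style handler that IdParser is built on, and the function name parse_headers as well as the dictionary keys (id, date, sets, deleted) are assumptions chosen for illustration; the keys IdParser actually produces may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>ivo://example/res1</identifier>
      <datestamp>2020-01-01T00:00:00Z</datestamp>
      <setSpec>ivo_managed</setSpec>
    </header>
    <header status="deleted">
      <identifier>ivo://example/res2</identifier>
      <datestamp>2020-02-01T00:00:00Z</datestamp>
    </header>
    <resumptionToken>tok-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return (list of header dicts, resumption token or None)."""
    root = ET.fromstring(xml_text)
    recs = []
    for header in root.iter(OAI_NS + "header"):
        recs.append({
            "id": header.findtext(OAI_NS + "identifier"),
            "date": header.findtext(OAI_NS + "datestamp"),
            "sets": [el.text for el in header.findall(OAI_NS + "setSpec")],
            "deleted": header.get("status") == "deleted",
        })
    # An empty or missing resumptionToken element means there are no more pages.
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return recs, token or None
```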

getResult()[source]
resumptionToken = None

class gavo.protocols.oaiclient.IdentifyParser[source]

Bases: StartEndHandler, OAIErrorMixin

A parser for the result of the identify operation.

The result (an instance of ServerProperties) is in the serverProperties attribute.

getResult()[source]
resumptionToken = None
serverProperties = None

exception gavo.protocols.oaiclient.NoRecordsMatch[source]

Bases: Exception

class gavo.protocols.oaiclient.OAIErrorMixin[source]

Bases: object

class gavo.protocols.oaiclient.OAIQuery(registry, verb, startDate=None, endDate=None, set=None, metadataPrefix='ivo_vor', identifier=None, contentCallback=None, granularity=None)[source]

Bases: object

A container for queries to OAI interfaces.

Construct it with the OAI endpoint and the OAI verb, plus some optional query attributes. If you want to retain or access the raw responses of the server, pass a contentCallback function; it will be called with a byte string containing the payload of the server response if it was parsed successfully. Error responses cannot be obtained in this way.

The OAIQuery is constructed with OAI-PMH parameters (verb, startDate, endDate, set, metadataPrefix; see the OAI-PMH docs for what they mean, only verb is mandatory). In addition, you can pass granularity, which is the granularity

doHTTP(**moreArgs)[source]

returns the result of passing the current query plus moreArgs to the current registry.

The result is returned as a string.

endDate = None
getKWs(**moreArgs)[source]

returns a dictionary containing query keywords for OAI interfaces from what’s specified on the command line.
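
As an illustration of what such a keyword dictionary might contain, the following sketch maps Python-style arguments to OAI-PMH request arguments. In the protocol, the date bounds are called from and until; that startDate and endDate map to them in this way is an assumption, and build_oai_keywords is a hypothetical helper, not part of gavo.

```python
from urllib.parse import urlencode


def build_oai_keywords(verb, start_date=None, end_date=None, set_name=None,
                       metadata_prefix=None, identifier=None):
    """Return a dict of OAI-PMH request arguments, leaving out unset ones."""
    candidates = {
        "verb": verb,
        "from": start_date,
        "until": end_date,
        "set": set_name,
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Only send arguments that were actually given.
    return {key: value for key, value in candidates.items()
            if value is not None}


kws = build_oai_keywords("ListIdentifiers", start_date="2020-01-01",
                         metadata_prefix="ivo_vor")
# The dictionary can then be turned into a query string for an HTTP GET.
query_string = urlencode(kws)
```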

maxRecords = None
metadataPrefix = None
registry = None
set = None
startDate = None
talkOAI(parserClass)[source]

processes an OAI dialogue for verb using the IdParser-derived parserClass.
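
An OAI-PMH dialogue for a list verb typically spans several requests, linked by resumption tokens. The loop below sketches that pattern under stated assumptions: harvest_all and fetch_page are hypothetical names, and a fake two-page server stands in for real HTTP traffic so the sketch runs without network access. It is not a rendering of what talkOAI does internally.

```python
def harvest_all(fetch_page, initial_args):
    """Collect records over all pages of an OAI-PMH list response.

    fetch_page is a hypothetical callable that takes a dict of query
    arguments and returns (records, resumption_token_or_None).
    """
    records, token = fetch_page(initial_args)
    all_records = list(records)
    while token:
        # In OAI-PMH, a resumption request carries only the verb and the token.
        records, token = fetch_page(
            {"verb": initial_args["verb"], "resumptionToken": token})
        all_records.extend(records)
    return all_records


def fake_fetch(args):
    """A fake server with two pages of results."""
    if args.get("resumptionToken") == "page2":
        return ["rec3"], None
    return ["rec1", "rec2"], "page2"


harvested = harvest_all(
    fake_fetch, {"verb": "ListRecords", "metadataPrefix": "ivo_vor"})
```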

timeout = 100

class gavo.protocols.oaiclient.OAIRecordsParser(canonicalPrefixes=None)[source]

Bases: ContentHandler, OAIErrorMixin

a SAX ContentHandler generating tuples of some record-level metadata and pre-formatted XML, for a simple implementation of the OAI interface.

canonicalPrefixes is a CanonicalPrefixes instance built from res/canonicalPrefixes.pickle.

Note that we require that records actually carry ivo_vor metadata.

Note that this is nothing people should need in normal operation. GAVO Heidelberg needs this for infrastructure services (including OAI on RegTAP, but we needed it beyond that).

characters(stuff)[source]

Receive notification of character data.

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

endElementNS(namePair, name)[source]

Signals the end of an element in namespace mode.

The name parameter contains the name of the element type, just as with the startElementNS event.

endHandlers = {'oai:error': <function OAIRecordsParser._end_oai_error>, 'oai:identifier': <function OAIRecordsParser._end_oai_identifier>, 'oai:record': <function OAIRecordsParser._end_oai_record>, 'oai:resumptionToken': <function OAIRecordsParser._end_oai_resumptionToken>, 'oai:setSpec': <function OAIRecordsParser._end_oai_setSpec>}
endPrefixMapping(prefix)[source]

End the scope of a prefix-URI mapping.

See startPrefixMapping for details. This event will always occur after the corresponding endElement event, but the order of endPrefixMapping events is not otherwise guaranteed.

getResult()[source]
normalizeNamespace(name)[source]

fixes the namespace prefix of name if necessary.

name must be a qualified name, i.e., contain exactly one colon.

“normalize” here means make sure the prefix matches our canonical prefix and change it to the canonical one if necessary.
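
The idea of normalizing a prefix can be sketched as follows. The function normalize_qname and the two plain dictionaries it takes are assumptions made for illustration; the actual method works on the parser's own state and a CanonicalPrefixes instance.

```python
def normalize_qname(name, document_prefixes, canonical_prefixes):
    """Return name with its prefix replaced by the canonical one.

    document_prefixes maps prefixes as used in the source document to
    namespace URIs; canonical_prefixes maps namespace URIs to the
    prefixes we want to emit.  Both are hypothetical plain dicts.
    """
    # Unpacking raises ValueError unless name contains exactly one colon.
    prefix, local_name = name.split(":")
    ns_uri = document_prefixes[prefix]
    canonical = canonical_prefixes[ns_uri]
    return f"{canonical}:{local_name}"
```

For instance, if a document binds the VOResource namespace to v while the canonical prefix is vr, then v:Resource becomes vr:Resource.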

notifyError(err)[source]
resumptionToken = None
shipout(role, record)[source]
startElementNS(namePair, ignored, attrs)[source]

Signals the start of an element in namespace mode.

The name parameter contains the name of the element type as a (uri, localname) tuple, the qname parameter the raw XML 1.0 name used in the source document, and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.

The uri part of the name tuple is None for elements which have no namespace.

startHandlers = {'oai:error': <function OAIRecordsParser._start_oai_error>, 'oai:header': <function OAIRecordsParser._start_oai_header>, 'oai:record': <function OAIRecordsParser._start_oai_record>, 'ri:Resource': <function OAIRecordsParser._start_ri_Resource>}
startPrefixMapping(prefix, uri)[source]

Begin the scope of a prefix-URI Namespace mapping.

The information from this event is not necessary for normal Namespace processing: the SAX XML reader will automatically replace prefixes for element and attribute names when the http://xml.org/sax/features/namespaces feature is true (the default).

There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the start/endPrefixMapping event supplies the information to the application to expand prefixes in those contexts itself, if necessary.

Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other: all startPrefixMapping events will occur before the corresponding startElement event, and all endPrefixMapping events will occur after the corresponding endElement event, but their order is not guaranteed.
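
The following sketch shows these events with the standard library's SAX parser in namespace mode. PrefixTracker and track_prefixes are hypothetical names; the handler merely records what it sees and, as a simplification, does not cope with a prefix being re-declared in a nested element.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class PrefixTracker(ContentHandler):
    """Records prefix mappings and element names seen by the parser."""

    def __init__(self):
        super().__init__()
        self.active = {}    # prefix -> URI for mappings currently in scope
        self.seen = []      # every (prefix, uri) pair ever declared
        self.elements = []  # (uri, localname) tuples in document order

    def startPrefixMapping(self, prefix, uri):
        self.active[prefix] = uri
        self.seen.append((prefix, uri))

    def endPrefixMapping(self, prefix):
        self.active.pop(prefix, None)

    def startElementNS(self, name, qname, attrs):
        self.elements.append(name)


def track_prefixes(xml_bytes):
    """Parse xml_bytes in namespace mode and return the filled handler."""
    handler = PrefixTracker()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler
```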

exception gavo.protocols.oaiclient.PrefixIsTaken[source]

Bases: Exception

class gavo.protocols.oaiclient.RecordParser(initRecs=None)[source]

Bases: IdParser, OAIErrorMixin

A simple parser for ivo_vor records.

This only pulls out a number of the most salient items; more will probably follow as needed.

class gavo.protocols.oaiclient.ServerProperties[source]

Bases: object

A container for what an OAI-PMH server gives in response to identify.

add(name, value)[source]
adminEmails = ()
baseURL = None
compressions = ()
deletedRecord = None
earliestDatestamp = None
granularity = None
protocolVersion = None
repositoryName = None
set(name, value)[source]

gavo.protocols.oaiclient.getCanonicalPrefixes()[source]

gavo.protocols.oaiclient.getIdentifiers(registry, startDate=None, endDate=None, set=None, granularity=None)[source]

returns a list of “short” records for what’s in the registry specified by args.

gavo.protocols.oaiclient.getRecord(registry, identifier)[source]

returns the XML form of an OAI-PMH record for identifier from the OAI-PMH endpoint at URL registry.

This uses the OAIRecordsParser which enforces canonical prefixes, and the function will add their declarations as necessary. This also means that evil registry records could be broken by us.

gavo.protocols.oaiclient.getRecords(registry, startDate=None, endDate=None, set=None, granularity=None)[source]

returns a list of “long” records for what’s in the registry specified by args.

parser should be a subclass of RecordParser; otherwise, you’ll miss resumption and possibly other features.

gavo.protocols.oaiclient.getServerProperties(registry)[source]

returns a ServerProperties instance for registry.

In particular, you can retrieve the granularity argument that actually matches the registry from the result’s granularity attribute.
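
OAI-PMH defines two granularities for datestamps, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. The sketch below shows how a client might format a date bound to match whatever the server reports. The helper format_oai_datestamp is hypothetical and not part of gavo; how the module itself uses the granularity value is not shown here.

```python
from datetime import datetime, timezone


def format_oai_datestamp(moment, granularity):
    """Format a datetime for use as an OAI-PMH from/until argument.

    Pass timezone-aware datetimes; naive ones would be interpreted as
    local time by astimezone().
    """
    moment = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        return moment.strftime("%Y-%m-%d")
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"Unknown OAI-PMH granularity: {granularity!r}")


bound = datetime(2020, 5, 17, 12, 30, 0, tzinfo=timezone.utc)
day_stamp = format_oai_datestamp(bound, "YYYY-MM-DD")
second_stamp = format_oai_datestamp(bound, "YYYY-MM-DDThh:mm:ssZ")
```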

gavo.protocols.oaiclient.parseRecord(recordXML)[source]

returns some main properties from an XML-encoded VOResource record.

recordXML can be an OAI-PMH response or just a naked record. If multiple records are contained in recordXML, only the first will be returned.

What’s coming back is a dictionary as produced by RecordParser.