Author: | Markus Demleitner |
---|---|
Email: | gavo@ari.uni-heidelberg.de |
Date: | 2024-12-16 |
Copyright: | Waived under CC-0 |
The following (XML) elements are defined for resource descriptors. Some elements are polymorphous (Grammars, Cores). See below for a reference on the respective real elements known to the software.
Each element description gives a general introduction to the element's use (complain if it's too technical; it's not unlikely that it is since these texts are actually the defining classes' docstrings).
Within RDs, element properties that can (but need not) be written in XML attributes, i.e., as a single string, are called "atomic". Their types are given in parentheses after the attribute name along with a default value.
In general, items defaulted to Undefined are mandatory. Failing to give a value will result in an error at RD parse time.
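The rule for mandatory items can be pictured with a sentinel default. The following is only a minimal sketch of that pattern, not DaCHS code; the names Undefined, ATTRIBUTES, and build_element are made up for this illustration.

```python
class _Undefined:
    """Sentinel marking an attribute that has no usable default."""
    def __repr__(self):
        return "Undefined"


Undefined = _Undefined()

# Hypothetical attribute table: attribute name -> default value
ATTRIBUTES = {"name": Undefined, "type": "real", "unit": None}


def build_element(given):
    """Merge given values over the defaults; fail for missing mandatory ones."""
    result = {}
    for att_name, default in ATTRIBUTES.items():
        value = given.get(att_name, default)
        if value is Undefined:
            # nothing was given and there is no real default: mandatory
            raise ValueError("Mandatory attribute %s missing" % att_name)
        result[att_name] = value
    return result
```

In this sketch, build_element({"name": "ra"}) succeeds, while build_element({}) raises because name is still Undefined after merging.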
Within RD XML documents, you can (almost always) give atomic children either as XML attributes (att="abc") or as child elements (<att>abc</att>). Some of the "atomic" attributes actually contain lists of items. For those, you should normally write multiple child elements (<att>val1</att><att>val2</att>), although sometimes it's allowed to mash together the individual list items using a variety of separators.
Here are some short words about the types you may encounter, together with valid literals:
There are also "Dict-like" attributes. These are built from XML like:
<d key="ab">val1</d> <d key="cd">val2</d>
In addition to key, other (possibly more descriptive) attributes for the key within these mappings may also be allowed. In special circumstances (in particular with properties) it may be useful to add to a value:
<property key="brokencols">ab,cd</property> <property key="brokencols" cumulative="True">,x</property>
will leave ab,cd,x in brokencols.
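To make the effect of cumulative concrete, here is a small sketch in plain python that replays such assignments in document order. It only models the behaviour described above; set_properties is a hypothetical helper and not part of DaCHS.

```python
def set_properties(assignments):
    """Apply (key, value, cumulative) triples in document order."""
    props = {}
    for key, value, cumulative in assignments:
        if cumulative:
            # append to whatever is already stored under key
            props[key] = props.get(key, "") + value
        else:
            # a plain assignment overwrites
            props[key] = value
    return props


props = set_properties([
    ("brokencols", "ab,cd", False),
    ("brokencols", ",x", True),
])
print(props["brokencols"])  # prints ab,cd,x
```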
Many elements can also have "structure children". These correspond to compound things with attributes and possibly children of their own. The name given at the start of each description is irrelevant to the pure user; it's the attribute name you'd use when you have the corresponding python objects. For authoring XML, you use the name in the following link; thus, the phrase "colRefs (contains Element columnRef..." means you'd write <columnRef...>.
Here are some guidelines as to the naming of the attributes:
Also note that examples for the usage of almost everything mentioned here can be found in the GAVO datacenter element reference.
A code fragment to manipulate the result row (and possibly more).
Apply elements allow embedding python code in rowmakers.
The current input fields from the grammar (including the rowmaker's vars) are available in the vars dictionary and can be changed there. You can also add new keys.
You can add new keys for shipping out in the result dictionary.
The active rowmaker is available as parent. It is also used to expand macros.
The table that the rowmaker feeds to can be accessed as targetTable. You probably only want to change meta information here (e.g., warnings or infos).
As always in procApps, you can get the embedding RD as rd; this is useful to, e.g., resolve references using rd.getByRD, and specify resdir-relative file names using rd.getAbsPath.
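As a rough illustration of what an apply body does with vars and result, the following sketch emulates one outside of DaCHS. The function wrapper, the field names, and the sample data are all assumptions made for this example; in an RD you would only write the function body.

```python
def apply_body(vars, result):
    """Stand-in for the body of an apply element."""
    # change an input field in place (hypothetical field names throughout)
    vars["object"] = vars["object"].strip()
    # add a new key to vars for later steps in the rowmaker
    vars["band"] = vars["filter"].upper()
    # ship out a computed value in the result row
    result["flux_ratio"] = float(vars["flux_a"]) / float(vars["flux_b"])


row_vars = {"object": " M31 ", "filter": "v", "flux_a": "3.0", "flux_b": "1.5"}
row_result = {}
apply_body(row_vars, row_result)
```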
May occur in Element rowmaker.
A binding of a procedure definition parameter to a concrete value.
The value to set is contained in the binding body in the form of a python expression. The body must not be empty.
May occur in Element phraseMaker, Element apply, Element job, Element processEarly, Element processLate, Element regTest, Element rowfilter, Element sourceFields, Element iterator, Element pargetter, Element makeQuery, Element dataFormatter, Element dataFunction, Element descriptorGenerator, Element metaMaker, Element coreProc.
A database column.
Columns contain almost all metadata to describe a column in a database table or a VOTable (the exceptions are for column properties that may span several columns, most notably indices).
Note that the type system adopted by the DaCHS is a subset of postgres' type system. Thus when defining types, you have to specify basically SQL types. Types for other type systems (like VOTable, XSD, or the software-internal representation in python values) are inferred from them.
Columns can have delimited identifiers as names. Don't do this, it's no end of trouble. For this reason, however, you should not use name but rather key to programmatically obtain a field's values from rows.
Properties evaluated:
May occur in Element table.
A reference from a group to a column within a table.
ColumnReferences do not support qualified references, i.e., you can only give simple names.
May occur in Element group.
A query specification for cores talking to the database.
CondDescs define inputs as a sequence of InputKeys (see Element InputKey). Internally, the values in the InputKeys can be translated to SQL.
May occur in Element resource, Element dbCore, Element fancyQueryCore, Element productCore, Element scsCore, Element siapCutoutCore, Element ssapCore.
The coverage of a resource.
For now, this is attached to the complete resource rather than the table, since this is where it sits in VOResource. DaCHS could be a bit more flexible, allowing different coverages per publish element. It is not right now, though.
Note: Technically, this will introduce or amend the coverage meta element. The information given here will be masked if you define a coverage meta on the service or table level. Just do not do that.
May occur in Element resource.
A custom data function for a service.
Custom data functions can be used to expose certain aspects of a service to Nevow templates. Thus, their definition usually only makes sense with custom templates, though you could, in principle, override built-in render functions.
In the data functions, you have the names ctx for a context and data for the "current data" (i.e., what's last been set using n:data). In ctx, only use ctx.tag (the tag on which the n:render attribute sits) and, if necessary, ctx.request (the t.w request object).
Also, the active renderer is visible as self; the one thing you might want to see from there is self.queryMeta, which contains, for instance, the input parameters.
You can access the embedding service as service, the embedding RD as service.rd.
You can return arbitrary python objects -- whatever the render functions can deal with. You could, e.g., write:
<customDF name="now"> return datetime.datetime.utcnow() </customDF>
You can use the request to fetch request parameters. Within DaCHS, in addition to the clumsy request.args (mapping bytes to bytes), there is also request.strargs (mapping strings to strings). So, access a query parameter order like this:
sortOrder = ctx.request.strargs.get("order", ["authors"])
May occur in Element service.
A custom render function for a service.
Custom render functions can be used to expose certain aspects of a service to Nevow templates. Thus, their definition usually only makes sense with custom templates, though you could, in principle, override built-in render functions.
In the render functions, you have the names ctx for a context and data for the "current data" (i.e., what's last been set using n:data). In ctx, only use ctx.tag (the tag on which the n:render attribute sits) and, if necessary, ctx.request (the t.w request object).
Also, the active renderer is visible as self; the one thing you might want to see from there is self.queryMeta, which contains, for instance, the input parameters. Be careful, though: the inputTable will be None when input errors are rendered, so it is better to guard code using it like this:
if self.queryMeta["inputTable"] and self.queryMeta["inputTable"]...:
You can return anything that can be in a stan DOM. Usually, this will be a string. To return HTML, use the stan DOM available under the T namespace.
As an example, the following code returns the current data as a link:
return ctx.tag[T.a(href=data)[data]]
You can access the embedding service as service, the embedding RD as service.rd.
May occur in Element service.
A description of how to process data from a given set of sources.
Data descriptors bring together a grammar, a source specification and "makes", each giving a table and a rowmaker to feed the table from the grammar output.
They are the "executable" parts of a resource descriptor. Their ids are used as arguments to gavoimp for partial imports.
May occur in Element resource.
Defaults for macros.
In STREAMs and NXSTREAMs, DEFAULTS let you specify values filled into macros when a FEED doesn't give them. Macro names are attribute names (or element names, if you insist), defaults are their values.
May occur in Element EDIT, Element events, Element lateEvents, Element NXSTREAM, Element STREAM.
An annotation of a table in terms of data models.
The content of this element is a Simple Instance Language clause.
May occur in Element table, Element outputTable.
An event stream targeted at editing other structures.
When replaying a stream in the presence of EDITs, the elements are continually checked against ref. If an element matches, the children of edit will be played back into it.
May occur in Element mixinDef, Element FEED, Element LFEED, Element LOOP.
An event stream as a child of another element.
May occur in Element mixinDef, Element FEED, Element LFEED, Element LOOP.
A container for calling code.
This is a cron-like functionality. The jobs are run in separate threads, so they need to be thread-safe with respect to the rest of DaCHS. DaCHS serializes calls, though, so that your code should never run twice at the same time.
At least on CPython, you must make sure your code does not block with the GIL held; this is still in the server process. If you do daring things, fork off (note that you must not use any database connections you may have after forking, which means you can't safely use the RD passed in). See the docs on Element job.
When testing/debugging such code, use gavo admin execute rd#id to immediately run the jobs.
May occur in Element resource.
A description of a foreign key relation between this table and another one.
May occur in Element table, Element outputTable.
A group is a collection of columns, parameters and other groups with a dash of metadata.
Within a group, you can refer to columns or params of the enclosing table by their names. Nothing outside of the enclosing table can be part of a group.
Rather than referring to params, you can also embed them into a group; they will then not be present in the embedding table.
Groups may contain groups.
One application for this is grouping input keys for the form renderer. For such groups, you probably want to give the label property (and possibly cssClass).
May occur in Element condDesc, Element table, Element outputTable, Element inputTable.
An upload going with a URL.
May occur in Element url.
A condition on a row that, if true, causes the row to be dropped.
Here, you can set bail to abort an import when the condition is met rather than just dropping the row.
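The difference between dropping a row and bailing out can be sketched as follows. This is an illustration in plain python only; filter_rows and ImportAborted are hypothetical names, and DaCHS will raise its own error type when bail is set.

```python
class ImportAborted(Exception):
    """Raised in this sketch when a bailing condition triggers."""


def filter_rows(rows, condition, bail=False):
    """Yield the rows for which condition is false."""
    for row in rows:
        if condition(row):
            if bail:
                # abort the whole import rather than skipping the row
                raise ImportAborted("Condition met for row %r" % (row,))
            continue  # silently drop the row
        yield row


rows = [{"mag": 12.1}, {"mag": None}, {"mag": 14.3}]
kept = list(filter_rows(rows, lambda row: row["mag"] is None))
```

Here, kept contains the two rows with a magnitude; with bail=True, the same call would raise on the second row.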
May occur in Element rowmaker, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar.
A specification of sources to ignore.
Sources mentioned here are compared against the inputsDir-relative path of sources generated by sources (cf. Element sources). If there is a match, the corresponding source will not be processed.
You can get ignored files from various sources. If you give more than one source, the set of ignored files is the union of the individual sets.
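The union rule can be sketched in a few lines of plain python. The helper select_sources and the sample paths are assumptions for this illustration; they do not reflect how DaCHS stores its ignore lists internally.

```python
def select_sources(candidates, *ignored_sets):
    """Return the candidates not mentioned in any of the ignored sets."""
    # the effective ignore list is the union of all individual sets
    ignored = set().union(*ignored_sets)
    return [path for path in candidates if path not in ignored]


from_pattern = {"data/bad1.fits"}
from_db = {"data/old.fits", "data/bad1.fits"}
todo = select_sources(
    ["data/bad1.fits", "data/new.fits", "data/old.fits"],
    from_pattern, from_db)
```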
fromdbUpdating is a bit special in that the query must return UTC timestamps of the file's mtime during the last ingest in addition to the accrefs (see the reference documentation for an example).
Macros are expanded in the RD.
May occur in Element sources.
A description of an index in the database.
In real databases, indices may be fairly complex things; still, the most common usage here will be to just index a single column:
<index columns="my_col"/>
To index over functions, use the character content; you will have to put parentheses when using expressions. An explicit specification of the index expression is also necessary to allow RE pattern matches using indices in character columns (outside of the C locale). That would be:
<index columns="uri">uri text_pattern_ops</index>
(you still want to give columns so the metadata engine is aware of the index). See section "Operator Classes and Operator Families" in the Postgres documentation for details.
For pgsphere-valued columns, at the time of writing, you need to specify the method:
<index columns="coverage" method="GIST"/>
To define q3c indices, use the //scs#q3cindex mixin; if you're devious enough to require something more flexible, have a look at that mixin's definition.
If indexed columns take part in a DaCHS-defined view, DaCHS will not notice. You should still declare the indices so users will see them in the metadata; writing:
<index columns="col1, col2, col3"/>
is sufficient for that.
May occur in Element table, Element outputTable.
A description of a piece of input.
Think of inputKeys as abstractions for input fields in forms, though they are used for services not actually exposing HTML forms as well.
Some of the DDL-type attributes (e.g., references) only make sense here if columns are being defined from the InputKey.
Properties evaluated:
May occur in Element condDesc, Element service, Element contextGrammar, Element inputTable, Element datalinkCore.
Python code for use within execute.
The resource descriptor this runs at is available as rd, the execute definition (having such attributes as title, job, plus any properties given in the RD) as execDef.
Note that no I/O capturing takes place (that's impossible since in general the jobs run within the server). To have actual cron jobs, use execDef.spawn(["cmd", "arg1"...]). This will send a mail on failed execution and also raise a ReportableError in that case.
In the frequent use case of a resdir-relative python program, you can use the execDef.spawnPython(modulePath) function.
If you must stay within the server process, you can do something like:
mod, _ = utils.loadPythonModule(rd.getAbsPath("bin/coverageplot"))
mod.makePlot()
-- in that way, your code can sit safely within the resource directory and you still don't have to manipulate the module path.
May occur in Element execute.
An event stream played back by a mixin when the substrate is being finalised (but before the early processing).
May occur in Element mixinDef.
A macro definition within an RD.
The macro defined is available on the parent; macros are expanded within the parent (behaviour is undefined if you try a recursive expansion).
May occur in Element resource.
A build recipe for tables belonging to a data descriptor.
All makes belonging to a DD will be processed in the order in which they appear in the file.
May occur in Element data.
A mapping rule.
To specify the source of a mapping, you can either
If neither source nor a body is given, map uses the key attribute as its source attribute.
The map rule generates a key/value pair in the result record.
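The defaulting rule for map can be sketched like this. The function apply_map is a hypothetical stand-in written for this example; that source and body exclude each other is an assumption here, and the use of eval merely mimics the fact that bodies are python expressions.

```python
def apply_map(vars, result, key, source=None, body=None):
    """Add one key/value pair to result, following the map defaulting rule."""
    if source is not None and body is not None:
        # assumption of this sketch: the two ways exclude each other
        raise ValueError("Give either source or a body, not both")
    if body is not None:
        # the body is a python expression that can see vars
        value = eval(body, {"vars": vars})
    else:
        # without source and body, the key doubles as the source name
        value = vars[source if source is not None else key]
    result[key] = value


row_vars = {"ra": 10.5, "raw_dec": "3"}
row = {}
apply_map(row_vars, row, "ra")
apply_map(row_vars, row, "dec", source="raw_dec")
apply_map(row_vars, row, "dec2", body='float(vars["raw_dec"]) * 2')
```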
May occur in Element rowmaker.
A definition for a resource mixin.
Resource mixins are resource descriptor fragments typically rooted in tables (though it's conceivable that other structures could grow mixin attributes as well).
They are used to define and implement certain behaviours components of the DC software want to see:
Mixins consist of events that are played back on the structure mixing in before anything else happens (much like original) and two procedure definitions, viz, processEarly and processLate. These can access the structure that has the mixin as substrate.
processEarly is called at the lexical location of the mixin. processLate is executed just before the parser exits. This is the place to fix up anything that uses the table mixed in. Note, however, that you should be as conservative as possible here -- you should think of DC structures as immutable as long as possible.
Programmatically, you can check if a certain table mixes in something by calling its mixesIn method.
Recursive application of mixins, even to separate objects, will deadlock.
May occur in Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
A parameter definition for mixins.
The (optional) body provides a default for the parameter.
May occur in Element mixinDef.
A value for enumerated columns.
For presentation purposes, an option can have a title, defaulting to the option's value.
May occur in Element values.
A column for defining the output of a service.
It adds some attributes useful for rendering results, plus functionality specific to certain cores.
The optional formatter overrides the standard formatting code in HTML (which is based on units, ucds, and displayHints). You receive the item from the database as data and must return a string or t.w.template stan. In addition to the standard Functions available for row makers you have queryMeta and t.w.template's tags in T.
Here's an example for generating a link to another service using this facility:
<outputField name="more"
    select="array[centerAlpha,centerDelta]" tablehead="More"
    description="More exposures near the center of this plate">
  <formatter><![CDATA[
    return T.a(href=base.makeSitePath("/lswscans/res/positions/q/form?"
      "POS=%s,%s&SIZE=1&INTERSECT=OVERLAPS&cutoutSize=0.5"
      "&__nevow_form__=genForm"%tuple(data)
      ))["More"]
  ]]></formatter>
</outputField>
Within the code, in addition to data, you see rd and queryMeta.
May occur in Element outputTable.
A table that has outputFields for columns.
Cores always have one of these, but they are implicitly defined by the underlying database tables in case of dbCores and such.
Services may define output tables to modify what is coming back from the core. Note that this usually only affects the output to web browsers. To use the output table also through VO protocols (and when producing VOTables, FITS files, and the like), you need to set the service's votableRespectsOutputTable property to True.
May occur in Element service, Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro curtable, Macro decapitalize, Macro getConfig, Macro getParam, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro nameForUCD, Macro nameForUCDs, Macro qName, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro tablename, Macro test, Macro today, Macro upper, Macro urlquote
A parameter of a procedure definition.
Bodies of ProcPars are interpreted as python expressions, in which macros are expanded in the context of the procedure application's parent. If a body is empty, the parameter has no default and has to be filled by the procedure application.
May occur in Element setup.
A table parameter.
This is like a column, except that it conceptually applies to all rows in the table. In VOTables, params will be rendered as PARAMs.
While we validate the values passed using the DaCHS default parsers, at least the VOTable params will be literal copies of the string passed in.
You can obtain a parsed value from the value attribute.
Null value handling is a bit tricky with params. An empty param (like <param name="x"/>) is always NULL (None in python). In order to allow setting NULL even where syntactically something has to stand, we also turn any __NULL__ to None.
For floats, NaN will also yield NULLs. For integers, you can also use
<param name="x" type="integer"><values nullLiteral="-1"/>-1</param>
For arrays, floats, and strings, the interpretation of values is undefined. Following VOTable practice, we do not tell empty strings and NULLs apart; for internal usage, there is a little hack: __EMPTY__ as literal does set an empty string. This is to allow defaulting of empty strings -- in VOTables, these cannot be distinguished from "true" NULLs.
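The null handling rules for params can be summarised in a short sketch. This is not the DaCHS parser; parse_param is a hypothetical function that only models the rules stated above for a few scalar types.

```python
import math


def parse_param(literal, type_name, null_literal=None):
    """Turn a param literal into a python value following the rules above."""
    # an empty param and the magic __NULL__ both mean NULL
    if literal is None or literal == "" or literal == "__NULL__":
        return None
    # __EMPTY__ is the hack to get a true empty string
    if literal == "__EMPTY__":
        return ""
    # a literal equal to the declared nullLiteral is NULL, too
    if null_literal is not None and literal == null_literal:
        return None
    if type_name == "integer":
        return int(literal)
    if type_name in ("real", "double precision"):
        value = float(literal)
        # NaN doubles as NULL for floats
        return None if math.isnan(value) else value
    return literal
```

In this sketch, parse_param("-1", "integer", null_literal="-1") yields None, mirroring the nullLiteral example above, while parse_param("-1", "integer") yields the integer -1.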
May occur in Element group, Element table, Element data, Element outputTable.
A reference from a group to a parameter within a table.
ParamReferences do not support qualified references, i.e., you can only give simple names.
Also note that programmatically, you usually want to resolve ParamReferences within the Table instance, not the table definition.
May occur in Element group.
A procedure application for generating SQL expressions from input keys.
PhraseMaker code must yield SQL fragments that can occur in WHERE clauses, i.e., boolean expressions (thus, they must be generator bodies). The clauses yielded by a single condDesc are combined with the joiner set in the containing CondDesc (default=OR).
The following names are available to them:
- inputKeys -- the list of input keys for the parent CondDesc
- inPars -- a dictionary mapping inputKey names to the values provided by the user
- outPars -- a dictionary that is later used as the parameter dictionary to the query.
- core -- the core to which this phrase maker's condDesc belongs
To get the standard SQL a single key would generate, say:
yield base.getSQLForField(inputKeys[0], inPars, outPars)
To insert some value into outPars, do not simply use some key into outPars, since, e.g., the condDesc might be used multiple times. Instead, use getSQLKey, maybe like this:
ik = inputKeys[0]
yield "%s BETWEEN %%(%s)s AND %%(%s)s"%(ik.name,
  base.getSQLKey(ik.name, inPars[ik.name]-10, outPars),
  base.getSQLKey(ik.name, inPars[ik.name]+10, outPars))
getSQLKey will make sure unique names in outPars are chosen and enters the values there.
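To see why unique names matter, here is a stand-in for getSQLKey written in plain python. It is a sketch of the idea only; get_sql_key and its naming scheme are assumptions, and the names the real function picks may look different.

```python
def get_sql_key(name, value, out_pars):
    """Store value in out_pars under a name not used so far; return that name."""
    key, suffix = name, 0
    while key in out_pars:
        # name already taken: try name0, name1, ...
        key = "%s%d" % (name, suffix)
        suffix += 1
    out_pars[key] = value
    return key


out_pars = {}
in_pars = {"mag": 15}
fragment = "%s BETWEEN %%(%s)s AND %%(%s)s" % (
    "mag",
    get_sql_key("mag", in_pars["mag"] - 10, out_pars),
    get_sql_key("mag", in_pars["mag"] + 10, out_pars))
print(fragment)  # prints mag BETWEEN %(mag)s AND %(mag0)s
```

Both limits end up in out_pars under different keys, so the fragment stays correct even if the same input key contributes several values to one query.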
May occur in Element condDesc.
An embedded procedure.
Embedded procedures are python code fragments with some interface defined by their type. They can occur at various places (which is called procedure application generically), e.g., as row generators in grammars, as apply-s in rowmakers, or as SQL phrase makers in condDescs.
They consist of the actual code and, optionally, definitions like the namespace setup, configuration parameters, or a documentation.
The procedure applications compile into python functions with special global namespaces. The signatures of the functions are determined by the type attribute.
ProcDefs are referred to by procedure applications using their id.
May occur in Element resource.
A code fragment run by the mixin machinery when the structure being worked on is being finished.
Within processEarly, you can access:
(the context is particularly handy for context.resolveId)
May occur in Element mixinDef.
A code fragment run by the mixin machinery when the parser parsing everything exits.
Within processLate, you can access:
May occur in Element mixinDef.
An active tag that lets you selectively delete children of the current object.
You give it regular expression-valued attributes; on the replay of the stream, matching items and their children will not be replayed.
If you give more than one attribute, the result will be a conjunction of the specified conditions.
This only works if the items to be matched are true XML attributes (i.e., not written as children).
For instance, the following will filter out all elements with a name of VERB from the stream:
<PRUNE name="VERB"/>
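The matching logic of PRUNE (all given patterns have to match) can be sketched as follows. The function is_pruned is hypothetical, and matching with re.match (i.e., anchored at the start of the value only) is an assumption of this sketch rather than a statement about DaCHS internals.

```python
import re


def is_pruned(element_atts, prune_atts):
    """True if every PRUNE pattern matches the element's attribute of that name."""
    for att_name, pattern in prune_atts.items():
        value = element_atts.get(att_name)
        # a missing attribute or a failed match keeps the element
        if value is None or not re.match(pattern, value):
            return False
    return True
```

With this, an element with name="VERB" is pruned by {"name": "VERB"}, but not by {"name": "VERB", "type": "int.*"} unless its type attribute matches as well.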
May occur in Element mixinDef, Element FEED, Element LFEED, Element LOOP.
A request for registration of a data or table item.
This is much like publish for services, just for data and tables; since they have no renderers, you can only have one register element per such element.
Data registrations may refer to published services that make their data available.
May occur in Element table, Element data, Element outputTable.
A specification of how a service should be published.
This contains most of the metadata for what is an interface in registry speak.
May occur in Element resRec, Element service.
A suite of regression tests.
May occur in Element resource.
A regression test.
Tests are defined through url and code elements. See Regression Testing for more information.
May occur in Element regSuite.
A resource descriptor.
RDs collect all information about how to parse a particular source (like a collection of FITS images, a catalogue, or whatever), about the database tables the data ends up in, and the services used to access them.
This is the root element of all RDs.
To give your schema a utype, set a utype meta on resource.
To set the schema_index in TAP_SCHEMA, put some integer to a schema-rank meta; lower-ranked schemas are displayed further up in supporting clients (since version 2.9.3).
Macros predefined here: Macro RSTcc0, Macro RSTccby, Macro RSTccbysa, Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
A resource for pure registration purposes.
A Resource without (much) DaCHS-defined behaviour. This can be Organizations or Instruments, but possibly also external services.
All resources must either have an id (which is used in the construction of their IVOID), or you must give an identifier meta item.
You must further set the following meta items:
- resType specifying the kind of resource record. You should not use this element to build resource records for services or tables (use the normal elements, even if the actual resources are external to DaCHS). resType can be registry, organization, authority, deleted, or anything else for which registry.builders has a handling class.
- title
- subject(s)
- description
- referenceURL
- creationDate
Additional meta keys (e.g., accessURL for a registry) may be required depending on resType. See the registry section in the operator's guide.
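A quick way to picture these requirements is a checker over a dictionary of meta items. This is a sketch only: missing_meta is a hypothetical helper, metadata in DaCHS is not a plain dictionary, and the resType-dependent extra keys are not covered.

```python
# meta items every resRec needs according to the list above
REQUIRED_META = ["resType", "title", "subject", "description",
    "referenceURL", "creationDate"]


def missing_meta(meta, has_id):
    """Return the names of required items that are not present in meta."""
    missing = [key for key in REQUIRED_META if not meta.get(key)]
    # without an id attribute, an identifier meta item must be given
    if not has_id and not meta.get("identifier"):
        missing.append("identifier")
    return missing
```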
ResRecs can also have publication children. These will be turned into the appropriate capabilities depending on the value of the render attribute.
May occur in Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
A definition of the mapping between grammar input and finished rows ready for shipout.
Rowmakers consist of variables, procedures and mappings. They result in a python callable doing the mapping. In python code within rowmaker elements, you can use a large number of functions. See Functions available for row makers in the reference documentation.
RowmakerDefs double as macro packages for the expansion of various macros. The standard macros will need to be quoted; the rowmaker macros above yield python expressions.
Within map and var bodies as well as late apply pars and apply bodies, you can refer to the grammar input as vars["name"] or, shorter @name.
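The equivalence of the two notations can be illustrated with a naive text substitution. This is a sketch under the assumption that @name is simply rewritten to vars["name"]; expand_at_names is a hypothetical function, and unlike a careful implementation it would also touch @ signs inside string literals.

```python
import re


def expand_at_names(code):
    """Replace @name by vars["name"] in a piece of code."""
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", r'vars["\1"]', code)


expr = expand_at_names("float(@flux) / @exptime")
row_vars = {"flux": "30", "exptime": 2}
# evaluate the expanded expression with row_vars bound to the name vars
value = eval(expr, {"vars": row_vars})
```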
To add output keys, use map or, in apply bodies, add keys to the result dictionary.
May occur in Element data, Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro dlMetaURI, Macro docField, Macro fullPath, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lastSourceElements, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro qName, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro rowsMade, Macro rowsProcessed, Macro schema, Macro sourceCDate, Macro sourceDate, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPubDID, Macro test, Macro today, Macro upper, Macro urlquote
A script, i.e., some executable item within a resource descriptor.
The content of scripts is given by their type -- usually, they are either python scripts or SQL with special rules for breaking the script into individual statements (which are basically like python's).
The special language AC_SQL is like SQL, but execution errors are ignored. This is not what you want for most data RDs (it's intended for housekeeping scripts).
See Scripting.
May occur in Element make, Element table, Element outputTable, Element resource.
A service definition.
A service is a combination of a core and one or more renderers. They can be published, and they carry the metadata published into the VO.
You can set the defaultSort property on the service to a name of an output column to preselect a sort order. Note again that this will slow down responses for all but the smallest tables unless there is an index on the corresponding column.
Properties evaluated:
May occur in Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro tablesForTAP, Macro test, Macro today, Macro upper, Macro urlquote
Prescriptions for setting up a namespace for a procedure application.
You can add names to this namespace using par(ameter)s. If a parameter has no default and a procedure application does not provide it, an error is raised.
You can also add names by providing a code attribute containing a python function body in code. Within, the parameters are available. The procedure application's parent can be accessed as parent. All names you define in the code are available as globals to the procedure body.
Caution: Macros are expanded within the code; this means you need double backslashes if you want a single backslash in python code.
May occur in Element phraseMaker, Element apply, Element job, Element processEarly, Element processLate, Element procDef, Element regTest, Element rowfilter, Element sourceFields, Element iterator, Element pargetter, Element makeQuery, Element dataFormatter, Element dataFunction, Element descriptorGenerator, Element metaMaker, Element coreProc.
A specification of a data descriptor's inputs.
This will typically be files taken from a file system. If so, DaCHS will, in each directory, process the files in alphabetical order. No guarantees are made as to the sequence directories are processed in.
Multiple patterns are processed in the order given in the RD.
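The ordering guarantee (alphabetical within a directory, nothing promised between directories) can be sketched on a list of paths. The helper processing_order is hypothetical and works on path strings only; it does not touch the file system.

```python
import os
from collections import defaultdict


def processing_order(paths):
    """Return paths grouped by directory, alphabetically sorted within each."""
    by_dir = defaultdict(list)
    for path in paths:
        by_dir[os.path.dirname(path)].append(path)
    ordered = []
    # the order of the directories themselves is deliberately left as found
    for dir_paths in by_dir.values():
        ordered.extend(sorted(dir_paths))
    return ordered


order = processing_order(["b/z.txt", "a/y.txt", "b/a.txt", "a/b.txt"])
```

If your import depends on a particular sequence, code against the per-directory order only.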
May occur in Element data.
A definition of a space-time coordinate system using STC-S.
May occur in Element table, Element outputTable.
A definition of a table, both on-disk and internal.
Some attributes are ignored for in-memory tables, e.g., roles or adql.
Properties for tables:
If you give multiple data model names or URIs, the sequences of names and URIs must be identical (in particular, each name needs a URI). But, really, both of these are on the way out.
Somewhat inconsistently, to set a table's utype if you have to, set its utype meta.
Tables within a schema can have a rank, with lower ranks displayed first in clients that support that. To set that rank, put a positive number into the table-rank meta (since version 2.9.3).
May occur in Element data, Element resource.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro curtable, Macro decapitalize, Macro getConfig, Macro getParam, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro nameForUCD, Macro nameForUCDs, Macro qName, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro tablename, Macro test, Macro today, Macro upper, Macro urlquote
Information on where and how to update a piece of coverage information.
May occur in Element coverage.
A source document for a regression test.
Like string URLs, they specify where to get data from, but they additionally let you specify uploads, authentication, headers and http methods, while at the same time saving you manual escaping of parameters.
The body is the path to run the test against. This is interpreted as relative to the RD if there's no leading slash, relative to the server if there's a leading slash, and absolute if there's a scheme.
The attributes are translated to parameters, except for a few pre-defined names. If you actually need those as URL parameters, shout at us and we'll provide some way of escaping these.
We don't actually parse the URLs coming in here. GET parameters are appended with a & if there's a ? in the existing URL, with a ? if not. Again, shout if this is too dumb for you (but urlparse really isn't all that robust either...)
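The appending rule just described can be sketched in a few lines of Python. This only illustrates the stated rule and is not DaCHS code; the function name append_get_params is made up for this sketch, and the use of urllib.parse.urlencode for the escaping is an assumption.

```python
from urllib.parse import urlencode


def append_get_params(url, params):
    """Append params to url following the naive rule described above.

    append_get_params is a hypothetical helper, not part of DaCHS.
    The URL is not parsed: if it already contains a "?", the encoded
    parameters are joined on with "&", otherwise with "?".
    """
    if not params:
        return url
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)
```

Note that a URL with a "?" inside a fragment or a path segment would fool this rule, which is the kind of "dumbness" the text above warns about.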
May occur in Element regTest.
Information on a column's values, in particular its domain.
This is quite like the values element in a VOTable. In particular, to accommodate VOTable usage, we require nullLiteral to be a valid literal for the parent's type.
Note that DaCHS does not validate for constraints from values on table import. This is mainly because before dachs limits has run, values may not represent the new dataset in semiautomatic values.
With HTTP parameters, values validation does take place (but again, that's mostly not too helpful because there are query languages sitting in between most of the time).
Hence, the main utility of values is metadata declaration, both in the form renderer (where they become placeholders) and in datalink (where they are communicated as VOTable values).
May occur in Element param, Element inputKey, Element column, Element outputField.
A definition of a rowmaker variable.
It consists of a name and a python expression, including function calls. The variables are entered into the input row coming from the grammar.
var elements are evaluated before apply elements, in the sequence they are in the RD. You can refer to keys defined by vars already evaluated in the usual @key manner.
May occur in Element rowmaker.
The following tags are "active", which means that they do not directly contribute to the RD parsed. Instead they define, replay, or edit streams of elements.
Enter a debugger when parsing to here.
This is probably only interesting for DaCHS developers.
An active tag that takes an event stream and replays the events, possibly filling variables.
This element supports arbitrary attributes with unicode values. These values are available as macros for replayed values.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
A ReplayedEventStream that does not expand active tag macros.
You only want this when embedding a stream into another stream that could want to expand the embedded macros.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
An active tag that replays a feed several times, each time with different values.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
An event stream that records events, not expanding active tags.
Normal event streams expand embedded active tags in place. This is frequently what you want, but it means that you cannot, e.g., fill in loop variables through stream macros.
With non-expanded streams, you can do that:
<NXSTREAM id="cols">
  <LOOP listItems="\stuff">
    <events>
      <column name="\\item"/>
    </events>
  </LOOP>
</NXSTREAM>

<table id="foo">
  <FEED source="cols" stuff="x y"/>
</table>
Note that the normal innermost-only rule for macro expansions within active tags does not apply for NXSTREAMs. Macros expanded by a replayed NXSTREAM will be re-expanded by the next active tag that sees them (this is to allow embedded active tags to use macros; you need to double-escape macros for them, of course).
An active tag that records events as they come in.
Their only direct effect is to leave a trace in the parser's id map. The resulting event stream can be played back later.
The following elements are all grammar related. All grammar elements can occur in data descriptors.
A grammar that builds rowdicts from binary data.
The grammar expects the input to be in fixed-length records. The actual specification of the fields is done via a binaryRecordDef element.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A definition of a binary record.
A binary record consists of a number of binary fields, each of which is defined by a name and a format code. The format codes supported here are a subset of what python's struct module supports. The widths given below are for big, little, and packed binfmts. For native (which is the default), it depends on your platform.
The content of this element gives the record structure in the format <name>(<code>){<whitespace><name>(<code>)} where <name> is a c-style identifier.
May occur in Element binaryGrammar.
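To make the <name>(<code>) notation of binaryRecordDef more concrete, here is a minimal Python sketch of how such a definition could be turned into a struct format and applied to one record. The helper names parse_record_def and unpack_record are invented for this illustration, the sketch assumes every format code yields exactly one value, and it says nothing about how DaCHS implements this internally.

```python
import re
import struct

# One field definition: a C-style identifier followed by a struct code
# in parentheses, e.g. "ra(d)".
FIELD_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\(([^)]+)\)")


def parse_record_def(literal):
    """Return (field names, concatenated struct codes) for a definition.

    Hypothetical helper for illustration only.
    """
    fields = FIELD_RE.findall(literal)
    names = [name for name, _ in fields]
    codes = "".join(code for _, code in fields)
    return names, codes


def unpack_record(literal, raw, byte_order=">"):
    """Unpack one fixed-length record into a dict of named values.

    byte_order is a struct prefix; ">" (big endian, no padding) is an
    assumption made for this sketch.
    """
    names, codes = parse_record_def(literal)
    values = struct.unpack(byte_order + codes, raw)
    return dict(zip(names, values))
```

With the definition "ra(d) dec(d) flag(B)", a 17-byte big-endian record unpacks into a dictionary with the keys ra, dec, and flag.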
A grammar that returns the header dictionary of a CDF file (global attributes).
This grammar yields a single dictionary per file, which corresponds to the global attributes. The values in this dictionary may have complex structure; in particular, sequences are returned as lists.
To use this grammar, additional software is required that (by 2014) is not packaged for Debian. See https://pythonhosted.org/SpacePy/install.html for installation instructions. Note that you must install the CDF library itself as described further down on that page; the default installation instructions do not install the library in a public place, so if you use these, you'll have to set CDF_LIB to the right value, too, before running dachs imp.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that builds rowdicts out of character index ranges.
This works by using the colRanges attribute like <col key="mag">12-16</col>, which will take the characters 12 through 16 inclusive from each input line to build the input column mag.
As a shortcut, you can also use the colDefs attribute; it contains a string of the form {<key>:<range>}, i.e., a whitespace-separated list of colon-separated items of key and range as accepted by cols, e.g.:
<colDefs> a: 3-4 _u: 7 </colDefs>
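The effect of such column ranges can be sketched in plain Python. The sketch assumes that positions are counted from 1 and that both ends of a range are inclusive, as the "12 through 16 inclusive" wording above suggests; parse_col_defs and cut_columns are hypothetical names, and the sketch does no stripping or type conversion.

```python
def parse_col_defs(literal):
    """Turn a colDefs-like string into {key: (first, last)}.

    Hypothetical helper; a single number such as "7" is treated as a
    one-character range.
    """
    tokens = literal.replace(":", " ").split()
    ranges = {}
    for key, col_range in zip(tokens[::2], tokens[1::2]):
        first, _, last = col_range.partition("-")
        ranges[key] = (int(first), int(last or first))
    return ranges


def cut_columns(line, ranges):
    """Cut the 1-based, inclusive character ranges out of one input line."""
    return {key: line[first - 1:last]
            for key, (first, last) in ranges.items()}
```

For the colDefs example above, cut_columns applied to the line "abcdefghij" would put characters 3 and 4 into a and character 7 into _u.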
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar for web inputs.
The source tokens for context grammars are dictionaries; these are either typed dictionaries from gavo.formal, where the values usually are atomic, or, preferably, the dictionaries of lists from request.strargs.
ContextGrammars never yield rows, so they're probably fairly useless in normal circumstances.
In normal usage, they just yield a single parameter row, corresponding to the source dictionary possibly completed with defaults, where non-required input keys get None defaults where not given. Missing required parameters yield errors.
This parameter row honors the multiplicity specification, i.e., single or forced-single are just values, multiple are lists. The contents are parsed values (using the InputKeys' parsers).
Since most VO protocols require case-insensitive matching of parameter names, matching of input key names and the keys of the input dictionary is attempted first literally, then disregarding case.
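The two-step name matching can be pictured with the following Python sketch. It only illustrates the "literal first, then case-insensitive" order stated above; lookup_parameter is a hypothetical function, and returning None for a missing key is a simplification chosen for this sketch.

```python
def lookup_parameter(name, source):
    """Find the value for an input key name in a source dictionary.

    Hypothetical helper: try the literal name first, then fall back
    to a comparison that disregards case.
    """
    if name in source:
        return source[name]
    lowered = name.lower()
    for key, value in source.items():
        if key.lower() == lowered:
            return value
    return None
```

Trying the literal match first means that an exact hit always wins when a request contains both, say, RA and ra.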
Since ContextGrammars can be parents of inputKeys and thus are a bit table-like, they can also carry metadata.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that uses python's csv module to parse files.
Note that these grammars by default interpret the first line of the input file as the column names. When your files don't follow that convention, you must give names (as in names='raj2000, dej2000, magV'), or you'll lose the first line and have silly column names.
If data is left after filling the defined keys, it is available under the NOTASSIGNED key.
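Since the grammar builds on python's csv module, the behaviour described above closely resembles what csv.DictReader does with its fieldnames and restkey arguments. The following sketch shows that standard-library behaviour; that DaCHS uses DictReader in exactly this way is an assumption, and read_rows is a name made up for the example.

```python
import csv
import io


def read_rows(text, names=None):
    """Parse CSV text into a list of row dictionaries.

    Without names, the first line supplies the column names; with
    names, the first line is treated as data.  Surplus fields end up
    as a list under the NOTASSIGNED key.
    """
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=names, restkey="NOTASSIGNED")
    return list(reader)
```

Calling read_rows without names on a file that has no header line shows the problem mentioned above: the first data line is consumed and its values become the column names.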
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A Grammar with a user-defined row iterator taken from a module.
See Writing Custom Grammars (in the reference manual) for details.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that "parses" from lists of dicts.
Actually, it will just return the dicts as they are passed. This is mostly useful internally, though it might come in handy in custom code.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A user-defined external grammar.
See the document booster.html on user-defined code for more on direct grammars.
You will almost always use these in connection with C code generated by dachs mkboost.
Properties:
A Grammar defined by a code application.
To define this grammar, write a ProcApp iterator leading to code yielding row dictionaries. The grammar input is available as self.sourceToken; for normal grammars within data elements, that would be a fully qualified file name.
Grammars can also return one "parameter" dictionary per source (the input to a make's parmaker). In an embedded grammar, you can define a pargetter to do that. It works like the iterator, except that it returns a single dictionary rather than yielding several of them.
This could look like this, when the grammar input is some iterable:
<embeddedGrammar>
  <iterator>
    <setup>
      <code>
        testData = "a"*1024
      </code>
    </setup>
    <code>
      for i in self.sourceToken:
        yield {'index': i, 'data': testData}
    </code>
  </iterator>
</embeddedGrammar>
If you need to raise an error from within an embeddedGrammar, use a SourceParseError, somewhat like this:
raise base.SourceParseError(
  "Bad line",
  offending=inputLine,
  location=str(lineNumber+1),
  source=inputFile.name)
To furnish other exceptions with information on the location they occurred at (rather than something like "unknown position -- locator missing"), since DaCHS 2.9.3 you can set self.location. As the error reporting code includes the sourceToken itself, there is generally no need to include it in the location. Make it a string, though.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that returns FITS-headers as dictionaries.
This is the grammar you want when one FITS file corresponds to one row in the destination table.
The keywords of the grammar record are the cards in the primary header (or some other hdu using the same-named attribute). "-" in keywords is replaced with an underscore for easier @-referencing. You can use a mapKeys element to effect further name cosmetics.
This grammar should handle compressed FITS images transparently if you set qnd="False". This means that you will essentially get the headers from the second extension for those even if you left hdu="0".
The original header is preserved as the value of the header_ key. This is mainly intended for WCS use, as in wcs.WCS(@header_).
If you have more complex structures in your FITS files, you can get access to the pyfits HDU using the hdusField attribute. With hdusField="_H", you could say things like @_H[1].data[10][0] to get the first data item in the tenth row in the second HDU.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar parsing from FITS tables.
fitsTableGrammars result in typed records, i.e., values normally come in the types they are supposed to have. Of course, that won't work for datetimes, STC-S regions, and the like.
The keys of the result dictionaries are simply the names given in the FITS file.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar allowing "free" regular expressions to parse a document.
Basically, you give a rowProduction to match individual records in the document. All matches of rowProduction will then be matched with parseRE, which in turn must have named groups. The dictionary from named groups to their matches makes up the input row.
For writing the parseRE, we recommend writing an element, using a CDATA construct, and taking advantage of python's "verbose" regular expressions. Here's an example:
<parseRE><![CDATA[(?xsm)^name::(?P<name>.*)
  ^query::(?P<query>.*)
  ^description::(?P<description>.*)\.\.
]]></parseRE>
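The interplay of rowProduction and parseRE can be tried out in plain Python. In the sketch below, PARSE_RE is the expression from the example above, while ROW_PRODUCTION, the sample document, and the function parse_document are assumptions made for the illustration. Stripping the matched values is a choice of this sketch as well, since the greedy groups pick up the trailing newlines.

```python
import re

# Assumed record pattern: from a line starting with "name::" up to the
# first line that ends in "..".
ROW_PRODUCTION = re.compile(r"(?sm)^name::.*?\.\.$")

# The parseRE from the example above, written as a verbose expression.
PARSE_RE = re.compile(r"""(?xsm)^name::(?P<name>.*)
    ^query::(?P<query>.*)
    ^description::(?P<description>.*)\.\.
""")

SAMPLE = (
    "name::foo\nquery::select 1\ndescription::A test..\n"
    "name::bar\nquery::select 2\ndescription::Another..\n")


def parse_document(text):
    """Return one dictionary of named groups per matched record."""
    rows = []
    for record in ROW_PRODUCTION.finditer(text):
        parsed = PARSE_RE.match(record.group(0))
        if parsed is not None:
            rows.append(
                {key: value.strip()
                 for key, value in parsed.groupdict().items()})
    return rows
```

Running parse_document(SAMPLE) gives two rows with the keys name, query, and description.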
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A declaration of what grammar to use within a UnionGrammar.
Each handler has a (full, python) regular expression in its filePattern attribute that defines which file names the grammar is responsible for. Note that the pattern is matched against the full file name using search, so you can match path parts, but you must take care not to overmatch. The other child is a normal DaCHS grammar.
May occur in Element unionGrammar.
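The following Python sketch illustrates why overmatching is a concern when patterns are applied with re.search to full file names. The function pick_handler and the idea that the first matching handler wins are assumptions for this illustration; the grammar names merely stand in for the grammar children.

```python
import re


def pick_handler(handlers, file_name):
    """Return the grammar for the first pattern found in file_name.

    handlers is a list of (filePattern, grammar) pairs; "first match
    wins" is an assumption of this sketch.
    """
    for pattern, grammar in handlers:
        if re.search(pattern, file_name):
            return grammar
    raise ValueError("No handler for {}".format(file_name))


# A sloppy pattern: "csv" also occurs in the directory name below.
SLOPPY = [("csv", "csvGrammar"), (r"\.fits$", "fitsProdGrammar")]
# Anchoring the patterns at the end of the name avoids the overmatch.
CAREFUL = [(r"\.csv$", "csvGrammar"), (r"\.fits$", "fitsProdGrammar")]
```

With SLOPPY, the file /data/csvdump/img.fits would be handed to the CSV grammar; with CAREFUL, it goes to the FITS grammar as intended.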
A grammar for parsing single tables from HDF5 files.
These result in typed records, i.e., values normally come in the types they are supposed to have. The keys in the rows are the column names as given in the HDF file.
Regrettably, there are about as many conventions to serialise tables in HDF5 as there are programmes writing HDF5. This grammar supports a few styles; ask to have more included.
Styles currently implemented:
astropy: | The table comes as a record array. The grammar is aware of the astropy convention of adding mask columns as name+".mask" and will turn masked values into Nones. |
---|---|
vaex: | The table comes as a group with the columns as individual arrays in the group member's data dataset. Put the parent of the columns group into the dataset attribute here. |
This class is not intended for ingesting large HDF5 files, as it will only process a few thousand rows per second on usual hardware. Use Element directGrammar for large files.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A definition of an iterator of a grammar.
The code defined here becomes the _iterRows method of a grammar.common.RowIterator class. This means that you can access self.grammar (the parent grammar; you can use this to transmit properties from the RD to your function) and self.sourceToken (whatever gets passed to parse()).
May occur in Element embeddedGrammar.
A grammar to parse key-value pairs from files.
The default assumes one pair per line, with # comments and = as separating character.
yieldPairs makes the grammar return an empty docdict and {"key":, "value":} rowdicts.
Whitespace around key and value is ignored.
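The default behaviour described above (one pair per line, # comments, = as separator, surrounding whitespace ignored) roughly corresponds to the following Python sketch. parse_key_values is a hypothetical function; treating everything after a # as a comment, including on lines that also carry a pair, is an assumption.

```python
def parse_key_values(text, separator="=", comment="#"):
    """Parse key-value lines into a dictionary.

    Illustration only: comments run from the comment character to the
    end of the line, and blank lines are skipped.
    """
    result = {}
    for line in text.splitlines():
        line = line.split(comment, 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(separator)
        result[key.strip()] = value.strip()
    return result
```

Only the first separator on a line splits key and value, so values may themselves contain = characters.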
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A generator of ODBC queries.
This is mainly useful when doing ODBC queries to incrementally harvest some external resource.
The current ODBC iterator will be available as self.
The procedures also see a function escapeSQL(val) that returns val as a SQL literal (actually, it's psycopg2's adapt at the moment).
This is intended to be used somewhat like this with a monotonically increasing column insertion_time:
<makeQuery>
  <code>
    # find out up to when we have data locally
    try:
      with base.getTableConn() as conn:
        localMax = next(conn.query(
          "SELECT MAX(insertion_time) FROM \schema.main"))[0]
        fragment = " WHERE insertion_time>{}".format(
          escapeSQL(localMax))
    except base.DBError as msg:
      base.ui.notifyWarning(f"{msg} while harvesting: full re-harvest")
      fragment = ""
    return f"SELECT * FROM remote_table{fragment}"
  </code>
</makeQuery>
May occur in Element odbcGrammar.
Mapping of names, specified in long or short forms.
mapKeys is necessary in grammars like keyValueGrammar or fitsProdGrammar. In these, the source files themselves give key names. Within the GAVO DC, keys are required to be valid python identifiers (i.e., match [A-Za-z_][A-Za-z_0-9]*). If keys coming in do not have this form, mapping can force proper names.
mapKeys could also be used to make incoming names more suitable for matching with shell patterns (like in rowmaker idmaps).
May occur in Element cdfHeaderGrammar, Element csvGrammar, Element directGrammar, Element fitsProdGrammar, Element keyValueGrammar, Element pdsGrammar.
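What such a key mapping achieves can be sketched in Python as follows. The function map_keys and the source-to-destination direction of its mapping argument are assumptions made for the illustration and do not reflect the literal syntax of mapKeys; the identifier pattern is the one quoted above.

```python
import re

# Keys must be valid python identifiers, as stated above.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def map_keys(raw, mapping):
    """Rename keys of raw according to mapping (source name to new name).

    Hypothetical helper: keys without a mapping are kept, and any
    resulting key that is not an identifier raises a ValueError.
    """
    result = {}
    for key, value in raw.items():
        new_key = mapping.get(key, key)
        if not IDENTIFIER_RE.fullmatch(new_key):
            raise ValueError("Not a valid key name: {}".format(new_key))
        result[new_key] = value
    return result
```

A key like 2MASS_ID, which cannot be used in an @-reference, would have to be mapped to an identifier such as twomassId first.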
A grammar pulling information from MySQL dump files.
WARNING: This is a quick hack. If you want/need it, please contact the authors.
At this point this is nothing but an ugly RE mess with lots of assumptions about the dump file that's easily fooled. Also, the entire dump file will be pulled into memory.
Since grammar semantics cannot do anything else, this will always only iterate over a single table. This currently is fixed to the first, but it's conceivable to make that selectable.
Database NULLs are already translated into Nones.
In other words: It might do for simple cases. If you have something else, improve this or complain to the authors.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that never returns any rows.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that feeds from a remote database.
This works as a sort of poor man's foreign data wrapper: you pull data from a remote database now and then, mogrifying it into whatever format you want locally.
This expects files containing pyodbc connection strings as sources, so you'll normally just have one source. Having the credentials externally helps keep RDs using this safe for public version control.
An example for an ODBC connection string:
DRIVER={SQL Server};SERVER=localhost;DATABASE=testdb;UID=me;PWD=pass
See also http://www.connectionstrings.com/
This will only work if pyodbc (debian: python3-pyodbc) is installed. Additionally, you will have to install the odbc driver corresponding to your source database (e.g., odbc-postgresql).
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A definition of the parameter getter of an embedded grammar.
The code defined here becomes the getParameters method of the generated row iterator. This means that the dictionary returned here becomes the input to a parmaker.
If you don't define it, the parameter dict will be empty.
Like the iterators, pargetters see the current source token as self.sourceToken, and the grammar as self.grammar.
It is guaranteed that the pargetter is called exactly once before the iterator runs. Hence, if you have expensive initialisation to do, do it in the pargetter and pass the result to the iterator, typically somehow in the sourceToken.
May occur in Element embeddedGrammar.
A grammar that returns labels of PDS documents as rowdicts.
PDS is the file format of the Planetary Data System; the labels are quite like, but not quite like FITS headers.
Extra care needs to be taken here since the values in the rowdicts can be complex objects (e.g., other labels). It's likely that you will need constructs like @IMAGE["KEY"].
Current versions of PyPDS also don't parse the values. This is particularly insidious because general strings are marked with " in PDS. When mapping those, you'll probably want a @KEY.strip('"').
You'll need PyPDS to use this; there's no Debian package for that yet, so you'll have to do a source install from git://github.com/RyanBalfanz/PyPDS.git
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar that builds rowdicts from records and fields specified via REs separating them.
There is also a simple facility for "cleaning up" records. This can be used to remove standard shell-like comments; use recordCleaner="(?:#.*)?(.*)".
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A generator for rows coming from a grammar.
Rowfilters receive rows (i.e., dictionaries) as yielded by a grammar under the name row. Additionally, the embedding row iterator is available under the name rowIter.
Macros are expanded within the embedding grammar.
The procedure definition must result in a generator, i.e., there must be at least one yield. Typically, this will be a yield row, but a rowfilter may swallow or create as many rows as desired.
If you forget to have a yield in the rowfilter source, you'll get a "NoneType is not iterable" error that's a bit hard to understand.
Here, you can only access whatever comes from the grammar. You can access grammar keys in late parameters as row[key] or, if key is like an identifier, as @key.
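The generator requirement and the error message mentioned above can be reproduced in plain Python. In this sketch, ordinary functions stand in for rowfilter bodies, and apply_filter is a hypothetical stand-in for whatever loop DaCHS runs around a rowfilter.

```python
def drop_faint(row):
    """A filter that swallows rows fainter than magnitude 15."""
    if row["mag"] < 15:
        yield row


def forgot_yield(row):
    """A broken filter: it changes the row but never yields it."""
    row["mag"] = round(row["mag"], 1)


def apply_filter(rowfilter, rows):
    """Feed rows through rowfilter, passing on whatever it yields."""
    for row in rows:
        for filtered in rowfilter(row):
            yield filtered
```

Because forgot_yield contains no yield, calling it returns None rather than a generator, and the inner loop in apply_filter fails with a TypeError saying that a NoneType object is not iterable.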
May occur in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar.
A grammar handling sequences of tuples.
To add semantics to the fields, the grammar must know the "schema" of the data. This is defined via the table it is supposed to get the input from.
This grammar probably is only useful for internal purposes.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A procedure application that returns a dictionary added to all incoming rows.
Use this to programmatically provide information that can be computed once but that is then added to all rows coming from a single source, usually a file. This could be useful to add information on the source of a record or the like.
The code must return a dictionary. The source that is about to be parsed is passed in as sourceToken. When parsing from files, this simply is the file name. The data the rows will be delivered to is available as "data", which is useful for adding or retrieving meta information.
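As a rough illustration of the contract (compute once per source, add to every row), here is a self-contained Python sketch. It does not use DaCHS; source_fields plays the role of the code body, and rows_with_source_fields is a hypothetical driver invented for this example.

```python
import os


def source_fields(sourceToken):
    """Stand-in for a sourceFields body: computed once per source."""
    return {
        "src_name": os.path.basename(sourceToken),
        "src_ext": os.path.splitext(sourceToken)[1],
    }


def rows_with_source_fields(sourceToken, grammar_rows):
    """Hypothetical driver adding the dictionary to every incoming row."""
    extra = source_fields(sourceToken)      # evaluated once, not per row
    for row in grammar_rows:
        merged = dict(row)
        merged.update(extra)
        yield merged


rows = list(rows_with_source_fields(
    "data/run7/obs.txt", [{"mag": 12.5}, {"mag": 13.1}]))
```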
May occur in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar.
A grammar that returns its sourceToken as the row iterator.
This only makes sense in extreme situations and never without custom code. If you're not sure you need this, you don't want to know about it.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar using one of a sequence of grammars to parse its sources.
(since version 2.7.2)
Use this if you have differing input formats eventually processible by the same row maker (of course, you can make the row maker flexible enough to cope with different grammar outputs). To do that, use two or more handles definitions, each giving a regular expression against the full file name (but matched with re.search) and a grammar to use for such files.
Handles definitions will be tried in sequence; you can hence have special cases early and catch-alls later.
The basic idea is that you write something like:
<unionGrammar>
  <handles pattern=".*\.txt$">
    <reGrammar...>
  </handles>
  <handles pattern=".*\.csv$">
    <csvGrammar...>
  </handles>
</unionGrammar>
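The selection logic (patterns tried in sequence, matched with re.search against the full file name) can be sketched in a few lines of Python. This is an illustration only: the grammar names are plain strings, the first pattern is a made-up special case, and raising ValueError when nothing matches is an assumption of this sketch rather than documented DaCHS behaviour.

```python
import re

# (pattern, grammar name) pairs in declaration order; the names are plain
# strings standing in for the grammars inside each handles element.
HANDLES = [
    (r"special/.*\.txt$", "special-case grammar"),   # special case first
    (r".*\.txt$", "reGrammar"),
    (r".*\.csv$", "csvGrammar"),
]


def pick_grammar(full_path, handles=HANDLES):
    """Return the grammar of the first entry whose pattern matches."""
    for pattern, grammar in handles:
        if re.search(pattern, full_path):
            return grammar
    # Assumption of this sketch: an unmatched source is an error.
    raise ValueError("No handles definition matches %r" % full_path)
```

Because re.search is used, a pattern without a leading anchor may match anywhere in the path, so the special-case entry has to come before the catch-all for .txt files.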
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar parsing from VOTables.
Currently, the PARAM fields are ignored, only the data rows are returned.
voTableGrammars result in typed records, i.e., values normally come in the types they are supposed to have.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
A grammar parsing from generic XML files.
Use this grammar to parse from generic XML files. For now, one rawdict per document is returned (later extensions might let you define elements that will yield rows).
The keys are xpaths (e.g., root/element or root/element/@attr), the values the (joined) text nodes that are immediate children of the element.
When elements are repeated within an element, [ct] is appended to the path element (e.g., root/element[0]).
For now, this grammar ignores namespaces.
Because most of the keys are not valid python identifiers, you cannot use the @key syntax when mapping this. Use vars[key] instead (or <map key="dest" source="path"/>).
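To get a feeling for what such a rawdict might look like, here is a sketch using only the standard library. It is not the grammar's implementation; in particular, counting repeated elements from zero and indexing only elements that actually repeat are assumptions of this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(xml_source):
    """Return a dict mapping xpath-like keys to joined text content."""
    root = ET.fromstring(xml_source)
    result = {}

    def visit(el, path):
        # join the text nodes that are immediate children of el
        chunks = [el.text or ""] + [child.tail or "" for child in el]
        result[path] = "".join(chunks).strip()
        for name, value in el.attrib.items():
            result["%s/@%s" % (path, name)] = value
        counts = Counter(child.tag for child in el)
        seen = Counter()
        for child in el:
            step = child.tag
            if counts[child.tag] > 1:       # repeated: append [ct]
                step = "%s[%d]" % (child.tag, seen[child.tag])
                seen[child.tag] += 1
            visit(child, "%s/%s" % (path, step))

    visit(root, root.tag)
    return result


rawdict = flatten('<root a="1"><el>x</el><el>y</el><other>z</other></root>')
```

Keys such as root/el[0] or root/@a are not valid Python identifiers, which is why they cannot be used with the @key syntax.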
Do not use this for VOTables; use voTableGrammar instead.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro colNames, Macro decapitalize, Macro dlMetaURI, Macro fullDLURL, Macro getConfig, Macro inputRelativePath, Macro inputSize, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro property, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro rootlessPath, Macro schema, Macro sourceDate, Macro splitPreviewPath, Macro sql_standardPubDID, Macro sqlquote, Macro srcstem, Macro standardPreviewPath, Macro test, Macro today, Macro upper, Macro urlquote
The following elements are related to cores. All cores can only occur toplevel, i.e. as direct children of resource descriptors. Cores are only useful with an id to make them referencable from services using that core.
A core taking an ADQL query from its query argument and returning the result of that query in a standard table.
Since the columns returned depend on the query, the outputTable of an ADQL core must not be defined.
A core retrieving biblinks-harvest records from dc.biblinks.
Probably the only place in which this makes sense is in //biblinks#links; consider making this a pythonCore there.
(since 2.8.2)
A definition of a pythonCore's functionality.
This is a procApp complete with setup and code; you could inherit between these.
coreProcs see the embedding service, the input table passed, and the query metadata as service, inputTable, and queryMeta, respectively.
The core itself is available as self.
May occur in Element pythonCore.
A wrapper around a core defined in a module.
This core lets you write your own cores in modules.
The module must define a class Core. When the custom core is encountered, this class will be instantiated and will be used instead of the CustomCore, so your code should probably inherit core.Core.
See Writing Custom Cores for details.
A procedure application that renders data in a processed service.
These play the role of the renderer, which for datalink is usually trivial. They are supposed to take descriptor.data and return a pair of (mime-type, bytes), which is understood by most renderers.
When no dataFormatter is given for a core, it will return descriptor.data directly. This can work with the datalink renderer itself if descriptor.data will work as a nevow resource (i.e., has a renderHTTP method, as our usual products do). Consider, though, that renderHTTP runs in the main event loop and thus must not block for extended periods of time.
May occur in Element datalinkCore.
A procedure application that generates or modifies data in a processed data service.
All these operate on the data attribute of the product descriptor. The first data function plays a special role: It must set the data attribute (or raise some appropriate exception), or a server error will be returned to the client.
What is returned depends on the service, but typically it's going to be a table or products.*Product instance.
Data functions can shortcut if it's evident that further data functions can only mess up (i.e., if they do something bad with the data attribute); you should not shortcut if you just think it makes no sense to further process your output.
To shortcut, raise either FormatNow (falls through to the formatter, which is usually less useful) or DeliverNow (directly returns the data attribute; this can be used to return arbitrary chunks of data).
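The control flow described here can be sketched without DaCHS as follows. Everything in this block is a stand-in: the two exception classes only share their names with the real ones, and Descriptor, run_data_functions, and the sample functions are hypothetical.

```python
class FormatNow(Exception):
    """Stand-in: stop running data functions, go on to the formatter."""


class DeliverNow(Exception):
    """Stand-in: stop everything and return descriptor.data as it is."""


class Descriptor:
    def __init__(self):
        self.data = None


def run_data_functions(descriptor, data_functions, formatter):
    """Hypothetical driver for a chain of data functions."""
    try:
        for index, func in enumerate(data_functions):
            func(descriptor)
            if index == 0 and descriptor.data is None:
                # corresponds to the server error mentioned in the text
                raise RuntimeError("first data function did not set data")
    except FormatNow:
        pass                        # fall through to the formatter
    except DeliverNow:
        return descriptor.data      # bypass the formatter
    return formatter(descriptor)


def load(descriptor):
    descriptor.data = [1, 2, 3]


def double(descriptor):
    descriptor.data = [2 * v for v in descriptor.data]


def give_up_early(descriptor):
    raise FormatNow()


def deliver_raw(descriptor):
    descriptor.data = b"raw bytes"
    raise DeliverNow()


def as_text(descriptor):
    """Stand-in formatter returning a (mime-type, bytes) pair."""
    return "text/plain", repr(descriptor.data).encode("ascii")


formatted = run_data_functions(Descriptor(), [load, double], as_text)
shortcut = run_data_functions(
    Descriptor(), [load, give_up_early, double], as_text)
delivered = run_data_functions(
    Descriptor(), [load, deliver_raw, double], as_text)
```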
May occur in Element datalinkCore.
A core for processing datalink and processed data requests.
The input table of this core is dynamically generated from its metaMakers; it makes no sense at all to try and override it.
See Datalink and SODA for more information.
In contrast to "normal" cores, one of these is made (and destroyed) for each datalink request coming in. This is because the interface of a datalink service depends on the request's value(s) of ID.
The datalink core can produce both its own metadata and data generated. It is the renderer's job to tell them apart.
A core performing database queries on one table or view.
DBCores ask the service for the desired output schema and adapt their output. The DBCore's output table, on the other hand, lists all fields available from the queried table.
A core that returns its arguments stringified in a table.
You need to provide an external input table for these.
A procedure application for making product descriptors for PUBDIDs.
Despite the name, a descriptor generator has to return (not yield) a descriptor instance. While this could be anything, it is recommended to derive custom classes from protocols.datalink.ProductDescriptor, which exposes essentially the columns from DaCHS' product table as attributes. This is what you get when you don't define a descriptor generator in your datalink core.
Before writing your own, see if one of the predefined descriptor generators works for you; see Descriptor Generators below.
The following names are available to the code:
- pubDID -- the pubDID to be resolved
- args -- all the arguments that came in from the web (these should not usually be necessary for making the descriptor and are completely unparsed at this point)
- FITSProductDescriptor -- the base class of FITS product descriptors
- DLFITSProductDescriptor -- the same, just for when the product table has a datalink.
- ProductDescriptor -- a base class for your own custom descriptors
- DatalinkFault -- use this when flagging failures
- soda -- contents of the soda module for convenience
If you made your pubDID using the getStandardPubDID rowmaker function, and you need no additional logic within the descriptor, the default (//soda#fromStandardPubDID) should do.
If you need to derive custom descriptor classes, you can see the base class under the name ProductDescriptor; there's also FITSProductDescriptor and DatalinkFault in each proc's namespace. If your Descriptor does not actually refer to something in the product table, it is likely that you want to set the descriptor's suppressAutoLinks attribute to True. This will stop DaCHS from attempting to add automatic #this and #preview links.
May occur in Element datalinkCore.
A core executing a pre-specified query with fancy conditions.
Unless you select *, you must define the outputTable here; weird things will happen if you don't.
The queriedTable attribute is mandatory but only used for name resolution (e.g., in outputTable). Instead, define your FROM in the query attribute (and use a %s where the condDescs-generated WHERE clause should end up).
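The role of the %s placeholder can be illustrated with ordinary Python string formatting. This is a sketch under assumptions: the table and column names are invented, and whether the generated fragment carries the WHERE keyword itself (as it does here) is an assumption of this example.

```python
# Hypothetical query attribute content; %s marks where the WHERE clause goes.
QUERY_TEMPLATE = (
    "SELECT r.run_id, p.mag"
    " FROM runs AS r JOIN phot AS p ON (r.run_id = p.run_id)"
    " %s ORDER BY r.run_id")


def fill_query(template, conditions):
    """Put a WHERE clause built from conditions into the template."""
    where_clause = ""
    if conditions:
        where_clause = "WHERE " + " AND ".join(conditions)
    return template % where_clause


constrained = fill_query(QUERY_TEMPLATE, ["p.mag < 12", "r.run_id > 100"])
unconstrained = fill_query(QUERY_TEMPLATE, [])
```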
A core executing a predefined query.
This usually is not what you want, unless you want to expose the current results of a specific query, e.g., for log or event data.
An input for a core.
These aren't proper tables but just collections of the param-like inputKeys. They serve as input declarations for cores and services (where services derive their inputTDs from the cores' ones by adapting them to the current renderer). Their main use is for the derivation of contextGrammars.
They can carry metadata, though, which is sometimes convenient when transporting information from the parameter parsers to the core.
For the typical dbCores (and friends), these are essentially never explicitly defined but rather derived from condDescs.
Do not read input values by using table.getParam. This will only give you one value when a parameter has been given multiple times. Instead, use the output of the contextGrammar (inputParams in condDescs). Only there you will have the correct multiplicities.
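The general problem with single-valued access to repeated parameters is easy to demonstrate with the standard library. The sketch below has nothing to do with DaCHS internals; the parameter names are made up, and it merely contrasts a one-value-per-name view with a view that keeps multiplicities.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical request in which BAND has been given twice.
QUERY_STRING = "BAND=V&BAND=R&MAXREC=10"

# One value per name: the repeated parameter silently loses a value.
single_valued = dict(parse_qsl(QUERY_STRING))

# A list per name: multiplicities are preserved.
multi_valued = parse_qs(QUERY_STRING)
```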
May occur in Element adqlCore, Element biblinksCore, Element customCore, Element datalinkCore, Element dbCore, Element debugCore, Element fancyQueryCore, Element fixedQueryCore, Element nullCore, Element productCore, Element pythonCore, Element registryCore, Element scsCore, Element siapCutoutCore, Element ssapCore, Element tapCore, Element uploadCore.
Macros predefined here: Macro RSTservicelink, Macro RSTtable, Macro decapitalize, Macro getConfig, Macro internallink, Macro lower, Macro magicEmpty, Macro metaSeq, Macro metaString, Macro quote, Macro rdId, Macro rdIdDotted, Macro reSub, Macro resdir, Macro schema, Macro sql_standardPubDID, Macro sqlquote, Macro test, Macro today, Macro upper, Macro urlquote
A procedure application that generates metadata for datalink services.
The code must be a generator body (i.e., use yield statements) producing either svcs.InputKeys or protocols.datalink.LinkDef instances.
metaMakers see the data descriptor of the input data under the name descriptor.
The data attribute of the descriptor is always None for metaMakers, so you cannot use anything given there.
Within MetaMakers' code, you can access InputKey, Values, Option, and LinkDef without qualification, and there's the MS function to build structures. Hence, a metaMaker returning an InputKey could look like this:
<metaMaker>
  <code>
    yield MS(InputKey, name="format", type="text",
      description="Output format desired",
      values=MS(Values,
        options=[MS(Option, content_=descriptor.mime),
          MS(Option, content_="text/plain")]))
  </code>
</metaMaker>
(Of course, you should give more metadata -- ucds, better description, etc. -- in production.)
Note that InputKey-returning MetaMakers cannot rely on descriptor.pubDID to actually have a value; in particular SSA may construct cores to produce "direct" processing datalink descriptors. When you need a pubDID, just return if descriptor.pubDID is None.
Alternatively, yield link definitions, i.e., rows for the links response. You will usually define a semantics attribute for the meta maker then (this lets DaCHS use the right semantics in case something goes wrong), and you will usually create links from the descriptor so the pubDID will be right automatically:
<metaMaker semantics="#flat">
  <code>
    yield descriptor.makeLink(
      "http://example.org/flats/master-flat.fits",
      contentType="application/fits",
      description="The master flat for this epoch",
      contentLength=2048*2048*2)
  </code>
</metaMaker>
It's ok to yield None; this will suppress a Datalink and is convenient when some component further down figures out that a link doesn't exist (e.g., because a file isn't there). Note that in many cases, it's more helpful to client components to handle such situations by yielding a DatalinkFault.NotFoundFault.
In addition to the usual names available to ProcApps, meta makers have:
May occur in Element datalinkCore.
A core always returning None.
This core will not work with the common renderers. It is really intended to go with coreless services (i.e. those in which the renderer computes everything itself and never calls service.runX). As an example, the external renderer could go with this.
A core retrieving paths and/or data from the product table.
You will not usually mention this core in your RDs. It is mainly used internally to serve /getproduct queries.
It is instantiated from within //products.rd and relies on tables within that RD.
The input data consists of accref; you can use the string form of RAccrefs, and if your renderer wants, it can pass in ready-made RAccrefs. You can pass accrefs in through both an accref param and table rows.
The accref param is the normal way if you just want to retrieve a single image, the table case is for building tar files and such. There is one core instance in //products for each case.
The core returns a list of instances of a subclass of ProductBase above.
This core and its supporting machinery handle all the fancy product functionality (user authorization, cutouts, ...).
A core doing computation using a piece of python.
See Python Cores instead of Custom Cores in the reference.
A core processing OAI requests.
Its signature requires a single input key containing the complete args from the incoming request. This is necessary to satisfy the requirement of raising errors on duplicate arguments.
It returns an ElementTree.
This core is intended to work with the RegistryRenderer.
A core performing cone searches.
This will, if it finds input parameters it can make out a position from, add a _r column giving the distance between the match center and the columns that a cone search will match against.
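For orientation, the kind of quantity that ends up in _r is an angular distance on the sphere. The following sketch computes it in Python with the haversine formula; it is not how DaCHS computes the column (that presumably happens in the database), and the function name is invented for this example.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Angular distance in degrees between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # clamp against rounding errors before taking the arc sine
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))
```

Along the equator or along a meridian, the separation is just the coordinate difference; at higher declinations, a given difference in right ascension corresponds to a smaller angle.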
If any of the conditions for adding _r aren't met, this will silently degrade to a plain DBCore.
You will almost certainly want a:
<FEED source="//scs#coreDescs"/>
in the body of this (in addition to whatever other custom conditions you may have).
A core doing SIAP plus cutouts.
It has, by default, an additional column specifying the desired size of the image to be retrieved. Based on this, the cutout core will tweak its output table such that references to cutout images will be retrieved.
The actual process of cutting out is performed by the product core and renderer.
A core doing SSAP queries.
This core knows about metadata queries, version negotiation, and dispatches on REQUEST. Thus, it may return formatted XML data under certain circumstances.
Interpreted Properties:
A core for the TAP renderer.
A core handling uploads of files to the database.
It allows users to upload individual files into a special staging area (taken from the stagingDir property of the destination data descriptor) and causes these files to be parsed using destDD. Note that destDD must have updating="True" for this to work properly (it will otherwise drop the table on each upload). If uploads are the only way updates into the table occur, source management is not necessary for these, though.
You can tell UploadCores to either insert or update the incoming data using the "mode" input key.
Macro calls in DaCHS start with a backslash; arguments are given in curly braces. What macros are available depends on the element doing the expansion; regrettably, not all strings are expanded, and at this point it's not usually documented which are and which are not (though we hope DaCHS typically behaves "as expected"). If this bites you, complain to the authors and we promise we'll give fixing this a higher priority.
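To make the call syntax concrete, here is a toy expander in Python. It is only a sketch: it handles flat (non-nested) calls, knows just four macros, and its helper implementations are guesses from the one-line descriptions in this reference, not the DaCHS code.

```python
import re


def decapitalize(a_string):
    """Lowercase the first character only."""
    return a_string[:1].lower() + a_string[1:]


def quote(arg):
    """Put arg in double quotes, backslash-escaping internal quotes."""
    return '"' + arg.replace('"', '\\"') + '"'


# Toy macro table; real expanders offer different sets per element.
MACROS = {
    "lower": str.lower,
    "upper": str.upper,
    "decapitalize": decapitalize,
    "quote": quote,
}

# A backslash, a macro name, then zero or more {argument} groups.
MACRO_CALL = re.compile(r"\\([A-Za-z_]\w*)((?:\{[^{}]*\})*)")


def expand(text):
    """Expand all flat macro calls in text using the toy table."""
    def replace(match):
        args = re.findall(r"\{([^{}]*)\}", match.group(2))
        return MACROS[match.group(1)](*args)
    return MACRO_CALL.sub(replace, text)
```

With this, expand(r"\upper{abc} and \decapitalize{FooBar}") gives "ABC and fooBar".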
\RSTcc0{stuffDesignation}
expands to a declaration that stuffDesignation is available under CC-0.
This only works in reStructured text (though it's still almost readable as source).
You'll probably want to use the //procs#license-cc0 stream instead of this, as that also sets the rights URI.
Available in Element resource
\RSTccby{stuffDesignation}
expands to a declaration that stuffDesignation is available under CC-BY.
This only works in reStructured text (though it's still almost readable as source).
You'll probably want to use the //procs#license-cc-by stream instead of this, as that also sets the rights URI.
Available in Element resource
\RSTccbysa{stuffDesignation}
expands to a declaration that stuffDesignation is available under CC-BY-SA.
This only works in reStructured text (though it's still almost readable as source).
You'll probably want to use the //procs#license-cc-by-sa stream instead of this, as that also sets the rights URI.
Available in Element resource
\RSTservicelink{serviceId}{title=None}
a link to an internal service; id is <rdId>/<serviceId>/<renderer>, title, if given, is the anchor text.
The result is a link in the short form for reStructured text.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\RSTtable{tableName}
adds a reStructured text link to a tableName pointing to its table info.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\colNames
returns an SQL-ready list of column names of this table.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\curtable
returns the qualified name of the current table.
(this is identical to the macro qName, which you should prefer in new RDs.)
Available in Element outputTable, Element table
\decapitalize{aString}
returns aString with the first character lowercased.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\dlMetaURI{dlId}
returns a link to the datalink document for the current product.
This assumes you're assigning standard pubDIDs (see also standardPubDID, which is used by this).
dlId is the XML id of the datalink service, which is supposed to be in the same RD as the rowmaker.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\docField{name}
returns an expression giving the value of the column name in the document row.
Available in Element rowmaker
\fullDLURL{dlService}
returns a python expression giving a link to the full current data set retrieved through the datalink service.
You would write \fullDLURL{dlsvc} here, and the macro will expand into something like http://yourserver/currd/dlsvc/dlget?ID=ivo://whatever.
dlService is the id of the datalink service in the current RD.
This is intended for "virtual" data where the dataset is generated on the fly through datalink.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\fullPath
returns an expression expanding to the full path of the current input file.
Available in Element rowmaker
\getConfig{section}{name=None}
the current value of configuration item {section}{name}.
You can also give only one argument to access settings from the general section.
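The one-argument shortcut can be pictured with the standard configparser module. This sketch does not read DaCHS' actual configuration; the configuration text and the get_config helper are illustrative assumptions.

```python
import configparser

# Illustrative configuration; sections and items are examples only.
CONFIG_TEXT = """
[general]
logLevel = info

[web]
serverURL = http://localhost:8080
"""

_parser = configparser.ConfigParser()
_parser.optionxform = str        # keep the case of item names
_parser.read_string(CONFIG_TEXT)


def get_config(section, name=None):
    """Return a config item; a single argument means [general]."""
    if name is None:
        section, name = "general", section
    return _parser.get(section, name)
```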
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\getParam{parName}{default=''}
returns the string representation of the parameter parName.
This is the parameter as given in the table definition. Any changes to an instance are not reflected here.
If the parameter named does not exist, an empty string is returned. NULLs/Nones are rendered as NULL; this is mainly a convenience for obscore-like applications and should not be exploited otherwise, since it's ugly and might change at some point.
If a default is given, it will be returned for both NULL and non-existing params.
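The rules for missing and NULL parameters can be summarised in a short Python sketch. The function below is a hypothetical model written from the description above; params is simply a dictionary standing in for the parameters of a table definition.

```python
def get_param(params, par_name, default=""):
    """Model of the described behaviour for missing and None values."""
    if par_name not in params:
        return default              # empty string unless a default is given
    value = params[par_name]
    if value is None:
        return default or "NULL"    # a given default wins over NULL
    return str(value)


PARAMS = {"t_exptime": 30.5, "facility": None}
```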
Available in Element outputTable, Element table
\inputRelativePath{liberalChars='True'}
returns an expression giving the current source's path relative to inputsDir
liberalChars can be a boolean literal (True, False, etc); if false, a value error is raised if characters that will result in trouble with the product mixin are within the result path.
In rowmakers fed by grammars with //products#define, better use @prodtblAccref.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\inputSize
returns an expression giving the size of the current source.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\internallink{relPath}
an absolute URL from a path relative to the DC root.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\lastSourceElements{numElements}
returns an expression calling rmkfuncs.lastSourceElements on the current input path.
Available in Element rowmaker
\lower{aString}
returns aString lowercased.
There are no guarantees for characters outside ASCII.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\magicEmpty{val}
returns __EMPTY__ if val is empty.
This is necessary when feeding possibly empty params from mixin parameters (don't worry if you don't understand this).
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\metaSeq{metaKey}{default=''}{joiner=', '}
returns all values of metaKey on the current macro expander joined by joiner.
This will be an empty string if there is no corresponding metadata (or default, if passed).
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\metaString{metaKey}{default=None}
the value of metaKey on the macro expander.
This will raise an error when the meta Key is not available unless you give a default. It will also raise an error if metaKey is not atomic (i.e., single-valued). Use metaSeq for meta items that may have multiple values.
Because it's sometimes useful, if the expander itself doesn't have metadata, this goes up in the RD tree until it finds something that has metadata.
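The difference between the two meta macros can be modelled as follows. This is a sketch with invented classes: Node stands for anything in the RD tree that can carry metadata, meta values are kept as plain lists, and the joiner default in meta_seq is an assumption.

```python
class Node:
    """Hypothetical tree element with metadata and a parent link."""

    def __init__(self, meta=None, parent=None):
        self.meta = meta or {}
        self.parent = parent


def meta_string(node, meta_key, default=None):
    """Single-valued lookup that climbs until a node has metadata."""
    while node is not None and not node.meta:
        node = node.parent
    values = node.meta.get(meta_key, []) if node is not None else []
    if not values:
        if default is None:
            raise KeyError(meta_key)
        return default
    if len(values) > 1:
        raise ValueError("%s is not atomic" % meta_key)
    return values[0]


def meta_seq(node, meta_key, default="", joiner=", "):
    """Join all values found on node itself (no climbing)."""
    values = node.meta.get(meta_key, [])
    return joiner.join(values) if values else default


rd = Node(meta={"title": ["My Resource"], "subject": ["stars", "galaxies"]})
table = Node(parent=rd)     # carries no metadata of its own
```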
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\nameForUCD{ucd}
returns the (unique!) name of the field having ucd in this table.
If there is no field, or more than one field, with the ucd in this table, we raise a ValueError.
Available in Element outputTable, Element table
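The uniqueness requirement of \nameForUCD amounts to the following logic, shown here as a minimal Python sketch. Representing a table as a list of (name, ucd) pairs is an assumption made for illustration only; DaCHS column objects look different.

```python
def name_for_ucd(columns, ucd):
    """Return the name of the one column having ucd.

    columns is a list of (name, ucd) pairs, a hypothetical simplification.
    """
    matches = [name for name, col_ucd in columns if col_ucd == ucd]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column with UCD {ucd!r},"
            f" found {len(matches)}")
    return matches[0]


columns = [
    ("raj2000", "pos.eq.ra;meta.main"),
    ("dej2000", "pos.eq.dec;meta.main"),
    ("vmag", "phot.mag;em.opt.V"),
]
ra_name = name_for_ucd(columns, "pos.eq.ra;meta.main")
```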
\nameForUCDs{ucds}
returns the (unique!) name of the field having one of ucds in this table.
Ucds is a selection of ucds separated by vertical bars (|). The rules for when this raises errors are so crazy you don't want to think about them. This really is only intended for cases where "old" and "new" standards are to be supported, like with pos.eq.*;meta.main and POS_EQ_*_MAIN.
If there is no field, or more than one field, with the ucd in this table, we raise an exception.
Available in Element outputTable, Element table
\property{propName}
returns an expression giving the value of the property propName on the current DD.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\qName
returns the qName of the table we are currently parsing into.
Available in Element outputTable, Element rowmaker, Element table
\quote{arg}
returns the argument in quotes (with internal quotes backslash-escaped if necessary).
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\rdId
the identifier of the current resource descriptor.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\rdIdDotted
the identifier for the current resource descriptor with slashes replaced with dots (so they work as the "host part" in URIs).
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\reSub{pattern}{replacement}{string}
returns the string with the python RE pattern replaced with replacement.
This is directly handed through to python re.sub, so you can (but probably shouldn't) play all the RE tricks you can in python (e.g., back references).
If you find yourself having to use reSub, you should regard that as a warning sign that you're probably doing it wrong.
Oh: closing curly braces can be included in the argument by backslash-escaping them.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
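Since \reSub hands its arguments to Python's re.sub in the same order (pattern, replacement, string), the effect of a given pattern can be tried out in a plain Python session first. The accref-like path below is made up for illustration.

```python
import re

# A made-up path; the back reference \1 re-inserts the captured file stem.
accref = "data/raw/img_0042.fits"
preview = re.sub(r"raw/(.*)\.fits$", r"previews/\1.png", accref)
print(preview)
```

In this example, preview becomes data/previews/img_0042.png.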
\resdir
the input-relative resource directory of the current resource descriptor.
This never has a trailing slash.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\rootlessPath
returns an expression giving the current source's path with the resource descriptor's root removed.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\rowsMade
returns an expression giving the number of records already returned by this row maker.
This number excludes failed and skipped rows.
Available in Element rowmaker
\rowsProcessed
returns an expression giving the number of records already delivered by the grammar.
Available in Element rowmaker
\schema
the schema of the current resource descriptor.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\sourceCDate
returns an expression giving the timestamp for the create date of the current source.
Use dateTimeToJdn or dateTimeToMJD to turn this into JD or MJD (which is usually preferred in database tables). See also the sourceDate macro.
Available in Element rowmaker
\sourceDate
returns an expression giving the timestamp of the current source.
This is a timestamp of the modification date; use dateTimeToJdn or dateTimeToMJD to turn this into JD or MJD (which is usually preferred in database tables). See also the sourceCDate macro.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
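For orientation, the arithmetic behind turning a timestamp such as the one from \sourceDate into JD or MJD is sketched below in Python. The function names are hypothetical and deliberately differ from the DaCHS helpers mentioned above; the sketch uses naive datetimes and ignores time scales.

```python
import datetime

# MJD counts days since 1858-11-17T00:00:00; JD = MJD + 2400000.5.
MJD_EPOCH = datetime.datetime(1858, 11, 17)


def datetime_to_mjd_sketch(timestamp):
    """Return the Modified Julian Date for a naive datetime."""
    return (timestamp - MJD_EPOCH) / datetime.timedelta(days=1)


def datetime_to_jd_sketch(timestamp):
    """Return the Julian Date for a naive datetime."""
    return datetime_to_mjd_sketch(timestamp) + 2400000.5


noon = datetime.datetime(2000, 1, 1, 12, 0, 0)
mjd = datetime_to_mjd_sketch(noon)
jd = datetime_to_jd_sketch(noon)
```

For 2000-01-01T12:00:00, this gives MJD 51544.5 and JD 2451545.0.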
\splitPreviewPath{ext}
returns an expression for the split standard path for a custom preview.
Like standardPreviewPath, except that the directory hierarchy of the data files will be reproduced in previews. For ext, you should typically pass the extension appropriate for the preview (like {.png} or {.jpeg}).
See the introduction to custom previews for details.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\sql_standardPubDID{fromCol='accref'}
returns a SQL expression returning a DaCHS standard pubDID generated from the accref (or something overridden) column.
This is convenient in obscore or ssa views when the underlying table just has accrefs. If your code actually uses the pubDID to search in the table (and it probably shouldn't), better use an actual column and index it.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\sqlquote{arg}
returns the argument as a quoted string, unless it is 'NULL' or None, in which case just NULL is returned.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
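The behaviour described for \sqlquote can be summarised in a few lines of Python. This is a sketch, not the DaCHS implementation; in particular, escaping embedded single quotes by doubling them is standard SQL practice that is assumed here rather than taken from the description above.

```python
def sql_quote_sketch(arg):
    """Return arg as an SQL string literal, or NULL for None/'NULL'."""
    if arg is None or arg == "NULL":
        return "NULL"
    # Assumption: embedded single quotes are doubled, as usual in SQL.
    return "'" + arg.replace("'", "''") + "'"


quoted = sql_quote_sketch("Barnard's star")
```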
\srcstem
returns python code for the stem of the source file currently parsed in a rowmaker.
Example: if you're currently parsing /tmp/foo.bar.gz, the stem is foo.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowmaker, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
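What "stem" means here (everything in the file name up to the first dot) can be reproduced with the following Python sketch. The function name is hypothetical; the macro itself expands to Python code that is evaluated in the rowmaker, and that code may be written differently.

```python
import os


def src_stem_sketch(source_path):
    """Return the file name up to (not including) its first dot."""
    return os.path.basename(source_path).split(".")[0]


stem = src_stem_sketch("/tmp/foo.bar.gz")
```

Note that this differs from os.path.splitext, which would only strip the last extension and leave foo.bar.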
\standardPreviewPath
returns an expression for the standard path for a custom preview.
This consists of resdir, the name of the previewDir property on the embedding DD, and the flat name of the accref (which this macro assumes to see in its namespace as accref; this is usually the case in //products#define, which is where this macro would typically be used).
As an alternative, there is the splitPreviewPath macro, which does not mogrify the file name. In particular, do not use standardPreviewPath when you have more than a few 1e4 files, as it will have all these files in a single, flat directory, and that can become a chore.
See the introduction to custom previews for details.
Available in Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element keyValueGrammar, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element pdsGrammar, Element reGrammar, Element rowsetGrammar, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\standardPubDID
returns the "standard publisher DID" for the current product.
The publisher dataset identifier (PubDID) is important in protocols like SSAP and obscore. If you use this macro, the PubDID will be your authority, the path component ~, and the current value of @prodtblAccref. It thus will only work where products#define (or a replacement) is in action. If it isn't, a normal function call getStandardPubDID(\inputRelativePath) would be an obvious alternative.
You can of course define your PubDIDs in a different way.
Available in Element rowmaker
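To make the structure of such a PubDID concrete, here is a Python sketch that assembles authority, the path component ~, and an accref into an identifier. The function name, the example authority, and the URL-quoting of the accref are assumptions made for illustration; consult getStandardPubDID for the actual behaviour.

```python
from urllib.parse import quote


def standard_pubdid_sketch(authority, accref):
    """Build an ivo:// identifier from authority, ~, and the accref."""
    # Assumption: the accref goes into the query part, percent-encoded
    # except for slashes.
    return f"ivo://{authority}/~?{quote(accref, safe='/')}"


pubdid = standard_pubdid_sketch("org.example", "myres/data/img1.fits")
```

With the made-up authority org.example, this yields ivo://org.example/~?myres/data/img1.fits.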
\tablename
returns the unqualified name of the current table.
In most contexts, you will probably need to use the macro qName instead of this.
Available in Element outputTable, Element table
\tablesForTAP
returns a list of table names available for TAP querying.
This, really, is an implementation detail for the TAP service and might go away anytime.
Available in Element service
\test{*args}
always "test macro expansion".
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\today
today's date in ISO representation.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\upper{aString}
returns aString uppercased.
There are no guarantees for characters outside ASCII.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
\urlquote{string}
wraps urllib.quote.
Available in Element FEED, Element LFEED, Element LOOP, Element binaryGrammar, Element cdfHeaderGrammar, Element columnGrammar, Element contextGrammar, Element csvGrammar, Element customGrammar, Element dictlistGrammar, Element embeddedGrammar, Element fitsProdGrammar, Element fitsTableGrammar, Element freeREGrammar, Element hdf5Grammar, Element inputTable, Element keyValueGrammar, Element mixinDef, Element mySQLDumpGrammar, Element nullGrammar, Element odbcGrammar, Element outputTable, Element pdsGrammar, Element reGrammar, Element resRec, Element resource, Element rowmaker, Element rowsetGrammar, Element service, Element table, Element transparentGrammar, Element unionGrammar, Element voTableGrammar, Element xmlGrammar
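In Python 3, the quoting function lives at urllib.parse.quote. Assuming the macro passes its argument through with the default settings (an assumption, not something stated above), the result can be previewed like this:

```python
from urllib.parse import quote

# With default arguments, "/" is left alone and other reserved
# characters are percent-encoded.
encoded = quote("a b&c/d")
print(encoded)
```

This prints a%20b%26c/d.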
Mixins ensure a certain functionality on a table. Typically, this is used to provide certain guaranteed fields to particular cores. For many mixins, there are predefined procedures (both rowmaker applys and grammar rowfilters) that should be used in grammars and/or rowmakers feeding the tables mixing in a given mixin.
Note that when a piece of metadata in a mixin gets in your way, you can selectively override attributes of columns and params by copying and changing them. For instance, if your mixin example#m gives you a column flux, and you need to change its unit, you would say:
<table mixin="example#m"> <column original="flux" unit="mJy"/> </table>
Use this mixin if your epntap table is filled with local products (i.e., sources match files on your hard disk that DaCHS should hand out itself). This will arrange for your products to be entered into the products table, and it will automatically compute file size, etc.
This wants a //products#define rowfilter in your grammar and a //epntap2#populate-localfile-2_0 apply in your rowmaker.
This mixin defines a table suitable for publication via the EPN-TAP protocol.
According to the standard definition, tables mixing this in should be called epn_core. The mixin already arranges for the table to be accessible by ADQL and be on disk.
This also causes the product table to be populated. This means that grammars feeding such tables need a //products#define row filter. At the very least, you need to say:
<rowfilter procDef="//products#define"> <bind name="table">"\schema.epn_core"</bind> </rowfilter>
If you absolutely cannot use //products#define, you will have to manually provide the prodtblFsize (file size in bytes), prodtblAccref (product URL), and prodtblPreview (thumbnail image or None) keys in what's coming from your grammar.
Use the //epntap2#populate-2_0 apply in rowmakers feeding tables mixing this in.
The mixin has a parameter optional_columns, which accepts space-separated columns from the following extensions:
detector_name: | Detector name |
---|---|
north_pole_position: | North pole position angle with respect to celestial north pole |
obs_mode: | Observing mode |
opt_elem: | Optical element name |
orientation: | Position angle of an image y-axis, in direct sense from north |
platesc: | pixel angular size or platescale (on sky only) |
target_primary_hemisphere: | Primary observed hemisphere |
target_secondary_hemisphere: | Secondary observed hemisphere |
access_estsize: | Estimated file size in kbyte. |
---|---|
access_format: | File format type (RFC 6838 Media Type a.k.a. MIME type) |
access_md5: | MD5 Hash for the file |
access_url: | URL of the data file, case sensitive. If present, then access_format and access_estsize are mandatory. |
acquisition_id: | ID of the data file/acquisition in the original archive |
albedo: | Target albedo |
alt_target_name: | Provides alternative target name if more common (e.g. comets); multiple identifiers can be separated by hashes |
altitude_fromshape_max: | Max altitude of observed area above shape model / DTM |
altitude_fromshape_min: | Min altitude of observed area above shape model / DTM |
bib_reference: | Bibcode or DOI preferred if available, or other bibliographic identifier or URL |
campaign: | Name of the observational campaign |
datalink_url: | Link to a datalink document for this dataset |
dec: | Declination |
earth_distance_max: | Max Earth-target distance |
earth_distance_min: | Min Earth-target distance |
external_link: | Web page providing more details on this granule |
feature_name: | Secondary name (can be standard name of region of interest) |
file_name: | Name of the data file only, case sensitive |
filter: | Identifies filter in use, typically for images |
flux: | Target flux |
instrument_type: | type of instrument |
internal_reference: | Related granule_uid(s) in the current service; hash-separated |
local_time_max: | Max local time at observed region |
local_time_min: | Min local time at observed region |
magnitude: | Absolute magnitude. For small bodies, from HG magnitude system |
messenger: | Vector of measured signal, including electromagnetic band, from http://www.ivoa.net/rdf/messenger |
processing_level_desc: | Describes specificities of the processing level |
proposal_id: | Proposal identifier |
proposal_pi: | Proposal principal investigator |
proposal_target_name: | target name as in proposal title |
proposal_title: | Proposal title |
publisher: | A short string identifying the entity running the data service used |
ra: | Right ascension |
radial_distance_max: | Max distance from observed area to body center |
radial_distance_min: | Min distance from observed area to body center |
coverage: | (ST)MOC footprint, valid for celestial, spherical, or body-fixed frames + time coverage |
solar_longitude_max: | Max Solar longitude Ls (location on orbit / season) |
solar_longitude_min: | Min Solar longitude Ls (location on orbit / season) |
spatial_coordinate_description: | ID of specific coordinate system and version or properties. |
spatial_origin: | Defines the frame origin |
species: | Identifies a chemical species, case sensitive |
subobserver_latitude_max: | Maximum sub-observer point latitude (sub-Earth for ground based observations) |
subobserver_latitude_min: | Minimum sub-observer point latitude (sub-Earth for ground based observations) |
subobserver_longitude_max: | Maximum sub-observer point longitude (sub-Earth for ground based observations) |
subobserver_longitude_min: | Minimum sub-observer point longitude (sub-Earth for ground based observations) |
subsolar_latitude_max: | Maximum sub-solar point latitude |
subsolar_latitude_min: | Minimum sub-solar point latitude |
subsolar_longitude_max: | Maximum sub-solar point longitude |
subsolar_longitude_min: | Minimum sub-solar point longitude |
sun_distance_max: | Max Sun-target distance |
sun_distance_min: | Min Sun-target distance |
target_apparent_radius: | Apparent radius of the target |
target_description: | Original target keywords |
target_distance_max: | Max observer-target distance |
target_distance_min: | Min observer-target distance |
target_region: | Type of region or feature of interest |
target_time_max: | Max observing time in target frame |
target_time_min: | Min observing time in target frame |
thumbnail_url: | URL of a thumbnail image with predefined size (png ~200 pix, for use in a client only) |
time_refposition: | Defines where the time is measured (e.g., ground vs. spacecraft). Defaults to the observer's frame |
time_scale: | Defaults to UTC in data services; takes values from http://www.ivoa.net/rdf/time_scale otherwise |
observer_code: | Image observer's username in service |
---|---|
observer_country: | Image observer's country of residence |
observer_id: | Image observer numeric identifier in service |
observer_institute: | Observer institute |
observer_lat: | Observer's approximate latitude |
observer_location: | Broad, free-text location and geographic position of the observer or telescope. Can be used when the exact location cannot be released |
observer_lon: | Observer's approximate longitude |
observer_name: | Observer name |
original_publisher: | Refers to the source of the data, e.g., in compilations of experimental data |
producer_institute: | Data producer institute, e.g., in compilations of experimental data |
producer_name: | Data producer name, especially in compilations of experimental data |
event_cite: | Following VOEvent, this is one of followup, supersedes, retraction |
---|---|
event_status: | One of prediction, observation, or utility |
event_type: | Type of event from a controlled vocabulary (meteor_shower, fireball, lunar_flash, comet_tail_crossing...) |
azimuth_max: | Max azimuth angle for illumination |
---|---|
azimuth_min: | Min azimuth angle for illumination |
data_calibration_desc: | Provides information on post-processing. Can be a hash list |
geometry_type: | Type of observation, from a controlled vocabulary (cf. EPN-TAP specification). Can be a hash list |
grain_size_max: | Max sample particle size |
grain_size_min: | Min sample particle size |
measurement_atmosphere: | Describes experimental conditions. vacuum for measurements under vacuum |
pressure: | Ambient pressure |
sample_classification: | Information related to class, sub-class, species… as hash list |
sample_desc: | Describes the sample, its origin, and possible preparation. Can be a hash list |
sample_id: | Additional ID of the sample, e.g., a specific fraction of a meteorite (in addition to target_name). Intended to refer to a pre-existing catalogue of a collection, will therefore contain a name/id mainly for local use |
species_inchikey: | Machine-readable description of the species involved. Can be a hash list |
setup_desc: | Describes the experimental setup. Can be a hash list |
spectrum_type: | Type of spectral observation, from a controlled vocabulary (under construction). Can be a hash list |
temperature: | Ambient temperature |
map_height: | Map size in px |
---|---|
map_projection: | The map projection, preferably as a FITS name or code, or parameters as a free string |
map_scale: | Format TBD |
map_width: | Map size in px |
pixelscale_max: | Max pixel size at a surface |
pixelscale_min: | Min pixel size at a surface |
arg_perihel: | Argument of Perihelion, J2000.0 |
---|---|
eccentricity: | Orbit eccentricity |
epoch: | Epoch of interest |
inclination: | Orbit inclination |
long_asc: | Longitude of ascending node, J2000.0 |
mean_anomaly: | Mean anomaly at the epoch |
semi_major_axis: | |
particle_spectral_range_max: | Upper bound of the mass/energy of the particles |
---|---|
particle_spectral_range_min: | Lower bound of the mass/energy of the particles |
particle_spectral_resolution_max: | Worst resolution of the spectral range |
particle_spectral_resolution_min: | Best resolution of the spectral range |
particle_spectral_sampling_step_max: | Maximal separation of different values in the spectral range |
particle_spectral_sampling_step_min: | Minimal separation of different values in the spectral range |
particle_spectral_type: | The type of axis in use; this is one of energy (which is then in eV), mass (amu) or mass/charge (in amu/e). If you use any of this, please contact the DaCHS authors; this needs more work |
diameter: | Target diameter, or equivalent diameter for binary objects |
---|---|
dynamical_class: | Class of small body, from a controlled vocabulary (see the EPN-TAP specification) |
dynamical_type: | Subdivision of the dynamical class, from a controlled vocabulary |
equatorial_radius: | Equatorial radius of a solar system object. |
mass: | Mass of object |
mean_radius: | Mean radius of a solar system object. |
polar_radius: | Polar radius of a solar system object. |
sidereal_rotation_period: | Object rotation rate |
taxonomy_code: | Code for target taxonomy |
This mixin has the following parameters:
This mixin makes a table suitable for publication as a LineTAP table.
It provides all standard columns, makes sure it is on disk and available through TAP, adds the most common indexes, and gives it the utype required by the standard.
It is recommended to fill this table with a rowmaker using the //linetap#populate-0 apply.
Mix this into a table that contains radio data to have their metadata show up in the obscore extension for radio data. This only makes sense together with one of the obscore mixins, as the main metadata is kept in the obscore table.
Use the mixin parameters to map whatever is in your table to obs_radio's columns. As with obscore, parameter values must be SQL expressions that can be evaluated within the table mixed in. The default is to have all extension columns NULL.
This mixin has the following parameters:
Publish this table to ObsTAP.
This means mapping or giving quite a bit of data from the present table to ObsCore rows. Internally, this information is converted to an SQL select statement used within a create view statement. In consequence, you must give SQL expressions in the parameter values; just naked column names from your input table are ok, of course. Most parameters are set to NULL or appropriate defaults for tables mixing in //products#table.
Since the mixin generates script elements, it cannot be used in untrusted RDs. The fact that you can enter raw SQL also means you will get ugly error messages if you give invalid parameters.
Some items are filled from product interface fields automatically. You must change these if you obscore-publish tables not mixing in products.
Note: you must say dachs imp //obscore before anything obscore-related will work.
This mixin has the following parameters:
Publish a table that already has large parts of the obscore schema.
This has parameters named like the corresponding obscore columns; they default to taking their data from columns of the same name. Use this when you already have, by and large, an obscore structure in your source table.
Note that this will not do the right thing with product#table instances by default. For these, you will have to manually map access_url, access_format, and access_estsize.
This mixin has the following parameters:
Publish a PGS SIAP table to ObsTAP.
This works like //obscore#publish except some defaults apply that copy fields that work analogously in SIAP and in ObsTAP.
For special situations, you can, of course, override any of the parameters, but most of them should already be all right. To find out what the parameters described as "preset for SIAP" mean, refer to //obscore#publish.
Note: you must say dachs imp //obscore before anything obscore-related will work.
This mixin has the following parameters:
Publish a table mixing in //ssap#view (or the deprecated //ssap#mixc) to ObsTAP.
This works like the //obscore#publish mixin except some defaults apply that copy fields that work analogously in SSAP and in ObsTAP.
The columns already set in SSAP are marked as UNDOCUMENTED in the parameter list below. For special situations, you can, of course, override any of the parameters. To find out what they actually mean, refer to the //obscore#publish mixin.
Note that this mixin does not set coverage (obscore: s_region). This is because although we could make a circle from ssa_location and ssa_aperture, circles are not allowed in DaCHS' s_region (which has a fixed type of spoly). The recommended solution to still have s_region is to add (and index) a custom field; the //ssap#simpleCoverage will do this.
Note: you must say dachs imp //obscore before anything obscore-related will work.
This mixin has the following parameters:
A mixin for tables containing "products".
A "product" here is some kind of binary, typically a FITS file. The table receives the columns accref, accsize, owner, and embargo (which is defined in //products#prodcolUsertable).
By default, the accref is the path to the file relative to the inputs directory; this is also what /getproduct expects for local products. You can of course enter URLs to other places.
For local files, you are strongly encouraged to keep the accref URL- and shell-clean, the most important reason being your users' sanity. Another is that obscore in the current implementation does no URL escaping for local files. So, just don't use characters like +, the ampersand, apostrophes and so on; the default accref parser will reject those anyway. Actually, try making do with alphanumerics, the underscore, the dash, and the dot, ok?
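If you generate accrefs programmatically, a quick check before ingestion can save some trouble. The following is a minimal sketch of the recommendation above; the function name and the pattern are made up for illustration and are not what DaCHS' own accref parser uses (which may be more permissive). It additionally assumes the slash as path separator.

```python
import re

# Assumption: alphanumerics, underscore, dash and dot per path segment,
# with "/" separating segments.  This is NOT the pattern DaCHS uses.
_CLEAN_ACCREF = re.compile(r"[A-Za-z0-9_.\-]+(/[A-Za-z0-9_.\-]+)*")


def is_clean_accref(accref):
    """Return True if accref only uses URL- and shell-safe characters."""
    return _CLEAN_ACCREF.fullmatch(accref) is not None
```

You could call this in whatever script renames or links your files into the inputs directory and refuse to proceed when it returns False.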
owner and embargo let you introduce access control. Embargo is a date at which the product will become publicly available. As long as this date is in the future, only authenticated users belonging to the group owner are allowed to access the product.
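To spell out the access rule just described, here is a small sketch in Python. It is a model of the rule as stated, not DaCHS code; the function name and its arguments are hypothetical, and treating a missing (NULL) embargo as "public" is an assumption of this sketch.

```python
import datetime


def may_access(embargo, owner, user_groups, today=None):
    """Return True if a user in user_groups may retrieve the product.

    embargo is a datetime.date or None, owner is a group name, and
    user_groups is the set of groups of the authenticated user (empty
    for anonymous access).
    """
    if today is None:
        today = datetime.date.today()
    # Assumption: no embargo date means the product is public.
    if embargo is None or embargo <= today:
        return True
    # Embargo still in the future: only members of the owner group get in.
    return owner in user_groups
```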
In addition, the mixin arranges for the products to be added to the system table products, which is important when delivering the files.
Tables mixing this in should be fed from grammars using the //products#define row filter.
A mixin adding a pgsphere index to the main spherical position for tables with separate RA and Dec columns.
You have to designate exactly one column each with the UCDs pos.eq.ra;meta.main and pos.eq.dec;meta.main, respectively. These columns receive the positional index.
This should be used instead of q3cindex on new tables; it's a bit slower than q3c, but it's less funky, too.
If you'd like an index on other sorts of long/lat pairs, see the //scs#spoint-index-def STREAM.
A mixin adding an index to the main equatorial positions.
In new RDs, use pgs-pos-index instead; we'd like to stop q3c support at some point.
This is what you usually want if your input data already has "sane" (i.e., ICRS or at least J2000) positions or you convert the positions manually.
You have to designate exactly one column each with the UCDs pos.eq.ra;meta.main and pos.eq.dec;meta.main, respectively. These columns receive the positional index.
This will fail without the q3c extension to postgres.
A mixin pulling in all columns necessary to support SIAP2.
This pulls in all obscore columns, including any you define locally in %#obscore-extracolumns. In DaCHS, we additionally have the columns coming from the products table. This latter fact means that the grammar filling tables mixing this in will need a //products#define rowfilter.
To feed these tables, use the //siap2#computePGS and the //siap2#setMeta applies in the rowmaker.
Added in 2.7.3.
This mixin is for tables serving SLAP services, i.e., tables with spectral lines. It does not contain all "optional" columns, hence the name basic. We'd do "advanced", too, if there's demand.
Use the //slap#fillBasic procDef to populate such tables.
Deprecated. Use the //ssap#view mixin instead.
This mixin is for "homogeneous" data collections, where homogeneous means that all values in hcd_outpars are constant for all datasets in the collection. This is usually the case if they all come from one instrument.
Rowmakers for tables using this mixin should use the //ssap#setMeta proc application.
Do not forget to call the //products#define row filter in grammars feeding tables mixing this in. At the very least, you need to say:
<rowfilter procDef="//products#define"> <bind name="table">"mySchema.myTableName"</bind> </rowfilter>
This mixin has the following parameters:
Deprecated. Use the //ssap#view mixin instead.
This mixin provides the columns and params for a common SSA service.
Rowmakers for tables using this mixin should use the //ssap#setMeta and the //ssap#setMixcMeta proc applications.
There are some limitations to the variability; in particular, all spectra must have the same types of axes (i.e., frequency, wavelength, or energy) with identical units. If you don't have that, either leave the respective metadata empty or homogenize it before ingestion.
Do not forget to call the //products#define row filter in grammars feeding tables mixing this in. At the very least, you need to say:
<rowfilter procDef="//products#define"> <bind name="table">"schema.table"</bind> </rowfilter>
This mixin has the following parameters:
A mixin that adds an ssa_location column to a table.
You probably want this in the source tables for //ssap#view tables. This will also index the column. At least if you later want to publish the data through obscore, you will also want the //ssap#simpleCoverage mixin if you mix this in.
Use the //ssap#fill-plainlocation apply to feed these.
This mixin is intended for tables that get serialized into documents conforming to the Spectral Data Model 1, specifically to VOTables.
The input to such tables comes from ssa tables (hcd, in this case). Their columns (and params) are transformed into params here.
The mixin adds two columns (you could add more if, e.g., you had errors depending on the spectral or flux value), spectral (wavelength or the like) and flux. Their metadata is taken from the ssa fields where available (ssa_fluxucd as flux UCD, ssa_fluxunit etc).
This mixin in action could look like this:
<table id="instance" onDisk="False"> <mixin ssaTable="spectra" fluxUnit="Jy" >//ssap#sdm-instance</mixin> </table>
The mixin thus defines a gazillion of params. These will almost always be filled using //ssap#feedSSAToSDM as explained in SDM compliant tables.
This mixin has the following parameters:
A mixin that furnishes a table with an ssa_region column giving a polygonal coverage. For SSA itself, that's unnecessary, but it's highly recommended if you have data with positional and aperture data and will publish it via obscore, too (which in turn is highly recommended).
The column will be filled with a hexagon approximating the aperture. This is done by //ssap#fill-plainlocation (or, historically, by //ssap#setMeta), so usually you're all set with this mixin. We also create an index for the ssa_region field.
To make it visible in obscore, you must bind the ssa_region parameter of the //ssap#view mixin to ssa_region (so the column is in the SSAP table), and the coverage mixin par of the //obscore#publishSSAPMIXC mixin to ssa_region (so the value ends up in obscore's s_region).
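To give an idea of what "a hexagon approximating the aperture" amounts to, here is a self-contained sketch in Python. It is not the code DaCHS runs: the function names are invented, and it assumes that the six vertices sit on the aperture circle (DaCHS might as well circumscribe the circle or start at a different position angle). All angles are in degrees.

```python
import math


def destination(ra, dec, bearing, dist):
    """Return the (ra, dec) reached from (ra, dec) after moving dist
    degrees along the great circle with the given bearing (north=0)."""
    ra_r, dec_r, b_r, d_r = (math.radians(v) for v in (ra, dec, bearing, dist))
    sin_dec2 = (math.sin(dec_r) * math.cos(d_r)
        + math.cos(dec_r) * math.sin(d_r) * math.cos(b_r))
    sin_dec2 = max(-1.0, min(1.0, sin_dec2))
    ra2 = ra_r + math.atan2(
        math.sin(b_r) * math.sin(d_r) * math.cos(dec_r),
        math.cos(d_r) - math.sin(dec_r) * sin_dec2)
    return math.degrees(ra2) % 360, math.degrees(math.asin(sin_dec2))


def hexagon(ra, dec, radius):
    """Return six (ra, dec) vertices on the circle of radius around (ra, dec)."""
    return [destination(ra, dec, 60 * i, radius) for i in range(6)]


def angular_separation(ra1, dec1, ra2, dec2):
    """Return the angle in degrees between two positions."""
    a1, d1, a2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(d1) * math.sin(d2)
        + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))
```

Since the vertices are computed on the sphere rather than in a flat RA/Dec plane, the polygon keeps its shape towards the poles, too.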
This mixin produces an SSA-ready relation as a view.
The idea is that you import your spectra into a table suitable for your particular data collection (but mixing in //products#table). You then fill the columns for an SSA response by giving, in each mixin parameter here, either a column reference (as a simple column name) or an SQL literal (put strings into single quotes; sourcetable is the exception here). Save typing by having the final column names in the source table and using the copiedcolumns mixin par.
If you have positions for your spectra, you probably want to also mix in the //ssap#plainlocation mixin in the original table in order to have indexed positions in a way suitable for SSA queries.
In general, you will have to generate indices on the source table; postgres doesn't support indices on views. If you can't use the plainlocation mixin, please note that the SSA engine expects spoints as the location (and these would be indexed like <index columns="loc_col" method="GIST"/>).
The mixin will automatically create an index over whatever you give for ssa_pubDID (if you give something).
This mixin has the following parameters:
Warning: The specs here are strongly in flux. The interface here is quite likely to change. If you use this mixin, please tell gavo@ari.uni-heidelberg.de so we can inform you of incompatible changes, and make sure you have robust regression tests in place.
A mixin for a simple photometric time series. Mix this in to get columns obs_time and phot_val, properly declared for the 2020 time series note.
Parameters marked with “SIL literal” can take a C-style token, a quoted string, or a reference to a column or param, where the name is prefixed by an @.
This mixin has the following parameters:
In DaCHS, triggers are conditions on rows -- either the raw rows emitted by grammars if they are used within grammars, or the rows about to be shipped to a table if they are used within tables. Triggers may be used recursively, i.e., triggers may contain more triggers. Child triggers are normally or-ed together.
Currently, there is one useful top-level trigger, the element ignoreOn. If an ignoreOn is triggered, the respective row is silently dropped (actually, ignoreOn has a bail attribute that allows you to raise an error if the trigger is pulled; this is mainly for debugging).
The following triggers are defined:
A trigger that is true when all its children are true.
A trigger firing when the value of key in row is equal to the value given.
Missing keys are always accepted. You can define an SQL type; value will then be interpreted as a literal for this type, and this literal's value will be compared against the key's value. This is only needed for grammars like fitsProductGrammar that actually yield typed values.
A trigger firing if a certain key is missing in the dict.
This is equivalent to:
<not><keyPresent key="xy"/></not>
A trigger firing if a certain key is missing or NULL/None.
A trigger firing if a certain key is present in the dict.
A trigger that is false when its children, or-ed together, are true and vice versa.
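If the combination rules are hard to keep in your head, the following Python model may help. It is a sketch of the semantics described above, not DaCHS code; all function names are stand-ins invented here, rows are plain dicts, and "missing keys are always accepted" is read as "the equality trigger does not fire when the key is absent".

```python
def key_is(key, value):
    # Fires only if the key is there and compares equal to value.
    return lambda row: key in row and row[key] == value


def key_present(key):
    return lambda row: key in row


def key_missing(key):
    return lambda row: key not in row


def key_null(key):
    # Fires if the key is missing or its value is None.
    return lambda row: row.get(key) is None


def any_of(*children):
    # The default: child triggers are or-ed together.
    return lambda row: any(child(row) for child in children)


def all_of(*children):
    # True when all children are true.
    return lambda row: all(child(row) for child in children)


def not_(*children):
    # False when the or-ed children are true, and vice versa.
    return lambda row: not any(child(row) for child in children)


def ignore_on(rows, trigger):
    """Return the rows surviving a trigger, i.e., those it does not fire on."""
    return [row for row in rows if not trigger(row)]
```

With this, not_(key_present("xy")) and key_missing("xy") give the same answer for every row, which mirrors the equivalence stated above.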
The following renderers are available for allowing and URL creation. The parameter style is relevant when adapting condDescs or table-based cores to renderers:
Unchecked renderers can be applied to any service and need not be explicitly allowed by the service.
This renderer's parameter style is "clear".
A renderer allowing to block and/or reload services.
This renderer could really be attached to any service since it does not call it, but it usually lives on //services/overview. It will always require authentication.
It takes the id of the RD to administer from the path segments following the renderer name.
By virtue of builtin vanity, you can reach the admin renderer at /seffe, and thus you can access /seffe/foo/q to administer the foo/q RD.
This renderer's parameter style is "dali".
A renderer that works like a VO standard renderer but that doesn't actually follow a given protocol.
Use this for improvised APIs. The default output format is a VOTable, and the errors come in VOSI VOTables. The renderer does, however, evaluate basic DALI parameters. You can declare that by including <FEED source="//pql#DALIPars"/> in your service.
These will return basic service metadata if passed MAXREC=0.
This renderer's parameter style is "pql".
A renderer speaking UWS.
This is for asynchronous execution of larger jobs. This is what is executed by the async renderer. It requests the worker system required from the service, which in turn obtains it from the core; these must hence cooperate with this to allow async operation.
See Custom UWSes for how to use this with your own cores.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for a VOSI availability endpoint.
An endpoint with this renderer is automatically registered for every service. The answers can be configured using the admin renderer.
This renderer's parameter style is "clear".
A renderer to put out bibliography links.
This would work with a constant-query dbCore, typically against dc.biblinks.
(Since 2.8.2)
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for a VOSI capability endpoint.
An endpoint with this renderer is automatically registered for every service. The responses contain information on what renderers ("interfaces") are available for a service and what properties they have.
This also doubles as a canary for authentication, which is why there are the somewhat complicated things in render; cf. https://wiki.ivoa.net/twiki/bin/view/IVOA/SSO_next
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer returning various forms of a service's spatial coverage.
This will return a 404 if the service doesn't have a coverage.spatial meta (and will bomb out if that isn't a SMoc).
Based on the accept header, it will return a PNG if the client indicates it's interested in that or if it accepts text/html, in which case we assume it's a browser; otherwise, it will produce a MOC in FITS format.
This renderer's parameter style is "clear".
A renderer defined in a python module.
To define a custom renderer write a python module and define a class MainPage inheriting from gavo.web.ServiceBasedPage.
This class basically is a gavo.formal.nevowc TemplatedPage, i.e., you can define loader, getChild, render, and so on.
To use it, you have to define a service with the resdir-relative path to the module in the customPage attribute and probably a nullCore. You also have to allow the custom renderer (but you may have other renderers, e.g., static).
If the custom page is for display in web browsers, define a class method isBrowseable(cls, service) returning true. This is for the generation of links like "use this service from your browser" only; it does not change the service's behaviour with your renderer.
In general, avoid custom renderers. If you can't, see the upstream twisted documentation on twisted.web.resource for how to write them.
This renderer's parameter style is "clear".
A meta-renderer for DALI-like multi-renderer services (sync, async, ...)
This, for now, can only be used for creating registry records.
This renderer's parameter style is "pql".
A renderer for asynchronous datalink.
This renderer's parameter style is "clear".
A renderer for data processing by datalink cores.
This must go together with a datalink core, nothing else will do.
This renderer will actually produce the processed data. It must be complemented by the dlmeta renderer which allows retrieving metadata.
This renderer's parameter style is "dali".
A renderer for metadata retrieval from datalink cores.
This must go together with a datalink core, nothing else will do.
This renderer will return the links and services applicable to one or more pubDIDs.
See Datalink and SODA for more information.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer representing a (tutorial-like) text document.
This must have a meta accessURL with the document URI. It may have a sourceURL meta giving the VCS URI.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for examples for service usage.
This renderer formats _example meta items in its service. Its output is XHTML compliant to VOSI examples; clients can parse it to, for instance, fill forms for service operation or display examples to users.
The examples make use of RDFa to convey semantic markup. To see what kind of semantics is contained, try http://www.w3.org/2012/pyRdfa/Overview.html and feed it the example URL of your service.
The default content of _example is ReStructuredText, and, really, not much else makes sense. An example for such a meta item can be viewed by executing gavo admin dumpDF //userconfig, in the tapexamples STREAM.
To support annotation of things within the example text, DaCHS defines several RST extensions, both interpreted text roles (used like :role-name:`content with blanks`) and custom directives (used to mark up blocks introduced by a single line like .. directive-name ::, where the blanks before and after the directive name are significant).
Here are the custom interpreted text roles:
These are the custom directives:
Examples for how to write TAP examples are in the userconfig.rd distributed with DaCHS. Examples for Datalink examples can be found in the GAVO RDs feros/q and califa/q3.
In addition, you can define moreExamples meta items. These point to further DALI-compliant examples document(s) and will typically be presented in a hierarchical fashion by clients. The content is either a URI to the examples document (when it starts with https?://) or a DaCHS-internal reference to a service with the examples. They should have a (short) title meta child to give clients a hint as to what the continuation is about, somewhat like this:
<meta>
  # more examples provided by the ex service in the RD rr/q
  moreExamples: rr/q#ex
  moreExamples.title: RegTAP
  # more examples provided by an external document
  moreExamples: http://ivoa.net/doc/obscore/tap-examples.xhtml
  moreExamples.title: ObsCore
</meta>
This renderer's parameter style is "clear".
A renderer redirecting to an external resource.
These try to access an external publication on the parent service and ask it for an accessURL. If it doesn't define one, this will lead to a redirect loop.
In the DC, external renderers are mainly used for registration of third-party browser-based services.
This renderer's parameter style is "clear".
A renderer that renders a single template.
Use something like <template key="fixed">res/ft.html</template> in the enclosing service to tell the fixed renderer where to get this template from.
In the template, you can fetch parameters from the URL using something like <n:invisible n:data="parameter FOO" n:render="string"/>; you can also define new render and data functions on the service using customRF and customDF.
This is, in particular, used for the data center's root page.
The fixed renderer is intended for non- or slowly changing content. It is annotated as cacheable, which means that DaCHS will in general only render it once and then cache it. If the render functions change independently of the RD, use the volatile renderer.
During development, users must add ?nocache=True to a fixed page URI to force DaCHS to reload the template.
Built-in services for such browser apps should go through the //run RD.
This renderer's parameter style is "form".
The "normal" renderer within DaCHS for web-facing services.
It will display a form and allow outputs in various formats.
It also does error reporting as long as that is possible within the form.
This renderer's parameter style is "clear".
The renderer used for delivering products.
This will only work with a ProductCore since the resulting data set has to contain products.Resources. Thus, you probably will not use this in user RDs.
This renderer's parameter style is "clear".
A static renderer with a few amenities for HiPS trees.
To make this work, set the service's staticData property.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer that lets you format citation instructions.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer showing all kinds of metadata on a service.
This renderer produces the default referenceURL page. To change its appearance, override the serviceinfo.html template.
This renderer's parameter style is "form".
A renderer allowing for updates to individual records using file uploads.
The difference to Uploader is that no form-redisplay will be done. All errors are reported through HTTP response codes and text strings. It is likely that this renderer will change and/or go away.
This renderer's parameter style is "clear".
A renderer that works with registry.oaiinter to provide an OAI-PMH interface.
The core is expected to return a stanxml tree.
This renderer's parameter style is "clear".
The Query Path renderer extracts a query argument from the query path.
Basically, whatever segments are left after the path to the renderer are taken and fed into the service. The service must cooperate by setting a queryField property which is the key the parameter is assigned to.
QPRenderers cannot do forms, of course, but they can nicely share a service with the form renderer.
To adjust the results' appearance, you can override the resultline (for when there's just one result row) and resulttable (for when there is more than one result row) templates. In the templates, you can retrieve the input parameter's value as the inPar data, for instance, like this:
<n:invisible n:data="inPar" n:render="string"/>
This renderer's parameter style is "clear".
A renderer for displaying various properties about a resource descriptor.
This renderer could really be attached to any service since it does not call it, but it usually lives on //services/overview.
By virtue of builtin vanity, you can reach the rdinfo renderer at /browse, and thus you can access /browse/foo/q to view the RD infos. This is the form used by table registrations.
In addition to all services, this renderer also links tableinfos for all non-temporary, on-disk tables defined in the RD. When you actually want to hide some internal on-disk tables, you can set a property internal on the table (the value is ignored).
This renderer's parameter style is "dali".
A renderer for the Simple Cone Search protocol.
These do their error signaling in the value attribute of an INFO child of RESOURCE.
You must set the following metadata items on services using this renderer if you want to register them:
This renderer's parameter style is "pql".
A renderer for the Simple Image Access Protocol.
These have errors in the content of an info element, and they support metadata queries.
For registration, services using this renderer must set the following metadata items:
- sia.type -- one of Cutout, Mosaic, Atlas, Pointed, see SIAP spec
You should set the following metadata items:
- testQuery.pos.ra, testQuery.pos.dec -- RA and Dec for a query that yields at least one image
- testQuery.size.ra, testQuery.size.dec -- RoI extent for a query that yields at least one image.
You can set the following metadata items (there are defaults on them that basically communicate there are no reasonable limits on them):
- sia.maxQueryRegionSize.(long|lat)
- sia.maxImageExtent.(long|lat)
- sia.maxFileSize
- sia.maxRecord (default dalHardLimit global meta)
This renderer's parameter style is "dali".
A renderer for SIAPv2.
In general, if you want a SIAP2 service, you'll need something like the obscore view in the underlying table.
This renderer's parameter style is "pql".
A renderer for the simple line access protocol SLAP.
For registration, you must set the following metadata on services using the slap.xml renderer:
There's two mandatory metadata items for these:
This renderer's parameter style is "pql".
A renderer for the simple line access protocol SLAP.
For registration, you must set the following metadata on services using the slap.xml renderer:
There's two mandatory metadata items for these:
This renderer's parameter style is "pql".
A renderer for the simple spectral access protocol.
For registration, you must set the following metadata for the ssap.xml renderer:
- ssap.dataSource -- survey, pointed, custom, theory, artificial
- ssap.testQuery -- a query string that returns some data; REQUEST=queryData is added automatically that describe the type of data served through the service. Will usually be spectrum, but timeseries is a realistic option.
Other SSA metadata includes:
- ssap.creationType -- archival, cutout, filtered, mosaic, projection, spectralExtraction, catalogExtraction (defaults to archival)
- ssap.complianceLevel -- set to "query" when you don't deliver SDM compliant spectra; otherwise don't say anything, DaCHS will fill in the right value.
It is recommended to set this metadata globally on the RD, as the SSA mixin can use that metadata to fill tables with sensible values without operator intervention.
Properties supported by this renderer:
- datalink -- if present, this must be the id of a datalink service that can work with the pubDIDs in this table (don't use this any more, datalink is handled through table-level metadata now)
- defaultRequest -- by default, requests without a REQUEST parameter will be rejected. If you set defaultRequest to querydata, such requests will be processed as if REQUEST were given (which is of course sane but is a violation of the standard).
This renderer's parameter style is "clear".
A renderer that just hands through files.
The standard operation here is to set a staticData property pointing to a resdir-relative directory from which files are served. Indices for directories are created.
You can define a root resource by giving an indexFile property on the service. Note in particular that you can use an index file with an extension of shtml. This lets you use nevow templates, but since metadata will be taken from the global context, that's probably not terribly useful. You are probably looking for the fixed renderer if you find yourself needing this.
This renderer's parameter style is "dali".
A DALI sync renderer.
In principle, this is just a shallow parser of the input parameter and renders tables as VOTables.
In practice, there are a few legacy hacks making this a bit more complicated after all.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for a VOSI table metadata endpoint.
An endpoint with this renderer is automatically registered for every service. The responses contain information on the tables exposed by a given service.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for displaying table information.
Since tables don't necessarily have associated services, this renderer cannot use a service to sit on. Instead, the table is being passed in as an argument. There's a built-in vanity tableinfo that sits on //dc_tables#show using this renderer (it could really sit anywhere else).
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for displaying table notes.
It takes a schema-qualified table name and a note tag in the segments.
This does not use the underlying service, so it could and will run on any service. However, you really should run it on __system__/dc_tables/show, and there's a built-in vanity name tablenote for this.
This renderer's parameter style is "clear". This is an unchecked renderer.
A renderer for a VOSI table metadata endpoint.
An endpoint with this renderer is automatically registered for every service. The responses contain information on the tables exposed by a given service.
This renderer's parameter style is "form".
A renderer allowing for updates to individual records using file upload.
This renderer exposes a form with a file widget. It is likely that the interface will change.
This renderer's parameter style is "clear".
A renderer for creating and deleting persistent TAP-queriable tables.
This really only makes sense on the TAP service.
This renderer's parameter style is "pql".
A renderer speaking UWS.
This is for asynchronous execution of larger jobs. This is what is executed by the async renderer. It requests the worker system required from the service, which in turn obtains it from the core; these must hence cooperate with this to allow async operation.
See Custom UWSes for how to use this with your own cores.
This renderer's parameter style is "clear".
A renderer rendering a single template with fast-changing results.
This is like the fixed renderer, except that the results are not cached.
Sets metadata for an epntap data set, including its products definition.
The values are left in vars, so you need to do manual copying, e.g., using idmaps="*".
In some descriptions below, you will see __replace_framed__. This means that the actual descriptions, units, and UCDs will depend on the value of spatial_frame_type in the //epntap2#table-2_0 mixin. After you have made a first (possibly severely incomplete) import of your table, you can see the actual metadata by opening http://localhost:8080/tableinfo/yourschema.epn_core.
Setup parameters for the procedure are:
Use this apply when you use the //epntap2#localfile-2_0 mixin. This will only (properly) work when you use a //products#define rowfilter; if you have that, this will work without further configuration.
Setup parameters for the procedure are:
Fills the columns of a LineTAP table, typically created using the //linetap#table-0 mixin. The values are left in vars, so you need to copy them into the finished record, probably through idmaps="*".
Setup parameters for the procedure are:
Maps input values through a dictionary.
The dictionary is given in its python form here. This apply only operates on the rawdict, i.e., the value in vars is changed, while nothing is changed in the rowdict.
Setup parameters for the procedure are:
Runs a free query against the database and enters the first result record into vars.
locals() will be passed as data, so you can define more bindings and refer to their keys in the query.
Setup parameters for the procedure are:
An apply proc that translates values via a utils.NameMap.
Destination may of course be the source field (though that messes up idempotency of macro expansion, which shouldn't usually hurt).
The format of the mapping file is:
<target key><tab><source keys>
where source keys is a whitespace-separated list of values that should be mapped to target key (sorry the sequence's a bit unusual).
A source key must be encoded quoted-printable. This usually doesn't matter except when it contains whitespace (a blank becomes =20) or equal signs (which become =3D).
Here's an example application for a filter that's supposed to translate some botched object names:
<apply name="cleanObject" procDef="//procs#mapValue"> <bind name="destination">"cleanedObject"</bind> <bind name="failuresMapThrough">True</bind> <bind name="value">@preObject</bind> <bind name="sourceName">"flashheros/res/namefixes.txt"</bind> </apply>
The input could look like this, with a Tab char written as " <TAB> " for clarity:
alp Cyg <TAB> aCyg alphaCyg
Nova Cygni 1992 <TAB> Nova=20Cygni=20'92 Nova=20Cygni
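If you want to check a mapping file before feeding it to DaCHS, or generate the quoted-printable source keys from a list of names, something like the following Python sketch may do. It follows the format described above but is not DaCHS' own parser; the function names are invented here, and both the UTF-8 decoding and the skipping of blank lines are assumptions of this sketch.

```python
import quopri


def encode_source_key(key):
    """Return key in quoted-printable form, escaping whitespace and "="."""
    return "".join(
        chr(b) if 33 <= b <= 126 and b != 0x3D else "=%02X" % b
        for b in key.encode("utf-8"))


def parse_name_map(text):
    """Return a dict mapping each decoded source key to its target key.

    text holds lines of the form <target key><tab><source keys>; the
    target key is taken literally, the source keys are decoded.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        target, sources = line.split("\t", 1)
        for encoded in sources.split():
            source = quopri.decodestring(encoded.encode("ascii")).decode("utf-8")
            mapping[source] = target.strip()
    return mapping
```

Parsing the two example lines (with real Tab characters) yields four source keys, among them "Nova Cygni '92", all pointing to their respective target keys.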
Setup parameters for the procedure are:
Resolve identifiers to simbad positions.
It caches query results (positive as well as negative ones) in cacheDir. To avoid flooding simbad with repetitive requests, it raises an error if this directory is not writable.
It leaves J2000.0 positions as floats in the simbadAlpha and simbadDelta variables.
Setup parameters for the procedure are:
Fill variables from a simple database query.
The idea is to obtain a set of values from the data base into some columns within vars (i.e., available for mapping) based on comparing a single input value against a database column. The query should always return exactly one row. If more rows are returned, the first one will be used (which makes the whole thing a bit of a gamble), if none are returned, a ValidationError is raised.
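The lookup logic described here (first row wins, no row is an error) can be sketched as follows. To keep the example self-contained, it uses Python's sqlite3 module and a stand-in exception class instead of the PostgreSQL connection and the ValidationError DaCHS actually uses; the function and its parameters are hypothetical and do not mirror the procedure's setup parameters.

```python
import sqlite3


class NoMatch(Exception):
    """Stand-in for the ValidationError mentioned above."""


def simple_lookup(conn, table, column, value, wanted):
    """Return a dict of the wanted columns from the first row of table
    in which column equals value; raise NoMatch if there is none."""
    # Identifiers are interpolated for brevity; only do this with names
    # you control.  The compared value is passed as a bound parameter.
    query = "SELECT %s FROM %s WHERE %s = ?" % (
        ", ".join(wanted), table, column)
    row = conn.execute(query, (value,)).fetchone()
    if row is None:
        raise NoMatch("No row in %s with %s=%r" % (table, column, value))
    return dict(zip(wanted, row))
```

Note that without an ORDER BY, "the first row" is whatever the database happens to return first, which is exactly the gamble mentioned above; make the comparison column unique if you can.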
Setup parameters for the procedure are:
Computes the spatial coverage of an image based on WCS keys it looks for in the rawdict.
The minimum is CRVAL1, CRVAL2, CRPIX1, CRPIX2, CUNIT1, CUNIT2, NAXISn, and CDi_j or CDELTn. Interpretation of more WCS may or may not happen. You can override the indexes of the spatial axes using naxis, which will of course change the parameter names, too.
Records without or with insufficient wcs keys are furnished with all-NULL spatial columns if the missingIsError setup parameter is False, else they bomb out with a DataError (the default).
This writes directly into rowdict; do not use idmaps="*" on rowmakers with computePGS.
Added in 2.7.3.
Setup parameters for the procedure are:
Sets the bandpassId, bandpassUnit, bandpassRefval, bandpassHi, and bandpassLo from a set of standard band Ids.
The bandpass ids known are contained in a file supplied by DaCHS that you should consult for supported values by running dachs admin dumpDF data/filters.txt.
If you pass in an unknown filter name, no keys will be generated, but no diagnostics will be emitted either. Make sure to run dachs info on the imported table if you expect no NULLs in the bandpass columns.
All values filled in here are in meters.
If this is used, it must run after //siap#setMeta since setMeta clobbers our result fields.
Added in 2.7.3.
Setup parameters for the procedure are:
Fills non-spatial information in an obscore record for an image.
If you define the bandpasses yourself, do not change bandpassUnit and give all values in metres.
For optical images, we recommend filling out bandpassId and then letting the //siap2#getBandFromFilter apply compute the actual limits.
Do not use idmaps="*" when using this procDef; it writes directly into result, and you would be clobbering what it does.
The proc parameters use the obscore names wherever possible, but accept most of the names of the version 1 //siap#setMeta and the //obscore#publishSIAP mixins for easy migration.
Added in 2.7.3.
Setup parameters for the procedure are:
This apply is intended for rowmakers filling tables mixing in //slap#basic. It populates vars for all the columns in there; you'll normally want idmaps="*" with this apply.
For most of its parameters, it will take the values from same-named vars, so you can slowly build up its arguments through var elements.
Setup parameters for the procedure are:
feedSSAToSDM takes the current rowIterator's sourceToken and feeds it to the params of the current target. The sourceToken must be an SSA rowdict (as provided by the sdmCore). Further, it takes the params from the sourceTable argument and feeds them to the params, too.
All this probably only makes sense in parmakers when making tables mixing in //ssap#sdm-instance in data children of sdmCores.
This mixin fills the columns added by the plainlocation mixin with values generated from ra, dec, and aperture.
Setup parameters for the procedure are:
Sets metadata for an SSA data collection, including its products definition.
Since this is only useful with the deprecated hcd and mixc mixins, this should no longer be used.
The values are left in vars, so you need to do manual copying, e.g., using idmaps="*", or, if you need to be more specific, idmaps="ssa_*".
Setup parameters for the procedure are:
Sets metadata for an SSA data set from mixed sources. This will only work sensibly in cooperation with setMeta.
Since //ssap#mixc is deprecated, there is no reason to use this in new RDs.
As with setMeta, the values are left in vars; if you did as recommended with setMeta, you'll have this covered as well.
Setup parameters for the procedure are:
A row generator that reads comma separated values from a field and returns one row with a new field for each of them.
Setup parameters for the procedure are:
A row generator to expand time ranges.
The finished dates are left in destination as datetime.datetime instances.
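The general idea of such a row generator can be sketched as follows. This is not the DaCHS procedure; expand_dates and its arguments (including the hr_interval step and the inclusive upper limit) are assumptions made for this example.

```python
import datetime


def expand_dates(row, start_key, end_key, dest="curTime", hr_interval=24):
    """Yield one copy of row per time step between the two limits.

    Both limits are assumed to be datetime.datetime instances; the upper
    limit is treated as inclusive in this sketch.
    """
    step = datetime.timedelta(hours=hr_interval)
    current = row[start_key]
    while current <= row[end_key]:
        new_row = dict(row)
        new_row[dest] = current
        yield new_row
        current += step


source_row = {
    "obsStart": datetime.datetime(2024, 1, 1),
    "obsEnd": datetime.datetime(2024, 1, 3),
    "site": "HD"}
expanded = list(expand_dates(source_row, "obsStart", "obsEnd"))
print([r["curTime"].isoformat() for r in expanded])
```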
Setup parameters for the procedure are:
A row processor that produces copies of rows based on integer indices.
The idea is that sometimes rows have specifications like "Star 10 through Star 100". These are a pain if untreated. A RowExpander could create 91 individual rows from this.
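A minimal sketch of this kind of expansion in plain Python, assuming inclusive integer limits stored in the row itself; expand_integers and the key names are made up for this example and are not the DaCHS parameters.

```python
def expand_integers(row, start_key, end_key, dest="index"):
    """Yield one copy of row for each integer from start to end, inclusive."""
    for index in range(row[start_key], row[end_key] + 1):
        new_row = dict(row)
        new_row[dest] = index
        yield new_row


stars = list(expand_integers(
    {"first": 10, "last": 100, "cluster": "M 67"}, "first", "last"))
print(len(stars), stars[0]["index"], stars[-1]["index"])
```

Note that an inclusive range from 10 through 100 contains 91 values.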
Setup parameters for the procedure are:
Enters the values defined by the product interface into a grammar's result.
See the documentation on the //products#table mixin. In short: you will always have to set table (to the name of the table this row is managed in).
If you don't serve FITS images, you will also have to set mime. Use a media type like "image/jpeg" or "text/csv" here as appropriate. If not set, it defaults to application/fits (since DaCHS 2.10; before it was image/fits).
Everything else is optional: You may want to set preview and preview_mime if DaCHS can't do previews of your stuff automatically. What's left is for special situations.
This will create the keys prodtblAccref, prodtblOwner, prodtblEmbargo, prodtblPath, prodtblFsize, prodtblTable, prodtblMime, prodtblPreview, prodtbleMime, and prodtblDatalink in rawdict -- you can refer to them in the usual @foo way, which is sometimes useful even outside products processing proper (in particular for prodtblAccref).
Setup parameters for the procedure are:
A descriptor generator simply pulling a row from a database table. This row is made available as the .metadata attribute. You also must give a field that the generator will pull a URL from; the generator arranges things so that the default dlget will simply redirect there.
Note that the #this and #preview links DaCHS normally makes for descriptors from products are not added automatically here. Try to at least provide #this.
See also Non-Product descriptor Generators.
Since version 2.4.
Setup parameters for the procedure are:
A fairly generic FITS cutout function.
It expects some special attributes in the descriptor to allow it to decode the arguments. These must be left behind by the metaMaker(s) creating the parameters.
This is axisNames, a dictionary mapping parameter names to the FITS axis numbers or the special names WCSLAT or WCSLONG. It also expects a skyWCS attribute, a wcs.WCS instance for spatial cutouts.
Finally, descriptor must have a list attribute slices, containing zero or more tuples of (fits axis, lowerPixel, upperPixel); this allows things like BAND to add their slices obtained from parameters in standard units.
The .data attribute must be a pyfits hduList, as generated by the fits_makeHDUList data function.
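To illustrate what the entries in slices mean, the following sketch translates (fits axis, lowerPixel, upperPixel) tuples into Python slice objects as one would apply them to an array read from a FITS file. It assumes that FITS axes and pixels count from 1, that both pixel limits are inclusive, and that the array is indexed in reverse axis order; to_array_slices is a hypothetical helper, not part of DaCHS.

```python
def to_array_slices(naxis, slices):
    """Translate (fits axis, lowerPixel, upperPixel) tuples to Python slices.

    Axes not mentioned in slices are kept whole.
    """
    result = [slice(None)] * naxis
    for fits_axis, lower, upper in slices:
        # FITS axis 1 is the last array index; pixel 1 is array index 0.
        result[naxis - fits_axis] = slice(lower - 1, upper)
    return tuple(result)


cutout = to_array_slices(3, [(1, 10, 20), (3, 2, 2)])
print(cutout)
```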
Formats pyfits HDUs into a FITS file.
This all works in memory, so for large FITS files you'd want something more streamlined.
A data function for SODA returning a FITS descriptor.
This has, in addition to the standard stuff, a hdr attribute containing the primary header as pyfits structure.
The functionality of this is in its setup, getFITSDescriptor. The intention is that customized DGs (e.g., fixing the header) can use this as an original.
Setup parameters for the procedure are:
Yields standard BAND params.
DaCHS should in general be smart enough to convert between common spectral units (like MHz, or keV, or whatever) and the meter that SODA BAND expects. If your files do not give VOUnits-parseable units in the spectral axis' CUNIT header, use the wavelengthOverride param.
This adds specToMeter, meterToSpec, and spectralAxis attributes to the descriptor for later use by fits_makeBANDSlice.
Setup parameters for the procedure are:
Computes a cutout for the parameters added by makeBANDMeta.
This must sit in front of doWCSCutout.
This also reuses internal state added by makeBANDMeta, so this really only makes sense together with it.
An initial data function to construct a pyfits hduList and make that into a descriptor's data attribute.
This wants a descriptor as returned by fits_genDesc.
There's a hack here: this sets a dataIsPristine boolean on descriptor that's made false when one of the FITS manipulators changes something. If that's true by the time the formatter sees it, it will just push out the entire file. So, if you use this and insert your own data functions, make sure you set dataIsPristine accordingly.
Setup parameters for the procedure are:
A metaMaker that generates parameters allowing cutouts along the various WCS axes in physical coordinates.
This uses astropy.wcs for the spatial coordinates and tries to figure out what these are with some heuristics. For the remaining coordinates, it can set up separate, manual transformations. However, since in general, they will correspond to standard SODA parameters (like BAND; there are other streams of that in this RD), you have to explicitly request that by mentioning the fits axis index in axisMetaOverrides, minimally like this:
<bind key="axisMetaOverrides">{3: {}}</bind>
This will generate a parameter from the FITS metadata of axis 3. You can override param attributes in the dictionary that is the value of the axis key.
The metaMaker leaves an axisNames mapping in the descriptor. This is important for the fits_doWCSCutout, and replacement metaMakers must do the same.
The meta maker also creates a skyWCS attribute in the descriptor if successful, containing the spatial transformation only. All other transformations, if present, are in miscWCS, a dict mapping axis labels to the fitstools.WCS1Trans instances.
Setup parameters for the procedure are:
A descriptor generator for SODA that builds a ProductDescriptor for PubDIDs that have been built by getStandardsPubDID (i.e., the path part of the IVOID is a tilde, with the products table accref as the query part).
In case you want to add a bit of extra logic in these descriptor generators, define a function addExtras(descriptor) in a setup element and add attributes as necessary.
Setup parameters for the procedure are:
A data function for SODA that returns a product instance. You can restrict the mime type of the product requested so the following filters have a good idea what to expect.
Setup parameters for the procedure are:
A data function for SODA returning a spectral data model compliant table that later data functions can then work on. As usual for generators, it uses the implicit PUBDID argument.
Setup parameters for the procedure are:
A data function for SODA returning the product row corresponding to a PubDID within an SSA table.
The descriptors generated have an ssaRow attribute containing the original row in the SSA table.
Setup parameters for the procedure are:
The trivial formatter for SODA processed data -- it just returns descriptor.data, which will only work if it works as a nevow resource.
If you do not give any dataFormatter yourself in a SODA core, this is what will be used.
Streams are recorded RD elements that can be replayed into resource descriptors using the FEED active tag. They do, however, support macro expansion; if macros are expanded, you need to give them values in the FEED element (as attributes). What attributes are required should be mentioned in the following descriptions for those predefined streams within DaCHS that are intended for developer consumption.
A condDesc that expresses a range and has an InputKey each for min and max.
Specify the following macros when replaying:
groupdesc has to work after "Range of", "Lower bound of", and "Upper bound of". Do not include a concluding period.
A condDesc over a boolean column.
By default, DaCHS does not distinguish between False and NULL when auto-generating input keys from boolean columns. That is, you can check a checkbox and will get everything where the column is true. If you do not check it, however, you will get everything, i.e., rows in which the column is any of NULL, True, or False. That's pretty much the semantics of HTML checkboxes.
However, when you want people to be able to explicitly say “this should be off”, you need to be a bit more cunning. That is what this stream does; when you have a boolean column my_bool, you can have:
<FEED source="//procs#negatableBoolean" column="my_bool"/>
next to your other condDescs in the dbCore. This will then produce a three-way selection between Yes, No, and ANY.
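The three-way semantics can be pictured with a small Python sketch that maps the user's choice to an SQL condition. The function name and the exact SQL fragments are assumptions for this illustration (in particular, that No selects only rows explicitly false); the SQL DaCHS actually generates may differ.

```python
def boolean_condition(column, choice):
    """Return an SQL condition for choice, or None for no constraint."""
    if choice == "ANY":
        return None  # NULL, True, and False rows all match
    if choice == "Yes":
        return "%s IS TRUE" % column
    if choice == "No":
        return "%s IS FALSE" % column  # NULL rows do not match here
    raise ValueError("Unknown choice: %r" % (choice,))


for choice in ["Yes", "No", "ANY"]:
    print(choice, "->", boolean_condition("my_bool", choice))
```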
This stream defines the condDescs for an SSA service based on one of the mixins defined here.
This stream inserts three condDescs for SCS services on tables with pos.eq.(ra|dec).main columns; one producing the standard SCS RA, DEC, and SR parameters, another creating input fields for human consumption, and finally MAXREC.
This stream includes the standard DALI service parameters (RESPONSEFORMAT, MAXREC, VERB). For services available through IVOA-standard protocols (renderers scs.xml, siap.xml, ssap.xml, and also api), this is included automatically, so you will not usually have to manually FEED this.
This stream includes a DALI UPLOAD parameter. This is purely declarative for now, as the interpretation is done anyway for VO renderers and not at all otherwise.
Tells DaCHS to copy the index declarations from tables a view derives from (meaning: feeding this probably does not make any sense outside of a view declaration).
Since DaCHS does not have a good way to guess which tables you derive from (in the end), you have to give blank-separated RD#id references to them in the sourceTables attribute.
The proc will declare all indexes over columns that appear to be in the view (by name). That's a rough heuristic which works in most cases. If it doesn't work for you, you can always declare the indexes manually.
Note that this will only see columns that are declared lexically before the FEED; you will thus typically want to include this at the end of the view definition.
Example:
<FEED source="//procs#declare-indexes-from" sourceTables="res1/q#table1 res14#othertable"/>
Include this stream with a @what (a short phrase saying what is released) to make your resource released under Creative Commons-0 (a.k.a. public domain). This will generate the rights and rightsURI metadata items. It needs to live in the toplevel /resource element.
Example:
<FEED source="//procs#license-cc0" what="the HSOY catalogue"/>
Include this stream with a @what (a short phrase saying what is licensed) to make your resource licensed under Creative Commons Attribution (CC-BY). This will generate the rights and rightsURI metadata items. It needs to live in the toplevel /resource element.
Example:
<FEED source="//procs#license-cc-by" what="the HSOY catalogue"/>
Include this stream with a @what (a short phrase saying what is licensed) to make your resource licensed under Creative Commons Attribution Share Alike (CC-BY-SA). This will generate the rights and rightsURI metadata items. It needs to live in the toplevel /resource element.
Example:
<FEED source="//procs#license-cc-by-sa" what="the HSOY catalogue"/>
The columns of a (standard) obscore table. This can be used to define a "native" obscore table (as opposed to the more usual mixins below that expose standard products via obscore).
Even if you are sure you want to do this, better ask again...
A stream for form-based service's VOTables to include simple RA and Dec rather than normal ssa_location.
SSA services get that from the core and don't need this.
Include this stream in an ssa-like table feeding obscore. It will add an index useful for querying against obscore t_min and t_max.
This uses the dateObs and timeExt attributes which are preset for what the columns are called in SSA tables. dateObs must be a column reference because we declare the index to be on it. timeExt can be an expression, too, and no index will be declared on it.
Defaults for macros used in this stream:
Additional columns for SSA metadata tables describing Echelle spectra.
Adds an index over the separate long and lat columns. This is currently implemented using q3c; DaCHS will move to pgsphere entirely at some point. RDs using this will not need an update then. This will by default cluster according to the index. In the rare cases when you don't want this, add a cluster="False" in the FEED.
Since long and lat may be expressions, this will not automatically declare index/columns. You should therefore pass in a comma-separated list of column names in the columns attribute in order to inform clients of the index.
Examples:
<FEED source="//scs#splitPosIndex" long="ra" lat="dec" columns="ra,dec"/>
<FEED source="//scs#splitPosIndex" long="long(ssa_location)" lat="lat(ssa_location)" columns="ssa_location"/>
Defaults for macros used in this stream:
Definition of an index over spoint(ra, dec); give this parameters long and lat naming the corresponding columns (in degrees).
Or use the spoint-index mixin.
This will also cluster the table according to this index, which is almost always what you want.
This needs to be replayed within a table (or something that defines a tablename macro).
DaCHS' basic configuration is done through one or more INI-style files, which are parsed using Python's configparser module; in case of doubt on features and syntax, refer to the Python reference documentation of the Python version you are using.
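For orientation, here is a minimal sketch of how Python's configparser reads such an INI-style file. The [ivoa] section with its authority item is taken from the \getConfig{ivoa}{authority} macro used elsewhere in this document; the value and the nosuchsection lookup are made up for the example, and this is not how DaCHS itself accesses its configuration.

```python
import configparser

SAMPLE_RC = """
[ivoa]
authority: org.example.dc
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_RC)

# Items are addressed by section and name.
authority = parser.get("ivoa", "authority")

# With plain configparser, a fallback avoids errors for missing items.
missing = parser.get("nosuchsection", "nosuchitem", fallback="(not set)")

print(authority, missing)
```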
Some of the more common items are discussed in the tutorial (tutorial.html#configuration-settings).
DaCHS first looks for the configuration in /etc/gavo.rc, and on production sites, it is recommended to only use this file to avoid surprises. For special situations and development systems, the following extra features are available:
The configuration items available are, by section:
Paths and other general settings.
(ignored, only left for backward compatibility)
Settings concerning TAP, UWS, and friends
Settings concerning database access.
The interface to the Greater VO.
Settings concerning the local user interface
Settings related to serving content to the web.
This section contains a few general points for python code embedded in DaCHS. Most of the material applies to procedure definitions (Element apply, Element dataFormatter, Element dataFunction, Element descriptorGenerator, Element iterator, Element metaMaker, Element processEarly, Element processLate, Element pargetter, Element phraseMaker, Element regTest, Element rowfilter, Element sourceFields) as well as to other pieces of code, such as in Element customGrammar, Element customCore, or Custom Pages.
More information on what names DaCHS would like you to see is available in Functions Available For Row Makers and The DaCHS API.
To keep the various resources as separate from each other as possible, DaCHS does not manipulate Python's import path. However, one frequently wants to have library-like modules providing common functionality or configuration in a resdir (the conventional place for these would be in res/).
To import these, use api.loadPythonModule(path). Path, here, is the full path to the file containing the python code, but without the .py. When you have the RD, the conventional pattern is:
mymod, _ = api.loadPythonModule(rd.getAbsPath("res/mymod"))
instead of import mymod.
As you can see, loadPythonModule returns a tuple; you're very typically only interested in the first element.
Note in particular that for modules loaded in this way, the usual rule that you can just import modules next to you does not apply. To import a module “next to” you without having to go through the RD, use the special form:
siblingmod, _ = api.loadPythonModule("siblingmod", relativeTo=__file__)
instead of import siblingmod. This will take the directory part of what's in relativeTo (here, the module's own path) and make a full path out of the first argument to pull the module from there.
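If you are curious what loading a module by path amounts to, the following sketch does something comparable with Python's standard importlib. It is a stand-in for illustration only: load_module_from_path is a hypothetical function that returns just the module, whereas api.loadPythonModule returns a tuple as described above, and DaCHS' actual implementation may differ.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(path, relative_to=None):
    """Load the Python module at path (given without the .py extension).

    If relative_to is given, path is resolved against its directory.
    """
    if relative_to is not None:
        path = os.path.join(os.path.dirname(relative_to), path)
    name = os.path.basename(path)
    spec = importlib.util.spec_from_file_location(name, path + ".py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as resdir:
    with open(os.path.join(resdir, "mymod.py"), "w") as f:
        f.write("ANSWER = 42\n")
    # Load through the full path ...
    mymod = load_module_from_path(os.path.join(resdir, "mymod"))
    # ... or relative to a (here: imaginary) file in the same directory.
    sibling = load_module_from_path(
        "mymod", relative_to=os.path.join(resdir, "other.py"))

print(mymod.ANSWER, sibling.ANSWER)
```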
You often need to query the database from DaCHS-related code. To get connections, use DaCHS' connection pool, and to make sure you return the connections to the pool when done, only use them through context managers. Depending on what you need to do, there are four pools you might be interested in:
With DaCHS' connections, you usually will not obtain cursors but directly use one of the
The query q has psycopg2-style placeholders (... WHERE mag<%(maglim)s) which are filled, again psycopg2-like from the args dictionary. timeout is given in seconds.
Both return an iterator; for query, this yields a tuple per row, for queryToDicts, that is dictionaries, the keys of which are the lowercased column names (or something hard to predict for expressions in the select clause not sporting an AS). In case you need mixed-case keys (we recommend you avoid that), you can pass in a caseFixer dictionary that maps the lowercased names to their mixed-case versions.
Note that the queries passed will not be executed until you start consuming the iterator returned. This means that the query methods are only useful when you actually read the result. For statements that do not return anything, use execute instead.
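The lazy behaviour and the role of caseFixer can be illustrated with a stand-in written against sqlite3. This is only a sketch: query_to_dicts is a hypothetical function and not the DaCHS method, and sqlite3 writes named placeholders as :maglim where psycopg2 uses %(maglim)s.

```python
import sqlite3


def query_to_dicts(conn, query, args=None, case_fixer=None):
    """Lazily yield one dict per result row, keyed by lowercased column name."""
    # This is a generator: nothing below runs until the first row is requested.
    cursor = conn.execute(query, args or {})
    keys = [col[0].lower() for col in cursor.description]
    if case_fixer:
        keys = [case_fixer.get(key, key) for key in keys]
    for row in cursor:
        yield dict(zip(keys, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (Name TEXT, Mag REAL)")
conn.executemany("INSERT INTO stars VALUES (?, ?)",
    [("Vega", 0.03), ("Deneb", 1.25), ("Faint", 9.1)])

rows = query_to_dicts(conn,
    "SELECT Name, Mag FROM stars WHERE Mag < :maglim ORDER BY Mag",
    {"maglim": 5}, case_fixer={"name": "Name"})
# The query is only executed once the iterator is consumed:
bright = list(rows)
print(bright)
```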
In sum, the typical database query in the vicinity of DaCHS would look like this:
count_sum = 0
with base.getTableConn() as conn:
    for row in conn.queryToDicts("SELECT * FROM sch.my_table"):
        count_sum += row["count"]
        print(row)
or, for queries not returning anything:
with base.getWritableAdminConn() as conn:
    conn.execute("DROP TABLE obsolete_table")
In case you wondered: Yes, in the past we have experimented with abstracting away the SQL. And we found it doesn't make the code any more robust, just a lot harder to figure out.
Most basic information on data descriptors is contained in tutorial.html. The material here just covers some advanced topics.
By default, dachs imp will try to drop all tables made by the data descriptors selected. For “growing” data, where new observations (or other data) are added on a cadence of hours to months, that is suboptimal, since typically just a few new datasets need to be added to the table, and re-ingesting everything else is just a waste of time and CPU.
To accommodate such situations, DaCHS supports an updating="True" attribute on data elements; updating DDs will create tables that do not exist but will not drop existing ones.
Updating DDs will still run like normal DDs and thus import everything matching the DD's sources. Thus, after the second import you would have duplicate records from sources that existed during the first import.
To avoid that, you (usually) need to ignore existing sources (see Element ignoreSources). In the typical case, where a dataset's accref is just the inputs-relative path to the dataset's source, that is easily accomplished by using fromdb on ignoreSources. More complex schemes are conceivable, including, if you are daring, moving files in sourceDone Python scripts or, if files get overwritten, also checking for updated files using fromdbUpdating on ignoreSources.
Where you have regular updates (like “re-ingest every night”), consider using Element execute to run the re-imports rather than a cron job so everything is in one place. That execute would run a simple dachs imp, except that you can save a bit of work by skipping metadata updates (--suppress-meta; this kind of thing should not happen as a side effect of something that runs automatically anyway), you should generally ignore bad files so that at least the good new ones are there (-c), and you should probably not interact with the user (--ui stingy), since there is none in such a cron-like situation. Assembled, you could write:
<execute id="do-update" title="ingest new files" at="14:10">
  <job>
    <code>
      execDef.spawn("dachs --ui stingy"
        " imp --suppress-meta -c lightmeter/q add".split())
    </code>
  </job>
</execute>
This assumes the lightmeter/q RD has an updating DD with the id add. In many cases you can just use your single automatic DD as the updater, in which case you can compact that execute element to:
<execute id="do-update" title="ingest new files" at="14:10">
  <job>
    <code>
      execDef.spawn(r"dachs --ui stingy imp --suppress-meta -c \rdId".split())
    </code>
  </job>
</execute>
The most straightforward way to ignore sources already imported is by checking in a database table which ones are already there. This is exactly what you can get with the fromdb attribute on ignoreSources.
Unless you are playing games with the accrefs (in which case you are probably smart enough to figure out how to adapt the pattern), the following specification will import exactly those FITS files within the data subdirectory of the resdir that haven't been ingested into the mydata table during the last run, either because they have not been there or because they were skipped during an import -c:
<data id="import" updating="true">
  <sources pattern="data/*.fits">
    <ignoreSources fromdb="select accref from \schema.mydata"/>
  </sources>
  <fitsProdGrammar>
    <rowfilter procDef="//products#define">
      <bind key="table">"\schema.mydata"</bind>
    </rowfilter>
  </fitsProdGrammar>
  <make table="mydata">
    <!-- your rowmaker here -->
  </make>
</data>
Note that fromdb can be combined with fromfiles and pattern; whatever is specified in the latter two will always be ignored.
To completely re-import such a table – for instance after a table schema change or because the whole data collection has been re-processed –, just run dachs drop on the DD and run import as usual.
It is probably a good idea to occasionally run dachs imp -I on tables updated in this way to optimise the indices (a REINDEX <tablename> in a database shell will do, too).
Sometimes reprocessing happens quite frequently to a small subset of the datasets in a resource. In that case, it would again be a waste to tear down the entire thing just to update a handful of records.
For such situations, there is the fromdbUpdating attribute of ignoreSources. As with fromdb, this contains a database query, but in addition to the accref, this query has to return a timestamp. A source is then only ignored if the disk file's timestamp is not newer than the one from the database. If the timestamp in the database is the mtime of the file in the original import, the net effect is that files that have been modified since that import will be re-ingested.
There is a catch, though: You need to make sure that the record ingested previously is removed from the table. Typically, you can do that by defining accref as a primary key (if that's not possible because you are generating multiple records with the same accref, there is nothing wrong with using a compound primary key). This will, on an attempted overwrite, cause an IntegrityError, and you can configure DaCHS to turn this into an overwrite using the table's forceUnique and dupePolicy attributes.
The following snippet illustrates the technique:
<table id="withdate" mixin="//products#table" onDisk="True"
    primary="accref" forceUnique="True" dupePolicy="overwrite">
  <column name="mtime" type="timestamp"
    ucd="time;meta.file"
    tablehead="Timestamp"
    description="Modification date of the source file."/>
  <!-- your other columns -->
</table>

<data id="import" updating="True">
  <sources pattern="data/*.fits">
    <ignoreSources
      fromdbUpdating="select accref, mtime from \schema.withdate"/>
  </sources>
  <fitsProdGrammar>
    <rowfilter procDef="//products#define">
      <bind key="table">"\schema.withdate"</bind>
    </rowfilter>
  </fitsProdGrammar>
  <make table="withdate">
    <rowmaker>
      <map key="mtime">datetime.datetime.utcfromtimestamp(
        os.path.getmtime(\fullPath))</map>
      <!-- other rowmaker rules -->
    </rowmaker>
  </make>
</data>
Again, this can be combined with the other attributes of ignoreSources; in effect, whatever is ignored from them is treated as if their modification dates were in the future.
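The decision logic behind fromdb and fromdbUpdating can be summarised in a few lines of Python. This is a sketch of the idea only, with hypothetical names; it assumes accrefs and timestamps have already been collected into dictionaries, and it uses None as the stored timestamp to model the plain fromdb case.

```python
import datetime


def sources_to_import(disk_files, db_records):
    """Return the accrefs that need (re-)ingestion, sorted by name.

    disk_files maps accref to the file's modification time on disk,
    db_records maps accref to the timestamp stored at the last import
    (None when no timestamp is kept, as with plain fromdb).
    """
    todo = []
    for accref, mtime in sorted(disk_files.items()):
        if accref not in db_records:
            todo.append(accref)  # never ingested so far
        elif db_records[accref] is not None and mtime > db_records[accref]:
            todo.append(accref)  # modified since the last import
    return todo


day = datetime.datetime(2024, 6, 1)
later = datetime.datetime(2024, 6, 2)
on_disk = {"data/a.fits": day, "data/b.fits": later, "data/c.fits": day}
in_db = {"data/a.fits": day, "data/b.fits": day}
print(sources_to_import(on_disk, in_db))
```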
When using Element odbcGrammar, identifying what is already ingested clearly cannot use sources. Instead, you will have to pick added or modified records in some other way. Realistically, you should keep some monotonically increasing value on both sides. Ideally, it would be a transaction id, because for these, it is clear whether or not something has actually been transferred. In a pinch, a unix timestamp will do, too.
Particular care should be taken when harvesting from databases to avoid duplicate rows when a re-harvest fetches the “same” record twice for whatever reason. You should probably designate a primary key and specify a dupePolicy like this:
<table id="mirror" onDisk="True" primary="obid" dupePolicy="overwrite">
In case some other table holds foreign keys into your table, it is wise to think hard whether the dupePolicy should really be dropOld (cf. Element table).
Your data element will use an odbc grammar with a computed query (available on DaCHS newer than 2.5). The example in Element makeQuery shows the basics. The important part is to consider the case when the local table does not exist yet (as it will on the original import). Dealing with this as shown in the example lets you use the same data element for imports and updates.
Various elements support the setting of metadata through meta elements. Metadata is used for conveying RMI-style metadata used in the VO registry. See [RMI] for an overview of those. We use the keys given in RMI, but there are some extensions discussed in RMI-style Metadata.
The other big use of meta information is for feeding templates. Those "local" keys should all start with an underscore. You are basically free to use those as you like and fetch them from your custom templates. The predefined templates already have some meta items built in, discussed in Template Metadata.
So, metadata is a key-value mapping. Keys may be compound like in RMI, i.e., they may consist of period-separated atoms, like publisher.address.email. There may be multiple items for each meta key.
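As a mental model (not the DaCHS implementation), such a mapping can be pictured as nested dictionaries in which every leaf is a list of values. add_meta is a hypothetical helper, and the e-mail address is invented for the example.

```python
def add_meta(tree, key, value):
    """Append value under the period-separated key in a nested dict.

    This sketch does not handle keys that carry both a value and children.
    """
    atoms = key.split(".")
    node = tree
    for atom in atoms[:-1]:
        node = node.setdefault(atom, {})
    node.setdefault(atoms[-1], []).append(value)


meta = {}
add_meta(meta, "publisher.address.email", "dc@example.org")
add_meta(meta, "subject", "Examples")
add_meta(meta, "subject", "DaCHS")
print(meta)
```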
In RDs, there are two ways to define metadata: Meta elements and meta streams; the latter are also used in defaultmeta.txt.
These look like normal XML elements and have a mandatory name attribute, a meta key relative to the element's root. The text content is taken as the meta value; child meta elements are legal.
An optional attribute for all meta elements is format (see Meta Formats).
Typed meta elements can have further attributes; these usually can also be given as meta children with the same name.
Usually, metadata is additive; add a key twice and you will have a sequence of two meta values. To remove previous content, prefix the meta name with a bang (!). Here is an example:
<resource>
  <!-- a simple piece of metadata -->
  <meta name="title">A Meta example</meta>

  <!-- repeat a meta thing for a sequence (caution: not everything
    is repeatable in all output formats) -->
  <meta name="subject">Examples</meta>
  <meta name="subject">DaCHS</meta>

  <!-- Hierarchical meta can be set nested -->
  <meta name="creator">
    <meta name="name">Nations, U.N.</meta>
    <meta name="logo">http://un.org/logo.png</meta>
  </meta>
  <meta name="creator">
    <meta name="name">Neumann, A.E.</meta>
  </meta>

  <!-- @format lets you specify extra markup; make sure you have
    consistent initial indentation. -->
  <meta name="description" format="rst">
    This resource is used in the `DaCHS reference docs`_

    .. _DaCHS reference Docs: http://docs.g-vo.org/DaCHS
  </meta>

  <!-- you can contract "deeper" trees in paths -->
  <meta name="contact.email">gavo@ari.uni-heidelberg.de</meta>

  <!-- typed meta elements can have additional attributes -->
  <meta name="uses" ivoId="ivo://org.gavo.dc/DaCHS"
    >DaCHS server software</meta>

  <!-- To overwrite a key set before, prefix the name with a bang. -->
  <meta name="!title">An improved Meta example</meta>
</resource>
The resulting meta structure is like this:
+-- title
|    +---- "An improved Meta example
|
+-- subject
|    +---- "Examples
|    +---- "DaCHS
|
+-- creator
|    +----- name
|    |       +---- "Nations, U.N.
|    +----- logo
|    |       +---- "http://un.org/logo.png
+-- creator
|    +----- name
|            +---- "Neumann, A.E.
|
+-- description
|    +----- [formatted text, "This resource..."]
|
+-- contact
|    +----- email
|            +----- "gavo@ari.uni-heidelberg.de
|
+-- uses
     +----- "DaCHS server software
     +----- ivoId
             +----- "ivo://org.gavo.dc/DaCHS
In several places, most notably in the defaultmeta.txt file and in meta elements without a name attribute, you can give metadata as a “meta stream”. This is just a sequence of lines containing pairs of <meta key> and <meta value>.
In addition, there are comments, empty lines, continuations, forced overwriting, and format selection.
Continuation lines work by ending a line with a backslash. The following line separator and all blanks and tabs following it are then ignored. Thus, the following two meta keys end up having identical values:
meta1: A contin\
    uation line needs \
    a blank if you wan\
    t one.
meta2: A continuation line needs a blank if you want one.
Note that whitespace behind a backslash prevents it from being a continuation character. That is, admittedly, a bit of a trap.
Other than their use as continuation characters, backslashes have no special meaning within meta streams as such. Within meta elements, however, macros are expanded after continuation line processing if the meta parent knows how to expand macros. This lets you write things like:
<meta>
  creationDate: \metaString{authority.creationDate}
  managingOrg:ivo://\getConfig{ivoa}{authority}
</meta>
Comments and empty lines are easy: Empty lines are allowed, and a comment is a line with a hash (#) as its first non-whitespace character. Both constructs are ignored, and you can even continue comments (though you should not).
When you repeat a key, metadata is added. Hence:
subject: active-galaxies
subject: meteorites
will lead to two keywords in the subject meta. Sometimes you instead want to overwrite what is already there, in particular for meta that needs to be unique:
!title: A revised title
will make sure that there is just a single title meta, and it is “A revised title”.
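To make the difference between adding and overwriting concrete, here is a minimal Python sketch. It only mimics the behaviour described above for flat keys (no continuation lines, no nested keys), and collect_meta is a hypothetical name, not a DaCHS function.

```python
from collections import defaultdict


def collect_meta(lines):
    """Collect "key: value" lines into a dict mapping keys to value lists.

    Repeated keys add values; a key prefixed with "!" first throws away
    whatever was collected for it before.  Empty lines and comment
    lines are skipped.
    """
    meta = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if key.startswith("!"):
            key = key[1:]
            meta[key] = []  # forced overwrite
        meta[key].append(value)
    return dict(meta)


result = collect_meta([
    "title: A first title",
    "subject: active-galaxies",
    "# comments are skipped",
    "subject: meteorites",
    "!title: A revised title",
])
print(result)
```

After this, result has two entries under subject but only the revised one under title.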
Finally, stream meta has format=plain by default. To select raw or reStructuredText format, prefix the value with raw: or rst:, respectively:
_sidebarlocal: raw:<div class="sidebarnote">\
  <a href="/adql">Try ADQL</a> to query our data.</div>
description: rst:This is *nice*, **important** data.
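The format selection amounts to peeling a prefix off the value. The following Python sketch illustrates that idea under the assumption that only the two prefixes named above matter; split_format is a name invented for this example.

```python
def split_format(value: str):
    """Return a (format, content) pair for a meta stream value.

    Values starting with "raw:" or "rst:" select that format; anything
    else is treated as plain, which is the default for stream meta.
    """
    for fmt in ("raw", "rst"):
        prefix = fmt + ":"
        if value.startswith(prefix):
            return fmt, value[len(prefix):]
    return "plain", value


print(split_format("rst:This is *nice*, **important** data."))
print(split_format("A revised title"))
```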
When you query an element for metadata, it first checks whether it has this metadata itself. If not, it asks its meta parent, which usually is the embedding element. The parent will in turn delegate the request to its own parent, if it exists. If there is no parent, configured defaults are examined. These are taken from rootDir/etc/defaultmeta.txt, where they are given as colon-separated key-value pairs, e.g.,
publisher: The GAVO DC team
publisherID: ivo://org.gavo.dc
contact.name: GAVO Data Center Team
contact.address: Moenchhofstrasse 12-14, D-69120 Heidelberg
contact.email: gavo@ari.uni-heidelberg.de
contact.telephone: ++49 6221 54 1837
creator.name: GAVO Data Center
creator.logo: http://vo.ari.uni-heidelberg.de/docs/GavoTiny.png
The effect is that you can give global titles, descriptions, etc. in the RD but override them in services, tables, etc. The configured defaults let you specify meta items that are probably constant for everything in your data center, though of course you can override these in your RD elements, too.
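The lookup order (own meta, then the chain of meta parents, then configured defaults) can be modelled with a toy class. This is a sketch of the principle only; MetaCarrier and get_meta are hypothetical names and do not reflect the actual DaCHS classes.

```python
class MetaCarrier:
    """Toy model of an element with its own meta and an optional meta parent."""

    def __init__(self, own=None, parent=None):
        self.own = dict(own or {})
        self.parent = parent

    def get_meta(self, key, defaults):
        """Look at the element itself, then its parents, then the defaults."""
        node = self
        while node is not None:
            if key in node.own:
                return node.own[key]
            node = node.parent
        return defaults.get(key)


# Stand-in for what would be configured in the defaults file.
defaults = {"publisher": "The GAVO DC team"}

rd = MetaCarrier({"title": "Global title", "description": "About the resource"})
service = MetaCarrier({"title": "Service title"}, parent=rd)

print(service.get_meta("title", defaults))        # overridden on the service
print(service.get_meta("description", defaults))  # inherited from the RD
print(service.get_meta("publisher", defaults))    # taken from the defaults
```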
In HTML templates, missing meta usually is not an error. The corresponding elements are just left empty. In registry documents, missing meta may be an error.
Metadata must work in registry records as well as in HTML pages and possibly in other places. Thus, it should ideally be given in an input format that can be sensibly transformed into the various output formats.
DaCHS knows four input formats:
Macros will be expanded in meta items using the embedding element as macro processor (i.e., you can use the macros defined by this element).
While generally the DC software does not care what you put into meta items and views them all as strings, certain keys are treated specially. The following meta keys trigger some special behaviour:
A meta value representing a timestamp.
Accessing it, you will get a formatted ISO/DALI string. You can construct them with both strings (that we'll try to parse and bomb if that's not possible) and datetime.datetime objects.
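As an illustration of what accepting both strings and datetime.datetime objects can look like, here is a small Python sketch. Which string formats DaCHS actually parses is not stated here, so the two formats tried below are an assumption, as is treating naive datetimes as UTC; to_iso_timestamp is a made-up name.

```python
from datetime import datetime, timezone


def to_iso_timestamp(value):
    """Return a string like 2007-10-04T12:00:00Z for a string or a datetime.

    Raises ValueError for strings that match none of the assumed formats.
    """
    if isinstance(value, datetime):
        stamp = value
    else:
        text = value.strip()
        if text.endswith("Z"):
            text = text[:-1]
        stamp = None
        for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
            try:
                stamp = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
        if stamp is None:
            raise ValueError(f"Not a recognised timestamp: {value!r}")
    if stamp.tzinfo is not None:
        # Convert aware datetimes to UTC before formatting.
        stamp = stamp.astimezone(timezone.utc).replace(tzinfo=None)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso_timestamp("2009-03-06"))
print(to_iso_timestamp(datetime(2007, 10, 4, 12, 0, 0)))
```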
A MetaValue to keep VOSI examples in.
All of these must have a title, which is also used to generate references.
These also are in reStructuredText by default, and changing that probably makes no sense at all, as these will always need interpreted text roles for proper markup.
Thus, the usual pattern here is:
<meta name="_example" title="An example for _example">
  See docs_

  .. _docs: http://docs.g-vo.org
</meta>
A meta value representing a "news" item.
The content is the body of the news. In addition, they have date, author, and role children. In plain text, you would write:
_news: Frobnicated the quux.
_news.author: MD
_news.date: 2009-03-06
_news.role: updated
In XML, you would usually write:
<meta name="_news" author="MD" date="2009-03-06">
  Frobnicated the quux.
</meta>
_news items become serialised into Registry records despite their leading underscores. role then becomes the date's role.
A meta value containing a link and optionally a title.
In plain text, this would look like this:
_related:http://foo.bar
_related.title: The foo page
In XML, you can write:
<meta name="_related" title="The foo page" ivoId="ivo://bar.org/foo">http://foo.bar</meta>
or, if you prefer:
<meta name="_related">http://foo.bar <meta name="title">The foo page</meta></meta>
These values are used for _related (meaning "visible" links to other services).
For links within your data center, use the internallink macro, the argument of which is the "path" to a resource, i.e. RD path/service/renderer; as a rule, we recommend using the info renderer in such links. This would look like this:
<meta name="_related" title="Aspec SSAP" >\internallink{aspec/q/ssa/info}</meta>
A meta value containing an ivo-id and a name of a related resource.
The sort of relationship is encoded in the meta name, where the terms are defined in the vocabulary http://www.g-vo.org/rdf/voresource/relationship_type (where there are minor lexical deviations from identifiers there to DaCHS' meta names).
All relationship metas should look like this (using isSupplementTo as an example; while an ivoId is not mandatory, it rarely makes sense to declare a relationship without it):
isSupplementTo: GAVO TAP service
isSupplementTo.ivoId: ivo://org.gavo.dc
It is also possible to include an altIdentifier (such as a DOI) to declare a relationship to a resource not declared in the IVOA registry.
isServedBy and isServiceFor are somewhat special cases, as the service attribute of data publications automatically takes care of them; so, you shouldn't usually need to bother with these two manually.
A MetaValue corresponding to a small image.
These are rendered as little images in HTML. In XML meta, you can say:
<meta name="_somelogo" type="logo">http://foo.bar/quux.png</meta>
A MetaValue for a DOI.
This lets people construct DOI meta with or without a doi: prefix. It also creates landing page links in HTML.
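The prefix handling and the landing page link can be sketched in Python as follows. The function names are hypothetical, the DOI is a made-up placeholder, and using the https://doi.org/ resolver for the link is an assumption about how such a link would typically be built, not a statement about DaCHS internals.

```python
def normalise_doi(value: str) -> str:
    """Return the bare DOI, whether or not the input has a doi: prefix."""
    text = value.strip()
    if text.lower().startswith("doi:"):
        text = text[len("doi:"):].strip()
    return text


def landing_page(value: str) -> str:
    """Build a resolver link for a DOI given with or without prefix."""
    return "https://doi.org/" + normalise_doi(value)


# 10.1234/example is a placeholder, not a registered DOI.
print(landing_page("doi:10.1234/example"))
print(landing_page("10.1234/example"))
```

Both calls yield the same link, which is the point of accepting either spelling.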
A meta value for info items in VOTables.
In addition to the content (which should be rendered as the info element's text content), it contains an infoName and an infoValue.
They are only used internally in VOTable generation and might go away without notice.
A meta value representing a "note" item.
This is like a footnote, typically on tables, and is rendered in table infos.
The content is the note body. In addition, you want a tag child that gives whatever the note is referenced as. We recommend numbers.
Contrary to other meta items, note content defaults to rstx format.
Typically, this works with a column's note attribute.
In XML, you would usually write:
<meta name="note" tag="1">
  Better ignore this.
</meta>
A MetaValue translating UAT terms to their labels in HTML.
This follows the assumption that HTML is what humans look at, anything else is either computers or nerds.
A MetaValue serialized into VOTable links (or, ideally, analogous constructs).
This exposes the various attributes of VOTable LINKs as href, linkname, contentType, and role. You cannot set ID here; if this ever needs referencing, we'll need to think about it again. The href attribute is simply the content of our meta (since there's no link without href, and there's never any content in VOTable LINKs).
You could thus say:
votlink: http://docs.g-vo.org/DaCHS
votlink.role: doc
votlink.contentType: text/html
votlink.linkname: GAVO DaCHS documentation
Additionally, there is creator, which is really special (at least for now). When you set creator to a string, the string will be split at semicolons, and for each substring a creator item with the respective name is generated. This may sound complicated but really does about what you would expect when you write:
<meta name="creator">Last, J.; First, B.; Middle, I.</meta>
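In Python terms, the splitting described above behaves roughly like the following sketch. The dictionaries merely stand in for creator items with a name child; split_creators is a hypothetical name, and dropping empty substrings is an assumption.

```python
def split_creators(value: str):
    """Split a creator string at semicolons into one item per name."""
    return [
        {"name": part.strip()}
        for part in value.split(";")
        if part.strip()  # assumption: empty substrings yield no creator
    ]


creators = split_creators("Last, J.; First, B.; Middle, I.")
for creator in creators:
    print(creator["name"])
```

Note that the commas inside each name are left alone; only semicolons separate creators.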
Additional “magic” meta keys in DaCHS (in the sense that they control DaCHS behaviour) include:
Certain meta keys have a data center-internal interpretation, used in renderers or writers of certain formats. These keys should always start with an underscore. Among those are:
_intro: | used by the standard HTML template for explanatory text above the search form. |
---|---|
_bottominfo: | used by the standard HTML template for explanatory text below the search form. |
_related: | used in the standard HTML template for links to related services. As listed above, this is a link, i.e., you can give a title attribute. |
_longdoc: | used by the service info renderer for an explanatory piece of text of arbitrary length. This will usually be in reStructuredText, and we recommend having the whole meta body in a CDATA section. |
_news: | news on the service. See above at Typed Meta Elements. |
_warning: | used by both the VOTable and the HTML table renderer. The content is rendered as some kind of warning. Unfortunately, there is no standard how to do this in VOTables. There is no telling if the info elements generated will show anywhere. |
_noresultwarning: | displayed by the default response template instead of an empty table (use it for things like "No Foobar data for your query") |
_type: | on Data instances, used by the VOTable writer to set the type attribute on RESOURCE elements (to either "results" or "meta"). Probably only useful internally. |
superseded: | in RDs or services, marks them as superseded, which generally makes them inaccessible. The body of this meta should provide pointers to where the new version(s) of the resources might be found (cf. tutorial.html#deleting-resources). |
_plotOptions: | typically set on services, this lets you configure the initial appearance of the javascript-based quick plot. The value must be a javascript dictionary literal (like {"xselIndex": 2}) unless you're trying CSS deviltry (which you could, using this meta; then again, if you can inject RDs, you probably don't need CSS attacks). Keys evaluated include: |
For services (and other things) that are registered in the Registry, you must give certain metadata items (and you can give more), where we take their keys from [RMI]. We provide an explanatory leaflet for data providers. The most common keys -- used by the registry interface and in part by HTML and VOTable renderers -- include:
title: | this should in general be given separately on the resource, each table, and each service. In simple cases, though, you may get by with just one global title on the resource, relying on metadata inheritance. |
---|---|
shortName: | a string that should indicate what the service is in 16 characters or less. |
creationDate: | Use ISO format with time, UTC only, like this: 2007-10-04T12:00:00Z |
_dataUpdated: | The timestamp of the last successful dachs imp, again in DALI/ISO format. |
_metadataUpdated: | Timestamp when the metadata was last updated. On RDs, that's the timestamp of the RD source, on published things, it's the timestamp of the last dachs pub. |
subject: | A subject keyword. As of VOResource 1.1, these should be taken from http://www.ivoa.net/rdf/uat |
rights: | freetext copyright notice. See tutorial.html#Licensing for details. |
rights.rightsURI: | machine readable license URI. Take from https://spdx.org/licenses/ if possible. |
source: | bibcodes will be expanded to ADS links here. |
referenceURL: | again, a link, so you can give a title for presentation purposes. If you give no referenceURL, the service's info page will be used. |
creator.name: | this should be the name of the "author" of the data set. If you set this, you may want to override creator.logo as well. For persons, always use the form “Last, F.I.”; this saves all components the unsolvable problem of telling first from last names, and sorting these strings will naturally yield the sequence people expect. Also, if you have multiple creators, better just set creator as discussed in Typed Meta Elements. |
type: | one of Other, Archive, Bibliography, Catalog, Journal, Library, Simulation, Survey, Transformation, Education, Outreach, EPOResource, Animation, Artwork, Background, BasicData, Historical, Photographic, Press, Organisation, Project, Registry – it's optional and we doubt its usefulness. You may repeat the content type if you need to; see also [RMI], sect. 3.3. |
contentLevel: | addressee(s) of the data: Research, Amateur, General |
facility: | no IVOA ids are supported here yet, but probably this should change. |
coverage: | see the special section |
service-specific metadata (for SIA, SCS, etc.): | see the documentation of the respective cores. |
utype: | tables (and possibly other items) can have utypes to signify their role in specific data models. For tables, this utype gets exported to the tap_schema. |
identifier: | this is the IVOID of the resource, usually generated by DaCHS. Do not override this unless you know what you are doing (which at least means you know how to make DaCHS declare an authority and claim it). If you do override the identifier of a service that's already published, make sure you run dachs admin makeDeletedRecord <previous identifier> (before or after the dachs pub on the resource), or the registries will have two copies of your record, one of which will not be updated any more; and that would suck for Registry users. |
published_identifer: | Like identifier, except it will be None if the resource does not look like it is (destined to be) published. |
mirrorURL: | add these on publication to declare mirrors for a service. Only do so if you actually manage the other service. If you list the service's own accessURL here, it will be filtered from this registry record; this is so you can use the same RD on the primary site and the mirror. |
_example: | A DALI example. See tutorials.html#writing-examples |
moreExamples: | A URI for an additional DALI examples document. These get translated to DALI continuation-s. |
tableset: | A DaCHS reference to a table to include in a registry tableset (new in 2.7.3). This currently will only be interpreted in document-typed resources. |
productTypeServed: | Set on a service element, this declares what sort of data products the service publishes (something like image, spectrum, or polarization-cube). The content of these meta items must come from the IVOA product-type vocabulary, and you can give more than one. For SIA1 and SSAP services, you do not have to give it unless you are serving “unusual” product types (like time series from an SSAP service). For Obscore, where this is relevant, the recommended way to give the metadata is to run dachs limits //obscore, which will take the values from the database. |
altIdentifier: | You can set this as a meta child for various keys (for the resource as a whole, use doi to set what comes out as altIdentifier in VOResource). Elements for which this can be set include creator.name, publisher, facility, instrument, and contributor. This would look like this: publisher: GAVO central publisher.altIdentifier: http://orcid.org/abcd-efgh In the special case of creator.name you would write: creator.name: Author, A. creator.name.altIdentifier: whatever creator: creator.name: Other, A. creator.name.altIdentifier: whatever-else |
While you can set any of these in etc/defaultmeta.txt, the following items are usually set there:
Coverage metadata lets clients get a quick idea of where in space, time, and electromagnetic spectrum the data within a resource is. Obviously, this information is particularly important for resource discovery in registries.
Not all resources have coverages on all axes; a service validator, say, probably has no physical coverage at all, and a theoretical spectral service may just have meaningful spectral coverage.
There are two meta keys pertinent to coverage metadata:
The legacy coverage.profile meta key should not be used any more.
To give proper, numeric STC coverage, use the Element coverage.
It has three children, one each for the spatial, spectral, and temporal axes. For spectral and temporal, just add as many intervals as necessary. Do not worry about gaps in the temporal coverage: it is not necessary that the coverage is “tight”; as long as there is a reasonable expectation that data could be there, it's fine to declare coverage. Hence, for ground-based observations, there is no need to exclude intervals of daylight, bad weather, or even maintenance downtime.
Intervals are given as in VOTable tabledata, i.e., as two floating point numbers separated by whitespace. There are no (half-) open intervals – just use insanely small or large numbers if you really think you need them.
For spatial coverage, a single spatial element should be given. It has to contain a MOC in ASCII serialisation. Recent versions of Aladin can generate those, or you can write SQL queries to have them computed by sufficiently new versions of pgsphere. Most typically, you will use updater elements to fill spatial coverage (see below).
A complete coverage element would thus look like this:
<coverage>
  <spectral>3.8e-07 5.2e-07</spectral>
  <temporal>18867 27155</temporal>
  <spatial>
    4/2068
    5/8263,8268-8269,8271,8280,8323,8326,8329,9376,9378
    6/33045-33047,33049,33051,33069,33080-33081,33083,33104-33106,
    33112,33124-33126,33128-33130,33287,33289,33291,33297-33299,
    33313,33315,33323-33326,33328-33330,37416,37418,37536
  </spatial>
</coverage>
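To sanity-check hand-written intervals before running DaCHS on the RD, something like the following may help. It is a minimal sketch using only Python's standard library; parse_interval and read_intervals are names made up for this illustration and are not part of DaCHS.

```python
import xml.etree.ElementTree as ET

# A coverage element with one spectral and two temporal intervals.
COVERAGE = """<coverage>
  <spectral>3.8e-07 5.2e-07</spectral>
  <temporal>45201 45409</temporal>
  <temporal>54888 55056</temporal>
</coverage>"""


def parse_interval(literal):
    """Parse 'lower upper' (two floats separated by whitespace)."""
    lower, upper = (float(token) for token in literal.split())
    return lower, upper


def read_intervals(xml_text, axis):
    """Return all non-empty intervals given for axis ('spectral' or 'temporal')."""
    root = ET.fromstring(xml_text)
    return [parse_interval(el.text)
            for el in root.findall(axis)
            if el.text and el.text.strip()]


temporal = read_intervals(COVERAGE, "temporal")
spectral = read_intervals(COVERAGE, "spectral")
```

A literal with more or fewer than two numbers raises a ValueError in this sketch, which is usually what you want from a check like this.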
In general computing coverage is a tedious task. Hence, DaCHS has rules to compute it for many common cases (SSAP, SIAP, Obscore, catalogs with usable UCDs). Because coverage calculations can run for a long time, they are not performed online. Instead, DaCHS updates coverage elements when the operator runs dachs limits. In the simplest case, operators add:
<coverage> <updater sourceTable="data"/> <spectral/> <temporal/> <spatial/> </coverage>
into an RD with a table named data. Currently, this must be lexically below the table element, but if this isn't fixed to allow the location of the coverage element near the rest of the metadata near the top of the RD, complain fiercely.
Operators then run dachs limits q (assuming the RD is called q.rd), and DaCHS will fill out the three coverage elements (in case you want to fix them: the heuristics it uses to do that are in gavo.user.info).
In this construction, DaCHS will overwrite any previous content in the coverage child elements. If you want to fill out some coverage items manually and have DaCHS only compute, say, the spatial coverage, don't give the sourceTable attribute (which essentially says: “grab as much coverage from the referenced table as you can”) but rather the specialised spaceTable. This is particularly useful if you want to annotate “holes” in your temporal coverage. For instance, if your resource contains two fairly separate campaigns (which DaCHS does not currently realise automatically):
<coverage> <updater spaceTable="main"/> <spatial/> <temporal>45201 45409</temporal> <temporal>54888 55056</temporal> </coverage>
Due to limitations of pgsphere, DaCHS does not currently take into account the size of the items in a database table. While that is probably all right for spectra and catalogs, for images this might lose significant coverage, as DaCHS only uses the centers of the images and just marks the containing healpix of the selected MOC order. The default MOC order is 6 (a resolution of about a degree). Until we properly deal with polygons, make sure to decrease the MOC order in an image service until its cells are at least of the order of magnitude of the image size, like this:
<coverage> <updater sourceTable="main" mocOrder="4"/> <spatial/> </coverage>
If you know your resource only contains relatively few but compact patches, you may also want to increase mocOrder (spatial resolution doubles when you increase mocOrder by one).
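To get a feeling for which mocOrder fits a given image size, the standard HEALPix relations (12·4^order cells of equal area on the sphere) are enough. The following sketch is not DaCHS code; the function names and the 2-degree image size are assumptions for illustration.

```python
import math


def healpix_resolution_deg(order):
    """Approximate cell size (square root of the cell area) in degrees."""
    nside = 2 ** order
    npix = 12 * nside ** 2
    cell_area_sr = 4 * math.pi / npix
    return math.degrees(math.sqrt(cell_area_sr))


def order_for_size(size_deg, max_order=29):
    """Largest order whose cells are still at least size_deg across."""
    best = 0
    for order in range(max_order + 1):
        if healpix_resolution_deg(order) >= size_deg:
            best = order
        else:
            break
    return best


default_resolution = healpix_resolution_deg(6)   # roughly 0.92 degrees
order_for_two_degree_images = order_for_size(2.0)
```

With images about two degrees across, this heuristic suggests order 4, where cells are about 3.7 degrees across.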
Display hints use an open vocabulary. As you add value formatters, you can evaluate any display hint you like. Display hints understood by the built-in value formatters include:
a key that gives hints what to do with the column. Values currently understood include:
Note that not any combination of display hints is correctly interpreted. The interpretation is greedy, and only one formatter at a time attempts to interpret display hints.
In the VO, data models are used when simple, more or less linear annotation methods like UCDs do not provide sufficient expressive power. Or well, they should be used. As of early 2017, things are, admittedly, still a mess.
DaCHS lets you annotate your data in dm elements; the annotation will then be turned into standard VOTable annotation (when that's defined). Sometimes, the structured references provided by the DM annotation are useful elsewhere, too – the first actual use of this framework was the geojson serialisation discussed below.
We first discuss SIL, then its use in actual data models. At least skim over the next section – it sucks to discover the SIL grammar by trial and error.
Old-style STC annotation is not discussed here. If you still want to do it (and for now, you have to if you want any STC annotation – sigh), check out the terse discussion in the tutorial.
Data model annotation in DaCHS is done using SIL, the Simple Instance Language. It essentially resembles JSON, but all delimiters not really necessary for our use case have been dropped, and type annotation has been added.
The elements of SIL are:
The NULL literal, __NULL__. Attributes that are set to this literal are elided. This is mostly useful in the connection with data model annotation within mixins.
Atomic Values. For SIL, everything is a string (it's a problem of DM validation to decide otherwise). When your string consists exclusively of alphanumerics and [._-], you can just write it in SIL. Otherwise, you must use double quotes. As in SQL, write two double quotes to include a literal double quote. So, valid literals in SIL are
Invalid literals include:
Plain Identifiers. These are C-like identifiers (a letter or an underscore optionally followed by letters, digits, or underscores).
Comments. SIL comments are classical C-style comments (/*...*/). They don't nest yet, but they probably will at some point, so don't write /* within a comment.
Object annotation. This is like a dictionary; only plain identifiers are allowed as keys. So, an object looks like this:
{ foo: bar longer: "This is a value with blanks in it" }
Note again that no commas or quotes around the keys are necessary (or even allowed).
Sequences. This is like a list. Members can be atomic or objects, but they have to be homogeneous (SIL doesn't enforce this by grammatical means, though). Here is an object with two sequences:
{
  seq1: [3 4 5 "You thought these were numbers? They're strings!"]
  seq2: [
    { seq_index: 0
      value: 3.3}
    { seq_index: 1
      value: 0.3}
  ]
}
References. The point of SIL is to say things about column and param instances. Both of them (and other dm instances, tables, and in principle anything else in RDs) can be referenced from within SIL. A reference starts with an @ and is then a normal DaCHS cross identifier (columns and params within a table can be referenced by name only, columns take precedence on name clashes). If you use odd characters in your RD names or in-RD identifiers, think again: only [._/#-] are allowed in such references. Here is an object with some valid references:
{
  long: @raj2000 /* a column in the enclosing table */
  lat: @dej2000
  system: @//systems#icrs /* could be a dm instance in a DaCHS-global RD; this does *not* exist yet */
  source: @supercat/q#main /* perhaps a table in another RD */
}
Casting. You can (and sometimes have to) give explicit types in the SIL annotation. Types look like C-style casts. The root of a SIL annotation must always have a cast; that allows DaCHS to figure out what it is, which is essential for validation (and possibly inference of defaults and such). You can cast both single objects and sequences. Here's an example that actually validates for DaCHS' SIL (which the examples above wouldn't because they're missing the root annotation):
(testdm:testclass) { /* cast on root: mandatory */
  attr1: { /* no cast here; DaCHS can infer attr1's type if necessary */
    attr2: val
  }
  seq: (testdm:otherclass)[ /* Sequence cast: */
    {attr1: a} /* all of these are now treated as testdm:otherclass */
    {attr1: b}
    {attr1: c}]}
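If you generate SIL from a program, the lexical rules above (bare atoms, doubled double quotes, plain identifiers as keys, @-references, a cast on the root) can be captured in a few lines. The following is a minimal sketch under those stated rules only; Ref, sil_atom, to_sil and annotate are hypothetical names, DaCHS has its own SIL implementation, and casts on nested objects or sequences are not covered here.

```python
import re

_PLAIN_ATOM = re.compile(r"^[A-Za-z0-9._-]+$")
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_REFERENCE = re.compile(r"^[A-Za-z0-9._/#-]+$")


class Ref(str):
    """Marks a string as a reference to a column, param, or similar."""


def sil_atom(value):
    """Write value bare if possible, else double-quoted, SQL-style."""
    text = str(value)
    if _PLAIN_ATOM.match(text):
        return text
    return '"%s"' % text.replace('"', '""')


def to_sil(node):
    """Serialise nested dicts, lists, Refs and atoms into SIL text."""
    if isinstance(node, Ref):
        if not _REFERENCE.match(node):
            raise ValueError("Bad character in reference: %r" % str(node))
        return "@" + node
    if isinstance(node, dict):
        parts = []
        for key, value in node.items():
            if not _IDENTIFIER.match(key):
                raise ValueError("Not a plain identifier: %r" % key)
            parts.append("%s: %s" % (key, to_sil(value)))
        return "{ %s }" % " ".join(parts)
    if isinstance(node, (list, tuple)):
        return "[%s]" % " ".join(to_sil(item) for item in node)
    return sil_atom(node)


def annotate(type_name, obj):
    """Prefix the mandatory cast on the root object."""
    return "(%s) %s" % (type_name, to_sil(obj))


example = annotate("testdm:testclass", {
    "attr1": {"attr2": "val"},
    "seq": [3, "two words"],
    "long": Ref("raj2000"),
})
```

Note that the number 3 ends up as the bare atom 3, which SIL still treats as a string.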
To produce photcal groups as per the 2020 Timeseries Note and perhaps later specs, use an annotation like this:
<dm> (phot:PhotCal) { filterIdentifier: "Gaia/G" zeroPointFlux: \zeroPointFlux magnitudeSystem: Vega effectiveWavelength: \effectiveWavelength value: @phot } </dm>
– where phot would be the column containing the photometry.
When – as is most likely when you want to have this kind of annotation – you are using the //timeseries#phot-0 mixin, you would only give such a group when you have multiple photometry systems in one light curve (which you should avoid). For the first photometry column, this declaration is done by the mixin.
To produce GeoJSON output (as supported by DaCHS' TAP implementation), DaCHS needs to know what the “geometry” in the sense of GeoJSON is. Furthermore, DaCHS keeps supporting declaring reference systems in the crs attribute, as the planetology community uses it.
The root class of the geojson DM is geojson:FeatureCollection. It has up to two attributes (crs and feature), closely following the GeoJSON structure itself. The geometry is defined in feature's geometry attribute. All columns not used for geometry will end up in GeoJSON properties.
So, a complete GeoJSON annotation, in this case for an EPN-TAP table, could look like this:
<table>
  <dm>
    (geojson:FeatureCollection){
      crs: (geojson:CRS) {
        type: name
        properties: (geojson:CRSProperties) {
          name: "urn:x-invented:titan"}}
      feature: {
        geometry: {
          type: sepsimplex
          c1min: @c1min
          c2min: @c2min
          c1max: @c1max
          c2max: @c2max }}}
  </dm>
  <mixin spatial_frame_type="body"/>
</table>
Yes, the use of type attributes is a bit of an abomination, but we wanted the structure to follow GeoJSON in spirit.
The crs attribute could also be of type link, in which case the properties would have attributes href and type; we're not aware of any applications of this in planetology, though. crs is optional (but standards-compliant GeoJSON clients will interpret your coordinates as WGS84 on Earth if you leave it out).
For geometry, several values for type are defined by DaCHS, depending on how the GeoJSON geometry should be constructed from the table. Currently defined types include (complain if you need something else, it's not hard to add):
sepcoo – this is for a spherical point with separate columns for the two axes. This needs latitude and longitude attributes, like this:
<dm> (geojson:FeatureCollection){ feature: { geometry: { type: sepcoo latitude: @lat longitude: @long }}} </dm>
seppoly – this constructs a spherical polygon out of column references. The keys have the form cn_m (for instance, c1_1 and c1_2), where m is 1 or 2 for the two coordinates, and n is counted from 1 up to the number of points. DaCHS will stop collecting points as soon as it doesn't find an expected key. If you find yourself using this, check your data model. An example:
<dm> (geojson:FeatureCollection){ feature: { geometry: { type: seppoly /* a triangle of some kind */ c1_1: @rb0 c1_2: @rb1 c2_1: @lb0 c2_2: @lb1 c3_1: @t0 c3_2: @t1 }}} </dm>
sepsimplex – this constructs a spherical box-like thing from minimum and maximum values. It has c[12](min|max) keys as in EPN-TAP. As a matter of fact, a fairly typical annotation for EPN-TAP would be:
<dm> (geojson:FeatureCollection){ feature: { geometry: { type: sepsimplex c1min: @c1min c2min: @c2min c1max: @c1max c2max: @c2max }}} </dm>
geometry – this constructs a geometry from a pgsphere column. Since GeoJSON doesn't have circles, only spoint and spoly columns can be used. They are referenced from the value key. For instance, obscore and friends could use:
<dm> (geojson:FeatureCollection) { feature: { geometry: { type: geometry value: @s_region }}} </dm>
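The way seppoly keys are collected (count points from 1, stop at the first missing key) can be pictured with a short sketch. This is an illustration of the rule as described above, not DaCHS' actual code; collect_seppoly_points is a made-up name, and the geometry annotation is represented as a plain dictionary.

```python
def collect_seppoly_points(geometry):
    """Return (first, second) coordinate references for points 1, 2, ...

    Collection stops as soon as a key for the next point is missing.
    """
    points = []
    n = 1
    while True:
        try:
            point = (geometry["c%d_1" % n], geometry["c%d_2" % n])
        except KeyError:
            break
        points.append(point)
        n += 1
    return points


triangle = {
    "type": "seppoly",
    "c1_1": "rb0", "c1_2": "rb1",
    "c2_1": "lb0", "c2_2": "lb1",
    "c3_1": "t0", "c3_2": "t1",
}
points = collect_seppoly_points(triangle)
```

A consequence worth keeping in mind: if you skip a number (say, c3 is missing but c4 is there), everything from the gap on is silently ignored.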
Even though normal users should rarely be confronted with too many of the technical details of request processing in DaCHS, it helps to have a rough comprehension in order to understand several user-visible details.
In DaCHS' architecture, a service is essentially a combination of a core and a renderer. The core is what actually does the query or the computation, the renderer adapts input and outputs to what a protocol or interface expects. While a service always has exactly one core (could be a nullCore, though), it can support more than one renderer, although the parameters in all renderers are, within reason, about the same.
However, parameters on a form interface will typically be interpreted differently from a VO interface on the same core. For instance, ranges on the form interface are written as 1 .. 3 (VizieR compliance), on an SSA 1.x interface 1/3 ("PQL" prototype), and on a datalink dlget interface "1 2" (DALI 1.1 style). The extreme of what probably still makes sense is the cone search core that replaces SCS's RA, DEC, and SR with an entirely different set of parameters perhaps better suited for interactive, browser-based usage.
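To make the three range notations concrete, here is a deliberately simplified sketch that turns each of them into a pair of floats. It only handles the closed-range case shown above; real VizieR expressions and PQL are considerably richer, and parse_range is a name invented for this example rather than a DaCHS function.

```python
def parse_range(literal, style):
    """Parse a closed range in one of the styles form, pql, or dali."""
    if style == "form":
        # VizieR-like: 1 .. 3
        lower, upper = literal.split("..")
    elif style == "pql":
        # SSA 1.x: 1/3
        lower, upper = literal.split("/")
    elif style == "dali":
        # DALI 1.1 interval: 1 2
        lower, upper = literal.split()
    else:
        raise ValueError("Unknown parameter style: %s" % style)
    return float(lower), float(upper)


ranges = [
    parse_range("1 .. 3", "form"),
    parse_range("1/3", "pql"),
    parse_range("1 3", "dali"),
]
```

All three literals describe the same interval; only the renderer's parameter style decides which spelling a client has to use.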
Cores communicate their input interface by defining an input table, which is essentially a sequence of input keys, which in turn essentially work like params: in particular, they have all the standard metadata like units, ucds, etc. Input tables, contrary to what their name might suggest, have no rows. They can hold metadata, though, which is sometimes convenient to pass data between parameter parsers and the core.
When a request comes in, the service first determines the renderer responsible. It then requests an inputTable for that renderer from the core. The core, in turn, will map each inputKey in its inputTable through a renderer adaptor as returned from svcs.inputdef.getRendererAdaptor; this inspects the renderer.parameterStyle, which must be taken from the svcs.inputdef._RENDERER_ADAPTORS' keys (currently form, pql, dali). inputKeys have to have the adaptToRenderer property set to True to have them adapted. Most automatically generated inputKeys have that; where you manually define inputKeys, you would have to set the property manually if you want that behaviour (and know that you want it; outside of table-based cores, it is unlikely that you do).
The input table, together with the raw arguments coming from the client, is then used to build a svcs.CoreArgs instance, which in turn takes the set of input keys to build a context grammar. The core args have the underlying input table (with the input keys for the metadata) in the inputTD attribute, the parsed arguments in the dictionary args.
For each input key, args maps its name to a value; context grammars are case-semisensitive, meaning that case in the HTTP parameter names is in general ignored, but if a parameter name matching case is found, it is preferred. Yes, ugly, but unfortunately the VO has started with case-insensitive parameter names. Sigh.
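“Case-semisensitive” lookup is perhaps easiest to understand from a sketch: try the exact name first, and only then fall back to a case-insensitive match. The function below illustrates that rule with plain dictionaries; it is not the context grammar's implementation, and get_semisensitive is a hypothetical name.

```python
def get_semisensitive(raw_args, name):
    """Look up name in raw_args, preferring an exact-case match."""
    if name in raw_args:
        return raw_args[name]
    lowered = name.lower()
    for key, value in raw_args.items():
        if key.lower() == lowered:
            return value
    return None


exact_wins = get_semisensitive({"RA": "10.5", "ra": "11.5"}, "ra")
fallback = get_semisensitive({"RA": "10.5"}, "ra")
missing = get_semisensitive({"DEC": "3"}, "ra")
```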
The values in args are a bit tricky:
These rules are independent of the type of core and hold for pythonCores or whatever just as for the normal, table-based cores. For these (and they are what users are mostly concerned with), special rules and shortcuts apply, though.
You will usually deal with cores querying database tables – dbCore, ssapCore, etc. For these, you will not normally define an inputTable, as it is being generated by the software from condDescs.
To create simple constraints, just buildFrom the columns queried:
<condDesc buildFrom="myColumn"/>
(the names are resolved in the core's queried table). DaCHS will automatically adapt the concrete parameter style to the renderer – in the web interface, there are vizier-like expressions, in protocol interfaces, you get fields understanding expressions, either as in SSAP (for the pql parameter style) or as defined in DALI (the dali parameter style).
This will generate query fields that work against data as stored in the database, with some exceptions (columns containing MJDs will, for example, be turned into VizieR-like date expressions for web forms).
Since in HTML forms, astronomers often ask for odd units and then want to input them, too, DaCHS will also honor the displayUnit display hint for forms. For instance, if you wrote:
<table id="ex1"> <column name="minDist" unit="deg" displayHint="displayUnit=arcsec"/> ... <dbCore queriedTable="ex1"> <condDesc buildFrom="minDist"/> ...
then the form renderer would declare the minDist column to take its values in arcsecs and do the necessary conversions, while minDist would properly work with degrees in SCS or TAP.
For object lists and similar, it is frequently desirable to give the possible values (unless there are too many of those; these will be translated to option lists in forms and to metadata items for protocol services and hence be user visible). In this case, you need to change the input key itself. You can do this by deriving the input key from the column and assigning it to a condDesc, like this:
<condDesc> <inputKey original="source"> <values fromdb="source from plc.data"/> </inputKey> </condDesc>
Use the showItems="n" attribute of input keys to determine how many items in the selector are shown at one time.
If you want your service to fail if a parameter is not given, declare the condDesc as required:
<condDesc buildFrom="myColumn" required="True"/>
(you can also declare an individual inputKey as required).
If, on the other hand, you want DaCHS to fill in a default if the user provides no value, give a default to the input key using the values child:
<condDesc>
  <inputKey original="radius">
    <values default="0.5"/>
  </inputKey>
</condDesc>
Sometimes a parameter shouldn't be defaulted in a protocol request (perhaps to satisfy an external contract), while the web interface should pre-fill a sensible choice. In that case, use the defaultForForm property:
<condDesc>
  <inputKey original="radius">
    <property key="defaultForForm">0.5</property>
  </inputKey>
</condDesc>
DaCHS will also interpret min and max attributes on the input keys (and the columns they are generated from) to generate input hints; that's a good way to fight the horror vacui users have when there's an input box and they have no idea what to put there. The best way to deal with this, however, is to not change the input keys but the columns themselves, as in:
<table id="ex1">
  <column name="mjd" type="double precision" ...>
    <values min="" max=""/>
  ...
<dbCore queriedTable="ex1">
  <condDesc buildFrom="mjd"/>
You will typically leave min and max empty and run:
dachs limits q#ex1
when the table contents change; this will make DaCHS update the values in the RD itself.
CondDescs will generate SQL adapted to the type of their input keys; as you can imagine, for cases like the VizieR expressions, that's not done in a couple of lines. However, there are times when you need custom behaviour. You can then give your conddescs a phraseMaker, a piece of python code generating a query and adding parameters:
<condDesc>
  <inputKey original="confirmed" multiplicity="single">
    <property name="adaptToRenderer">False</property>
  </inputKey>
  <phraseMaker>
    <code>
      if inPars.get(inputKeys[0].name, False):
        yield "confirmed IS NOT NULL"
    </code>
  </phraseMaker>
</condDesc>
PhraseMakers work like other code embedded in RDs (and thus may have setup). inPars gives a dictionary of the input parameters as parsed by the inputDD according to multiplicity. inputKeys contains a sequence of the conddesc's inputKeys. By using their names as above, your code will not break if the parameters are renamed.
It is usually a good idea to set the property adaptToRenderer to False in such cases – you generally don't want DaCHS to use its standard rules for input key adaptation as discussed above because that will typically change what ends up in inPars and hence break your code for some renderers.
Note again that parameters not given will have the value None throughout. They will be present in inPars, though, so do not try things like "myName" in inPars – that's always true.
Phrase makers must yield zero or more SQL fragments; multiple SQL fragments are joined in conjunctions (i.e., end up in ANDed conditions in the WHERE clause). If you need to OR your fragments, you'll have to do that yourself. Use base.joinOperatorExpr(operator, operands) to construct ORs robustly.
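The following sketch shows the overall shape of such a phrase maker outside of DaCHS: test parameters against None (not with in), collect alternatives, and OR them into a single fragment. join_or is a stand-in written for this example and is not DaCHS' base.joinOperatorExpr; the parameter name candidate is hypothetical, too.

```python
def join_or(operands):
    """Stand-in helper: OR together non-empty SQL fragments."""
    operands = [op for op in operands if op]
    if not operands:
        return None
    if len(operands) == 1:
        return operands[0]
    return "(%s)" % " OR ".join(operands)


def phrase_maker(inPars):
    """Yield zero or one SQL fragments, depending on what was passed."""
    alternatives = []
    # Missing parameters are present with value None, so test for None.
    if inPars["confirmed"] is not None:
        alternatives.append("confirmed IS NOT NULL")
    if inPars["candidate"] is not None:
        alternatives.append("candidate IS NOT NULL")
    expression = join_or(alternatives)
    if expression:
        yield expression


nothing = list(phrase_maker({"confirmed": None, "candidate": None}))
one = list(phrase_maker({"confirmed": True, "candidate": None}))
both = list(phrase_maker({"confirmed": True, "candidate": True}))
```

The parentheses around the ORed expression matter, because the fragment will later be ANDed with whatever the other conddescs yield.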
Since you are dealing with raw SQL here, never include material from inPars directly in the query strings you return – this would immediately let people do SQL injections at least when the input key's type is text or similar. Instead, use the getSQLKey function as in this example:
<condDesc>
  <inputKey original="hdwl" multiplicity="single"/>
  <phraseMaker>
    <code>
      ik = inputKeys[0]
      destRE = "^%s\\.[0-9]*$"%inPars[ik.name]
      yield "%s ~ (%%(%s)s)"%(ik.name,
        base.getSQLKey("destRE", destRE, outPars))
    </code>
  </phraseMaker>
</condDesc>
getSQLKey takes a suggested name, a value and a dictionary, which within phrase makers always is outPars. It will enter value with the suggested name as key into outPars or change the suggested name if there is a name clash. The generated name will be returned, and that is what is entered in the SQL statement.
The outPars dictionary is shared between all conddescs entering into a query. Hence, if you do anything with it except passing it to base.getSQLKey, you're voiding your entire warranty.
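To see why going through getSQLKey keeps user input out of the SQL string, consider this stand-in that mimics the behaviour described above (enter the value under the suggested name, pick another name on a clash, return the name actually used). It is an illustration only; get_sql_key and its counter-based renaming are assumptions, not the DaCHS implementation.

```python
def get_sql_key(suggested_name, value, out_pars):
    """Store value in out_pars and return the key it was stored under."""
    name = suggested_name
    counter = 0
    while name in out_pars:
        counter += 1
        name = "%s%d" % (suggested_name, counter)
    out_pars[name] = value
    return name


out_pars = {}
first_key = get_sql_key("destRE", r"^4861\.[0-9]*$", out_pars)
second_key = get_sql_key("destRE", r"^6563\.[0-9]*$", out_pars)

# The SQL only contains a placeholder; the value travels separately.
fragment = "hdwl ~ (%%(%s)s)" % first_key
```

The fragment then reads hdwl ~ (%(destRE)s), and the database driver fills in the value from the parameter dictionary when the query is executed.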
Here's how to define a condDesc doing a full text search in a column:
<condDesc>
  <inputKey original="source" description="Words from the catalog description, e.g., author names or title words.">
    <property name="adaptToRenderer">False</property>
  </inputKey>
  <phraseMaker>
    <code>
      yield ("to_tsvector('english', source)"
        " @@ plainto_tsquery('english', %%(%s)s)")%(
        base.getSQLKey("source", inPars["source"], outPars))
    </code>
  </phraseMaker>
</condDesc>
Incidentally, this would go with an index definition like:
<index columns="source" method="gin" >to_tsvector('english', source)</index>
For special effects, you can group inputKeys. This will make them show up under a common label and in a single line in HTML forms. Other renderers currently don't do anything with the groups.
Here's an example for a simple range selector:
<condDesc>
  <inputKey name="el" type="text" tablehead="Element"/>
  <inputKey name="mfmin" tablehead="Min. Mass Fraction \item">
    <property name="cssClass">a_min</property>
  </inputKey>
  <inputKey name="mfmax" tablehead="Max. Mass Fraction \item">
    <property name="cssClass">a_max</property>
  </inputKey>
  <group name="mf">
    <description>Mass fraction of an element. You may leave out either upper or lower bound.</description>
    <property name="label">Mass Fraction between...</property>
    <property name="style">compact</property>
  </group>
</condDesc>
You will probably want to style the result of this effort using the service element's customCSS property, maybe like this:
<service...>
  <property name="customCSS">
    input.a_min {width: 5em}
    input.a_max {width: 5em}
    input.formkey_min {width: 6em!important}
    input.formkey_max {width: 6em!important}
    span.a_min:before { content:" between "; }
    span.a_max:before { content:" and "; }
    tr.mflegend td {
      padding-top: 0.5ex;
      padding-bottom: 0.5ex;
      border-bottom: 1px solid black;
    }
  </property>
</service>
See also the entries on multi-line input, selecting input fields with a widget, and customizing generated SCS conditions in DaCHS' howto document.
When determining what columns to include in a response from a table-based core, DaCHS follows relatively complicated rules because displays in the browser and almost anywhere else are subject to somewhat different constraints. In the following, when we talk about “VOTable”, we refer to all tabular formats produced by DaCHS (FITS binary, CSV, TSV...).
The column selection is influenced by:
Verbosity. This is controlled by the VERB parameter (1..3) or preferentially verbosity (1..30). Only columns with verbLevel not exceeding verbosity (or, if not given, VERB*10) are included in the result set. This, in particular, means that columns with verbLevel larger than 30 are never automatically included in output tables (but they can be manually selected for HTML using _ADDITEM).
Output Format. While VOTable takes the core's output table and applies the verbosity filter, HTML uses the service's output table as the basis from which to filter columns. On the other hand, in HTML output the core output table is used to create the list of potential additional columns.
votableRespectsOutputTable. This is a property on services that makes DaCHS use the service's output table even when generating VOTable output if it is set to True. Write:
<property name="votableRespectsOutputTable">True</property>
in your service element to enable this behaviour.
_ADDITEM. This parameter (used by DaCHS' web interface) lets users select columns not selected by the current settings or the service's output table. _ADDITEM is ignored in VOTable unless in HTML mode (which is used in transferring web results via SAMP).
noxml. Columns can be furnished with a displayHint="noxml=true", and they will never be included in VOTable output; use this when you use complex formatters to produce HTML displays.
_SET. DaCHS supports “column sets”, for instance, to let users select certain kinds of coordinates. See apfs/res/apfs_new.rd for an example. Essentially, when defining an output table, each output field gets a sets attribute (default: no set; use ALL to have the column included in all outputs). Then, add a _SET service parameter (use values to declare the available sets). Note that the _SET parameter changes VOTable column selection to votableRespectsOutputTable mode as discussed above. Services that use column sets should therefore set the property manually for consistency whether or not clients actually pass _SET.
Sorry for this mess; all this had, and by and large still has, good reasons.
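The verbosity part of these rules, at least, is simple enough to sketch. The code below models columns as (name, verbLevel) pairs and applies the filter described above; it is an illustration rather than DaCHS code, and the default of 20 used when neither parameter is given is an assumption of this sketch.

```python
def effective_verbosity(params, default=20):
    """Prefer verbosity (1..30); else use VERB (1..3) times ten.

    The default for requests giving neither is an assumption here.
    """
    if params.get("verbosity") is not None:
        return int(params["verbosity"])
    if params.get("VERB") is not None:
        return int(params["VERB"]) * 10
    return default


def select_columns(columns, params):
    """Keep columns whose verbLevel does not exceed the verbosity."""
    limit = effective_verbosity(params)
    return [name for name, verb_level in columns if verb_level <= limit]


COLUMNS = [("ra", 1), ("dec", 1), ("flux", 15), ("debug_flag", 40)]

terse = select_columns(COLUMNS, {"VERB": "1"})
full = select_columns(COLUMNS, {"verbosity": "30"})
```

The debug_flag column with verbLevel 40 shows up in neither result, which is the “never automatically included” case mentioned above.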
DaCHS' URL scheme leads to somewhat clunky URLs that, in particular, reflect the file system underneath. While this doesn't matter to the VO registry, it is possibly unwelcome when publishing URLs outside of the VO. To overcome it, you can define "vanity names", single path elements that are mapped to paths.
These mappings are read from the file $GAVO_ROOT/etc/vanitynames.txt. The file contains lines of the format:
<target> <key> [<option>]
Target is a path that must not include nevowRoot and must not start with a slash (unless you're going for very special effects).
Key normally is a single path element (i.e., a string without a slash). If this path element is found in the first segment, it is replaced with the segments in target.
<option> can only be !redirect or empty right now.
If it is !redirect, <key> may be a path fragment (as opposed to a single path element); leading and trailing slashes are ignored. If the entire query path matches this key, a redirect to <target> is generated. This is intended to let you shut down services and introduce replacements. If the incoming URL contains a query, it will be appended to the replacement URL. Thus, even stored queries or forms can potentially work across such a redirect.
You can also (ab)use the redirect option to give vanity names, but since the target will show up in the browser address line, normal maps are highly preferred. The only time normal maps don't work for this is when the resource directory is identical to the vanity name (you'll get an endless loop then), so you should avoid that situation.
Empty lines and #-on-a-line-comments are allowed in the input.
As an example, here's the vanity map that DaCHS had builtin as of version 2.1:
__system__/products/p/get getproduct
__system__/products/p/dlasync datalinkuws
__system__/services/registry/pubreg.xml oai.xml
__system__/services/overview/external odoc
__system__/dc_tables/show/tablenote tablenote
__system__/dc_tables/show/tableinfo tableinfo
__system__/services/overview/admin seffe
__system__/services/overview/rdinfo browse
__system__/tap/run/tap tap
__system__/adql/query/form adql !redirect
__system__/run/genrd genrd
Note again that <key> must be a single path element only.
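If you want to check a vanitynames.txt before deploying it, the file format is easy to parse. The sketch below follows the rules given above (target, key, optional !redirect; empty lines and comment lines skipped); parse_vanity_map and resolve are names invented for this example, and the resolution logic is a simplification of what the server does.

```python
def parse_vanity_map(text):
    """Return (maps, redirects), each mapping key to target."""
    maps, redirects = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 2:
            (target, key), option = parts, ""
        elif len(parts) == 3:
            target, key, option = parts
        else:
            raise ValueError("Bad vanity line: %r" % line)
        if option == "!redirect":
            redirects[key.strip("/")] = target
        elif option == "":
            maps[key] = target
        else:
            raise ValueError("Bad option: %r" % option)
    return maps, redirects


def resolve(path, maps, redirects):
    """Return (action, path) for a request path."""
    stripped = path.strip("/")
    if stripped in redirects:
        return "redirect", redirects[stripped]
    segments = stripped.split("/")
    if segments[0] in maps:
        return "map", "/".join([maps[segments[0]]] + segments[1:])
    return "pass", stripped


SAMPLE = """
# a normal map and a redirect
__system__/tap/run/tap tap
__system__/adql/query/form adql !redirect
"""
maps, redirects = parse_vanity_map(SAMPLE)
mapped = resolve("/tap/sync", maps, redirects)
redirected = resolve("adql", maps, redirects)
```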
While DaCHS provides cores for many common operations – in particular, database queries and wrapped external binaries –, there are of course services that need to do things not covered by what the shipped cores do. A common case is wrapping external binaries.
Many such cases still follow the basic premise of services: GET or POST parameters in, something table-like out. You should then use custom cores, which then still let you use normal DaCHS renderers (in particular form and api/sync). When that doesn't cut it, you'll need to use a custom renderer.
While a custom core is defined in a separate module – this also helps debugging since you can run it outside of DaCHS –, there's also the python core that keeps the custom code inside of the RD. This is very similar; Python Cores instead of Custom Cores explains the differences.
The following exposition is derived from the times service in the GAVO data center, a service wrapping some FORTRAN code wrapping SOFA (yes, we're aware that we could directly use SOFA through astropy; that's not the point here). Check out the sources at http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/apfs; the RD is times.rd.
In an RD, a custom core is very typically just written with a reference to a defining module:
<customCore module="res/timescore"/>
The path is relative to the resdir, and you don't include the module's extension (DaCHS uses normal python module resolution, except for temporarily extending the search path with the enclosing directory). You can, in principle, declare the core's interface in that element, but that's typically not a good idea (see below).
The above declaration means you will find the core itself in res/timescore.py.
Ideally, you'll just use the DaCHS API in the core, since we try fairly hard to keep that API constant. The timescore doesn't quite follow that rule because it wants to expand VizieR expressions, which normal services probably won't do.
DaCHS expects the custom core under the name Core. Thus, the centerpiece of the module is:
from gavo import api

class Core(api.Core):
The core needs an InputTable and an OutputTable like all cores. You could define it in the resource descriptor like this:
<customCore id="createCore" module="bin/create"> <inputTable> <inputKey .../> </inputTable> <outputTable> <column name="itemsAdded" type="integer" tablehead="Items added"/> </outputTable> </customCore>
It's preferable to define at least the input in the code, though, since it's more likely to be kept in sync with the code in that case. Embedding the definitions is done using the class attribute inputTableXML:
    class Core(api.Core):
      inputTableXML = """
        <inputTable>
          <inputKey name="ut1" type="vexpr-date" multiplicity="single"
            tablehead="UT1" description="Date and time (UT1)"
            ucd="time.epoch;meta.main"/>
          <inputKey name="interval" type="integer" multiplicity="single"
            tablehead="Interval" unit="s" ucd="time.interval"
            description="Interval between two sets of computed values"
            >3600</inputKey>
        </inputTable>
      """
There is also outputTableXML, which you should use if you were to compute stuff in some lines of Python, since then the fields are directly defined by the core itself.
However, the case of timescore is fairly typical: There is some, essentially external, resource that produces something that needs to be parsed. In that case, it's a better idea to define the parsing logic in a normal RD data item. Its table then is the output table of the core. In the times example, the output of timescompute is described by the build_result data item in times.rd:
    <table id="times">
      <column name="ut1" type="timestamp" tablehead="UT1"
        ucd="time.epoch;meta.main" verbLevel="1"
        description="Time and date (UT1)" displayHint="type=humanDate"/>
      <column name="gmst" type="time" tablehead="GMST"
        verbLevel="1" description="Greenwich mean sidereal time"
        xtype="adql:TIMESTAMP" displayHint="type=humanTime,sf=4"/>
      <column name="gast" type="time" tablehead="GAST"
        verbLevel="1" description="Greenwich apparent sidereal time"
        xtype="adql:TIMESTAMP" displayHint="type=humanTime,sf=4"/>
      <column name="era" type="double precision" tablehead="ERA"
        verbLevel="1" description="Earth rotation angle"
        displayHint="type=dms,sf=3" unit="deg"/>
    </table>

    <data id="build_result" auto="False">
      <reGrammar>
        <names>ut1,gmst,gast,era</names>
      </reGrammar>
      <make table="times">
        <rowmaker>
          <map dest="gmst">parseWithNull(@gmst, parseTime, "None")</map>
          ...
        </rowmaker>
      </make>
    </data>
So, the core needs to say “my output table has the structure of #times”.
As usual with DaCHS structures, you should not override the constructor, as it is defined by a metaclass. Instead, Cores call, immediately after the XML parse (technically, as the first thing of their completeElement method), a method called initialize. This is where you should set the output table. For the times core, this looks like this:
    def initialize(self):
      self.outputTable = api.OutputTableDef.fromTableDef(
        self.rd.getById("times"), None)
Of course, you are not limited to setting the output table there; as initialize is only called once while parsing, this is also a good place to perform expensive, one-time operations like reading and parsing larger external resources.
To have the core do something, you have to override the run method, which has to have the following signature:
run(service, inputTable, queryMeta) -> stuff
The stuff returned will usually be a Table or Data instance (that need not match the outputTable definition -- the latter is targeted at the registry and possibly applications like output field selection). The standard renderers also accept a pair of mime type and a string containing some data and will deliver this as-is. With custom renderers, you could return basically anything you want.
Services come up with some idea of the schema of the table they want to return and adapt tables coming out of the core to this. Sometimes, you want to suppress this behaviour, e.g., because the service's ideas are off. In that case, set a noPostprocess attribute on the table to any value (the TAP core does this, for instance).
In service you get the service using the core; this may make a difference since different services can use the same core and could control details of its operations through properties, their output table, or anything else.
The inputTable argument is the CoreArgs instance discussed in Core Args. Essentially, you'll usually use its args attribute, a dictionary mapping the keys defined by your input table to values or lists of them.
The queryMeta argument is discussed in Database Options.
In the times example, the parameter interpretation is done in an extra function (which helps testability when there's a bit more complex things going on):
    def computeDates(args):
      """yields datetimes at which to compute times from the ut1/interval
      inputs in coreArgs args.
      """
      interval = args["interval"] or 3600
      if args["ut1"] is None:
        yield datetime.datetime.utcnow()
        return

      try:
        expr = vizierexprs.parseDateExpr(args["ut1"])
        if expr.operator in set([',', '=']):
          for c in expr.children:
            yield c

        elif expr.operator=='..':
          for c in expandDates(expr.children[0], expr.children[1],
              interval):
            yield c

        elif expr.operator=="+/-":
          d0, wiggle = expr.children[0], datetime.timedelta(
            expr.expr.children[1])
          for c in expandDates(d0-wiggle, d0+wiggle):
            yield c

        else:
          raise api.ValidationError("This sort of date expression"
            " does not make sense for this service", colName="ut1")

      # "except ... as ..." is the Python 3 spelling of the handler
      except base.ParseException as msg:
        raise api.ValidationError(
          "Invalid date expression (at %s)."%msg.loc, colName="ut1")

While the details of the parameter parsing and expansion don't really matter, note how exceptions are mapped to a ValidationError and given a colName – this lets the form renderer display error messages next to the inputs that caused the failure.
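The expandDates helper called in computeDates is not shown here. As a rough idea of what such a helper might do, the following is a minimal sketch using only the standard library; the name expand_dates, its signature, the default interval and the inclusive end point are assumptions made for illustration, not the timescore's actual code:

```python
import datetime


def expand_dates(start, end, interval=3600):
    """Yield datetimes from start to end (inclusive), interval seconds apart.

    Hypothetical stand-in for the expandDates helper; the real one in
    the timescore module may behave differently.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    step = datetime.timedelta(seconds=interval)
    current = start
    while current <= end:
        yield current
        current += step


# Three hours at the default one-hour spacing give four points in time.
dates = list(expand_dates(
    datetime.datetime(2024, 1, 1, 0, 0),
    datetime.datetime(2024, 1, 1, 3, 0)))
```

Writing this as a generator keeps it composable with the yield-based structure of computeDates.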
The next thing timescore does is build some input, which in this case is fairly trivial:
input = "\n".join(utils.formatISODT(date) for date in dates)+"\n"
If your input is more complex or you need input files or similar, you want to be a bit more careful. In particular, do not change directory (or, equivalently, use the utils.sandbox context manager); this may confuse the server, and in particular will break the first time two requests are served simultaneously: The core runs within the main process, and that can only have one current directory.
Instead, in such situations, make a temporary directory and manually place your inputs in there. The spacecore (http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/sp_ace/res/spacecore.py) shows what this could look like, including tearing the stuff down safely when done (the runSpace function).
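Independently of the spacecore, the general pattern can be sketched with the standard library alone. The function name run_in_scratch_dir and the stand-in command are hypothetical and only serve the illustration; the point is that cwd= changes the working directory of the child process only, so the server process never changes directory:

```python
import os
import subprocess
import sys
import tempfile


def run_in_scratch_dir(input_files, command):
    """Run command in a fresh temporary directory and return its stdout.

    input_files maps file names to the bytes to write before the run.
    The directory is removed afterwards, even if the command fails.
    """
    with tempfile.TemporaryDirectory(prefix="core_") as workdir:
        for name, content in input_files.items():
            with open(os.path.join(workdir, name), "wb") as f:
                f.write(content)
        # cwd= only affects the child; our own working directory
        # stays where it was.
        result = subprocess.run(command, cwd=workdir,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
        return result.stdout


# Stand-in for an external program: a Python one-liner reading a file
# from its working directory.
cwd_before = os.getcwd()
output = run_in_scratch_dir(
    {"params.txt": b"interval=3600\n"},
    [sys.executable, "-c", "print(open('params.txt').read().strip())"])
cwd_after = os.getcwd()
```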
For the timescore, that is not necessary; you just run the wrapped program using standard subprocess functionality:
    computer = service.rd.getAbsPath("bin/timescompute")
    pipe = subprocess.Popen([computer], stdin=subprocess.PIPE,
      stdout=subprocess.PIPE, close_fds=True,
      cwd=os.path.dirname(computer))
    data, errmsg = pipe.communicate(input)
    if pipe.returncode:
      raise api.ValidationError("The backend computing program failed"
        " (exit code %s). Messages may be available as"
        " hints."%pipe.returncode,
        "ut1",
        hint=errmsg)
Note that with today's computers, you shouldn't need to worry about streaming input or output until they are in the dozens of megabytes (in which case you should probably think hard about a custom UWS and keep the files in the job's working directories).
To turn the program's output into a table, you use the data item defined in the RD:
    return api.makeData(
      self.rd.getById("build_result"),
      forceSource=StringIO(data))
When the core defines the data itself, you would skip makeData. Just directly produce the rowdicts and make the output table directly from the rows:
    rows = [{"foo": 3*i, "bar": 8*i} for i in range(30)]
    return api.TableForDef(self.outputTable, rows=rows)
The standard DB cores receive a “table widget” on form generation, including sort and limit options. To make the Form renderer output this for your core as well, define a method wantsTableWidget() and return True from it.
The queryMeta that your run method receives has a dbLimit key. It contains the user selection or, as a fallback, the global db/defaultLimit value. These values are integers.
So, if you order a table widget, you should do something like:
cursor.execute("SELECT .... LIMIT %(queryLimit)s", {"queryLimit": queryMeta["dbLimit"],...})
In general, you should warn people if the query limit was reached; a simple way to do that is:
    if len(res)==queryLimit:
      res.addMeta("_warning", "The query limit was reached. Increase it"
        " to retrieve more matches. Note that unsorted truncated queries"
        " are not reproducible (i.e., might return a different result set"
        " at a later time).")
where res would be your result table. _warning metadata is displayed in both HTML and VOTable output, though of course VOTable tools will not usually display it.
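To see the limit-and-warn logic in isolation, here is a small sketch against an in-memory SQLite database. Everything in it (the table obs, the function query_with_limit, returning the warning as a string) is made up for the illustration; note also that sqlite3 uses the :name parameter style rather than the %(name)s style shown above:

```python
import sqlite3


def query_with_limit(conn, query_limit):
    """Return (rows, warning) for a query truncated at query_limit rows.

    warning is None unless the result filled the limit, in which case
    further matches may exist.
    """
    cursor = conn.execute(
        "SELECT id FROM obs ORDER BY id LIMIT :queryLimit",
        {"queryLimit": query_limit})
    rows = cursor.fetchall()
    warning = None
    if len(rows) == query_limit:
        warning = ("The query limit was reached. Increase it"
            " to retrieve more matches.")
    return rows, warning


# Hypothetical table with 30 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER)")
conn.executemany("INSERT INTO obs VALUES (?)", [(i,) for i in range(30)])

rows_truncated, warning_truncated = query_with_limit(conn, 10)
rows_all, warning_all = query_with_limit(conn, 100)
```

A result with exactly as many rows as the limit also triggers the warning even when nothing was cut off; that is the price of this simple check.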
If you only have a couple of lines of python, you don't have to have a separate module. Instead, use a python core. In it, you essentially have the run method as discussed in Giving the Core Functionality in a standard procApp. The advantage is that interface and implementation are nicely bundled together. The following example should illustrate the use of such python cores; note that rsc already is in the procApp's namespace:

    <pythonCore>
      <inputTable>
        <inputKey name="opre" description="Operand, real part"
          required="True"/>
        <inputKey name="opim" description="Operand, imaginary part"
          required="True"/>
        <inputKey name="powers" description="Powers to compute"
          type="integer" multiplicity="multiple"/>
      </inputTable>
      <outputTable>
        <outputField name="re" description="Result, real part"/>
        <outputField name="im" description="Result, imaginary part"/>
        <outputField name="log"
          description="real part of logarithm of result"/>
      </outputTable>

      <coreProc>
        <setup imports="cmath"/>
        <code>
          powers = inputTable.args["powers"]
          if not powers:
            powers = [1,2]
          op = complex(inputTable.args["opre"],
            inputTable.args["opim"])
          rows = []
          for p in powers:
            val = op**p
            rows.append({
              "re": val.real,
              "im": val.imag,
              "log": cmath.log(val).real})

          return api.TableForDef(self.outputTable, rows=rows)
        </code>
      </coreProc>
    </pythonCore>
Things break – perhaps because someone foolishly dropped a database table, because something happened in your upstream, because you changed something or even because we changed the API (if that's not mentioned in Changes, we owe you a beverage of your choice). Given that, having regression tests that you can easily run will really help your peace of mind.
Therefore, DaCHS contains a framework for embedding regression tests in resource descriptors. Before we tell you how these work, some words of advice, as writing useful regression tests is an art as much as engineering.
Don't overdo it. There's little point in checking all kinds of functionality that only uses DaCHS code – we're running our tests before committing into the repository, and of course before making a release. If the services just use condDescs with buildFrom and one of the standard renderers, there's little point in testing beyond a request that tells you the database table is still there and contains something resembling the data that should be there.
Don't be over-confident. Just because it seems trivial doesn't mean it cannot fail. Whatever code there is in the service processing of your RD, be it phrase makers, output field formatters, custom render or data functions, not to mention custom renderers and cores, deserves regression testing.
Be specific. In choosing the queries you test against, try to find something that won't change when data is added to your service, when you add input keys, or when you do similar maintenance. Change will happen, and it's annoying to have to fix the regression test every time the output might legitimately change. This helps with the next point.
Be pedantic. Do not accept failing regression tests, even if you think you know why they're failing. The real trick with useful testing is to keep "normal" output minimal. If you have to "manually" ignore diagnostics, you're doing it wrong. Also, sometimes tests may fail "just once". That's usually a sign of a race condition, and you should really try to figure out what's going on.
Make it fail first. It's surprisingly easy to write no-op tests that run but won't fail when the assertion you think you're making is no longer true. So, when developing a test, assert something wrong first, make sure there's some diagnostics, and only then assert what you really expect.
Be terse. While in unit tests it's good to test for maximally specific properties so failing unit tests lead you on the right track as fast as possible, in regression tests there's nothing wrong with plastering a number of assertions into one test. Regression tests actually make requests to a web server, and these are comparatively expensive. The important thing here is that regression testing is fast enough to let you run them every time you make a change.
DaCHS' regression testing framework is organized a bit along the lines of python's unittest and its predecessors, with some differences due to the different scope.
So, tests are grouped into suites, where each suite is contained in a regSuite element. These have a (currently unused) title and a boolean attribute sequential intended for when the tests contained must be executed in the sequence specified and not in parallel. It defaults to false, which means the requests are made in random order and in parallel, which speeds up the test runs and, in particular, will help uncover race conditions.
On the other hand, if you're testing some sort of interaction across requests (e.g., make an upload, see if it's there, remove it again), this wouldn't work, and you must set sequential="True". Keep these sequential suites as short as possible. In tests within such suites (and only there), you can pass information from one test to the following one by adding attributes to self.followUp (which are available as attributes of self in the next test). If you need to manipulate the next URL, it's at self.followUp.url.content_. For the common case of a redirect to the url in the location header (or a child thereof), there's the pointNextToLocation(child="") method of regression tests. In the tests that are manipulated like this, the URL given in the RD should conventionally be overridden in the previous test. Of course, additional parameters, httpMethods, etc, are still applied in the manipulated url element.
Regression suites contain tests, represented in regTest elements. These are procDefs (just like, e.g., rowmaker apply), so you can have setup code, and you could have a library of parametrizable regTest procDefs that you'd then turn into regTests by setting their parameters. We've not found that terribly useful so far, though.
You must give them a title, which is used when reporting problems with them. Otherwise, the crucial children of these are url and, as always with procDefs, code.
Here are some hints on development:
The url element encapsulates all aspects of building the request. In the simplest case, you can just have a simple URL, in which case it works as an attribute, like this:
<regTest title="example" url="svc/form"> ...
URLs without a scheme and without a leading slash are interpreted relative to the RD's root URL, so you'd usually just give the service id and the renderer to be applied. You can also specify root-relative and fully specified URLs as described in the documentation of the url element.
White space in URLs is removed, which lets you break long URLs as convenient.
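The three kinds of URLs and the white space removal can be pictured with the following sketch. It is not DaCHS's implementation; in particular, the function resolve_test_url, the server URL used, and the assumption that an RD's root URL is the server URL followed by the RD id are all illustrative:

```python
from urllib.parse import urlsplit


def resolve_test_url(url, server_url, rd_id):
    """Return an absolute URL for a regTest-style URL.

    Hypothetical helper; assumes the RD's root URL is server_url/rd_id.
    """
    url = "".join(url.split())       # drop all white space, line breaks too
    if urlsplit(url).scheme:         # fully specified: use as given
        return url
    server_url = server_url.rstrip("/")
    if url.startswith("/"):          # root-relative
        return server_url + url
    return "/".join([server_url, rd_id, url])    # relative to the RD


relative = resolve_test_url(
    "svc/form", "http://localhost:8080", "res1/q")
rooted = resolve_test_url(
    "/getproduct/maidanak/data/Q2237p0305/Johnson_R/\n"
    "      red_kk050001.fits.gz?siap=true",
    "http://localhost:8080", "res1/q")
absolute = resolve_test_url(
    "http://example.org/x", "http://localhost:8080", "res1/q")
```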
You could have GET parameters in this URL, but that's inconvenient due to both XML and HTTP escaping. So, if you want to pass parameters, just give them as attributes to the element:

    <regTest title="example">
      <url RA="10" DEC="-42.3" SR="1" parSet="form">svc/form</url>
The parSet=form here sets up things such that processing for the form renderer is performed – our form library nevow formal has some hidden parameters that you don't want to repeat in every URL.
To easily translate URLs taken from a browser's address bar or the form renderer's result link, you can run dachs totesturl and paste the URLs there. Note that totesturl fails for values with embedded quotes, takes only the first value of repeated parameters and is an over-quick hack all around. Patches are gratefully accepted.
The url element hence accepts arbitrary attributes, which can be a trap if you think you've given values to url's private attributes and mistyped their names. If uploads or authentication don't seem to happen, check if your attribute ended up in the URL (which is displayed with the failure message) and fix the attribute name; most private url attributes start with http. If you really need to pass a parameter named like one of url's private attributes, pass it in the URL if you can. If you can't because you're posting, spank us. After that, we'll work out something not too abominable.
If you have services requiring authentication, use url's httpAuthKey attribute. We've introduced this to avoid having credentials in the RD, which, after all, should reside in a version control system which may be (and in the case of GAVO's data center is) public. The attribute's value is a key into the file ~/.gavo/test.creds, which contains, line by line, this key, a username and a password, e.g.:
    svc1 testuser notASecret
    svc2 regtest NotASecretEither
A test using this would look like this:
    <regTest title="Authenticated user can see the light">
      <url httpAuthKey="svc1">svc1/qp/light.txt</url>
      <code>
        self.assertHTTPStatus(200)
      </code>
    </regTest>
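For illustration, the credentials file format described above (one entry per line with key, username and password) could be read as in the following sketch. The function parse_test_creds is hypothetical and not DaCHS's own parser; it assumes the three fields are separated by white space and that blank lines are to be skipped:

```python
def parse_test_creds(text):
    """Map each key to a (username, password) pair.

    Illustration only; assumes white space separated fields, one entry
    per line, with blank lines ignored.
    """
    creds = {}
    for line_no, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(
                "line %d: expected key, username, password" % line_no)
        key, username, password = parts
        creds[key] = (username, password)
    return creds


sample = "svc1 testuser notASecret\nsvc2 regtest NotASecretEither\n"
creds = parse_test_creds(sample)
```

With this, creds["svc1"] would hold the username and password that a test with httpAuthKey="svc1" needs.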
By default, a test will perform a GET request. To change this, set the httpMethod attribute. That's particularly important with uploads (which must be POSTed).
For uploads, the url element offers two facilities. You can set a request payload from a file using the postPayload attribute (the path is interpreted relative to the resource directory), but it's much more common to do a file upload like browsers do them. Use the httpUpload element for this, as in:
    <url>
      <httpUpload name="UPLOAD" fileName="remote.txt">a,b,c</httpUpload>
      svc1/async
    </url>
(which will work as if the user had selected a file remote.txt containing "a,b,c" in a browser with a file element named UPLOAD), or as in:
    <url>
      <httpUpload name="UPLOAD" fileName="remote.vot"
        source="res/sample.regtest"/>
      svc1/async
    </url>
(which will upload the file referenced in source, giving the remote server the filename remote.vot). The fileName attribute is optional.
Finally, you can pass arbitrary HTTP headers using the httpHeader element. This has an attribute key; the header's value is taken from the element content, like this:
    <url postPayload="res/testData.regtest" httpMethod="POST">
      <httpHeader key="content-type">image/jpeg</httpHeader
      >upload/custom</url>
Since regression tests are just procDefs, the actual assertions are contained in the code child of the regTest. The code in there sees the test itself in self, and it can access
Incidentally, that last name is right; the regression framework only supports http, and it's not terribly likely that we'll change that.
You should probably only access those attributes in a pinch and instead use the pre-defined assertions, which are methods on the test objects as in pyunit – conventional assertions are clearer to read and less likely to break if fixes to the regression test API become necessary. If you still want to have custom tests, raise AssertionErrors to indicate a failure.
Here's a list of assertion methods defined right now:
checks that all its arguments are found within content.
If string arguments are passed, they are utf-8 encoded before comparison. If that's not what you want, pass bytes yourself.
checks that header key has value in the response headers.
keys are compared case-insensitively, values are compared literally.
checks whether the returned data are XSD valid.
This uses DaCHS built-in XSD validator with the built-in schema files; it hence will in general not retrieve schema files from external sources.
checks an xpath assertion.
path is an xpath (as understood by lxml), with namespace prefixes statically mapped; there's currently v2 (VOTable 1.2), v1 (VOTable 1.1), v (whatever VOTable version is the current DaCHS default), h (the namespace of the XHTML elements DaCHS generates), m (the provisional MIVOT namespace) and o (OAI-PMH 2.0).
If you need more prefixes, hack the source and feed back your changes (or just add to self.XPATH_NAMESPACE_MAP locally).
path must match exactly one element.
assertions is a dictionary mapping attribute names to their expected value. Use the key None to check the element content, and match for None if you expect an empty element. To match against a namespaced attribute, you have to give the full URI; prefixes are not applied here. This would look like:
"{http://www.w3.org/2001/XMLSchema-instance}type": "vg:OAIHTTP"
If you need an RE match rather than equality, there's EqualingRE in your code's namespace.
interprets data as a VOTable and returns the first row as a dictionary
It will normally ensure that only one row is returned. To make it silently discard extra rows, make sure the result is sorted, or you will get randomly failing tests. Database-querying cores (which is where order is an issue) also honor _DBOPTIONS_ORDER.
returns seq[0], asserting at the same time that len(seq) is 1.
The idea is that you can say row = self.getUnique(self.getVOTableRows()) and have a nice test on the side -- and no ugly IndexError on an empty response.
returns the equivalent of tree.xpath(path) for an lxml etree of the current document or in element, if passed in.
This uses the same namespace conventions as assertXpath.
All of these are methods, so you would actually write self.assertHasStrings('a', 'b', 'c') in your test code (rather than pass self explicitly).
When writing tests, you can, in addition, use assertions from python's unittest TestCases (e.g., assertEqual and friends). This is provided in particular for use to check values in VOTables coming back from services together with the getFirstVOTableRow method.
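The contract described for getUnique above (return the only element, fail as an assertion rather than with an IndexError) can be sketched in a few lines. This is a free-standing illustration with a hypothetical function name, not the method's actual source:

```python
def get_unique(seq):
    """Return seq[0], failing with an AssertionError unless len(seq) is 1."""
    if len(seq) != 1:
        raise AssertionError(
            "Expected exactly one item, got %d" % len(seq))
    return seq[0]


row = get_unique([{"ra": 10.0, "dec": -42.3}])

# An empty result is reported as a failed assertion, not an IndexError.
try:
    get_unique([])
except AssertionError:
    empty_failed = True
else:
    empty_failed = False
```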
Also please note that, like all procDef's bodies, the test code is macro-expanded by DaCHS. This means that every backslash that should be seen by python needs to be escaped itself (i.e., doubled). An escaped backslash in python thus is four backslashes in the RD.
Finally, here's a piece of .vimrc that inserts a regTest skeleton if you type ge in command mode (preferably at the start of a line; you may need to fix the indentation if you're not indenting with tabs). We've thrown in a column skeleton on gn as well:

    augroup rd
      au!
      autocmd BufRead,BufNewFile *.rd set ts=2 tw=79
      au BufNewFile,BufRead *.rd map gn i<tab><tab><lt>column name="" type=""<CR><tab>unit="" ucd=""<CR>tablehead=""<CR>description=""<CR>verbLevel=""/><CR><ESC>5kf"a
      au BufNewFile,BufRead *.rd map ge i<tab><tab><lt>regTest title=""><CR><tab><lt>url><lt>/url><CR><lt>code><CR><lt>/code><CR><BS><lt>/regTest><ESC>4k
    augroup END
The first mode to run the regression tests is through dachs val. If you give it a -t flag, it will collect regression tests from all the RDs it touches and run them. It will then output a brief report listing the RDs that had failed tests for closer inspection.
It is recommended to run something like:
dachs val -tv ALL
before committing changes into your inputs repository. That way, regressions should be caught.
The tests are run against the server described through the [web]serverURL config item. In the recommended setup, this would be a server started on your own development machine, which then would actually test the changes you made.
There is also a dedicated dachs sub-command test for executing the tests. This is what you should be using for developing tests or investigating failures flagged with dachs val. On its command line, you can give one of an RD id, a cross-RD reference to a test suite, or a cross-RD reference to an individual test. For example,

    dachs test res1/q
    dachs test res2/q#suite1
    dachs test res2/q#test45

would run all the tests given in the RD res1/q, the tests in the regSuite with the id suite1 in res2/q, and a test with id="test45" in res2/q, respectively.
To traverse inputs and run tests from all RDs found there, as well as tests from the built-in RDs, run:
dachs test ALL
dachs test by default has a very terse output. To see which tests are failing and what they gave as reasons, run it with the '-v' option.
To debug failing regression tests (or maybe to come up with good things to test for), use '-d', which dumps the server response of failing tests to stdout.
In the recommended setup with a production server and a development machine sharing a checkout of the same inputs, you can exercise the production server from the development machine by giving the -u option with what your production server has in its [web]serverURL configuration item. So,
dachs test -u http://production.example.com ALL
is what might help your night's sleep.
Here are some examples of how these constructs can be used. First, a simple test for string presence (which is often preferred even when checking XML, as it's less likely to break on schema changes; these usually count as noise in regression testing). Also note how we have escaped embedded XML fragments; an alternative to this shown below is making the code a CDATA section:

    <regTest title="Info page looks ok" url="siap/info">
      <code>
        self.assertHasStrings("SIAP Query", "siap.xml", "form",
          "Other services", "SIZE&lt;/td>", "Verb. Level")
      </code>
    </regTest>
The next is a test with a "rooted" URL that's spanning lines, has embedded parameters (not recommended), plus an assertion on binary data:
    <regTest title="NV Maidanak product delivery"
        url="/getproduct/maidanak/data/Q2237p0305/Johnson_R/
          red_kk050001.fits.gz?siap=true">
      <code>
        self.assertHasStrings('\\x1f\\x8b\\x08\\x08')
      </code>
    </regTest>
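The four bytes asserted in this test are the start of a gzip header: the magic number 1f 8b, the compression method 08 (deflate), and a flags byte 08, which says that the original file name is stored in the header. In the RD, the backslashes are doubled because of macro expansion. The following sketch reproduces such a header with Python's gzip module; the payload is made up, and the file name is only there so that the file name flag gets set:

```python
import gzip
import io

# Write a gzip stream into memory, giving it an original file name so
# the header's FNAME flag (0x08) is set.
buf = io.BytesIO()
with gzip.GzipFile(
        filename="red_kk050001.fits", mode="wb", fileobj=buf) as f:
    f.write(b"SIMPLE  =                    T")

header = buf.getvalue()[:4]
```

A gzip stream written without a stored file name would have 00 as its fourth byte, so the assertion in the test above is specific to files compressed with their names recorded.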
This is how parameters should be passed into the request:
    <regTest title="NV Maidanak SIAP returns accref.">
      <url POS="340.12,3.3586" SIZE="0.1" INTERSECT="OVERLAPS"
        _TDENC="True" _DBOPTIONS_LIMIT="10">siap/siap.xml</url>
      <code>
        self.assertHasStrings('&lt;TD>AZT 22')
      </code>
    </regTest>
Here's an example for a test with URL parameters and xpath assertions:
    <regTest title="NV Maidanak SIAP metadata query"
        url="siap/siap.xml?FORMAT=METADATA">
      <code>
        self.assertXpath("//v1:FIELD[@name='wcs_cdmatrix']", {
          "datatype": "double",
          "ucd": "VOX:WCS_CDMatrix",
          "arraysize": "*",
          "unit": "deg/pix"})
        self.assertXpath("//v1:INFO[@name='QUERY_STATUS']", {
          "value": "OK",
          None: "OK",})
        self.assertXpath("//v1:PARAM[@name='INPUT:POS']", {
          "datatype": "char",
          "ucd": "pos.eq",
          "unit": "deg"})
      </code>
    </regTest>
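To make the semantics of such xpath assertions more tangible (exactly one match, attributes compared for equality, the key None standing for the element content), here is a rough sketch using the standard library's ElementTree rather than lxml. The function assert_xpath and the sample document are made up for the illustration; ElementTree also needs a leading ".//" where lxml accepts "//":

```python
import xml.etree.ElementTree as ET

# Prefix map as described for assertXpath; only v1 is needed here.
NAMESPACES = {"v1": "http://www.ivoa.net/xml/VOTable/v1.1"}


def assert_xpath(tree, path, assertions):
    """Check that path matches exactly one element with the given values.

    assertions maps attribute names to expected values; the key None
    stands for the element's text content.
    """
    matches = tree.findall(path, NAMESPACES)
    if len(matches) != 1:
        raise AssertionError(
            "%s matched %d elements" % (path, len(matches)))
    element = matches[0]
    for key, expected in assertions.items():
        found = element.text if key is None else element.get(key)
        if found != expected:
            raise AssertionError(
                "%s: expected %r, found %r" % (key, expected, found))


doc = ET.fromstring(
    '<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.1">'
    '<INFO name="QUERY_STATUS" value="OK">OK</INFO>'
    '</VOTABLE>')

# Passes silently: one match, attribute and content as expected.
assert_xpath(doc, ".//v1:INFO[@name='QUERY_STATUS']",
    {"value": "OK", None: "OK"})

# A wrong expectation is reported as an AssertionError.
try:
    assert_xpath(doc, ".//v1:INFO[@name='QUERY_STATUS']",
        {"value": "ERROR"})
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False
```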
The following is a fairly complex example for a stateful suite doing inline uploads (and simple tests):
    <regSuite title="GAVO roster publication cycle" sequential="True">
      <regTest title="Complete record yields some credible output">
        <url httpAuthKey="gvo" parSet="form" httpMethod="POST">
          <httpUpload name="inFile" fileName="testing_ignore.rd"
            ><![CDATA[
              <resource schema="gvo">
                <meta name="description">x</meta>
                <meta name="title">A test service</meta>
                <meta name="creationDate">2010-04-26T11:45:00</meta>
                <meta name="subject">Testing</meta>
                <meta name="referenceURL">http://foo.bar</meta>
                <nullCore id="null"/>
                <service id="run" core="null" allowed="external">
                  <meta name="shortName">u</meta>
                  <publish render="external" sets="gavo">
                    <meta name="accessURL">http://foo/bar</meta>
                  </publish></service></resource>
            ]]></httpUpload>upload/form</url>
        <code><![CDATA[
          self.assertHasStrings("#Published</th><td>1</td>")
        ]]></code>
      </regTest>

      <regTest title="Publication leaves traces on GAVO list"
          url="list/custom">
        <code>
          self.assertHasStrings(
            '"/gvo/data/testing_ignore/run/external">A test service')
        </code>
      </regTest>

      <regTest title="Unpublication yields some credible output">
        <url httpAuthKey="gvo" parSet="form" httpMethod="POST">
          <httpUpload name="inFile" fileName="testing_ignore.rd"
            ><![CDATA[
              <resource schema="gvo">
                <meta name="description">x</meta>
                <meta name="title">A test service</meta>
                <meta name="creationDate">2010-04-26T11:45:00</meta>
                <meta name="subject">Testing</meta>
                <meta name="referenceURL">http://foo.bar</meta>
                <service id="run" allowed="external">
                  <nullCore/>
                  <meta name="shortName">u</meta></service></resource>
            ]]></httpUpload>upload/form</url>
        <code><![CDATA[
          self.assertHasStrings("#Published</th><td>0</td>")
        ]]></code>
      </regTest>

      <regTest title="Unpublication leaves traces on GAVO list"
          url="list/custom">
        <code>
          self.assertLacksStrings(
            '"/gvo/data/testing_ignore/run/external">A test service')
        </code>
      </regTest>
    </regSuite>
If you still run SOAP services, here's one way to test them:
    <regTest id="soaptest" title="APFS SOAP returns something reasonable">
      <url postPayload="res/soapRequest.regtest" httpMethod="POST">
        <httpHeader key="SOAPAction">'"useService"'</httpHeader>
        <httpHeader key="content-type">text/xml</httpHeader
        >qall/soap/go</url>
      <code>
        self.assertHasStrings(
          '="xsd:date">2008-02-03Z&lt;/tns:isodate>',
          '&lt;tns:raCio xsi:type="xsd:double">25.35')
      </code>
    </regTest>
– here, res/soapRequest.regtest would contain the request body that you could, for example, extract from a tcpdump log.
[Datalink] is an IVOA protocol that allows associating various products and artifacts with a data set id. Think of the association of error or mask maps, progenitor datasets, or processed data products, with a data set.
It also lets you associate data processing services with datasets, which allows on-the-fly generation of cutouts, format conversions or recalibrations; a particular set of parameters for working with certain kinds of cubes is described in a standard called [SODA] (Server-side Operations for Data Access). Hence, we sometimes call the processing part of datalink SODA.
In DaCHS, Datalink is implemented by the dlmeta renderer, SODA by the dlget renderer. In all but fairly exotic cases, both renderers are used on the same service. While in DaCHS, you cannot use SODA without Datalink, there are perfectly sensible datalink services without SODA. In the following, we first treat the generation of “normal” datalinks and discuss processing services later.
A central term for datalink is the pubDID, or publisher DID. This is an identifier assigned (essentially) by you that points to a concrete dataset. In DaCHS, datalink services always use pubDIDs as the values of the datalink ID parameter.
Unless you arrange things differently (for which you should have good reasons), the pubDIDs used by DaCHS are formed as:
<authority>/~?<accref>
where the accref usually is the inputsDir-relative path to the file. If you use datalinks of that form, you should at some point run dachs pub //products; this will register the products deliverer as <authority>/~, which means that pubDIDs of this form are compliant with [IVOA Identifiers].
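As a small illustration of this pattern, the following sketch builds and takes apart such identifiers by plain string handling. The two helper functions are hypothetical (DaCHS has its own means for this), and the sketch assumes that the identifier carries the ivo:// scheme in front of the authority and that the accref needs no extra quoting:

```python
def make_pubdid(authority, accref):
    """Build a pubDID of the form ivo://<authority>/~?<accref>.

    Hypothetical helper; assumes accref contains no characters that
    would need escaping.
    """
    return "ivo://%s/~?%s" % (authority, accref)


def accref_from_pubdid(pubdid, authority):
    """Return the accref part of a pubDID built by make_pubdid."""
    prefix = "ivo://%s/~?" % authority
    if not pubdid.startswith(prefix):
        raise ValueError(
            "%s is not a pubDID for authority %s" % (pubdid, authority))
    return pubdid[len(prefix):]


pubdid = make_pubdid("org.gavo.dc", "mlqso/data/slits/FBQ0951_data.fits")
accref = accref_from_pubdid(pubdid, "org.gavo.dc")
```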
When developing datalink services, it sometimes is useful to access datalink services directly, in particular because they don't usually have a useful web interface. Armed with the knowledge about the structure of DaCHS standard PubDIDs, you can easily build the URLs and parameters. For instance, to retrieve the datalink document for mlqso/data/FBQ0951_data.fits on the server dc.g-vo.org using the datalink renderer on the mlqso/q/d service, you'd write:
    curl -FID=ivo://org.gavo.dc/~?mlqso/data/slits/FBQ0951_data.fits \
      http://dc.g-vo.org/mlqso/q/d/dlmeta | xmlstarlet fo
(of course, xmlstarlet isn't actually necessary, and you can use wget if you want, but you get the idea). Going on, you could pull out what parameters are mentioned somewhat like this:
    curl -s -FID=ivo://org.gavo.dc/~?mlqso/data/slits/FBQ0951_data.fits \
      http://dc.g-vo.org/mlqso/q/d/dlmeta | \
      xmlstarlet sel -N v=http://www.ivoa.net/xml/VOTable/v1.3 -T \
      -t -m "//v:PARAM" -v "@name" -nl
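If you prefer Python over xmlstarlet for this kind of inspection, the selection step might look like the following sketch. It works on a document already retrieved; the function param_names and the sample document with its parameter names are invented for the illustration and do not reproduce what the mlqso service actually returns:

```python
import xml.etree.ElementTree as ET

# Same namespace binding as in the xmlstarlet call above.
VOTABLE_NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}


def param_names(votable_source):
    """Return the names of all PARAM elements in a VOTable 1.3 document."""
    root = ET.fromstring(votable_source)
    return [param.get("name")
        for param in root.iterfind(".//v:PARAM", VOTABLE_NS)]


# Made-up fragment of a datalink response declaring two parameters.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
    version="1.3">
  <RESOURCE type="meta" utype="adhoc:service">
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value=""/>
      <PARAM name="BAND" datatype="double" arraysize="2" value=""/>
    </GROUP>
  </RESOURCE>
</VOTABLE>"""

names = param_names(SAMPLE)
```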
In the remainder of this section, we first discuss the generation of datalinks and processing services “by example”, which should do for a basic use of the facilities. We continue with a somewhat more in-depth look at the processing of a SODA request, after which we look more closely at the various elements that make up Datalink/SODA services.
You generally declare datalink services on the table(s) that contain the identifiers the datalink service accepts. For that, you include two pieces of metadata: The identifier of the datalink service (which can be a cross-RD id with a hash; use the _associatedDatalinkService.serviceId meta key) and the column name within the table (use the _associatedDatalinkService.idColumn meta key). Both items will only be checked at run time, and broken links will be reported as warnings. If the following doesn't give you the datalink resources in results involving the tables, be sure to check the dcInfos log file.
The following example is a table that contains two sorts of identifiers that are understood by two different datalink services; one, dlsvc within the same RD, works on values in the accref column, the other, taken from a (hypothetical) doires/q RD, would work on the doi column:
  <table id="datasets" onDisk="True">
    <meta name="_associatedDatalinkService">
      <meta name="serviceId">dlsvc</meta>
      <meta name="idColumn">accref</meta>
    </meta>
    <meta name="_associatedDatalinkService">
      <meta name="serviceId">doires/q#doidl</meta>
      <meta name="idColumn">doi</meta>
    </meta>
    <column name="accref" type="text".../>
    <column name="doi" type="text".../>
  </table>

  <service id="dlsvc" allowed="dlmeta,dlget">
    <meta name="dlget.description">A service for slicing and dicing.</meta>
    ...
  </service>
Note that forward references, which are generally not allowed in DaCHS, are possible in serviceId and idColumn.
An older way to associate datalink services with tables is to give certain services (most notably, SSA ones) a datalink property. This is deprecated now. If you see it in examples, please tell us so we can fix it.
Sometimes it makes sense to directly have datalinks in table columns. To help clients notice that that is what they are, declare a target type by adding:
  <property name="targetType">application/x-votable+xml;content=datalink</property>
  <property name="targetTitle">Datalink</property>
into the element (column or outputField) content. In VOTables, this will be turned into LINK elements.
A dataset frequently has associated data, like error or weight maps, derived data, or pieces of provenance. Datalink lets you tie these together algorithmically, using a specialised core (see element DatalinkCore) and the dlmeta renderer.
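To produce datalinks, the datalink core must be furnished with a descriptor generator (see element descriptorGenerator; a default is used if you give none) and one or more meta makers (see element metaMaker).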
Here is an example, adapted from boydende/q:
  <datalinkCore>
    <descriptorGenerator procDef="//soda#fits_genDesc"/>
    <metaMaker semantics="#isMetadataFor">
      <code>
        basename = descriptor.accref.split("/")[-1].split(".")[0]
        envPath = "data/static/envelopes/{0}.jpg".format(basename)
        yield descriptor.makeLinkFromFile(
          envPath,
          description="Scan of the plate envelope")
      </code>
    </metaMaker>
  </datalinkCore>
A descriptor generator – in the example, one that has additional functionality for FITS files, although the default (//soda#fromStandardPubDID) would work here, too – is passed the pubDID and returns an instance of datalink.ProductDescriptor (or a derived class). If a descriptor generator returns None, the datalink request will be rejected with a 404.
Whatever is returned by the descriptor generator is then available as descriptor to the remaining datalink procs (in this case, the meta makers). The columns of the product table (see dc.products) are available as attributes of this object. In addition, subclasses of datalink.ProductDescriptor may add more attributes; the fits_genDesc used in the example, for instance, provides a hdr attribute containing the primary header as given by pyfits.
The descriptor is then passed, in turn, to all meta makers given. These must yield LinkDef instances that describe additional data products; a single meta maker may yield zero or more of these. You generally should not construct LinkDefs yourself, as there are convenience methods doing that on the descriptor which prevent some common errors.
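To make the control flow concrete, here is a toy model in plain Python. It is a sketch only: ToyDescriptor, ToyLink and collect_links are hypothetical stand-ins invented for this illustration, and DaCHS' real descriptor and LinkDef classes carry much more information. It shows one meta maker that always yields a link and one that may yield none.

```python
from collections import namedtuple

# Hypothetical stand-ins for the descriptor and for link definitions.
ToyDescriptor = namedtuple("ToyDescriptor", ["pubDID", "accref"])
ToyLink = namedtuple("ToyLink", ["pubDID", "url", "semantics", "description"])


def envelope_maker(descriptor):
    """Yield one link, named after the basename of the product."""
    basename = descriptor.accref.split("/")[-1].split(".")[0]
    yield ToyLink(
        descriptor.pubDID,
        "data/static/envelopes/{}.jpg".format(basename),
        "#isMetadataFor",
        "Scan of the plate envelope")


def error_map_maker(descriptor):
    """Yield zero or one links, depending on the product."""
    if descriptor.accref.endswith(".fits"):
        yield ToyLink(
            descriptor.pubDID,
            descriptor.accref[:-5] + ".err.fits",
            "#error",
            "Errors for this dataset")


def collect_links(descriptor, meta_makers):
    """Pass the descriptor to each meta maker in turn and gather the results."""
    links = []
    for maker in meta_makers:
        links.extend(maker(descriptor))
    return links


desc = ToyDescriptor(
    "ivo://example.org/~?plates/data/p1.fits", "plates/data/p1.fits")
for link in collect_links(desc, [envelope_maker, error_map_maker]):
    print(link.semantics, link.url)
```

Writing meta makers as generators is what lets a single one contribute any number of rows, including none.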
These methods are descriptor.makeLink (for when you have external links) and descriptor.makeLinkFromFile (for when you link to files published through a static renderer by DaCHS itself). These take some common arguments:
Except for semantics, all of these are optional and must be passed in as keyword arguments.
makeLinkFromFile additionally accepts a positional argument containing a local file name; makeLink instead takes a URL as a string.
When returning link definitions, the tricky part mostly is to come up with the URLs. Use the makeAbsoluteURL rowmaker function to make them from relative URLs; the rest just depends on your URL scheme. An example could look like this:
  <metaMaker semantics="#error">
    <code>
      yield descriptor.makeLink(
        makeAbsoluteURL("get/"+descriptor.accref[:-5]+".err.fits"),
        contentType="image/fits",
        description="Errors for this dataset")
    </code>
  </metaMaker>

  <metaMaker semantics="#progenitor">
    <code>
      yield descriptor.makeLink(
        "http://foo.bar/raw/"+descriptor.accref.split("/")[-1],
        contentType="image/fits",
        description="Un-flatfielded, uncalibrated source data")
    </code>
  </metaMaker>
makeLinkFromFile will create NotFoundFault error links if the file does not exist, thus alerting the user (and possibly you) that an expected file was not there. When missing files are to be expected and should not cause diagnostics, pass suppressMissing=True to makeLinkFromFile.
To make this work, DaCHS will have to know how the file can be accessed from the web to be able to produce the link. The recommended pattern is shown in the example: the datalink service itself is used to deliver the static, non-product files. This is effected by declaring the service embedding the core somewhat like this:
  <service id="dl" allowed="dlget,dlmeta,static">
    <property name="staticData">data/static</property>
    <datalinkCore .../>
  </service>
Note that, of course, exposing a directory via the static renderer like this bypasses any access restrictions (e.g., embargoes) on the respective data. So, do not do this with your primary data if you want to enforce access control. Also, there currently is no way to control the media types returned in this way except by editing the system mime.types information. Let us know if that is a problem for you.
A LinkDef for the product itself (semantics #this) and, if defined in the product table, a preview (semantics #preview) is automatically added by DaCHS unless a suppressAutoLinks attribute is set on the descriptor (you can set that in a meta maker or the descriptor generator).
In DaCHS, data processing services (“SODA services”) use the same datalink cores as the datalink services, and they share the same descriptor. A datalink core does data processing when used by the dlget renderer.
To enable data processing, datalink cores additionally need data functions (see element dataFunction) and up to one data formatter (see element dataFormatter). The first data function must add a data attribute to the descriptor and thus plays a somewhat special role.
Processing services also use meta makers, but instead of links, these yield parameter definitions in the form of InputKeys (they are used by the datalink services, too, because the datalink documents contain the metadata of the processing services). So, typically, a given piece of SODA functionality comes as a pair of a meta maker and a data function, which then normally are combined in a STREAM (cf. Datalink-related Streams).
Processing services usually are a good deal more stereotypical than metadata generation; it is actually beneficial if different services have identical behaviour to facilitate the creation of interoperable clients. SODA itself essentially enumerates what in DaCHS are pre-defined meta makers and data functions. So, most of the time data processing will just re-use STREAMs and procDefs from the //soda RD.
The two most common cases are cutouts over FITS cubes and over spectra.
Processing services are referenced from the links table. In DaCHS, the description column in the links table is set from the service's description meta. This falls back to the resource's description meta, which is almost never what you want. So, make sure you include something reasonably concise into the service element like this:
  <meta name="description">Slicing and dicing the images from the wondrous survey</meta>
Datalink services identify themselves as supporting some standard. Whenever DaCHS sees a dlget, it will declare the service a SODA service; this is harmless as long as you don't define SODA parameters that do something different from what SODA says they should do. Still, if you have to, you can override the standardID meta to declare support of a different standard, or write:
<meta name="standardID"/>
to entirely suppress the declaration of a standard identifier.
In the first case (cutouts over FITS cubes), the core would look like this piece extracted from the dl service in califa/q3:
  <datalinkCore>
    <descriptorGenerator procDef="//soda#fits_genDesc" name="genFITSDesc">
      <bind key="accrefPrefix">'califa/datadr3'</bind>
      <bind key="descClass">DLFITSProductDescriptor</bind>
    </descriptorGenerator>
    <FEED source="//soda#fits_standardDLFuncs" spectralAxis="3"/>
  </datalinkCore>
Here, we use the //soda#fits_genDesc descriptor generator with a DLFITSProductDescriptor because CALIFA DR3 stores datalink URLs rather than actual file paths in the product table. You would leave the descClass parameter out when your products are the FITS files themselves.
Giving an accrefPrefix to anything using the product table to get accrefs (//soda#fromStandardPubDID is another example of these) usually is a good idea. If you don't give it, users can apply the datalink service to any dataset you publish, which might lead to information leaks and hard-to-understand error messages on the user side. accrefPrefix is simply a string that the accref of the product being processed must start with. Since in the usual setup, the accref is the inputsDir-relative path of the file, you're usually fine if you just give the path to the directory containing the products in question.
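The effect of such a restriction can be pictured with a few lines of Python. This is a hedged sketch of the idea, not DaCHS' actual code; accepts_accref is a hypothetical name, and the real check happens inside the descriptor generators.

```python
def accepts_accref(accref, accref_prefix=None):
    """Return True if a service restricted to accref_prefix would handle accref.

    Without a prefix, every published dataset is accepted; that is the
    situation the text above warns against.
    """
    if accref_prefix is None:
        return True
    return accref.startswith(accref_prefix)


print(accepts_accref("califa/datadr3/V500/cube.fits", "califa/datadr3"))
print(accepts_accref("othersurvey/data/image.fits", "califa/datadr3"))
```

Since the comparison is a plain string prefix match, a prefix ending in a slash (as in "dasch/q/") keeps sibling directories with similar names out.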
The //soda#fits_standardDLFuncs STREAM arranges for all general FITS processing functions to be pulled in; these encompass the SODA parameters where applicable (at the time of this writing, there is no support for TIME and POL yet, but if you have such data, we'll be glad to add it), and some additional ones.
If you need extended functionality, it is a good idea to start from this STREAM. Copy it from dachs adm dumpDF //soda and hack from there.
The other very common sort of SODA-like processing is for spectra. A sketch for these from the sdl service in flashheros/q:
  <datalinkCore>
    <descriptorGenerator procDef="//soda#sdm_genDesc">
      <bind key="ssaTD">"\rdId#data"</bind>
    </descriptorGenerator>
    <dataFunction procDef="//soda#sdm_genData">
      <bind key="builder">"\rdId#build_sdm_data"</bind>
    </dataFunction>
    <FEED source="//soda#sdm_plainfluxcalib"/>
    <FEED source="//soda#sdm_cutout"/>
    <FEED source="//soda#sdm_format"/>
  </datalinkCore>
Here, the descriptor generator will in general be //soda#sdm_genDesc. It builds a special descriptor that contains the full metadata from an associated SSA row, which is why you need to give the id of the SSA table in the ssaTD parameter. Since pubDIDs will only be resolved within this table, no accrefPrefix is necessary or supported.
The first data function for spectra usually will be //soda#sdm_genData. This will read the entire spectrum into memory using a data item, the id of which is given in the builder parameter. This has to build an SDM-compliant spectrum. Some examples of how to do this can be found in cdfspect/q.rd (reading from half-broken FITS files), c8spect/q.rd (which shows how to create spectra that don't exist on disk as files), pcslg/q.rd (which nicely uses WCSAxis for parsing spectra that come as “IRAF-style” 1D arrays), or theossa/q.rd (which pulls the source files from a remote server and caches them). For more on generating SDM-compliant spectra, see SDM compliant tables.
For large spectra, reading the spectrum in its entirety may incur a significant CPU cost. When that becomes a problem for you, you'll need to write different data functions, perhaps only parsing a header, and implement, e.g., cutouts directly in a subsequent data function.
The two next STREAMs pulled in are just combinations of data functions and meta makers, one for optionally re-calibrating the spectrum (right now, only maximum normalisation is supported), the other for providing a SODA-like cutout.
Finally, //soda#sdm_format pulls in a meta maker defining a FORMAT parameter (letting people order several formats including VOTable, FITS binary table, and CSV) and a formatter that interprets it.
If you yield InputKeys from meta makers, all of them will end up in a single processing service, and all data functions will contribute to that same processing service.
Sometimes, however, you want to have two different processing services in a single datalink document. In that case, define a second datalink service in DaCHS, usually with only a dlget renderer; that way, it will always be clear where the actual datalink information for data sets belonging to some collection can be obtained from.
You can then yield a ProcLinkDef object (constructed with the current pubDID and the second datalink service) from a meta maker in the main datalink service. Make sure that there is a description meta in the dlget-only datalink service so users have a chance to figure out why it is there.
Since this is slightly subtle, here is a sketch of how this works:
  <service id="dl" allowed="dlmeta,dlget">
    <meta name="description">Data Collection's datalink service</meta>
    <datalinkCore>
      <descriptorGenerator id="gen" procDef="//datalink#fromtable">
        <bind key="tableName">"my.table"</bind>
        <bind key="idColumn">"prikey"</bind>
      </descriptorGenerator>

      <metaMaker>
        <code>
          # that's an argument for the built-in dlget service
          yield MS(InputKey, name="arg1", type="text",
            description="First service's argument 1")
        </code>
      </metaMaker>

      <metaMaker>
        <code>
          # This links the second datalink service
          yield ProcLinkDef(descriptor.pubDID, rd.getById("dl2"))
        </code>
      </metaMaker>

      <!-- now give the main service its functionality -->
      <dataFunction .../>
    </datalinkCore>
  </service>

  <service id="dl2" allowed="dlget">
    <meta name="description">Mogrification of the shlabudl</meta>
    <meta name="standardID"/>  <!-- stresses that this is higher magic -->
    <datalinkCore>
      <descriptorGenerator original="gen"/>
      <metaMaker>
        <code>
          yield MS(InputKey, name="mogrification_level", type="real",
            description="Second service's argument")
        </code>
      </metaMaker>

      <!-- and again the meat of the service: -->
      <dataFunction .../>
    </datalinkCore>
  </service>
You can also wrap DaCHS-external services into Datalink descriptors. Do not do this to just pass an identifier to some third-party service. If all you pass is a single parameter with a fixed value, what you really have is a link, and it is much better to define it using a LinkDef, which has semantics and a proper description, both of which processing services lack.
Instead, this is intended for when there are actually multiple parameters going into the external service, or for when there is a relevant service implementing an IVOA standard, where clients can pick up pre-configured arguments.
To define such a service, yield an ExternalProcLinkDef instance from a meta maker. Construct it with:
Say you have a cutout service that accepts equatorial limits in RA_MIN/MAX and DEC_MIN/MAX and figures out the dataset to work on from DATASET_ID. Then you could write a meta maker that makes this service available to datalink clients like this:
  <metaMaker>
    <code>
      footprint = descriptor.skyWCS.calcFootprint(descriptor.hdr)
      ra_range = MS(Values,
        min=min(footprint[:,0]), max=max(footprint[:,0]))
      dec_range = MS(Values,
        min=min(footprint[:,1]), max=max(footprint[:,1]))

      yield ExternalProcLinkDef(
        descriptor.pubDID,
        [
          MS(InputKey, name="DATASET_ID", type="text",
            ucd="meta.id;meta.main",
            description="Dataset to operate on",
            content_=descriptor.pubDID),
          MS(InputKey, name="RA_MIN", unit="deg",
            ucd="pos.eq.ra;stat.min", values=ra_range),
          MS(InputKey, name="RA_MAX", unit="deg",
            ucd="pos.eq.ra;stat.max", values=ra_range),
          MS(InputKey, name="DEC_MIN", unit="deg",
            ucd="pos.eq.dec;stat.min", values=dec_range),
          MS(InputKey, name="DEC_MAX", unit="deg",
            ucd="pos.eq.dec;stat.max", values=dec_range)],
        "http://example.org/cgi-bin/cutout.pl",
        "Cutout",
        "External service doing a cutout on this dataset")
    </code>
  </metaMaker>
This assumes you have a descriptor generator derived from //soda#fits_genDesc in your datalink core, which gives you the hdr and skyWCS attributes on your descriptor. If you have some other way to figure out the coverage of the image, that is of course fine, too. But please make sure that you do give the legal values on the parameters as shown here, and please take your time to assign units and UCDs, too. For parameters that are not as obvious as the coordinate limits here, please provide a description, too.
Also note how DATASET_ID is passed in here: The value of the content_ constructor argument is used as a constant value to this parameter by datalink clients, so this is how you make the client select the right dataset. Mangling the pubDID into whatever id the external service actually understands is left as an exercise to the reader.
As a somewhat more exotic application, consider this meta maker:
  <metaMaker>
    <code>
      table_name = descriptor.pubDID.split('?')[-1]
      yield ExternalProcLinkDef(
        descriptor.pubDID,
        [
          MS(InputKey, name="QUERY", type="text",
            content_=f"SELECT * FROM {table_name}"),
          MS(InputKey, name="LANG", type="text",
            content_="ADQL")],
        "http://voparis-tap-maser.obspm.fr/tap",
        "Data via TAP",
        "A TAP service with an alternative representation of the data",
        standardId="ivo://ivoa.net/std/tap")
    </code>
  </metaMaker>
While this does not really work at this point, our hope in March 2024 is that clients will interpret this specification as “open a TAP UI for the given TAP base URL, pre-filled with the given TAP query”. If you want this to do something at this point, append /sync to the access URL; that way, you will execute the TAP query (but you should then probably not define the standardID).
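To illustrate what a client would end up requesting in the sync variant, here is a small Python sketch. It assumes the client simply sends the constant QUERY and LANG parameters to the access URL with sync appended; make_sync_url is a hypothetical helper, example.org stands in for a real server, and a given TAP service may expect additional parameters.

```python
from urllib.parse import urlencode


def make_sync_url(tap_base_url, pubdid):
    """Return a synchronous TAP query URL selecting the table named in pubdid."""
    # As in the meta maker above, the table name is whatever follows the "?"
    table_name = pubdid.split("?")[-1]
    params = {
        "QUERY": "SELECT * FROM {}".format(table_name),
        "LANG": "ADQL",
    }
    return tap_base_url.rstrip("/") + "/sync?" + urlencode(params)


print(make_sync_url(
    "http://example.org/tap", "ivo://example.org/~?myschema.mytable"))
```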
This section contains an overview of how data processing services are built and executed. You should read it if you want to write data processing functions; for just using them, don't bother.
When a request for processed data comes in, the descriptor generator is used to make a product descriptor, and the input keys are adapted to the concrete dataset. This means that, contrary to normal DaCHS services, services with a Datalink core have a variable interface; in particular, the interface on the dlmeta renderer (essentially, just ID) is very different from the one on the dlget renderer (ID plus whatever the meta makers produce).
The input keys so produced are used to build a context grammar that parses the request. If this succeeds, the descriptor is passed to the initial data function together with the arguments parsed. This must set the data attribute of the descriptor or raise a ValidationError on the ID parameter; leaving data as None results in a 500 server error. descriptor.data could be an rsc.InMemoryTable (e.g., in SDM processing) or a products.Products instance, but as long as the other data functions and the formatter agree on what it is, anything goes.
The remaining data functions can change the data in place or potentially replace descriptor.data. When writing code, be aware, though, that a data function should only do something when the corresponding parameter has actually been used. When you change descriptor.data fundamentally, you'll probably make the lives of further data functions and the formatter a good deal harder.
Finally, the data enters the formatter, which actually generates the output, usually returning a pair of mime type and string to be delivered.
It is a design decision of the service creator which manipulations are done in the initial data function, which in later filters, and which perhaps only in the formatter. The advantage of filters is that they are more flexible and can more easily be reused, while doing things in the data generator (i.e., the initial data function) itself will usually be more efficient, sometimes much so (e.g., sums being computed within a database rather than in a filter after all the data had to go through the interface of the database).
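The sequence described in this section can be condensed into a toy model. The following Python sketch is an illustration under simplifying assumptions, not DaCHS' implementation: all names are hypothetical, the request arguments are taken as already parsed (so the context grammar step is left out), and the data is a plain list of dicts.

```python
class ToyDescriptor:
    """Hypothetical minimal descriptor: an identifier plus a data slot."""

    def __init__(self, pubdid):
        self.pubDID = pubdid
        self.data = None


def make_data(descriptor, args):
    """Initial data function: it must fill in descriptor.data."""
    descriptor.data = [{"pix": i, "val": i * i} for i in range(10)]


def cut_pixels(descriptor, args):
    """Filter: only act when the PIX parameter was actually given."""
    if args.get("PIX") is None:
        return
    low, high = args["PIX"]
    descriptor.data = [
        row for row in descriptor.data if low <= row["pix"] <= high]


def format_csv(descriptor, args):
    """Formatter: return a pair of media type and payload."""
    lines = ["pix,val"]
    lines.extend("{pix},{val}".format(**row) for row in descriptor.data)
    return "text/csv", "\n".join(lines) + "\n"


def run_processing(pubdid, args, generate_descriptor, data_functions, formatter):
    """Run descriptor generation, data functions and formatter in order."""
    descriptor = generate_descriptor(pubdid)
    initial, *filters = data_functions
    initial(descriptor, args)
    if descriptor.data is None:
        raise RuntimeError("initial data function did not set data")
    for func in filters:
        func(descriptor, args)
    return formatter(descriptor, args)


mime, payload = run_processing(
    "ivo://example.org/~?toy", {"PIX": (2, 4)},
    ToyDescriptor, [make_data, cut_pixels], format_csv)
print(mime)
print(payload)
```

Note how the filter returns without touching the data when its parameter is absent; that is the convention asked for above.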
Descriptor generators (see element descriptorGenerator) are procedure applications that, roughly, see a pubDID value and are expected to return a datalink.ProductDescriptor instance, or something derived from it.
In the end, this usually boils down to figuring out the value of accref in the product table and using what's there to construct the descriptor. In the simplest case, the pubDID will be in DaCHS' “standard” format (see the getStandardPubDID rowmaker function or the macro standardPubDID), in which case the default descriptor generator works and you don't have to specify anything. You could manually insert that default by saying:
<descriptorGenerator procDef="//soda#fromStandardPubDID"/>
This happens to be DaCHS' default if no descriptor generator is given, but as said above that is suboptimal as no accrefPrefix constrains what the service will run on.
The default ProductDescriptor class exposes as attributes all the columns from the products table. See dc.products for their names and descriptions.
If you need extra attributes (typically, data pulled from file headers or specialised database tables), define an addExtras(descriptor) function (since version 2.9.4) in the setup of fromStandardPubDID descriptor generators. For instance, to pull extra information from a data collection-specific table, you can write your descriptor generator like this:
  <descriptorGenerator procDef="//soda#fromStandardPubDID">
    <bind name="accrefPrefix">"dasch/q/"</bind>
    <bind name="contentQualifier">"image"</bind>
    <setup>
      <code>
        def addExtras(descriptor):
          descriptor.suppressAutoLinks = True
          accref = descriptor.pubDID.split("?")[-1]
          with base.getTableConn() as conn:
            descriptor.extMeta = next(conn.query(
              "SELECT * FROM dasch.plates"
              " WHERE accref = %(accref)s",
              {"accref": accref}))
      </code>
    </setup>
  </descriptorGenerator>
Note that the other pre-defined descriptor generators do not look for addExtras yet; let us know if that would be useful for you.
A slightly more interesting example is provided by datalink for SSA, where cutouts and the like are generated from spectra. The actual definition is in //soda#sdm_genDesc, but the gist of it is:
  <procDef type="descriptorGenerator" id="sdm_genDesc">
    <setup imports="gavo.api,gavo.protocols.ssap">
      <par key="ssaTD" description="Full reference (like path/rdname#id) to the SSA table the spectrum's PubDID can be found in."/>
      <par key="descriptorClass" description="The SSA descriptor class to use. You'll need to override this if the dc.products path doesn't actually lead to the file (see `custom generators <#custom-product-descriptor-generators>`_)." late="True">ssap.SSADescriptor</par>
      <code>
        ssaTD = api.resolveCrossId(ssaTD, api.TableDef)
      </code>
    </setup>

    <code>
      with api.getTableConn() as conn:
        ssaTable = api.TableForDef(ssaTD, connection=conn)
        matchingRows = list(ssaTable.iterQuery(ssaTable.tableDef,
          "ssa_pubdid=%(pubdid)s", {"pubdid": pubDID}))
        if not matchingRows:
          return DatalinkFault.NotFoundFault(pubDID,
            "No spectrum with this pubDID known here")

        # the relevant metadata for all rows with the same PubDID should
        # be identical, and hence we can blindly take the first result.
        return descriptorClass.fromSSARow(matchingRows[0],
          ssaTable.getParamDict())
    </code>
  </procDef>
Here, we use ssap.SSADescriptor, derived from ProductDescriptor, rather than monkeypatching the extra ssaRow attribute the former provides; being explicit here may help when debugging. As usual, the descriptor generator encodes how to resolve a pubDID to an accref, in this case using an SSA table. If the product table just lists a datalink URL, you will want to override the accessPath this comes up with. See, for instance, pcslg/q for how to do this.
Incidentally, in this case you could stuff the entire code into the main code element, saving on the extra setup element. However, apart from a minor speed benefit, keeping things like function or class definitions in setup allows easier re-use of such definitions in procedure applications and is therefore recommended.
For FITS files, you will usually just use //soda#fits_genDesc, defining the accrefPrefix as discussed in FITS/SODA processing. This will produce datalink.FITSProductDescriptor instances. As in the SSA/SDM case, you may need different descriptor classes in special situations. Since for large FITS files, just delivering datalink files is a fairly compelling proposition, there is actually a predefined descriptor class to use with datalink access paths, DLFITSProductDescriptor; the dl service in califa/q3 shows how to use it.
Sometimes, you want to produce datalinks for tables that do not manage products, most likely because all you have is URLs, but possibly also because there simply are no products in the first place. In that case, you probably want to use the //datalink#fromtable descriptor generator.
To use this, you have to pass the name of the table with the items you want to link from (tableName) and the column to match the identifier against (idColumn). The descriptor generator then does the database query, makes sure exactly one row matched and, if so, puts the result into the metadata attribute of the descriptor.
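The lookup logic can be pictured as follows. This is a self-contained Python sketch under stated assumptions: it uses an in-memory SQLite database purely so it runs anywhere (DaCHS itself talks to PostgreSQL), fetch_unique_row is a hypothetical name, and the real generator reports failures as datalink faults rather than raising LookupError.

```python
import sqlite3


def fetch_unique_row(conn, table_name, id_column, identifier):
    """Return the single row matching identifier as a dict.

    table_name and id_column are taken from configuration and interpolated
    into the query; only the identifier comes from the request, so only it
    is passed as a bound parameter.
    """
    cursor = conn.execute(
        "SELECT * FROM {} WHERE {} = ?".format(table_name, id_column),
        (identifier,))
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    if len(rows) != 1:
        raise LookupError(
            "expected exactly one row for {!r}, got {}".format(
                identifier, len(rows)))
    return dict(zip(names, rows[0]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE epn_core (granule_uid TEXT, thumbnail_url TEXT)")
conn.execute(
    "INSERT INTO epn_core VALUES ('g1', 'http://example.org/g1.jpg')")

metadata = fetch_unique_row(conn, "epn_core", "granule_uid", "g1")
print(metadata["thumbnail_url"])
```

The resulting dictionary plays the role of the metadata attribute that meta makers then read from.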
A simple case is to associate some sort of preview with an EPN-TAP table row:
  <service id="dl" allowed="dlmeta">
    <meta name="title">SoHO EIT Synoptic maps datalink service</meta>
    <datalinkCore>
      <descriptorGenerator procDef="//datalink#fromtable">
        <bind key="tableName">"\schema.epn_core"</bind>
        <bind key="idColumn">"granule_uid"</bind>
      </descriptorGenerator>
      <metaMaker semantics="#preview">
        <code>
          yield descriptor.makeLink(
            descriptor.metadata['thumbnail_url'],
            description="Preview image",
            contentType='image/jpeg')
        </code>
      </metaMaker>
    </datalinkCore>
  </service>
This service will accept any value from the granule_uid column as ID.
If the value in the ID column actually contains IVOA publisher DIDs, you may want to also accept “relative” identifiers, for instance, just dataset_15 instead of ivo://myauthority/~?/data/foo/dataset_15. In that case, bind the prefix to the descriptor generator's didPrefix parameter, like this:
  <descriptorGenerator procDef="//datalink#fromtable">
    <bind key="tableName">"\schema.myssa"</bind>
    <bind key="idColumn">"ssa_pubDID"</bind>
    <bind key="didPrefix">"ivo://myauthority/~?/data/foo/"</bind>
  </descriptorGenerator>
(but that's really only convenience).
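One plausible way to picture the effect of didPrefix is the following Python sketch. How DaCHS actually normalises the identifier is not spelled out here, so treat the rule below (prepend the prefix unless it is already present) as an assumption; expand_identifier is a hypothetical name.

```python
def expand_identifier(identifier, did_prefix):
    """Return the full identifier for a possibly relative one."""
    if identifier.startswith(did_prefix):
        # already a full publisher DID
        return identifier
    return did_prefix + identifier


prefix = "ivo://myauthority/~?/data/foo/"
print(expand_identifier("dataset_15", prefix))
print(expand_identifier("ivo://myauthority/~?/data/foo/dataset_15", prefix))
```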
Since it cannot know about them, the fromtable descriptor does not automatically add the #this and #preview links (i.e., it sets suppressAutoLinks). I personally consider datalink documents without #this as flakey, so if you can, add a #this link manually. In the EPN-TAP case, an obvious choice would be:
  <metaMaker semantics="#this">
    <code>
      yield descriptor.makeLink(
        descriptor.metadata['access_url'],
        description="The full dataset",
        contentType="image/fits")
    </code>
  </metaMaker>
As a last point regarding this non-local use case, if you want to enable the dlget renderer here, there is DataFromURL visible in dataFunction-s. In the simplest case, you can write something like (continuing the EPN-TAP example):
  <dataFunction>
    <code>
      descriptor.data = DataFromURL(
        descriptor.metadata["access_url"])
    </code>
  </dataFunction>
When this gets rendered, the client will be redirected to whatever access_url points to.
The use of meta makers to produce link rows was already discussed in Making Datalinks.
To define a datalink service's processing capabilities, meta makers yield input keys (InputKey instances). The classes usually required to build input keys (InputKey, Values, Option) are available to the code as local names. As usual, DaCHS structs should not be constructed directly but only using the MS helper (which is really an alias for base.makeStruct; it takes care that the special postprocessing of DaCHS structures takes place).
You should make sure that the input keys have proper annotation as regards minima, maxima, or enumerated values; clients, in general, have no way to guess what is sensible here.
The limits can usually be obtained from the descriptor (which, again, is available as descriptor in the meta maker). For instance, the FITS descriptor has a hdr attribute describing the instance that the core operates on, and the SSA descriptor has an attribute ssaRow.
A meta maker that generates an extra cutout parameter for radio astronomers (note that this is of course a bad idea -- unit adaption should be done on the client side) could be:
  <metaMaker>
    <setup imports="gavo.utils.unitconv"/>
    <code>
      # ssa_specstart and ssa_specend are wavelengths; the longest
      # wavelength corresponds to the lowest frequency.
      yield MS(InputKey, name="FREQ", unit="MHz",
        ucd="em.freq",
        description="Spectral cutout interval",
        type="double precision[2]",
        xtype="interval",
        multiplicity="forced-single",
        values=MS(Values,
          min=1e-6*unitconv.LIGHT_C/descriptor.ssaRow["ssa_specend"],
          max=1e-6*unitconv.LIGHT_C/descriptor.ssaRow["ssa_specstart"]))
    </code>
  </metaMaker>
The SODA-compliant version of this is in the //soda#sdm_cutout predefined stream.
The main point here is that you should follow section 4.3 of the [SODA] spec, i.e., use interval-xtyped parameters. Also, unless you're actually prepared to handle multiply-specified parameter values, you should use the forced-single multiplicity, which makes DaCHS reject requests that contain a parameter more than once.
An extra complication occurs when SODA descriptors are generated for DAL responses. Currently, this is only envisaged for SSA. There, the descriptor has an extra limits attribute that gives, for each eligible column, minimum and maximum values or a set of values for enumerated columns.
Similar (if possibly less useful) mechanisms are conceivable for, say, partial obscore results or SIAv1. We suggest keeping limits as the attribute name for this sort of collective characterisation. DaCHS does not implement anything of this kind right now, though.
Both descriptor generators and meta makers can return (or yield, in the case of meta makers) error messages instead of either a descriptor or a link definition. This allows more fine-tuned control over the messages generated than raising an exception.
Error messages are constructed using class functions of DatalinkFault, which is visible to both procedure types. The class function names correspond to the message types defined in the datalink spec and match the semantics given there:
Thus, a descriptor generator could look like this:
  <descriptorGenerator>
    <code>
      with base.getTableConn() as conn:
        matchingRows = list(conn.queryToDicts(
          "select physPath from schema.myTable where pub_did=%(pubDID)s",
          locals()))
      if not matchingRows:
        return DatalinkFault.NotFoundFault(pubDID,
          "No dataset with this pubDID known here")
      return MyCustomDescriptor.fromFile(matchingRows[0]["physPath"])
    </code>
  </descriptorGenerator>
Where sensible, you should pass (as a keyword argument) semantics (as for LinkDefs) to the DatalinkFault's constructor; this would indicate what kind of link you wanted to create.
Data functions (see element dataFunction) generate or manipulate data. They see the descriptor and the arguments (as args), parsed according to the input keys produced by the meta makers, where the descriptor's data attribute is filled out by the first data function called (the “initial data function”).
As described above, DaCHS does not enforce anything on the data attribute other than that it's not None after the first data function has run. It is the RD author's responsibility to make sure that all data functions in a given datalink core agree on what data is.
All code in a request for processed data is also passed the input parameters as processed by the context grammar. Hence, the code can rely on whatever contract is implicit in the context grammar, but not more. In particular, a datalink core has no way of knowing which data function expects which parameters. If no value for a parameter was provided on input, the corresponding value is None, but a data function using it is still called.
An example for a generating data function is //soda#generateProduct, which may be convenient when the manipulations operate on plain local files; it basically looks like this:
  <dataFunction>
    <code>
      descriptor.data = products.getProductForRAccref(descriptor.accref)
    </code>
  </dataFunction>
(the actual implementation lets you require certain mime types and is therefore a bit more complicated).
You could do whatever you want, however. The following would work perfectly if you make your data functions handle lists of dicts:
<dataFunction> <setup imports="random"/> <code> descriptor.data = [{"pix": i, "val": random.random()} for i in range(20000)] </code> </dataFunction>
It wouldn't be hard to come up with a formatter that turns this into a nice VOTable.
Filtering data functions should always come with a meta maker declaring their parameters. As an example, continuing the frequency cutout example above, consider this:
<dataFunction>
  <code>
    if not args.get("FREQ"):
      return
    # the upper frequency limit yields the lower wavelength limit
    lam_min, lam_max = (
      unitconv.LIGHT_C/(args["FREQ"][1]*1e6),
      unitconv.LIGHT_C/(args["FREQ"][0]*1e6))
    from gavo.protocols import sdm
    sdm.mangle_cutout(
      descriptor.data.getPrimaryTable(),
      lam_min, lam_max)
  </code>
</dataFunction>
(Ignoring for the moment troubles with half-open intervals).
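The unit conversion in that data function is easy to get backwards, so here is a minimal stand-alone sketch of it in plain Python. It assumes (as the factor 1e6 suggests) that FREQ comes in MHz; LIGHT_C here is a local stand-in for unitconv.LIGHT_C, and freq_interval_to_wavelength is a made-up helper, not a DaCHS function.

```python
# Speed of light in m/s; a local stand-in for unitconv.LIGHT_C.
LIGHT_C = 299792458.0


def freq_interval_to_wavelength(freq_lo_mhz, freq_hi_mhz):
    """Return (lam_min, lam_max) in metres for a frequency interval in MHz.

    This is a hypothetical helper for illustration only.
    """
    if freq_lo_mhz <= 0 or freq_hi_mhz <= 0:
        raise ValueError("Frequencies must be positive")
    lam_a = LIGHT_C / (freq_lo_mhz * 1e6)
    lam_b = LIGHT_C / (freq_hi_mhz * 1e6)
    # Higher frequency means shorter wavelength, so sort the limits
    # rather than trusting the order they came in.
    return min(lam_a, lam_b), max(lam_a, lam_b)


lam_min, lam_max = freq_interval_to_wavelength(100.0, 200.0)
```

Sorting the two limits keeps the cutout interval well-formed even if a client passes the frequency limits in descending order.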
There are situations in which a data function must shortcut, mostly because it is doing something other than just “pushing on” descriptor.data. Examples include preview producers or a data function that should produce the FITS header only. For cases like this, data functions can raise one of DeliverNow (which means descriptor.data must be something servable, see Data Formatters and causes that to be immediately served) or FormatNow (which immediately goes to the data formatter; this is less useful).
Here's an example for DeliverNow; a similar thing is contained in the STREAM //soda#fits_genKindPar:
<dataFunction>
  <setup imports="gavo.utils.fitstools"/>
  <code>
    if args["KIND"]=="HEADER":
      descriptor.data = ("application/fits-header",
        fitstools.serializeHeader(descriptor.data[0].header))
      raise DeliverNow()
  </code>
</dataFunction>
When writing data functions, you should raise soda.EmptyData() when a cutout results in empty data (e.g., because the cutout limits are out of range). If you don't, users of your service might become angry with you when they have to click away many empty windows (say).
For further examples of data functions, see the //soda RD coming with the distribution. If you write some, please consider whether they might be interesting for other DaCHS users, too, and submit them for inclusion into //soda.
Data formatters (see element dataFormatter) take a descriptor's data attribute and build something servable out of it. Datalink cores do not absolutely need one; the default is to return descriptor.data (the //soda#trivialFormatter, which might be fine if that data is servable itself).
What is servable? The easiest thing to come up with is a pair of content type and data in byte strings; if descriptor.data is a Table or Data instance, the following could work:
<dataFormatter>
  <code>
    from gavo import formats

    return "text/plain", formats.getAsText(descriptor.data)
  </code>
</dataFormatter>
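For data that is not a Table or Data instance, such as the list of dicts produced by the random-number data function above, you have to build the pair of content type and byte string yourself. The following is a minimal sketch in plain Python, using CSV rather than VOTable to keep it short; format_rows_as_csv is a made-up name, and inside a dataFormatter you would pass in descriptor.data and return the resulting pair.

```python
import csv
import io


def format_rows_as_csv(rows):
    """Return a (media type, bytes) pair for a list of dicts sharing keys.

    Hypothetical helper; the column order is taken from the first row.
    """
    if not rows:
        return "text/csv", b""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # Servable data wants a byte string, so encode explicitly.
    return "text/csv", buffer.getvalue().encode("utf-8")


media_type, payload = format_rows_as_csv(
    [{"pix": 0, "val": 0.25}, {"pix": 1, "val": 0.5}])
```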
Another example is an excerpt from //soda#sdm_cutout:
<dataFormatter>
  <code>
    from gavo.protocols import sdm

    if len(descriptor.data.getPrimaryTable().rows)==0:
      raise base.ValidationError("Spectrum is empty.", "(various)")
    return sdm.formatSDMData(descriptor.data, args["FORMAT"])
  </code>
</dataFormatter>
(this goes together with a metaMaker for an input key describing FORMAT).
An alternative is to return an object that has a renderHTTP(request) method. This must then use request.write to produce content for the client. This is true for the Product instances that //soda#generateProduct generates, for example. You can also write something yourself by inheriting from protocols.products.ProductBase and overriding its iterData method.
If you don't inherit from ProductBase, be aware that this renderHTTP runs in the main server loop. If it blocks, the server blocks, so make sure that this doesn't happen. The conventional way would be to return, from the renderHTTP method, some twisted producer. Non-Product nevow resources will also not work with asynchronous datalink at this point.
For certain renderers (currently, only ssap.xml, but we might do it for SIAP, too), DaCHS will add a direct SODA block if there's an _associatedDatalinkService meta on the table it serves from and that datalink service has a dlget capability. Here's what the datalink declarations could look like in such a case:
<RESOURCE name="links" type="meta" utype="adhoc:service"> <DESCRIPTION>...</DESCRIPTION> <GROUP name="inputParams"> <PARAM arraysize="*" datatype="char" name="ID" ref="ssa_pubDID" ucd="meta.id;meta.main" value=""/> </GROUP> <PARAM arraysize="*" datatype="char" name="standardID" value="ivo://ivoa.net/std/DataLink#links-1.0"/> <PARAM arraysize="*" datatype="char" name="accessURL" value="http://localhost:8080/gaia/q2/tsdl/dlmeta"/> </RESOURCE> <RESOURCE ID="proc_svc" name="proc_svc" type="meta" utype="adhoc:service"> <DESCRIPTION>...</DESCRIPTION> <GROUP name="inputParams"> <PARAM arraysize="*" datatype="char" name="ID" ref="ssa_pubDID" ucd="meta.id;meta.main" value=""> <DESCRIPTION>The publisher DID of the dataset of interest</DESCRIPTION> </PARAM> <PARAM arraysize="*" datatype="char" name="BANDPASS" value=""> <DESCRIPTION>Gaia bandpass to generate the time series for.</DESCRIPTION> <VALUES> <OPTION name="G" value="G"/> <OPTION name="BP" value="BP"/> <OPTION name="RP" value="RP"/> </VALUES> </PARAM> </GROUP> <PARAM arraysize="*" datatype="char" name="accessURL" ucd="meta.ref.url" value="http://localhost:8080/gaia/q2/tsdl/dlget"/> <PARAM arraysize="*" datatype="char" name="standardID" value="ivo://ivoa.net/std/SODA#sync-1.0"/> </RESOURCE>
– the first block declares where to obtain full datalink documents by publisher DID from.
The second block lets clients take a shortcut and call a processing service directly, without first retrieving the datalink document; it is essentially an anonymised version of the processing declaration from the datalink block.
To generate these, DaCHS also calls the dlmeta procs, but with pubDID set to None. Whenever you need a concrete pubDID in a dlmeta proc used with SSA, you should therefore add something like:
if descriptor.pubDID is None: return
Also note that in these cases, a special descriptor type is being used rather than whatever you put into your descriptor generator, and hence you can't use any special attributes you defined there. On the other hand, you'll have a limits attribute with a dictionary giving ranges of values within the concrete (SSA) result. This should be used to build Values objects tailored to the specific result.
All this is admittedly painful; the shortcut SODA blocks that cause all that pain can probably count as a classic case of premature optimisation.
You can publish the metadata generating endpoint on your service by saying <publish render="dlmeta" sets="ivo_managed"/>. However, that is not recommended, as it clutters the registry with services that are not really usable after discovery.
Datalink services will, however, appear as capabilities of services that publish tables that have associated datalink services.
While it might be a good idea to provide some _example meta for all datalink services, you really should provide one when you register them, so validators can pick up IDs and parameters to use when validating your service. Here is an example, taken from califa/q3:
CALIFA cubes can be cut out along RA, DEC, and spectral axes. CIRCLE and POLYGON cutouts yield bounding boxes. Also note that the coverage of CALIFA cubes is hexagonal in space. This explains the empty area when cutting out :genparam:`CIRCLE(225.5202 1.8486 0.001)` :genparam:`BAND(366e-9 370e-9)` on :dl-id:`ivo://org.gavo.dc/~?califa/datadr3/V1200/UGC9661.V1200.rscube.fits`.
Essentially, an identifier to use is given as the dl-id interpreted text role, whereas processing parameters are given as DALI genparams. In DaCHS, they are written as the parameter name and its value in parentheses.
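To illustrate the markup, here is a small plain-Python sketch that pulls the identifier and the parameters out of such an example text. This is not how DaCHS or any validator actually parses _example metas; the regular expressions and the parse_datalink_example name are assumptions made for this illustration.

```python
import re

# Patterns for the two interpreted text roles used in the example above.
GENPARAM_PATTERN = re.compile(r":genparam:`(\w+)\(([^)]*)\)`")
DL_ID_PATTERN = re.compile(r":dl-id:`([^`]+)`")


def parse_datalink_example(text):
    """Return (dataset id or None, {parameter name: literal value})."""
    params = dict(GENPARAM_PATTERN.findall(text))
    match = DL_ID_PATTERN.search(text)
    return (match.group(1) if match else None), params


example = (
    "This explains the empty area when cutting out"
    " :genparam:`CIRCLE(225.5202 1.8486 0.001)`"
    " :genparam:`BAND(366e-9 370e-9)` on"
    " :dl-id:`ivo://org.gavo.dc/~?califa/datadr3/V1200/"
    "UGC9661.V1200.rscube.fits`.")
dl_id, params = parse_datalink_example(example)
```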
In particular for larger datasets like cubes, it is rude to put the entire dataset into an obscore table. Although obscore gives expected download sizes, clients nevertheless do not usually expect to have to retrieve several gigabytes or even terabytes of data when dereferencing an obscore access URL.
While you could define additional datalink URLs and use these in Obscore – this is what lswscans/res/positions does, and there's a piece of text on this in the tutorial –, you should in general use datalinks as product URLs throughout with datasets larger than a couple of Megabytes. c8spect/q shows how to do that with completely virtual data, califa/q3 and pcslg/q are examples for what to do with FITS cubes or spectra.
This way, of course, without a datalink-enabled client people might be locked out from the dataset entirely. On the other hand, DaCHS comes with a stylesheet that enables datalink operation from a common web browser, so that's perhaps not too bad.
Aladin likes it when columns containing datalink URLs are marked up. DaCHS has two properties that let you add that markup, targetType and targetTitle. On a standalone datalink column that you just add to an output table, this could look like this (the datalink service would have an id of “dl” here; this also assumes you have a column named pub_did):
<outputField name="datalink" type="text" id="datalink_output" ucd="meta.ref.url" select="'\getConfig{web}{serverURL}/\rdId/dl/dlmeta?ID=' || gavo_urlescape(pub_did)" tablehead="DL" description="URL of a datalink document for this dataset." displayHint="type=url" verbLevel="1"> <property name="targetType" >application/x-votable+xml;content=datalink</property> <property name="targetTitle">Datalink</property> </outputField>
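The select expression above assembles the dlmeta URL in the database. If you want to check what such a URL should look like, the following plain-Python sketch builds the same kind of URL; it assumes that percent-encoding the pubDID with urllib.parse.quote is an acceptable stand-in for gavo_urlescape, and the server URL, RD id and service id in the usage lines are made up.

```python
from urllib.parse import quote, unquote


def make_dlmeta_url(server_url, rd_id, service_id, pub_did):
    """Build a datalink metadata URL for pub_did (hypothetical helper)."""
    return "{}/{}/{}/dlmeta?ID={}".format(
        server_url.rstrip("/"), rd_id, service_id,
        # safe="" also escapes slashes, which matters because pubDIDs
        # are URIs and may contain question marks themselves.
        quote(pub_did, safe=""))


pub_did = ("ivo://org.gavo.dc/~?califa/datadr3/V1200/"
    "UGC9661.V1200.rscube.fits")
# Server URL, RD id and service id are assumptions for this example.
url = make_dlmeta_url("http://localhost:8080", "califa/q3", "dl", pub_did)
recovered = unquote(url.split("ID=", 1)[1])
```

Escaping matters here because the pubDID itself contains a question mark, which would otherwise start a second query string.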
When your product link is a datalink, you have to amend the accref column in your main table. This stereotypically looks like this:
<column original="accref"> <property name="targetType" >application/x-votable+xml;content=datalink</property> <property name="targetTitle">Datalink</property> </column>
To have datalinks rather than the plain dataset as what the accref points to, you need to change what DaCHS thinks of your dataset; this is what the //products#define rowfilter in your grammar is for:
<fitsProdGrammar qnd="True">
  <rowfilter procDef="//products#define">
    <bind key="path">\dlMetaURI{dl}</bind>
    <bind key="mime">'application/x-votable+xml;content=datalink'</bind>
    <bind key="fsize">10000</bind>
    [...]
  </rowfilter>
  [...]
</fitsProdGrammar>
This includes the estimate that the datalink document will have about 10k octets; in that region, there is no need to be precise. Note that the argument to the macro dlMetaURI is the id of the datalink service; DaCHS has no way to work that out by itself.
When you do this, you must use a datalink-aware descriptor generator in SODA. When you use the recommended setup, where the accref is the inputsDir-relative path to the main file, and you're dealing with FITS, you can use the DLFITSProductDescriptor class. Thus, the base functionality of a FITS cutout service with datalink products would be:
<service id="dl" allowed="dlget,dlmeta"> <meta name="title">My Cutout Service</meta> <datalinkCore> <descriptorGenerator procDef="//soda#fits_genDesc" name="genFITSDesc"> <bind key="accrefPrefix">'mysvcs/data'</bind> <bind key="descClass">DLFITSProductDescriptor</bind> </descriptorGenerator> <FEED source="//soda#fits_standardDLFuncs"/> </datalinkCore> </service>
When not using FITS, you will need to change the descriptor generator's computation of the local file path yourself, as done, e.g., in pcslg/q.
A common use for datalink cores in DaCHS is for server-side generation and processing of spectra as discussed in SDM processing. This almost invariably involves defining tables compliant with the spectral data model and filling them.
The builder parameter of //soda#sdm_genData expects a reference to an SDM compliant data element. To define it, you first need to define an instance table. The columns that are in there depend on your data. In the simplest case, the //ssap#sdm-instance mixin is sufficient and adds the columns flux and spectral. Here's how you'd add flux errors if you needed to:
<table id="instance" onDisk="False"> <mixin ssaTable="slitspectra" spectralDescription="Wavelength" fluxDescription="Flux" >//ssap#sdm-instance</mixin> <column name="fluxerror" ucd="stat.error;phot.flux.density;em.wl" unit="m" description="Estimate for error in flux based on the procedure discussed at referenceURL"/> </table>
What's referenced in //soda#sdm_genData is a data element that builds this table. Here's one that fills the table from the database:
<data id="get_slitcomponent">
  <!-- datamaker to pull spectra values out of the database -->
  <embeddedGrammar>
    <iterator>
      <code>
        obsId = self.sourceToken["accref"].split("/")[-1]
        with base.getTableConn() as conn:
          # The table name after FROM is a placeholder; put in the
          # table that holds your spectral points.
          for row in conn.queryToDicts(
              "SELECT lambda as spectral, flux, error as fluxerror"
              " FROM \schema.spectra"
              " WHERE obsId=%(obsId)s ORDER BY lambda",
              locals()):
            yield row
      </code>
    </iterator>
  </embeddedGrammar>
  <make table="instance">
    <parmaker>
      <apply procDef="//ssap#feedSSAToSDM"/>
    </parmaker>
  </make>
</data>
-- obviously, you can just as well fill it from a file (e.g., cdfspect/q, which also shows what to do when the metadata that comes with the files is broken).
The parmaker with the //ssap#feedSSAToSDM call is generic, i.e., you won't usually need any more tricks here.
DaCHS has full support for all aspects of https://www.ivoa.net/documents/BibVO since version 2.8.2. The link to articles is done through the source meta, and marking records for export to external metadata repositories works by saying:
<meta> date: 2023-10-27 date.role: ExportRequested </meta>
(in association with setting a doi meta); both have been possible in DaCHS since the pre-1.0 days. Support for linking datasets to articles (sect. 3 of BibVO, the biblink-harvest endpoint), however, needs code only present in DaCHS since version 2.9.
To enable biblink-harvest, first create the underlying table and declare your biblink-harvest endpoint to the registry:
dachs imp //biblinks dachs pub //biblinks
You can then inspect what biblinks you export either in the database:
SELECT * FROM dc.biblinks
or by retrieving the json generated from that:
curl http://localhost:8080/__system__/biblinks/links/biblinks.json
At this point, of course, that table is still empty. You will usually fill it using python-language scripts (see Scripting for the general picture of scripts in DaCHS), where the type of the script depends on the nature and the size of the metadata: is it just a few biblinks updated more or less independently of the individual datasets, or is it something of the order of a publication per dataset in the table?
We treat both cases in turn. Even if you go for the more complicated Per-Publication, Locally Aggregated Biblinks case, please read the next section, as it introduces some concepts needed in the aggregated case, too.
Use these when you export only up to a handful of links per publication. In that case, you keep all the links directly in dc.biblinks. The most convenient place to keep the script deriving the links is in an afterMeta script sitting in the table in question; the advantage of this arrangement is that you can update the links with a quick dachs imp -m.
Consider this example from toss/q:
<table id="data" onDisk="True" adql="True" mixin="//slap#basic">
  [...]
  <script type="afterMeta" lang="python" name="add biblinks">
    from gavo.protocols import biblinks

    biblinks.clearLinks(table.connection, table.tableDef.rd)
    src_id = base.getMetaText(
      table.tableDef.rd.getById("line_tap"), "identifier")
    pubs_mentioned = [r[0] for r in
      table.connection.query("SELECT DISTINCT pub FROM toss.data")]
    biblinks.defineLinks(table.connection, table.tableDef.rd,
      [(pub, "IsSupplementedBy", src_id) for pub in pubs_mentioned])
  </script>
This links all the publications mentioned in the pub column of the toss.data table to the VO publication (i.e., in effect the registry record) of the line_tap table, stating the latter supplements the former (“data from this publication is published in that VO resource“). Let's take it step by step:
from gavo.protocols import biblinks
Biblinks-managing code lives in gavo.protocols.biblinks; you will mostly need its clearLinks and defineLinks functions.
biblinks.clearLinks(table.connection, table.tableDef.rd)
In order to keep the links entries unique, you will always want to clear links previously inserted by your script (possible exception: incremental imports; but even then, entirely redoing the biblinks after every import is probably a reasonable strategy). This needs a database connection – always use the one coming with the table you are working on in order to not break transactionality – and the RD, which it uses to recognise previously imported records.
In case you have multiple biblinks-feeding scripts in a single RD, give clearLinks and defineLinks an extra linkSource='someString' argument, where someString would identify the script. This might also help with incremental imports.
The bibliographic identifiers you pass in are assumed to be bibcodes. If what you pass in are DOIs, pass bibFormat='doi' to defineLinks. The function does not validate the bibFormat, so in principle you can pass in anything. However, harvesters will fairly certainly not understand anything not defined in VOResource's content/source/@format.
src_id = base.getMetaText( table.tableDef.rd.getById("line_tap"), "identifier")
This computes the destination of the biblink, i.e., the ivoid of the resource we want to point to. In this case, it is the LineTAP table; in general, use the id of the element that contains the publish or register element and request its identifier meta. You can also use http URLs here; defineLinks will turn ivoids passed in to http URLs using GAVO's landing page service at http://dc.g-vo.org/LP.
pubs_mentioned = [r[0] for r in table.connection.query("SELECT DISTINCT pub FROM toss.data")]
This is what you will definitely want to change: here, I am using a database query to see what publications should link to this resource. You could also hard-code bibcodes, read them from some file, or whatever. The result in each case should be a sequence of bibcodes or DOIs.
biblinks.defineLinks(table.connection, table.tableDef.rd, [(pub, "IsSupplementedBy", src_id) for pub in pubs_mentioned])
This is where the links are ingested. The first two arguments are as for clearLinks above; the last argument is a sequence of triples of bibliographic source, its relationship to src_id (as explained in BibVO, usually either IsSupplementedBy or Cites), and the thing to link to (in this case the ivoid of the published table).
Update dc.biblinks by dachs imp -m-ing this and inspect the effect as discussed above.
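If your bibliographic identifiers come from a less tidy source than a SELECT DISTINCT, it may help to normalise them before building the triples. Here is a plain-Python sketch of that step; make_biblink_triples is a made-up helper, and the bibcodes and the ivoid in the usage lines are invented placeholders rather than real identifiers. Its result is the kind of sequence you would hand to defineLinks as the last argument.

```python
def make_biblink_triples(bib_ids, relationship, target):
    """Return deduplicated (bib id, relationship, target) triples.

    Hypothetical helper: empty or missing identifiers are dropped, and
    the result is sorted so repeated runs give identical output.
    """
    cleaned = {bib_id.strip() for bib_id in bib_ids
        if bib_id and bib_id.strip()}
    return [(bib_id, relationship, target) for bib_id in sorted(cleaned)]


# All identifiers below are made up for illustration.
triples = make_biblink_triples(
    ["2099Test....2....5B", "2099Test....1....1A",
        None, " 2099Test....1....1A"],
    "IsSupplementedBy",
    "ivo://example.org/res/line_tap")
```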
This is for observatory bibliographies and the like, when a single publication might have used (“Cites“) hundreds of datasets. At least the ADS does not want to receive that many datasets for a single publication, and thus the data centres report only summarily something like “I have 121 data links for this publication” and provide a link at which these links are formatted for consumption with a web browser, giving users a way to retrieve the datasets wholesale or individually.
This case is a bit more complicated than the Small-Size, Per-Resource Biblinks discussed above but re-uses the same concepts in its biblinks-filling script; please see there for the basics.
In this case, you will probably want to use a postCreation script in a make element, as this sort of metadata will likely need an update any time the data is changed. Consider this example taken from lswscans/res/positions:
<make table="bibliography">
  <script type="postCreation" lang="python" name="make biblinks">
    import urllib.parse

    from gavo.protocols import biblinks

    rd = table.tableDef.rd
    biblinks.clearLinks(table.connection, rd)
    base_link = rd.getById("biblanding").getURL("qp")
    new_links = []
    for nlinks, bibref in table.connection.query("""
        SELECT count(*) as nlinks, bibref
        FROM {}
        GROUP BY bibref""".format(table.tableDef.getQName())):
      new_links.append(
        (bibref, "Cites",
          f"{base_link}/"+urllib.parse.quote(bibref), nlinks))
    biblinks.defineLinks(table.connection, table.tableDef.rd, new_links)
  </script>
</make>
In this case, this sits in the data element for an extra table containing publication-dataset links; if appropriate, you can just as well run this in the import script of the data itself. As in the small-size case, we first clear any pre-existing biblinks, and we again read the links from a database table, except this time we aggregate by the bibref and compute the number of links per bibref.
The results are then collected into new_links, whose items now are quadruples (rather than triples as before); the fourth element of each quadruple is what is called cardinality in BibVO.
The dataset-ref, i.e., the target link, is now computed as:
f"{base_link}/"+urllib.parse.quote(bibref).
Here, base_link has been established before as the qp-rendered result of the biblanding service, and we quote bibref to make it well-behaved in URIs.
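The aggregation and the quoting can be tried out without a database. The following plain-Python sketch mimics what the GROUP BY query and the loop do; make_aggregated_links is a made-up helper, the bibrefs are invented, and the base link is only an assumption about what getURL("qp") might return on a local installation.

```python
from collections import Counter
from urllib.parse import quote


def make_aggregated_links(pairs, base_link):
    """Turn (dataset id, bibref) pairs into biblink quadruples.

    Hypothetical helper; the fourth item is the number of datasets
    per bibref (the cardinality).
    """
    counts = Counter(bibref for _, bibref in pairs)
    return [
        (bibref, "Cites", f"{base_link}/" + quote(bibref), n_links)
        for bibref, n_links in sorted(counts.items())]


# Bibrefs and base link are made up for illustration.
links = make_aggregated_links(
    [("p1", "2099A&A...1....1X"),
        ("p2", "2099A&A...1....1X"),
        ("p3", "2099Test....2....5B")],
    "http://localhost:8080/lswscans/res/positions/biblanding/qp")
```

Note how the ampersand in the first bibref ends up as %26; without the quoting, it could be mistaken for a parameter separator further down the line.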
What is this odd service? Well, the qp Renderer takes arguments from the query path, so that our bibref now becomes an argument. The core of the service now must take this argument and format the associated links. DaCHS does not provide anything canned for that yet, but it is likely that the example service in lswscans/res/positions will bring you a good way towards what you would need:
<service id="biblanding" allowed="qp"> <meta name="title">HDAP Plates Per Publication</meta> <property name="queryField">bibref</property> <template key="resultline">//productselect.html</template> <template key="resulttable">//productselect.html</template> <fancyQueryCore queriedTable="plates"> <query> select plateid, accref, dateObs, bandpassLo, bandpassHi, exposure, centerAlpha, centerDelta, '\getConfig{web}{serverURL}/\rdId/dl/dlmeta?ID=' || gavo_urlescape(pub_did) as datalink, accref as checks from lsw.plates join lsw.bibliography using (plateid) %s </query> <condDesc> <inputKey original="bibliography.bibref"/> </condDesc> <outputTable autoCols="plateID, accref, dateObs, exposure, centerAlpha, centerDelta"> <outputField name="checks" type="text" tablehead="Select"> <formatter> return T.input(type="checkbox", name="accref", value=data) </formatter> </outputField> <outputField name="datalink" type="text" ucd="meta.ref.url" tablehead="DL" description="URL of a datalink document for this dataset." displayHint="type=url" verbLevel="1"> <property name="targetType" >application/x-votable+xml;content=datalink</property> <property name="targetTitle">Datalink</property> </outputField> </outputTable> </fancyQueryCore> </service>
Let's look at the less common items here:
<property name="queryField">bibref</property>
This tells the qp renderer to pretend that its query path was passed in in a URL parameter named bibref.
<template key="resultline">//productselect.html</template> <template key="resulttable">//productselect.html</template>
This is more configuration of the qp renderer; it says that regardless of whether there is just a single dataset or there are multiple of them, it should use the built-in productselect.html template. Of course, you can use your own templates (cf. templating.html). The one coming with DaCHS lets people select datasets using checkboxes and then bulk-retrieve them as a tar file, which seems to be something of an industry standard in the business of observatory bibliographies.
<fancyQueryCore queriedTable="plates">
Ignore the slightly silly name: this core lets you send a hand-written query to the database while leaving the computation of the WHERE clause to DaCHS' usual mechanisms (it's written as %s in the query; don't worry about proper quoting, DaCHS has you covered there). The query in this case is a join between the table of bibref-data pairs and the actual data table, which will probably be a rather common case. The most complicated expression in there produces IVOA datalinks from pubDIDs that already are in the table.
<condDesc> <inputKey original="bibliography.bibref"/> </condDesc>
This is where you say how to come from the input parameter to a WHERE clause. Do not use the condDesc's buildFrom attribute here – for this use case, you want a simple, blind string match, which you get by building an input key directly from a column.
<outputTable autoCols="plateID, accref, dateObs, exposure, centerAlpha, centerDelta"> <outputField name="checks" type="text" tablehead="Select"> <formatter> return T.input(type="checkbox", name="accref", value=data) </formatter> </outputField>
With FancyQueryCore-s, you need to define an output table, and all columns mentioned in there must have a corresponding item in the top-level select clause of query. Any select attributes on output fields are ignored in this context, as the query is already user-defined. You will usually use the autoCols attribute (and yes, the mixed-case column identifiers here are deep legacy; don't do mixed case in database columns yourself, as you will regret that), where the column names are resolved within the core's queriedTable.
You can define further output fields. The one shown here, named checks (the name does not matter), has a special role in that it produces the checkboxes that let users select what to bulk-download. The important parts: First, there is accref as checks in the query's select clause, which makes the value=data in the formatter render the dataset's accref into the checkbox's value. And second, name="accref", as accref is the parameter name DaCHS' getproduct renderer expects its input in.
In case you are curious how this integrates into the bigger picture, have a look at the output of dachs adm dumpDF templates/productselect.html. You will notice that the table is rendered within a:
<form method="GET" action="/__system__/products/getTar/get">
– the form's action is the service that will consume all the accrefs that users have checked and format them into a tar file.
DaCHS has built-in machinery to generate previews from normal, 2D FITS and JPEG files, where these are versions of the original dataset scaled to be about 200 pixels in width, delivered as JPEG files. These previews are shown on mousing over product links in the web interface, and they turn up as preview links in datalink interfaces. This also generates previews for cutouts.
For any other sort of data, DaCHS does not automatically generate previews. To still provide previews – which is highly recommended – there is a framework allowing you to compute and serve out custom previews. This is based on the preview and preview_mime columns which are usually set using parameters in //products#define.
You could use external previews by having http (or ftp) URLs, which could look like this:
<rowfilter procDef="//products#define">
  ...
  <bind key="preview">("http://example.org/previews/"
    +"/".join(\inputRelativePath.split("/")[2:]))</bind>
  <bind key="preview_mime">"image/jpeg"</bind>
</rowfilter>
(this takes away two path elements from the relative paths, which typically reproduces an external hierarchy). If you need to do more complex manipulations, you can have a custom rowfilter, maybe like this if you have both FITS files (for which you want DaCHS' default behaviour selected with AUTO) and .complex files with some external preview:
<rowfilter name="make_preview_paths">
  <code>
    srcName = os.path.basename(rowIter.sourceToken)
    if srcName.endswith(".fits"):
      row["preview"] = 'AUTO'
      row["preview_mime"] = None
    else:
      row["preview"] = ('http://example.com/previews/'
        +os.path.splitext(srcName)[0]+"-preview.jpeg")
      row["preview_mime"] = 'image/jpeg'
    yield row
  </code>
</rowfilter>
<rowfilter procDef="//products#define">
  ...
  <bind key="preview">@preview</bind>
  <bind key="preview_mime">@preview_mime</bind>
</rowfilter>
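Path manipulations like these are easy to get subtly wrong (a missing slash, one path element too many), so it can pay to try them in plain Python before importing. The sketch below re-states both computations as ordinary functions; the function names and the paths in the usage lines are made up, and nothing here is DaCHS API.

```python
import os.path


def external_preview_url(input_relative_path,
        base_url="http://example.org/previews/"):
    """Drop the first two path elements and append the rest to base_url."""
    return base_url + "/".join(input_relative_path.split("/")[2:])


def preview_for(source_path, base_url="http://example.com/previews/"):
    """Return (preview, preview_mime) as the custom rowfilter would set them."""
    src_name = os.path.basename(source_path)
    if src_name.endswith(".fits"):
        # Let DaCHS generate the preview itself.
        return "AUTO", None
    stem = os.path.splitext(src_name)[0]
    return base_url + stem + "-preview.jpeg", "image/jpeg"


# Made-up example paths.
flat_url = external_preview_url("myres/data/2010/img.fits")
fits_preview = preview_for("/var/gavo/inputs/myres/data/a.fits")
complex_preview = preview_for("/var/gavo/inputs/myres/data/b.complex")
```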
More commonly, however, you'll have local previews. If they already exist, use a static renderer and enter full local URLs as above.
If you don't have pre-computed previews, let DaCHS handle them for you. You need to do three things:
define where the preview files are. This happens via a previewDir property on the importing data descriptor, like this:
<data id="import"> <property key="previewDir">previews</property> ...
say that the previews are standard DaCHS generated in the //products#define rowfilter. The main thing you have to decide here is the MIME type of the previews you're generating. You will usually use either the macro standardPreviewPath (preferable when you have less than a couple of thousand products) or the macro splitPreviewPath to fill the preview path, but you can really enter whatever paths are convenient for you here:
<rowfilter procDef="//products#define"> <bind key="table">"\schema.data"</bind> <bind key="mime">"image/fits"</bind> <bind key="preview_mime">"image/jpeg"</bind> <bind key="preview">\standardPreviewPath</bind> </rowfilter>
actually compute the previews. This is usually not defined in the RD but rather using DaCHS' processing framework. Precomputing previews in the processor documentation covers this in more detail; the upshot is that this can be as simple as:
from gavo.helpers import processing


class PreviewMaker(processing.SpectralPreviewMaker):
  sdmId = "build_sdm_data"


if __name__=="__main__":
  processing.procmain(PreviewMaker, "flashheros/q", "import")
When you keep the data you want to preview in the database – as is sensible for shortish spectra or time series – it hurts to create files for what otherwise would be neatly in arrays in the database, much more so since such collections are often large, and thus the overwhelming majority of generated files would probably never be retrieved.
So, we would much rather generate the images on the fly. DaCHS can do this, too. The major ingredient is the qp renderer, which lets you write a service taking a single argument from the URL path. This keeps preview URLs tidy.
Here is an example for a preview generating service reading spectral and flux points from the database:
<service id="preview" allowed="qp">
  <meta name="title">DFBS spectra preview maker</meta>
  <property name="queryField">specid</property>
  <pythonCore>
    <inputTable>
      <inputKey name="specid" type="text" required="True"
        description="ID of the spectrum to produce a preview for"/>
    </inputTable>
    <coreProc>
      <setup imports="gavo.helpers.processing.SpectralPreviewMaker, gavo.svcs"/>
      <code>
        with base.getTableConn() as conn:
          res = list(conn.query("SELECT spectral, flux"
            " FROM \schema.spectra"
            " WHERE specid=%(specid)s",
            inputTable.args))
        if not res:
          raise svcs.UnknownURI("No such spectrum known here")
        return ("image/png", SpectralPreviewMaker.get2DPlot(
          zip(res[0][0], res[0][1]), linear=True))
      </code>
    </coreProc>
  </pythonCore>
</service>
Essentially, we define a service that sticks the rest of a query path pointing to it into a field specid in the input table. For instance, when the query path coming in is myspecs/q/preview/qp/foo/bar and the thing sits in the RD myspecs/q, then specid will be foo/bar.
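To make the splitting concrete, here is a plain-Python sketch of it. This is not DaCHS' actual request routing, and split_qp_path is a made-up helper; it merely shows which part of the query path ends up in the query field.

```python
def split_qp_path(query_path, renderer="qp"):
    """Split a query path into (service path, renderer argument).

    Hypothetical helper: everything after the first path element equal
    to the renderer name is the argument, slashes included.
    """
    segments = query_path.strip("/").split("/")
    if renderer not in segments:
        raise ValueError(
            "No %s renderer in path %r" % (renderer, query_path))
    pos = segments.index(renderer)
    return "/".join(segments[:pos]), "/".join(segments[pos + 1:])


service_path, specid = split_qp_path("myspecs/q/preview/qp/foo/bar")
```

Since the argument may itself contain slashes, identifiers shaped like paths (for instance, accrefs) work as query field values.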
The python core fetches this specid and does a database query to pull the spectral and flux points out of the database table; if there is an accref in the table, it is probably a good idea to just use that for what specid does here.
Finally, we use the SpectralPreviewMaker mentioned above; this has a static method doing a 2D plot of (x,y) tuples (see the source if you have to) and returning a PNG in a string.
When a core returns a 2-tuple, most DaCHS renderers will interpret the first element as a media type and the second as a byte string to deliver; qp certainly does, and so the last line simply ensures the data is handed back to the client as an image/png.
What's left to do is tell DaCHS where to find the previews. That you'll do in the //products#define rowfilter. In all likelihood, you'll be building some artificial accref in such cases. Right now, you will have to repeat such expressions when declaring the URL at which the preview is found, perhaps like this:
<rowfilter procDef="//products#define"> <bind key="table">"\schema.spectra"</bind> <bind key="accref">"\rdId/%s-%s"%(@plate, @objectid[5:])</bind> <bind key="path">[...]</bind> <bind key="preview_mime">"image/png"</bind> <bind key="preview">makeAbsoluteURL("\rdId/preview/qp/%s-%s"%( @plate, @objectid[5:]))</bind> </rowfilter>
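Because the accref and the preview URL repeat the same expression, they can silently drift apart when one of them is edited. The following plain-Python sketch shows the relationship between the two; make_product_locations is a made-up helper, it assumes that makeAbsoluteURL simply prefixes the server URL, and all values in the usage lines are invented.

```python
def make_product_locations(server_url, rd_id, plate, object_id):
    """Return (accref, preview URL) built from one shared local id.

    Hypothetical helper mirroring the two bind expressions above.
    """
    # Same expression as in both binds: plate plus the object id
    # with its first five characters cut off.
    local_id = "%s-%s" % (plate, object_id[5:])
    accref = "%s/%s" % (rd_id, local_id)
    preview_url = "%s/%s/preview/qp/%s" % (
        server_url.rstrip("/"), rd_id, local_id)
    return accref, preview_url


# Made-up values for illustration.
accref, preview_url = make_product_locations(
    "http://localhost:8080", "myspecs/q", "fbs0123", "DFBS-J012345")
```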
Universal Worker Systems (UWSes) allow the asynchronous operation of services, i.e., the server runs a job on behalf of the user without the need for a persistent connection.
DaCHS supports async operations of TAP and datalink out of the box. If you want to run async services defined by your own code, there are a few things to keep in mind.
(1) You'll need to prepare your database to keep track of your custom jobs (just once):
dachs imp //uws enable_useruws
(2) You'll have to allow the uws.xml renderer on the service in question.
(3) Things running within a UWS are fairly hard to debug in DaCHS right now. Until we have good ideas on how to make these things a bit more accessible, it's a good idea to also allow synchronous renderers, for instance form or api, at least for debugging. If something goes wrong, you can do a sync query that then drops you in a debugger in the usual manner (see the debugging chapter in the tutorial).
(4) For now, the usual queryMeta is not pushed into the uws handler (there's no good reason for that). We do, however, pass on DALI-type RESPONSEFORMAT. To enable that on automatic results (see below), say:
<inputKey name="responseformat" description="Preferred output format" type="text"/>
in your input table.
(5) All UWS parameters are lowercased and only available in lowercased form to server-side code. To allow cores to run in both sync and async without further worries, just have lowercase-only parameters.
(6) As usual, the core may return either a pair of (media type, content) or a data item, which then becomes a UWS result named result with the proper media type. You can also return None (which will make the core incompatible with most other renderers). That may be a smart thing to do if you're producing multiple files to be returned through UWS. To do that, there's a job attribute on the inputTable that has an addResult(source, mediatype, name) method. Source can be a string (in which case the string will be the result) or a file open for reading (in which case the result will be the file's content). Input tables of course don't have that attribute unless they come from the uws renderer. Hence, a typical pattern to use this would be:
if hasattr(inputTable, "job"):
    with inputTable.job.getWritable() as wjob:
        wjob.addResult("Hello World.\\n", "text/plain", "aux.txt")
or, to take the results from a file that's already on-disk:
if hasattr(inputTable, "job"):
    with inputTable.job.getWritable() as wjob:
        with open("other-result.txt") as src:
            wjob.addResult(src, "text/plain", "output.txt")
Right now, there's no facility for writing directly to UWS result files. Ask if you need that.
(7) UWS lets you add arbitrary files using standard DALI-style uploads. This is enabled if there are file-typed inputKeys in the service's input table. These inputKeys are otherwise ignored right now. See [DALI] for details on how these inputs work. To create an inline upload from a python client (e.g., to write a test), it's most convenient to use the requests package, like this:
import requests

requests.post(
    "http://localhost:8080/data/cores/pc/uws.xml/D2hFEJ/parameters",
    {"UPLOAD": "stuff,param:upl"},
    files={"upl": open("zw.py")})
From within your core, use the file name (the name of the input key) and pull the file from the UWS working directory:
with open(os.path.join(inputTable.job.getWD(), "mykey")) as f:
    ...
Hint on debugging: dachs uwsrun doesn't check the state the job is in; it will just try to execute it anyway. So, if your job went into error and you want to investigate why, just take its id and execute something like:
dachs --traceback uwsrun i1ypYX
While DaCHS isn't actually intended to be an all-purpose server for web applications, sometimes you want to have some gadget for the browser that doesn't need VO protocols. For that, there is customPage, which is essentially a bare-bones nevow page. Hence, all (admittedly sparse) nevow documentation applies. Nevertheless, here are some hints on how to write a custom page.
First, in the RD, define a service allowing a custom page. These normally have a null core (the customPage renderer will ignore it either way):
<service id="ui" allowed="custom" customPage="res/registration.py"> <meta name="shortName">DOI registration</meta> <meta name="title">VOiDOI DOI registration web service</meta> <nullCore/> </service>
The python module referred to in customPage must define a MainPage nevow resource. The recommended pattern is like this:
from nevow import tags as T

from gavo import web
from gavo.imp import formal


class MainPage(
        formal.ResourceMixin,
        web.CustomTemplateMixin,
        web.ServiceBasedPage):

    name = "custom"
    customTemplate = "res/registration.html"

    workItems = None

    @classmethod
    def isBrowseable(self, service):
        return True

    def form_ivoid(self, ctx, data={}):
        form = formal.Form()
        form.addField("ivoid", formal.String(required=True), label="IVOID",
            description="An IVOID for a registered VO resource")
        form.addAction(self.submitAction, label="Next")
        return form

    def render_workItems(self, ctx, data):
        if self.workItems:
            return ctx.tag[T.li[[m for m in self.workItems]]]
        return ""

    def submitAction(self, ctx, form, data):
        self.workItems = ["Working on %s"%data["ivoid"]]
        return self
The formal.ResourceMixin lets you define and interpret forms. The web.ServiceBasedPage does all the interfacing to the DaCHS (e.g., credential checking and the like). The web.CustomTemplateMixin lets you get your template from a DaCHS template (cf. templating guide) from a resdir-relative directory given in the customTemplate attribute. For widely distributed code, you should additionally provide some embedded stan fallback in the defaultDocFactory attribute -- of course, you can also give the template in stan in the first place.
On form_ivoid and submitAction see below.
This template could, for this service, look like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:n="http://nevow.com/ns/nevow/0.1">
  <head>
    <title>VOiDOI: Registration</title>
    <n:invisible n:render="commonhead"/>
  </head>
  <body n:render="withsidebar">
    <h1>VOiDOI: Register your VO resource</h1>
    <ul n:render="workItems"/>
    <p>VOiDOI lets you obtain DOIs for registered VO services.</p>
    <p>In the form below, enter the IVOID of the resource you want a DOI
    for. If the resource is known to our registry but has no DOI yet,
    the registered contact will be sent an e-mail to confirm DOI
    creation.</p>
    <n:invisible n:render="form ivoid"/>
  </body>
</html>
Most of the details are explained in the templating guide. The exception is the form ivoid. This makes the formal.ResourceMixin call the form_ivoid in MainPage and put in whatever HTML/stan that returns. If nevow detects that the request already results from filling out the form, it will execute what you registered in addAction; in this case, that is the submitAction method.
Important: anything you do within addAction runs within the (cooperative) server thread. If it blocks or performs a long computation, the server is blocked. You will therefore want to do non-trivial things either using asynchronous patterns or using deferToThread. The latter is less desirable but also easier, so here's what this looks like:
from twisted.internet import threads

# the following are methods of MainPage

def submitAction(self, ctx, form, data):
    return threads.deferToThread(
        runRegistrationFor, data["ivoid"]
        ).addCallback(self._renderResponse
        ).addErrback(self._renderErrors)

def _renderResponse(self, result):
    # do something to render a success message (or return Redirect)
    return self

def _renderErrors(self, failure):
    # do something to render an error message, e.g., from
    # failure.getErrorMessage()
    return self
The embedding RD is available in the custom page's global namespace as RD. Thus, the standard pattern for creating a read-only table is:
with api.getTableConn() as conn:
    table = api.TableForDef(RD.getById("my_table"), connection=conn)
If you need write access, you would write:
with api.getWritableAdminConn() as conn:
    table = api.TableForDef(RD.getById("my_table"), connection=conn)
The RD attribute is not available during module import. This is a bit annoying if you want to load resources from an RD-dependent place; this, in particular, applies to importing dependent modules. To provide a workaround, DaCHS calls a method initModule(**kwargs) after loading the module. You should accept arbitrary keyword arguments here so your code doesn't fail if we find we want to give initModule some further information.
The common case of importing a module from some RD-dependent place thus becomes:
from gavo import utils

def initModule(**kwargs):
    global oai2datacite
    modName = RD.getAbsPath("doitransfrom/oai2datacite")
    oai2datacite, _ = utils.loadPythonModule(modName)
TODO: Update this for Datalink
Compared to images, the formats situation with spectra is a mess. Therefore, in all likelihood, you will need some sort of conversion service to VOTables compliant to the spectral data model. DaCHS has a facility built in to support you with doing this on the fly, which means you only need to keep a single set of files around while letting users obtain the data in some format convenient to them. The tutorial contains examples on how to generate metadata records for such additional formats.
First, you will have to define the "instance table", i.e., a table definition that will contain a DC-internal representation of the spectrum according to the data model. There's a mixin for that:
<table id="spectrum"> <mixin ssaTable="hcdtest">//ssap#sdm-instance</mixin> </table>
In addition to adding lots and lots of params, the mixin also defines two columns, spectral and flux; these have units and ucds as taken from the SSA metadata. You can add additional columns (e.g., a flux error depending on the spectral coordinate) as required.
The actual spectral instances can be built by sdmCores and delivered through DaCHS' product interface.
sdmCores, while potentially useful with common services, are intended to be used by the product renderer for dcc product table paths. They contain a data item that must yield a primary table that is basically sdm compliant. Most of this is done by the //ssap#feedSSAToSDM apply proc, but obviously you need to yield the spectral/flux pairs (plus potentially more stuff like errors, etc., if your spectrum table has more columns). This comes from the data item's grammar, which probably must always be an embedded grammar, since its sourceToken is an SSA row in a dictionary. Here's an example:
<sdmCore queriedTable="hcdtest" id="mksdm">
  <data id="getdata">
    <embeddedGrammar>
      <iterator>
        <code>
          labels = ("spectral", "flux")
          relPath = self.sourceToken["accref"].split("?")[-1]
          with self.grammar.rd.openRes(relPath) as inF:
            for ln in inF:
              yield dict(zip(labels,ln.split()))
        </code>
      </iterator>
    </embeddedGrammar>
    <make table="spectrum">
      <parmaker>
        <apply procDef="//ssap#feedSSAToSDM"/>
      </parmaker>
    </make>
  </data>
</sdmCore>
Note: spectral, flux, and possibly further items coming out of the iterator must be in the units promised by the SSA metadata (fluxSI, spectralSI). Declarations to this effect are generated by the //ssap#sdm-instance mixin for the spectral and flux columns.
The sdmCores are always combined with the sdm renderer. It passes an accref into the core that gets turned into a row from the queried table; this must be an "ssa" table (i.e., right now something that mixes in //ssap#hcd). This row is the input to the embedded data descriptor. Hence, this has no sources element, and you must have either a custom or embedded grammar to deal with this input.
Echelle spectrographs "fold" a spectrum into several orders which may be delivered in several independent mappings from spectral to flux coordinate. In this split form, they pose some extra problems, dealt with in an extra system RD, //echelle. For merged Echelle spectra, just use the standard SSA framework.
Echelle spectra have additional metadata that should end up in their SSA metadata table – these are things like the number of orders, the minimum and maximum (Echelle) order, and the like. To pull these columns into your metadata table, use the ssacols stream, for example like this:
<table id="ordersmeta" onDisk="True" adql="True"> <meta name="description">SSA metadata for split-order Flash/Heros Echelle spectra</meta> <mixin [...] statSpectError="0.05" spectralResolution="2.5e-11" >//ssap#hcd</mixin> <mixin calibLevel="1">//obscore#publishSSAPMIXC</mixin> <column name="localKey" type="text" ucd="meta.id" tablehead="Key" description="Local observation key." verbLevel="1"/> <STREAM source="//echelle#ssacols"/> </table>
You may want extra, locally-defined columns in your obscore tables. To support this, there are three hooks in obscore that you can exploit. The hooks are in userconfig.rd (see Userconfig RD in the operator's guide on where it is and how to get started with it). It helps to have a brief look at the //obscore RD (e.g., using dachs admin dumpDF //obscore) to get an idea what these hooks do.
Within the template userconfig.rd, there are already three STREAMs with ids starting with obscore-; these are referenced from within the system //obscore RD. Here's a somewhat more elaborate example:
<STREAM id="obscore-extracolumns"> <column name="fill_factor" description="Fill factor of the SED" verbLevel="20"/> </STREAM> <STREAM id="obscore-extrapars"> <mixinPar name="fillFactor" description="The SED's fill factor">NULL</mixinPar> </STREAM> <STREAM id="obscore-extraevents"> <property name="obscoreClause" cumulate="True"> , CAST(\\\\fillFactor AS real) AS fill_factor </property> </STREAM>
(to be on the safe side: there need to be four backslashes in front of fillFactor; this is just a backslash doubly-escaped. Sorry about this).
The way this is used in an actual mixin would be like this:
<table id="specs" onDisk="True"> <mixin ...>//ssap#hcd</mixin> <mixin ... (all the usual parameters) fillFactor="0.3">//obscore#publishSSAPMIXC</mixin> </table>
What's going on here? Well, obscore-extracolumns is easy – this material is directly inserted into the definition of the obscore view (see the table with id ObsCore within the //obscore RD). You could abuse it to insert other stuff than columns but probably should not.
The tricky part is obscore-extraevents. This goes into the //obscore#_publishCommon STREAM and ends up in all the publish mixins in obscore. Again, you could insert mixinPars and similar at this point, but the only thing you really must do is add lines to the big SQL fragment in the obscoreClause property that the mixin leaves in the table. This is what is made into the table's contribution to the big obscore union. Just follow the example above and, in particular, always CAST to the type you have in the metadata, since individual tables might have NULLs in the values, and you do not want misguided attempts by postgres to do type inference then.
If you actually must know why you need to double-escape fillFactor and what the magic with the cumulate="True" is, ask.
Finally, obscore-extrapars directly goes into a core component of obscore, one that all the various publish mixins there use. Hence, all of them grow your functionality. That is also why it is important to give defaults (i.e., element content) to all mixinPars you give in this way – without them, all those other publish mixins would fail unless their applications in the RDs were fixed.
If you change %#obscore-extracolumns, all the statement fragments contributed by the obscore-published tables need to be fixed. To spare you the effort of touching a potentially sizeable number of RDs, there's a data element in //obscore that does that for you; so, after every change just run:
dachs imp //obscore refreshAfterSchemaUpdate
This may fail if you didn't clean up properly after deleting a resource that once contributed to ivoa.obscore. In that case you'll see an error message like:
*** Error: table u'whatever.main' could not be located in dc_tables
In that case, just tell DaCHS to forget the offending table:
dachs purge whatever.main
Another problem can arise when a table once was published to obscore but now no longer is while still existing. DaCHS in that case will still have an entry for the table in ivoa._obscoresources, which results in an error like:
Table definition of whatever.main> has no property 'obscoreClause' set
The fastest way to fix this situation is to drop the offending line in the database manually:
psql gavo -c "delete from ivoa._obscoresources where tablename='whatever.main'"
Note
Before DaCHS 2.6.2, you had to import CustomRowIterator from gavo.grammars.customgrammar rather than gavo.api.
A custom grammar simply is a python module located within a resource directory defining a row iterator class derived from gavo.api.CustomRowIterator. This class must be called RowIterator. You want to override the _iterRows method. It will have to yield row dictionaries, i.e., dictionaries mapping string keys to something (preferably strings, but you will usually get away with returning complete values even without fancy rowmakers).
So, a custom grammar module could look like this:
from gavo.api import CustomRowIterator

class RowIterator(CustomRowIterator):
    def _iterRows(self):
        for i in range(int(self.sourceToken)):
            yield {'index': i, 'square': i**2}
This would be used with a data material like:
<sources><item>4</item><item>40</item></sources> <customGrammar module="res/sillygrammar"/>
– self.sourceToken simply contains whatever the sources element produces. One RowIterator will be constructed for each item.
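To see the "one RowIterator per source item" behaviour without a DaCHS installation, here is a minimal sketch. The CustomRowIterator class below is a stand-in written for this illustration only; it merely mimics the constructor signature shown in this section and is not the real gavo.api class.

```python
class CustomRowIterator:
    """Minimal stand-in for gavo.api.CustomRowIterator (illustration only)."""

    def __init__(self, grammar, sourceToken, sourceRow=None):
        self.grammar = grammar
        self.sourceToken = sourceToken
        self.sourceRow = sourceRow

    def __iter__(self):
        return self._iterRows()


class RowIterator(CustomRowIterator):
    def _iterRows(self):
        for i in range(int(self.sourceToken)):
            yield {"index": i, "square": i**2}


# One iterator is built per <item> of the sources element.
sources = ["4", "40"]
rows_per_source = {
    token: list(RowIterator(None, token)) for token in sources}
print(rows_per_source["4"])
```

Each source token yields its own, independent sequence of row dictionaries.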
Do not override magic methods, since you may lose row filters, sourceFields, and the like if you do. An exception is the constructor. If you must, you can override it, but you must call the parent constructor, like this:
class RowIterator(CustomRowIterator):
    def __init__(self, grammar, sourceToken, sourceRow=None):
        CustomRowIterator.__init__(self, grammar, sourceToken, sourceRow)
        <your code>
In practice (i.e., with <sources pattern="*"/>) self.sourceToken will often be a file name. When you call makeData manually and pass a forceSource argument, its value will show up in self.sourceToken instead.
Also look into EmbeddedGrammar, which may be a more convenient way to achieve the same thing.
A fairly complex example for a custom grammar is a provisional Skyglow grammar.
It is highly recommended to keep track of the current position so DaCHS can give more useful error messages. When an error occurs, DaCHS will call the iterator's getLocator method. This returns an arbitrary string, where obviously it's a good idea if that leads users to somewhere close to where the problem has shown up. Here's a custom grammar reading space-separated key-value pairs from a file:
class RowIterator(CustomRowIterator):
    def _iterRows(self):
        self.lineNumber = 0
        with open(self.sourceToken) as f:
            for self.lineNumber, line in enumerate(f):
                yield dict(zip(["key", "value"], line.split(" ", 1)))

    def getLocator(self):
        return f"line {self.lineNumber}"
Note that getLocator does not include the source file name; that will be inserted into the error message by DaCHS.
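The interplay between an iterator and its locator can be tried out in isolation. The following is a sketch under stated assumptions: KeyValueIterator and read_all are hypothetical names, the class does not derive from CustomRowIterator so it runs without DaCHS, and read_all only imitates what DaCHS might do with the locator on failure. Unlike the example above, this sketch counts lines from 1.

```python
import io


class KeyValueIterator:
    """Stand-alone sketch; in DaCHS this would derive from CustomRowIterator."""

    def __init__(self, source):
        self.source = source
        self.lineNumber = 0

    def __iter__(self):
        # start=1 so the locator matches what a text editor shows
        for self.lineNumber, line in enumerate(self.source, 1):
            key, value = line.rstrip("\n").split(" ", 1)
            yield {"key": key, "value": value}

    def getLocator(self):
        return f"line {self.lineNumber}"


def read_all(iterator):
    """Collect rows; on failure, return the rows so far and a message."""
    rows = []
    try:
        for row in iterator:
            rows.append(row)
    except ValueError as exc:
        return rows, f"Error at {iterator.getLocator()}: {exc}"
    return rows, None


rows, error = read_all(
    KeyValueIterator(io.StringIO("a 1\nb 2\nbroken\nc 3\n")))
print(rows)
print(error)
```

Because the line number is updated before each row is built, the locator points at the offending line rather than at the last good one.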
For development, it may be convenient to execute your custom grammar as a python module. To enable that, just append a:
if __name__=="__main__":
    import sys
    from gavo.api import CustomGrammar
    ri = RowIterator(CustomGrammar(None), sys.argv[1])
    for row in ri:
        print(row)
to your module. You can then run things like:
python res/mygrammar.py data/inhabitedplanet.fits
and see the rows as they're generated.
A row iterator will be instantiated for each source processed. Thus, you should usually not perform expensive operations in the constructor unless they depend on sourceToken. Instead, you should rather define a function makeDataPack in the module. Whatever is returned by this function is available as self.grammar.dataPack in the row iterator.
The function receives an instance of the customGrammar as an argument. This means you can access the resource descriptor and properties of the grammar. As an example of how this could be used, consider this RD fragment:
<table id="defTable"> ... </table> <customGrammar module="res/grammar"> <property name="targetTable">defTable</property> </customGrammar>
Then you could have the following in res/grammar.py:
def makeDataPack(grammar):
    return grammar.rd.getById(grammar.getProperty("targetTable"))
and access the table in the row iterator.
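The point of makeDataPack is that expensive setup is shared across all row iterators of a grammar. Here is a minimal sketch of that idea with stand-in objects; FakeGrammar is hypothetical, and the assumption that the data pack is built once per grammar and then attached as dataPack is made for this illustration.

```python
class FakeGrammar:
    """Stand-in for the customGrammar element (illustration only)."""

    def __init__(self, properties):
        self._properties = properties
        self.dataPack = None

    def getProperty(self, name):
        return self._properties[name]


build_count = 0


def makeDataPack(grammar):
    # Pretend this is expensive, e.g., parsing a large lookup file.
    global build_count
    build_count += 1
    return {"target": grammar.getProperty("targetTable")}


class RowIterator:
    def __init__(self, grammar, sourceToken):
        self.grammar = grammar
        self.sourceToken = sourceToken

    def __iter__(self):
        yield {"source": self.sourceToken,
               "table": self.grammar.dataPack["target"]}


grammar = FakeGrammar({"targetTable": "defTable"})
grammar.dataPack = makeDataPack(grammar)  # assumed: once per grammar
rows = [row for token in ["a.txt", "b.txt"]
        for row in RowIterator(grammar, token)]
print(rows, build_count)
```

Two sources are processed, but the data pack is only built once.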
If you want to debug custom grammars that require data packs outside of DaCHS, you need to be a bit more careful when you construct your custom grammar, as it will need a proper RD as its parent. This means you will have to hard-code your RD id, perhaps like this:
if __name__=="__main__":
    import sys
    from gavo import api
    from gavo.api import CustomGrammar
    grammar = CustomGrammar(api.getRD("MYRES/q"))
    ri = RowIterator(grammar, sys.argv[1])
    ...
With normal grammars, all rows are fed to all rowmakers of all makes within a data object. The rowmakers can then decide to not process a given row by raising IgnoreThisRow or using the trigger mechanism. However, when filling complex data models with potentially dozens of tables, this becomes highly inefficient.
When you write your own grammars, you can do better. Instead of just yielding a row from _iterRows, you yield a pair of a role (as specified in the role attribute of a make element) and the row. The machinery will then pass the row only to the feeder for the table in the corresponding make.
Currently, the only way to define such a dispatching grammar is to use a custom grammar or an embedded grammar. For these, just change your _iterRows and say isDispatching="True" in the customGrammar element. If you implement getParameters, you can return either pairs of role and row or just the row; in the latter case, the row will be broadcast to all parmakers.
Special care needs to be taken when a dispatching grammar parses products, because the product table is fed by a special make inserted from the products mixin. This make of course doesn't see the rows you are yielding from your dispatching grammar. This means that without further action, your files will not end up in the product table at all. In turn, getproducts will return 404s instead of your products.
To fix this, you need to explicitly yield the rows destined for the products table with a products role, from within your grammar. Where the grammar yields rows for the table with metadata (i.e., rows that actually contain the fields with prodtblAccref, prodtblPath, etc), yield to the products table, too, like this: yield ("products", newRow).
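The dispatching idea, including the extra yield for the products table, can be sketched without DaCHS as follows. The dispatch function is a stand-in for the DaCHS machinery, the role names meta and calib as well as the input format are made up for this example, and only the products role is taken from the text above.

```python
from collections import defaultdict


def iter_rows(source_lines):
    """Yield (role, row) pairs the way a dispatching _iterRows would."""
    for line in source_lines:
        kind, accref, value = line.split()
        if kind == "obs":
            row = {"prodtblAccref": accref, "prodtblPath": accref,
                   "exptime": float(value)}
            yield ("meta", row)
            # The same row also has to reach the products table.
            yield ("products", row)
        else:
            yield ("calib", {"accref": accref, "gain": float(value)})


def dispatch(pairs):
    """Stand-in for the DaCHS machinery: each row goes to one role only."""
    tables = defaultdict(list)
    for role, row in pairs:
        tables[role].append(row)
    return dict(tables)


tables = dispatch(iter_rows([
    "obs data/a.fits 30",
    "cal data/flat.fits 1.2",
    "obs data/b.fits 45",
]))
print({role: len(rows) for role, rows in tables.items()})
```

Without the second yield in the obs branch, the products list would stay empty, which corresponds to the 404s described above.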
As much as it is desirable to describe tables in a declarative manner, there are quite a few cases in which some imperative code helps a lot during table building or teardown. Resource descriptors let you embed such imperative code using script elements. These are children of the make elements since they are exclusively executed when actually importing into a table.
Currently, you can enter scripts in SQL and python, which may be called at various phases during the import.
In SQL scripts, you separate statements with semicolons. Note that no statements in an SQL script may fail since that will invalidate the transaction. Use the AC_SQL language to simply ignore failures.
You can use table macros in the SQL scripts to parametrize them; the most useful among those probably is \qName containing the fully qualified name of the table being processed.
You cannot easily produce output from SQL scripts. If you want to give user feedback in long-running scripts, use RAISE NOTICE in procedures or, outside of procedures:
do $$BEGIN raise notice 'My message'; END$$;
Python scripts can be indented by a constant amount.
The table object currently processed is accessible as table. In particular, you can use this to issue queries using table.connection.execute(query, arguments) (parallel to dbapi.execute) and to delete rows using table.deleteMatching(condition, pars). The current RD is accessible as table.tableDef.rd, so you can access items from the RD as table.tableDef.rd.getById("some_id"), and the recommended way to read stuff from the resource directory is table.tableDef.rd.openRes("res/some_file").
Some types of scripts may have additional names available. Currently:
The type of a script corresponds to the event triggering its execution. The following types are defined right now:
Note that preImport, preIndex, and postCreation scripts are not executed when the make's table is being updated, in particular, in data items with updating="True". The only way to run scripts in such circumstances is to use newSource and sourceDone scripts.
This snippet sets a flag when importing some source (in this case, that's an RD, so we can access sourceToken.sourceId):
<script type="newSource" lang="python" id="markDeleted"> table.connection.execute("UPDATE %s SET deleted=True" " WHERE sourceRD=%%(sourceRD)s"%id, {"sourceRD": sourceToken.sourceId}) </script>
This is a hacked way of ensuring some sort of referential integrity: When a table containing "products" is dropped, the corresponding entries in the products table are deleted:
<script type="beforeDrop" lang="SQL" name="clean product table"> DELETE FROM products WHERE sourceTable='\qName' </script>
Note that this is actually quite hazardous because if the table is dropped in any way not using the make element in the RD, this will not be executed. It's usually much smarter to tell the database to do the housekeeping. Rules are typically set in postCreation scripts:
<script type="postCreation" lang="SQL"> CREATE OR REPLACE RULE cleanupProducts AS ON DELETE TO \qName DO ALSO DELETE FROM products WHERE key=OLD.accref </script>
The decision if such arrangements are made before the import, before the indexing or after the table is finished needs to be made based on the script's purpose.
Another use for scripts is SQL function definition:
<script type="postCreation" lang="SQL" name="Define USNOB matcher"> CREATE OR REPLACE FUNCTION usnob_getmatch(alpha double precision, delta double precision, windowSecs float ) RETURNS SETOF usnob.data AS $$ DECLARE rec RECORD; BEGIN FOR rec IN (SELECT * FROM usnob.data WHERE q3c_join(alpha, delta, raj2000, dej2000, windowSecs/3600.)) LOOP RETURN NEXT rec; END LOOP; END; $$ LANGUAGE plpgsql; </script>
You can also load data, most usefully in preIndex scripts (although preImport would work as well here):
<script type="preIndex" lang="SQL" name="create USNOB-PPMX crossmatch"> SET work_mem=1000000; INSERT INTO usnob.ppmxcross ( SELECT q3c_ang2ipix(raj2000, dej2000) AS ipix, p.localid FROM ppmx.data AS p, usnob.data AS u WHERE q3c_join(p.alphaFloat, p.deltaFloat, u.raj2000, u.dej2000, 1.5/3600.)) </script>
Text needing some amount of markup within DaCHS is almost always input as ReStructuredText (RST). The source versions of the DaCHS documentation give examples for such markup, and DaCHS users should at least briefly skim the ReStructuredText primer.
DaCHS contains some RST extensions. Those specifically targeted at writing DALI-compliant examples are discussed with the examples renderer.
Generally useful extensions include:
This text role formats the argument as a link into ADS when rendered as HTML. For technical reasons, this currently ignores the configured ADS mirror and always uses the Heidelberg one. Complain if this bugs you. To use it, you'd write:
See also :bibcode:`2011AJ....142....3H`.
Extensions for writing DaCHS-related documentation include:
(if you add anything here, please also amend the document source's README).
User extension code (e.g., custom cores, custom grammars, processors) for DaCHS should only use DaCHS functions from its api as described below. We will try to keep it stable and at any rate warn in the release notes if we change it. For various reasons, the module also contains a few modules. These, and in particular their content, are not part of the API.
Note that this "api" at this point is not what is in the namespace of rowmakers, rowfilters, and similar in-RD procedures. We do not, at this point, recommend importing the api there. If you do it anyway, we'd appreciate if you told us.
Before using non-API DaCHS functions, please inquire on the dachs-support mailing list (cf. http://docs.g-vo.org/DaCHS).
To access DaCHS API functions, say:
from gavo import api
(perhaps adding an as dachsapi if there is a risk of confusion) and reference symbols with the explicit module name (i.e., api.makeData rather than picking individual names) in order to help others understand what you've written.
In this chapter, we first give the functions that code in row makers see and then document the api available to extension code.
In principle, you can use arbitrary python expressions in var, map and proc elements of row makers. In particular, the namespace in which these expressions are executed contains math, os, re, time, datetime, and urllib.parse (for urllib.parse.quote, in particular) modules as well as gavo.base, gavo.utils, and gavo.coords; in addition, there's NaN (which simply is float('nan')).
However, much of the time you will get by using the following functions that are immediately accessible in the namespace:
returns a datetime.datetime instance for a fractional Besselian year.
This uses the formula given by Lieske, J.H., A&A 73, 282 (1979).
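For orientation, here is a minimal sketch of such a conversion. It assumes the widely quoted form of the relation, B = 1900.0 + (JD - 2415020.31352)/365.242198781; whether DaCHS uses exactly these constants is an assumption, and besselian_year_to_datetime is a hypothetical name, not the DaCHS function.

```python
import datetime

B1900_JD = 2415020.31352        # Julian date of B1900.0
TROPICAL_YEAR = 365.242198781   # length of the tropical year in days
J2000 = datetime.datetime(2000, 1, 1, 12)   # corresponds to JD 2451545.0


def besselian_year_to_datetime(b_year):
    """Hypothetical stand-in, not DaCHS's own function."""
    jd = B1900_JD + (b_year - 1900.0) * TROPICAL_YEAR
    return J2000 + datetime.timedelta(days=jd - 2451545.0)


b1950 = besselian_year_to_datetime(1950.0)
print(b1950)
```

With these constants, B1950.0 falls on the evening of 1949-12-31, shortly after 22 h.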
returns the mean value between two values.
Beware: Integer division done here for the benefit of datetime calculations.
>>> computeMean(1.,3)
2.0
>>> computeMean(datetime.datetime(2000, 10, 13),
...   datetime.datetime(2000, 10, 12))
datetime.datetime(2000, 10, 12, 12, 0)
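Why this works for datetimes as well as for numbers becomes clear from a small sketch. The assumption here is that the mean is computed as the first value plus half the difference; compute_mean is a hypothetical re-implementation that uses true division, so its results for integer arguments may differ from what the DaCHS function returns.

```python
import datetime


def compute_mean(val1, val2):
    """Hypothetical re-implementation for illustration."""
    # The difference of two datetimes is a timedelta, which can be halved
    # and added back to a datetime.
    return val1 + (val2 - val1) / 2


print(compute_mean(1., 3))
print(compute_mean(datetime.datetime(2000, 10, 13),
                   datetime.datetime(2000, 10, 12)))
```

Writing (val1 + val2)/2 instead would fail for datetimes, since two datetimes cannot be added.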
converts a float angle in degrees to a sexagesimal string.
This takes a lot of optional arguments:
>>> degToDms(-3.24722, "", 0, True, True)
'-031449'
>>> degToDms(0)
'+0 00 00.00'
>>> degToDms(0, addSign=False)
'0 00 00.00'
>>> degToDms(-0.25, sepChar=":")
'-0:15:00.00'
>>> degToDms(-23.50, secondFracs=4)
'-23 30 00.0000'
>>> "%.4f"%dmsToDeg(degToDms(-25.6835, sepChar=":"), sepChar=":")
'-25.6835'
converts a float angle in degrees to a time angle (hh:mm:ss.mmm).
This takes a lot of optional arguments:
>>> degToHms(0, sepChar=":")
'00:00:00.000'
>>> degToHms(122.057, secondFracs=1)
'08 08 13.7'
>>> degToHms(122.057, secondFracs=1, truncate=True)
'08 08 13.6'
>>> degToHms(-0.055, secondFracs=0)
'-00 00 13'
>>> degToHms(-0.055, secondFracs=0, truncate=True)
'-00 00 13'
>>> degToHms(-1.056, secondFracs=0)
'-00 04 13'
>>> degToHms(-1.056, secondFracs=0)
'-00 04 13'
>>> degToHms(359.9999999)
'24 00 00.000'
>>> degToHms(359.2222, secondFracs=4, sepChar=":")
'23:56:53.3280'
>>> "%.4f"%hmsToDeg(degToHms(256.25, secondFracs=9))
'256.2500'
returns the degree minutes seconds-specified dmsAngle as a float in degrees.
>>> "%3.8f"%dmsToDeg("45 30.6")
'45.51000000'
>>> "%3.8f"%dmsToDeg("45:30.6", ":")
'45.51000000'
>>> "%3.8f"%dmsToDeg("-45 30 7.6")
'-45.50211111'
>>> dmsToDeg("junk")
Traceback (most recent call last):
ValueError: Invalid dms value with sepChar None: 'junk'
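The parsing behind such a function can be sketched as follows. This is a hypothetical re-implementation written to match the examples above, not the DaCHS source; in particular, taking the sign from the first component (so that "-0 15 00" stays negative) is a design choice of this sketch.

```python
def dms_to_deg(dms_angle, sep_char=None):
    """Hypothetical re-implementation for illustration."""
    message = f"Invalid dms value with sepChar {sep_char!r}: {dms_angle!r}"
    parts = dms_angle.strip().split(sep_char)
    if not 1 <= len(parts) <= 3:
        raise ValueError(message)
    # Take the sign from the text, since float("-0") would lose it.
    sign = -1 if parts[0].strip().startswith("-") else 1
    try:
        values = [abs(float(part)) for part in parts]
    except ValueError:
        raise ValueError(message) from None
    # Missing minutes or seconds count as zero.
    values += [0.0] * (3 - len(values))
    degrees, minutes, seconds = values
    return sign * (degrees + minutes / 60 + seconds / 3600)


print("%3.8f" % dms_to_deg("45 30.6"))
print("%3.8f" % dms_to_deg("-45 30 7.6"))
print(dms_to_deg("-0:15:00", ":"))
```

Note that the minutes field may carry a fractional part, as in "45 30.6", in which case the seconds are simply left out.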
returns an accref from a standard DaCHS PubDID.
This is basically the inverse of getStandardPubDID. It will raise NotFound if pubdid "looks like a URI" (implementation detail: has a colon in the first 10 characters) and does not start with ivo://<authority>/~?. If it's not a URI, we assume it's a local accref and just return it.
The function does not check if the remaining characters are a valid accref, much less whether it can be resolved.
authBase's default will reflect your system's settings on your installation, which probably is not what's given in this documentation.
returns a datalink URL for the product referenced through accref with the datalink service dlSvc.
This assumes that dlSvc uses the standard DaCHS pubDIDs. dlSvc needs to be the service element.
A typical use is in a metaMaker and would look like this:
getDatalinkMetaLink(rd.getById("dl"), descriptor.accref)
returns the file stem of a file path.
The base name is what remains if you take the base name and split off extensions. The extension here starts with the last dot in the file name, except up to one of some common compression extensions (.gz, .xz, .bz2, .Z, .z) is stripped off the end if present before determining the extension.
>>> getFileStem("/foo/bar/baz.x.y")
'baz.x'
>>> getFileStem("/foo/bar/baz.x.gz")
'baz'
>>> getFileStem("/foo/bar/baz")
'baz'
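The stem rule can be written down in a few lines of standard-library Python. This is a sketch of the rule as described here, not DaCHS's implementation; the name file_stem is hypothetical.

```python
import os

# Compression extensions named in the description above.
COMPRESSION_EXTS = {".gz", ".xz", ".bz2", ".Z", ".z"}


def file_stem(path):
    """Return the base name of path without its (last) extension."""
    name = os.path.basename(path)
    root, ext = os.path.splitext(name)
    if ext in COMPRESSION_EXTS:
        # Strip at most one compression extension before looking
        # for the "real" extension.
        name = root
    return os.path.splitext(name)[0]
```

The sketch reproduces the three doctest results above: baz.x, baz, and baz.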
returns a unix-compatible file name for an access reference.
The file name will not contain terrible characters, let alone slashes. This is used to, e.g., keep all previews in one directory.
returns absPath relative to the DaCHS inputsDir.
If absPath is not below inputsDir, a ValueError results. For liberalChars, see the function getRelativePath.
In rowmakers and rowfilters, you'll usually use the macro \inputRelativePath that inserts the appropriate code.
returns a query meta object from somewhere up the stack.
This is for row makers running within a service. This can be used to, e.g., enforce match limits by writing getQueryMeta()["dbLimit"].
returns rest if fullPath has the form rootPath/rest and raises a ValueError otherwise.
This accepts either strings or pathlib.Path-s and returns an object of the type of fullPath (pathlib functionality since 2.9.3).
Pass liberalChars=False to make this raise a ValueError when URL-dangerous characters (blanks, ampersands, pluses, non-ASCII, and similar) are present in the result. This is mainly for products.
returns the standard DaCHS PubDID for path.
The publisher dataset identifier (PubDID) is important in protocols like SSAP and obscore. If you use this function, the PubDID will be your authority, the path component ~, and the inputs-relative path of the input file as the parameter.
path can be relative, in which case it is interpreted relative to the DaCHS inputsDir.
You can define your PubDIDs in a different way, but you'd then need to provide a custom descriptorGenerator to datalink services (and might need other tricks). If your data comes from plain files, use this function.
In a rowmaker, you'll usually use the standardPubDID macro.
returns a WCSAxis instance from an axis index and a FITS header.
If the axis is mentioned in a transformation matrix (CD or PC), a ValueError is raised (use forceSeparable to override).
The axisIndex is 1-based; to get a transform for the axis described by CTYPE1, pass 1 here.
The object returned has methods like pixToPhys, physToPix (and their pix0 brethren), and getLimits.
Note that at this point WCSAxis only supports linear transforms (it's a DaCHS-specific implementation). We'll extend it on request.
returns the time angle (h m s.decimals) as a float in degrees.
>>> "%3.8f"%hmsToDeg("22 23 23.3")
'335.84708333'
>>> "%3.8f"%hmsToDeg("22:23:23.3", ":")
'335.84708333'
>>> "%3.8f"%hmsToDeg("222323.3", "")
'335.84708333'
>>> hmsToDeg("junk")
Traceback (most recent call last):
ValueError: Invalid time with sepChar None: 'junk'
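The arithmetic behind these results is the usual conversion from time angle to degrees, with one hour of time corresponding to 15 degrees. The following sketch works on already-split numeric components; the helper name hms_to_deg is hypothetical and no literal parsing is attempted.

```python
def hms_to_deg(hours, minutes, seconds):
    """Convert a time angle to degrees; one hour of time is 15 degrees."""
    return (hours + minutes / 60.0 + seconds / 3600.0) * 15.0


# 22h 23m 23.3s, the value used in the doctests above
print("%3.8f" % hms_to_deg(22, 23, 23.3))
```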
returns a time span in hours in sexagesimal time (h:m:s).
The optional arguments are as for degToHms.
>>> hoursToHms(0)
'00:00:00'
>>> hoursToHms(23.5)
'23:30:00'
>>> hoursToHms(23.55)
'23:33:00'
>>> hoursToHms(23.525)
'23:31:30'
>>> hoursToHms(23.553, secondFracs=2)
'23:33:10.80'
>>> hoursToHms(123.553, secondFracs=2)
'123:33:10.80'
iterates over (physLineNumber, line) in f with some usual conventions for simple data files.
You should use this function to read from simple configuration and/or table files that don't warrant a full-blown grammar/rowmaker combo. The intended use is somewhat like this:
with open(rd.getAbsPath("res/mymeta")) as f:
    for lineNumber, content in iterSimpleText(f):
        try:
            ...
        except Exception as exc:
            sys.stderr.write("Bad input line %s: %s"%(lineNumber, exc))
The grammar rules are, specifically:
returns the string literal with all blanks removed.
This is useful when numbers are formatted with blanks thrown in.
Nones are passed through.
imports fqName and returns the (module, spec).
Do not use this function to import DC-internal modules; this may mess up singletons since you could bypass python's mechanisms to prevent multiple imports of the same module.
fqName is a fully qualified path to the module without the .py, unless relativeTo is given, in which case it is interpreted as a relative path. This is for letting modules in resdir/res import each other by saying:
mod, _ = api.loadPythonModule("foo", relativeTo=__file__)
The python path is temporarily amended with the path part of the source module.
If the module is in /var/gavo/inputs/foo/bar/mod.py, Python will know the module as foo_bar_mod (the last two path components are always added). This is to keep Python from using the module when someone writes import mod.
returns a fully qualified URL for a rooted local part.
This will reflect the http/https access mode unless you pass canonical=True, in which case [web]serverURL will be used unconditionally.
returns an (equatorial) IAU identifier for an object at long and lat.
The rules are given on https://cds.unistra.fr/Dic/iau-spec.html
You have to pass in the prefix, including the system identifier. You cannot build identifiers using only minutes precision. If you want to include sub-arcsec precision, pass in longSec and/or latSec (the number of fractional seconds to preserve).
returns the URL at which a product can be retrieved.
key can be an accref string or an RAccref.
Note that this is using the preferred host as the basic URL. If you are running dual-protocol http/https and you ingest results of this function into the database, it is advisable to cut off the scheme part of the URI (e.g., split(":", 1)[-1]). In data products served, DaCHS will then put in the scheme used for the query.
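As a small illustration of the scheme-stripping idiom mentioned here: splitting at the first colon leaves a scheme-relative reference. The helper name strip_scheme and the host dc.example.org are made up for the example.

```python
def strip_scheme(url):
    """Drop the scheme part of url, keeping everything after the first colon."""
    return url.split(":", 1)[-1]


# Both the http and the https form reduce to the same scheme-relative reference.
print(strip_scheme("http://dc.example.org/getproduct/res/a.fits"))
print(strip_scheme("https://dc.example.org/getproduct/res/a.fits"))
```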
DaCHS (almost always) also allows full http URIs as accrefs. These will be returned unchanged.
returns a rooted local part for a server-internal URL.
uri itself needs to be server-absolute; a leading slash is recommended for clarity but not mandatory.
returns a datetime.datetime instance for a modified julian day number.
Beware: This loses a couple of significant digits due to transformation to jd.
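For comparison, here is a sketch of the conversion that adds the day count directly to the MJD epoch (1858-11-17T00:00) instead of going through the Julian date. It is not the DaCHS implementation, and the name mjd_to_datetime is hypothetical.

```python
import datetime

# MJD 0.0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime.datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Return a naive datetime for a modified julian day number."""
    return MJD_EPOCH + datetime.timedelta(days=mjd)


# MJD 51544.5 is the J2000.0 epoch, 2000-01-01T12:00:00.
print(mjd_to_datetime(51544.5))
```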
converts the various forms angles might be encountered to degrees.
format is one of hms, dms, fracHour. For sexagesimal/time angles, you can pass a sepChar (default: split at blanks) that lets you specify what separates hours/degrees, minutes, and seconds.
>>> "%.8f"%(parseAngle("23 59 59.95", "hms"))
'359.99979167'
>>> "%10.5f"%parseAngle("-20:31:05.12", "dms", sepChar=":")
' -20.51809'
>>> "%010.6f"%parseAngle("21.0209556", "fracHour")
'315.314334'
returns a python boolean from some string.
Boolean literals are strings like True, false, on, Off, yes, No in some capitalization.
returns a datetime.date object of literal parsed according to the strptime-similar format.
The function understands the special dateFormat !!jYear (stuff like 1980.89).
returns a float from a literal, or None if literal is None or an empty string.
Temporarily, this includes a hack to work around a bug in psycopg2.
>>> parseFloat(" 5e9 ")
5000000000.0
>>> parseFloat(None)
>>> parseFloat(" ")
>>> parseFloat("wobbadobba")
Traceback (most recent call last):
ValueError: could not convert string to float: 'wobbadobba'
returns a datetime object for an ISO time literal.
There's no real timezone support yet, but we accept and ignore various ways of specifying UTC.
By default, this uses plain python datetime because it usually covers a larger date range than the time module. The downside is that it does not know about leap seconds. Pass useTime=True to go through time tuples, which know how to deal with them (but may not deal with dates far in the past or future).
>>> parseISODT("1998-12-14")
datetime.datetime(1998, 12, 14, 0, 0)
>>> parseISODT("1998-12-14T13:30:12")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("1998-12-14T13:30:12Z")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("1998-12-14T13:30:12.224Z")
datetime.datetime(1998, 12, 14, 13, 30, 12, 224000)
>>> parseISODT("19981214T133012Z")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("19981214T133012+00:00")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("2016-12-31T23:59:60")
Traceback (most recent call last):
ValueError: second must be in 0..59
>>> parseISODT("2016-12-31T23:59:60", useTime=True)
datetime.datetime(2017, 1, 1, 0, 0)
>>> parseISODT("junk")
Traceback (most recent call last):
ValueError: Bad ISO datetime literal: junk (required format: yyyy-mm-ddThh:mm:ssZ)
returns an int from a literal, or None if literal is None or an empty string.
>>> parseInt("32")
32
>>> parseInt("")
>>> parseInt(None)
returns a datetime.timedelta object for literal parsed according to format.
For format, you can use the magic values !!secondsSinceMidnight, !!decimalHours or a strptime-like spec using the H, M, and S codes.
>>> parseTime("89930", "!!secondsSinceMidnight")
datetime.timedelta(days=1, seconds=3530)
>>> parseTime("23.4", "!!decimalHours")
datetime.timedelta(seconds=84240)
>>> parseTime("3.4:5", "%H.%M:%S")
datetime.timedelta(seconds=11045)
>>> parseTime("20:04", "%H:%M")
datetime.timedelta(seconds=72240)
returns a datetime.datetime object from a literal parsed according to the strptime-similar format.
A ValueError is raised if literal doesn't match format (actually, a parse with essentially DALI-standard ISO representation is always tried).
returns default if literal is nullLiteral, else baseParser(literal).
If checker is non-None, it must be a callable returning True if its argument is a null value.
nullLiteral is compared against the unprocessed literal (usually, a string). The intended use is like this (but note that often, a nullExcs attribute on a rowmaker map element is the more elegant way):
>>> parseWithNull("8888.0", float, "8888")
8888.0
>>> print(parseWithNull("8888", float, "8888"))
None
>>> print(parseWithNull("N/A", int, "N/A"))
None
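The point that nullLiteral is compared against the unprocessed literal can be seen in a stripped-down re-implementation. This is only a sketch with a hypothetical name (parse_with_null); it deliberately leaves out the checker argument, and the real function's signature may differ.

```python
def parse_with_null(literal, base_parser, null_literal, default=None):
    """Return default if literal equals null_literal, else base_parser(literal)."""
    # The comparison happens before parsing, so "8888.0" is NOT
    # treated as null when null_literal is "8888".
    if literal == null_literal:
        return default
    return base_parser(literal)


print(parse_with_null("8888.0", float, "8888"))
print(parse_with_null("8888", float, "8888"))
```

Because the comparison is done on strings, a float-valued null marker that may show up in several spellings (8888, 8888.0, 8.888e3) needs a different approach.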
returns key as getproduct URL-part.
If key is a string, it is quoted as a naked accref so it's usable as the path part of a URL. If it's an RAccref, it is just stringified. The result is something that can be used after getproduct in URLs in any case.
returns val*factor+offset if val is not None, None otherwise.
This is when you want to manipulate a numeric value that may be NULL. It is a somewhat safer alternative to using nullExcs with scaled values.
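The None-preserving behaviour amounts to a one-line guard. The sketch below uses a hypothetical name (scale_or_none), and the default of 0 for offset is an assumption made for the example.

```python
def scale_or_none(val, factor, offset=0):
    """Return val*factor+offset, passing None (i.e., NULL) through unchanged."""
    if val is None:
        return None
    return val * factor + offset


print(scale_or_none(3, 2, 1))
print(scale_or_none(None, 2, 1))
```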
returns a modified julian date made from some datetime representation.
Valid representations include:
A facade for an ADQL-based async TAP job.
Construct it with the URL of the async endpoint and a query.
Alternatively, you can give the endpoint URL and a jobId as a keyword parameter. This only makes sense if the service has handed out the jobId before (e.g., when a different program takes up handling of a job started before).
See adql.html for details.
A file processor for calibrating FITS frames using astrometry.net.
It might provide calibration for "simple" cases out of the box. You will usually want to override some solver parameters. To do that, define class attributes sp_<parameter name>, where the parameters available are discussed in helpers.anet's docstring. sp_indices is one thing you will typically need to override.
To use SExtractor rather than anet's source extractor, override sexControl; to use an object filter (see anet.getWCSFieldsFor), override the objectFilter attribute.
To add additional fields, override _getHeader and call the parent class' _getHeader method. To change the way astrometry.net is called, override the _solveAnet method (it needs to return some result of anet.getWCSFieldsFor) and call _runAnet with your custom arguments for getWCSFieldsFor.
See processors#astrometry-net for details.
raised to initiate an authentication request.
Authenticates are optionally constructed with the realm the user shall authenticate in. If you leave the realm out, the DC-wide default will be used.
A constant, valued 1.380649e-23
is raised when some code could not be compiled.
BadCodes are constructed with the offending code, a code type, the original exception, and optionally a hint and a position.
OpenSSL API wrapper.
is raised when no FITS header was generated by a HeaderProcessor.
Specifically, this is what gets raised when _getHeader returns None.
A definition of the "active" part of a service.
A core will receive input from a renderer in the form of a svcs.CoreArgs (see Core Args). A core will return a table or perhaps directly data as discussed in DaCHS' Service Interface.
The abstract core element will never occur in resource descriptors. See Cores Available for concrete cores. Use the names of the concrete cores in RDs.
A Grammar with a user-defined row iterator taken from a module.
See Writing Custom Grammars (in the reference manual) for details.
is a base class for custom row iterators.
Implement at least _iterRows. And pass on any keyword args to __init__ to the next constructor.
Base class for error exceptions.
An interface to a table in the database.
These are usually created using api.TableForDef(tableDef) with a table definition obtained, e.g., from an RD, saying onDisk=True.
When constructing a DBTable, it will be created if necessary (unless create=False is passed), but indices or primary keys will only be created on a call to importFinished.
The constructor does not check if the schema of the table on disk matches the tableDef. If the two diverge, all kinds of failures are conceivable; use dachs val -c to make sure the on-disk structure matches the RDs.
You can pass a nometa boolean kw argument to suppress entering the table into the dc_tables table.
You can pass an exclusive boolean kw argument; if you do, the iterQuery (and possibly similar methods in the future) method will block concurrent writes to the selected rows ("FOR UPDATE") as long as the transaction is active.
DbTables will run preCreation, preIndex, postCreation, and beforeDrop scripts, both from the table definition and the make they are being created from. No scripts except beforeDrop are run when an existing table is operated on from an updating dd.
The main attributes (with API guarantees) include:
A constant, valued 0.017453292519943295
A constant, valued 0.0002777777777777778
A constant, valued 2.7777777777777776e-07
A collection of tables.
Data, in essence, is the instantiation of a DataDescriptor.
It is what makeData returns. In typical one-table situations, you just want to call the getPrimaryTable() method to obtain the table built.
These also have an attribute contributingMetaCarriers, a list of base.MetaCarrier-s used by votablewrite to create Data Origin INFO-s. By default, that's the first table. You can add to that attribute.
is raised when something is wrong with a data set.
When facing the web, these yield HTTP status 406.
A value that compares equal based on RE matches.
This is a helper mainly for GetHasXPathsTests. Use an instance of this class to check against an RE rather than a plain string.
>>> EqualingRE("(ab)+") == "ababab"
True
>>> EqualingRE("(ab)+$") == "ababa"
False
>>> EqualingRE("(ab)+$") != "ababa$"
True
>>> "ababa" == EqualingRE("(ab)+$")
False
The base class for all exceptions that can be expected to escape a module.
Apart from the normal message, you can give a hint constructor argument.
An abstract base for a source file processor.
In concrete classes, you need to define a process(accref) method receiving a source as returned by the dd (i.e., usually an inputsDir-relative file name).
You can override the method _createAuxiliaries(dataDesc) to compute things like source catalogues, etc. Thus, you should not need to override the constructor.
These objects are usually constructed through api.procmain as discussed in processing.html.
raised to generate an HTTP 403 response.
A base for processors doing FITS header manipulations.
The processor builds naked FITS headers alongside the actual files, with an added extension .hdr (or whatever is in the headerExt attribute). The presence of a FITS header indicates that a file has been processed. The headers on the actual FITS files are only replaced if necessary.
The basic flow is: Check if there is a header. If not, call _getNewHeader(srcFile) -> hdr. Store hdr to cache. Insert cached header in the new FITS if it's not there yet.
You have to implement the _getHeader(srcName) -> pyfits header object function. It must raise an exception if it cannot come up with a header. You also have to implement _isProcessed(srcName) -> boolean returning True if you think srcName already has a processed header.
This basic flow is influenced by the following opts attributes:
- reProcess -- even if a cache is present, recompute header values
- applyHeaders -- actually replace old headers with new headers
- reHeader -- even if _isProcessed returns True, write a new header
- compute -- perform computations
The idea is that you can:
- generate headers without touching the original files: proc
- write all cached headers to files that don't have them: proc --apply --nocompute
- after a bugfix force all headers to be regenerated: proc --reprocess --apply --reheader
All this leads to the messy logic. Sorry 'bout this.
can be raised by user code to indicate that a row should be skipped when building a table.
Note: To skip an entire source, raise SkipThis (usually in a rowfilter or so).
Also note that the non-code way to skip things, Triggers, is preferred when you don't already use code.
A base for processors doing simple FITS manipulations to the primary FITS header.
To define these, override _isProcessed(self, srcName, hdr) and _changeHeader(self, hdr).
_changeHeader can change the pyfits header hdr in place. It will then be replaced on the actual file.
For complex operations, it is probably advisable to use HeaderProcessor which gives you a two-step process of first having the detached headers that you can check before applying them.
Error related to database integrity.
A constant, valued 299792458.0
is raised if an attribute literal is somehow bad.
LiteralParseErrors are constructed with the name of the attribute that was being parsed, the offending literal, and optionally a parse position and a hint.
Signature: MS(structClass, **kwargs)
creates a parentless instance of structClass with **kwargs.
You can pass in a parent_ kwarg to force a parent, and a ctx_ if you need a parse context.
This is the preferred way to create struct instances in DaCHS, as it will cause the sequence of completers and validators run. Use it like this:
MS(rscdef.Column, name="ra", type="double precision")
A constant, valued nan
is raised when a meta key does not exist (and raiseOnFail is True).
is raised when something is asked for something that does not exist.
lookedFor can be an arbitrary object, so be careful when you repr it -- that may be long.
A table that has outputFields for columns.
Cores always have one of these, but they are implicitly defined by the underlying database tables in case of dbCores and such.
Services may define output tables to modify what is coming back from the core. Note that this usually only affects the output to web browsers. To use the output table also through VO protocols (and when producing VOTables, FITS files, and the like), you need to set the service's votableRespectsOutputTable property to True.
A constant, valued 6.62607015e-34
An Observer spitting out most info to the screen.
This is to configure the UI. Enable it by calling api.PlainUI(api.ui).
A file processor for generating previews.
For these, define a method getPreviewData(accref) -> string returning the raw preview data.
A class keeping information on the query environment.
It is constructed with a plain dictionary (alternative constructors for t.w requests are below) mapping certain keys (you'll currently have to figure out which from the source) to values, mostly strings, except for the keys listed in listKeys, which should be sequences of strings.
If you pass an empty dict, some sane defaults will be used. You can get that "empty" query meta as common.emptyQueryMeta, but make sure you don't mutate it.
QueryMetas constructed from request will have the user and password items filled out.
If you're using formal, you should set the formal_data item to the dictionary created by formal. This will let people use the parsed parameters in templates.
You can do some amount of auth through queryMeta by using getAuthUser(); this may be an expensive operation, though, so only do it when you need to, in particular only if user_pretend is nonempty. It's only available if queryMeta was built from a request and simply hands through there.
A string-like thing basically representing SQL delimited identifiers.
This has some features that make handling these relatively painless in ADQL code.
The most horrible feature is that these hash and compare as their embedded names, except to other QuotedNames.
SQL-92, in 5.2, roughly says:
delimited identifiers compare literally with each other, delimited identifiers compare with regular identifiers after the latter are all turned to upper case. But since postgres turns everything to lower case, we do so here, too.
>>> n1, n2, n3 = QuotedName("foo"), QuotedName('foo"l'), QuotedName("foo")
>>> n1==n2,n1==n3,hash(n1)==hash("foo")
(False, True, True)
>>> print(n1, n2)
"foo" "foo""l"
>>> "Foo"<n1, n1>"bar"
(False, True)
>>> QuotedName('7oh-no"+rob').makeIdentifier()
'id7oh2dno222brob'
A resource descriptor.
RDs collect all information about how to parse a particular source (like a collection of FITS images, a catalogue, or whatever), about the database tables the data ends up in, and the services used to access them.
This is the root element of all RDs.
To give your schema a utype, set a utype meta on resource.
To set the schema_index in TAP_SCHEMA, put some integer to a schema-rank meta; lower-ranked schemas are displayed further up in supporting clients (since version 2.9.3).
is raised when an RD cannot be located.
is raised when something decides it can come up with an error message that should be presented to the user as-is.
UIs should, consequently, just dump the payload and not try adornments. The content should be treated as a unicode string.
str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
raised to redirect a user agent to a different resource (HTTP 303).
SeeOthers are constructed with the destination URL that can be relative (to webRoot) or absolute (starting with http).
They are essentially like WebRedirect, except they put out a 303 instead of a 301.
the base class for renderers turning service-based info into character streams.
You will need to provide some way to give nevowc.TemplatedPage templates, either by supplying a loader or (usually preferably) mixing in CustomTemplateMixin -- or just override renderHTTP to make do without templates.
You can set an attribute checkedRenderer=False for renderers that are "generic" and do not need to be enumerated in the allowed attribute of the underlying service ("meta renderers").
You can set a class attribute openRenderer=True to make a renderer work even on restricted services (which may make sense for stuff like metadata inspection).
This class overrides t.w.template's renderer so renderers defined in the service (e.g., via an RD) are found, too.
is caught in rsc.makeData. You can raise this at any place during source processing to skip the rest of this source but then go on.
You should pass something descriptive as message so upstream can potentially report something is skipped and why.
Note: in a rowmaker, you probably usually want to raise IgnoreThisRow instead; it's rare that you want to ignore the rest of a source just because you don't like a row.
is raised when some syntax error occurs during a source parse.
They are constructed with the offending input construct (a source line or similar, None in a pinch) and the result of the row iterator's getLocator call.
An Observer swallowing infos, warnings, and the like.
This is to configure the UI. Enable it by calling api.StingyPlainUI(api.ui).
is raised if an error occurs during the construction of structures.
You can construct these with pos; this is an opaque object that, when stringified, should expand to something that gives the user a rough idea of where something went wrong.
Since you will usually not know where you are in the source document when you want to raise a StructureError, xmlstruct will try to fill pos in when it's still None when it sees a StructureError. Thus, you're probably well advised to leave it blank.
Signature: TAItoTT(tai)
returns TDT for a (datetime.datetime) TAI.
Signature: TTtoTAI(tdt)
returns TAI for a (datetime.datetime) TDT.
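Both conversions rest on the fixed offset TT = TAI + 32.184 s (TDT is the older name for TT). A minimal sketch with hypothetical function names, working on naive datetime.datetime instances:

```python
import datetime

# TT runs ahead of TAI by a constant 32.184 seconds.
TT_MINUS_TAI = datetime.timedelta(seconds=32.184)


def tai_to_tt(tai):
    """Return TT for a datetime given in TAI."""
    return tai + TT_MINUS_TAI


def tt_to_tai(tt):
    """Return TAI for a datetime given in TT."""
    return tt - TT_MINUS_TAI


print(tai_to_tt(datetime.datetime(2000, 1, 1)))
```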
A definition of a table, both on-disk and internal.
Some attributes are ignored for in-memory tables, e.g., roles or adql.
Properties for tables:
If you give multiple data model names or URIs, the sequences of names and URIs must be identical (in particular, each name needs a URI). But, really, both of these are on the way out.
Somewhat inconsistently, to set a table's utype if you have to, set its utype meta.
Tables within a schema can have a rank, with lower ranks displayed first in clients that support that. To set that rank, put a positive number into the table-rank meta (since version 2.9.3).
Signature: TableForDef(tableDef, suppressIndex=False, parseOptions=<ParseOptions validateRows=False maxRows=None keepGoing=False>, **kwargs)
returns a table instance suitable for holding data described by tableDef.
This is the main interface to table instantiation.
suppressIndex=True can be used to suppress index generation on in-memory tables with primary keys. Use it when you are sure you will not need the index (e.g., if staging an on-disk table).
See the function getParseOptions for what you can pass in as parseOptions; arguments there can also be used here.
raised to generate an HTTP 404 response.
A simple interface to querying the database through a connection managed by someone else.
This is typically used as in:
with base.getTableConn() as conn:
    q = UnmanagedQuerier(conn)
    ...
This contains numerous methods abstracting DB functionality a bit. Documented ones include:
A context object for writing VOTables.
The constructor arguments work as keyword arguments to getAsVOTable. Some other high-level functions accept finished contexts.
This class provides management for unique ID attributes, the value mapper registry, and possibly additional services for writing VOTables.
VOTableContexts optionally take
- a value mapper registry (by default, valuemappers.defaultMFRegistry)
- the tablecoding (currently, td, binary, or binary2)
- version=(1,1) to order a 1.1-version VOTable, (1,2) for 1.2. (default is now 1.4).
- acquireSamples=False to suppress reading some rows to get samples for each column
- suppressNamespace=False to leave out a namespace declaration (mostly convenient for debugging)
- overflowElement (see votable.tablewriter.OverflowElement)
There's also an attribute produceVODML that will automatically be set for VOTable 1.5; you can set it to true manually, but the resulting VOTables will probably be invalid.
If VO-DML processing is enabled, the context also manages models declared; that's the modelsUsed dictionary, mapping prefix -> dm.Model instances
The base class of VOTable-related errors.
is raised when the validation of a field fails.
ValidationErrors are constructed with a message, a column name, and optionally a row (i.e., a dict) and a hint.
raised to redirect a user agent to a different resource (HTTP 301).
WebRedirects are constructed with the destination URL that can be relative (to webRoot) or absolute (starting with http).
Signature: addHistoryCard(header, entry, recognizer)
adds a history card to header, overwriting a previous version if it's present.
This will reject entries longer than 72 characters, as these would create cruft on overwriting. Since we always prepend today's date to entry, the net payload size is 61 characters.
recognizer is a string that is searched for within the history strings. If it is found, entry is put into that card. If no such card is found, a new history card is written.
This is mainly for processors; in particular during development, but quite likely also when reprocessing, you don't want extra history entries each time the processor runs (yeah, there are situations when you would want to know about reprocessing, but weigh these against the horrible cruft, and I know what I want).
Signature: bYearToDateTime(bYear)
returns a datetime.datetime instance for a fractional Besselian year.
This uses the formula given by Lieske, J.H., A&A 73, 282 (1979).
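The relation commonly quoted from that paper links a Besselian epoch B to a Julian date through JD = 2415020.31352 + (B - 1900.0) * 365.242198781. The sketch below applies it and converts the result to a datetime via the J2000.0 epoch; it is an illustration under that assumption, not necessarily what DaCHS does internally, and the function name is hypothetical.

```python
import datetime

# JD 2451545.0 corresponds to 2000-01-01T12:00:00.
J2000_JD = 2451545.0
J2000_DT = datetime.datetime(2000, 1, 1, 12)


def b_year_to_datetime(b_year):
    """Return a naive datetime for a fractional Besselian year."""
    # Besselian epoch to Julian date (tropical year length at B1900.0).
    jd = 2415020.31352 + (b_year - 1900.0) * 365.242198781
    return J2000_DT + datetime.timedelta(days=jd - J2000_JD)


# B1950.0 falls late on 1949-12-31.
print(b_year_to_datetime(1950.0))
```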
Signature: bytify(s: Union[str, bytes]) -> bytes
returns s utf-8 encoded if it is a string, unmodified otherwise.
Signature: computeMean(val1, val2)
returns the mean value between two values.
Beware: Integer division done here for the benefit of datetime calculations.
>>> computeMean(1.,3)
2.0
>>> computeMean(datetime.datetime(2000, 10, 13),
...   datetime.datetime(2000, 10, 12))
datetime.datetime(2000, 10, 12, 12, 0)
Signature: createDump(tableIds, destFile, binary=True)
writes a DaCHS dump of tableIds to destFile.
tableIds is a list of rd-id#table-id identifiers (all must resolve), destFile is a file object opened for writing.
Signature: cutoutFITS(hdu: astropy.io.fits.hdu.image.ImageHDU, *cuts: Tuple[Optional[float], ...]) -> astropy.io.fits.hdu.image.ImageHDU
returns a cutout of hdu restricted to cuts.
hdu is a primary FITS hdu. cuts is a list of cut specs, each of which is a triple (axis, lower, upper). axis is between 1 and naxis, lower and upper are 1-based pixel coordinates of the limits, and "border" pixels are included. Specifications outside of the image are legal and will be cropped back. Open limits are supported via a specification of None.
If an axis would vanish (i.e. length 0 or less), the function fudges things such that the axis gets a length of 1.
axis is counted here in the FORTRAN/FITS sense, not in the C sense, i.e., axis=1 cuts along NAXIS1, which is the last index in a numpy array.
WCS CRPIXes in hdu's header will be updated. Axes not specified will not be touched. It is an error to specify cuts for an axis twice (behaviour is undefined).
Note that this will lose all extensions the original FITS file might have had.
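To make the axis and pixel conventions concrete, the following sketch translates FITS-style cut triples into Python slices in C (numpy) index order. It only illustrates the index bookkeeping described above, assuming integer limits; it does not touch headers, does not implement the length-1 fudge, and the name cuts_to_slices is hypothetical.

```python
def cuts_to_slices(shape, cuts):
    """Translate (axis, lower, upper) cuts into a tuple of slices.

    shape is a C-order shape tuple (NAXISn first ... NAXIS1 last);
    axis is 1-based in the FITS sense; lower/upper are 1-based and inclusive.
    """
    naxis = len(shape)
    slices = [slice(None)] * naxis
    for axis, lower, upper in cuts:
        c_index = naxis - axis          # FITS axis 1 is the last C index
        length = shape[c_index]
        # None means an open limit; out-of-image limits are cropped back.
        lo = 1 if lower is None else max(1, lower)
        hi = length if upper is None else min(length, upper)
        # 1-based inclusive [lo, hi] becomes 0-based half-open [lo-1, hi).
        slices[c_index] = slice(lo - 1, hi)
    return tuple(slices)


# An image with NAXIS1=10, NAXIS2=3 has C-order shape (3, 10).
print(cuts_to_slices((3, 10), [(1, 2, 5)]))
```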
Signature: dateTimeToJYear(dt)
returns a fractional (julian) year for a datetime.datetime instance.
Signature: dateTimeToJdn(dt)
returns a julian day number (including fractionals) from a datetime instance.
Signature: dateTimeToMJD(dt)
returns a modified julian date for a datetime instance.
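The three date conversions above are related by simple offsets: JD = MJD + 2400000.5, and the Julian year is 2000 + (JD - 2451545.0)/365.25. A sketch with hypothetical function names, working on naive datetimes and ignoring leap seconds:

```python
import datetime

# MJD 0.0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime.datetime(1858, 11, 17)
ONE_DAY = datetime.timedelta(days=1)


def datetime_to_mjd(dt):
    """Return the modified julian date for dt as a float."""
    return (dt - MJD_EPOCH) / ONE_DAY


def datetime_to_jd(dt):
    """Return the julian date (with fraction) for dt."""
    return datetime_to_mjd(dt) + 2400000.5


def datetime_to_jyear(dt):
    """Return the fractional julian year for dt."""
    return 2000.0 + (datetime_to_jd(dt) - 2451545.0) / 365.25


j2000 = datetime.datetime(2000, 1, 1, 12)
print(datetime_to_mjd(j2000), datetime_to_jd(j2000), datetime_to_jyear(j2000))
```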
Signature: degToDms(deg: float, sepChar: str = ' ', secondFracs: int = 2, preserveLeading: bool = False, truncate: bool = False, addSign: bool = True) -> str
converts a float angle in degrees to a sexagesimal string.
This takes a lot of optional arguments:
>>> degToDms(-3.24722, "", 0, True, True)
'-031449'
>>> degToDms(0)
'+0 00 00.00'
>>> degToDms(0, addSign=False)
'0 00 00.00'
>>> degToDms(-0.25, sepChar=":")
'-0:15:00.00'
>>> degToDms(-23.50, secondFracs=4)
'-23 30 00.0000'
>>> "%.4f"%dmsToDeg(degToDms(-25.6835, sepChar=":"), sepChar=":")
'-25.6835'
Signature: degToHms(deg: float, sepChar: str = ' ', secondFracs: int = 3, truncate: bool = False) -> str
converts a float angle in degrees to a time angle (hh:mm:ss.mmm).
This takes a lot of optional arguments:
>>> degToHms(0, sepChar=":")
'00:00:00.000'
>>> degToHms(122.057, secondFracs=1)
'08 08 13.7'
>>> degToHms(122.057, secondFracs=1, truncate=True)
'08 08 13.6'
>>> degToHms(-0.055, secondFracs=0)
'-00 00 13'
>>> degToHms(-0.055, secondFracs=0, truncate=True)
'-00 00 13'
>>> degToHms(-1.056, secondFracs=0)
'-00 04 13'
>>> degToHms(-1.056, secondFracs=0)
'-00 04 13'
>>> degToHms(359.9999999)
'24 00 00.000'
>>> degToHms(359.2222, secondFracs=4, sepChar=":")
'23:56:53.3280'
>>> "%.4f"%hmsToDeg(degToHms(256.25, secondFracs=9))
'256.2500'
Signature: dmsToDeg(dmsAngle: str, sepChar: Optional[str] = None) -> float
returns the degree minutes seconds-specified dmsAngle as a float in degrees.
    >>> "%3.8f"%dmsToDeg("45 30.6")
    '45.51000000'
    >>> "%3.8f"%dmsToDeg("45:30.6", ":")
    '45.51000000'
    >>> "%3.8f"%dmsToDeg("-45 30 7.6")
    '-45.50211111'
    >>> dmsToDeg("junk")
    Traceback (most recent call last):
    ValueError: Invalid dms value with sepChar None: 'junk'
Signature: document(origFun: 'Any')
is a decorator that adds a "buildDocsForThis" attribute to its argument.
This attribute is evaluated by documentation generators.
Signature: formatData(formatName, table, outputFile, acquireSamples=True, **moreFormatterArgs)
writes a table to outputFile in the format given by formatName.
Table may be a table or a Data instance. formatName is a format shortcut (formats.iterFormats() gives keys available) or a media type. If you pass None, the default VOTable format will be selected.
This raises a CannotSerializeIn exception if formatName is not recognized. Note that you have to import the serialising modules from the format package to make the formats available (fitstable, csvtable, geojson, jsontable, texttable, votable; api itself already imports the more popular of these).
If a client knows a certain formatter understands additional arguments, it can hand them in as keyword arguments. This will raise an error if another formatter that doesn't understand the argument is being used.
Signature: formatISODT(dt: datetime.datetime) -> str
returns some ISO8601 representation of a datetime instance.
The reason for preferring this function over a simple str is that datetime's default representation is too difficult for some other code (e.g., itself); hence, this code suppresses any microsecond part and always adds a Z (where strftime works, utils.isoTimestampFmt produces an identical string).
The behaviour of this function for timezone-aware datetimes is undefined.
For convenience, None is returned as None.
Also for convenience, you can pass in a string; this will then be parsed first, which provides both some basic format validation and guaranteed DALI-compliant serialisation.
    >>> formatISODT(datetime.datetime(2015, 10, 20, 12, 34, 22, 250))
    '2015-10-20T12:34:22Z'
    >>> formatISODT(datetime.datetime(1815, 10, 20, 12, 34, 22, 250))
    '1815-10-20T12:34:22Z'
    >>> formatISODT(datetime.datetime(2018, 9, 21, 23, 59, 59, 640000))
    '2018-09-22T00:00:00Z'
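The examples above suggest that the microsecond part is rounded to the nearest second rather than cut off. A minimal standard-library sketch reproducing this behaviour for naive datetimes could look as follows; the function name is made up, the treatment of exactly half a second is an assumption, and string input is not handled here:

```python
import datetime


def formatISODTSketch(dt):
    """returns a yyyy-mm-ddThh:mm:ssZ string for a naive datetime.

    Illustration only; this is not the DaCHS implementation.
    """
    if dt is None:
        # None is passed through, as described above
        return None
    if dt.microsecond >= 500000:
        # round up to the next full second (assumption for exactly .5)
        dt = dt + datetime.timedelta(seconds=1)
    return dt.replace(microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")


print(formatISODTSketch(datetime.datetime(2018, 9, 21, 23, 59, 59, 640000)))
```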
Signature: genLimitKeys(inputKey)
yields _MAX and _MIN inputKeys from a single input key.
This also tries to sensibly fix descriptions and ucds. This is mainly for datalink metaMakers; condDescs may use a similar thing, but that's not exposed to RDs.
Don't use this function any more. It will go away soon.
Signature: getAccrefFromStandardPubDID(pubdid, authBase='ivo://org.gavo.dc/~?')
returns an accref from a standard DaCHS PubDID.
This is basically the inverse of getStandardPubDID. It will raise NotFound if pubdid "looks like a URI" (implementation detail: has a colon in the first 10 characters) and does not start with ivo://<authority>/~?. If it's not a URI, we assume it's a local accref and just return it.
The function does not check if the remaining characters are a valid accref, much less whether it can be resolved.
authBase's default will reflect your system's settings on your installation, which probably is not what's given in this documentation.
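The decision logic described above can be sketched in plain Python. The function name is made up for this illustration, LookupError stands in for the NotFound exception DaCHS raises, and whether DaCHS applies any URL unquoting to the result is not covered here:

```python
def accrefFromPubDIDSketch(pubdid, authBase="ivo://org.gavo.dc/~?"):
    """returns the accref part of a standard PubDID (illustration only)."""
    if ":" not in pubdid[:10]:
        # does not look like a URI: assume it is a local accref already
        return pubdid
    if not pubdid.startswith(authBase):
        # DaCHS raises NotFound here; LookupError is a stand-in
        raise LookupError("Not a standard PubDID: %s" % pubdid)
    # no check is made whether the rest is a valid or resolvable accref
    return pubdid[len(authBase):]


print(accrefFromPubDIDSketch("ivo://org.gavo.dc/~?data/img/a.fits"))
print(accrefFromPubDIDSketch("data/img/a.fits"))
```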
Signature: getAsVOTable(data, ctx=None, **kwargs)
returns a string containing a VOTable representation of data.
kwargs can be constructor arguments for VOTableContext.
Signature: getDBConnection(profile, debug=False, autocommitted=False)
returns an enhanced database connection through profile.
You will typically rather use the context managers for the standard profiles (getTableConnection and friends). Use this function if you want to keep your connection out of connection pools or if you want to use non-standard profiles.
profile will usually be a string naming a profile defined in GAVO_ROOT/etc.
Signature: getDatalinkMetaLink(dlSvc, accref)
returns a datalink URL for the product referenced through accref with the datalink service dlSvc.
This assumes that dlSvc uses the standard DaCHS pubDIDs. dlSvc needs to be the service element.
A typical use is in a metaMaker and would look like this:
getDatalinkMetaLink(rd.getById("dl"), descriptor.accref)
Signature: getFileStem(fPath: str)
returns the file stem of a file path.
The file stem is what remains if you take the base name and split off the extension. The extension here starts with the last dot in the file name, except that up to one of some common compression extensions (.gz, .xz, .bz2, .Z, .z) is stripped off the end, if present, before determining the extension.
    >>> getFileStem("/foo/bar/baz.x.y")
    'baz.x'
    >>> getFileStem("/foo/bar/baz.x.gz")
    'baz'
    >>> getFileStem("/foo/bar/baz")
    'baz'
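The rule for determining the stem can be sketched with os.path. The function name below is made up for this illustration, and the code is not taken from DaCHS:

```python
import os

COMPRESSION_EXTENSIONS = (".gz", ".xz", ".bz2", ".Z", ".z")


def fileStemSketch(fPath):
    """returns the base name of fPath without its extension (sketch)."""
    name = os.path.basename(fPath)
    # strip at most one compression extension first
    for ext in COMPRESSION_EXTENSIONS:
        if name.endswith(ext):
            name = name[:-len(ext)]
            break
    # what follows the last remaining dot is the extension
    return os.path.splitext(name)[0]


print(fileStemSketch("/foo/bar/baz.x.gz"))
```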
Signature: getFlatName(accref)
returns a unix-compatible file name for an access reference.
The file name will not contain terrible characters, let alone slashes. This is used to, e.g., keep all previews in one directory.
Signature: getFormatted(formatName, table, acquireSamples=False)
returns a string containing a representation of table in the format given by formatName.
This is just wrapping the function formatData; see there for formatName. This function will use large amounts of memory for large data.
Signature: getInputsRelativePath(absPath, liberalChars=True)
returns absPath relative to the DaCHS inputsDir.
If absPath is not below inputsDir, a ValueError results. On liberalChars, see the function getRelativePath.
In rowmakers and rowfilters, you'll usually use the macro \inputRelativePath that inserts the appropriate code.
Signature: getMetaText(ob, key, default=None, **kwargs)
returns the meta item key from ob in text form if present, default otherwise.
You can pass getMeta keyword arguments (except default).
Additionally, there's acceptSequence; if set to true, this will return the first item of a sequence-valued meta item rather than raising an error.
ob will be used as a macro package if it has an expand method; to use something else as the macro package, pass a macroPackage keyword argument.
Signature: getParseOptions(validateRows=True, doTableUpdates=False, batchSize=1024, maxRows=None, keepGoing=False, dropIndices=False, dumpRows=False, metaOnly=False, buildDependencies=True, systemImport=False, commitAfterMeta=False, dumpIngestees=False, suppressMeta=False, metaPlusIndex=False)
returns an object with some attributes set.
This object is used in the parsing code in dddef. It's a stand-in for the command line options for tables created internally and should have all attributes that the parsing infrastructure might want from the optparse object.
So, just configure what you want via keyword arguments or use the prebuilt objects parseValidating and parseNonValidating below.
See commandline.py for the meaning of the attributes.
The exception is buildDependencies. This is true for most internal builds of data (and thus here), but false when we need to manually control when dependencies are built, as in user.importing and while building the dependencies themselves.
Signature: getQueryMeta()
returns a query meta object from somewhere up the stack.
This is for row makers running within a service. This can be used to, e.g., enforce match limits by writing getQueryMeta()["dbLimit"].
Signature: getReferencedElement(refString, forceType=None, **kwargs)
returns the element for the DaCHS reference refString.
refString has the form rdId[#subRef]; rdId can be filesystem-relative, but the RD referenced must be below inputsDir anyway.
You can pass a structure class into forceType, and a StructureError will be raised if what's pointed to by the id isn't of that type.
You should usually use base.resolveCrossId instead of this from within DaCHS. This is intended for code handling RD ids from users.
This supports further keyword arguments to getRD.
Signature: getRelativePath(fullPath: Union[str, pathlib.Path], rootPath: Union[str, pathlib.Path], liberalChars: bool = True) -> Union[str, pathlib.Path]
returns rest if fullPath has the form rootPath/rest and raises a ValueError otherwise.
This accepts either strings or pathlib.Path-s and returns an object of the type of fullPath (pathlib functionality since 2.9.3).
Pass liberalChars=False to make this raise a ValueError when URL-dangerous characters (blanks, ampersands, pluses, non-ASCII, and similar) are present in the result. This is mainly for products.
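As a rough illustration of these semantics, here is a strings-only sketch. The function name is made up, pathlib.Path input is not preserved in the return type, and the exact set of characters DaCHS considers URL-dangerous is an assumption of this example:

```python
import re

# Assumption: everything outside this conservative set counts as
# URL-dangerous; DaCHS may draw the line elsewhere.
SAFE_CHARS_RE = re.compile(r"^[A-Za-z0-9_./-]*$")


def relativePathSketch(fullPath, rootPath, liberalChars=True):
    """returns rest if fullPath has the form rootPath/rest (sketch)."""
    fullPath, rootPath = str(fullPath), str(rootPath)
    root = rootPath.rstrip("/") + "/"
    if not fullPath.startswith(root):
        raise ValueError("%s is not below %s" % (fullPath, rootPath))
    rest = fullPath[len(root):]
    if not liberalChars and not SAFE_CHARS_RE.match(rest):
        raise ValueError("Dangerous characters in %r" % rest)
    return rest


print(relativePathSketch("/var/gavo/inputs/foo/bar.fits", "/var/gavo/inputs"))
```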
Signature: getStandardPubDID(path)
returns the standard DaCHS PubDID for path.
The publisher dataset identifier (PubDID) is important in protocols like SSAP and obscore. If you use this function, the PubDID will be your authority, the path component ~, and the inputs-relative path of the input file as the parameter.
path can be relative, in which case it is interpreted relative to the DaCHS inputsDir.
You can define your PubDIDs in a different way, but you'd then need to provide a custom descriptorGenerator to datalink services (and might need other tricks). If your data comes from plain files, use this function.
In a rowmaker, you'll usually use the standardPubDID macro.
Signature: getTableDefForTable(connection, tableName)
returns a TableDef object for a SQL table name.
connection needs to be TableConnection or something with higher privileges.
This really has little to do with resolving identifiers, but this module already has getRDs and similar, so it seemed the least unnatural place.
Signature: getTemplateForName(templateName)
returns the FITS template sequence for templateName.
A NotFoundError is raised if no such template exists.
Signature: getWCSAxis(header: astropy.io.fits.header.Header, axisIndex: int, forceSeparable: bool = False) -> gavo.utils.fitstools.WCSAxis
returns a WCSAxis instance from an axis index and a FITS header.
If the axis is mentioned in a transformation matrix (CD or PC), a ValueError is raised (use forceSeparable to override).
The axisIndex is 1-based; to get a transform for the axis described by CTYPE1, pass 1 here.
The object returned has methods like pixToPhys, physToPix (and their pix0 brethren), and getLimits.
Note that at this point WCSAxis only supports linear transforms (it's a DaCHS-specific implementation). We'll extend it on request.
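For a separable linear axis, the relation between pixel and physical coordinates is the usual FITS one built from CRVALn, CRPIXn, and CDELTn. The following sketch shows it with plain functions; the names are made up, and the actual WCSAxis methods may differ in details. Pixel coordinates here are 1-based as in FITS; a 0-based variant would add 1 to the pixel coordinate first.

```python
def pixToPhysSketch(pix, crval, crpix, cdelt):
    """returns the physical coordinate for a 1-based pixel coordinate."""
    return crval + (pix - crpix)*cdelt


def physToPixSketch(phys, crval, crpix, cdelt):
    """returns the 1-based pixel coordinate for a physical coordinate."""
    return crpix + (phys - crval)/cdelt


# Hypothetical axis: CRVAL=100, CRPIX=1, CDELT=0.5
print(pixToPhysSketch(3, 100., 1., 0.5))
print(physToPixSketch(101., 100., 1., 0.5))
```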
Signature: getXMLTree(xmlString, debug=False)
returns a libxml2 etree-like object for xmlString, where, for convenience, all namespaces on elements are nuked.
This will only accept strings.
The libxml2 etree lets you do xpath searching using the xpath method.
Nuking namespaces is of course not a good idea in general, so you might want to think again before you use this in production code.
To facilitate writing tests, in addition to lxml.etree methods the returned object also has the following methods:
Signature: hmsToDeg(hms: str, sepChar: Optional[str] = None) -> float
returns the time angle (h m s.decimals) as a float in degrees.
    >>> "%3.8f"%hmsToDeg("22 23 23.3")
    '335.84708333'
    >>> "%3.8f"%hmsToDeg("22:23:23.3", ":")
    '335.84708333'
    >>> "%3.8f"%hmsToDeg("222323.3", "")
    '335.84708333'
    >>> hmsToDeg("junk")
    Traceback (most recent call last):
    ValueError: Invalid time with sepChar None: 'junk'
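The arithmetic behind a time angle is simple: one hour corresponds to 15 degrees. The sketch below works on already-split numbers rather than strings and ignores signs; the function name is made up for this illustration:

```python
def hmsToDegSketch(hours, minutes=0, seconds=0.):
    """returns degrees for a time angle given as numbers (sketch)."""
    return (hours + minutes/60. + seconds/3600.)*15.


print("%3.8f" % hmsToDegSketch(22, 23, 23.3))
```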
Signature: hoursToHms(decimal_hours: float, sepChar: str = ':', secondFracs: int = 0) -> str
returns a time span in hours in sexagesimal time (h:m:s).
The optional arguments are as for degToHms.
    >>> hoursToHms(0)
    '00:00:00'
    >>> hoursToHms(23.5)
    '23:30:00'
    >>> hoursToHms(23.55)
    '23:33:00'
    >>> hoursToHms(23.525)
    '23:31:30'
    >>> hoursToHms(23.553, secondFracs=2)
    '23:33:10.80'
    >>> hoursToHms(123.553, secondFracs=2)
    '123:33:10.80'
Signature: isMJD(col)
returns True if the rscdef.Column instance col likely contains MJD values.
This has a long and winding history in DaCHS, and so this is a disaster of heuristics.
Signature: iterSimpleText(f: <class 'TextIO'>) -> Generator[Tuple[int, str], NoneType, NoneType]
iterates over (physLineNumber, line) in f with some usual conventions for simple data files.
You should use this function to read from simple configuration and/or table files that don't warrant a full-blown grammar/rowmaker combo. The intended use is somewhat like this:
    with open(rd.getAbsPath("res/mymeta")) as f:
        for lineNumber, content in iterSimpleText(f):
            try:
                ...
            except Exception as exc:
                sys.stderr.write("Bad input line %s: %s"%(lineNumber, exc))
The grammar rules are, specifically:
Signature: jYearToDateTime(jYear)
returns a datetime.datetime instance for a fractional (julian) year.
This refers to time specifications like J2001.32.
Signature: jdnToDateTime(jd)
returns a datetime.datetime instance for a julian day number.
Signature: killBlanks(literal)
returns the string literal with all blanks removed.
This is useful when numbers are formatted with blanks thrown in.
Nones are passed through.
Signature: lastSourceElements(path, numElements)
returns a path made up from the last numElements items in path.
Signature: loadPythonModule(fqName: 'Filename', relativeTo: 'Optional[Filename]' = None) -> 'Tuple[ModuleType, Any]'
imports fqName and returns the (module, spec).
Do not use this function to import DC-internal modules; this may mess up singletons since you could bypass python's mechanisms to prevent multiple imports of the same module.
fqName is a fully qualified path to the module without the .py, unless relativeTo is given, in which case it is interpreted as a relative path. This for letting modules in resdir/res import each other by saying:
mod, _ = api.loadPythonModule("foo", relativeTo=__file__)
The python path is temporarily amended with the path part of the source module.
If the module is in /var/gavo/inputs/foo/bar/mod.py, Python will know the module as foo_bar_mod (the last two path components are always added). This is to keep Python from using the module when someone writes import mod.
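The general mechanism of importing a module from a file path under a name different from its file name can be sketched with importlib. This is an illustration only: the function name is made up, and unlike loadPythonModule it neither amends the python path nor derives the module name from the path components.

```python
import importlib.util
import os
import tempfile


def loadModuleSketch(path, moduleName):
    """imports the source file at path as moduleName (sketch).

    Returns a pair of (module, spec).
    """
    spec = importlib.util.spec_from_file_location(moduleName, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module, spec


# Demonstration with a throwaway module file
with tempfile.TemporaryDirectory() as tmpDir:
    modPath = os.path.join(tmpDir, "mod.py")
    with open(modPath, "w") as f:
        f.write("ANSWER = 42\n")
    mod, spec = loadModuleSketch(modPath, "foo_bar_mod")
    loadedValue = mod.ANSWER
    loadedName = mod.__name__

print(loadedName, loadedValue)
```

Since the module is not known to Python as mod, a plain import mod elsewhere will not pick it up.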
Signature: makeAbsoluteURL(path, canonical=False)
returns a fully qualified URL for a rooted local part.
This will reflect the http/https access mode unless you pass canonical=True, in which case [web]serverURL will be used unconditionally.
Signature: makeData(dd, parseOptions=<ParseOptions validateRows=False maxRows=None keepGoing=False>, forceSource=None, connection=None, data=None, runCommit=True)
returns a data instance built from dd.
It will arrange for the parsing of all tables generated from dd's grammar.
If database tables are being made, you must pass in a connection. The entire operation will then run within a single transaction within this connection (except for building dependents; they will be built in separate transactions).
The connection will be rolled back or committed depending on the success of the operation (unless you pass runCommit=False, in which case even a successful import will not be committed).
You can pass in a data instance created by yourself in data. This makes sense if you want to, e.g., add some meta information up front.
makeData will usually iterate over the sources given in dd. You can override this with forceSource, which can contain a single source passed to a grammar. If you need to pass in multiple sources, use a MultiForcedSources object (or anything that has an iterSources(dbConnection) method).
Signature: makeDependentsFor(dds, parseOptions, connection, sysCatChanged)
rebuilds all data dependent on one of the DDs in the dds sequence.
Signature: makeHeaderFromTemplate(template, originalHeader=None, **values)
returns a new pyfits.Header from template with values filled in.
template usually is the name of a template previously registered with registerTemplate, or one of DaCHS' predefined template names (currently, minimal and wfpdb). In a pinch, you can also pass in immediate headers.
originalHeader can be a pre-existing header; the history and comment cards are copied over from it, and if any of its other cards have not yet been added to the header, they will be added in the order that they appear there.
values for which no template item is given are added in random order after the template unless an originalHeader is passed. In that case, they are assumed to originate there and are ignored.
Signature: makeIAUId(prefix: str, long: float, lat: float, longSec: int = 0, latSec: int = 0) -> str
returns an (equatorial) IAU identifier for an object at long and lat.
The rules are given on https://cds.unistra.fr/Dic/iau-spec.html
The prefix, including the system identifier, you have to pass in. You cannot build identifiers using only minutes precision. If you want to include sub-arcsec precision, pass in longSec and/or latSec (the number of fractional seconds to preserve).
Signature: makeProductLink(key, withHost=True, useHost=None)
returns the URL at which a product can be retrieved.
key can be an accref string or an RAccref.
Note that this is using the preferred host as the basic URL. If you are running dual-protocol http/https and you ingest results of this function into the database, it is advisable to cut off the scheme part of the URI (e.g., split(":", 1)[-1]). In data products served, DaCHS will then put in the scheme used for the query.
DaCHS (almost always) also allows full http URIs as accrefs. These will be returned unchanged.
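Cutting off the scheme as suggested above leaves a scheme-relative URI. A small sketch, with a made-up helper name and a made-up host name:

```python
def stripScheme(url):
    """returns url without its scheme part (illustration only)."""
    return url.split(":", 1)[-1]


# hypothetical product link as it might come out of makeProductLink
print(stripScheme("http://dc.example.org/getproduct/data/img/a.fits"))
```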
Signature: makeSitePath(path)
returns a rooted local part for a server-internal URL.
path itself needs to be server-absolute; a leading slash is recommended for clarity but not mandatory.
Signature: makeStruct(structClass, **kwargs)
creates a parentless instance of structClass with **kwargs.
You can pass in a parent_ kwarg to force a parent, and a ctx_ if you need a parse context.
This is the preferred way to create struct instances in DaCHS, as it will cause the sequence of completers and validators run. Use it like this:
MS(rscdef.Column, name="ra", type="double precision")
Signature: makeTimestamp(date, time)
makes a datetime instance from a date and a time.
Signature: mjdToDateTime(mjd)
returns a datetime.datetime instance for a modified julian day number.
Beware: This loses a couple of significant digits due to transformation to jd.
Signature: originalOrIdentity(soup)
returns soup.original or soup if there is no original attribute.
This is for cooperation with BinaryItem coming in from the web into ContextGrammars.
Signature: parseAngle(literal, format, sepChar=None)
converts the various forms angles might be encountered to degrees.
format is one of hms, dms, fracHour. For sexagesimal/time angles, you can pass a sepChar (default: split at blanks) that lets you specify what separates hours/degrees, minutes, and seconds.
    >>> "%.8f"%(parseAngle("23 59 59.95", "hms"))
    '359.99979167'
    >>> "%10.5f"%parseAngle("-20:31:05.12", "dms", sepChar=":")
    ' -20.51809'
    >>> "%010.6f"%parseAngle("21.0209556", "fracHour")
    '315.314334'
Signature: parseBooleanLiteral(literal)
returns a python boolean from some string.
Boolean literals are strings like True, false, on, Off, yes, No in some capitalization.
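A case-insensitive parser for the literals listed above might be sketched as follows. The function name is made up, and the real parseBooleanLiteral may accept further literals or treat None specially; neither is covered here:

```python
_TRUE_LITERALS = {"true", "on", "yes"}
_FALSE_LITERALS = {"false", "off", "no"}


def parseBooleanSketch(literal):
    """returns a python boolean for a boolean literal (sketch)."""
    normalized = literal.strip().lower()
    if normalized in _TRUE_LITERALS:
        return True
    if normalized in _FALSE_LITERALS:
        return False
    raise ValueError("%r is not a boolean literal" % literal)


print(parseBooleanSketch("Off"), parseBooleanSketch("yes"))
```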
Signature: parseBytes(literal)
returns bytes from a literal.
This will interpret hex and octal byte escapes, and it'll support lists of integer-like things; not sure if that's actually more harmful than good. But then people can always override the default behaviour.
    >>> parseBytes("abc")
    b'abc'
    >>> parseBytes(r"\xab\000")
    b'\xab\x00'
    >>> parseBytes([123, 231, 23])
    b'{\xe7\x17'
    >>> parseBytes([10002])
    Traceback (most recent call last):
    ValueError: bytes must be in range(0, 256)
Signature: parseCooPair(soup)
returns a pair of RA, DEC floats if they can be made out in soup or raises a value error.
No range checking is done (yet), i.e., as long as two numbers can be made out, the function is happy.
    >>> parseCooPair("23 12")
    (23.0, 12.0)
    >>> parseCooPair("23.5,-12.25")
    (23.5, -12.25)
    >>> parseCooPair("3.75 -12.125")
    (3.75, -12.125)
    >>> parseCooPair("3 25,-12 30")
    (51.25, -12.5)
    >>> ["{:.9f}".format(v) for v in parseCooPair("12 15 30.5 +52 18 27.5")]
    ['183.877083333', '52.307638889']
    >>> parseCooPair("3.39 -12 39")
    Traceback (most recent call last):
    ValueError: Invalid time with sepChar None: '3.39'
    >>> parseCooPair("12 15 30.5 +52 18 27.5e")
    Traceback (most recent call last):
    ValueError: 12 15 30.5 +52 18 27.5e has no discernible position in it
    >>> parseCooPair("QSO2230+44.3")
    Traceback (most recent call last):
    ValueError: QSO2230+44.3 has no discernible position in it
Signature: parseDate(literal, format='%Y-%m-%d')
returns a datetime.date object of literal parsed according to the strptime-similar format.
The function understands the special dateFormat !!jYear (stuff like 1980.89).
Signature: parseDefaultDate(literal: Union[str, datetime.date, NoneType]) -> Optional[datetime.date]
parseDefaultDatetime's little sister.
Signature: parseDefaultDatetime(literal: Union[str, datetime.datetime, NoneType]) -> Optional[datetime.datetime]
returns a datetime from string or passes through datetimes and Nones.
The function will try to parse a string in various ways; we will try not to drop formats from one minor version to the next.
Signature: parseDefaultTime(literal: Union[str, datetime.time, NoneType]) -> Optional[datetime.time]
parseDefaultDatetime's other little sister.
Signature: parseFloat(literal)
returns a float from a literal, or None if literal is None or an empty string.
Temporarily, this includes a hack to work around a bug in psycopg2.
    >>> parseFloat(" 5e9 ")
    5000000000.0
    >>> parseFloat(None)
    >>> parseFloat(" ")
    >>> parseFloat("wobbadobba")
    Traceback (most recent call last):
    ValueError: could not convert string to float: 'wobbadobba'
Signature: parseFromString(rootStruct, inputString, context=None)
parses a DaCHS RD tree rooted in rootStruct from a string.
It returns the root element of the resulting tree. You would use this like this:
parseFromString(rscdef.Column, "<column name='foo'/>")
Signature: parseISODT(literal: str, useTime: bool = False) -> datetime.datetime
returns a datetime object for an ISO time literal.
There's no real timezone support yet, but we accept and ignore various ways of specifying UTC.
By default, this uses plain python datetime because it usually covers a larger date range than the time module. The downside is that it does not know about leap seconds. Pass useTime=True to go through time tuples, which know how to deal with them (but may not deal with dates far in the past or future).
    >>> parseISODT("1998-12-14")
    datetime.datetime(1998, 12, 14, 0, 0)
    >>> parseISODT("1998-12-14T13:30:12")
    datetime.datetime(1998, 12, 14, 13, 30, 12)
    >>> parseISODT("1998-12-14T13:30:12Z")
    datetime.datetime(1998, 12, 14, 13, 30, 12)
    >>> parseISODT("1998-12-14T13:30:12.224Z")
    datetime.datetime(1998, 12, 14, 13, 30, 12, 224000)
    >>> parseISODT("19981214T133012Z")
    datetime.datetime(1998, 12, 14, 13, 30, 12)
    >>> parseISODT("19981214T133012+00:00")
    datetime.datetime(1998, 12, 14, 13, 30, 12)
    >>> parseISODT("2016-12-31T23:59:60")
    Traceback (most recent call last):
    ValueError: second must be in 0..59
    >>> parseISODT("2016-12-31T23:59:60", useTime=True)
    datetime.datetime(2017, 1, 1, 0, 0)
    >>> parseISODT("junk")
    Traceback (most recent call last):
    ValueError: Bad ISO datetime literal: junk (required format: yyyy-mm-ddThh:mm:ssZ)
Signature: parseInt(literal)
returns an int from a literal, or None if literal is None or an empty string.
    >>> parseInt("32")
    32
    >>> parseInt("")
    >>> parseInt(None)
see function getParseOptions.
Signature: parseSPoint(soup)
returns an SPoint for a coordinate pair.
The coordinate pair can be formatted in a variety of ways; see the function parseCooPair. Input is always in degrees.
Signature: parseTime(literal, format='%H:%M:%S')
returns a datetime.timedelta object for literal parsed according to format.
For format, you can use the magic values !!secondsSinceMidnight, !!decimalHours or a strptime-like spec using the H, M, and S codes.
    >>> parseTime("89930", "!!secondsSinceMidnight")
    datetime.timedelta(days=1, seconds=3530)
    >>> parseTime("23.4", "!!decimalHours")
    datetime.timedelta(seconds=84240)
    >>> parseTime("3.4:5", "%H.%M:%S")
    datetime.timedelta(seconds=11045)
    >>> parseTime("20:04", "%H:%M")
    datetime.timedelta(seconds=72240)
Signature: parseTimestamp(literal, format='%Y-%m-%dT%H:%M:%S')
returns a datetime.datetime object from a literal parsed according to the strptime-similar format.
A ValueError is raised if literal doesn't match format (actually, a parse with essentially DALI-standard ISO representation is always tried).
see function getParseOptions.
Signature: parseWithNull(literal, baseParser, nullLiteral=<Undefined>, default=None, checker=None)
returns default if literal is nullLiteral, else baseParser(literal).
If checker is non-None, it must be a callable returning True if its argument is a null value.
nullLiteral is compared against the unprocessed literal (usually, a string). The intended use is like this (but note that often, a nullExcs attribute on a rowmaker map element is the more elegant way):
    >>> parseWithNull("8888.0", float, "8888")
    8888.0
    >>> print(parseWithNull("8888", float, "8888"))
    None
    >>> print(parseWithNull("N/A", int, "N/A"))
    None
Signature: procmain(processorClass, rdId, ddId)
The "standard" main function for processor scripts.
The function returns the instantiated processor so you can communicate from your processor back to your own main.
See processors.html for details.
Signature: quoteProductKey(key)
returns key as getproduct URL-part.
If key is a string, it is quoted as a naked accref so it's usable as the path part of an URL. If it's an RAccref, it is just stringified. The result is something that can be used after getproduct in URLs in any case.
Signature: reloadLocal()
reloads the local namespace.
This is material an operator defines in $GAVO_CONFIG/local.py. If that file is missing or unreadable, api.local will be a stub that raises a constant error regardless of what you try to getattr from it.
Signature: renderDCErrorPage(flr, request)
renders a resource for a twisted failure.
This finishes request itself. It returns t.w.server.NOT_DONE_YET because of that, so you can write return renderDCErrorPage from a render method (or similar).
Signature: requireValue(val, fieldName)
returns val unless it is None, in which case a ValidationError for fieldName will be raised.
Signature: resolveCrossId(id, forceType=None, **kwargs)
resolves id, where id is of the form rdId#id.
forceType, if non-None must be a DaCHS struct type (e.g., rscdef.Table); a StructureError will be raised if the reference resolves to something else than an instance of that type.
id can also be a simple rd id.
kwargs lets you pass additional keyword arguments to the getRD calls that may be triggered by this.
Signature: restoreDump(dumpFile)
restores a dump.
dumpFile is an open file object containing a file created by createDump.
This comprises recreating all mentioned tables, copying in the associated data, and re-creating all indices.
Each table is handled in a separate transaction; we do not stop if a single restore has failed.
Signature: safeReplaced(fName: Union[str, pathlib.Path], *, binary: bool = True) -> Generator
opens fName for "safe replacement".
Safe replacement means that you can write to the object returned, and when everything works out all right, what you have written replaces the old content of fName, where the old mode is preserved if possible. When there are errors, however, the old content remains.
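The safe replacement pattern can be sketched with the standard library: write to a temporary file next to the target and only move it over the target when nothing went wrong. This is an illustration of the technique under a made-up name, not the DaCHS implementation:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def safeReplacedSketch(fName, binary=True):
    """yields a file object whose content replaces fName on success."""
    targetDir = os.path.dirname(os.path.abspath(fName))
    # the temporary file must be on the same file system as the target
    fd, tmpName = tempfile.mkstemp(dir=targetDir)
    try:
        with os.fdopen(fd, "wb" if binary else "w") as f:
            yield f
        try:
            # preserve the old mode if there is an old file
            os.chmod(tmpName, os.stat(fName).st_mode & 0o7777)
        except OSError:
            pass
        os.replace(tmpName, fName)
    except BaseException:
        # on any error, drop the temporary file and keep the old content
        if os.path.exists(tmpName):
            os.unlink(tmpName)
        raise
```

You would use this as in with safeReplacedSketch("res/mymeta", binary=False) as f: f.write(...); if the block raises, the old file content stays in place.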
Signature: scale(val, factor, offset=0)
returns val*factor+offset if val is not None, None otherwise.
This is when you want to manipulate a numeric value that may be NULL. It is a somewhat safer alternative to using nullExcs with scaled values.
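The described behaviour amounts to the following few lines, given here under a made-up name as an illustration rather than as the DaCHS source:

```python
def scaleSketch(val, factor, offset=0):
    """returns val*factor+offset, passing through None (sketch)."""
    if val is None:
        return None
    return val*factor + offset


# e.g., converting a value given in hours to degrees, NULL-safe
print(scaleSketch(2, 15))
print(scaleSketch(None, 15))
```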
sets a configuration item to a value.
arg1 can be a section, in which case arg2 is a key and arg3 is a value; alternatively, if arg3 is not given, arg1 is a key in the defaultSection, and arg2 is the value.
All arguments are strings that must be parseable by the referenced item's _parse method.
Origin is a tag you can use to, e.g., determine what to save.
Signature: setUserAgent(userAgent: str) -> None
sets the user agent string for requests through urlopenRemote.
This is a global setting and thus, in particular, nowhere near thread-safe.
Signature: toMJD(literal)
returns a modified julian date made from some datetime representation.
Valid representations include:
is the central event dispatcher.
Events are posted by using notify* methods. Various handlers can then attach to them.
Signature: updateTemplatedHeader(hdr, templateName=None, **kwargs)
returns hdr updated with kwargs.
hdr is assumed to have been created with makeHeaderFromTemplate and contain the template name in a history entry.
You can pass in templateName to keep DaCHS from trying to get things from the header.
[It is probably better to use makeHeaderFromTemplate directly, passing in the originalHeader; that preserves the order of non-templated headers].
Signature: urlopenRemote(url: str, *, data: Union[NoneType, dict, str, bytes] = None, creds: Tuple[Optional[str], Optional[str]] = (None, None), timeout: int = 100) -> <class 'BinaryIO'>
works like urllib.urlopen, except only http, https, and ftp URLs are handled.
The function also massages the error messages of urllib a bit. urllib errors always become IOErrors (which is more convenient within DaCHS).
creds may be a pair of username and password. Those credentials will be presented in http basic authentication to any server that cares to ask. For both reasons, don't use any valuable credentials here.
Signature: writeAsVOTable(data, outputFile, ctx=None, **kwargs)
writes data to the outputFile.
data can be a table or Data item.
ctx can be a VOTableContext instance; alternatively, VOTableContext constructor arguments can be passed in as kwargs.
DaCHS uses a number of tables to manage services and implement protocols. Operators should not normally be concerned with them, but sometimes having a glimpse into them helps with debugging.
If you find yourself wanting to change these tables' content, please post to dachs-support first describing what you're trying to do. There should really be commands that do what you want, and it's relatively easy to introduce subtle problems by manipulating system tables without going through those.
Having said that, here's a list of the system tables together with brief descriptions of their role and the columns contained. Note that your installation might not have all of those; some only appear after a dachs imp of the RD they are defined in -- which you of course only should do if you know you want to enable the functionality provided.
The documentation given here is extracted from the resource descriptors, which, again, you can read in source using dachs admin dumpDF //<rd-name>.
Defined in //services
A table that contains the (slightly processed) creator.name metadata from published services. It is used by the shipped templates of the root pages.
Manipulate through gavo pub; to remove entries from this table, remove the publication element of the service or table in question and re-run gavo pub on the resource descriptor.
Defined in //biblinks
This table contains links between bibliographic items and local datasets or data collections. It follows https://www.ivoa.net/documents/BibVO and is intended to be harvested by bibliography services.
Defined in //datalink
A table managing datalink jobs submitted asynchronously (the dlasync renderer)
Defined in //dc_tables
Discrete values found in string-valued columns.
This is usually filled by dachs limits. Only columns with a statistics property of "enumerate" are considered here. Values found here are
Defined in //users
Assignment of users to groups.
Conceptually, each user has an associated group of the same name. A user always is a member of her group. Other users can be added to that group, essentially as in the classic Unix model.
Manipulate this table through gavo admin addtogroup and gavo admin delfromgroup.
Defined in //services
A table that has "interfaces", i.e., actual URLs under which services are accessible. This is in a separate table, as services can have multiple interfaces (e.g., SCS and form).
Manipulate through gavo pub; to remove entries from this table, remove the publication element of the service or table in question and re-run gavo pub on the resource descriptor.
Defined in //dc_tables
A table for storing all kinds of key-value pairs. Keys starting with an underscore are for use by user RDs.
Only one pair per key is supported; a newer pair overwrites an older one with the same key.
Currently, this is only used for schemaversion, the version of the DaCHS system tables as used by gavo upgrade to figure out what to change. gavo upgrade manages this.
From your code, you can use base.getDBMeta(key) and base.setDBMeta(connection, key, value) to put persistent, string-valued metadata in here; if you use this, would you tell us your use case?
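To illustrate the one-value-per-key behaviour, here is a minimal stand-in that uses Python's sqlite3 module rather than DaCHS itself. The helpers get_meta and set_meta, the table layout, and the key _myrd_lastrun are all hypothetical; raising KeyError for a missing key is a choice made in this sketch and not a statement about what base.getDBMeta does.

```python
import sqlite3


def set_meta(conn, key, value):
    """Store value under key, replacing any previous value."""
    conn.execute(
        "INSERT OR REPLACE INTO metastore (key, value) VALUES (?, ?)",
        (key, value))


def get_meta(conn, key):
    """Return the string stored under key."""
    row = conn.execute(
        "SELECT value FROM metastore WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return row[0]


# An in-memory database plays the role of the system table here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metastore (key TEXT PRIMARY KEY, value TEXT)")

# The leading underscore marks the key as belonging to a user RD.
set_meta(conn, "_myrd_lastrun", "2024-12-01")
set_meta(conn, "_myrd_lastrun", "2024-12-16")  # overwrites the first value
```

After the second call, only the later value is retrievable, and the table still holds a single row for the key.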
Defined in //products
The products table keeps information on "products", i.e. datasets delivered to the users.
It is normally fed through the //products#define rowfilter and something like the //products#table mixin.
/getproducts inspects this table before handing out data to enforce embargoes and similar restrictions, and this is also where it figures out where to go for previews.
Defined in //rds
This table lists the RDs DaCHS has imported or otherwise manipulated.
Additionally, on import, we add the schema and whether there are ADQL tables in the RD; this helps when several RDs share a single schema.
Defined in //services
An RD-level map of dependencies, meaning that before generating resource records from rd, prereq should be imported (think: TAP needs the metadata of all dependent tables).
This is managed by gavo pub and used in the OAI-PMH interface.
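How such a dependency map can be turned into a processing order is sketched below using graphlib from the Python standard library (Python 3.9 or later). This only illustrates the idea of "prereq before rd"; it is not how DaCHS itself evaluates the table, and the RD ids in the sample rows are made up.

```python
from graphlib import TopologicalSorter


def processing_order(dependencies):
    """Return RD ids ordered so that each prereq precedes its rd.

    dependencies is an iterable of (rd, prereq) pairs, mimicking the
    two columns named in the description above.
    """
    sorter = TopologicalSorter()
    for rd, prereq in dependencies:
        # add(node, *predecessors): predecessors come out first.
        sorter.add(rd, prereq)
    return list(sorter.static_order())


# Hypothetical rows: the TAP RD depends on two data RDs.
rows = [
    ("__system__/tap", "myres/q"),
    ("__system__/tap", "other/q"),
]
order = processing_order(rows)
```

In the resulting list, both data RDs appear before the TAP RD; the relative order of the two data RDs is not defined.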
Defined in //services
The table of published "resources" (i.e., services, tables, data collections) within this data center. There are separate tables of the interfaces these resources have, their authors, subjects, and the sets they belong to.
Manipulate through gavo pub; to remove entries from this table, remove the publication element of the service or table in question and re-run gavo pub on the resource descriptor.
Defined in //services
A join of resources, interfaces, and sets used internally.
Defined in //services
A table that contains set membership of published resources. For DaCHS, the sets ivo_managed ("publish to the VO") and local ("show on a generated root page" if using one of the shipped root pages) have a special role.
Manipulate through gavo pub; to remove entries from this table, remove the publication element of the service or table in question and re-run gavo pub on the resource descriptor.
Defined in //dc_tables
Simple (one-column) statistics of orderable columns.
This is usually filled by dachs limits, which might use estimates rather than actual statistics for large tables. Also, values elements on the column definitions themselves override what may be given here.
Defined in //services
A table that contains the subject metadata for published services. It is used by the shipped templates of the root pages ("...by subject").
Manipulate through gavo pub; to remove entries from this table, remove the publication element of the service or table in question and re-run gavo pub on the resource descriptor.
Defined in //services
A join of resources, subjects, and sets used internally.
Defined in //dc_tables
A table mapping table names and schemas to the resource descriptors they come from and whether they are open to ADQL queries.
This is used wherever DaCHS needs to go from a database name to the resource description, e.g., when generating tableinfo.
The table is maintained through gavo imp; to force things out of here, there's gavo drop (for RDs; use -f if the RD is gone or moved away) or gavo purge (for single tables).
Defined in //users
Users known to the data center, together with their credentials.
Right now, DaCHS only supports user/password credentials. Passwords are stored as scrypt hashes in a custom format combining the id scrypt:, 16 bytes of salt, and finally the hash.
Manipulate this table through gavo admin adduser, gavo admin deluser, and gavo admin listusers.
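The general shape of such a salted scrypt scheme can be sketched with Python's hashlib. Only the scrypt: prefix and the 16 bytes of salt are taken from the description above; the scrypt cost parameters, the base64 encoding of salt plus hash, and the function names are assumptions of this sketch, so strings produced here should not be expected to match what DaCHS actually stores.

```python
import base64
import hashlib
import hmac
import os

PREFIX = "scrypt:"
SALT_LENGTH = 16
# Assumed cost parameters; DaCHS may well use different ones.
SCRYPT_PARAMS = {"n": 2**12, "r": 8, "p": 1}


def hash_password(password, salt=None):
    """Return prefix + base64(salt + hash) for password (assumed layout)."""
    if salt is None:
        salt = os.urandom(SALT_LENGTH)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return PREFIX + base64.b64encode(salt + digest).decode("ascii")


def check_password(password, stored):
    """Return True if password matches the stored hash string."""
    if not stored.startswith(PREFIX):
        return False
    raw = base64.b64decode(stored[len(PREFIX):])
    salt, digest = raw[:SALT_LENGTH], raw[SALT_LENGTH:]
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison avoids leaking where the hashes differ.
    return hmac.compare_digest(candidate, digest)


stored = hash_password("secret")
```

Because the salt is kept alongside the hash, checking a password only needs the stored string and the candidate password.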
Defined in //obscore
The IVOA-defined obscore table, containing generic metadata for datasets within this data centre.
Defined in //obs-radio
This table contains the SQL fragments that make up this site's ivoa.obs_radio view.
Manipulate this table through gavo imp on tables that have an obs_radio mixin, or by dropping RDs or purging tables that are part of obscore.
Defined in //obscore
This table contains the SQL fragments that make up this installation's ivoa.obscore view. Whenever a participating table is re-made, the view definition is renewed with a statement made up of a union of all sqlFragments present in this table.
Manipulate this table through gavo imp on tables that have an obscore mixin, or by dropping RDs or purging tables that are part of obscore.
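How a view definition can be assembled from the stored fragments is sketched below. The function name, the sample fragments, and the choice of UNION ALL (the description above only says "a union") are assumptions of this sketch; the statement DaCHS actually generates may differ.

```python
def make_view_statement(sql_fragments, view_name="ivoa.obscore"):
    """Return a CREATE VIEW statement uniting all fragments."""
    if not sql_fragments:
        # What to do without any participating table is left open here.
        raise ValueError("need at least one SQL fragment")
    body = "\n  UNION ALL\n".join(
        "  (%s)" % fragment for fragment in sql_fragments)
    return "CREATE OR REPLACE VIEW %s AS\n%s" % (view_name, body)


# Hypothetical fragments, one per participating table.
fragments = [
    "SELECT * FROM myschema.images_obscore",
    "SELECT * FROM otherschema.spectra_obscore",
]
statement = make_view_statement(fragments)
```

Each time a participating table is re-made, a statement of this kind would be rebuilt from whatever fragments are currently in the table.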
Defined in //obscore
An empty table having all columns of the obscore table. Useful internally, and sometimes for tricky queries.
Defined in //obs-radio
An IVOA-defined metadata table for radio measurements, with extra metadata for interferometric measurements ("visibilities") as well as single-dish observations. You will almost always want to join this table to ivoa.obscore (do a natural join).
Defined in //tap
Columns in tables available for ADQL querying.
Defined in //tap
Columns that are part of groups within tables available for ADQL querying.
Defined in //tap
Columns participating in foreign key relationships between tables available for ADQL querying.
Defined in //tap
Foreign key relationships between tables available for ADQL querying.
Defined in //tap
Schemas containing tables available for ADQL querying.
Defined in //tap
Standard data models supported by this service.
This is a non-standard tap_schema table used by DaCHS in the creation of registry records. It is manipulated through gavo imp on tables with supportsModel and supportsModelURI properties.
Defined in //tap
Tables available for ADQL querying.
Defined in //tap
A non-standard (and not tap-accessible) table used for managing asynchronous TAP jobs. It is manipulated through TAP job creation and destruction internally. Under very special circumstances, operators can use the gavo admin cleantap command to purge jobs from this table.
Note that such jobs have corresponding directories in $STATEDIR/uwsjobs, which will be orphaned if this table is manipulated through SQL.
Defined in //tap_user
This table contains metadata for the tables uploaded by users.
Defined in //uws
The jobs table for user-defined UWS jobs. As the jobs can come from all kinds of services, this must encode the jobClass (as the id of the originating service).
[RMI] | Hanisch, R., et al, "Resource Metadata for the Virtual Observatory", http://www.ivoa.net/Documents/latest/RM.html |
[VOTSTC] | Demleitner, M., Ochsenbein, F., McDowell, J., Rots, A.: "Referencing STC in VOTable", Version 2.0, http://www.ivoa.net/Documents/Notes/VOTableSTC/20100618/NOTE-VOTableSTC-2.0-20100618.pdf |
[DALI] | Dowler, P., et al, "Data Access Layer Interface Version 1.0", http://ivoa.net/documents/DALI/20131129/ |
[SODA] | Bonnarel, F., et al, "IVOA Server-side Operations for Data Access", http://ivoa.net/documents/SODA/ |
[Datalink] | Dowler, P., et al, "IVOA DataLink", http://ivoa.net/documents/DataLink/ |