This chapter gives more in-depth information on how to create services
compliant with the various DAL protocols, organised by data product
type. While dachs start should give you a fair chance of getting a
service running without reading any of this, it is still a good idea to
read the section for the sort of data you want to publish before setting
out.
SCS, the simple cone search, is the simplest IVOA DAL protocol – it is
just HTTP with RA, DEC, and SR parameters, a slightly constrained
VOTable response, plus a special way to encode errors (in a way somewhat
different from what has been specified for later DAL protocols).
The service discussed in the DaCHS Basics is a combined
SCS/form service. This section just briefly recapitulates what was
discussed there. For a quick start, just follow Quick start with
DaCHS.
SCS can expose any table that has exactly one column
each with the UCDs pos.eq.ra;meta.main, pos.eq.dec;meta.main,
and meta.id;meta.main, where the coordinates must be real or double
precision, and the id must be either some integral type or text; the
standard requires the id to be text, but the renderer will automatically
convert integral types. The main query is then run against the position
specified in this way.
You almost always want to have a spatial index on these columns. To do
that, use the //scs#pgs-pos-index mixin on the tables, like
this:
<table id="forSCS" onDisk="true" mixin="//scs#pgs-pos-index"> ...
The “pgs” in pgs-pos-index refers to pgSphere, a postgres database
extension for spherical geometry. You may see RDs around that use
the //scs#q3cindex mixin instead here. It does the same
thing (dramatically speeding up spatial queries) but uses a different
indexing scheme. q3c is faster and takes up less space, but it is also
less general, which is why we are trying to phase it out. Only use it
when you are sure you cannot afford the (reasonable, i.e., mostly within
a factor of two) cost of pgSphere.
Note that to have a valid SCS service, you must make sure the
output table always contains the three required columns (as defined by
the UCDs) discussed above. To ensure that, these columns' verbLevel
attribute must be 10 or less (we advise setting it to 1).
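As a minimal sketch (column names, types, and descriptions here are
placeholders; adapt them to your catalogue), an SCS-ready table might
look like this:
<table id="forSCS" onDisk="True" mixin="//scs#pgs-pos-index">
  <column name="main_id" type="text"
    ucd="meta.id;meta.main" verbLevel="1"
    description="Object identifier"/>
  <column name="raj2000" type="double precision"
    unit="deg" ucd="pos.eq.ra;meta.main" verbLevel="1"
    description="ICRS right ascension"/>
  <column name="dej2000" type="double precision"
    unit="deg" ucd="pos.eq.dec;meta.main" verbLevel="1"
    description="ICRS declination"/>
</table>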
SCS could work with a dbCore, but friendly cone search services include
a field with the distance between the object found and the position
passed in; this is added by the special element scsCore.
You (in effect) must include some pre-defined condDescs that make
up the SCS protocol, like this:
<scsCore queriedTable="main">
<FEED source="//scs#coreDescs"/>
</scsCore>
This will provide the RA, DEC, and SR parameters for most renderers. The
form renderer, however, will show a nice input box that lets humans enter
object names or, if they cannot live without them, sexagesimal
positions (if you are curious: this works by setting the
onlyForRenderer and notForRenderer attributes on
Element inputKey).
In addition, //scs#coreDescs gives you a parameter MAXREC to
limit or raise the number of matches returned. This parameter is not
required by SCS, but it is useful if people with sufficient technical
skills (they'll need those because common SCS clients don't support
MAXREC yet) want to raise or lower DaCHS' default match limit (which is
configured in [ivoa]dalDefaultLimit and can be raised up to
[ivoa]dalHardLimit).
SCS allows more query parameters; you can usually use condDesc's
buildFrom attribute to directly make one from an input column. If you
want to add a larger number of them, you might want to use active tags:
<dbCore id="xlcore" queriedTable="main">
<FEED source="//scs#coreDescs"/>
<LOOP listItems="ipix bmag rmag jmag pmra pmde">
<events>
<condDesc buildFrom="\item"/>
</events>
</LOOP>
</dbCore>
Note that most current SCS clients are not good at discovering such
additional parameters, since for SCS this requires going through the
Registry. In TOPCAT, for example, users would have to manually edit the
cone search URL.
Also note that SCS does not really define the syntax of these
parameters, which is relevant because most of the time they will be
float-valued, and hence you will generally need to use intervals as
constraints. The interval syntax used by the SCS renderer is DALI, so a
bounded interval would be 22.3 30e5, and you'd build half-bounded
intervals with IEEE infinity literals, like -Inf -1. Of course,
when accessed through a form, the usual VizieR parameter syntax applies.
To expose that core through a service, just allow the scs.xml renderer
on it. With the extra human-oriented positional constraint and mostly
buildFrom condDescs, you usually get a web-based form interface
for free:
<service id="cone" allowed="scs.xml,form">
<meta name="title">Nice Catalogue Cone Search</meta>
<meta name="shortName">NC Cone</meta>
<meta name="testQuery.ra">10</meta>
<meta name="testQuery.dec">10</meta>
<meta name="testQuery.sr">0.01</meta>
<scsCore queriedTable="main">
<FEED source="//scs#coreDescs"/>
<LOOP listItems="ipix bmag rmag jmag pmra pmde">
<events>
<condDesc buildFrom="\item"/>
</events>
</LOOP>
</scsCore>
</service>
The meta information given is used when generating registry
records; the idea is that a query with the ra, dec, and sr you give
actually returns some data.
SIAPv2 as described here is only available in DaCHS 2.7.3 and later.
If you want to create new image services, please make sure you update
to that.
In the VO, there are currently two versions of the Simple Image Access
Protocol SIAP. While DaCHS still supports SIAP version 1, you should
only use it for new services if you know exactly what you are doing.
Hence, this tutorial (and, actually, the reference documentation, too)
only talks about SIAP2.
To generate a template RD for an image collection published through
SIAP, run:
dachs start siap
See Starting from Scratch for a discussion on how to fill out this
template.
While you can shoehorn DaCHS to pull the necessary information from many
different types of images, anything but FITS files with halfway sane WCS
headers is going to be fiddly – and of course, FITS+modern WCS is about
the only thing that will work nicely on all relevant clients.
If you have to have images of a different sort, it is probably a good
idea to inquire on the dachs-support mailing list before spending a
major effort on local development.
Check out a sample resource directory:
cd `dachs config inputsDir`
svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/emi
cd emi
mkdir data
Now fetch some files to populate the data directory so you have
something to import:
cd data
curl -FLANG=ADQL -FFORMAT=votable/td \
-FQUERY="SELECT TOP 5 access_url FROM emi.main WHERE weighting='uniform'" \
http://dc.g-vo.org/tap/sync \
| tr '<>' '\n' \
| grep getproduct \
| xargs -n1 curl -sO
(no, this is no way to operate TAP services; use a
proper client for real work, and we didn't show you this).
This RD also publishes to obscore, so make sure you have the obscore table:
dachs imp //obscore
If you do not plan to publish via obscore yourself (which is reasonably
unlikely) and you try this on a box that the Registry will see later
(you shouldn't), be sure to dachs drop obscore again when done.
Run the import:
cd ..
dachs imp q
Now start the server as necessary (see above), and start TOPCAT and
Aladin. In TOPCAT, open VO/SIA Query, enter your new service's access
URL (it's http://localhost:8080/emi/q/s/siap.xml unless you did
something cunning and should know better yourself) under “SIA URL”
pretty far down in the dialog.
Then have “Lockman Hole” as Object Name and resolve it, or manually
enter 161.25 and 58.0 as RA and Dec, respectively, and have 2 as Angular
Size. Send off the request. You'll get back a table that you can send
to Aladin (Interop/Send to/Aladin), which will respond by presenting a
load dialog. Doubleclick and load as you like. Yes, the images look a
bit like static noise. That's all right here – but do combine these
images with, say, DSS colored optical imagery and marvel at the wonders
of modern VLBI interferometry.
Incidentally, we made the detour through TOPCAT since there is no nice UI
for querying non-registered SIAP services in Aladin.
SIAP-capable tables should mix in //siap2#pgs. This mixin provides
all the columns necessary for valid SIAP responses, and it will prepare
the table so that spatial queries (which are the most common) will use a
pg_sphere index.
So, in the simplest case, a table published through a SIAP service
would be declared like this:
<table id="images" onDisk="True" mixin="//siap#pgs"/>
This only has the minimal SIAP metadata. The net result of this is that
your new images table will have all the mandatory columns of
Obscore. You can add your own, custom columns as usual.
The //siap2#pgs mixin also takes care that anything added to
the table also ends up in the products table. This means that the
grammar filling the table needs a //products#define rowfilter.
When filling SIAP tables, you will almost always use the
//products#define rowfilter in the grammar together with the
//siap2#computePGS and //siap2#setMeta applys in the rowmaker.
In practice, this might look like this (taken from the emi/q RD from the
quick start):
<data id="import_main">
<sources recurse="True">
<pattern>*.fits</pattern>
</sources>
<fitsProdGrammar qnd="True">
<maxHeaderBlocks>80</maxHeaderBlocks>
<mapKeys>
<map key="object">OBJECT</map>
<map key="obsdec">OBSDEC</map>
<map key="obsra">OBSRA</map>
</mapKeys>
<rowfilter procDef="__system__/products#define">
<bind key="table">"emi.main"</bind>
</rowfilter>
</fitsProdGrammar>
<make table="main" >
<rowmaker id="gen_rmk" idmaps="obsra, obsdec">
<apply name="fixObjectName">
<setup imports="csv">
<code>
with open(rd.getAbsPath("res/namemap.csv")) as f:
nameMap = dict(csv.reader(f))
</code>
</setup>
<code>
@target_name = nameMap[@object]
</code>
</apply>
<apply procDef="//siap2#computePGS"/>
<apply procDef="//siap2#setMeta">
<bind name="em_min">0.207</bind>
<bind name="em_max">0.228</bind>
<!-- since the images are fairly complex mosaics, there's no way we
can have sensible dates; this one here "plays a special role
in the calibration" (Middelberg) -->
<bind name="t_min"
>dateTimeToMJD(datetime.datetime(2010, 7, 4))</bind>
<bind name="t_max"
>dateTimeToMJD(datetime.datetime(2010, 7, 4))</bind>
<bind name="obs_title">"VLBA 1.4 GHz "+@target_name</bind>
<bind name="calib_level">3</bind>
<bind name="obs_collection">"VLBA LH sources"</bind>
<bind name="o_ucd">"phot.flux.density;em:radio.750-1500MHz;phys.polarisation.Stokes.I"</bind>
<bind name="pol_states">"/I/"</bind>
<bind name="pol_xel">1</bind>
<bind name="target_name">@target_name</bind>
<bind name="t_resolution">5000000</bind>
<bind name="target_class">"agn"</bind>
</apply>
<map key="weighting">\inputRelativePath.split("_")[-1][:-5]</map>
</rowmaker>
</make>
</data>
This does, step by step:
- The sources element is as always – with image collections, the
recurse attribute often comes in particularly handy.
- When ingesting images, you will very typically read from
FITS primary headers. That is what element
fitsProdGrammar does unless told otherwise: Its rawdicts simply are
the (astropy.io.fits) headers turned into plain python dictionaries.
- The qnd attribute of the grammar is recommended as long as you
can get away with it. It makes some (weak) assumptions to yield
significant speedups, but it limits you to the primary header. You
cannot use qnd with compressed FITS images. Also, note the
hdusField attribute when you have more complex FITSes to process.
- The fitsProdGrammar will map keys with hyphens to names with
underscores, which allows for smoother action with them in rowmakers.
The mapKeys element can produce additional mappings; in this case,
we abuse it a bit to let us have idmaps (rather than simplemaps)
in the rowmaker – and, really, to illustrate the feature, since this
data does not actually need any key mapping.
- Since we are defining a table serving data products here,
the grammar needs the //products#define rowfilter
discussed in the products table.
- We have mentioned the //siap2#computePGS apply above; as
long as astropy can deal with your WCS, it's all automatic (though
you may want to pass parameters when you have cubes with suboptimal
WCS or want to keep products without WCS in your table). And if
you don't have proper WCS: see above on checking with dachs-support.
- The second apply you want when feeding SIAP tables is
//siap2#setMeta. This has a lot of parameters, because
depending on what you are publishing – SIAP2 explicitly was designed
to push out cubes, too – you may want to give metadata like the length
of a wavelength axis. dachs start siap puts in default mappings
for most of them; feel free to delete them if they don't make sense
for your kind of data.
- There are two non-obscore parameters in setMeta. For one, instead of
em_min and em_max, you can set bandpassId and have the
//siap#getBandFromFilter look the spectral limits up. You can find out
whether a band is known by running dachs admin dumpDF data/filters.txt –
and we are grateful for further contributions to that file.
- You can also set dateObs to the central time of the observation
and give t_exposure a sensible value in seconds. DaCHS will then
compute t_min and t_max for you, assuming a single, contiguous
observation (see the sketch after this list for both shortcuts).
- Typically, many values you find in the FITS headers will be messy and
fouled up. You'll spend some quality time fixing these values in the
typical case. Here, we translate somewhat broken object names using
a simple mapping file that was provided by the author. In many other
situations, the //procs#mapValue or
//procs#dictMap applys let you do fixes with less code.
- As is usual in DaCHS procedures, you can access the embedding RD as rd.
In our object name fixer, we use that to let DaCHS find the input
file independently of where the program was started.
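To illustrate those two shortcuts, here is a hedged sketch of a setMeta
apply using them (assuming, as described above, that bandpassId and
dateObs are accepted as setMeta bindings; the filter name, the date_obs
key, and the exposure value are invented and need to be replaced with
what your data actually provides):
<apply procDef="//siap2#setMeta">
  <!-- spectral coverage from a filter name listed in data/filters.txt
    instead of explicit em_min/em_max -->
  <bind name="bandpassId">"Johnson B"</bind>
  <!-- central time of the observation plus exposure time in seconds;
    DaCHS computes t_min and t_max from these -->
  <bind name="dateObs">@date_obs</bind>
  <bind name="t_exposure">120</bind>
  <bind name="obs_collection">"My Survey"</bind>
</apply>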
Somewhat regrettably, //siap2#setMeta cannot be used with idmaps="*",
as the settings made by setMeta would then be overwritten. That's a compromise
for backwards compatibility.
For SIAP1, there was the Element siapCutoutCore that returned
cutouts rather than full images. Nothing of this sort is possible for
SIAP2, because spatial constraints are optional for the latter. If you
really need that functionality, you'll either have to go back to SIAP1
or (probably preferably, although client support isn't all that great
yet) return datalinks rather than full images (see SIAP and
datalink).
That said, the core to use for SIAP2 is the Element dbCore.
Add the condDescs necessary for SIAP manually (and add any custom
condDescs as you see fit). See the next section for an example.
The service uses the siap2.xml renderer, for which you need some
additional metadata for VO registration as described in
the siap2.xml renderer.
With this, a service definition would look like this (again taken from
emi/q):
<service id="s" allowed="form,siap2.xml">
<meta name="shortName">VLBI-Lockman</meta>
<meta name="sia.type">Pointed</meta>
<meta name="testQuery.pos.ra">163.3</meta>
<meta name="testQuery.pos.dec">57.8</meta>
<meta name="testQuery.size.ra">0.1</meta>
<meta name="testQuery.size.dec">0.1</meta>
<publish render="siap2.xml" sets="ivo_managed"/>
<dbCore queriedTable="main">
<FEED source="//siap2#parameters"/>
<condDesc buildFrom="weighting"/>
</dbCore>
</service>
You probably do not want to point real users at the form rendering of
this service. If you want to have image services working in the
browser, write an extra service; the output of dachs start siap has
a starting point for that.
The SIAP2 metadata schema is exactly the one of Obscore, except we have
no preview column in the siap2 mixin. Hence, to
publish a SIAP table to Obscore, the minimal requirement is:
<mixin preview="NULL">//obscore#publishObscoreLike</mixin>
in your table element. Normally (at least for FITSes in the product
table), DaCHS will be able to come up with a preview (or, rather, a
thumbnail). In that case, prefer:
<mixin preview="access_url || '?preview=True'"
>//obscore#publishObscoreLike</mixin>
Note that for tables of non-trivial size, you really should have a
spatial index on s_ra and s_dec (use:
<FEED source="//scs#spoint-index-def" ra="s_ra" dec="s_dec"/>
for that) and on s_region (use <index columns="s_region"
method="GIST"/>) and otherwise index at least obs_publisher_did,
em_min, em_max, t_min, and t_max.
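Taken together, the relevant declarations in the table element could
look like this (a sketch; adjust the index list to the columns your
users actually constrain):
<FEED source="//scs#spoint-index-def" ra="s_ra" dec="s_dec"/>
<index columns="s_region" method="GIST"/>
<index columns="obs_publisher_did"/>
<index columns="em_min"/>
<index columns="em_max"/>
<index columns="t_min"/>
<index columns="t_max"/>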
If you have larger images or cubes and serve them through SIAP, consider
offering datalinks or perhaps even have Datalinks as Products.
The latter case is particularly attractive if your images are so large
that people just clicking on some row in Aladin might not expect a
download of that size (in 2020, I'd set that limit at, perhaps, 100 MB).
In both cases, people can select and download only parts of the image or
request scaled versions of it.
Defining a datalink service for normal FITS images is not hard. In the
simplest case, you just give a bit of metadata, use the
//soda#fits_genDesc descriptor generator (you don't need to
understand exactly what that is; if you are curious:
Datalink and SODA has the full story) and FEED
//soda#fits_standardDLFuncs. Done.
The following example, taken from
lswscans/res/positions, adds a fixed link to a scaled
version, which might work a bit smoother with unsophisticated Datalink
clients, using a meta maker:
<service id="dl" allowed="dlget,dlmeta">
<meta name="title">HDAP Datalink</meta>
<meta name="description">This service lets you access cutouts
from HDAP plates and retrieve scaled versions.</meta>
<datalinkCore>
<descriptorGenerator procDef="//soda#fits_genDesc">
<bind key="accrefPrefix">lswscans</bind>
</descriptorGenerator>
<FEED source="//soda#fits_standardDLFuncs"/>
<metaMaker semantics="#science">
<code>
yield descriptor.makeLink(
makeProductLink(descriptor.accref+"?scale=4"),
contentType="image/fits",
description="FITS, scaled by 1/4",
contentLength=descriptor.estimateSize()/16.)
</code>
</metaMaker>
</datalinkCore>
</service>
See Meta Makers for more information on what is going on
inside the meta maker. The remaining material is either stereotypical
or pure metadata: title and description are as for any other service,
and the accrefPrefix should in general reflect your resource
directory name. DaCHS' datalink machinery will reject any publisher DID
asking for an accref not starting with that string. The idea here is
to avoid applying code to datasets it is not written for.
When you attach the datalink functionality (rather than having datalink
links as access URLs), declare your table as having datalink support;
both SIAP and TAP will pick that up and add the necessary declarations
so datalink-aware clients will know they can run datalink queries
against your service:
<meta name="_associatedDatalinkService">
<meta name="serviceId">dl</meta>
<meta name="idColumn">pub_did</meta>
</meta>
This block needs to sit in the table element. The serviceId meta
contains the id of the datalink service.
If you also produce HTML forms and tables, see datalinks in columns.
Publishing spectra is harder than publishing catalogues or images; for
one, the Simple Spectral Access Protocol comes with a large bunch of
metadata, quite a bit of which regrettably repeats VOResource. And
there is no common format for spectra, just a few contradicting
loose conventions.
That is why dachs start produces a template that contains an
embedded datalink service. This lets you push out halfway rational
VOTables that most interesting clients can reliably deal with, while
still giving access to whatever upstream data you got.
In the past, we have tried to cope with the large and often constant
metadata set of SSAP
using various mixins that have a certain part of the metadata in PARAMs
(which is ok by the standard). These were, specifically, the mixins
//ssap#hcd and //ssap#mixc. Do not use them any more in new
data and ignore any references to them in the documentation.
The modern way to deal with SSAP – both for spectra and for time series
– is to use the //ssap#view mixin. In essence, this is
a relatively shallow way to map your own metadata to SSA metadata using
a SQL view. This is also what the dachs start template does.
Check out the feros resource directory into your inputs directory:
cd `dachs config inputsDir`
svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/feros
cd feros
mkdir data
As recommended, the checkout does not contain actual data, so let's
fetch a file:
cd data
curl -O http://dc.g-vo.org/getproduct/feros/data/f04031.fits
cd ..
This RD also publishes to obscore, so make sure you have the obscore table:
dachs imp //obscore
If you do not plan to publish via obscore yourself (which is reasonably
unlikely) and you try this on a box that the Registry will see later
(you shouldn't), be sure to dachs drop obscore again when done.
Run the input and the regression tests:
dachs imp q
dachs test q
One regression test should fail since you've not yet pre-generated the
previews (which are optional but recommended for your datasets, too):
python3 bin/makepreviews.py
dachs test q
If the regression tests don't pass now, please complain to the authors
of this tutorial.
From here on, you can point your favourite spectral client (fallback:
TOPCAT's SSA client; note that TOPCAT itself cannot open this service's
native format and you'll have to go through datalink, which TOPCAT knows
how to do since about version 4.6) to
http://localhost:8080/feros/q/ssa/ssap.xml and do your queries (if you
don't know anything else, do a positional query for 149.734, 2.28216).
Please drop the dataset again when you're done playing with it:
dachs drop feros/q
Since the dataset is in the obscore table, it would otherwise be
globally discoverable, and that'd be bad.
In contrast to the relatively small SCS, SIAP, and SLAP models, the
size of the SSAP model means that spectral services are always based on
a database view on top of a table whose structure is controlled by you;
you will save quite a bit of work if you keep your own table's columns
as close to their SSA form as possible, though.
Another special situation is that most spectra are delivered in fairly
crazy formats, which means that it's usually a good idea to have a
datalink service that serves halfway standard files – in DaCHS, these
comply with the VO's spectral data model, which is a VOTable with a bit of
standard metadata. It's certainly not beautiful or very sensible, but
it sure beats the IRAF-style 1D images people still routinely push around.
So, to start a spectral service, use the ssap+datalink template:
$ mkdir myspectra; cd myspectra
$ dachs start ssap+datalink
This will result in the usual q.rd file as per starting from
scratch; see there for how to efficiently edit this and for
explanations on the common metadata items.
SSAP-specific material starts at the meta definitions for
ssap.dataSource and ssap.creationType. These are actually used
in service discovery, so you should be careful to select the right
words. Realistically, using survey/archival for
observational data and theory/archival for theoretical spectra
should be the right thing most of the time.
Next, you define the raw_data table; this should contain all
metadata “unpredictably” varying between datasets (or, for large data
collections, anything that needs to be indexed for good performance).
For instance, for observational data, the observed location is going to
change from row to row. The start and the end of the spectrum is
probably going to be fixed for a given instrument, and so if you have a
homogeneous data collection you probably will
not have columns for them and rather provide constant values when
defining the view.
To conveniently define the table, it is recommended to pull the SSA
columns for raw_data by name from DaCHS' SSAP definitions and use
SSAP conventions (e.g., units). The generated RD is set up for this by
giving namePath="//ssap#instance", which essentially means “if
someone requests an element by id, first look in the instance table
of the RD”. This is then used in the following LOOP (cf. Active
Tags). As generated, this will look somewhat like:
<LOOP listItems="ssa_dateObs ssa_dstitle ssa_targname ssa_length
ssa_specres ssa_timeExt">
– this will pull in the enumerated columns as if you had defined them
literally in the table. Depending on the nature of your data, you may
want to pull in more columns if they vary for your datasets (or throw
out ones you don't need, such as ssa_dateObs for theoretical data).
To see what is available, refer to the reference documentation of
the //ssap#view mixin. Any parameter that starts with
ssa_ can be pulled in as a column.
The template RD then mixes in //products#table (which you pretty
certainly want; see The Products Table for an explanation),
//ssap#plainlocation (which at this point you must have to make
a valid SSA service) and //ssap#simpleCoverage
(which you want if you want to publish your observational spectra
through obscore). The template then defines:
<FEED source="//scs#splitPosIndex"
long="degrees(long(ssa_location))"
lat="degrees(lat(ssa_location))"/>
This again is mainly useful for obscore as long as DaCHS' ADQL engine
may turn queries into q3c statements; just leave it if you have
positions, and remove it if you don't.
You can, of course, define further columns here, both for later use in
the view and for local management. SSAP lets you return arbitrary local
columns, and in particular for theory services, you will have to (to
define the physics of your model). As a DaCHS convention, please don't
use the ssa_ prefix on your custom columns. See
theossa/q for an example of a table with many extra
columns.
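For instance, a theory service might add a custom column like this
(name, unit, and UCD are merely illustrative):
<column name="t_eff" type="real"
  unit="K" ucd="phys.temperature.effective"
  tablehead="T_eff"
  description="Effective temperature of the model atmosphere"/>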
The SSA template then goes on with a data item filling the raw_data
table. The template assumes you're parsing from IRAF-style 1D images.
You will have to use a different grammar if that is not what you have,
and in that case you in particular cannot use the specAx var
defined in the rowmaker.
The data item has <recreateAfter>make_view</recreateAfter> quite
early on; this simply makes sure that the SSA view will be regenerated
after you import the table itself.
The rowfilter in the grammar is fairly complex here because we will
completely hide the original; if you simply want to serve your upstream
format, cut it down to just giving table, mime, preview
and preview_mime. If you do that, use the following strings in
mime:
- image/fits for IRAF-style 1D image spectra
- application/fits for spectra in FITS tables
- application/x-votable+xml for spectra in VOTables
Please do not put anything else into SSA tables, because you will most
certainly overstrain most SSA clients; if you have a different
upstream format and you want to make it available, turn it into SDM
VOTables and use datalink to link to the original source.
Hence, for most cases (including also ASCII spectra), here's what we
recommend as the product definition rowfilter (it's roughly what's in
the template) to isolate your clients from the odd upstream formats:
<rowfilter procDef="//products#define">
<bind name="table">"\\schema.main"</bind>
<bind name="path">\\fullDLURL{"sdl"}</bind>
<bind name="fsize">%typical size of an SDM VOTable%</bind>
<bind name="datalink">"\\rdId#sdl"</bind>
<bind name="mime">"application/x-votable+xml"</bind>
<bind name="preview">\\standardPreviewPath</bind>
<bind name="preview_mime">"image/png"</bind>
</rowfilter>
This is pointing the path accessed to a datalink URL using the
fullDLURL macro, which expands to a URL retrieving the full dataset;
the “sdl” argument to the macro references the datalink service defined
further down. Since the data returned is generated on the fly, you will
have to give an estimate of how large the VOTable will be (overriding
DaCHS' default of the size of the source file). Don't sweat this too
much, just don't claim something is 1e9 bytes when you're really just
returning a few kilobytes. The rowfilter expects the size in bytes.
The bindings already prepare for making and serving previews, which is
discussed in more detail in Product Previews in the DaCHS
reference; see there for everything mentioning “preview”.
SSAP has a feature that lets users request certain formats, and for
clients that don't know Datalink, this may be a good idea. In that
scheme, you use a rowfilter to return a description of your native data
and the processed SDM-compliant dataset as used here. See
theossa/q for an example of what that would look like. Our
recommendation: don't bother, it's a misfeature that will most likely
just confuse your users.
The rowmaker is fairly standard; we should perhaps mention the
elements:
<var name="specAx">%use getWCSAxis(@header_, 1) for IRAF FITSes
(delete otherwise)%</var>
<map key="ssa_specstart">%typically, @specAx.pixToPhys(1)*1e-10%</map>
<map name="ssa_length">%typically, @specAx.axisLength%</map>
getWCSAxis is a function that looks at a FITS image's WCS
information to let you transform pixel to physical coordinates. This
currently uses a simplified DaCHS implementation that only does a small
part of WCS (but we may change that, keeping the interface stable). The
var, anyway, binds the resulting object to specAx. You can use
that later to find out the limits of the spectrum. The way it is written
here, you will still have to convert the value to metres manually. But as
said above, if you're publishing a homogeneous collection of spectra,
both values are probably constant, and you'll want to remove both maps
from the template.
The template goes on defining the data table that will serve as the
basis of the service. It starts with the declaration:
<meta name="_associatedDatalinkService">
<meta name="serviceId">sdl</meta>
<meta name="idColumn">ssa_pubDID</meta>
</meta>
This lets DaCHS add a link to the datalink service to results
generated from this (both via SSAP and TAP). There's nothing you need to
change here (unless you chuck datalink); see SSAP and Datalink and
Datalink for details.
The main body of the table definition is the //ssap#view
mixin. In it, you need to write the SSA parameters as SQL literals (i.e.,
strings need single quotes around them) or expressions involving column
references. To keep things fast, you should have SSA-ready
columns in the source
table, so you will usually have column references only here. Most of
these items default to NULL, so if you do not have a piece of metadata,
it is reasonably safe to just remove the attribute.
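An abridged sketch of what that can look like (values are invented and
most parameters are elided; the template and the mixin documentation
have the full set):
<table id="main" onDisk="True">
  <mixin
    sourcetable="raw_data"
    copiedcolumns="*"
    ssa_spectralunit="'Angstrom'"
    ssa_fluxunit="'Jy'"
  >//ssap#view</mixin>
</table>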
A few of the mixin's parameters deserve extra discussion:
- sourcetable – this is a reference, i.e., this must resolve to
the id of some table element. It can be cross-RD if really
necessary. It is not the SQL table reference (that would include
a schema reference).
- copiedcolumns – this lets you copy over columns from the source
table, i.e., the one you just defined using (comma-separated) shell
patterns of their names (yes, that's just like idmaps in
rowmakers). The * given in the template should work in most
cases, but if you have private columns in the source table, you can
suppress them in the view; a useful convention might be to start all
private columns with a p; you'd then say copiedcolumns="[^p]*".
Note that copied columns are automatically added in the view as 1:1
maps, and you cannot use view arguments to override them. Use
different column names in the source table and the view if you
(think you) have to do view-level processing of your values.
- customcode – use this if you have extra material to enter in the
view definition generated by the mixin. We hope you won't need that
and would be interested in your use case if you find yourself using
this.
- ssa_spectralunit, ssa_fluxunit – these are the only
mandatory parameters starting with ssa_ (but their values are
still overwritten if they are in copiedcolumns). There really is no
point in having them vary from row to row because their values are
metadata for the corresponding error columns (which is one of the
many spec bugs in SSAP).
- ssa_spectralucd, ssa_fluxucd – these are like the unit
parameters in that they contain data collection-level metadata. The only
reason they are not mandatory is that there are defaults that seem
sensible for a large number of cases. Check them, and again, you
cannot really let them vary from row to row.
- ssa_fluxSI, ssa_spectralSI, ssa_timeSI – these were an
attempt to work around a missing specification for unit strings in
the VO. Since we now have VOUnit, just ignore them.
The data item making this table is trivial. You should set it to
auto="False" (i.e., don't build this on an unadorned dachs imp).
The building of this data will normally be triggered by the
recreateAfter of the source table import.
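In the template, this data item is roughly the following (a sketch; the
id must match what the recreateAfter element above references):
<data id="make_view" auto="False">
  <make table="main"/>
</data>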
Use the element ssapCore for SSAP services. You must
feed in the condition descriptors for the SSAP parameters you want to
support (some are mandatory). The
simplest way to do that is to FEED the
//ssap#hcd_condDescs stream. It
includes condition descriptors for all mandatory and optional parameters
that we can remotely see in use.
Some of them may not be relevant to your service because your table
never has values for them. For example, theoretical spectra will
typically not give information on positions. The SSAP spec says that
such a service should ignore POS rather than returning the empty set.
We consider that an unfortunate recommendation that you should
ignore; if someone queries your theoretical service with a position, it
is highly likely they do not want to see all your spectra.
If you nevertheless think you must ignore
certain conditions, you can use the PRUNE
active tag. This looks like this:
<ssapCore queriedTable="newdata">
<FEED source="//ssap#hcd_condDescs">
<PRUNE id="coneCond"/>
<PRUNE id="bandCond"/>
</FEED>
</ssapCore>
Again, do not do this just because you don't have, say position
information.
Here is a table of parameter names and ids; you can always check them
by inspecting the output of dachs adm dumpDF //ssap:
Parameter name   condDesc id
==============   ===========
POS, SIZE        coneCond
BAND             bandCond
TIME             timeCond
For APERTURE, SNR, REDSHIFT, TARGETNAME, TARGETCLASS, PUBDID,
CREATORDID, and MTIME, the condDesc id simply is <keyname>_cond,
e.g., APERTURE_cond.
To have custom parameters, simply add condDesc elements as usual:
<ssapCore queriedTable="newdata">
<FEED source="//ssap#hcd_condDescs"/>
<condDesc buildFrom="t_eff"/>
</ssapCore>
For SSAP cores, buildFrom will enable “PQL”-like query syntax such
that users can post arguments like 20000/30000,35000 to t_eff.
This is in keeping with the general SSAP parameter style, while
more modern VO services use 2-arrays for intervals (“DALI style”).
To expose SSAP cores, use the ssap.xml renderer.
Using the
form renderer on SSAP cores is not terribly useful, because the core
returns XML directly, and there are far too many parameters no human
will ever be interested in anyway.
Hence, you will typically define extra browser-based
services. The example RD shows a compact way to do that:
<service id="web" defaultRenderer="form">
<meta name="shortName">\\schema Web</meta>
<dbCore queriedTable="main">
<condDesc buildFrom="ssa_location"/>
<condDesc buildFrom="ssa_dateObs"/>
<condDesc>
<inputKey original="data.ssa_targname" tablehead="Star">
<values fromdb="ssa_targname from theossa.data
order by ssa_targname"/>
</inputKey>
</condDesc>
</dbCore>
<outputTable>
<autoCols>accref, mime, ssa_targname,
ssa_aperture, ssa_dateObs, datalink</autoCols>
<FEED source="//ssap#atomicCoords"/>
<outputField original="ssa_specstart" displayHint="displayUnit=Angstrom"/>
<outputField original="ssa_specend" displayHint="displayUnit=Angstrom"/>
</outputTable>
</service>
Essentially, we only select a few fields people might want to query
against, and we directly build them out of the query fields; the SSA
condDescs are bound to the funny and insufficiently defined SSA input
syntax and probably not very useful in interactive applications.
The extra selector for object names with the names actually present in
the database is a nice service as long as you only have a few hundred
objects or so. Since the query over ssa_targname is executed at
each load of the RD, it should be fast, which means that even for
medium-sized tables, you should have an index on the object names in
the raw_data table, probably like this:
<index columns="ssa_targname"/>
On DaCHS newer than 2.5.1, instead of fromdb, it is usually better
to say:
<column original="ssa_targname">
<property key="statistics">enumerate</property>
</column>
in the table element. This will make dachs limits create the
necessary metadata once and not on every RD reload. Note that DaCHS
will only create statistics for views if the forceStats property on
the table definition is set. However, if you create statistics on the
metadata table, these will be inherited by columns copied over into the
view, which means that you will in general not have to do this.
In the output table, we only give a few of the many dozen SSAP output
fields, and we change the units of the spectral limits to Angstroms,
which will look nicer for optical spectra. For
educational reasons you might want to change this to nm (nanometer).
In the template, this form-based service is published as a capability of
the SSA service. This is done using the service attribute in the
Element publish in the SSAP service element:
<publish render="form" sets="ivo_managed,local" service="web"/>
See Registering Web Interfaces to DAL Services for more background.
The SSA metadata is not far from the Obscore metadata (cf. publishing
anything through obscore), and so an Obscore publication of SSAP data
almost comes for free: Minimally, just mix in
//obscore#publishSSAPMIXC and set calibLevel. The
template does a bit more:
<mixin
calibLevel="%likely one of 1 for uncalibrated or 2 for uncalibrated data%"
coverage="%ssa_region -- or remove this if you have no ssa_region%"
sResolution="ssa_spaceres"
oUCD="ssa_fluxucd"
emUCD="ssa_spectralucd"
>//obscore#publishSSAPMIXC</mixin>
– if you use one of the old hcd or mixc mixins, you do not want
oUCD and emUCD.
In particular for larger spectral collections, it is highly recommended
to also have the //ssap#simpleCoverage mixin in an
obscore-published spectral table; only then will you get indexed queries
when there are constraints on s_region, and having these non-indexed
will lead to really slow obscore queries.
Similarly, you should probably add:
<FEED source="//ssap#obscore-time-index"/>
to large SSAP tables, which creates an index useful for obscore queries
over t_min and t_max (cf.
//ssap#obscore-time-index).
Whenever the spectral coverage is not constant on large-ish SSAP tables,
you should also have:
<index columns="ssa_specstart"/>
<index columns="ssa_specen"/>
Given that you will usually get fairly bizarre inputs and will probably
want to publish “repaired” spectra, using Datalink to provide both
native and SDM (“Spectral Data Model”) compliant spectra without having
to resort to SSAP's ill-thought-out FORMAT feature is a fairly natural
thing to do. That is why the SSAP+datalink template comes with almost
all that you need to do that; what is left is mainly to
write an embedded grammar to parse the spectra (if the parsing is
complex, you might want to go for an Element customGrammar, which
lets you keep the source outside of the RD).
Other than that, it is just a few formalities.
So, you first define the table that will later hold your spectrum.
Use the //ssap#sdm-instance mixin for that (this continues
material from the template):
<table id="instance" onDisk="False">
<mixin ssaTable="main"
spectralDescription="%something like 'Wavelength' or so%"
fluxDescription="%something like 'Flux density' or so%"
>//ssap#sdm-instance</mixin>
<meta name="description">%a few words what a spectrum represents%</meta>
</table>
The descriptions you need to enter here typically end up being labels on
axes of spectral plots, so it is particularly important to be concise and
precise here.
If your spectrum has additional columns (e.g., errors, noise estimates,
bin widths), just put more columns in here. The mixin pulls in all the
various params that SDM wants to see from the row of the spectrum in the
SSAP table.
Note that the table does not have onDisk="True"; these tables are
only made for the brief moment it takes to serialise them into what the
user receives.
As usual, to fill tables, you want a data element. The template just
gives a few hints on how that might work. As a working example,
zcosmos/q parses from 1D FITS images like this:
<data id="build_sdm_data" auto="False">
<embeddedGrammar>
<iterator>
<setup imports="gavo.protocols.products, gavo.utils.pyfits"/>
<code>
fitsPath = products.RAccref.fromString(
self.sourceToken["accref"]).localpath
hdus = pyfits.open(fitsPath)
ax = utils.getWCSAxis(hdus[0].header, 1)
for spec, flux in enumerate(hdus[0].data[0]):
yield {"spectral": ax.pix0ToPhys(spec), "flux": flux}
hdus.close()
</code>
</iterator>
</embeddedGrammar>
<make table="spectrum">
<parmaker>
<apply procDef="//ssap#feedSSAToSDM"/>
</parmaker>
</make>
</data>
The way this figures out the file from which to parse will work if you
have the actual file path in the product table. When you hide the
upstream format as recommended, you have to follow some custom
convention. The SSAP+datalink template has:
sourcePath = urllib.parse.unquote(
    self.sourceToken["ssa_pubDID"].split('?', 1)[-1])
This works if you use the macro standardPubDID as in the
template.
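In the import rowmaker, that typically amounts to something like:
<map key="ssa_pubDID">\standardPubDID</map>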
But you can do arbitrary things here; see the califa/q3 RD
for an example for how you can decode the accref to database rows.
A more complex scenario with ASCII data stored externally and cached is
in theossa/q, a case where multiple orders of echelle
spectra are being processed in flashheros/q.
You will notice that the rowmaker in both the example and the template
is missing. DaCHS will then fill in the
default one, which essentially is idmaps="*". Since you are writing
the grammar from scratch, just use the names of the columns defined in
the instance table and be done with it. The predefined column names are
spectral and flux, so make sure you always have keys for them in
the dictionaries you yield from your grammars.
While there is no rowmaker, the make does have
an Element parmaker; this is stereotypical,
just always use the procDef as here. It copies values from the SSA
input row to the params in the instance table.
Finally, you would need to write the service. For SSAP and SODA,
what's in the template ought to just work. Add Element
metaMaker-s to provide links to, e.g., your raw input files if you want.
In that case, please skim the endless
chapter on Datalink and SODA in the reference documentation
to get an idea of how
descriptor generators, data functions, and meta makers play together.
For reasons discussed in Datalinks in Columns,
it may be a good idea to include a custom
column with a datalink URL in the SSAP table. The ssap+datalink
template already has such a column in its source table and fills it in
import's rowmaker.
You will usually want to touch up the instance table's metadata a bit.
For example, the VOTable name of these tables by default will be
constant (instance, usually), which will then show up, e.g., in the
spectrum browser of SPLAT, where it is not terribly useful. It is hence
usually a good idea to have an apply in the parmaker of your sdm
instance-making data item.
Note that the SSAP row with your metadata is not in vars here as
usual elsewhere. Instead, you will find it as the sourceToken on
the parser. The following example, lifted from gaia/s3,
ought to help you figure things out:
<make table="instance">
<parmaker>
<apply procDef="//ssap#feedSSAToSDM"/>
<apply name="update_metadata">
<code>
sourceId = vars["parser_"].sourceToken["source_id"]
targetTable.setMeta("description",
base.getMetaText(targetTable, "description")
+" for Gaia DR3 object {}".format(sourceId))
targetTable.setMeta("name", str(sourceId))
</code>
</apply>
</parmaker>
<rowmaker idmaps="*"/>
</make>
As long as there is no anointed successor to SSAP explicitly
catering to time series, you
can use SSAP to publish time series. It is a bit of a hack, but
clients like SPLAT do about the right thing with them.
As to examples, check out k2c9vst/q (which
parses the data from ASCII files) and
gaia/q2 (which stores the actual time series in the
database, a technique we believe is a very good idea).
To notify the Registry (and possibly clients, too) that you are
producing time series, do two things:

- globally declare that you serve time series by setting:

  <meta name="productType">timeseries</meta>

  near the top of your RD; the ssap+datalink template has more
  information on the productType meta.
- have 'timeseries' as ssa_dstype in your view definition.
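In terms of the //ssap#view mixin, the second point is just one more
parameter, roughly like this (a sketch; all other parameters elided):
<mixin
  sourcetable="raw_data"
  copiedcolumns="*"
  ssa_dstype="'timeseries'"
>//ssap#view</mixin>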
This section is under construction
Since there is no actual agreed-upon standard for the serialisation of
time series, you will probably have to produce time series on the fly;
DaCHS helps you to produce something that will hopefully follow
standardisation efforts in the VO without much additional work later
on, using, you guessed it, a mixin. For now, the only mixin available
is for photometric timeseries: See the //timeseries#phot-0
mixin to build things. If you have other time series, please write
mail to dachs-support.
To see things in action, refer to k2c9vst/q, the instance
table and the corresponding makes.
However, there is an additional complication, showcased in
gaia/q2 and bgds/l:
it is quite common to have time series in multiple bands in one
resource. For DaCHS, this is a bit of a problem, because the band
influences quite a bit of the table metadata – this is set in the
mixin, and what you set there is fixed once the table instance is made.
To get around this, look at the technique shown in bgds/l.
This first defines a STREAM time-series-template with macros for
the items that vary between bands:
<STREAM id="time-series-template">
<table id="instance-\band_short">
<mixin
effectiveWavelength="\effective_wavelength"
filterIdentifier="\band_human"
longitude="@ra"
latitude="@dec"
It then uses a LOOP to fill these slots and create one table definition
per band:
<LOOP>
<csvItems>
band_short, band_human, band_ucd, effective_wavelength
i, SDSS/i, em.opt.I, 7.44e-7
r, SDSS/r, em.opt.R, 6.12e-7
</csvItems>
<events source="time-series-template"/>
</LOOP>
The dispatch between the different table templates then happens in the
data function of the tsdl service, using a somewhat obscure feature of
rsc.Data: when using the createWithTable class function, you can
pass in the table that the data item should make. This obviously only
works in specialised circumstances like the one here, but then it's
really convenient. So, while the make in make_instance claims to
build instance-i, this is really overridden in the datalink
service's data function to select the actual table definition:
<dataFunction>
<setup imports="gavo.rsc"/>
<code>
dd = rd.getById("make_instance")
descriptor.data = rsc.Data.createWithTable(dd,
rd.getById("instance_"+descriptor.band))
descriptor.data = rsc.makeData(
dd,
data=descriptor.data,
forceSource=descriptor)
</code>
</dataFunction>
(where descriptor.band has been pulled from the dataset identifier
in the custom descriptor generator of that datalink service; that
pattern is probably a good idea when you are in a similar situation).
This will not scale well to many dozens of bands – if you have that,
you probably want somewhat more hardcore means –, but for the usual
handful of bands this is a relatively reasonable way to produce
time series with nice metadata.
“Obscore”, in VO jargon, refers to a publication of datasets by putting
their metadata into a TAP-queriable database table with a bespoke set of
columns. It lets people pose very complex constraints, even using
uploaded tables, and it is flexible enough to support almost any sort of
data the typed services (SIAP, SSAP) serve and a lot more.
You may ask: Why have the S*APs in the first place? The answer is,
mainly, history. Had we had TAP from the start, we likely would not have
bothered with defining standards for typed services. But that's not how
things worked out, and thus client support of Obscore still is inferior
to that of the typed services.
However, with a view to a future migrating towards obscore, it is
certainly a good idea to publish data through obscore, too. The good
news is that in DaCHS, that is generally close to trivial.
You will sometimes see something called ObsTAP mentioned. This was
meant to refer to “Obscore queried through TAP”, but since, really,
everyone uses Obscore through TAP, people do not say ObsTAP much any
more. If you see it somewhere, pretend it is really saying Obscore.
Before you can do anything with obscore, you have to run:
dachs imp //obscore
This will also declare support for the obscore data model in your TAP service's
registry record, which will make all-VO obscore queries use your
service. Avoid that if you do not really publish anything through
Obscore.
To drop Obscore if you have accidentally imported it, run:
dachs drop --system //obscore
Internally, the ivoa.obscore table is implemented as a view. If this
view contains bad SQL or tables that have been dropped, working with
obscore can result in rather confusing messages. If that happens, try:
dachs imp //obscore recover
This should remove the bad content from the view statement.
Obscore defines a single table called ivoa.obscore. In DaCHS, that
table typically contains data from a multitude of resources with
different metadata structures. To keep that manageable, DaCHS
implements the table as a view, where the individual tables are mapped
onto that common schema. These mappings are almost always created using
a mixin from the //obscore RD. Filling out its parameters will result
in SQL DDL fragments that are eventually combined to the view
definition. In case you are curious: The fragments are kept in the
ivoa._obscoresources table.
There is some documentation on what to
put where in the mixin documentation, but frankly, as a publisher, you
should have at least passing knowledge of the obscore data model
(2017ivoa.spec.0509L).
When you start with a table underlying a typed service, you can get away
with just saying something like (using SIAP as an example):
mixin="//obscore#publishSIAP"
to the table definition's start tag. You do not have to re-import a table to
publish it to Obscore when you have already imported it – dachs imp -m
<rd id> && dachs imp //obscore will include an existing table in the
obscore view.
When you import data without the -m flag, the mixins arrange for
everything, so you do not need the extra step of importing //obscore.
Since the Obscore data model is quite a bit richer than SIAP's and just a bit
richer than SSAP's, you will usually want to add extra metadata through
the mixin, for instance:
<mixin
sResolution="0.5"
calibLevel="2"
>//obscore#publishSIAP</mixin>
Again, a dachs imp -m followed by an import of //obscore would
be enough to make these changes visible in ivoa.obscore.
See SIAP and Obscore and SSAP and Obscore for more information
on how to Obscore-publish typed data.
Dataset Identifiers
Obscore uses the concept of dataset identifiers rather extensively, and
it is not unlikely that queries against the obs_publisher_did column
will be run – not the least in connection with datalink, in which the
DID has the role of something like a primary key. DaCHS'
obscore-associated datalink service, for instance, will do such queries,
and will be slow if postgres has to seqscan large tables for pubDIDs.
While DaCHS probably does a good job with creating usable (and globally
unique) publisher DIDs, it will not index them by default. Use the
createDIDIndex parameter of the various mixins to make one if your
data collection contains more than just a few hundred entries and there
is no index on it anyway.
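For example, with the SIAP mixin from above, that would be something
like this (sketch; the exact mixin depends on your protocol):
<mixin
  calibLevel="2"
  createDIDIndex="True"
>//obscore#publishSIAP</mixin>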
On the other hand, the creator DID would be assigned by whoever wrote
the data file, and you should not change or invent it. It was intended
to let people track who republishes a given data set, weed out
duplicates, and the like. Regrettably, only very few data providers
assign creator DIDs, so it's probably not worth bothering.
If you are in a position in which you could make your data provider
generate creator DIDs, you could make them set a good precedent. DaCHS
helps you by letting you claim an authority for them (which would be the
first step). See tutreg/gavo_edu_auth for an example RD
that, when dachs pub-ed, will claim an authority for your publishing
registry, and see Claiming an Authority for the background on
authorities.
target_class
The obscore model has the notion of a target class for pointed
observations; this is intended to cover use cases like “get me spectra
of Galaxies“ or so. Of course, this only works with a common vocabulary
of object types, which does not actually exist in the VO at this time.
The next best thing is SIMBAD's types, which are to be used until
then.
s_region
Obscore has two ways to do spatial queries: using s_ra, s_dec, and
perhaps s_fov on the one hand, and using s_region on the other. That is
a bit unfortunate because in practice you have to have two indices over
at least three columns. Also, DaCHS really likes it if columns are
type-clean, and thus the mixins go to some pains to make sure
only polygons are in s_region. Given that s_region in our obscore
has an xtype of adql:REGION and is thus polymorphic, you might get
away with having other types in there. No promises in the long term,
though.
Having said all that: please make sure that whenever there is a position
of some kind, you also fill s_region; this is not a problem in SIAP, but
where you only have a position and aperture, in a pinch fill in
something like:
<map key="s_region">pgsphere.SCircle.fromDALI(
[alpha, delta, aperture]).asPoly(6)</map>
(the //ssap#setMeta mixin already does that when both a
position and an aperture are available).
See also Creating pgSphere Geometries for more information on how to
fill geometry-valued columns.
You can also have “pure” Obscore tables which do not build on protocol
mixins. A live example is the cubes table in the
califa/q3 RD within
the GAVO data centre. Here is a brief explanation of how this works.
Somewhat like with the SSA view, you define a table for
the obscore columns varying for your particular data collection.
In that tables' definition re-use the metadata given in
the global obscore table. A compact way to do that is through a LOOP
(see Active Tags) and original references, exploiting the
namePath on Element Table:
<table id="cubes" onDisk="True" namePath="//obscore#ObsCore">
<LOOP listItems="obs_id obs_title obs_publisher_did
target_name t_exptime t_min t_max s_region
t_exptime em_min em_max em_res_power">
<events>
<column original="\item"/>
</events>
</LOOP>
adql="True" is absent here as the obscore mixin will set it later.
To just have all obscore columns in your table, you can write:
<FEED source="//obscore#obscore-columns"/>
instead of the LOOP.
If you do not have any additional columns
(which you can of course have) and just want to have your datasets in
the obscore table, consider having <adql>hidden</adql> after the
obscore mixin. This will make your table invisible to but still
readable by TAP. This is desirable in such a situation because the
entire information of the table would already be contained in the
obscore table, and thus there is no real reason to query the extra table. In
the Califa example cited above, that is not the case; there is
a wealth of additional columns in the custom, non-obscore table.
We believe this will be the rule rather than the exception.
For a quick overview over what column names you can have in the
listItems above, see the obscore table description.
Even with a custom obscore-like table, you will
almost always want to have DaCHS manage your products. This
works even when all your files are external (i.e., you're entering http
URLs in //products#define's path), and so use the
//products#table mixin (which you don't see with SIAP and SSAP as their
mixins pull it in for you):
<mixin>//products#table</mixin>
Then, apply the //obscore#publish mixin, which is like the
protocol-specific mixins except it doesn't pre-set parameters based on
what is already in protocol-specific tables:
<mixin
access_estsize="10"
access_format="'application/x-votable+xml;content=datalink'"
access_url="dlurl"
calib_level="3"
dataproduct_type="'cube'"
em_max="7e-7"
em_min="3.7e-7"
em_res_power="4000/red_disp_mean"
facility_name="'Calar Alto'"
instrument_name="'PMAS/PPAK at 3.5m Calar Alto'"
o_ucd="'phot.flux;em.opt'"
obs_collection="'CALIFA'"
obs_title="obs_title"
s_dec="s_dec"
s_fov="0.01"
s_ra="s_ra"
s_region="s_region"
s_resolution="0.0002778"
t_exptime="t_exptime"
t_max="t_max"
t_min="t_min"
target_class="'Galaxy'"
target_name="target_name"
>//obscore#publish</mixin>
Essentially, what is constant is given in literals, what is variable is
given as a column reference. It is a bit unfortunate that you have to
enter quite a few identity mappings in here, but pre-setting them won't
help in most cases.
That's about it for defining the table. To fill the table, just have a
normal rowmaker; since the table contains products, don't forget the
//products#define rowfilter in the grammar.
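For instance, with FITS files the grammar might contain something like
this (the table name and mime type are, of course, just placeholders):
<fitsProdGrammar qnd="True">
  <rowfilter procDef="//products#define">
    <!-- the table the products end up in -->
    <bind key="table">"\schema.cubes"</bind>
    <!-- the media type of what your access URLs will return -->
    <bind key="mime">"image/fits"</bind>
  </rowfilter>
</fitsProdGrammar>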
Most of the time, you do not need to worry about telling the Registry
anything about what you do with obscore. As long as you have the
obscore table, your TAP registry record will tell the Registry about it,
and as long as that is published, clients looking for obscore-published
data will find your service and thus your datasets (if they match
the constraints in the obscore query, that is).
In the other direction, when you register a service for a data
collection published via a typed protocol, DaCHS will add a reference
such that clients can see that the data is available through obscore,
too.
But when you do not register a typed service for your data collection
for some reason, you should also register the standalone table as
described in publishing DaCHS-managed tables via TAP.
SIAP version 2 is just a thin layer of parameters on top of obscore.
To publish with SIAP version 2, simply ingest your data as described in
publishing images via SIAP and add the
//obscore#publishSIAP mixin.
In contrast to SIAP version 1, you do not define or register a service
for a SIAPv2-published data collection. Instead,
there is a sitewide SIAPv2 service at <root
URL>/__system__/siap2/sitewide/siap.xml. It is always there,
but it is unpublished by default. To publish it, you should furnish
some extra metadata in the userconfig RD and then run:
dachs pub //siap2
Specifically, get the sitewidesiap2-extras stream and follow the
instructions there to update the meta items as appropriate; at this
point, they are exactly analogous to the ones for SIAP version 1.
EPN-TAP is a standard for publishing planetary data via TAP.
The EPN-TAP recommendation is now in version 2.0, and support
for that is provided by //epntap2. There's an official web-based client
for EPN-TAP at http://vespa.obspm.fr. A hands-on guide on how to do
EPN-TAP publications from scratch with a view to this custom client is
available on the VO Paris wiki. If you have not at least skimmed
this document's Introduction and the DaCHS Basics, by all means
start there.
You can use EPN-TAP to publish data without any associated datasets;
this happens, for instance, in the catalogue of minor planets,
mpc/q. More commonly, however, there are data files
(“products”) associated to each row. In this case, have at least:
optional_columns="access_url access_format access_estsize"
– these are required to manage such products.
When publishing datasets, there are two basic scenarios:
- local files; you let DaCHS find the sources, parse them, and infer
metadata from this; DaCHS will then serve them. That's what's shown
in the quick start example below. We believe that is the more robust
model overall.
- ingest from pre-distilled metadata; this is when you don't have the
files locally (or at least DaCHS should not serve them). Instead,
you read metadata from dumps from other databases, metadata
stores, or whatever. The titan/q RD shows an example
for that.
To start an EPN-TAP service, do as per Starting from Scratch and use
the epntap template:
dachs start epntap
Data in planetary sciences often comes in PDS format, which
superficially resembles FITS but is quite a bit more sophisticated.
Unfortunately, python support for PDS is underwhelming. At least there
is PyPDS, which needs to be installed for DaCHS' Element
pdsGrammar to work.
Install PyPDS if you don't already have it:
curl -LO https://github.com/RyanBalfanz/PyPDS/archive/master.zip
unzip master.zip
cd PyPDS-master
python setup.py build
sudo python setup.py install
Get the sample data:
cd `dachs config inputsDir`
curl -O http://docs.g-vo.org/epntap-example.tar.gz
tar -xvzf epntap-example.tar.gz
cd lutetia
Import it and build the previews from the PDS images:
dachs imp q
python bin/makePreview.py
Start the server as necessary. You can then go to your local ADQL
endpoint (something like http://localhost:8080/adql) and execute queries
like:
SELECT * FROM lutetia.epn_core
For access through a standard protocol, start TOPCAT, select VO/TAP
Query, and at the bottom of the dialog enter http://localhost:8080/tap
(or whatever you configured) in “TAP URL”. Hit “Use Service”, wait
until the table metadata is in and then again query something like:
SELECT * FROM lutetia.epn_core
Hit “Run Query”, open the table and play with it. As a little visual treat,
in TOPCAT's main window hit “Activation Action”, and configure the
preview_url column under “View URL as Image”. Then click on the table rows.
To get into Vespa's query interface, you will have to register your
table. Do not do this with the sample data.
In essence, EPNcore is just a set of columns, some mandatory, some
optional. The mandatory ones are pulled into a table by using
the //epntap2#table-2_0 mixin,
which needs the spatial_frame_type
parameter (see the reference for what's supported for it) since that
determines the metadata on the spatial columns. Optional columns can be
pulled in through the optional_columns mixin parameter, and, as said
above, a few of these optional columns are actually required if you want
to publish data products through EPN-TAP. The reference
documentation lists what is available. You can, of course, define
further, non-standard columns as usual.
So, an EPN-TAP-publishable table might be defined like this:
<table id="epn_core">
<mixin spatial_frame_type="body"
optional_columns= "access_url access_format access_estsize
access_md5 alt_target_name publisher
bib_reference" >//epntap2#table-2_0</mixin>
<column name="acquisition_id" type="text"
tablehead="Acquisition_id"
description="Extra: ID of the data file in the original archive"
ucd="meta.id"
verbLevel="2"/>
</table>
To populate EPNcore tables, use the //epntap2#populate-2_0
apply, identifying the parameters that apply to your data collection and
setting them as usual (cf. Mapping Data). You may need to refer to
the EPN-TAP proposed specification now and then while doing that.
Note again that
parameter values are python expressions, and so you have to use quotes
when specifying literal strings.
If you have to evaluate complex expressions, it is recommended to do the
computations in Element var-s and then use the variables set
there in the binds (as @myvar). This also lets you re-use
values once computed. Even more complex, multi-statement computations
can be done in Element apply with custom code.
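As a sketch of that pattern (the EXPOSURE and PRODID grammar keys and
the choice of bind names are invented; see //epntap2#populate-2_0 in the
reference for what actually applies to your data):
<rowmaker id="make_record" idmaps="*">
  <!-- compute once, re-use in several binds -->
  <var name="exptime_s">float(@EXPOSURE)</var>
  <apply procDef="//epntap2#populate-2_0" name="populate">
    <bind name="granule_uid">@PRODID</bind>
    <bind name="time_exp_min">@exptime_s</bind>
    <bind name="time_exp_max">@exptime_s</bind>
  </apply>
</rowmaker>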
Serving Local Products
When DaCHS is intended to serve local files itself (which is
preferable),
use the //products#define rowfilter in the grammar as usual
(cf. The Products Table). Note that this assumes by default that
you are serving FITS files, which in EPN-TAP most likely is not the
case. Hence, you will usually have to set the mime parameter as
in, perhaps:
<bind name="mime">"image/x-pds"</bind>
Then, in your row maker, use the //epntap2#populate-localfile-2_0
apply (if this gives you errors, make sure you have the optional
columns for products as described above).
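That is, after the populate apply in your rowmaker, add something like
(check the reference in case the procDef needs binds in your setup):
<apply procDef="//epntap2#populate-localfile-2_0"/>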
Incidentally, you could still use that even for external products, which
is useful if you have DaCHS-generated previews or want to attach a
datalink service. In that case, however, you have to invent some accref
for DaCHS (probably the remote file path) and set that in
products#define's accref parameter. The remote URI then needs to go
into the path parameter.
Serving External Products
When all you have are external URLs, you do not need to go through the
products table (though you still can, as described in Serving Local
Products). It is simpler, however, to just directly fill the
access_url, access_format and access_estsize columns using
plain Element map-s.
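A sketch, with @url, @fmt, and @size_kb standing in for whatever keys
your grammar actually yields:
<rowmaker id="make_record" idmaps="*">
  <map key="access_url">@url</map>
  <map key="access_format">@fmt</map>
  <!-- estimated size; check the EPNcore specification for the unit -->
  <map key="access_estsize">int(@size_kb)</map>
</rowmaker>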
The s_region parameter (see
//epntap2#populate-2_0) is essentially
a footprint describing the area covered by 2D spatially extended data products.
It uses pgsphere types such as spoly, scircle, smoc, or spoint
(we advise against the use of spoint as an s_region type: only spatially
extended types should be used). The default type is spoly; the others
must be specified using the regiontype mixin parameter (see
//epntap2#table-2_0).
For more information on how to create values for these regions, see
Creating pgSphere Geometries.
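For instance, to use circles rather than polygons, declare the region
type on the mixin and fill the column in the rowmaker, somewhat like
this (the @ra, @dec, and @aperture keys are made up):
<mixin spatial_frame_type="celestial"
  regiontype="scircle"
  optional_columns="access_url access_format access_estsize"
  >//epntap2#table-2_0</mixin>
and, in the rowmaker:
<map key="s_region">pgsphere.SCircle.fromDALI(
  [@ra, @dec, @aperture])</map>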
EPN-TAP tables are queried through DaCHS' TAP service. If
you have registered that, there is nothing else you need to do to access
your data.
For registration, just add:
<publish/>
to your table body and run dachs pub <rd-id>.
Datalink is not a discovery protocol like the others discussed so far;
rather, it is a file format and a simple access protocol for
representing relationships between parts of complex datasets.
Essentially, datalink is for you if you have parts of a dataset's
provenance chain, refined products like source lists and cutouts, masks, or
whatever else. Together with its companion standard SODA, it also lets
clients do server-side manipulations like cutouts, scaling, format
conversion, and the like.
Datalink is particularly attractive when you have large datasets and you
don't want to push out the whole thing in one go by default. Instead,
clients can then query their users for what part of the dataset they
would like to get – or to alert them of the fact that a large amount of
data is to be expected.
Since Datalink is very flexible, defining datalink services is a bit
involved. The reference documentation has a large section on it.
Here, we discuss some extra usage patterns. The concrete application to
spectra and images is discussed in SSAP and Datalink and
SIAP and Datalink. See also the Datalink showcase in the GAVO
data centre for live examples of datalink documents.
In DaCHS, Datalink services are associated with tables. This
association is declared using the _associatedDatalinkService meta
item, which consists of a serviceId (a service reference as per
referencing in DaCHS) and an idColumn (stating from which column
the ID parameter to the datalink service is to be taken from). So,
within the table, you add something like:
<meta name="_associatedDatalinkService">
<meta name="serviceId">dl</meta>
<meta name="idColumn">pub_did</meta>
</meta>
This implies that the service dl within the current RD will produce
a datalink document if passed a string from idColumn. The example
implies that this column ought to contain publisher DIDs (see Dataset
Identifiers), which is
what the standard descriptor generators that come with DaCHS like to
see. Since publisher DIDs tend to be a bit unwieldy (they are supposed
to be globally unique, after all), the standard descriptor generators
will also let you pass in plain accrefs.
If you write your own descriptor generator, you are free to stick whatever
you like into the idColumn, just as long as the table and the
descriptor generator agree on its interpretation.
The _associatedDatalinkService declaration discussed in the previous
section is all it takes when you serve data to datalink-aware clients.
If, however, you also want to cater to clients without native datalink
support, you may want to add links to the datalink documents in your
responses; this is particularly advisable when you have services working
through forms in web browsers.
One way to effect that is by defining a column like this:
<column name="datalink" type="text"
ucd="meta.ref.url"
tablehead="DL"
description="URL of a datalink document for this dataset"
verbLevel="1" displayHint="type=url">
<property name="targetType"
>application/x-votable+xml;content=datalink</property>
<property name="targetTitle">Datalink</property>
</column>
The property declarations add some elements to response VOTables that
inform clients like Aladin what to expect when following that link. At
this point, this is a nonstandard convention.
You will then have to fill that column in the rowmaker. As long as the
product is being managed through the products table and you thus used
the //products#define rowfilter in the grammar, all that
takes is a macro:
<map key="datalink">\dlMetaURI{dl}</map>
Here, the “dl” in the macro argument must be the id of the datalink
service.
This method will retain the datalink columns even in protocol responses.
While at this point there is something to be said for that, because
users immediately discover that datalink is available, datalink-aware
clients will then have both the datalink through
_associatedDatalinkService and the in-table column, which, since
they cannot know that the two are really the same, will degrade user
experience: Why should the same datalink be present twice?
With increasing availability of datalink-aware protocol clients, we
therefore prefer a second alternative: produce the extra datalinks only
when rendering form responses. To do that,
furnish web-facing services with an Element outputTable. In
there, do not include the column with your publisher DID but instead
produce a link directly to the links response, somewhat like this:
<service id="web" core="siacore">
...
<outputTable>
<outputField name="dlurl" select="accref"
tablehead="Datalink Access"
description="URL of a datalink document for the dataset
(cutouts, different formats, etc)">
<formatter>
yield T.a(href=getDatalinkMetaLink(
rd.getById("dl"), data)
)["Datalink"]
</formatter>
<property name="targetType"
>application/x-votable+xml;content=datalink</property>
<property name="targetTitle">Datalink</property>
</outputField>
</outputTable>
</service>
In particular for large datasets, it is usually a good idea to keep
people from blindly pulling the data without first having been made
aware that what they're accessing is not just a few megabytes of FITS.
Datalink is a good mechanism for that: point to a links response
as the primary document retrieved.
Of course, without a datalink-enabled client people might be locked out
from the dataset entirely. On the other hand, DaCHS comes with a
stylesheet formatting links responses to be usable in a common web
browser, so that might still be preferable to overwhelming unsuspecting
clients with large amounts of data.
To have datalinks rather than the plain dataset as what the accref
points to, you need to change what DaCHS thinks of your dataset; this is
what the //products#define rowfilter in your grammar is for:
<fitsProdGrammar qnd="True">
<rowfilter procDef="//products#define">
<bind key="path">\dlMetaURI{dl}</bind>
<bind key="mime">'application/x-votable+xml;content=datalink'</bind>
<bind key="fsize">10000</bind>
[...]
</rowfilter>
[...]
</fitsProdGrammar>
The fsize here reflects an estimation of the size of the links
response.
When you do this, you must use a descriptor generator that does not
fetch the actual file location from the path in the products table,
since that column now contains the URI of the links response.
For FITS images, you can use the DLFITSProductDescriptor class as
//soda#fits_genDesc's descClass parameter. The base
functionality of a FITS cutout service with datalink products would then
be:
<service id="dl" allowed="dlget,dlmeta">
<meta name="title">My Cutout Service</meta>
<datalinkCore>
<descriptorGenerator procDef="//soda#fits_genDesc"
name="genFITSDesc">
<bind key="accrefPrefix">'mysvcs/data'</bind>
<bind key="descClass">DLFITSProductDescriptor</bind>
</descriptorGenerator>
<FEED source="//soda#fits_standardDLFuncs"/>
</datalinkCore>
</service>
If you have something else, you will have to write the resolution code
yourself – DLFITSProductDescriptor's sources (in
gavo.protocols.datalink) should give you a head start on how to do
that; see also the tsdl service in bgds/l for how to integrate
that into your RD.
Note that DaCHS will not produce automatic previews in this
situation. Have a look at Product Previews for what to do
instead.
The Hierarchical Progressive Survey (HiPS) is a nifty way to publish
“zoomable” data, in particular for images (but catalogues work as well).
Conceptually, HiPSes are static data, which you will first have to
generate. We will cover publishing an image HiPS from FITS files here.
First, get Hipsgen, a piece of java that will do the necessary math.
Move the downloaded file Hipsgen.jar to a convenient place; the
following assumes it's in your home directory. If you want to publish
HiPSes, at least skim the hipsgen user manual before proceeding.
To generate your HiPS, you next need to build a parameter file. In
DaCHS, you will conventionally have that in res/hips.params (do put
it under version control). DaCHS will generate a template for you. Use
something like (mkdiring res if necessary):
dachs adm genhips q#import 4 > res/hips.params
Here, q#import points to the data element importing the images
you want to turn into a HiPS. The 4 in the example gives the
smallest order to generate a HiPS for. The value here depends on the
coverage of your data collection. Use 0 when you have full sky
coverage, 1 for half the sky, and so on. To generate a HiPS for a
“survey“ of a one-degree patch, you would use 6.
This creates a control file for hipsgen
that will guess parameters to turn the FITS files the grammar will
presumably read into a HiPS in the hips subdirectory of your resource
directory.
It is likely that you will want to edit the hips.params. For one, adm
genhips uses a few rather naive heuristics to come up with
the input directories and identifiers.
But mainly, the hipsgen manual describes numerous options that influence
how hipsgen works (e.g., sky background
subtraction, cutting off borders, and much more that DaCHS simply cannot
guess). Put these options into your hips.params rather than on the
command line, and you will have a much easier time re-generating the
HiPS some later time.
In simple cases, you may get away with what DaCHS has generated. Either
way, once hips.params is ready, run:
java -Xmx4g -jar ~/Hipsgen.jar -param=res/hips.params
inside the resource directory (-Xmx4g means “give it 4 Gig RAM”;
increase as sensible). In particular if experimenting on a large data
collection, consider using -pilot=10 (to only process 10 input
images) first and proceed to inspect the visual appearance of your new
HiPS before wasting more resources on a computation that could possibly
be improved quite substantially with a modicum of tweaking.
This produces (with quite a bit of computation) the hierarchy of files
that then serves as a HiPS.
In principle, you could use a static renderer to publish this HiPS; once
computed, DaCHS only needs to hand out files. However, for proper
registration and some minor smoothing, there is a custom renderer for
HiPS, the hips renderer. With this, a service for handing
out your HiPSes looks like this:
<service id="hips" allowed="hips">
<meta name="title">Fornax Cluster Core in HiPS</meta>
<meta name="description">A HiPS generated from the high-resolution
image.</meta>
<property name="staticData">hips</property>
<nullCore/>
</service>
The id is of course up to you. With our choice here, the URIs into
the service look a bit silly (...q/hips/hips). Shortening the
description over the RD's description is probably a good idea; HiPS
descriptions tend to be one-liners. The staticData property needs to
point to the subdirectory into which you built your HiPS (what is given
here is the default from adm genhips). Finally, the nullCore says
that this service will never do any computation.
With this service, DaCHS has enough information to complete the
hips/properties file that keeps HiPS metadata in a proprietary format.
Hence, run:
dachs adm hipsfill q#hips
(the argument is the DaCHS id of the service you just defined). Have a
look at hips/properties after that and fix things as necessary.
Note that hipsfill will not touch lines that are not commented out. If
you want it to re-compute a line, prefix it with a hash.
If your HiPS is relatively small, consider pointing the Aladin lite in
the hips/index.html file generated by Hipsgen to some position that
actually has imagery. To do that, locate the instantiation of aladin in
that file and edit it to read somewhat like this:
var aladin = $.aladin("#aladin-lite-div", {showSimbadPointerControl: true,
target: "54.625 -35.455", fov: 1});
(for starting up at α=54.625, δ=-35.455 with a field of view of a
degree).
You could publish this service to the registry in the usual way.
However, it is rather likely that you already have a service (e.g., SIAP
or TAP) on the data collection in question. In that case (and dachs
adm genhips in effect assumes it), it would be wasteful to create a second
resource record for the same data collection. Instead, add a hips
capability to the existing record using:
<publish render="hips" sets="ivo_managed" service="hips"/>
in the service element of the other service. Run dachs pub after
doing this.
UWS is not a full-fledged protocol; it is rather a pattern for how to
manage asynchronous execution, to be adapted to concrete use cases and
then to be used from case-by-case scripts – or, of course, as parts of
other standards defining the sort of job to be executed (e.g., TAP,
SODA).
These caveats given, operators can wrap their own services into a UWS
pattern via DaCHS. As I just said, this will not magically work with
some “UWS client”, but at least you do not have to implement the subtle
job management yourself. Conceptually, user UWS combines some custom
core – i.e., Element customCore or Element
pythonCore – with the async renderer (which also means that by
allowing api, for instance, you can also run the service in
sync if you want, and form will give you a normal HTML form).
As a basic example, consider this simple service computing the powers of
a complex number:
<service id="pc" allowed="api,form,async">
<pythonCore>
<inputTable>
<inputKey name="opre" description="Operand, real part"
required="True"/>
<inputKey name="opim" description="Operand, imaginary part">
<values default="1.0"/>
</inputKey>
<inputKey name="powers" description="Powers to compute"
type="integer[]" multiplicity="single">
<values default="1 2 3"/>
</inputKey>
</inputTable>
<outputTable>
<outputField name="re" description="Result, real part"/>
<outputField name="im" description="Result, imaginary part"/>
<outputField name="log_value"
description="real part of logarithm of result"/>
</outputTable>
<coreProc>
<setup imports="cmath"/>
<code>
powers = inputTable.args["powers"]
op = complex(inputTable.args["opre"],
inputTable.args["opim"])
rows = []
for p in powers:
val = op**p
rows.append({
"re": val.real,
"im": val.imag,
"log_value": cmath.log(val).real})
t = rsc.TableForDef(self.outputTable, rows=rows)
t.addMeta("info", str(inputTable.args["opim"]),
infoName="inpar",
infoValue="opim")
return t
</code>
</coreProc>
</pythonCore>
</service>
Most of this defines a python core, including the input and output
tables, possibly with defaults. The important point for enabling UWS
operation is the async in the allowed renderers.
Feel free to try this within any RD you want. In case of doubt, create
a file temp.rd in your inputs directory and decorate the above
with:
<resource schema="test">
...
</resource>
Then point your browser to http://localhost:8080/temp/pc/async (or
whereever you put the service element). You can then hit New
job... and set the various job parameters. Note that before DaCHS
2.10.3, you need to remove the input for powers, or you will get a parse
error. That was a bug. Also note that each time you post something for
a list-valued parameter (like powers here), the extra values will be
appended to what is already there; there currently is no defined
mechanism to clear such a list-valued parameter, and work on the UWS
standard is necessary to define it – can you help?
Do not forget to hit Update when you have done your edits. Once
configured, you can start the job using Execute. Then hit the
browser's reload button; you should see a link to the results, which is
a VOTable you can view in the usual ways. Finally, hit Delete job
to return to the job submission page.
Note that this is of course not the way you should be using UWS; all
this is really intended to be consumed by machines, and what you saw is
XSLT-processed XML. Still, this may be a reasonable way to experiment
with UWS.
Also note that anonymously submitted UWS jobs are globally visible. If
you give users credentials, however, they will only see their own jobs.
In principle, you can often just add form to the list of allowed
renderers in a service and have the service display an HTML form and
return an HTML table.
However, such a service will typically not be very pretty or usable –
DaCHS services are normally intended to be consumed by machines. These
do not care about a plethora of parameters they will never use, and they
can easily fix the display of units or a large number of columns for
their users.
Web browsers and humans are not good at that, and so it often pays to
build extra services targeted at browser users next to typed services
– or perhaps have stand-alone forms, for instance on top of a
TAP-published table.
Soap box: don't waste too much time on form-based services when there are
standard protocols to access the data – they're a pain to
use, even if, and I admit that, most of your initial users may not have
realised that yet. Consider every minute spent on form services as a
compromise with an imperfect world. Of course, there are exceptions.
Me, for instance, I kind of like our sasmirala service, which doesn't
work without a good dash of HTML.
The template for a form-based service to a database table is:
<service id="web" allowed="form">
<meta name="shortName">____ web</meta>
<dbCore queriedTable="____"/>
</service>
As usual, the short name must be shorter than 17 characters, and
queriedTable contains the id of the table you want to query. That's an
XML id or a DaCHS cross-RD reference, not the name the table has in
the database. Of course, giving more of the usual service metadata
usually is a good idea.
The query parameters are generated from the condDescs of the dbCore; see
Service Definitions for a bit more on them. So, the classical
pattern for adding such a parameter is:
<condDesc buildFrom="col_to_query"/>
This will enable all kinds of VizieR-like expressions on col_to_query,
where the operators available depend on whether col_to_query is a
string, a number, or a timestamp. If you've run dachs limits
on the RD, you will also get placeholders giving the ranges
available for numeric columns.
However, there are cases when you want to fight the horror vacui
(sitting in front of an empty form without an idea what to put where) more
carefully, in particular with enumerated columns. In that case, you can
tell DaCHS to produce selection boxes. This is based on building
condDescs from inputKeys with options:
<condDesc>
<inputKey original="calib_level">
<values>
<option title="raw/custom">0</option>
<option title="raw/standard">1</option>
<option title="calibrated">2</option>
<option title="derived">3</option>
</values>
</inputKey>
</condDesc>
where calib_level references a column that carries the remaining
metadata. When the column already carries such a values element,
you can skip the values on the input key. In that case, even a simple
buildFrom will produce a selection box. Note, however, that giving
the values on the column itself will reject values not mentioned when
importing.
Manually constructing the input key also lets you control how many items
the browser will show in the selection box by giving a showItems
attribute to the input key (see below for an example). Making
showItems -1 will use checkboxes (or radiobuttons) instead of a
selection list.
Explicitly enumerating the options is inconvenient when there is an
open-ended list of terms, as for instance with object names,
obs_collection in obscore, or perhaps band or emulsion names. For
such cases, DaCHS lets you pick the values from the database itself
using the values element's fromdb attribute. This must be a SQL
query fragment without the SELECT, returning exactly one column. For
instance,
<condDesc buildFrom="calib_level"/>
<condDesc>
<inputKey original="obs_collection" showItems="10">
<values fromdb="obs_collection from ivoa.obscore"/>
</inputKey>
</condDesc>
Note, however, that DaCHS will execute this query when loading the RD.
Hence, if this is a long-running query, the RD will take a long while to
load, which usually is unwelcome. Simply make sure that the query can
use an index if your table has a non-trivial size.
On DaCHS newer than 2.5.1, instead of fromdb simply give the source
column a statistics property enumerate. Then, dachs limits
will gather the necessary statistics itself (on the obscore, this is
already done).
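In the table definition, that would look something like (the column
metadata is just for illustration):
<column name="obs_collection" type="text"
    ucd="meta.id" tablehead="Collection"
    description="Name of the collection this dataset belongs to">
  <property name="statistics">enumerate</property>
</column>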
DaCHS will also evaluate the multiplicity attribute of such input
keys. If it is multiple (which is the default in this situation),
widgets are chosen that let users select multiple items; if it is
single, the widgets will only admit a single selected item.
In all these cases, the input keys will generate constraints using
DaCHS' default rules. This includes the various input syntaxes like the
VizieR expressions mentioned above; use vexpr-float,
vexpr-string, or vexpr-date as the type when you build explicit
input keys rather than using buildFrom but still want this extra
syntax. With explicit input keys, you could also tell DaCHS to
understand PQL-like syntax (as in SSAP) with the types pql-int,
pql-float, pql-string, and pql-date, but, frankly, I don't
think that's a good idea.
When this is not enough and you need to generate custom query fragments,
see More on CondDescs.
In browser-based services, you are directly talking to the user, and
therefore you probably want to return not too many columns. I can also
confidently predict that even in the 2020ies you will still be asked for
horrible sexagesimal coordinates and wavelengths in whatever twisted
units your data providers prefer (though I have to say that moving away
from the VO's preferred wavelengths is a good thing).
For column selection, DaCHS by default picks the columns with a
verbLevel up to and including 20. Columns excluded in this way can
be added back using the “more output fields” selector in the query form.
However, in particular for services built on top of standard services,
that selection is often… improvable.
In such cases, you can define an output table on the service. The
idiomatic way to do that is to copy over the columns you want to retain.
The element outputTable has a legacy autoCols attribute for
that, but now that DaCHS has LOOPs, my advice is to use the following
pattern, because it makes it much easier to sprinkle in modified columns
now and then:
<outputTable namePath="//obscore#ObsCore">
<LOOP listItems="obs_collection obs_title t_min t_max"><events>
<outputField original="\item"/>
</events></LOOP>
<outputField original="s_ra" displayHint="type=hms"/>
<outputField original="s_dec" displayHint="type=dms"/>
<outputField original="s_fov" displayHint="displayUnit=arcsec,sf=0"/>
<LOOP listItems="em_min em_max"><events>
<outputField original="\item" displayHint="displayUnit=Angstrom"/>
</events></LOOP>
<LOOP listItems="access_estsize access_url s_xel1 s_xel2">
<events><outputField original="\item"/></events>
</LOOP>
</outputTable>
The first thing to note here is the namePath, which tells DaCHS
where to resolve the columns by default (this will default to the
table the core queries, so you can usually leave it out).
Then, we use Active Tags to copy over columns we want to treat
uniformly. In this case, we first copy over a few columns literally,
but then want to furnish other columns with display hints:
here, we mogrify the positions to sexagesimal (definitely not
recommended) and do some unit conversion and numeric formatting for the
field of view. Then we give both spectral coordinates in Angstrom (this
assumes optical astronomers, obviously; a propos of which I should also
call out the spectralUnit display hint DaCHS has since version 2.6).
The element concludes with another loop copying literally.
Note that you can usually give the display hints in the column
definition, as they will not hurt the protocol access. This would simplify
the element above because you can then copy things without further
modification. An extra (questionable) benefit of giving the display
hints in the column definitions in the table is that these will even be
picked up when users run ADQL queries through the web interface.
In case DaCHS' built-in display hints are not enough, you can use the
select and formatter attributes discussed in the reference
(Element outputField). For instance:
<outputField name="dimensions"
tablehead="Dim"
description="Pixel Dimensions, space x space x time x spectrum x
polarization"
select="array[s_xel1, s_xel2, t_xel, em_xel, pol_xel]">
<formatter>
return "x".join(str(s) if s else "1" for s in data)
</formatter>
</outputField>
which also shows how to select multiple values at a time.
You can also produce arbitrary HTML using stan syntax (cf.
templating.rstx, the section on in-python stan):
<outputField original="ivoid">
<formatter>
return T.a(href="/glots/q/showtables/qp/%s"%urllib.parse.quote(data))[
data[6:]]
</formatter>
</outputField>
Several of the examples above were from obsform/q, an RD
that is active in the GAVO data centre. If you have an obscore
service of your own, you can probably just pull the resdir into your
inputs and have the contents of that accessible to browser users, too.
It showcases a few tricks that may come in handy in your own form
creations. First, there is a cone search condition choosing custom
columns; this is necessary here because the obscore s_ra and s_dec
columns do not have the UCDs expected by the SCS condDesc (requires
DaCHS 2.6+):
<condDesc original="//scs#humanInput">
<phraseMaker original="//scs#humanSCSPhrase">
<bind key="raColName">"s_ra"</bind>
<bind key="decColName">"s_dec"</bind>
</phraseMaker>
</condDesc>
The next condition descriptor, <condDesc buildFrom="calib_level"/>,
showcases how DaCHS turns values metadata attached to the table column
into a selection box. To inspect the respective column definition, see
the output of dachs adm dump //obscore.
Then there is a condition descriptor for the collection column:
<condDesc>
<inputKey original="obs_collection"
tablehead="Collection" showItems="10">
<values fromdb="obs_collection from ivoa.obscore"/>
</inputKey>
</condDesc>
This is a bit of a sore spot, as this will become slow once
your obscore table has grown to a certain size. This is because of a
nasty conspiracy between how you cannot have indexes on views and that
postgres does not optimise for constants in DISTINCT queries. We will
think about this a bit more deeply if and when this becomes a problem
for other deployers.
The really exciting parts are the last two conditions, those for
spectrum and time. These are tricky because they are effectively
interval-valued in the database, with a minimum and maximum column each.
We still want to let people enter our VizieR-like expressions, which
means we have to do manually what DaCHS does behind the scenes for
buildFrom.
This at the moment requires a certain amount of manual code (that will
also only work on DaCHS newer than 2.5). Here's the code for the
spectral condition:
<condDesc>
<inputKey name="BAND" type="vexpr-float"
unit="Angstrom" ucd="em.wl"
tablehead="Wavelength"
description="Wavelength covered by the dataset"/>
<phraseMaker>
<setup imports="gavo.svcs.vizierexprs">
<code>
obscore = parent.parent.queriedTable
minCol = svcs.InputKey.fromColumn(
obscore.getColumnByName("em_min"),
inputUnit="Angstrom")
maxCol = svcs.InputKey.fromColumn(
obscore.getColumnByName("em_max"),
inputUnit="Angstrom")
</code>
</setup>
<code>
try:
tree = vizierexprs.parseNumericExpr(inPars[inputKeys[0].name])
except utils.ParseException as msg:
raise base.ValidationError(
f"Bad VizieR syntax: {msg}", "BAND")
res = vizierexprs.NumericIntervalFlattener(
).getSQLFor(tree, (minCol, maxCol), outPars)
yield res
</code>
</phraseMaker>
</condDesc>
So, we first define a fully synthetic input key – none of the metadata
of em_min or em_max is terribly helpful here.
Then we declare a phrase maker. Once we have a good idea how to write
this, we will probably give a procDef that hides quite a bit of this
uglyness; then again, we as a community probably should just use more
interval-typed columns.
Phrase makers are procDef-s just like, say, apply, and hence they
consist of a setup part executed when the RD is imported and a
code part executed once per (in this case) query.
Here, we use the setup to find the min and max columns. We make
input keys from them, as that is what the flattener we use later
expects. The fromColumn constructor also lets us smuggle in code
for unit adaptation (the inputUnit attribute that is turned into a
scaling factor by the input key machinery).
Whatever is defined in a procedure definition's setup code is
available in its code; we will use minCol and maxCol in a
moment.
First, however, we have to deal with the VizieR-like expressions we are
expecting. To parse them, we are using parseNumericExpr from the
svcs.vizierexprs module. This returns a parse tree that can, in
turn, be translated into SQL. Before we do that, we catch parse errors
(which DaCHS would return as 500 internal server errors) and turn them
into ValidationError-s for the BAND parameter. This lets DaCHS'
form renderer mark up errors inline rather than just spit out some ugly
error page.
The actual SQL generation happens using a NumericIntervalFlattener,
which encapsulates how to translate the various VizieR constructs to
SQL. If you think they should be translated differently, you could
derive your own flattener – see gavo.svcs.vizierexprs on how to do
that. Its getSQLFor method takes the parsed expression, the input
columns, and the dictionary of SQL parameters that gets passed into
condDescs implicitly. It is the use of two column objects rather
than just one that makes these interval flatteners so special that you
currently cannot get by without custom code.
If you have understood this code, the condition descriptor for TIME
will not be very surprising: You just use parseDateExprToMJD to get
the parse tree (and would use parseDateExprToDateTime if you had
timestamp-valued columns in the database). Once that's done, the
remaining code is essentially the same, as date and numeric intervals
are rather parallel in the VizieR grammar.
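Spelled out, the TIME condition could look somewhat like this (mirroring
BAND; the input key metadata is a mere placeholder, and t_min/t_max are
obscore's MJD-valued columns):
<condDesc>
  <inputKey name="TIME" type="vexpr-date"
    tablehead="Obs. date"
    description="Time the observation was taken"/>
  <phraseMaker>
    <setup imports="gavo.svcs.vizierexprs">
      <code>
        obscore = parent.parent.queriedTable
        minCol = svcs.InputKey.fromColumn(
          obscore.getColumnByName("t_min"))
        maxCol = svcs.InputKey.fromColumn(
          obscore.getColumnByName("t_max"))
      </code>
    </setup>
    <code>
      try:
        tree = vizierexprs.parseDateExprToMJD(
          inPars[inputKeys[0].name])
      except utils.ParseException as msg:
        raise base.ValidationError(
          f"Bad VizieR syntax: {msg}", "TIME")
      yield vizierexprs.NumericIntervalFlattener(
        ).getSQLFor(tree, (minCol, maxCol), outPars)
    </code>
  </phraseMaker>
</condDesc>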
By the way, these two are obvious candidates for writing a common
procDef. Don't be too surprised if the actual SVN code already has
that by the time you read this.
Users can enter data into the database in many ways. The most common
would be through uploads, either directly into the database (as, e.g.,
in theossa/q), by doing file uploads which then get
periodically ingested (as, e.g., in lightmeter/q), or by
being harvested (the classical example would be the relational registry,
rr/q).
Sometimes, however, it is convenient to let people interactively edit
database content. We don't have a service that does this in the GAVO
data centre; there is a somewhat contrived example for that at
http://docs.g-vo.org/editsample_q.rd; to play with it, put it into
/var/gavo/inputs/editsample/q.rd.
If you have a look at the RD's content, you will first see the
definition of the table to be edited, objlist, and a data item that
fills it with more or less random data. If you have never used an
embedded grammar before, a brief glance at this might be inspiring. To
continue with the example, run dachs imp q, as you will need the
table to edit it.
Linking to Edit Services
The RD then has a view service, which lets you query the table using
the object id. In addition to normal DaCHS fare, it has this:
<outputTable original="objlist">
<outputField name="edit"
select="array[id, remarks]">
<formatter>
return T.a(class_="buttonlike", href="/\rdId/edit/form?"
+urllib.parse.urlencode({
"id": data[0],
"remarks": data[1] or ""}))["Edit this"]
</formatter>
</outputField>
</outputTable>
By saying original="objlist", you tell DaCHS to base the output
table on the full table in the database (see also Output Table for
alternative ways of getting output fields).
What's new is the edit output field; see
http://localhost:8080/editsample/q/view and send off the empty form to
see its effect. It has a select attribute that directly gives
material to put into the select clause. We use an array because we want
both the id (to know what we will be editing) and the remarks (in order
to make it easy not to lose old remarks). If you want to make more or
all fields editable, it is probably preferable to use the wantsRow
attribute of Element outputField; in that case data
within formatter is the entire row as a dictionary.
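A sketch of that variant (assuming, as said above, that with wantsRow
the formatter's data is the row dictionary):
<outputField name="edit" wantsRow="True">
  <formatter>
    return T.a(class_="buttonlike", href="/\rdId/edit/form?"
      +urllib.parse.urlencode({
        "id": data["id"],
        "remarks": data["remarks"] or ""}))["Edit this"]
  </formatter>
</outputField>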
The formatter then produces stan, which is essentially HTML written in a
slightly cooler way (see templating.html#in-python-stan).
Here, I am setting a class attribute (the underscore is to dodge
python's class keyword) so you could style the link with Operator
CSS, and I am computing the URL in a halfway convenient way: using the
\rdId macro makes this thing independent of what RD it lives in,
and I am using urllib.parse.urlencode (which is part of the functions
available in rowmakers, which you have in formatters, too) to robustly
produce a query string.
In this particular case, we only want to edit the remarks column.
If you wanted to edit more columns, you would add the respective
columns in a similar way. Note, however, that urlencode will encode
None as literal "None", which is rarely what you want; if you
may have null values, make sure you map them to empty strings manually.
Writing an Edit Service
The actual edit service is protected to prevent accidental overwrites by
rampaging robots:
<service id="edit" limitTo="ari">
– see Restricting Access for how to manage users, groups, and
credentials in DaCHS.
I am also setting a link back to the view service:
<meta name="_related" title="Query Service"
>\internallink{\rdId/view}</meta>
This helps people to quickly go from the edit service's sidebar back to
where they can query the table (but see below for a plausible
alternative).
Since I cannot see major common use cases in this problem's vicinity,
DaCHS has no built-in cores for editing things. Instead, write an
Element pythonCore. This starts with declaring the input
and output structure, where I am requiring both inputs, and I set a
manual widgetFactory to give remarks a nice text input box:
<pythonCore>
<inputTable>
<inputKey original="objlist.id" required="True"/>
<inputKey original="objlist.remarks" required="True"
widgetFactory="widgetFactory(TextArea, rows=5, cols=40)"/>
</inputTable>
<outputTable original="objlist" primary=""/>
Don't question too much what that widgetFactory thing is. It is
ancient and venerable, and has been in need of fixing for 15 years.
Just take it as a scheme for producing text boxes rather than
single-line input fields as it stands. Ahem.
You might consider giving the input key for the id a widgetFactory
of Hidden; that would not produce a user-editable widget, which in
this scenario might be preferable; but then I almost always prefer to
assume I am dealing with sensible people, and for them, editing the id
might one day be useful, and they would otherwise leave it alone.
The output table as defined here is just the object list again; the plan
here is to return the edited line. Alternatives are discussed below.
The action of python cores is defined in an Element
coreProc, which is a regular procedure application (just as the
Element apply you may know from rowmakers). It could look
like this:
<coreProc>
<code>
with base.getWritableAdminConn() as conn:
if 1!=conn.execute("UPDATE \schema.objlist"
" SET remarks=%(remarks)s"
" WHERE id=%(id)s", inputTable.args):
raise base.ValidationError("No row for id '{}'".format(
inputTable.args["id"]), colName="id")
return rsc.TableForDef(
self.outputTable,
rows=list(conn.queryToDicts("SELECT * FROM \schema.objlist"
" WHERE id=%(id)s", inputTable.args)))
</code>
</coreProc>
So, we basically rely on DaCHS' built-in input validation and just turn
the items from inputTable.args – which is sufficiently
dictionary-like for our database interface – into an update query. I am
using a writable admin connection here, as normal DaCHS tables (the ones
originating from a plain dachs imp) are non-writable by table
connections, and the normal table connections cannot write at all (cf.
Database Queries). You might consider making such
editable tables writable by the normal web user, too, but for now I have
no plans to make admin connections somehow inaccessible to the web
server any more, so that is probably not terribly useful.
conn.execute returns the number of rows that were touched by an
operation. I am making sure that is one here, and if that is not the
case, I am raising a ValidationError with colName="id" (there cannot
be more than one row touched because id is a primary key). Giving the
colName lets DaCHS mark the location of the problem in its browser
interface – try it.
If everything has gone well, I am building an output table out of the
modified row. This is what DaCHS displays in response to a successful
request.
While that behaviour makes sense – it lets people verify that their
edits did what they expected they would –, it is unlikely that users
will like it very much. It is more likely that they would like to get
back to the original table display. To effect that, make DaCHS redirect
there, perhaps with a restriction to the edited id:
raise svcs.Found("/\rdId/view?"+urllib.parse.urlencode({
"id": inputTable.args["id"],
"__nevow_form__": "genForm"}))
To make this work, you have to acquaint your python core with the
gavo.svcs module that defines the Found exception (this is just an
API to produce a 302 Found HTTP status code); the most succinct way to
do that is to add:
<setup imports="gavo.svcs"/>
to the coreProc's content.
The somewhat cryptic "__nevow_form__": "genForm" is a weird
implementation detail. Without it, DaCHS will give the user a form
filled out with the id. With it, it will actually execute the query.
Looking Further
I grant you that the proposed interaction feels somewhat clunky in today's
world of animated widgets, whether or not it makes a lot of sense. I'd
expect most scientists will put up with it eventually, though.
But you could do in-place editing by inserting a solid helping of
Javascript into a defaultresponse template (cf.
templating.html). This would have a text box open on some
user interaction and, once things are typed in, retrieve the URL of the
service call we produce in the formatter using Javascript's fetch.
In such a scenario, it is certainly simpler if the service just returns,
say, YES or NO depending on whether the update has succeeded.
You would do that by returning a pair of media type and payload:
return "text/plain", "YES"
If you made whole rows editable, you should probably return the entire
row, too, presumably in JSON. To do that, you could write something
like:
return "application/json", json.dumps(
next(conn.queryToDicts("SELECT * FROM \schema.objlist"
" WHERE id=%(id)s", inputTable.args)))