===================
GAVO DaCHS Tutorial
===================

:Author: Markus Demleitner
:Email: gavo@ari.uni-heidelberg.de
:Date: |date|
:Copyright: Waived under `CC-0`_

.. contents::
   :depth: 3
   :backlinks: entry
   :class: toc

.. Headline sequence = - ' .

Introduction
============

DaCHS is an integrated suite for publishing astronomical data to the `Virtual Observatory`_, with support for services operated through web browsers, too.

If you are not sure what the Virtual Observatory is: Technically, it is a set of APIs and a Registry in which the URIs for services compliant with these APIs are associated with a bunch of relatively formalised metadata. What that actually means is probably best illustrated by trying some of the tutorials involving TOPCAT_ or Aladin_ from VOTT_.

This tutorial is part of the DaCHS documentation, which also comprises, in particular, the `Installation Guide`_ and the `Reference Documentation`_. The goal is that, after skimming the tutorial, DaCHS operators have a good idea of what to do when; the tutorial also provides recipes for the most common problems. It will also guide operators in setting up a site well integrated into the larger VO.

To gain a basic understanding of what all this is about, our advice is to work linearly through `DaCHS Basics`_ to get an idea of DaCHS' basic concepts (yes, it's a bit more than 20 pages, but we'd like to believe that grokking that is a half hour well spent) and then try the `quick start with DaCHS`_. Depending on the data you intend to publish, then skim the corresponding section from `Publishing Data Using VO Protocols`_ and build a service, presumably using ``dachs start``. Once the service is ready to go public, look at `Configuring DaCHS`_ and finally `Interacting with the VO Registry`_; they tell you how to become a good citizen of the VO. For continued operation, and if you have inherited a running DaCHS installation, read `Server Operations`_.

The concluding chapter on `Topics in DaCHS Publishing`_ collects more detailed explanations, recipes, and other material that helps in common publishing situations. We believe it is worth skimming through this part to get an idea of what is there, returning to it for closer study when necessary.

All this assumes that you have a basic understanding of how to run a simple Debian server (or a server running some other distribution, which you should then understand a bit better). If this is your first server, you probably should skim (at least) sections 5 through 10 of the `Debian Administrator's Handbook`_. It is not important that you understand and memorise everything that is written there, but you should get an idea of where to look up what and how, roughly, things fit together.

.. _Debian Administrator's Handbook: https://www.debian.org/doc/manuals/debian-handbook/

Installation
------------

Unless you have truly overriding reasons to try something else, get a machine running Debian stable to try out and learn DaCHS, and type::

  apt install gavodachs2-server

This will give you the DaCHS software that was current when the Debian release was frozen, which ought to be just fine for “normal” publishing activity. If you need a bleeding-edge DaCHS (or run into a fixed bug), read up on how to `manually add our repository`_.

DaCHS can be installed and run in many other ways, as detailed in the `Installation Guide`_.
But since DaCHS involves communication with a database server and the network, it is certainly a good idea to start out with a simple, one-server setup on a platform that many others use and that therefore has a rather low likelihood of odd surprises.

Notation
--------

**DaCHS** is the name of the software package (it's an acronym for Data Center Helper Suite and happens to mean “badger” in German). The Unix command is called ``dachs``. The default paths, the python package, and the default database are called ``gavo`` (though you could configure it differently if you wanted) – that's the name of the project within which DaCHS was born, and changing these names to dachs afterwards would have been a challenge for backwards compatibility. Since it didn't seem a big deal to us, we didn't do it. Still, apologies for the confusion.

In the following, we call people running DaCHS **operators** (so, if you're reading this, that's you), people consuming your services **users**, and programs communicating with your services **clients**.

Something like “:dachsref:`Element table`” is a link into the reference documentation, whereas “:dachsdoc:`install.html`” references other pieces of DaCHS documentation. Specifications like ``[web]serverURL`` refer to configuration items in ``/etc/gavo.rc`` (or its alternatives; see `Basic Gavo.rc Settings`_ and :dachsref:`Configuration Reference` for more on this).

DaCHS Basics
============

Invoking DaCHS
--------------

All DaCHS functionality is invoked through a program called ``dachs``. Multiple functions are integrated and selected through the first argument; run ``dachs help`` (or, for perhaps less cryptic output, ``man dachs``) to see what's available. Realistically, the functions most operators will be confronted with are ``start``, ``import``, ``serve``, ``publish``, ``test``, and perhaps ``admin``. DaCHS will be happy with unique prefixes of these command names; for instance, we tend to just type ``dachs adm xsd`` rather than ``dachs admin xsdValidate``.

DaCHS has some global options (that go in front of the subcommand name), which mostly are useful for `Debugging`_. Most subcommands take options and/or arguments, which then have to come after the subcommand name. If you've ever used git: Same deal, except we're aiming for less cryptic man pages.

For a brief overview of what the individual functions do, use the built-in help, as in::

  $ dachs imp --help
  Usage: dachs imp [options] {}

  imports all (or just the selected) data from an RD into the database.

  Options:
    -h, --help           show this help message and exit
    -n, --updateRows     Use UPDATE on primary key rather than INSERT
                         with rows
  [and so on]

or look at the man page.

DaCHS on Disk
-------------

In the default configuration, almost everything DaCHS is concerned with is below its **root directory**, which by default is ``/var/gavo``. There are two exceptions to this rule:

* Configuration in ``/etc/gavo.rc`` (or, in a pinch, ``~/.gavorc``) – see `Basic gavo.rc Settings`_ for details, and also for how to have these in other locations.
* anything DaCHS has put into the database.

The **inputs directory**, defaulting to ``/var/gavo/inputs``, is where most of a DaCHS operator's work is done. It is where the services are configured, where metadata and tabular data are fed into the database from, and where bulk data is served from.

Three of the other directories in there are relevant for the casual user:

* ``/var/gavo/etc`` (the **config directory**) contains various extra configuration like some `Basic Metadata`_, the database access profiles (which are nontrivial in DaCHS because it runs queries as differently privileged users), and `The Userconfig RD`_.
* ``/var/gavo/logs`` contains access, error, and info logs. In particular, dcErrors and dcInfos deserve a look whenever DaCHS spits out errors; especially when you called ``dachs`` with the ``--debug`` global flag, it will dump tracebacks there that are not visible in the main output.
* ``/var/gavo/web`` lets you override many of the browser-facing things of DaCHS, and there is ``nv_static``, in which you can dump files that then become visible at ``/static``. Templates are in a ``templates`` child, and one template you will probably want to override is `The Root Page`_ of your data centre.

The Resource Descriptor
-----------------------

When publishing data, what you do in DaCHS is write a **resource descriptor** (RD). That's a piece of XML (the serious ones at the GAVO data center are between 100 and 2000 lines) containing almost all you and DaCHS need to know about a service and its underlying data. DaCHS tries to avoid spreading information on a single resource over multiple files in order to facilitate understanding what is going on later or after a handover. There are exceptions to the one-stop-definition principle (e.g., C-language boosters, custom grammars and pages, etc.), but for publishing of well-behaved data operators should not need those: Essentially, a service is the **upstream data** (i.e., what you got from the scientist or project) plus the RD. Put another way, the RD is the formalisation of what you can learn from your upstream *about* the data.

This section will introduce an example RD for a simple service (we will use it in the `Quick Start with DaCHS`_) publishing a simple astrometric catalogue. To follow the exposition below, have a look at the XML by fetching http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/arihip/q.rd (use HTTP, not HTTPS, as the latter requires authentication) and opening the resulting file q.rd in a text editor. The upstream data for this is available at http://dc.g-vo.org/arihip/q/cone/static.

This file comes from the collection of RDs active at GAVO's Heidelberg Data Centre (https://dc.g-vo.org), which you can inspect at http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs – in particular, you're welcome to copy as much as you want. To locate examples for concrete elements, meta items, and such, have a look at :dachsdoc:`elemref.html`.

While the XML elements within an RD could come in any sequence convenient (except that most references need to be lexically after the elements they reference), we recommend to first declare global metadata, then define the tables and the ingestion rules, continue with the service definitions, and conclude with the regression tests. The following exposition follows this recommendation.

Global Metadata
'''''''''''''''

The root element of an RD always is ``resource``, and you should immediately define the database schema anything you import will end up in::

  <resource schema="arihip">


.. sidebar:: Attributes and Elements in DaCHS

  DaCHS almost never distinguishes between attributes and elements with atomic content, which means that you could also have written the opening tag of the root element as::

    <resource>
      <schema>arihip</schema>

  This notational freedom sometimes allows clearer statements, and it helps with defining `active tags`_.

Multiple specifications of the same property make up multiple values where the property is sequence-like (in the `reference documentation`_ this is indicated by phrases like “zero or more” or “list of” in the description). For atomic properties, later specifications overwrite earlier ones.

RDs are normal XML files (meaning that you could, e.g., add an XML declaration if you want an encoding other than utf-8).

DaCHS assumes the schema name to be identical to the name of the **resource directory**, which again must be a child of the inputs directory. If you have a strong reason to deviate from that convention, you must give the subdirectory name in the resource element's ``resdir`` attribute – either way, this is then used to build absolute paths within the RD, e.g., for the sources element discussed below. From DaCHS 2.6 on, it is recommended to always set ``resdir="."`` explicitly. This way, assuming the RD is a direct child of the resdir (and you should always build your resources this way), the RDs will still work even if the directory is moved. This is an advantage in the context of mirroring resources.

In general, you should have exactly one RD per database schema. This is not enforced, but sharing schema names between RDs will cause some undesirable behaviour. An example is permissions: When importing a table, the schema access rights are adapted. If you have one RD A defining an ADQL-queriable table in schema X and another RD B that has no ADQL-queriable table, importing A will make schema X readable to untrusted queries, whereas importing B will make it unreadable again; this would lead to query failures.

The RD goes on to give metadata applying to everything within the RD; in the arihip RD, the values given there include::

  ARIHIP astrometric catalogue
  2010-11-03T10:13:00
  The catalogue ARIHIP has been constructed by selecting the 'best data'
  for a given star from combinations of HIPPARCOS data with Boss' GC
  and/or the Tycho-2 catalogue as well as the FK6. It provides 'best
  data' for 90 842 stars with a typical mean error of 0.89 mas/year
  (about a factor of 1.3 better than Hipparcos for this sample of stars).
  Wielen, R.; Schwan, H.; Dettbarn, C.; et al
  Catalogs
  Astrometry
  Stars: Proper Motions
  Catalog
  Optical
  0/0-11
  2.721e-19 4.138e-19
  1989-09-01 1993-08-15
  The ARIHIP Catalogue is a suitable combination of the results of the
  HIPPARCOS astrometry satellite with ground-based data. [...]
  2001VeARI..40....1W

This metadata is crucial for later registration of the service, some of it turns up in service responses, and quite a bit is used in making web pages, in the metadata sidebars and elsewhere.

Metadata elements have a ``name`` attribute that gives the “kind” of metadata contained and sometimes also determines a specific type. The names should usually give a good indication of what information is given. We will later revisit individual items and what should and should not be put there.

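To make the syntax concrete, here is a minimal, hypothetical sketch of such a global metadata block – this is not the actual arihip block; the meta names (``title``, ``creationDate``, ``description``, ``creator``, ``subject``, ``_intro``) are standard DaCHS keys, whereas the values are invented::

  <meta name="title">Example astrometric catalogue</meta>
  <meta name="creationDate">2023-01-01T12:00:00</meta>
  <meta name="description">
    A few sentences saying what is in the table and where it came from.
  </meta>
  <meta name="creator">Author, F.; Other, A.N.</meta>
  <meta name="subject">catalogs</meta>
  <meta name="subject">astrometry</meta>
  <meta name="_intro" format="rst">
    Text (here in reStructuredText) typically rendered on the
    service's web pages.
  </meta>
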
If there is more than one meta element for a name (like for subject in the example), the key is **multi-valued**. For quite a few meta keys, DaCHS will at some point raise errors if you have multiple values where you can't have them, as, for instance, with titles, descriptions, or creation dates. Because DaCHS itself tries to treat all metadata as uniformly as possible, the failures will not happen during metadata definition but when generating things like registry records, which sometimes makes it a bit hard to figure out that the error's origin actually lies in the metadata definition. The ``dachs val`` subcommand should catch such problems, though (and if it doesn't, please report that as a bug).

Metadata can come in various **formats** as determined by the ``format`` attribute. If you give nothing there (which makes it ``plain``), DaCHS will apply some whitespace normalisation, and it will interpret empty lines as paragraphs if the target format supports it. With ``format="rst"`` (as used in ``_longdoc`` and ``_intro`` in the example), the content will be interpreted as reStructuredText_, which lets you do rather fancy things when the metadata is being rendered into HTML. Be careful to use consistent indentation in this case. There are also ``raw`` and ``literal`` as formats; don't worry about them now.

Metadata with names starting with underscores – like ``_intro`` in the example – is for DaCHS-internal use, very typically when building informational pages; where there's no underscore, there is generally some external standard, and in all likelihood the information will end up in the VO Registry in some way. Many of those we discuss in :dachsdoc:`data_checklist.html`. This checklist was actually written to hand out to people wanting to publish in the Heidelberg data center, but you are welcome to re-use it.

In the full metadata block of the arihip RD, there are two elements that are not simple meta elements. One is ``coverage``; this gives the extent of the data contained in the service in space, time, and spectrum. You will usually only define this manually if you know things the computer can't easily find out itself. This is typically “data consists of several well-defined campaigns” (so you have clear time intervals) or “data is taken in a few rather small bands” (so you have clear spectral intervals).

The other non-meta element is ``FEED``. In DaCHS, uppercase elements signify `active tags`_, elements that are replaced by something else (or nothing) in the document tree. ``FEED``, as used here, simply inserts a stream (essentially, a sequence of elements) defined elsewhere and puts it in. The reference manual has a list of :dachsref:`predefined streams` with an indication of what is inserted. Here, this is used to insert a few more formalised meta items declaring the licence under which your data can be used and distributed. Please be explicit on your licences; see `Licensing`_ below for why we ask that (and why not).

The last major concept you should know in connection with DaCHS metadata is **metadata inheritance**. For instance, if you define a service within the RD, and there is no title meta within the service, DaCHS will use the title meta on the RD. The root of the metadata inheritance hierarchy is the file ``etc/defaultmeta.txt``. If you defined a meta item ``title`` there, this would be the default title for everything in your data centre (which is something you probably don't want). This default metadata, however, is handy for items that plausibly are essentially constant across a data centre, like a publisher or a contact. By the inheritance rules you can still override it in individual RDs or services if necessary.

Defining Tables
'''''''''''''''

A major part of the metadata DaCHS deals with is the table structure.
It is defined in table elements, which usually are direct children of the resource element. A resource element may contain multiple table definitions. Skip over the ``macDef`` elements for now to where the ``table`` element begins. What you see is something like::

  <table id="main" onDisk="True" adql="True" primary="hipno"
      mixin="//scs#pgs-pos-index">

The ``id`` attribute of the table doubles as the name of the database table; make sure you use something that works as a valid simple SQL identifier (i.e., ``[A-Za-z_][A-Za-z0-9_]*``) – DaCHS does not support delimited identifiers as table names (but over standard SQL it will let you have leading underscores). Be sure to always specify ``onDisk="True"`` unless you're going for special effects – without it, the table will end up only in memory and be gone after the import. The ``adql`` attribute says that TAP queries should be allowed on the table; leave it out for tables not suitable for “raw” consumption by your clients. The ``mixin`` attribute is one of the deeper concepts in DaCHS. We defer a closer look at that to the `Mixin Introduction`_.

Finally, using the ``primary`` attribute you can specify an explicit **primary key** of the table (if it is made up of several columns, concatenate their names with commas). This is made into a primary key for postgres straightforwardly, which means that the database makes sure there are no two rows with the same value for the primary key. Also, the database creates an index for efficient queries using the primary key.

While this table doesn't have metadata of its own (because this is a one-table resource, and the global resource metadata is also that of the table), table elements may override RD metadata, and in multi-table RDs it's usually a good idea to override title and description in the table element itself.

What follows is a definition of the structure of the space-time coordinates::

  Position ICRS Epoch J2000.0 "raj2000" "dej2000"
    Error "err_ra" "err_de"
    Velocity "pmra" "pmde" Error "err_pmra" "err_pmde"
  Position ICRS Epoch J2000.0 "raHIP" "deHIP"
    Error "err_raHIP" "err_deHIP"
    Velocity "pmraHIP" "pmdeHIP" Error "err_pmraHIP" "err_pmdeHIP"
  Position ICRS Epoch J2000.0 "raSTP" "deSTP"
    Error "err_raSTP" "err_deSTP"
    Velocity "pmraSTP" "pmdeSTP" Error "err_pmraSTP" "err_pmdeSTP"
  Position ICRS Epoch J2000.0 "raLTP" "deLTP"
    Error "err_raLTP" "err_deLTP"
    Velocity "pmraLTP" "pmdeLTP" Error "err_pmraLTP" "err_pmdeLTP"

What this says is that there's a number of coordinate structures in the table, grouping together positions, errors, and proper motions in a specific reference frame. See `Space-Time Metadata`_ for more information.

Column definitions
''''''''''''''''''

The RD continues with the definitions of the columns that make up the table. For every column in a table, there is one column element with a host of attributes. The ``name`` attribute is central in that it will be the column name in the database, the key for the column's value in record dictionaries that the software uses internally, and it is usually used to reference the column from the outside (see `Referencing in DaCHS`_). Column names must be legal identifiers for both python and SQL in DaCHS. SQL delimited identifiers thus are not allowed (this is not the whole truth, but it's true enough, and you're saving yourself a lot of headache if you simply believe it).

The ``type`` attribute defaults to ``real``, and can otherwise take values in valid SQL datatypes. Among the more common types are:

* ``text`` – a string. You could also use types with explicit length like char(7), but this is discouraged.
  It does not help postgres (or much anything else within the DC). The only advantage would be if you later want to handle FITS Binary serialisations, because when all strings and arrays in a record have fixed lengths, libraries can directly seek to records. Also note that text-valued columns can only keep ASCII. Use the ``unicode`` type if you need non-ASCII.
* ``real`` – a floating point number, pretty much like ``float`` in C.
* ``double precision`` – a floating point number. You should use such doubles if you need to keep more than about 7 digits of mantissa.
* ``integer`` – typically a 32-bit integer
* ``bigint`` – typically a 64-bit integer
* ``smallint`` – typically a 16-bit integer
* ``timestamp`` – a combination of date and time. While postgres can process a very large range of dates, the DC stores timestamps in datetime.datetime objects, which means that for “astronomical” times (like 10000 B.C. or 10000 A.D.) you may need to use custom representations. Also, the DC assumes all times to be without time zones. Further time metadata (like distinguishing TT from UT) is given through STC specifications. See `Some Words on Times`_ below for why you probably do not want timestamps.
* ``date`` – a date. See timestamp.
* ``time`` – a time. See timestamp.
* ``spoint``, ``scircle``, ``sbox``, ``spoly`` – objects of spherical geometry, taken from pgSphere. See `Creating pgSphere Geometries`_ below on how to ingest them.
* ``jsonb``, ``json`` – structured data in a column. If you think this solves one of your problems, you are most probably wrong (in the long run). Still, if you want to use these, feed them with lists, dicts, or JSON literals in strings.

Some more types (like ``raw`` and ``file``) are available to tables in service definitions, but they should, in general, not appear in database tables.

Further metadata on columns includes:

* ``unit`` – the unit the column values are in. The syntax is that of VOUnits_. Unit is left out for unitless values. Because that is hard to define rigorously, DaCHS does not distinguish between dimensionless quantities (as, for instance, width/height, which in principle has a unit that just happens to come out as 1), things that can't have units to begin with (e.g., an observer name), and things that should have units but are written in a way that they can't (e.g., sexagesimal coordinates or ISO datetime strings). In case you must have units not in VOUnits (e.g., mCrab or jupiterMass), enclose them in single quotes as foreseen by VOUnits. DaCHS will then not complain about them.
* ``tablehead`` – a very short string designating the content. This string is typically used for display purposes, e.g., as table headings or labels on input fields, and defaults to the capitalized column name.
* ``description`` – a longer string characterizing the content. This may end up in help bubbles or VOTable descriptions. Since these could be longer, you may want to put them in a child element rather than an attribute. In both cases, whitespace is normalized, so you can enter line breaks and similar for readability in the source, and they will always be rendered as a single blank. For even longer, note-like material, see `Table Notes`_. An example for a description::

    <description>
      The aperture is the full-width-half-max of the response function
      of our sage 3000 hyper-detector.
    </description>

* ``ucd`` – a Unified Content Descriptor as defined by IVOA. To figure out “good” UCDs, the `GAVO UCD resolver`_ can help.
  An easy way to come up with them is also to leave them out initially and then run ``dachs admin suggestucds`` (it's what we do these days).
* ``required`` – True if a value must be set in order for the record to be valid. By default, NULL (which in python is None) is a valid value for any column and will silently be inserted if you don't assign a value in the rowmaker (see below). For required columns, an error will be raised when a value is missing. In HTTP service interfaces, missing required parameters will lead to a 400 invalid parameters HTTP response.
* ``verbLevel`` – A measure for the “importance” of the column. Various protocols have the notion of “verbosity”, where higher verbosity means you get to see more columns with more esoteric content. Within DaCHS, verbLevel is a number between (usefully) 1 and 30, with columns with verbLevel 1 always given and those with verbLevel 30 only given if someone really wants to see all columns. Technically, in SCS, a column is part of the output table if its verbLevel is smaller than or equal to ten times the query's VERB parameter.

Column elements may have a child element `values <./ref.html#element-values>`_. This lets you specify metadata like maximum or minimum, or enumerate possible values. The most common use is the definition of null literals, though. This is not necessary for floats, and usually not even for strings, because these have useful (and actually non-overridable) null values in the VOTable representation (where this sort of thing counts most). It is, however, highly recommended to give null literals when defining integral types (including chars) in columns that may have NULLs. DaCHS will try to pick useful null values for those automatically when possible, but when streaming tables, this is impossible, and errors will be raised during VOTable rendering when NULLs are encountered in such a situation. So, just define null values whenever you define a non-required integral column, like this::

  <values nullLiteral="-1"/>

The output of ``dachs info`` can help you to choose suitable NULL values. To help people spot them when metadata is missing, it's usually wise to choose “conspicuous” null values (like -1, 9999, or similar).

A child element of ``table`` we have not mentioned above is the index element. In its simplest, but mostly sufficient, form, it just tells DaCHS to index a column, where the ``columns`` attribute lists the names of the columns to index – and most of the time, that will be just one. More complicated index patterns are possible – see :dachsref:`element index` in the reference. Indices are usually generated after a full import. If you add or change indices later, run ``dachs imp -I``, which will re-make all indices. For large tables, you may want to just create one particular index; in that case, see the ``indexStatements`` subcommand to ``dachs admin``.

Scrolling a bit further down in the arihip RD, you will notice some LOOP constructs. These are discussed below under `active tags`_. After the column definitions, there are meta elements called “note”; for those, see `Table Notes`_.

Parsing Input Data
''''''''''''''''''

Going further down in the RD, you will find a data element. Its main purpose is to describe how certain input files fill the table(s) defined above. Near its start, it says::

  <sources>data/data.txt.gz</sources>

Data elements must have ids which can be used to individually reference them from a ``dachs imp`` command line; this is useful if you just want to import one part of a multi-table data collection.

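Before looking at the individual parts, it may help to see the overall shape of a ``data`` element. The following is only a schematic sketch, not the arihip element: the element and attribute names (``sources``, ``columnGrammar``, ``make``, ``rowmaker``) are real DaCHS constructs, but the id, file name, and keys are invented. The pieces are explained in the following subsections::

  <data id="import">
    <sources>data/data.txt.gz</sources>

    <columnGrammar>
      <!-- turns each input line into a rawdict -->
      <colDefs>
        someId:  1-8
        someMag: 10-15
      </colDefs>
    </columnGrammar>

    <make table="main">
      <!-- maps rawdicts to typed rowdicts for the table "main" -->
      <rowmaker idmaps="*"/>
    </make>
  </data>
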
By default, ``dachs imp`` builds all ``data`` elements except those having an ``auto="False"`` attribute. It is recommended that the id is a short verb phrase, as ``data`` basically contains instructions for an action. You might rightly argue that we have not chosen the element name too aptly (``recipe`` might have been more appropriate), but we feel it's too late to change it now.

Source Specifications
.....................

The :dachsref:`Element sources` lets you specify the names of the input files to be processed. There are several ways to do that; in this case, there's just one input file, which is given as element content, with the path interpreted relative to the resource directory. If the data was distributed into several files in two directories, something like the following specification would do the trick::

  <sources>
    <pattern>inp2/*.txt</pattern>
    <pattern>inp1/*.txt</pattern>
  </sources>

The ``sources`` element also has a ``recurse`` (boolean) attribute that, if true, makes DaCHS search for the pattern in the subdirectories of the path part of the pattern.

Grammars
........

In the RD, it next says::

  hipno:    3-8
  srcSel:   47-49
  alphaHMS: 59-73
  [...]

This is the definition of a **grammar**, which in DaCHS is something that turns some (specific) sort of input into a sequence of dictionaries. Stuff coming out of a grammar we sometimes call **rawdicts**, with a bit of a pun, because after some processing in the rowmaker (discussed below) they will turn into **rowdicts**. While a rawdict almost always (exceptions are for higher-level source formats like FITS) maps strings to strings, the rowdicts map strings (column names) to something directly ingestable into the database (e.g., integers, or perhaps polygons).

There are many grammars built into DaCHS, e.g., for getting values from FITS headers or tables, from VOTables, via regular expressions, or using column-based formats; you can also write specialised grammars in python, and there are direct grammars that you will want to learn about if you have data larger than, say, 10e7 rows (see `Grammars in C`_ for these).

It is often useful to inspect what a grammar emits. You can do that using import's ``--dumpRows`` flag, usually telling DaCHS to stop before you drown in screen output, perhaps like this::

  dachs imp -M 100 --dumpRows q.rd | less

The source file of arihip has both a separator character and aligned columns, so we could use either :dachsref:`Element columnGrammar` (which, as shown above, uses ranges of character positions within a line, as counted by most editors, to define what will be in the rawdicts' values) or :dachsref:`Element reGrammar` (which we could have used to split along vertical bars). The choice was mainly a matter of taste, but experience also shows that data providers are not always careful to somehow escape their separator characters when they appear in values, so using a column grammar is perhaps the somewhat more defensive choice. By the way, note that in ranges, the last column is included in the string – these are not python slices but basically a representation of the character ranges in VizieR-style “byte-by-byte” descriptions.

Grammars also have various attributes; the ones parsing from text files support, for example, ``topIgnoredLines``, which allows you to skip header lines, and ``preFilter``, which lets you run the input through a shell command before it is processed by DaCHS (if you find yourself doing a lot more than just decompression in such a preFilter, you should probably look for a different solution).

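Putting these pieces together, a column grammar using such attributes might look roughly like this; this is a hand-made sketch rather than the arihip grammar (``topIgnoredLines``, ``preFilter``, and ``colDefs`` are real DaCHS constructs, while the values and keys are invented)::

  <columnGrammar topIgnoredLines="5" preFilter="zcat">
    <colDefs>
      someId:  1-8
      someMag: 10-15
    </colDefs>
  </columnGrammar>

Here, ``preFilter="zcat"`` would decompress a gzipped input on the fly, and the first five lines of the decompressed input would be skipped.
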
Somewhat further on in the grammar, you will find a LOOP element with lots of backslashes. It is not necessary to understand that to write good RDs. But it is handy and helps avoid repetition, which is why we are showing it in the example RD, and we have a chapter devoted to it later in the introduction: `Metaprogramming: Macros and LOOPs`_.

Mapping data
............

The arihip RD then goes on with expressions like::

  hmsToDeg(@alphaHMS, None)
  dmsToDeg(@deltaDMS, None)
  ...
  parseWithNull(@kbin, str, "9")
  parseWithNull(@vrad, lambda a: float(killBlanks(a)), "")

The :dachsref:`Element make` brings together a table (in the ``table`` attribute) with a recipe for how to fill it from the output of the grammar (the **row maker**). As explained above, the output of grammars and hence the input to a make is a sequence of mappings from names to strings (the “rawdicts”). The database, on the other hand, wants typed values, i.e., integers, time stamps, etc. Also, data in input tables is frequently given in inconvenient formats (e.g., sexagesimal angles), deprecated or inconsistent units, or values may be distributed over multiple columns (e.g., date and time of an observation when we want a single timestamp). All that can be fixed in row makers, recipes for producing rowdicts.

Rowmakers are where it's rather common to have embedded python code in RDs; however, in simple cases you can get by with purely declarative XML, which, of course, is preferable, as it is much less likely we will break that while DaCHS, python, and DaCHS' dependencies evolve.

A row maker has three kinds of children:

* :dachsref:`Element var` – assignments of expression values to names in the rawdict.
* :dachsref:`Element map` – simple mappings of (python) expressions to values in the destination rowdict
* procedure applications (:dachsref:`Element apply`) – manipulations of both rawdicts and rowdicts in python code

The fragment above shows one of several ways to use both ``var`` and ``map`` (which work exactly the same way, except that var generates additional values in the rawdict, whereas map directly manipulates the rowdict): with simple mappings between names, using pre-defined procedure applications, or generating values from python expressions. In the latter case, there is the special syntax ``@identifier``, which expands to whatever value the rawdict has for that key (or raises a KeyError if the key is not present in the rawdict).

When building a rowdict for ingestion into the database, a row maker first binds var names, then applies procedures, and finally performs the mappings.

In the bodies of the mappings, you can use all built-in python functions plus a set of useful :dachsref:`functions available for row makers`, as well as everything from the python standard library modules ``datetime``, ``math``, ``os``, ``re``, ``sys``, ``time``, and ``urllib.parse`` (you need to give the module name when referring to names from these modules as in, e.g., ``re.sub``). Furthermore, the gavo modules ``base``, ``stc``, and ``utils`` are in the namespace of the mapping code, as well as the submodule ``utils.pgsphere`` as ``pgsphere``. These latter may not be quite as stable, so it's probably better to ask us to add something to the official rowmaker functions than to pull it from these internal-ish modules. Having said that, you can figure out what is in there in DaCHS' :dachsdoc:`apidoc`.

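To summarise the three kinds of children, here is what a row maker using all of them might look like. This is a generic sketch rather than the arihip row maker: ``var``, ``map``, ``apply``, and the functions ``hmsToDeg`` and ``parseWithNull`` are real DaCHS constructs, while the procedure reference and all names are invented::

  <make table="main">
    <rowmaker idmaps="*">
      <!-- var: compute a new rawdict value from rawdict entries -->
      <var name="raj2000">hmsToDeg(@rawRA, None)</var>
      <!-- apply: run a predefined or custom procedure
           (the procDef reference here is just a placeholder) -->
      <apply procDef="//procs#somePredefinedProc" name="prepare"/>
      <!-- map: put a (typed) value directly into the rowdict -->
      <map dest="vmag">parseWithNull(@rawVmag, float, "99.99")</map>
    </rowmaker>
  </make>
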
For simple cases, maps will suffice; frequently, you can do without python expressions by giving a ``src`` attribute specifying a rawdict key instead of element content (as said above, less code is better in RDs). This will perform some default conversion (e.g., integers will be converted by python's int constructor, where empty strings are mapped to None, datetimes are parsed as ISO strings, etc.).

If you match the keys in the rawdicts with the names of the database columns their content is supposed to end up in, and the content needs no further manipulation, a row maker consisting of nothing but one such simple ``map`` per column would do the trick. Since this is a bit unwieldy, DaCHS provides a shortcut, the rowmaker's ``simplemaps`` attribute, which expands to exactly what is written above: a comma-separated list of colon-separated pairs, where the first item of each pair is the table column name and the second the rawdict key (the two do not need to be identical).

The case where the names of rawdict and rowdict keys are identical is so common (since the RD author often controls both) that there is an even more compact shortcut for this, the ``idmaps`` attribute: it sets up one map element, with both dest and src set to that name, for every name in the comma-separated list given in ``idmaps``. You can abbreviate this further to ``idmaps="*"`` – so, idmaps values can contain shell patterns. They will be matched to the column names in the target table. For every column for which there is no explicit mapping, an identity mapping (with type conversion) will be set up with this specification.

In the authors' experience, building the rowdict in the rawdict (i.e., using ``var``) and then mapping it all with ``idmaps="*"`` has proven to be the preferable approach in many situations. This pattern is used, for instance, in the built-in mapping procedures for SSAP. On the other hand, the built-in mapping procedures for SIAP were written before we had this insight, and thus they are, regrettably, incompatible with this approach.

Of course, you can have values that do not even depend on grammar output, for instance::

  datetime.datetime.now()

**Null values** are always troublesome. Within DaCHS, the null value (almost) always is python's ``None``. There is the row maker function ``parseWithNull`` to help you come up with those; if your upstream was devious enough to use 99.99 as a null value for a magnitude, you could say::

  parseWithNull(@VmagSrc, float, "99.99")

Note that the null value here is a literal matched against the string coming from the grammar, i.e., before conversion to the target type. This is good because otherwise you could only safely compare against relatively few floating point numbers (99.99 is not among them). In cases where you can compare against the converted value, it is preferable to use the ``nullExpr`` attribute on vars and maps.

A complementary way to produce NULLs (in particular when ``parseWithNull`` is used in expressions) is the ``nullExcs`` attribute, which is just a comma separated list of exceptions that should be caught and interpreted as “this is null”. If, in the example above, the source gave the magnitude in millimags to save a decimal point, you could use::

  parseWithNull(@VmagSrc, float, "99999")/1000.

If ``parseWithNull`` here returns None, a ``TypeError`` will be raised and caught (provided it is listed in ``nullExcs``), and ``Vmag`` will be None. You can turn more than one exception into None.
For example, if ``magicOffset`` has been parsed before and could be None, while ``magicLit`` is to be parsed and has the empty string as a NULL literal, you could write::

  @magicOffset+float(@magicLit)

If ``magicOffset`` is None, ``magic`` will be None via the ``TypeError``, whereas an empty ``magicLit`` will result in None via a ``ValueError``.

Rowmaker procedures are in :dachsref:`Element apply`. There are a number of predefined ones (see :dachsref:`Procedures available for rowmaker/parmaker apply` in the reference documentation). Quite commonly, mixins for tables underlying VO standard protocols come with one or more applys to populate the tables in a controlled fashion. More on these when we discuss the protocols served by these mixins. You can also define your own applys, but that's for higher-level DaCHS magicians and won't be discussed here.

Service Definitions
'''''''''''''''''''

The last but one part of the RD deals with how to get the data out of the database again, i.e., the services exposing the data. This part is fairly simple for arihip; apart from the element structure itself, about all it contains are these values::

  arihip cone
  9.4076 9.6414 1.0

To understand what's going on here, some basic understanding of DaCHS' service architecture is required; it consists of:

* cores; they do the actual computation or database query.
* renderers; these digest the bits coming in from the client and (in general) format the result in some way requested by the client again. There are renderers for web forms and several VO protocols, ones producing HTML documentation for RDs, and so on. In the ideal case – as in the example – you can use the same core for both a VO protocol and a form-based service by just allowing different renderers.
* services; they hold together the core and the renderer, can reformat core results, and they are what is registered, which means that they hold the metadata that ends up in the registry (most of it is typically inherited from their RDs).

The renderers are referenced by name in the service's ``allowed`` attribute. What can be given there (concatenated by commas) is listed in the reference documentation's section on :dachsref:`Renderers Available`. Which renderer is used when a service is run is selected by the client: it's (usually) the last segment of the path part of the access URL. If a client tries to retrieve a URL with a renderer that is not in the service's ``allowed`` list, DaCHS will respond with a 403 forbidden HTTP code (excepting certain “unchecked” renderers like ``info`` that typically expose service metadata). Note that some of the more exotic renderers may pose special requirements on cores. Don't be too disappointed if a given renderer won't work with your core.

The most common core for catalogue services (and the one you'll typically use for SCS services) is the :dachsref:`Element scsCore`, as used here; you might also use a generic :dachsref:`Element dbCore`. Other cores you should know about include the :dachsref:`Element nullCore` (for services that don't do any server-side computation at all), :dachsref:`Element datalinkCore` (which can define all sorts of manipulations on and linking of datasets), the python and custom cores (cf. :dachsref:`writing-custom-cores`), and several protocol-specific cores discussed below.

The dbCore generates a (single-table) query from condition descriptors and returns a table that you describe through an output table.

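To see how these parts fit together, here is a schematic service definition in the spirit of – but not identical to – the arihip one. The element and attribute names (``service``, ``allowed``, ``scsCore``, ``queriedTable``, ``condDesc``, the ``//scs#coreDescs`` stream, and the ``shortName`` and ``testQuery.*`` metas) are real DaCHS constructs; the concrete values are only examples::

  <service id="cone" allowed="form,scs.xml">
    <meta name="shortName">example cone</meta>
    <meta name="testQuery.ra">9.4076</meta>
    <meta name="testQuery.dec">9.6414</meta>
    <meta name="testQuery.sr">1.0</meta>
    <scsCore queriedTable="main">
      <FEED source="//scs#coreDescs"/>
      <condDesc buildFrom="mv"/>
    </scsCore>
  </service>
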
Cores can be defined as direct children of the service element; as with grammars and rowmakers, you can also have them as direct children of ``resource`` and then write ``core="id-of-element"`` in the service, which makes sense when a single core is shared by several services. dbCores need a ``queriedTable`` attribute, the value of which must be the id of a ``table`` element. This is the table the query will run against.

The condition descriptors within dbCores (defined in :dachsref:`Element condDesc`) define, in essence, input fields: in the form renderer, these will be rendered as form items people can fill in. Most commonly, you will either define them using the ``original`` attribute (when inheriting from predefined condDescs) or using ``buildFrom``. The first case is typically used in connection with protocols and on tables having mixins; such condDescs result in zero or more input fields, and they typically inspect the queried table. For example, the ``//scs#humanScs`` condDesc (which is part of the :dachsref:`//scs#coreDescs` stream) locates the “main” positions as identified by UCDs and generates queries against them using two input fields. One somehow specifies the center position of the region of interest, the other gives the search radius.

When you define a ``condDesc`` using ``buildFrom``, the result is usually one or more input field(s) constraining values in the column named in the ``buildFrom`` attribute. The software tries to make some useful input definition from that column, depending on the current renderer's parameter style (which is given in :dachsref:`renderers available`). For instance, if you build a condDesc from a real- or double precision-typed column, and the service is accessed with a form-styled renderer, users can constrain the column's values with VizieR-like float expressions, where, for instance, an interval would be written as ``range_min .. range_max``. When the request comes in through renderers with a parameter style of ``pql`` (e.g., SSAP), the condDesc would expect SSAP-style intervals (e.g., ``range_min/range_max``). In the parameter style ``dali``, that interval would be written ``range_min range_max``. Yes, it is deplorable that something like the parameter style exists, but that particular ugliness is not DaCHS' fault; in practice, it's a lot less horrifying than it sounds, because clients very typically just use one renderer, and they hopefully agree with it on the parameter syntax.

The ``service`` element must have an ``id`` attribute that is used to select the service to run; it appears in the access URL's last but one segment. Furthermore, there should be certain pieces of metadata useful for the Registry. First, there's ``shortName``, which is typically used by clients in space-restricted displays. It must not be longer than 16 characters, so something like an acronym and a very terse role identifier is the best you can do (hence the “arihip cone” here). Frequently, a ``title`` meta is also useful, in particular when an RD contains multiple services.

Many standard VO protocols require additional, protocol-specific metadata. In the case of arihip, we have a Simple Cone Search service, which, as laid down in the reference for :dachsref:`the scs.xml renderer`, requires the parameters of a test query returning a non-empty result.

Regression Tests
''''''''''''''''

While it's admittedly boring, writing regression tests while developing your RD will save you a lot of fear, uncertainty, and doubt later.

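To give you an idea of the shape of such a test, here is a minimal, hypothetical sketch. The element names (``regSuite``, ``regTest``, ``url``, ``code``) and the assertion helpers are what DaCHS provides; the title, parameters, and assertions here are invented, and the reference's :dachsref:`regression testing` section has the authoritative syntax::

  <regSuite title="arihip tests">
    <regTest title="SCS returns something sensible">
      <url RA="9.4076" DEC="9.6414" SR="0.1">cone/scs.xml</url>
      <code>
        row = self.getFirstVOTableRow()
        self.assertTrue(row["hipno"] is not None)
      </code>
    </regTest>
  </regSuite>
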
In the arihip RD, there are two regression tests; in essence, they query these URLs and run these checks::

  cone/form
    self.assertHasStrings("PM RA (STP)", "1.0890086361", "-5.00",
      "Note bin")

  cone/scs.xml
    row = self.getFirstVOTableRow()
    self.assertEqual(row["flags"], 's 00 .. .... ... ...')
    self.assertAlmostEqual(row["err_parallaxSI"], 5.25e-07)
    self.assertEqual(row["hipno"], '8')

As you can see, the tests are grouped into suites, where for most RDs you will just have one suite. Each test has a title, which is used in diagnostics. The body of a test is a pair of an :dachsref:`element url` – which, really, is just a convenient way to write a URL with potentially complex parameters – and of an :dachsref:`element code`, which mostly gives a bunch of assertions about what a running server should be returning for the query encoded by ``url``. We have tried to write the reference documentation's section on :dachsref:`regression testing` in a tutorial-like fashion and therefore point you there for further details.

Referencing in DaCHS
--------------------

You will often have to reference items within DaCHS RDs, for instance in command line arguments, in ``original`` attributes, when pulling in STREAMs, or when deriving from procedures. All these follow (essentially) the same logic:

(a) An RD has an identifier that is the path to the RD file relative to inputsDir (corollary: DaCHS doesn't let you have active RDs outside of inputsDir), with the ``.rd`` extension cut off.

(b) When referencing an RD from the command line, you can normally use relative references. For instance, if you are in ``inputs/myres``, saying ``dachs imp q`` and ``dachs imp myres/q`` is the same thing (exceptions apply).

(c) When referencing elements within RDs, use an XML-like fragment identifier. For instance, ``original="main"`` would use the element with the id ``main`` in the current RD.

(d) With tables or other items that have children with names, you can reference them by their names after a dot. For instance, ``myres/q#main.ra`` references the ``ra`` column in the ``main`` table in ``myres/q``. If referencing within an RD, you can leave the RD id out; in the example, within ``myres/q`` you can write ``main.ra`` as well (and you can abbreviate that even more when the parent has a ``namePath`` attribute; this is what's behind the compact ``buildFrom`` notations in output tables).

(e) If an RD id starts with a double slash (``//``), it refers to a built-in RD; ``//tap`` is the built-in TAP RD. To inspect these built-in RDs, use a command like ``dachs adm dumpDF //tap``. For instance, the TAP service in this way has the id ``//tap#run``.

Mixin Introduction
------------------

We have so far deferred the discussion of the ``mixin`` attribute in arihip's opening table element::

  <table id="main" onDisk="True" adql="True" primary="hipno"
      mixin="//scs#pgs-pos-index">

**Mixins** are DaCHS' primary tool to endow tables with “everything needed to serve a standard” (e.g., a minimal set of columns, certain indices, or metadata). For instance, an image table must have a certain structure determined by the SIA protocol. The ``//siap#pgs`` mixin makes sure that tables have this structure, and it makes sure that `The Products Table`_ is updated when the table is filled.

The content of the mixin element (or the attribute value when you give the mixin property as an attribute) is a reference to a mixin definition. If you have read `Referencing in DaCHS`_, you will see that the mixin here sits in a built-in RD (because it starts with a double slash). That is where you will usually get them from, and what you can choose from you can figure out from the reference documentation's :dachsref:`mixins` section.

For the curious: If you want to see the definition of a mixin (it's in an RD, after all), use admin's ``dumpDF`` subcommand like this::

  dachs admin dumpDF //scs

:dachsref:`The //scs#pgs-pos-index mixin` referenced in arihip/q arranges for spatial indexing of tables having some sort of spherical coordinates. To identify which columns to index, this particular mixin inspects the UCDs of the columns, which happens to almost be enough to make the table suitable for publication through the IVOA's simple cone search protocol (SCS). What it looks for here are columns with UCDs of ``pos.eq.(ra|dec);meta.main``, and it will index them. Contrary to mixins for other standard protocols, it does not automatically insert these columns (nor the only other column required by SCS, the main row identifier with the UCD ``meta.id;meta.main``).

Metaprogramming: Macros and LOOPs
---------------------------------

In the paragraph on `Grammars`_, we skipped over something that admittedly is a bit cryptic (you can skip this subsection if you get scared; you can use all of DaCHS without knowing about it): column ranges that are not given literally but through macros like::

  \racols
  [...]
  \errracols
  [...]

As with the ``colDefs`` discussed above, this part still assigns column ranges in the input file to keys in the rawdict. Column grammars accept these in the relatively compact form of an :dachsref:`Element colDefs` as seen above. Alternatively, you can have several instances of :dachsref:`Element col`, each of which has a ``key`` attribute that gives the key in the rawdict, and the range of columns in the body. These are useful here because they let us use another piece of RD metaprogramming: the active :dachsref:`Element LOOP`. Essentially, this lets you define some sort of fillers, and for each set of fillers, whatever is in the LOOP's ``events`` child will end up in the document tree once, with the appropriate fillers filled in.

In this particular case, there were three modes of data reduction in arihip, and these modes turn up in several places in the RD; you may have wondered what the similar thing in the column definitions was. To make this enumeration re-usable, I've defined it in a macro near the top of the RD::

  mname, racols, decols, pmracols, pmdecols, tracols, errracol...
  LTP, 224-233, 276-285, 328-335, 383-390, 438-444, 478-483 ...
  STP, 237-246, 289-298, 339-346, 394-401, 448-454, 487-492 ...
  HIP, 250-259, 302-311, 350-357, 405-412, 458-464, 496-501 ...

These macros can now be used in many places, in particular in LOOP's ``csvItems`` attribute, where LOOP will make one set of fillers from each CSV line. And these fillers then become macros in the LOOP's event body.
In that way, the example above spits out something like::

  224-233
  237-246
  250-259

from the first ``col`` in ``events`` and the first and second column in the CSV. The second element I show in the example above has the somewhat weird ``\mname\+_mas`` macro call; the ``\+`` here just makes up for the lack of whitespace after ``\mname``. If it read ``\mname_mas``, DaCHS would look for a filler called ``mname_mas``, which doesn't exist. With that cleared up, the expansions of that second ``col`` example would be::

  478-483
  487-492
  496-501

The good news: This is essentially how far RD wizardry goes (well, you could argue multiple expansion is worse, but I'd dispute that). Once you write your first RD with LOOPs and macros, you have ascended into the ranks of DaCHS grand mastery. Most operators lead happy lives without such aspirations. The others can go on to a gentler introduction to `active tags`_ below.

Quick start with DaCHS
======================

This brief section will guide you through the process of importing data into DaCHS, running an associated service, and publishing it, using the arihip RD discussed in `The Resource Descriptor`_ above. It is probably a good idea to have skimmed the `DaCHS Basics`_ before experimenting here.

Preparing the Resource Directory
--------------------------------

In general, when you start publishing data in DaCHS, you will do something like::

  cd /var/gavo/inputs
  mkdir <resdir-name>
  cd <resdir-name>
  mkdir data
  (put your upstream data into the data subdirectory)
  dachs start <template-name>

For the sake of getting something running quickly, we will skip the ``dachs start`` (which necessarily is followed by a lot of work on mapping the data) and just use an existing RD.

You can run the following under any user id, as long as you wisely manage the permissions (short version: ``sudo adduser `id -nu` gavo`` to add you to the gavo group, then log out and in again to make it effective). For testing, however, we recommend doing it as DaCHS' administrative account (for the Debian package, that's ``dachsroot``; if you did the setup manually, it's whatever user did the ``dachs init``). If you want to stay yourself, see `becoming a DaCHS operator`_.

To get a quick start, just pull in the RD in use by GAVO's data centre::

  sudo -u dachsroot bash
  cd /var/gavo/inputs
  mkdir arihip
  cd arihip
  curl -O http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/arihip/q.rd

The directory name (“arihip” in this case) will (normally) appear in URLs, so it's a good idea to choose something descriptive, short, and without characters that need URL escaping. The directory you have just created is the resource directory introduced in the overview.

The resource descriptor name also appears in URLs. At GAVO's data center, we usually call it ``q.rd`` as that looks nicely query-ish (to our tastes).

Next, get the raw data. In particular because the ``sources`` element discussed above assumes as much, the raw data should reside in a subdirectory of the resource directory, and unless you have a good reason to break with that convention (e.g., separating several data releases would be such a good reason), just call it ``data``. At the GAVO DC we usually keep everything in the resource directory under version control except for that data directory (which tends to be large, full of binary files, and either versioned by upstream or not at all)::

  mkdir data
  cd data
  curl -O http://dc.g-vo.org/arihip/q/cone/static/data.txt.gz

Starting from Scratch
---------------------

(You can skip this when you are just following the quick start).

When you do not have the luxury of having a ready-made RD (i.e., almost always), use ``dachs start``. Again, begin by creating your resource directory, change into it, and build a q.rd template by running ``dachs start <template-name>``. For instance, when you have a catalogue called smartdata, you would write::

  cd /var/gavo/inputs
  mkdir smartdata
  cd smartdata
  dachs start scs

Use ``dachs start list`` to see what templates there are (scs, siap, and ssap are the most likely candidates for VO beginners). When your data does not fit anything offered by ``dachs start``, you probably want to send mail to `dachs-support`_ describing what you are trying to do; perhaps we want to extend ``dachs start``. Also, have a look at the `GAVO data centre's service roster`_ to see if there's anything similar there, and then see if starting from the corresponding RD in our `RD repository`_ looks like it will save you work.

Either way, you will end up with an RD in your resource directory, and that would be the perfect time to check it into your version control system. Don't go on without one.

After bringing in your data, the actual work begins: Filling in the global metadata, the table structure, the ingestion rules, service metadata, and regression tests. Where possible, the templates are written such that when you have filled out everything ``%between percent signs%``, you should have a very basic (possibly useless) service. See also the section in `Publishing Data Using VO Protocols`_ corresponding to your data type (if it exists) for more material that may help you figure out what is going on in the template and what additional declarations you may want to put in.

The envisioned workflow is that you go through the file percent pattern by percent pattern, ideally with support from your editor. As an example, if you're using the vim editor, you can drop the following into your ~/.vimrc::

  augroup rd
    au!
    au BufRead,BufNewFile *.rd imap <F8> <Esc>/%[^%]*%<CR>a
    au BufRead,BufNewFile *.rd imap <F9> <Esc>cf%
  augroup END

Then, when you open an RD file, you can, in insert mode, alternately hit F8 to go to the next thing to fill in, and F9 to replace the template text (that usually gives you a hint on what to put there – but also read the comments).

Another hint on working with the templates: Since the patterns are not valid values for what they stand in for in some cases, you may want to comment out what you haven't yet filled out (e.g., while doing a test import). Since there are comments in the material, that's not straightforward, but you can fairly simply deactivate a large part of an RD by wrapping it in a ``macDef`` element, whose content DaCHS stores as a macro rather than interpreting it. Of course, this needs to respect XML and RD rules, so the resulting ``macDef`` element must be a direct child of ``resource`` (which mostly is natural).

If you have skimmed the `DaCHS Basics`_ above, not much of what you'll be seeing should surprise you. If you're wondering about a particular thing, you can always refer to the :dachsdoc:`elemref.html` to see how it's used in our data centre, or go to the `reference documentation`_. Also, please give us hints if you see a way to make the material more helpful.

If your input data already comes with metadata (FITS binary table, VOTable, or perhaps you have a VizieR-style byte-by-byte description), you may want to look into using ``dachs gencol``; this emits column definitions that can be pasted into a table element (in the case of VizieR, also material for :dachsref:`Element columnGrammar`).

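For orientation, column definitions as you would paste them into a table element look roughly like this – a hand-written sketch with invented names and values, not actual ``gencol`` output::

  <column name="raj2000" type="double precision"
    unit="deg" ucd="pos.eq.ra;meta.main"
    tablehead="RA" verbLevel="1"
    description="Right ascension (ICRS) at epoch J2000.0"/>
  <column name="obsCount" type="integer"
    tablehead="#Obs" verbLevel="20"
    description="Number of observations this value is based on">
    <values nullLiteral="-1"/>
  </column>
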
Don't be content with what the machine gives you: in all cases we have seen so far, the metadata extracted from data was sub-standard. Also note that the the gencol output for FITS binary tables assumes you use fitsTableGrammar with ``lowerKeys="True"``. .. _GAVO data centre's service roster: http://dc.g-vo.org .. _RD repository: http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/ Running the Ingestion, Getting Stats ------------------------------------ Back with arihip, we're ready for ingestion, which is covered by the ``import`` subcommand (cf. `Invoking DaCHS`_). So, return to the resource directory and run the default import:: cd .. dachs imp q This should run for a while, reporting the number of ingested rows now and then, and finally say something like “Rows affected: XY”. With this, the data is in the database and is ready for querying. After you have imported a table, it is a good idea to run ``dachs info`` with the DaCHS identifier of the freshly imported table, e.g.,:: dachs info arihip/q#main If this command's argument looks confusing to you, please review `Referencing in DaCHS`_. This will output several properties (right now, minimum, maximum, average) of numeric columns, which may help spot problems in the data or the row maker (or both). Also note that for each column, the presence of NULLs is given. When you import data, it is a good idea to check whether these correspond to your expectations, and to consider declaring columns as required when they do not indeed contain NULLs. Once everything looks right, tell DaCHS to inspect your table and collect column statistics, space-time coverage, and similar metadata on it\ [#belim]_:: dachs limits q Note: if installing from Debian 11 Bullseye, the current dachs version is 2.3 and ``dachs limits q`` will not work. See the error ``module 'gavo.utils' has no attribute 'pyparsingWhitechars'`` here https://docs.g-vo.org/DaCHS/commonproblems.html for a solution. For large tables, this can run for a long time; to make good for it, DaCHS samples increasingly less of the tables as they get larger (which might still take hours). To override DaCHS' wisdom here, both ``limits`` and ``info`` take a ``--sample-percent`` option, a number between 0 and 100. DaCHS will only do statistics on views if you pass the table id to limits or give the view's descriptor a ``forceStats`` property. In some cases, you may want to skip statistics gathering on certain columns. If so, you can set the relevant columns' ``statistics`` property to ``no``. To inspect the limits DaCHS has figured out, say:: dachs limits -d q Trying the services ------------------- Now start the server; if you installed from the Debian package, it is already running; stop it first for this tutorial:: sudo service dachs stop # only if installed from package dachs serve debug If the last command fails with permission problems, add yourself to the gavo group, say ``newgrp gavo`` (or log out and in again) and try again [#port]_. The RD sets up a form-based service you can operate from a web browser; open the URL http://localhost:8080/arihip/q/cone/form [#binding]_ and play around a bit. Note the small links behind some query fields – DaCHS supports VizieR-like expressions in those fields. Briefly have a look at the URL. A promised in `Service Definitions`_, apart from the host name and port (see the `Configuring DaCHS`_ on how to change those), there is the path to the RD (without the file extension), then the id of the service element, and finally the renderer name. 
In this case, that's ``form`` for an HTML form. This systematics makes for longish URLs, which is inconvenient when you would like to include them on paper publications. To get around that, DaCHS supports **vanity names**, short aliases that let you absorb all these extra segments. In the present case, you could, for instance, add:: arihip/q/cone/form ARIHIP to ``/var/gavo/etc/vanitynames.txt`` and then reach your form as http://localhost:8080/ARIHIP. For details, see :dachsref:`The Vanity Map` in the reference documentation. By the way, a convenient backdoor to find such links without looking up the individual parts is the RD browser. To enter that, just append your RD id to the ``browse`` child of your server; in this case, that would be http://localhost:8080/browse/arihip/q. This will give you a convenient overview over the tables and services defined by the RD. Another renderer supported by this service is ``scs.xml``, which implements the IVOA Simple Cone Search (SCS) protocol. A client speaking SCS is TOPCAT_; to try it out, in TOPCAT select VO/Cone Search and fill out the Cone URL field in the lower part of the window to be ``http://localhost:8080/arihip/q/cone/scs.xml``. Enter some object name and a sufficiently large search radius (e.g., Aldebaran and 2 degrees), and you'll see the results coming in. In case you're missing the mv parameter you had in the form interface: cone search does not (yet) have a usable interface to discover custom parameters, and hence TOPCAT restricts you to those mandatory for every SCS service. DaCHS supports a special syntax for such “free” parameters of cone searches (see the discussion of parameter styles in `Service Definitions`_; SCS is pql in DaCHS). To say ”everything brighter than 6th magnitude”, the parameter setting would be ``-Inf 6``; to use this constraint within TOPCAT, the access URL needs to be amended like this: ``http://localhost:8080/arihip/q/cone/scs.xml?mv=-Inf%206``. Finally, the RD opens the arihip table for the IVOA Table Access Protocol TAP, which lets you write queries in a dialect of SQL. Again, TOPCAT has a nice client for TAP built in. To try it, select VO/TAP Query, enter ``http://localhost:8080/tap`` in the TAP URL field near the bottom of the window, and hit “Enter Query”. In the resulting dialog, you can browse the table's metadata and then enter queries like:: SELECT * from arihip.main where sqrt(pmde*pmde+pmra*pmra)>2/3600. For more information on what fancy things you can do here, see `GAVO's ADQL short course`_. This is also the moment to write regression tests (yeah, it would be cool write the tests before writing tables and services, but frankly, that's something you'll only pull off if you're really familiar with both the data and the protocols; don't feel bad about writing the tests after the fact, but do feel bad if you don't write them at all) and then run:: dachs test -v q Publishing a Service -------------------- Once you are satisified the service is in presentable shape, you should publish it. In DaCHS, publication is an extra action. :dachsref:`The publish element`, if you will, earmarks a thing for publication. To actually perform it, you have to run:: dachs pub q This will enter all the published services and tables in the RD q.rd into the table of published resources. For ``local`` publications, this means they will appear on the default front page. However, DaCHS caches the front page, and to see changes, you will have to invalidate the cache. 
If you have set `the admin password`_, and it fits what the ``serverURL`` says, ``dachs pub`` will do that itself. If in doubt, just restart the DaCHS server. Note again that before running dachs pub, it is a good idea\ [#belim]_ to run ``dachs limits q``, which will create (or update) all kinds of statistal estimates which are being communicated to the registry and are important for discovery (e.g., coverage and some statistical measures for the columns). As of 2021, even some DaCHS-internal hacks working around deficiencies in the database server's query planner in connection with common spherical indexing schemes rely on these statistics. Also, your publication will be more useful if you have endowed your resource with `STC coverage`_. For ``ivo_published`` services, publication means that their metadata will be served whenever DaCHS is harvested by VO registries using a protocol called `OAI-PMH`_. DaCHS' implementation of that protocol is done such that XSLT-aware user agents (which, at least in 2020, still means: all major web browsers) can be used to (somewhat clumsily) operate it. In practice, you can point your browser to the ``oai.xml`` child of your server root – by default http://localhost:8080/oai.xml –, and you will see an error message (because you've violated the protocol in this way), but you will also see a link to a list of links to the metadata of all the resources you publish. Follow this and look around; inspecting what DaCHS tells the Registry sometimes lets you spot metadata errors. Note that for your services to show up in common VO clients, you will have to do a bit more (once). See `Interacting with the VO Registry`_. By the way, once you are in the Registry, there are two kinds of things that may become publicly visible without a ``dachs pub`` because their typical discovery bypasses the Registry: (a) TAP tables. They will be entered into TAP_SCHEMA immediately, and TAP clients will find it there at once. Also, many clients still use `GloTS `_, which, once it has re-harvested your server, will make your TAP table globally discoverable. If you don't want this, write ``adql="Hidden"`` in the table head and change it to ``adql="True"`` when you publish your data. (b) Obscore datasets. When you put products into the obscore table, they will immediately appear there, and, for instance, all-VO obscore searches will find them. The only way to avoid that at this point is to comment out obscore mixins pre-publication and then comment them in when you publish. The recommended way to deal with both problems is to have a development or test server that is non-public (and perhaps even has a global access restriction) and only take the data and RDs to the production server when you go for publication. Publishing Data Using VO Protocols ================================== This chapter gives somewhat more in-depth information on what to do to create services compliant to the various DAL protocols by data product type. While ``dachs start`` should give you a fair chance of getting a service running without reading any of this, it is still a good idea to read the section for the sort of data you want to publish before setting out. **DAL** is VO-speak for “Data Access Layer”, the standard protocols the VO defines for data discovery and access. 
To support such a protocol, you usually need to arrange things in three places:

* The table queried needs a certain set of columns
* The core must support certain input and output fields
* The renderer must exhibit specified behaviour as regards, e.g., the formatting of error messages, and it may require protocol-specific metadata

Publishing Catalogues via SCS (Cone Search)
-------------------------------------------

SCS, the simple cone search, is the simplest IVOA DAL protocol – it is just HTTP with RA, DEC, and SR parameters, a slightly constrained VOTable response, plus a special way to encode errors (in a way somewhat different from what has been specified for later DAL protocols).

The service discussed in the `DaCHS Basics`_ is a combined SCS/form service. This section just briefly recapitulates what was discussed there. For a quick start, just follow `Quick start with DaCHS`_.

SCS Tables
''''''''''

SCS can expose any table that has exactly one column each with the UCDs ``pos.eq.ra;meta.main``, ``pos.eq.dec;meta.main``, and ``meta.id;meta.main``, where the coordinates must be real or double precision, and the id must be either some integral type or text; the standard requires the id to be text, but the renderer will automatically convert integral types. The main query is then run against the position specified in this way.

You almost always want to have a spatial index on these columns. To do that, use :dachsref:`the //scs#pgs-pos-index mixin` on the tables, like this::
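  <!-- A hedged sketch of such a table; the table id, column names, and
       descriptions are placeholders, not prescribed by the protocol. -->
  <table id="main" onDisk="True" adql="True" mixin="//scs#pgs-pos-index">
    <column name="id" type="text"
      ucd="meta.id;meta.main"
      description="Object identifier"/>
    <column name="raj2000" type="double precision"
      unit="deg" ucd="pos.eq.ra;meta.main"
      description="Right ascension (J2000)"/>
    <column name="dej2000" type="double precision"
      unit="deg" ucd="pos.eq.dec;meta.main"
      description="Declination (J2000)"/>
  </table>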
... The “pgs” in pgs-pos-index refers to pgSphere, a postgres database extension for spherical geometry. You may see RDs around that use :dachsref:`the //scs#q3cindex mixin` instead here. It does the same thing (dramatically speed up spatial queries) but uses a different scheme. It's faster and takes up less space, but it's also less general, which is why we are trying to phase it out. Only use it when you are sure you cannot afford the (reasonable, i.e., mostly within a factor of two) cost of pgSphere. Note that to have a valid SCS service, you must make sure the output table always contains the three required columns (as defined by the UCDs) discussed above. To ensure that, these columns' ``verbLevel`` attribute must be 10 or less (we advise to have it at 1). SCS Cores ''''''''' SCS could work with a dbCore, but friendly cone search services include a field with the distance between the object found and the position passed in; this is added by the special :dachsref:`element scsCore`. You (in effect) must include the some pre-defined condDescs that make up the SCS protocol, like this:: This will provide the RA, DEC, and SR parameters for most renderers. The form renderer, however will show a nice input box that lets humans enter object names or, if they cannot live without them, sexagesimal positions (if you are curious: this works by setting the ``onlyForRenderer`` and ``notForRenderer`` attributes on :dachsref:`Element inputKey`). In addition, :dachsref:`//scs#coreDescs` gives you a paramter MAXREC to limit or raise the number of matches returned. This parameter is not required by SCS, but it is useful if people with sufficient technical skills (they'll need those because common SCS clients don't support MAXREC yet) want to raise or lower DaCHS' default match limit (which is configured in ``[ivoa]dalDefaultLimit`` and can be raised up to ``[ivoa]dalHardLimit``). SCS allows more query parameters; you can usually use condDesc's ``buildFrom`` attribute to directly make one from an input column. If you want to add a larger number of them, you might want to use `active tags`_:: Note that most current SCS clients are not good at discovering such additional parameters, since for SCS this requires going through the Registry. In TOPCAT, for example, users would have to manually edit the cone search URL. Also note that SCS does not really define the syntax of these parameters, which is relevant because most of the time they will be float-valued, and hence you will generally need to use intervals as constraints. The interval syntax used by the SCS renderer is DALI, so a bounded interval would be ``22.3 30e5``, and you'd build half-bounded intervals with IEEE infinity literals, like ``-Inf -1``. Of course, when accessed through a form, the usual VizieR parameter syntax applies. SCS Services '''''''''''' To expose that core through a service, just allow the scs.xml renderer on it. With the extra human-oriented positional constraint and mainly builtFrom condDescs, you can usually have a web-based form interface for free:: Nice Catalogue Cone Search NC Cone 10 10 0.01 The meta information given is used when generating registry records; the idea is that a query with the ra, dec, and sr you give actually returns some data. Publishing Images via SIAP -------------------------- *SIAPv2 as described here is only available in DaCHS 2.7.3 and later. 
If you want to create new image services, please make sure you update to that.* In the VO, there are currently two versions of the Simple Image Access Protocol SIAP. While DaCHS still supports SIAP version 1, you should only use it for new services if you know exactly what you are doing. Hence, this tutorial (and, actually, the reference documentation, too) only talks about SIAP2. To generate a template RD for an image collection published through SIAP, run:: dachs start siap See `Starting from Scratch`_ for a discussion on how to fill out this template. While you can shoehorn DaCHS to pull the necessary information from many different types of images, anything but FITS files with halfway sane WCS headers is going to be fiddly – and of course, FITS+modern WCS is about the only thing that will work nicely on all relevant clients. If you have to have images of a different sort, it is probably a good idea to inquire on the `dachs-support`_ mailing list before spending a major effort on local development. Quick Sample Image Service '''''''''''''''''''''''''' Check out a sample resource directory:: cd `dachs config inputsDir` svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/emi cd emi mkdir data Now fetch some files to populate the data directory so you have something to import:: cd data curl -FLANG=ADQL -FFORMAT=votable/td \ -FQUERY="SELECT TOP 5 access_url FROM emi.main WHERE weighting='uniform'" \ http://dc.g-vo.org/tap/sync \ | tr '<>' '\n' \ | grep getproduct \ | xargs -n1 curl -sO (no, this is no way to operate TAP services; use a proper client for real work, and we didn't show you this). This RD also publishes to obscore, so make sure you have the obscore table:: dachs imp //obscore If you do not plan to publish via obscore yourself (which is reasonably unlikely) and you try this on a box that the Registry will see later (you shouldn't), be sure to ``dachs drop`` obscore again when done. Run the import:: cd .. dachs imp q Now start the server as necessary (see above), and start TOPCAT and Aladin_. In TOPCAT, open VO/SIA Query, enter your new service's access URL (it's ``http://localhost:8080/emi/q/s/siap.xml`` unless you did something cunning and should know better yourself) under “SIA URL” pretty far down in the dialog. Then have “Lockman Hole” as Object Name and resolve it, or manually enter 161.25 and 58.0 as RA and Dec, respectively, and have 2 as Angular Size. Send off the request. You'll get back a table that you can send to Aladin (Interop/Send to/Aladin), which will respond by presenting a load dialog. Doubleclick and load as you like. Yes, the images look a bit like static noise. That's all right here – but do combine these images with, say, DSS colored optical imagery and marvel at the wonders of modern VLBI interferometry. Indicentally, we made the detour through TOPCAT since there's no nice UI to query non-registred SIAP services in Aladin. Defining SIAP Tables '''''''''''''''''''' SIAP-capable tables should mix in ``//siap2#pgs``. This mixin provides all the columns necessary for valid SIAP responses, and it will prepare the table so that spatial queries (which are the most common) will use a pg_sphere index. So, in the simplest case, a table published through a SIAP service would be declared like this::
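  <!-- A hedged sketch; the table id and the extra column are placeholders. -->
  <table id="main" onDisk="True" mixin="//siap2#pgs">
    <!-- all mandatory SIAP2/Obscore columns come from the mixin;
         add custom columns as needed, for example: -->
    <column name="object" type="text"
      ucd="meta.id;src"
      description="Object this image was taken for"/>
  </table>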
This only has the minimal SIAP metadata. The net result of this is that your new ``images`` table will have all the mandatory columns of Obscore_. You can add your own, custom columns as usual. :dachsref:`The //siap#pgs mixin` also takes care that anything added to the table also ends up in `the products table`_. This means that the grammar filling the table needs a :dachsref:`//products#define` rowfilter. Filling SIAP Tables ''''''''''''''''''' When filling SIAP tables, you will almost always use: * :dachsref:`Element fitsProdGrammar` as the grammar; see also `fitsProdGrammars`_ further down. * the :dachsref:`//siap2#computePGS` apply to interpret the FITS WCS to fill the spatial columns in the table. * the :dachsref:`//siap2#setMeta` apply to fill the non-spatial SIAP metadata. In practice, this might look like this (taken from the emi/q RD from the quick start):: *.fits 80 OBJECT OBSDEC OBSRA "emi.main" with open(rd.getAbsPath("res/namemap.csv")) as f: nameMap = dict(csv.reader(f)) @target_name = nameMap[@object] 0.207 0.228 dateTimeToMJD(datetime.datetime(2010, 7, 4)) dateTimeToMJD(datetime.datetime(2010, 7, 4)) "VLBA 1.4 GHz "+@target_name 3 "VLBA LH sources" "phot.flux.density;em:radio.750-1500MHz;phys.polarisation.Stokes.I" "/I/" 1 @target_name 5000000 "agn" \inputRelativePath.split("_")[-1][:-5] This does, step by step: * The ``sources`` element is as always – with image collections, the ``recurse`` attribute often comes in particularly handy. * When ingesting images, you will very typically read from FITS primary headers. That is what :dachsref:`element fitsProdGrammar` does unless told otherwise: Its rawdicts simply are the (astropy.io.fits) headers turned into plain python dictionaries. * The ``qnd`` attribute of the grammar is recommended as long as you get away with it. It makes some (weak) assumptions to yield significant speedups, but it limits you to the primary header. You cannot use ``qnd`` with compressed FITS images. Also, note the ``hdusField`` attribute when you have more complex FITSes to process. * The ``fitsProdGrammar`` will map keys with hyphens to names with underscores, which allows for smoother action with them in rowmakers. The ``mapKeys`` element can produce additional mappings; in this case, we abuse it a bit to let us have idmaps (rather than simplemaps) in the rowmaker. And, actually, to illustrate the feature, as this data does not need key mapping, really. * Since we are defining a table serving data products here, the grammar needs the :dachsref:`//products#define` rowfilter discussed in `the products table`_. * We have mentioned the :dachsref:`//siap2#computePGS` apply above; as long as astropy can deal with your WCS, it's all automatic (though you may want to pass parameters when you have cubes with suboptimal WCS or want to keep products without WCS in your table). And if you don't have proper WCS: see above on checking with dachs-supoort. * The second apply you want when feeding SIAP tables is :dachsref:`//siap2#setMeta`. This has a lot of parameters, because depending on what you are publishing – SIAP2 explicitly was designed to push out cubes, too – you may want to give metadata like the length of a wavelength axis. ``dachs start siap`` puts in default mappings for most of them; feel free to delete them if they don't make sense for your kind of data. * There are two non-obscore parameters in setMeta. For one, instead of ``em_min`` and ``em_max``, you can set bandpassId and have the :dachsref:`//siap#getBandFromFilter` look them up. 
Whether a band is known you can find out by running ``dachs admin dumpDF data/filters.txt`` – and we are grateful for further contributions to that file. * You can also set ``dateObs`` to the central time of the observation and give ``t_exposure`` a sensible value in seconds. DaCHS will then compute ``t_min`` and ``t_max`` for you, assuming a single, contiguous observation. * Typically, many values you find in the FITS headers will be messy and fouled up. You'll spend some quality time fixing these values in the typical case. Here, we translate somewhat broken object names using a simple mapping file that was provided by the author. In many other situations, the :dachsref:`//procs#mapValue` or :dachsref:`//procs#dictMap` applys let you do fixes with less code. * As is usual in DaCHS procdures, you can access the embedding RD as ``rd``. In our object name fixer, we use that to let DaCHS find the input file independently of where the program was started. Somewhat regrettably, //siap2#setMeta cannot be used with ``idmaps="*"``; as settings from setMeta would be overwritten then. That's a compromise for backwards compatibility. Cores ''''' For SIAP1, there was the :dachsref:`Element siapCutoutCore` that returned cutouts rather than full images. Nothing of this sort is possible for SIAP2, because spatial constraints are optional for the latter. If you really need that functionality, you'll either have to go back to SIAP1 or (probably preferably, although client support isn't all that great yet) return datalinks rather than full images (see `SIAP and datalink`_). That said, the core to use for SIAP2 is the :dachsref:`Element dbCore`. Add the condDescs necessary for SIAP manually (and add any custom condDescs as you see fit). See the next section for an example. Service ''''''' The service uses the siap2.xml renderer, for which you need some additional metadata for VO registration as described in :dachsref:`the siap2.xml renderer`. With this, a service definition would look like this (again taken from emi/q):: VLBI-Lockman Pointed 163.3 57.8 0.1 0.1 You probably do not want to point real users at the from rendering of this service. If you want to have image services working in the browser, write an extra service; the output of ``dachs start siap`` has a starting point for that. SIAP and Obscore '''''''''''''''' The SIAP2 metadata schema is exactly the one of Obscore. Hence, to publish a SIAP table to Obscore, all you need to say is:: //obscore#publishObscoreLike in your table element. Note that for tables of non-trivial size, you really should have a spatial index on s_ra and s_dec (our advice: q3c) and on s_region (that will have to be a pg_sphere one) and otherwise index at least obs_publisher_did, em_min, em_max, t_min, and t_max. SIAP and Datalink ''''''''''''''''' If you have larger images or cubes and serve them through SIAP, consider offering datalinks or perhaps even have `Datalinks as Products`_. The latter case is particularly attractive if your images are so large that people just clicking on some row in Aladin_ might not expect a download of that size (in 2020, I'd set that limit at, perhaps, 100 MB). In both cases, people can select and download only parts of the image or request scaled versions of it. Defining a datalink service for normal FITS images is not hard. 
In the simplest case, you just give a bit of metadata, use the :dachsref:`//soda#fits_genDesc` descriptor generator (you don't need to understand exactly what that is; if you are curious: :dachsref:`Datalink and SODA` has the full story) and FEED :dachsref:`//soda#fits_standardDLFuncs`. Done. The following example, taken from :samplerd:`lswscans/res/positions`, adds a fixed link to a scaled version, which might work a bit smoother with unsophisticated Datalink clients, using a meta maker:: HDAP Datalink This service lets you access cutouts from HDAP plates and retrieve scaled versions. lswscans yield descriptor.makeLink( makeProductLink(descriptor.accref+"?scale=4"), contentType="image/fits", description="FITS, scaled by 1/4", contentLength=descriptor.estimateSize()/16.) See :dachsref:`Meta Makers` for more information on what is going on inside the meta maker. The remaining material is either stereotypical or pure metadata: title and description are as for any other serivce, and the ``accrefPrefix`` should in general reflect your resource directory name. DaCHS' datalink machinery will reject any publisher DID asking for an accref not starting with that string. The idea here is to avoid applying code to datasets it is not written for. When you attach the datalink functionality (rather than having datalink links as access URLs), declare your table as having datalink support; both SIAP and TAP will pick that up and add the necessary declarations so datalink-aware clients will know they can run datalink queries against your service:: dlpub_did This block needs to sit in the table element. The serviceId meta contains the id of the datalink service. If you also produce HTML forms and tables, see `datalinks in columns`_. Publishing Spectra or Time Series via SSAP ------------------------------------------ Publishing spectra is harder than publishing catalogues or images; for one, the Simple Spectral Access Protocol comes with a large bunch of metadata, quite a bit of which regrettably repeats VOResource. And there is no common format for spectra, just a few contradicting loose conventions. That is why ``dachs start`` produces a template that contains an embedded datalink service. This lets you push out halfway rational VOTables that most interesting clients can reliably deal with, while still giving access to whatever upstream data you got. In the past, we have tried to cope with the large and often constant metadata set of SSAP using various mixins that have a certain part of the metadata in PARAMs (which is ok by the standard). These were, specifically, the mixins ``//ssap#hcd`` and ``//ssap#mixc``. *Do not use them any more* in new data and ignore any references to them in the documentation. The modern way to deal with SSAP – both for spectra and for time series – is to use :dachsref:`the //ssap#view mixin`. In essence, this is a relatively shallow way to map your own metadata to SSA metadata using a SQL view. This is also what the ``dachs start`` template does. Quick Sample SSAP Service ''''''''''''''''''''''''' Check out the feros resource directory into your inputs directory:: cd `dachs config inputsDir` svn co http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/feros cd feros mkdir data As recommended, the checkout does not contain actual data, so let's fetch a file:: cd data curl -O http://dc.g-vo.org/getproduct/feros/data/f04031.fits cd .. 
This RD also publishes to obscore, so make sure you have the obscore table:: dachs imp //obscore If you do not plan to publish via obscore yourself (which is reasonably unlikely) and you try this on a box that the Registry will see later (you shouldn't), be sure to ``dachs drop`` obscore again when done. Run the input and the regression tests:: dachs imp q dachs test q One regression test should fail since you've not yet pre-generated the previews (which are optional but recommended for your datasets, too):: python3 bin/makepreviews.py dachs test q If the regressions tests don't pass now, please complain to the authors of this tutorial. From here on, you can point your favourite spectral client (fallback: TOPCAT's SSA client; note that TOPCAT itself cannot open this service's native format and you'll have to go through datalink, which TOPCAT knows how to do since about version 4.6) to http://localhost:8080/feros/q/ssa/ssap.xml and do your queries (if you don't know anything else, do a positional query for 149.734, 2.28216). Please drop the dataset again when you're done playing with it:: dachs drop feros/q Since the dataset is in the obscore table, it would otherwise be globally discoverable, and that'd be bad. SSAP Tables ''''''''''' Contrary to what DaCHS does with the relatively small SCS, SIAP, and SLAP models, due to the size of the SSAP model, spectral services are always based on a database view on top of a table the structure of which is controlled by you; you're saving quite a bit of work if you keep your own table's columns as close to their SSA form as possible, though. Another special situation is that most spectra are delivered in fairly crazy formats, which means that it's usually a good idea to have a datalink service that serves halfway standard files – in DaCHS, these comply to the VO's spectral data model, which is a VOTable with a bit of standard metadata. It's certainly not beautiful or very sensible, but it sure beats the IRAF-style 1D images people still routinely push around. So, to start a spectral service, use the ``ssap+datalink`` template:: $ mkdir myspectra; cd myspectra $ dachs start ssap+datalink This will result in the usual ``q.rd`` file as per `starting from scratch`_; see there for how to efficiently edit this and for explanations on the common metadata items. SSAP-specific material starts at the meta definitions for ``ssap.dataSource`` and ``ssap.creationType``. These are actually used in service discovery, so you should be careful to select the right words. Realistically, using ``survey``/``archival`` for observational data and ``theory``/``archival`` for theoretical spectra should be the right thing most of the time. Next, you define the ``raw_data`` table; this should contain all metadata “unpredictably” varying between datasets (or, for large data collections, anything that needs to be indexed for good performance). For instance, for observational data, the observed location is going to change from row to row. The start and the end of the spectrum is probably going to be fixed for a given instrument, and so if you have a homogeneous data collection you probably will not have columns for them and rather provide constant values when defining the view. To conveniently define the table, it is recommended to pull the SSA columns for ``raw_data`` by name from DaCHS' SSAP definitions and use SSAP conventions (e.g., units). 
The generated RD is set up for this by giving ``namePath="//ssap#instance"``, which essentially means “if someone requests an element by id, first look in the ``instance`` table of the RD”. This is then used in the following ``LOOP`` (cf. `Active Tags`_). As generated, this will look somewhat like:: – this will pull in the enumerated columns as if you had defined them literally in the table. Depending on the nature of your data, you may want to pull in more columns if they vary for your datasets (or throw out ones you don't need, as ``ssa_dateObs`` for theoretical data). To see what is available, refer to the reference documentation of the :dachsref:`the //ssap#view mixin`. Any parameter that starts with ``ssa_`` can be pulled in as a column. The template RD then mixes in ``//products#table`` (which you pretty certainly want; see `The Products Table`_ for an explanation), ``//ssap#plainlocation`` (which at this point you must have to make a valid SSA service) and ``//ssap#simpleCoverage`` (which you want if you want to publish your observational spectra through obscore). The template then defines:: This again is mainly useful for obscore as long as DaCHS' ADQL engine may turn queries into q3c statements; just leave it if you have positions, and remove it if you don't. You can, of course, define further columns here, both for later use in the view and for local management. SSAP lets you return arbitrary local columns, and in particular for theory services, you will have to (to define the physics of your model). As a DaCHS convention, please don't use the ``ssa_`` prefix on your custom columns. See :samplerd:`theossa/q` for an example of a table with many extra columns. The SSA template then goes on with a data item filling the ``raw_data`` table. The template assumes you're parsing from IRAF-style 1D images. You will have to use a different grammar if that is not what you have, and in that case you in particular you cannot use the ``specAx`` var defined in the rowmaker. The data item has ``make_view`` quite early on; this simply makes sure that the SSA view will be regenerated after you import the table itself. The rowfilter in the grammar is fairly complex here because we will completely hide the original; if you simply want to serve your upstream format, just cut it down to just giving ``table``, ``mime``, ``preview`` and ``preview_mime``. If you do that, use the following strings in ``mime``: * ``image/fits`` for IRAF-style 1D image spectra * ``application/fits`` for spectra in FITS tables * ``application/x-votable+xml`` for spectra in VOTables Please do not put anything else into SSA tables, because you will most certainly overstrain most SSA clients; if you have a different upstream format and you want to make it available, turn it into SDM VOTables and use datalink to link to the original source. Hence, for most cases (including also ASCII spectra), here's what we recommend as the product definition rowfilter (it's roughly what's in the template) to isolate your clients from the odd upstream formats:: "\\schema.main" \\fullDLURL{"sdl"} %typical size of an SDM VOTable% "\\rdId#sdl" "application/x-votable+xml" \\standardPreviewPath "image/png" This is pointing the path accessed to a datalink URL using the ``fullDLURL`` macro, which expands to a URL retrieving the full dataset; the “sdl” argument to the macro references the datalink service defined further down. 
Since the data returned is generated on the fly, you will have to give an estimate of how large the VOTable will be (overriding DaCHS' default of the size of the source file). Don't sweat this too much, just don't claim something is 1e9 bytes when you're really just returning a few kilobytes. The rowfilter expects the size in bytes. The bindings already prepare for making and serving previews, which is discussed in more detail in :dachsref:`Product Previews` in the DaCHS reference; see there for everything mentioning “preview“. SSAP has a feature that lets users request certain formats, and for clients that don't know Datalink, this *may* be a good idea. In that scheme, you use a rowfilter to return a description of your native data *and* the processed SDM-compliant dataset as used here. See :samplerd:`theossa/q` for an example how that would look like. Our recommendation: don't bother, it's a misfeature that will most likely just confuse your users. The rowmaker is fairly standard; we should perhaps mention the elements:: %use getWCSAxis(@header_, 1) for IRAF FITSes (delete otherwise)% %typically, @specAx.pixToPhys(1)*1e-10% %typically, @specAx.axisLength% ``getWCSAxis`` is a function that looks at a FITS image's WCS information to let you transform pixel to physical coordinates. This currently uses a simplified DaCHS implementation that only does a small part of WCS (but we may change that, keeping the interface stable). The ``var``, anyway, binds the resulting object to ``specAx``. You can use that later to find out the limits of the spectrum. The way it is written here, you will still have convert the value to metres manually. But as said above, if you're publishing a homogeneous collection of spectra, both values are probably constant, and you'll want to remove both maps from the template. The SSA View '''''''''''' The template goes on defining the ``data`` table that will serve as the basis of the service. It starts with the declaration:: sdl ssa_pubDID This is lets DaCHS add a link to the datalink service to results generated from this (both via SSAP and TAP). There's nothing you need to change here (unless you chuck datalink); see `SSAP and Datalink`_ and `Datalink`_ for details. The main body of the table definition is :dachsref:`the //ssap#view` mixin. In it, you need to write the SSA parameters as SQL literals (i.e., strings need single quotes around them) or expressions involving column references. To keep things fast, you should have SSA-ready columns in the source table, so you will usually have column references only here. Most of these items default to NULL, so if you do not have a piece of metadata, it is reasonably safe to just remove the attribute. A few of the mixin's parameters deserve extra discussion: * ``sourcetable`` – this is a *reference*, i.e., this must resolve to the ``id`` of some table element. It can be cross-RD if really necessary. It is *not* the SQL table reference (that would include a schema reference). * ``copiedcolumns`` – this lets you copy over columns from the source table, i.e., the one you just defined using (comma-separated) shell patterns of their names (yes, that's just like ``idmaps`` in rowmakers). The ``*`` given in the template should work in most cases, but if you have private columns in the source table, you can suppress them in the view; a useful convention might be to start all private columns with a p; you'd then say ``copiedcolumns="[^p]*"``. 
Note that copied columns are automatically added in the view as 1:1 maps, and you cannot use view arguments to override them. Use different column names in the source table and the view if you (think you) have to do view-level processing of your values. * ``customcode`` – use this if you have extra material to enter in the view definition generated by the mixin. We hope you won't need that and would be interested in your use case if you find yourself using this. * ``ssa_spectralunit``, ``ssa_fluxunit`` – these are the only mandatory parameters starting with ``ssa_`` (but their values are still overwritten if they are in copiedcolumns). There really is no point in having them vary from row to row because their values are metadata for the corresponding error columns (which is one of the many spec bugs in SSAP). * ``ssa_spectralucd``, ``ssa_fluxucd`` – these are like the unit parameters in that they contain data collection-level metadata. The only reason they are not mandatory is that there are defaults that seem sensible for a large number of cases. Check them, and again, you cannot really let them vary from row to row. * ``ssa_fluxSI``, ``ssa_spectralSI``, ``ssa_timeSI`` – these were an attempt to work around a missing specification for unit strings in the VO. Since we now have VOUnit, just ignore them. The data item making this table is trivial. You should set it to ``auto="False"`` (i.e., don't build this on an unadorned ``dachs imp``). The building of this data will normally be triggered by the ``recreateAfter`` of the source table import. SSAP Services ''''''''''''' Use the :dachsref:`element ssapCore` for SSAP services. You must feed in the condition descriptors for the SSAP parameters you want to support (some are mandatory). The simplest way to do that is to ``FEED`` the :dachsref:`//ssap#hcd_condDescs` stream. It includes condition descriptors for all mandatory and optional parameters that we can remotely see in use. Some of them may not be relevant to your service because your table never has values for them. For example, theoretical spectra will typically not give information on positions. The SSAP spec says that such a service should ignore POS rather than returning the empty set. We consider that an unfortunate recommendation that you should ignore; if someone queries your theoretical service with a position, it is highly likely they do not want to see all your spectra. If you nevertheless think you must ignore certain conditions, you can use the ``PRUNE`` active tag. This looks like this:: Again, do not do this just because you don't have, say position information. Here is a table of parameter names and ids; you can always check them by inspecting the output of ``dachs adm dumpDF //ssap``: ============== =========== Parameter name condDesc id -------------- ----------- POS, SIZE coneCond BAND bandCond TIME timeCond ============== =========== For APERTURE, SNR, REDSHIFT, TARGETNAME, TARGETCLASS, PUBDID, CREATORDID, and MTIME, the condDesc id simply is ``_cond``, e.g., ``APERTURE_cond``. To have custom parameters, simply add condDesc elements as usual:: For SSAP cores, ``buildFrom`` will enable “PQL”-like query syntax such that users can post arguments like ``20000/30000,35000`` to ``t_eff``. This is in keeping with the general SSAP parameter style, while more modern VO services use 2-arrays for intervals (“DALI style”). To expose SSAP cores, use :dachsref:`the ssap.xml renderer`. 
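To give an idea of how these pieces fit together, here is a minimal sketch of an SSAP core and service; the service id, the ``data`` table reference, and the ``t_eff`` column are just examples in the spirit of this section, so adjust them to your RD::

  <service id="ssa" allowed="ssap.xml">
    <meta name="title">Example Spectra SSAP</meta>
    <ssapCore queriedTable="data">
      <FEED source="//ssap#hcd_condDescs"/>
      <!-- a custom parameter built from a local column -->
      <condDesc buildFrom="t_eff"/>
    </ssapCore>
    <publish render="ssap.xml" sets="ivo_managed"/>
  </service>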
Form-based Spectral services '''''''''''''''''''''''''''' Using the form renderer on SSAP cores is not terribly useful, because the core returns XML directly, and there are far too many parameters no human will ever be interested in anyway. Hence, you will typically define extra browser-based services. The example RD shows a compact way to do that:: \\schema Web accref, mime, ssa_targname, ssa_aperture, ssa_dateObs, datalink Essentially, we only select a few fields people might want to query against, and we directly build them out of the query fields; the SSA condDescs are bound to the funny and insufficiently defined SSA input syntax and probably not very useful in interactive applications. The extra selector for object names with the names actually present in the database is a nice service as long as you only have a few hundred objects or so. Since the query over ``ssa_targname`` is executed at each load of the RD, it should be fast, which means that even for medium-sized tables, you should have an index on the object names in the ``raw_data`` table, probably like this:: On DaCHS newer than 2.5.1, instead of ``fromDb``, it is usually better to say:: enumerate in the table element. This will make ``dachs limits`` create the necessary metadata once and not on every RD reload. Note that DaCHS will only create statistics for views if the ``forceStats`` property on the table definition is set. However, if you create statistics on the metadata table, these will be inherited by columns copied over into the view, which means that you will in general not have to do this. In the output table, we only give a few of the many dozen SSAP output fields, and we change the units of the spectral limits to Angstroms, which will look nicer for optical spectra. For educational reasons you might want to change this to nm (nanometer). In the template, this form-based service is published as a capability of the SSA service. This is done using the service attribute in the :dachsref:`Element publish` *in the SSAP service element*:: See `Registering Web Interfaces to DAL Services`_ for more background. SSAP and Obscore '''''''''''''''' The SSA metadata is not far from the Obscore metadata (cf. `publishing anything through obscore`_), and so an Obscore publication of SSAP data almost comes for free: Minimally, just mix in :dachsref:`//obscore#publishSSAPMIXC` and set ``calibLevel``. The template does a bit more:: //obscore#publishSSAPMIXC – if you use one of the old hcd or mixc mixins, you do not want oUCD and emUCD. In particular for larger spectral collections, it is highly recommended to also have :dachsref:`the //ssap#simpleCoverage mixin` in an obscore-published spectral table; only then will you get indexed queries when there are constraints on s_region, and having these non-indexed will lead to really slow obscore queries. Similarly, you should probably add:: to large SSAP tables, which creates an index useful for obscore queries over ``t_min`` and ``t_max`` (cf. :dachsref:`//ssap#obscore-time-index`). Whenever the spectral coverage is not constant on large-ish SSAP tables, you should also have:: SSAP and Datalink ''''''''''''''''' Given that you will usually get fairly bizarre inputs and will probably want to publish “repaired” spectra, using Datalink to provide both native and SDM (“Spectral Data Model”) compliant spectra without having to resort to SSAP's ill-thought-out FORMAT feature is a fairly natural thing to do. 
That is why the SSAP+datalink template comes with almost all that you need to do that; what is left is mainly to write an embedded grammar to parse the spectra (if the parsing is complex, you might want to go for an :dachsref:`Element customGrammar`, which lets you keep the source outside of the RD). Other than that, it is just a few formalities.

So, you first define the table that will later hold your spectrum. Use :dachsref:`the //ssap#sdm-instance` mixin for that (this continues material from the template)::
  <mixin
      fluxDescription="%a few words what a spectrum represents%"
      ...>//ssap#sdm-instance</mixin>
The descriptions you need to enter here typically end up being labels on axes of spectral plots, so it is particularly important to be concise and precise here. If your spectrum has additional columns (e.g., errors, noise estimates, bin widths), just put more columns in here. The mixin pulls in all the various params that SDM wants to see from the row of the spectrum in the SSAP table. Note that the table does not have ``onDisk="True"``; these tables are only made for the brief moment it takes to serialise them into what the user receives. As usual, to fill tables, you want a data element. The template just gives a few hints on how that might work. As a working example, :samplerd:`zcosmos/q` parses from 1D FITS images like this:: fitsPath = products.RAccref.fromString( self.sourceToken["accref"]).localpath hdus = pyfits.open(fitsPath) ax = utils.getWCSAxis(hdus[0].header, 1) for spec, flux in enumerate(hdus[0].data[0]): yield {"spectral": ax.pix0ToPhys(spec), "flux": flux} hdus.close() The way this figures out the file from which to parse will work if you have the actual file path in the product table. When you hide the upstream format as recommended, you have to follow some custom convention. The SSAP+datalink template has:: sourcePath = urllib.decode( self.sourceToken["ssa_pubDID"].split('?', 1)[-1]) This works if you use the :dachsref:`macro standardPubDID` as in the template. But you can do arbitrary things here; see the :samplerd:`califa/q3` RD for an example for how you can decode the accref to database rows. A more complex scenario with ASCII data stored externally and cached is in :samplerd:`theossa/q`, a case where multiple orders of echelle spectra are being processed in :samplerd:`flashheros/q`. You will notice that the rowmaker in both the example and the template is missing. DaCHS will then fill in the default one, which essentially is ``idmaps="*"``. Since you are writing the grammar from scratch, just use the names of the columns defined in the instance table and be done with it. The predefined column names are ``spectral`` and ``flux``, so make sure you always have keys for them in the dictionaries you yield from your grammars. While there is no rowmaker, the ``make`` does have an :dachsref:`Element parmaker`; this is stereotypical, just always use the ``procDef`` as here. It copies values from the SSA input row to the params in the instance table. Finally, you would need to write the service. For SSAP and SODA, what's in the template ought to just work. Add :dachsref:`Element metaMaker`-s to provide links to, e.g., your raw input files if you want. In that case, please skim the endless chapter on :dachsref:`Datalink and SODA` in the reference documentation to get an idea of how descriptor generators, data functions, and meta makers play together. For reasons discussed in `Datalinks in Columns`_, it *may* be a good idea to include a custom column with a datalink URL in the SSAP table. The ssap+datalink template already has such a column in its source table and fills it in import's rowmaker. You will usually want to touch up the instance table's metadata a bit. For example, the VOTable name of these tables by default will be constant (``instance``, usually), which will then show up, e.g., in the spectrum browser of SPLAT, where it is not terribly useful. It is hence usually a good idea to have an apply in then parmaker of your sdm instance-making data item. Note that the SSAP row with your metadata is not in ``vars`` here as usual elsewhere. 
Instead, you will find it as the ``sourceToken`` on the parser. The following example, lifted from :samplerd:`gaia/s3`, ought to help you figure things out:: sourceId = vars["parser_"].sourceToken["source_id"] targetTable.setMeta("description", base.getMetaText(targetTable, "description") +" for Gaia DR3 object {}".format(sourceId)) targetTable.setMeta("name", str(sourceId)) SSAP for Time Series '''''''''''''''''''' As long as there is no anointed successor to SSAP explicitly catering to time series, you can use SSAP to publish time series. It is a bit of a hack, but clients like SPLAT do about the right thing with them. As to examples, check out :samplerd:`k2c9vst/q` (which parses the data from ASCII files) and :samplerd:`gaia/q2` (which stores the actual time series in the database, a technique we believe is a very good idea). To notify the Registry (and possibly clients, too), that you are producing time series, do two things: (1) Globally declare that you serve time series by setting:: timeseries near the top of your RD. The ssap+datalink template has more information on the ``productType`` meta. (2) have ``'timeseries'`` as ssa_dstype in your view definition. Building Time Series VOTables ''''''''''''''''''''''''''''' **This section is under construction** Since there is no actual agreed-upon standard for the serialisation of time series, you will probably have to produce time series on the fly; DaCHS helps you to produce something that will hopefully follow standardisation efforts in the VO without *much* additional work later on, using, you guessed it, a mixin. For now, the only mixin available is for photometric timeseries: See :dachsref:`the //timeseries#phot-0 mixin` to build things. If you have other time series, please write mail to dachs-support. To see things in action, refer to :samplerd:`k2c9vst/q`, the instance table and the corresponding makes. However, there is an additional complication, showcased in :samplerd:`gaia/q2` and :samplerd:`bgds/l`: it is quite common to have time series in multiple bands in one resource. For DaCHS, this is a bit of a problem, because the band influences quite a bit of the table metadata in DaCHS – this is in the mixin, and what you set there is fixed once the table instance is made. To get around this, look at the technique shown in :samplerd:`bgds/l`. This first defines a STREAM ``time-series-template`` with a macros where the items very between bands:: band_short, band_human, band_ucd, effective_wavelength i, SDSS/i, em.opt.I, 7.44e-7 r, SDSS/r, em.opt.R, 6.12e-7 The dispatch between the different table templates then happens in the data function of the tsdl service, using a somewhat obscure feature of ``rsc.Data``: when using the ``createWithTable`` class function, you can pass in the table that the data item should make. This obviously only works in specialised circumstances like the one here, but then it's really convenient. So, while the make in ``make_instance`` claim to build ``instance-i``, this is really overridden in the datalink service's data function to select the actual table definition:: dd = rd.getById("make_instance") descriptor.data = rsc.Data.createWithTable(dd, rd.getById("instance_"+descriptor.band)) descriptor.data = rsc.makeData( dd, data=descriptor.data, forceSource=descriptor) (where ``descriptor.band`` has been pulled from the dataset identifier in the custom descriptor generator of that datalink service; that pattern is probably a good idea when you are in a similar situation). 
This will not scale well to many dozens of bands – if you have that, you probably want somewhat more hardcore means –, but for the usual handful of bands this is a relatively reasonable way to produce time series with nice metadata. Publishing Anything Through Obscore ----------------------------------- “Obscore”, in VO jargon, refers to a publication of datasets by putting their metadata into a TAP-queriable database table with a bespoke set of columns. It lets people pose very complex constraints, even using uploaded tables, and it is flexible enough to support almost any sort of data the typed services (SIAP, SSAP) serve and a lot more. You may ask: Why have the S*APs in the first place? The answer is, mainly, history. Had we had TAP from the start, it is likely we had not bothered with defining standards for typed services. But that's not how things worked out, and thus client support of Obscore still is inferior to that of the typed services. However, with a view to a future migrating towards obscore, it is certainly a good idea to publish data through obscore, too. The good news is that in DaCHS, that is generally close to trivial. You will sometimes see something called ObsTAP mentioned. This was meant to refer to “Obscore queried through TAP”, but since, really, everyone uses Obscore through TAP, people do not say ObsTAP much any more. If you see it somewhere, pretend it is really saying Obscore. Before you can do anything with obscore, you have to run:: dachs imp //obscore This will also declare support for the obscore data model in your TAP service's registry record, which will make all-VO obscore queries use your service. Avoid that if you do not really publish anything through Obscore. To drop Obscore if you have accidentally imported it, run:: dachs drop --system //obscore Internally, the ``ivoa.obscore`` table is implemented as a view. If this view contains bad SQL or tables that have been dropped, working with obscore can result in rather confusing messages. If that happens, try:: dachs imp //obscore recover This should remove the bad content from the view statement. Obscore Derived from Typed Service Tables ''''''''''''''''''''''''''''''''''''''''' Obscore defines a single table called ``ivoa.obscore``. In DaCHS, that table typically contains data from a multitude of resources with different metadata structures. To keep that manageable, DaCHS implements the table as a view, where the individual tables are mapped onto that common schema. These mappings are almost always created using a mixin from the //obscore RD. Filling out its parameters will result in SQL DDL fragments that are eventually combined to the view definition. In case you are curious: The fragments are kept in the ``ivoa._obscoresources`` table. There is some documentation on what to put where in the mixin documentation, but frankly, as a publisher, you should have at least passing knowledge of the obscore data model (:bibcode:`2017ivoa.spec.0509L`). When you start with a table underlying a typed service, you can get away with just saying something like (using SIAP as an example):: mixin="//obscore#publishSIAP" to the table definition's start tag. You do not have to re-import a table to publish it to Obscore when you have already imported it – ``dachs imp -m && dachs imp //obscore`` will include an existing table in the obscore view. When you import data without the ``-m`` flag, the mixins arrange for everything, so you do not need the extra step of importing ``//obscore``. 
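As a concrete sketch (the table id is a placeholder, and ``//siap#pgs`` stands for whichever typed-protocol mixin your table already uses), such a start tag might read::

  <table id="main" onDisk="True"
      mixin="//siap#pgs //obscore#publishSIAP">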
Since the Obscore data model is quite a bit richer than SIAP's and just a bit richer than SSAP's, you will usually want to add extra metadata through the mixin, for instance:: //obscore#publishSIAP Again, a ``dachs imp -m`` followed by an import of ``//obscore`` would be enough to make these changes visible in ``ivoa.obscore``. See `SIAP and Obscore`_ and `SSAP and Obscore`_ for more information on how to Obscore-publish typed data. Obscore Notes ''''''''''''' Dataset Identifiers ................... Obscore uses the concept of dataset identifiers rather extensively, and it is not unlikely that queries against the ``obs_publisher_did`` column will be run – not the least in connection with datalink, in which the DID has the role of something like a primary key. DaCHS' obscore-associated datalink service, for instance, will do such queries, and will be slow if postgres has to seqscan large tables for pubDIDs. While DaCHS probably does a good job with creating usable (and globally unique) publisher DIDs, it will not index them by default. Use the ``createDIDIndex`` parameter of the various mixins to make one if your data collection contains more than just a few hundred entries and there is no index on it anyway. On the other hand, the creator DID would be assigned by whoever wrote the data file, and you should not change or invent it. It was intended to let people track who republishes a given data set, weed out duplicates, and the like. Regrettably, only very few data providers assign creator DIDs, so it's probably not worth bothering. If you are in a position in which you could make your data provider generate creator DIDs, you could make them set a good precedent. DaCHS helps you by letting you claim an authority for them (which would be the first step). See :samplerd:`tutreg/gavo_edu_auth` for an example RD that, when ``dachs pub``-ed, will claim an authority for your publishing registry, and see `Claiming an Authority`_ for the background on authorities. target_class ............ The obscore model has the notion of a target class for pointed observations; this is intended to cover use cases like “get me spectra of Galaxies” or so. Of course, this only works with a common vocabulary of object types, which does not actually exist in the VO at this time. The next best thing is `SIMBAD's types `_, which are to be used until then. s_region ........ Obscore has two ways to do spatial queries: using s_ra, s_dec, and perhaps s_fov on the one hand, and using s_region on the other. That is a bit unfortunate because in practice you have to have two indices over at least three columns. Also, DaCHS really likes it if columns are type-clean, and thus the mixins take quite some pains to make sure only polygons are in ``s_region``. Given that ``s_region`` in our obscore has an xtype of ``adql:REGION`` and is thus polymorphic, you *might* get away with having other types in there. No promises in the long term, though. Having said all that: please make sure that whenever there is a position of some kind, you also fill s_region; this is not a problem in SIAP, but where you only have a position and aperture, in a pinch fill in something like:: pgsphere.SCircle.fromDALI( [alpha, delta, aperture]).asPoly(6) (:dachsref:`the //ssap#setMeta mixin` already does that when both a position and an aperture are available). See also `Creating pgSphere Geometries`_ for more information on how to fill geometry-valued columns.
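In a rowmaker, such a fallback could be a map somewhat like this – a sketch that assumes ``@alpha``, ``@delta``, and ``@aperture`` have been set up in vars; the names are made up::

  <map key="s_region">pgsphere.SCircle.fromDALI(
    [@alpha, @delta, @aperture]).asPoly(6)</map>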
Pure Obscore Tables ''''''''''''''''''' You can also have “pure” Obscore tables which do not build on protocol mixins. A live example is the ``cubes`` table in the :samplerd:`califa/q3` RD within the GAVO data centre. Here is a brief explanation of how this works. Somewhat like with `the SSA view`_, you define a table for the obscore columns varying for your particular data collection. In that table's definition, re-use the metadata given in the global obscore table. A compact way to do that is through a LOOP (see `Active Tags`_) and ``original`` references, exploiting the ``namePath`` on :dachsref:`Element Table`::
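  <!-- an illustrative sketch only, not the actual califa/q3 code: the
    table name and the column selection in listItems are made up;
    adapt both to your data -->
  <table id="cubes" onDisk="True" namePath="//obscore#ObsCore">
    <LOOP listItems="obs_id obs_title t_min t_max em_min em_max">
      <events>
        <column original="\item"/>
      </events>
    </LOOP>
  </table>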
``adql="True"`` is absent here as the obscore mixin will set it later. To just have all obscore columns in your table, you can write:: instead of the LOOP. If you do not have any additional columns (which you can of course have) and just want to have your datasets in the obscore table, consider having ``hidden`` after the obscore mixin. This will make your table invisible to but still readable by TAP. This is desirable in such a situation because the entire information of the table would already be contained in the obscore table, and thus there is no real reason to query the extra table. In the Califa example cited above, that is not the case; there is a wealth of additional columns in the custom, non-obscore table. We believe this will be the rule rather than the exception. For a quick overview over what column names you can have in the ``listItems`` above, see the `obscore table description`_. Even with a custom obscore-like table, you will almost always want to have DaCHS manage your products. This works even when all your files are external (i.e., you're entering http URLs in ``//products#define``'s path), and so use :dachsref:`the //products#table mixin` (which you don't see with SIAP and SSAP as their mixins pull it in for you):: //products#table Then, apply :dachsref:`the //obscore#publish mixin`, which is like the protocol-specific mixins except it doesn't pre-set parameters based on what is already in protocol-specific tables:: //obscore#publish Essentially, what is constant is given in literals, what is variable is given as a column reference. It is a bit unfortunate that you have to enter quite a few identity mappings in here, but pre-setting them won't help in most cases. That's about it for defining the table. To fill the table, just have a normal rowmaker; since the table contains products, don't forget the :dachsref:`//products#define` rowfilter in the grammar. Registering Obscore-Published Datasets '''''''''''''''''''''''''''''''''''''' Most of the time, you do not need to worry about telling the Registry anything about what you do with obscore. As long as you have the obscore table, your TAP registry record will tell the Registry about it, and as long as that is published, clients looking for obscore-published data will find your service and thus your datasets (if they match the constraints in the obscore query, that is). In the other direction, when you register a service for a data collection published via a typed protocol, DaCHS will add a reference such that clients can see that the data is available through obscore, too. But when you do not register a typed service for your data collection for some reason, you should also register the standalone table as described in `publishing DaCHS-managed tables via TAP`_ SIAP version 2 '''''''''''''' SIAP version 2 is just a a thin layer of parameters on top of obscore. To publish with SIAP version 2, simply ingest your data as described in `publishing images via SIAP`_ and add :dachsref:`the //obscore#publishSIAP mixin`. In contrast to SIAP version 1, you do not define or register a service for a SIAPv2-published data collection. Instead, there is a sitewide SIAPv2 service at ``/__system__/siap2/sitewide/siap.xml``. It is always there, but it is unpublished by default. 
To publish it, you should furnish some extra metadata in `the userconfig RD`_ and then run:: dachs pub //siap2 Specifically, get the ``sitewidesiap2-extras`` stream and follow the instructions there to update the meta items as appropriate; at this point, they are exactly analogous to the ones for SIAP version 1. .. _Tody et al (2011): http://www.ivoa.net/Documents/ObsCore .. _obscore table description: http://dc.g-vo.org/tableinfo/ivoa.ObsCore Publishing DaCHS-Managed Tables via TAP --------------------------------------- In the simplest form, all you need to do to publish a table through the TAP endpoint is to add an ``adql="True"`` attribute to the table definition and update the metadata (by saying ``dachs imp -m ``). This will include basic table metadata including its columns in both the TAP schema and the tableset of the TAP service, which is marginally enough to make it discoverable once your TAP service is registered. You should, however, take particular care to give a useful description of the table, usually as a direct meta on the table. Keep in mind that people will typically discover the table through some sort of Registry query. Try to make sure they can figure out whether the table contains data useful to them by that description and the column metadata. However, both the TAP schema and the tableset only contain rather limited metadata. Hence, when there is no published service on the table (which will take care of producing a proper registry record), you should have a publication for the table itself, which substantially increases people's chances of locating the data in typical Registry queries. To publish such a non-service (usually a table definition, but you can register data descriptors containing multiple tables, too), use the :dachsref:`Element publish` on the table. For a simple table, just adding ```` to the table body and a subsequent ``dachs pub`` is enough. To have such a table turn up on the root page of your site – this then links to a page describing the table and giving some information on how to query it using TAP –, as for services, tell ``publish`` to include the local set, too:: In more complex scenarios involving multiple tables that should all be mentioned in the registration, you can define an “inactive” data element. An example is found in :samplerd:`califa/q3`, where five tables make up the data collection's table set:: CALIFA DR3 tables When publishing “non-obvious” tables to TAP, it is a good idea to add one or more TAP examples for it. See `Writing Examples`_. Publishing Externally Managed Tables via TAP -------------------------------------------- While we generally discourage not letting DaCHS do table creation and ingestion, sometimes there already is substantial infrastructure building a postgres table, in particular when data “lives”. If you now want to use DaCHS to publish it via TAP, just write an RD to describe the table, but make the data element trivial and updating (which means that DaCHS will not tear down the table if one accidentally runs ``dachs imp`` without an ``-m`` flag). Here's an example of what that could look like:: My great table ... (more metadata)
id of object covered here
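For orientation, here is a minimal sketch of such an RD (schema, table, and column names are invented; replace them with what is actually in your database)::

  <resource schema="mydata">
    <meta name="title">My great table</meta>
    <meta name="description">... (more metadata)</meta>

    <table id="objects" onDisk="True" adql="True">
      <column name="obj_id" type="text"
        description="id of object covered here"/>
      ...
    </table>

    <data id="import" updating="True">
      <make table="objects"/>
    </data>
  </resource>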
Within the data element you need one ``make`` for each table you want to publish. After that, say ``dachs imp -m ``. This adds the metadata you've given to all kinds of administrative tables DaCHS keeps, but DaCHS will not touch the rows in the tables. It will also try to fix the permissions of the table such that DaCHS' untrusted user can read it. To let DaCHS manage the permissions, in psql say (assuming standard profiles):: GRANT ALL PRIVILEGES ON SCHEMA <schema> TO gavoadmin WITH GRANT OPTION; GRANT SELECT ON <schema>.<table> TO gavoadmin WITH GRANT OPTION; If you have local users accessing the table, you may need to declare them in either the allRoles or readRoles attributes of the table definition (see :dachsref:`Element table`). See `publishing DaCHS-managed tables via TAP`_ on the Registry aspects, and in particular the use of a ``sets`` attribute to ``publish`` if you want the table(s) to be visible from your site's front page. EPN-TAP ------- EPN-TAP is a standard for publishing planetary data via TAP\ [#prevepn]_. The `EPN-TAP recommendation`_ is now in version 2.0, and support for that is provided by //epntap2. There's an official web-based client for EPN-TAP at http://vespa.obspm.fr. A hands-on guide on how to do EPN-TAP publications from scratch with a view to this custom client is `available on the VO Paris wiki`_. If you have not at least skimmed this document's `Introduction`_ and the `DaCHS Basics`_, by all means start there. .. _available on the VO Paris wiki: https://voparis-wiki.obspm.fr/display/VES/Setting-up+an+EPN-TAP+service%3A+Tutorial+for+Beginners .. _EPN-TAP recommendation: https://ivoa.net/documents/EPNTAP/20220822/ You can use EPN-TAP to publish data without any associated datasets; this happens, for instance, in the catalogue of minor planets, :samplerd:`mpc/q`. More commonly, however, there are data files (“products”) associated to each row. In this case, have at least:: optional_columns="access_url access_format access_estsize" – these are required to manage such products. When publishing datasets, there are two basic scenarios: * local files; you let DaCHS find the sources, parse them, and infer metadata from this; DaCHS will then serve them. That's what's shown in the quick start example below. We believe that is the more robust model overall. * ingest from pre-distilled metadata; this is when you don't have the files locally (or at least DaCHS should not serve them). Instead, you read metadata from dumps from other databases, metadata stores, or whatever. The :samplerd:`titan/q` RD shows an example for that. To start an EPN-TAP service, do as per `Starting from Scratch`_ and use the epntap template:: dachs start epntap Data in planetary sciences often comes in PDS format, which superficially resembles FITS but is quite a bit more sophisticated. Unfortunately, python support for PDS is underwhelming. At least there is PyPDS_, which needs to be installed for DaCHS' :dachsref:`Element pdsGrammar` to work. Quick Sample EPN-TAP Service '''''''''''''''''''''''''''' Install PyPDS if you don't have it already:: curl -LO https://github.com/RyanBalfanz/PyPDS/archive/master.zip unzip master.zip cd PyPDS python setup.py build sudo python setup.py install Get the sample data:: cd `dachs config inputsDir` curl -O http://docs.g-vo.org/epntap-example.tar.gz tar -xvzf epntap-example.tar.gz cd lutetia Import it and build the previews from the PDS images:: dachs imp q python bin/makePreview.py Start the server as necessary.
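If the server is not running yet, one way to bring it up just for experimenting is to run it in the foreground (stop it with Ctrl-C when you are done)::

  dachs serve debug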
You can now go to your local ADQL endpoint (something like http://localhost:8080/adql) and execute queries like:: SELECT * FROM lutetia.epn_core there. For access through a standard protocol, start TOPCAT_, select VO/TAP Query, and at the bottom of the dialog enter http://localhost:8080/tap (or whatever you configured) in “TAP URL”. Hit “Use Service”, wait until the table metadata is in and then again query something like:: SELECT * FROM lutetia.epn_core Hit “Run Query”, open the table and play with it. As a little visual treat, in TOPCAT's main window hit “Activation Action”, and configure the ``preview_url`` column under “View URL as Image”. Then click on the table rows. To get into Vespa's query interface, you will have to register your table. **Do not** do this with the sample data. Tables '''''' In essence, EPNcore is just a set of columns, some mandatory, some optional. The mandatory ones are pulled into a table by using :dachsref:`the //epntap2#table-2_0 mixin`, which needs the ``spatial_frame_type`` parameter (see the reference for what's supported for it) since that determines the metadata on the spatial columns. Optional columns can be pulled in through the ``optional_columns`` mixin parameter, and, as said above, a few of these optional columns are actually required if you want to publish data products through EPN-TAP. The reference documentation lists what is available. You can, of course, define further, non-standard columns as usual. So, an EPN-TAP-publishable table might be defined like this:: //epntap2#table-2_0
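A sketch of what such a definition could look like – the ``spatial_frame_type`` value and the extra column are just examples; consult the mixin reference for the parameters that apply to your data::

  <table id="epn_core" onDisk="True" adql="True">
    <mixin
      spatial_frame_type="body"
      optional_columns="access_url access_format access_estsize"
      >//epntap2#table-2_0</mixin>
    <column name="acquisition_id" type="text"
      description="Extra, non-standard column"/>
  </table>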
Filling EPNcore Tables '''''''''''''''''''''' To populate EPNcore tables, use the :dachsref:`//epntap2#populate-2_0` apply, identifying the parameters applying to your data collection and setting them as usual (cf. `Mapping Data`_). You may need to refer to the `EPN-TAP proposed specification`_ now and then while doing that. Note again that parameter values are python expressions, and so you have to use quotes when specifying literal strings. If you have to evaluate complex expressions, it is recommended to do the computations in :dachsref:`Element var`-s and then use the variables set there in the ``bind``-s (as ``@myvar``). This also lets you re-use values once computed. Even more complex, multi-statement computations can be done in :dachsref:`Element apply` with custom ``code``. Serving Local Products ...................... When DaCHS is intended to serve local files itself (which is preferable), * use the :dachsref:`//products#define` rowfilter in the grammar as usual (cf. `The Products Table`_). Note that this assumes by default that you are serving FITS files, which in EPN-TAP most likely is not the case. Hence, you will usually have to set the ``mime`` parameter as in, perhaps:: "image/x-pds" * in your row maker, use the :dachsref:`epntap2#populate-localfile-2_0` apply (if this gives you errors, make sure you have the optional columns for products as described above). Incidentally, you could still use that even for external products, which is useful if you have DaCHS-generated previews or want to attach a datalink service. In that case, however, you have to invent some accref for DaCHS (probably the remote file path) and set that in products#define's ``accref`` parameter. The remote URI then needs to go into the ``path`` parameter. Serving External Products ......................... When all you have are external URLs, you do not need to go through the products table (though you still can, as described in `Serving Local Products`_). It is simpler, however, to just directly fill the ``access_url``, ``access_format`` and ``access_estsize`` columns using plain :dachsref:`Element map`-s. On ``s_region`` in EPN-TAP '''''''''''''''''''''''''' The ``s_region`` parameter (see :dachsref:`//epntap2#populate-2_0`) is essentially a footprint describing the area covered by 2D spatially extended data products. It uses pgsphere types such as ``spoly``, ``scircle``, ``smoc``, or ``spoint`` (we advise against the use of ``spoint`` as an ``s_region`` type: only spatially extended types should be used). The default type is ``spoly``; the others must be specified using the ``regiontype`` mixin parameter (see :dachsref:`//epntap2#table-2_0`). For more information on how to create values for these regions, see `Creating pgSphere Geometries`_. Service ''''''' EPN-TAP tables are queried through DaCHS' TAP service. If you have registered that, there is nothing else you need to do to access your data. For registration, just add:: to your table body and run ``dachs pub ``. Datalink -------- Datalink is not a discovery protocol like the others discussed so far; rather, it is a file format and a simple access protocol for representing relationships between parts of complex datasets. Essentially, datalink is for you if you have parts of a dataset's provenance chain, refined products like source lists and cutouts, masks, or whatever else. Together with its companion standard SODA, it also lets clients do server-side manipulations like cutouts, scaling, format conversion, and the like.
Datalink is particularly attractive when you have large datasets and you don't want to push out the whole thing in one go by default. Instead, clients can then query their users for what part of the dataset they would like to get – or alert them to the fact that a large amount of data is to be expected. Since Datalink is very flexible, defining datalink services is a bit involved. The reference documentation has a `large section on it`_. Here, we discuss some extra usage patterns. The concrete application to spectra and images is discussed in `SSAP and Datalink`_ and `SIAP and Datalink`_. See also the `Datalink showcase`_ in the GAVO data centre for live examples of datalink documents. .. _Datalink showcase: http://dc.g-vo.org/static/datalinks.shtml .. _large section on it: http://docs.g-vo.org/DaCHS/ref.html#datalink-cores Associating Datalink Services ''''''''''''''''''''''''''''' In DaCHS, Datalink services are associated with tables. This association is declared using the ``_associatedDatalinkService`` meta item, which consists of a ``serviceId`` (a service reference as per `referencing in DaCHS`_) and an ``idColumn`` (stating from which column the ID parameter to the datalink service is to be taken). So, within the table, you add something like:: dl pub_did This implies that the service ``dl`` within the current RD will produce a datalink document if passed a string from ``idColumn``. The example implies that this column ought to contain publisher DIDs (see `Dataset Identifiers`_), which is what the standard descriptor generators that come with DaCHS like to see. Since publisher DIDs tend to be a bit unwieldy (they are supposed to be globally unique, after all), the standard descriptor generators will also let you pass in plain accrefs. If you write your own descriptor generator, you are free to stick whatever you like into the ``idColumn``, just so long as the table and the descriptor generator agree on its interpretation. Datalinks in Columns '''''''''''''''''''' The ``_associatedDatalinkService`` declaration discussed in the previous section is all it takes when you serve data to datalink-aware clients. If, however, you also want to cater to clients without native datalink support, you may want to add links to the datalink documents in your responses; this is particularly advisable when you have services working through forms in web browsers. One way to effect that is by defining a column like this:: application/x-votable+xml;content=datalink Datalink The property declarations add some elements to response VOTables that inform clients like Aladin_ what to expect when following that link. At this point, this is a nonstandard convention. You will then have to fill that column in the rowmaker. As long as the product is being managed through the products table and you thus used the :dachsref:`//products#define` rowfilter in the grammar, all that takes is a macro:: \dlMetaURI{dl} Here, the “dl” in the macro argument must be the id of the datalink service. This method will retain the datalink columns even in protocol responses. While at this point there is something to be said for that, because users immediately discover that datalink is available, datalink-aware clients will then have both the datalink through ``_associatedDatalinkService`` and the in-table column, which, since they cannot know that the two are really the same, will degrade user experience: Why should the same datalink be present twice?
With increasing availability of datalink-aware protocol clients, we therefore prefer a second alternative: produce the extra datalinks only when rendering ``form`` responses. To do that, furnish web-facing services with an :dachsref:`Element outputTable`. In there, do not include the column with your publisher DID but instead produce a link directly to the links response, somewhat like this:: ... yield T.a(href=getDatalinkMetaLink( rd.getById("dl"), data) )["Datalink"] application/x-votable+xml;content=datalink Datalink Datalinks as Products ''''''''''''''''''''' In particular for large datasets, it is usually a good idea to keep people from blindly pulling the data without first having been made aware that what they're accessing is not just a few megabytes of FITS. For that, datalink is a good mechanism: you point to a links response as the primary document retrieved. Of course, without a datalink-enabled client people might be locked out from the dataset entirely. On the other hand, DaCHS comes with a stylesheet formatting links responses to be usable in a common web browser, so that might still be preferable to overwhelming unsuspecting clients with large amounts of data. To have datalinks rather than the plain dataset as what the accref points to, you need to change what DaCHS thinks of your dataset; this is what the ``//products#define`` rowfilter in your grammar is for:: \dlMetaURI{dl} 'application/x-votable+xml;content=datalink' 10000 [...] [...] The ``fsize`` here reflects an estimation of the size of the links response. When you do this, you must use a descriptor generator that does not fetch the actual file location from the path in the products table, since that column now contains the URI of the links response. For FITS images, you can use the ``DLFITSProductDescriptor`` class as :dachsref:`//soda#fits_genDesc`'s ``descClass`` parameter. The base functionality of a FITS cutout service with datalink products would then be:: My Cutout Service 'mysvcs/data' DLFITSProductDescriptor If you have something else, you will have to write the resolution code yourself – ``DLFITSProductDescriptor``'s sources (in gavo.protocols.datalink) should give you a head start on how to do that; see also the tsdl service in :samplerd:`bgds/l` for how to integrate that into your RD. Note that DaCHS will not produce automatic previews in this situation. Have a look at :dachsref:`Product Previews` for what to do instead. HiPS ---- The `Hierarchical Progressive Survey`_ is a nifty way to publish “zoomable” data, in particular for images (but catalogues work as well). Conceptually, HiPSes are static data, which you will first have to generate. We will cover publishing an image HiPS from FITS files here. .. _Hierarchical Progressive Survey: http://ivoa.net/documents/HiPS/20170519/index.html Building the HiPS ''''''''''''''''' First, get `Hipsgen`_, a piece of java that will do the necessary math. Move the downloaded file ``Hipsgen.jar`` to a convenient place; the following assumes it's in your home directory. If you want to publish HiPSes, at least skim the `hipsgen user manual`_ before proceeding. To generate your HiPS, you next need to build a parameter file. In DaCHS, you will conventionally have that in ``res/hips.params`` (do put it under version control). DaCHS will generate a template for you. Use something like (mkdiring ``res`` if necessary):: dachs adm genhips q#import 4 > res/hips.params Here, ``q#import`` points to the ``data`` element importing the images you want to turn into a HiPS.
The ``4`` in the example gives the smallest order to generate a HiPS for. The value here depends on the coverage of your data collections. Use 0 when you have full sky coverage, 1 for half the sky, and so on. To generate a HiPS for a “survey” of a one-degree patch, you would use 6. This creates a control file for hipsgen that will guess parameters to turn the FITS files the grammar will presumably read into a HiPS in the hips subdirectory of your resource directory. .. _hipsgen user manual: https://aladin.cds.unistra.fr/hips/HipsgenManual.pdf .. _Hipsgen: https://aladin.cds.unistra.fr/hips/Hipsgen.jar It is likely that you will want to edit the hips.params. For one, ``adm genhips`` contains a few rather naive heuristics on how to come up with the inputs directories and identifiers. But mainly, the hipsgen manual describes numerous options for use on the command line that influence how hipsgen works (e.g., sky background subtraction, cutting off borders, and much more that DaCHS simply cannot guess). Put these options into your ``hips.params`` rather than on the command line, and you will have a much easier time re-generating the HiPS some later time. In simple cases, you may get away with what DaCHS has generated. Either way, once hips.params is ready, run:: java -Xmx4g -jar ~/Hipsgen.jar -param=res/hips.params *inside the resource directory* (``-Xmx4g`` means “give it 4 Gig RAM”; increase as sensible). In particular if experimenting on a large data collection, consider using ``-pilot=10`` (for just processing 100 input images) first and proceed to inspect the visual appearance of your new HiPS before wasting more resources on a computation that could possibly be improved quite substantially with a modicum of tweaking. This produces (with quite a bit of computation) the hierarchy of files that then serves as a HiPS. Publishing the HiPS ''''''''''''''''''' In principle, you could use a static renderer to publish this HiPS; once computed, DaCHS only needs to hand out files. However, for proper registration and some minor smoothing, there is a custom renderer for HiPS, :dachsref:`the hips renderer`. With this, a service for handing out your HiPSes looks like this:: Fornax Cluster Core in HiPS A HiPS generated from the high-resolution image. hips The ``id`` is of course up to you. With our choice here, the URIs into the service look a bit silly (``...q/hips/hips``). Shortening the description over the RD's description is probably a good idea; HiPS descriptions tend to be one-liners. The staticData property needs to point to the subdirectory into which you built your HiPS (what is given here is the default of ``adm genhips``). Finally, the nullCore says that this service will never do any computation. With this service, DaCHS has enough information to complete the ``hips/properties`` file that keeps HiPS metadata in a proprietary format. Hence, run:: dachs adm hipsfill q#hips (the argument is the DaCHS id of the service you just defined). Have a look at ``hips/properties`` after that and fix things as necessary. Note that hipsfill will not touch lines that are not commented out. If you want it to re-compute a line, prefix it with a hash. If your HiPS is relatively small, consider pointing the Aladin lite in the ``hips/index.html`` file generated by Hipsgen to some position that actually has imagery.
To do that, locate the instantiation of aladin in that file and edit it to read somewhat like this:: var aladin = $.aladin("#aladin-lite-div", {showSimbadPointerControl: true, target: "54.625 -35.455", fov: 1}); (for starting up at α=54.625, δ=-35.455 with a field of view of a degree). You could publish this service to the registry in the usual way. However, it is rather likely that you already have a service (e.g., SIAP or TAP) on the data collection in question. In that case (and ``dachs adm genhips`` in effect assumes it), it would be wasteful to create a second resource record for the same data collection. Instead, *add* a hips capability to the existing record using:: in the service element of the *other* service. Run ``dachs pub`` after doing this. biblink-harvest --------------- These are facilities for communicating metadata to VO-external metadata services like the ADS. At this point, we have little to add to the material in the reference documentation: :dachsref:`Bibliographic Links and biblink-harvest`. HTML form-based services ------------------------ In principle, you can often just add ``form`` to the list of allowed renderers in a service and have a service display an HTML form and return an HTML table. However, such a service will typically not be very pretty or usable – DaCHS services are normally intended to be consumed by machines. These do not care about a plethora of parameters they will never use, and they can easily fix the display of units or a large number of columns for their users. Web browsers and humans are not good at that, and so it often pays to build extra services targeted at browser users next to typed services – or perhaps have stand-alone forms, for instance on top of a TAP-published table. Soap box: don't waste too much time on form-based services when there are standard protocols to access the data – they're a pain to use, even if, and I admit that, most of your initial users may not have realised that yet. Consider every minute spent on form services as a compromise with an imperfect world. Of course, there are exceptions. Me, for instance, I kind of like our `sasmirala service`_, which doesn't work without a good dash of HTML. .. _sasmirala service: http://dc.g-vo.org/sasmirala/q/im/form Basics '''''' The template for a form-based service to a database table is:: ____ web As usual, the short name must be shorter than 17 characters, and queriedTable contains the id of the table you want to query. That's an XML id or a DaCHS cross-RD reference, *not* the name the table has in the database. Of course, giving more of the usual service metadata usually is a good idea. Query Parameters '''''''''''''''' The query parameters are generated from the condDescs of the dbCore; see `Service Definitions`_ for a bit more on them. So, the classical pattern for adding such a parameter is:: This will enable all kinds of VizieR-like expressions on col_to_query, where the operators available depend on whether col_to_query is a string, a number, or a timestamp. If you've run ``dachs limits`` [#belim]_ on the RD, you will also get placeholders giving the ranges available for numeric columns. However, there are cases when you want to fight the horror vacui (sitting in front of an empty form without an idea what to put where) more carefully, in particular with enumerated columns. In that case, you can tell DaCHS to produce selection boxes.
This is based on building ``condDescs`` from ``inputKeys`` with options:: where ``calib_level`` references a column that carries the remaining metadata. When the column already carries such a ``values`` element, you can skip the values on the input key. In that case, even a simple ``buildFrom`` will produce a selection box. Note, however, that giving the values on the column itself will make DaCHS reject values not mentioned when importing. Manually constructing the input key also lets you control how many items the browser will show in the selection box by giving a ``showItems`` attribute to the input key (see below for an example). Making ``showItems`` -1 will use checkboxes (or radiobuttons) instead of a selection list. Explicitly enumerating the options is inconvenient when there is an open-ended list of terms, as for instance with object names, ``obs_collection`` in obscore, or perhaps band or emulsion names. For such cases, DaCHS lets you pick the values from the database itself using the values element's ``fromdb`` attribute. This must be a SQL query fragment without the SELECT returning exactly one column. For instance, :: Note, however, that DaCHS will execute this query when loading the RD. Hence, if this is a long-running query, the RD will take a long while to load, which usually is unwelcome. Simply make sure that the query can use an index if your table has a non-trivial size. On DaCHS newer than 2.5.1, instead of fromdb simply give the source column a ``statistics`` property ``enumerate``. Then, ``dachs limits`` will gather the necessary statistics itself (on the obscore, this is already done). DaCHS will also evaluate the ``multiplicity`` attribute of such input keys. If it is ``multiple`` (which is the default in this situation), widgets are chosen that let users select multiple items; if it is ``single``, the widgets will only admit a single selected item. In all these cases, the input keys will generate constraints using DaCHS' default rules. This includes the various input syntaxes like the VizieR expressions mentioned above; use ``vexpr-float``, ``vexpr-string``, or ``vexpr-date`` as ``type`` when building explicit input keys rather than using ``buildFrom`` *and* you want these extra syntax things. With explicit input keys, you could also tell DaCHS to understand PQL-like syntax (as in SSAP) with the types ``pql-int``, ``pql-float``, ``pql-string``, and ``pql-date``, but, frankly, I don't think that's a good idea. When this is not enough and you need to generate custom query fragments, see `More on CondDescs`_. Output Table '''''''''''' In browser-based services, you are directly talking to the user, and therefore you probably want to return not too many columns. I can also confidently predict that even in the 2020s you will still be asked for horrible sexagesimal coordinates and wavelengths in whatever twisted units your data providers prefer (though I have to say that moving away from the VO's preferred wavelengths *is* a good thing). For column selection, DaCHS by default picks the columns with a ``verbLevel`` up to and including 20. Columns excluded in this way can be added back using the “more output fields” selector in the query form. However, in particular for services built on top of standard services, that selection is often… improvable. In such cases, you can define an output table on the service. The idiomatic way to do that is to copy over the columns you want to retain.
The :dachsref:`Element outputTable` has a legacy ``autoCols`` attribute for that, but now that DaCHS has LOOPs, my advice is to use the following pattern, because it makes it much easier to sprinkle in modified columns now and then:: The first thing to note here is the ``namePath``, which tells DaCHS where to resolve the columns by default (this will default to the table the core queries, so you can usually leave it out). Then, we use `Active Tags`_ to copy over columns we want to treat uniformly. In this case, we first copy over a few columns literally, but then want to furnish other columns with :dachsref:`display hints`: here, we mogrify the positions to sexagesimal (definitely not recommended) and do some unit conversion and numeric formatting for the field of view. Then we give both spectral coordinates in Angstrom (this assumes optical astronomers, obviously; a propos of which I should also call out the spectralUnit display hint DaCHS has since version 2.6). The element concludes with another loop copying literally. Note that you can usually give the display hints in the column definition, as they will not hurt the protocol access. This would simplify the element above because you can then copy things without further modification. An extra (questionable) benefit of giving the display hints in the column definitions in the table is that these will even be picked up when users do ADQL queries through the web interface. In case DaCHS' built-in display hints are not enough, you can use the ``select`` and ``formatter`` attributes discussed in the reference (:dachsref:`Element outputField`). For instance:: return "x".join(str(s) if s else "1" for s in data) which also shows how to select multiple values at a time. You can also produce arbitrary HTML using stan syntax (cf. :dachsdoc:`templating.rstx`, the section on in-python stan):: return T.a(href="/glots/q/showtables/qp/%s"%urllib.parse.quote(data))[ data[6:]] Example: Form-based obscore ''''''''''''''''''''''''''' Several of the examples above were from :samplerd:`obsform/q`, an RD that is `active in the GAVO data centre`_. If you have an obscore service of your own, you can probably just pull the resdir into your inputs and have the contents of that accessible to browser users, too. .. _active in the GAVO data centre: http://dc.g-vo.org/obsform/q/web It showcases a few tricks that may come in handy in your own form creations. First, there is a cone search condition choosing custom columns; this is necessary here because the obscore s_ra and s_dec columns do not have the UCDs expected by the SCS condDesc (requires DaCHS 2.6+):: "s_ra" "s_dec" The next condition descriptor, ````, showcases how DaCHS turns values metadata attached to the table column into a selection box. To inspect the respective column definition, see the output of ``dachs adm dump //obscore``. Then there is a condition descriptor for the collection column:: This is a bit of a sore spot, as this will be slow at this point once your obscore table has grown to a certain size. This is because of a nasty conspiracy between the fact that you cannot have indexes on views and postgres not optimising for constants in DISTINCT queries. We will think about this a bit more deeply if and when this becomes a problem for other deployers. The really exciting parts are the last two conditions, those for spectrum and time. These are tricky because they are effectively interval-valued in the database, with a minimum and maximum column each.
We still want to let people enter our VizieR-like expressions, which means we have to do what DaCHS does behind the scenes for ``buildFrom`` manually. This at the moment requires a certain amount of manual code (that will also only work on DaCHS newer than 2.5). Here's the code for the spectral condition:: obscore = parent.parent.queriedTable minCol = svcs.InputKey.fromColumn( obscore.getColumnByName("em_min"), inputUnit="Angstrom") maxCol = svcs.InputKey.fromColumn( obscore.getColumnByName("em_max"), inputUnit="Angstrom") try: tree = vizierexprs.parseNumericExpr(inPars[inputKeys[0].name]) except utils.ParseException as msg: raise base.ValidationError( f"Bad VizieR syntax: {msg}", "BAND") res = vizierexprs.NumericIntervalFlattener( ).getSQLFor(tree, (minCol, maxCol), outPars) yield res So, we first define a fully synthetic input key – none of the metadata of ``em_min`` or ``em_max`` is terribly helpful here. Then we declare a phrase maker. Once we have a good idea how to write this, we will probably give a procDef that hides quite a bit of this ugliness; then again, we as a community probably should just use more interval-typed columns. Phrase makers are ``procDef``-s like, say, ``apply``, and hence they consist of a ``setup`` part executed when the RD is imported and a ``code`` part executed once per (in this case) query. Here, we use the setup to find the min and max columns. We make input keys from them, as that is what the flattener we use later expects. The ``fromColumn`` constructor also lets us smuggle in code for unit adaptation (the ``inputUnit`` attribute that is turned into a scaling factor by the input key machinery). Whatever is defined in a procedure definition's ``setup`` code is available in its ``code``; we will use ``minCol`` and ``maxCol`` in a moment. First, however, we have to deal with the VizieR-like expressions we are expecting. To parse them, we are using ``parseNumericExpr`` from the ``svcs.vizierexprs`` module. This returns a parse tree that can, in turn, be translated into SQL. Before we do that, we catch parse errors (which DaCHS would return as 500 internal server errors) and turn them into ``ValidationError``-s for the BAND parameter. This lets DaCHS' form renderer mark up errors inline rather than just spit out some ugly error page. The actual SQL generation happens using a ``NumericIntervalFlattener``, which encapsulates how to translate the various VizieR constructs to SQL. If you think they should be translated differently, you could derive your own flattener – see ``gavo.svcs.vizierexprs`` on how to do that. Its ``getSQLFor`` method takes the parsed expression, the input columns, and the dictionary of SQL parameters that gets passed into ``condDescs`` implicitly. It is the use of two column objects rather than just one that makes these interval flatteners so special that you currently cannot get by without custom code. If you have understood this code, the condition descriptor for TIME will not be very surprising: You just use ``parseDateExprToMJD`` to get the parse tree (and would use ``parseDateExprToDateTime`` if you had timestamp-valued columns in the database). Once that's done, the remaining code is essentially the same, as date and numeric intervals are rather parallel in the VizieR grammar. By the way, these two are obvious candidates for writing a common ``procDef``. Don't be too surprised if the actual SVN code already has that by the time you read this.
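Until then, here is a sketch of what the TIME condition's phrase maker could look like under these assumptions – it is untested and simply transcribes the BAND code above to ``t_min``/``t_max`` (which are already MJDs in obscore) and the date parser::

  obscore = parent.parent.queriedTable
  minCol = svcs.InputKey.fromColumn(
      obscore.getColumnByName("t_min"))
  maxCol = svcs.InputKey.fromColumn(
      obscore.getColumnByName("t_max"))

  try:
      tree = vizierexprs.parseDateExprToMJD(inPars[inputKeys[0].name])
  except utils.ParseException as msg:
      raise base.ValidationError(
          f"Bad VizieR syntax: {msg}", "TIME")
  yield vizierexprs.NumericIntervalFlattener(
      ).getSQLFor(tree, (minCol, maxCol), outPars)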
Services for Editing the Database ''''''''''''''''''''''''''''''''' Users can enter data into the database in many ways. The most common would be through uploads, either directly into the database (as, e.g., in :samplerd:`theossa/q`), by doing file uploads which then get periodically ingested (as, e.g., in :samplerd:`lightmeter/q`), or by being harvested (the classical example would be the relational registry, :samplerd:`rr/q`). Sometimes, however, it is convenient to let people interactively edit database content. We don't have a service that does this in the GAVO data centre; there is a somewhat contrived example for that at http://docs.g-vo.org/editsample_q.rd; to play with it, put it into ``/var/gavo/inputs/editsample/q.rd``. If you have a look at the RD's content, you will first see the definition of the table to be edited, ``objlist``, and a data item that fills it with more or less random data. If you have never used an embedded grammar before, a brief glance at this might be inspiring. To continue with the example, run ``dachs imp q``, as you will need the table to edit it. Linking to Edit Services ........................ The RD then has a ``view`` service, which lets you query the table using the object id. In addition to normal DaCHS fare, it has this:: return T.a(class_="buttonlike", href="/\rdId/edit/form?" +urllib.parse.urlencode({ "id": data[0], "remarks": data[1] or ""}))["Edit this"] By saying ``original="objlist"``, you tell DaCHS to base the output table on the full table in the database (see also `Output Table`_ for alternative ways of getting output fields). What's new is the ``edit`` output field; see http://localhost:8080/editsample/q/view and send off the empty form to see its effect. It has a ``select`` attribute that directly gives material to put into the select clause. We use an array because we want both the id (to know what we will be editing) and the remarks (in order to make it easy not to lose old remarks). If you want to make more or all fields editable, it is probably preferable to use the ``wantsRow`` attribute of :dachsref:`Element outputField`; in that case ``data`` within formatter is the entire row as a dictionary. The formatter then produces stan, which is essentially HTML written in a slightly cooler way (see :dachsdoc:`templating.html#in-python-stan`). Here, I am setting a ``class`` attribute (the underscore is to dodge python's ``class`` keyword) so you could style the link with `Operator CSS`_, and I am computing the URL in a halfway convenient way: using the ``\rdId`` macro makes this thing independent of what RD it lives in, and I am using ``urllib.parse.urlencode`` (which is part of the functions available in rowmakers, which you have in formatters, too) to robustly produce a query string. In this particular case, we only want to edit the ``remarks`` column. If you wanted to edit more columns, you would add the respective columns in a similar way. Note, however, that urlencode will encode ``None`` as literal ``"None"``, which is rarely what you want; if you may have null values, make sure you map them to empty strings manually. Writing an Edit Service ....................... The actual edit service is protected to prevent accidental overwrites by rampaging robots:: – see `Restricting Access`_ for how to manage users, groups, and credentials in DaCHS.
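In the RD, such protection boils down to a ``limitTo`` on the service element, somewhat like this (a sketch; the group name is made up)::

  <service id="edit" limitTo="editors">
    ...
  </service>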
I am also setting a link back to the view service:: \internallink{\rdId/view} This helps people to quickly go from the edit service's sidebar back to where they can query the table (but see below for a plausible alternative). Since I cannot see major common use cases in this problem's vicinity, DaCHS has no built-in cores for editing things. Instead, write an :dachsref:`Element pythonCore`. This starts with declaring the input and output structure, where I am requiring both inputs, and I set a manual ``widgetFactory`` to give ``remarks`` a nice text input box:: Don't question too much what that ``widgetFactory`` thing is. It is ancient and venerable, and has been in need of fixing for 15 years. Just take it as a scheme for producing text boxes rather than single-line input fields as it stands. Ahem. You might consider giving the input key for the id a ``widgetFactory`` of ``Hidden``; that would not produce a user-editable widget, which in this scenario *might* be preferable; but then I almost always prefer to assume I am dealing with sensible people, and for them, editing the id might one day be useful, and they would otherwise leave it alone. The output table as defined here is just the object list again; the plan here is to return the edited line. Alternatives are discussed below. The action of python cores is defined in an :dachsref:`Element coreProc`, which is a regular procedure application (just as the :dachsref:`Element apply` you may know from rowmakers). It could look like this:: with base.getWritableAdminConn() as conn: if 1!=conn.execute("UPDATE \schema.objlist" " SET remarks=%(remarks)s" " WHERE id=%(id)s", inputTable.args): raise base.ValidationError("No row for id '{}'".format( inputTable.args["id"]), colName="id") return rsc.TableForDef( self.outputTable, rows=list(conn.queryToDicts("SELECT * FROM \schema.objlist" " WHERE id=%(id)s", inputTable.args))) So, we basically rely on DaCHS' built-in input validation and just turn the items from ``inputTable.args`` – which is sufficiently dictionary-like for our database interface – into an update query. I am using a writable admin connection here, as normal DaCHS tables (the ones originating from a plain ``dachs imp``) are non-writable by table connections, and the normal table connections cannot write at all (cf. :dachsref:`Database Queries`). You *might* consider making such editable tables writable by the normal web user, too, but for now I have no plans to make admin connections somehow inaccessible to the web server any more, so that is probably not terribly useful. ``conn.execute`` returns the number of rows that were touched by an operation. I am making sure that is one here, and if that is not the case, I am raising a ValidationError with ``colName="id"`` (there cannot be more than one row touched because id is a primary key). Giving the ``colName`` lets DaCHS mark the location of the problem in its browser interface – try it. If everything has gone well, I am building an output table out of the modified row. This is what DaCHS displays in response to a successful request. While that behaviour makes sense – it lets people verify that their edits did what they expected they would –, it is unlikely that users will like it very much. It is more likely that they would like to get back to the original table display.
To effect that, make DaCHS redirect there, perhaps with a restriction to the edited id:: raise svcs.Found("/\rdId/view?"+urllib.parse.urlencode({ "id": inputTable.args["id"], "__nevow_form__": "genForm"})) To make this work, you have to acquaint your python core with the gavo.svcs module that defines the ``Found`` exception (this is just an API to produce a 302 Found HTTP status code); the most succinct way to do that is to add:: to the coreProc's content. The somewhat cryptic ``"__nevow_form__": "genForm"`` is a weird implementation detail. Without it, DaCHS will give the user a form filled out with the id. With it, it will actually execute the query. Looking Further ............... I give you that the proposed interaction feels somewhat clunky in today's world of animated widgets, whether or not it makes a lot of sense. I'd expect most scientists will put up with it eventually, though. But you *could* do in-place editing by inserting a solid helping of Javascript into a defaultresponse template (cf. :dachsdoc:`templating.html`). This would have a text box open on some user interaction, and once things are typed in, retrieve the URL of the service call we produce in the formatter using Javascript's ``fetch``. In such a scenario, it is certainly simpler if the service just returns, say, ``YES`` or ``NO`` depending on whether the update has succeeded. You would do that by returning a pair of media type and payload:: return "text/plain", "YES" If you made whole rows editable, you should probably return the entire row, too, presumably in JSON. To do that, you could write something like:: return "application/json", json.dumps( next(conn.queryToDicts("SELECT * FROM \schema.objlist" " WHERE id=%(id)s", inputTable.args))) Configuring DaCHS ================= While you can run services on DaCHS without further configuration, as soon as other people on the network get involved, you must or should configure your server in various ways. This section starts with a walkthrough of the various items you *must* configure in order to let your services work within the wider network and then covers a few more advanced topics. As delivered, the web interface of DaCHS will make it seem you are running a copy of the GAVO data centre, with some metadata defused such that you are not actually disturbing our operation if you accidentally activate your registry interface. To change this and make the site “yours”, you need to: * Set up a few basic items in gavorc * Set the fallback metadata * Adapt a few things facing the web In particular if running a busier site, you should also look into configuring your Postgres. There are many guides on this on the net; before delving into any of these, you may want to have a brief look at `this blog post`_ discussing relatively DaCHS-specific configuration issues. When you have larger tables, you should also have a look at a `post on parallel queries`_. .. _post on parallel queries: https://blog.g-vo.org/parallel-queries.html .. _this blog post: https://blog.g-vo.org/taming-the-postgres-jit/ Becoming a DaCHS operator ------------------------- When you write RDs, run ``dachs serve debug``, write in ``/var/gavo/etc`` or do similar DaCHS administrative work, you need to be in the ``gavo`` group, and things are simpler when you are a postgres superuser.
The default Debian package creates the ``dachsroot`` system user for that purpose but disables logins for it; you could run ``sudo -u dachsroot bash`` before doing DaCHS work or enable logins for dachsroot by running ``passwd dachsroot``. If, on the other hand, you want to be yourself when doing DaCHS work, add yourself to the ``gavo`` group:: sudo adduser `id -nu` gavo and make yourself a postgres superuser:: sudo -u postgres createuser -s `id -nu` If multiple people work in ``/var/gavo/inputs``, you will either want to play with POSIX ACLs or at least make things group-writable and make sure everything is group-owned by gavo. Also, set the setgid bit of all directories and subdirectories you create there, as in:: sudo chmod g+s /var/gavo/inputs That way, you minimise permission problems (see also ``help umask`` in bash or the equivalent in your shell if you don't use bash). Basic Gavo.rc Settings ---------------------- DaCHS keeps basic settings in ``/etc/gavo.rc``; a few of them you must set before you can go public. Here is a reference for what you minimally should give on a standalone server:: [general] maintainerAddress: you@mx.example.org [web] bindAddress: serverURL: http://mydc.example.org:8080 sitename: Example Observatory Archive The next few subsections explain the items here (and a few more that you will need in certain common deployment scenarios). More information on syntax and contents of gavo.rc is in the reference documentation's :dachsref:`Configuration Reference`. The Server URL, Ports, and Addresses '''''''''''''''''''''''''''''''''''' In today's internet with its reverse proxies and firewalls, a service cannot usually figure out by itself what it can be reached as. That is why you will almost certainly have to tell DaCHS in ``[web]serverURL``. For the basic setup, we recommend you try things with HTTP first. HTTPS adds quite a bit that can go wrong. See `Enabling HTTPS`_ on how to enable HTTPS once the rest runs as you would like it to run. Also note that serverURL is reflected in several internal tables of DaCHS. If you change it later, remember to run ``dachs pub -a`` to update them; the side effect is that (at least if you use DaCHS' built-in publishing registry) this will update the links in the VO Registry as well. There are a few scenarios: Standalone DaCHS ................ Connecting the DaCHS server directly to the internet is simpler and preferable if you can convince your network folks. You will still have to tell DaCHS what its machine is called from the outside. Also, by default DaCHS only listens on the loopback address, which in practice means you can only reach it from the host it runs on. This is mainly a matter of precaution: We feel a program should have informed consent before running a public service. To make it listen on all addresses configured for your machine, set ``[web]bindAddress`` to the empty string (you could also use the public IP here, but then DaCHS won't listen on localhost any more, and you will have to adapt some examples). Hence, the minimal configuration required for a public site is:: [web] bindAddress: serverURL: http://mydc.example.org:8080 In practice, it looks a lot less provisional if you run on port 80; on the other hand, DaCHS needs to be started as root when ordered to listen on a privileged port (it drops privileges soon, though). That would then look like this:: [web] bindAddress: serverPort: 80 serverURL: http://mydc.example.org Reverse Proxy .............
If there is an nginx or apache that serves as a reverse proxy and DaCHS does not talk with the net itself, you need to use the server name of the virtual host of that reverse proxy in serverURL, perhaps like:: [web] serverURL: https://the.proxy.url adaptProtocol: False If the reverse proxy runs on a different machine, you will additionally need an empty ``bindAddress`` as above. The ``adaptProtocol`` in there means that DaCHS will not try to change http vs. https in links towards itself depending on whether you speak with in encrypted; you'll almost always want this in proxy operation, in particular if your proxy is of the ugly sort that indisciminately “upgrades” to https. If the reverse proxy runs on the same machine, it may proxy other servers as well, and then it is likely that something else is already listening on port 8080. In that case, change serverPort without changing the serverURL, like this:: [web] serverURL: http://the.proxy.url serverPort: 8081 adaptProtocol: False – where serverPort obviously needs to be adapted in the proxy configuration, too. The Site Name ''''''''''''' The machinery needs some short name for the site in so many places that this is in the basic configuration (rather than the basic metadata, where it should reasonably be). This should be a very terse phrase saying who you are; for Heidelberg, that's “GAVO Data Centre”. That phrase goes into the ``[web]sitename`` item. The Maintainer Address '''''''''''''''''''''' There are several situations when the server wants to contact you (e.g., failed cron jobs). To let it do that, teach the server it is on to send mail (we like the nullmailer package for things like that) and set ``[general]maintainerAddress`` to a mail address you regularly read. Enabling e-mail for a server may be a challenge these day; theoretically, all it would take is install the nullmailer package and configure it with some SMTP smarthost. In practice, this may now be a lot more difficult. Ask your institution's mail people. If that does not go anywhere, you could install the exim4 package and configure a local address (for instance, dachsroot) as the maintainer address. You would then read the mail locally on your server. If all else fails, you can configure:: sendmail: cat >> /var/gavo/logs/dead.letters and then have a look at that dead.letters file now and then. But letting your computer send mail is still the preferred thing by far. Path Configuration '''''''''''''''''' If you would like to (perhaps temporarily, for instance in a ``~/.gavorc``) override where DaCHS looks for its tree (the ``/var/gavo`` we have been assuming here; this is often called ``$GAVO_ROOT`` in the reference documentation), set ``[general]rootDir``, perhaps like this:: [general] rootDir: /home/user/gavo You can also override other paths (see :dachsref:`Section [general]`), which are then are interpreted relative to rootDir unless they start with a slash. Two items you may want to change if your rootDir is on a network file system are ``[general]tempDir`` and ``[general]cacheDir`` – things probably run more smoothly if they are local. The Admin Password '''''''''''''''''' You can influence a running DaCHS server through its web interface after authenticating as ``gavoadmin``. The password to use for that is set in the ``[web]adminpasswd`` configuration item (and has no default, which means no remote administration is possible). 
You normally want this on a site that has publications now and then, as having it lets ``dachs imp`` and ``dachs pub`` tell the web component to release some caches, and it also makes ``dachs serve expire`` work (which is sometimes useful to expunge cached RDs). Authenticated gavoadmin can also access all restricted resources. The password is normally stored in clear text in gavo.rc. This lets a running DaCHS job (e.g., dachs imp or dachs pub) use it to clear caches on a running server. If you don't want or need that – you can always reload or restart manually –, you can store the admin password in hashed form. Run ``dachs adm hashPassword`` to do that Otherwise, as in other places, DaCHS assumes the machine it runs on is trusted. You can restrict gavo.rc access to the gavo group, but of course that only helps against people who do not need to run the ``dachs`` command (which needs access to ``gavo.rc``). Basic Metadata -------------- The next configuration file you must edit before going public is ``/var/gavo/etc/defaultmeta.txt``. This is the root of metadata inheritance (see near the end of `Global Metadata`_); some items (like who to contact) will be the same for almost all your services. Also, some metadata (like your site's description) does not sit on any service to begin with. All this information is given in ``defaultmeta.txt`` in, essentially, lines of ``: `` (in fact, the file is written in the :dachsref:`meta stream format`). In principle, you could set any meta item here that you want to be satisfied if metadata just “falls through”. Items you must override in what DaCHS gives you after installation include: * ``publisher`` – A short, human-readable name for you (for us, somewhat lamely, “The GAVO DC Team”). * ``contact.name`` – A human-readable name for some entity people should write to. This is not necessarily different from publisher, but ideally people can write "Dear " in their mails. * ``contact.address`` – A contact address for surface mail (we've never received mail as far as I remember, but it sure looks nice, and perhaps one day grateful astronomers will send nice presents...). * ``contact.email`` – An email address. It will be published on web pages, so there probably should be some kind of spam filter in front of it. * ``contact.telephone`` – A telephone number people can call if things really look bad. This has happened, and I don't think we have ever had spam calls on the number we've been publishing for more than a decade, so I'd say it's reasonably safe. * ``creator.name`` – A name to use when you give no creator in your resource descriptors. This is used in the creation of some technical records, so put something sensible there; contact.name probably is a good fallback if you are unsure. * ``creator.logo`` – A URL for a logo to use when none is given in the resource metadata. Use a small PNG here; if you've put in a logo_medium.png (see `Customising the Web Interface`_), just use ``/favicon.png``, which is a suitably scaled version of that provided by DaCHS. * ``authority.creationDate`` – This is just a characteristic date for your site; dachs init has filled in the date it was run on, which probably is good enough. * ``authority.shortName`` – A very short (<13 characters) identifier for your site (we're using “GAVO DC”). This is used in a few short names of built-in resources which are limited to 16 characters total. Make this string too long and your registry records will not validate. 
* ``site.description`` – A description of your site that gives humans some idea what they are looking at. Example: ``The GAVO data centre provides VO publication services to all interested parties on behalf of the German Astrophysical Virtual Observatory.`` Use backslashes an the end of the lines to break long lines. This is used when registering your publishing registry, and it is the introductory paragraph on the default root page. * ``_noresultwarning`` – You can probably leave it like it is; this is a text that is displayed on empty results on forms; since individual services may want to override it, it is configured through the meta system. Since favicon.png was mentioned in there: You can give DaCHS a URL to a Favicon_ in the ``[web]favicon`` configuration item. This could be something in static (in which case you would write ``/static/my-fav-icon.png``) or a full URL. DaCHS provides a scaled copy of your logo that might be suitable on ``/favicon.png``, which you might use as well. .. _tutorial chapter on the registry: tutorial.html#the-registry-interface .. _meta stream format: ref.html#meta-stream-format .. _favicon: https://en.wikipedia.org/wiki/Favicon The Userconfig RD ----------------- In addition to gavo.rc and defaultmeta.conf, there is a third source for configuration information in DaCHS: the userconfig RD. As the name says, it is a DaCHS resource descriptor, but it mainly contains STREAMs or similar material for inclusion somewhere else. Doing this in a way that doesn't break regularly on updates is not simple, and therefore there is a bit procedure involved. Creating a Userconfig RD '''''''''''''''''''''''' DaCHS has a *builtin* RD ``//userconfig`` that is updated as you update DaCHS. It always contains fallbacks for everything that can be in userconfig used by the core code. When DaCHS attempts to resolve a reference into the Userconfig RD (i.e., looks for an element with a given id), it first looks it up in your local copy (which resides in ``/var/gavo/etc``) and then in the distributed userconfig. That way, DaCHS can add new things in the userconfig without forcing you to merge on every update. To create your local, editable copy, pull the distributed RD into ``/var/gavo/etc/userconfig.rd`` (unless, of course, that file already exists):: cd `dachs config configDir` dachs admin dumpDF //userconfig > userconfig.rd Maintaining and Changing the Userconfig RD '''''''''''''''''''''''''''''''''''''''''' If you already have a local userconfig.rd and cannot find something referenced in a userconfig in there, instead run:: dachs admin dumpDF //userconfig | less then pick out the element(s) for whatever you want to change and copy them into your own ``etc/userconfig.rd``. Changes to userconfig.rd are picked up by DaCHS but will usually not be visible in the RDs they end up in. This is because DaCHS does not track which RDs make use of userconfig, so these will typically need to be reloaded manually. For instance, if you changed TAP examples, you'd need to run:: gavo serve exp //tap to make your change show up in the web interface. Although usually not necessary, you can reload userconfig itself using:: gavo serve exp % Note that for ``dachs serve exp`` to work, you need to set `The Admin Password`_, and if all that confuses you you can always just restart the server. Referencing into the Userconfig RD '''''''''''''''''''''''''''''''''' While `Referencing in DaCHS`_ in principle applies to the userconfig RD, normal RDs cannot do the fallback between the local and built-in RD. 
For this, there is a special syntax that denotes this “magic” RD ``%``. This is what the ``exp %`` above was about. Within an RD, you reference items from userconfig as ``%#id``, as in ``%#obscore-extracolumns``. You can also define your own material in the userconfig RD. As an example, if you have a list of authors that publish a lot of data collections between them and you want to re-use that declaration in multiple RDs, you would put something like:: creator.name: Author, F. creator.logo: http://example.org/logos/author_f.png Author, S.; Co-Author, N.; Ontributor, C. into your userconfig.rd and then, in each RD that describes work from them, just have:: Claiming an Authority --------------------- If you want to run your own publishing registry (which you should if you will publish more than a handful resources), you need to claim and describe an authority. You can skip this section if you intend to `Register Through purx`_ or to `Register Through Web Forms`_. Choosing your Authority ''''''''''''''''''''''' In the VO, every publisher has something like a “namespace”, within which they are free to name things; that namespace is called your **authority**, it it needs to be gobally unique so VO identifiers are globally unique. To pick one, think of a resonably short string defining your site; use dots to structure this as necessary. In Heidelberg, we are using ``org.gavo.dc``, somewhat reflecting our DNS name. Today, I'd probably choose ``gavo.dc``. Since astronomy is small enough, there is no need for deeply-nested hierarchy. If you have picked a string, make sure it is not taken yet. A convenient way to do that is through WIRR_. There, constrain *Resource Type* to *Authority*, then *Add Constraint* and constrain IVOID to the authority you have chosen (`preconfigured WIRR query`_). .. _WIRR: https://dc.g-vo.org/WIRR .. _preconfigured WIRR query: http://dc.zah.uni-heidelberg.de/wirr/q/ui/fixed?field0=restype&operator0=%3D&operand0=vg%3Aauthority&field1=ivoid&operator1=%3D&operand1=gavo.dc&MAXREC=20&OFFSET=0 If nothing comes back, the authority is still free and you can enter it in your gavo.rc in the ``[ivoa]authority`` item, like this:: [ivoa] authority: my-auth Registering your Authority '''''''''''''''''''''''''' The authority truly only belongs to you once you have added an authority record to the VO Registry. The chances that someone else will beat you to a non-trivial authority are slim, though, so do not worry if you have to delay a bit before going live with the following. It doesn't hurt to re-run the WIRR check before you actually go live, though. You will first have do define your authority record. To do that, you have to edit your userconfig RD. See `The Userconfig RD`_ if you have not set one up yet. Once you have your ``etc/userconfig.rd``, look for the ``registry-interfacerecords`` stream. This contains two (relevant) elements: a resource record for claiming your authority, and the registry service, which is what the VO registry will later talk to. While the latter usually does not manual intervention, in the authority, you must replace the the metaString macros defaulting to UNCONFIGURED (you could set the corresponding items in defaultmeta instead, but the additional indirection will not buy you anything), specifically: * authority.title – A human-readable descriptor of what the authority corresponds to. * authority.description – A sentence or two on what the authority you are using means. 
This could be the same as site.description if all you are claiming authority for is that; if you are claiming authority for your institute or organisation, this obviously should be different. * referenceURL – A URL at which people can learn more about your data centre. As an example, the following could work:: resType: authority creationDate: \metaString{authority.creationDate} title: The Utopia Observatory Data Center Authority shortName: \\metaString{authority.shortName} subject: Authority managingOrg: \\metaString{publisher} referenceURL: http://utopia.example.org identifier: ivo://\getConfig{ivoa}{authority} sets: ivo_managed The Data Center at the Observatory of Utopia manages lots of substantial data sets created by the bright scientists all over the world. All data collections published there are registered under this authority. You can, of course, override any of what is filled through the remaining macros, too, but you should have a good reason to do so. Once you have done that, you can proceed to `Preparing your Publishing Registry`_ to make the VO Registry see all this. Or you can keep customising your DaCHS installation first. Customising the Web Interface ----------------------------- DaCHS talks to web browsers, too. To customise the appearance, you can override almost everything DaCHS delivers. This section discusses the various hooks, roughly going from things everyone will want to do to things interesting if you want your site to look more exciting than our rather plain default look. The Logo '''''''' The least you should do is replace the DaCHS logo in ``/var/gavo/web/nv_static/img/logo_medium.png`` with something pertaining to you. Scale the bitmap to about 250 pixels width or so (it will typically be used at 100 pt in the CSS). In general, anything you put into ``/var/gavo/web/nv_static`` will be visible underneath ``/static/`` on the web. See also `Customising the Web Interface`_. The Root Page ''''''''''''' When people dereference your data centre URI, they will see the root.html template rendered. The default is a simple list of services published in the local set (see `Registering Services and Data`_), together with site.description from your defaultmeta. This may do when you plan to publish more than a few but less than many services. If you only have some very few services that furthermore will rarely, if ever, change, it is probably perferable to just write a static root page in plain XHTML in ``/var/gavo/web/templates/root.html`` (use ``nv_static`` from ancillary material as mentioned in `The Logo`_). If you expect to have many services, you should probably derive your root page from the root page that we use in Heidelberg; more on all this in :dachsdoc:`templating.html#the-root-template`. Note that for DaCHS to notice you have overridden a built-in file, you must restart it once. If you have special needs, there is also the ``[web]root`` configuration item. Set that to a relative path to replace the root service (which defines custom data functions such as titleList, chunkedServicesList and such used in the root templates) with a completely different service. Typical uses for this include pointing people directly to the service on single-service installations, or pointing to the ADQL form when what you are really doing is run a TAP-only system; in that latter case, you would say:: [web] root: __system__/adql/query/form Extra Sidebar Items ''''''''''''''''''' It is rather common to want to stick extra material into the sidebar. 
Overriding the sidebar template is a possibility, and it will probably survive several upgrades without breaking big time. However, if it is just a few links or lines of text you want in there, there is the ``_sidebarlocal`` meta item that the default sidebar templates includes near its foot. There are the ``sidebarnote`` and ``sidebarmicro`` classes you can use make things fit; and, in order to pull raw HTML all the way through DaCHS' guts, you have to use the raw metadata format. Hence, in ``etc/defaultmeta.txt``, you would put something like (see :dachsref:`Stream Metadata` for details):: _sidebarlocal: raw:
\ Try ADQL to query our data.
Note that you can easily ruin the well-formedness of your document in this way; the material is really included in the HTML without any adaptation (except UTF-8 encoding). A recommended use for this is to link to a privacy policy. DaCHS tries hard not to store any data that could fall under the GDPR or similar legislation, but it's still nice to tell that to people; if you do process personal data, you must have some declaration like that in most juristictions anyway. To have a privacy policy, go to ``/var/gavo/web/nv_static`` and put the following data into privacy.shtml:: Privacy Policy

Privacy Policy

It's nothing personal. Really. In particular, while we do store our logs for a year, we never keep the source IPs.

You will see the result at http://localhost:8080/static/privacy.shtml. To have this linked from your sidebar, just add the following to your default meta:: _sidebarlocal: raw: The somewhat odd extension ``.shtml`` for DaCHS means that the file will be interpreted as a template over the service ``//services#root``, which means that you can use the usual render functions and data items as per :dachsdoc:`templating.html`. This is be an easy way to have your page fit into the look of the remaining site. You could also use plain ``.html``, in which case you could even escape the necessity to provide well-formed XHTML. Operator CSS '''''''''''' To override css rules we distribute or add new rules, avoid changing or overriding gavo_dc.css, as that will be a liability when upgrading. Instead, drop a CSS file somewhere (recommended location: ``/var/gavo/web/nv_static/user.css``) and add a configuration item in ``[web]operatorCSS``. With the recommended location, this would work out to be:: [web] operatorCSS: /static/user.css in /etc/gavo.rc. This can also be an external URL, but we recommend against that, as that would force a browser to open one external connection per web page delivered. By far the most common complaint is that we are limiting the width of p and li elements to 40em. We believe that text lines much longer than that are hard to read and should be avoided. On pages with tables where users might actually want their browsers to fill the entire screen, this choice cannot be made through a sensible choice of the width of the user agent window but requires CSS intervention. Having said that, if you really think you want window-filling text lines, just put:: p, li { max-width: none; } into your operator CSS. Overiding Built-in Web Resources '''''''''''''''''''''''''''''''' As mentioned in `The Logo`_, you can put files and directories into ``/var/gavo/web/nv_static`` and have them show up on the web under ``/static``. These files will override built-in resources of DaCHS if they overwrite their names. This might be intersting if you would like to fix a bug or try something out. The originals sit in DaCHS' resource tree under web. Hence, if you want a local help file, in ``nv_static`` say:: dachs adm dump web/help.shtml > help.shtml DaCHS decides between built-in and overridden when a resource is first accessed. Hence, when you first override a given resource, you have to restart the server. After that, DaCHS will pick up updates by itself. Here are some built-in resources that you might want to override: * ``help.shtml`` – the help file. Unfortunately, we blurb quite a lot about GAVO in there right now. We'll think of something more parametrisable, but meanwhile you may want to have your own version. This changes rarely, so it probably won't hurt to just fork it. * ``web/xsl/dachs-xsl-config.xsl`` – this lets you customise (somewhat) the responses of various pages that are just XML with XSLT when viewed through the browser. This includes the OAI-PMH pages, UWS management, and VOSI responses. * ``web/css/gavo_dc.css`` – the central CSS; only override this during bug fixing or similar. For anything more permanent, use `Operator CSS`_. * ``web/js/gavo.js`` – this is most of the custom javascript that DaCHS uses here and there. Again, only override this for debugging (in which case you will always want to set ``[web]jsSource`` to ``True`` to keep DaCHS from minifying your code). 
If you need custom javascript globally on a permanent basis, let us know and we'll do something similar to the operator CSS. On the meaning of the ``.shtml`` extension, see the end of `Extra Sidebar Items`_. Overridden System RDs ''''''''''''''''''''' You can also dump system RDs into ``/var/gavo/inputs/__system__`` (**not** ``__system`` -- that's something else) edit them there; after a restart, DaCHS will use your copy then. Doing this is even more of a liability for upgrades than overriding the CSS or the Javascript. Only do that when debugging or to fix a transient bug until a fixed package is released. The local namespace ------------------- Here's a feature you should only use sparingly if at all: if you have code you need from multiple different RDs, you can put it into ``/var/gavo/etc/local.py``. Whatever is defined in there is then visible to DaCHS code in the ``api.local`` namespace. DaCHS will not automatically reload this module when it is updated. However, you can call ``api.reloadLocal()`` to force such a reload. Local-using server-side code might want to call that when you develop local.py itself. Configuring dependencies ------------------------ DaCHS' behaviour is partly determined by what dependent libraries do, and we try to ensure that the respective configuration is in your configDir (by default, ``/var/gavo/etc/``). Right now, this is true for matplotlib, which, once you have imported ``gavo.base``, is in ``/matplotlibrc`` and astropy, which is configured in ``/matplotlibrc/astropy/astropy.cfg``. Interacting with the VO Registry ================================ If you have tried any of the VOTT examples mentioned in the `Introduction`_, you will have realised that you are not in the VO unless your service is registered. There are several options for how to do that. In this chapter, we will first discuss three options for doing the registration. The remainer of the chapter is then devoted to the preferred option, i.e., using DaCHS' built-in publishing registry. For an introduction into the VO Registry, see :bibcode:`2014A&C.....7..101D`. Getting into the Registry ------------------------- To make your services visible to the VO, you have to get things called **searchable registries** to index your metadata. The first condition for that is that this metadata is organised in a standard structure called VOResource. DaCHS does the organisation for you: on each info page (they're linked from the sidebar, but you can also get them by replacing the renderer name in the access URL with ``info``) there is a link to “VOResource XML”, and the fact that you are now interested in this proves that you start becoming a VO nerd according to what it says there. This piece of XML needs to get “harvested” by the searchable registries. Usually, this happens through a protocol called OAI-PMH. DaCHS speaks this protocol, and most of the rest of this chapter deals with the wonderful things you can do if you let it do that. However, for very small sites (one or two services, we'd say), preparing and running a **publishing registry** (that's what the OAI-PMH endpoints are called), including `Claiming an Authority`_ may be overdoing it. Instead you could either: * use purx, or * register through web forms. The remainder of this section is devoted to these two options. If want to run a publishing registry, you can skip it. 
Register Through purx ''''''''''''''''''''' purx_ is a little service run by GAVO that lets people put arbitrary (but valid) VOResource records on some web server and have this service talk to the VO Registry to get them in there. Purx will pick up changes to the records, polling them roughtly daily, and propagate their contents to the VO Registry as necessary. To use purx, get the URL of your service's VOResource XML as described above. Paste this URL into purx' enrollment field, wait for a mail to arrive, confirm the publication, and you are done. Your service will have a service identifier starting with ``ivo://purx``, but the rest should contain material from your RD id. To make purx stop publishing your record, you can take down the service and wait for purx to give up after a couple of months. A faster way would be to have the VOResource link return an HTTP 403 status code, which DaCHS so far cannot do (selectively). Let us know if you need it. The good thing about purx is that once enrolled updates are automatic. Still, the extra HTTP-based polling increases latency and adds fragility. And you don't get to have your own authority in the identifiers, which sometimes is inconvenient (e.g., when `Validating your Services`_). .. _purx: http://dc.g-vo.org/purx/q/enroll/custom Register through Web Forms '''''''''''''''''''''''''' Both the `EuroVO registry`_ at ESAC and the `NAVO registry`_ at STScI let you register resources using a web browser and filling out web forms. When you are running DaCHS, this is not terribly attractive, because if you have followed `Global Metadata`_ for your RD and the first two sections of `Configuring DaCHS`_, DaCHS already has essentially all information you would enter there. Going through one of the web-based services still lets you offload authority management to them. And, at least on the EuroVO service, you can simply upload the XML DaCHS generates for you – see the “Create new Resource from File” (or, later, the “Edit Resource” tab in the details for a resource). The material to upload is again what the “VOResource XML” link on your service's info page points to. You will, however, probably have to edit the ``identifier``. Preparing your Publishing Registry ---------------------------------- The VO Registry needs to know a few things about you before it accepts you as a full member; in particular, you must register your authority and your publishing registry. These are defined in the ``//services`` RD (well, in truth, the definition is the userconfig RD, cf. 
`Registering your Authority`_), so the first records you have to add to your registry come from there; use ``dachs pub`` to do that:: dachs pub //services Depending on other services you are running, you may want to publish the following additional built-in resources at this (or any later) point:: # If you're publishing spectra, images, and the like, in particular # if you produce pubDIDs dachs pub //products # If you run a TAP service dachs pub //tap # If you run a TAP service and want to point people to the built-in # web interface dachs pub //adql # If you have an obscore service with images and want to also # offer the material via SIAv2 dachs pub //siap2 Registering Services and Data ----------------------------- As you have already seen in `Service Definitions`_, DaCHS will only tell the Registry about a service if there is an :dachsref:`Element publish` in it; in `Publishing DaCHS-Managed Tables via TAP`_, you have seen that this element can also be used in :dachsref:`Element table` or :dachsref:`Element data` to tell the Registry about TAP-queriable tables. The ``publish`` element requires a specification of the OAI set the resources shall be published to. Unless you have specific applications, only two sets are relevant: ``ivo_managed`` for publishing to the VO through DaCHS' built-in publishing registry, and ``local`` for publishing to your data centre's service roster on its front page. Other sets could be introduced and used for, e.g., specific sub-rosters, but you'd have to tell DaCHS about them in ``[ivoa]validOAISets``. In services, the ``publish`` element needs, in addition, a ``render`` attribute, giving the renderer the publication is for. With ``ivo_managed``, one publication essentially always corresponds to one capability of the VOResource record. For example, a typical pattern could be:: This generates one capability each for the simple cone search and a browser-based interface; the browser-based interface is, in addition, listed in the local service roster, typically on the root template. When you publish tables (or collection of tables via a ``data`` element), the notion of renderers makes no sense. Instead, you can use a ``service`` attribute to say which service serves that data, except that when you publish tables that have ``adql="True"``, the local TAP service is automatically considered to be a service for that data. So, to publish an ADQL-queriable table to the VO for querying via TAP, just write:: within the table element. A table containing, e.g., data that's queried in a SIAP service in a different RD, would require something like:: shortName: My external service description: This service does wonderful things, even though\ it's not based on GAVO's DaCHS software. http://wherever.else/svc
The rest of the RD would then give all the normal metadata as per `Global Metadata`_ ``dachs pub`` should complain if there's metadata missing. In case you want register services in authorities other than ``[ivoa]authority``, DaCHS lets you do that, too. Edit `the userconfig RD`_, and locate your authority definition in the ``registry-interfacerecords`` STREAM. Copy the element and edit what is different in your new authority. A bit further down in the userconfig, there is the service with the id ``registry``; it says what authorities are being managed by your registry, and since you are registering a new one, you have to add it there, too, by adding a ``managedAuthority`` meta, like this:: edu.euro-vo.org When done, run ``dachs pub //services`` to update DaCHS' resource tables. Registering Web Interfaces to DAL Services ------------------------------------------ A typical situation is that you have a standard service (SSA, SCS, SIAP, etc) and a form-based custom service on the same data. Since the form-based service caters to humans, it can require quite different input parameters (and thus usually cores) and output tables, and so you will usually have a different service for it. If you want to publish both services to the VO, you could add ``publish`` elements with ``sets="ivo_managed"`` to both ``service`` elements – but that would yield two resource records (which you then should link via relatedTo metas). At least when the form interface does not add significant functionality, this would usually seem undesirable – e.g., your service would show up twice in sufficiently unconstrained responses from the VO Registry. Therefore, it is typically preferable to add the web interface as a capability to the resource record of the standard service. To let you do that, the ``publish`` element takes an optional ``service`` attribute containing the id of a service that should be used to fill the capability's metadata. Here's an example:: Form-based service ... ... Note that since both services are published through the same resource, they share a title – the one of the parent of publish. You hence need to make sure that that title fits the browser service, too. For instance, don't say “Parrot Survey SSAP“ but rather “Parrot Survey Spectra“ (which, I would propose, is preferable anyway). Creating an Organisation Record ------------------------------- If you want to fill out the publisherID meta item – and there's no strong reason to do so at this point –, you have to create a registry record for you, i.e., the organisation that runs the data centre and hence the registry. If you still want to have a publisherId, put something like:: resType: organization creationDate: 2007-12-19T12:00:00 title: GAVO Heidelberg Data Centre subject: Organization referenceURL: http://www.ari.uni-heidelberg.de identifier: ivo://org.gavo.dc/org sets: ivo_managed Operating at the Astronomisches Rechen-Institut (ARI) (part of Centre for Astronomy of Heidelberg University) on behalf of the German VO organisation GAVO, the GAVO Heidelberg Data Centre is a one-stop shop for astronomical data publication projects of (almost) any description. We are also active within the IVOA in standards development and Registry operations. into your userconfig.rd's ``registry-interfacerecords`` and then say ``dachs pub //services``. Of course, you'll have to change things to match your situation; in particular, make sure identifier simply is ``ivo:///org`` – and that is what you'll use as publisherID. 
Server Operations ================= This section covers a few topics that concern the daily, monthly, or yearly maintenance of your DaCHS server. Starting and stopping the server -------------------------------- The ``dachs serve`` subcommand is used to control the server. ``dachs serve start`` starts the server, changes the user to what is specified in the ``[web]user`` config item if it has the privileges to do so and detaches from the terminal. Analoguosly, ``dachs serve stop`` stops the server. To reload some of the server configuration (e.g., the resource descriptors, the vanity map, and the /etc/gavo.rc and ~/.gavorc files), run ``dachs serve reload``. This does not reload database profiles, and not all configuration items are applied (e.g., changes to the bind address and port only take effect after a restart). If you remove a configuration item entriely, their built-in defaults do not get restored on reload either. Finally, ``dachs serve restart`` restarts the server. The start, stop, reload, and restart operations generally should be run as root; you can run them as the server user (by default, gavo), too, as long as the server doesn't try to bind to a privileged port (lower than 1025). All this can and should be packed into a startup script or the equivalent entity for the init system of your choice. Our Debian package provides both a `System V-style init script`_ and a `systemd unit`_. They would typically be installed to /etc/init.d/dachs and /etc/systemd/system/dachs, respectively (but this might, of course, be different if you're running non-Debian systems). With this, on your typical system, you can control DaCHS with:: sudo service dachs start sudo service dachs stop sudo service dachs restart For development work or to see what is going on, you can run ``dachs serve debug``; this does not detach and does not change users, and it also gives a lot more tracebacks in the logs. See `Debugging`_ for how to use this to figure out misbehaviour. .. _System V-style init script: https://salsa.debian.org/debian-astro-team/gavodachs/blob/gavo/debian/gavodachs2-server.dachs.init .. _systemd unit: https://salsa.debian.org/debian-astro-team/gavodachs/blob/gavo/debian/gavodachs2-server.dachs.service Access Logs ----------- While I claim that access numbers are meaningless until you know enough of a domain that you don't need the access numbers any more to figure out whether or not something is a good idea, most data providers at some point find they need access logs. Hence, DaCHS does write logs, albeit without IP addresses by default. We are doing this to keep you quite certainly GDPR-clean unless you configure DaCHS otherwise. However, that is quite clearly not enough to compute the “number of distinct hosts“ statistic that is part of the industry standard of Bad Metrics. To be able to compute something like that, you need to enable the logging of IPs and then install something that somehow evaluates the files in /var/gavo/logs. Doing the first simply means to add:: logFormat: combined to your ``/etc/gavo.rc``'s ``web`` section. After that, you need to restart the server for the new setting to take effect. While you can probably use whatever log analysis program you want for these files (I'd hope it's close enough to the apache combined log format), you can simply use DaCHS to do some basic accounting. That way, you do not need extra tools. You will also in all likelihood treat your users' data a bit more carefully. 
For this, check out our accesslog resource:: cd `dachs config inputsDir` svn checkout http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/accesslogs (this needs the subversion package; your could also use ``wget -r``, but this way you can simply call ``svn update`` in accesslogs to pull upgrades). The ``accesslogs/q`` RD contains everything that's necessary to regularly (daily at 2:30 UTC, right now) pull logs files into the database and then remove them. It tries a few tricks to weed out robots (which of course has limits), and it will pseudonymise IP addresses so that they cannot be correlated across months. When the first logs are ingested, go to http://localhost:8080/accesslogs/q/stats to look at your server's stats. To figure out access statistics for some specific service, constrain the URI using VizieR-like character patterns. For instance, ``= /__system__/tap*`` will compute the statistics for TAP accesses (the ``=`` requests case-sensitive pattern matches). For certain protocols, constraining to POST requests lets you better guess what was going on, but the IVOA S-protocols treat GET and POST analogous. If you install the ``python3-geoip`` package, accesslogs/q will also fill the ``origin`` column with the ISO country code of what geoip thinks is the country of origin of the request. Authorities misguided enough to want to see logs in the first place tend to be so throroughly misguided as to be interested in “data” of that kind, too. Since origin is even more speculative than access counting, the current public interface does not show it in any way. But if you need some figures for some tedious report, you could execute on your local postgres prompt something like:: select origin, count(*) as ct from accesslogs.accs where acc_year in (2020, 2021) group by origin and perhaps make a beancounter or two happy. Validating your Services ------------------------ While DaCHS tries to follow the standards in order to ensure maximum interoperability, there are always things that can go wrong; in particular, there are many ways write RDs such that the resulting services are invalid. Some places operate validators that tell you if there is something wrong with your site. If you are logged in as gavoadmin, you can run some of these interactively from the service info pages. We recommend, however, to peruse your results on http://heasarc.gsfc.nasa.gov/vo/validation/vresults.pl (the interface lets you select just your services). Also, if you run a TAP service, you should run something like:: stilts taplint tapurl=/tap | grep "^E-" – stilts you can get from where you get TOPCAT_ from. Errors here could be due to DaCHS mis-implementing TAP or because DaCHS implements some TAP proposals stilts doesn't expect; in case of doubt, as on dachs-support. However, it is far more common that your metadata got our of whack for some reason or other. For instance, you may have changed RD metadata (which controls the output on the tables endpoint) without running ``dachs imp -m`` which updates TAP_SCHEMA). An easy way to fix all such problems is to just run:: dachs imp //tap refresh-schema (since 2.9.3; use ``dachs adm updateTAP`` before). Dealing with RD Changes ----------------------- A DaCHS server will in general try to reload an RD if it finds it is changed on disk. If the updated RD cannot be loaded (for instance, because it is invalid XML), it will keep the old RD and leave error messages in the logs (dcErrors and dcInfos) – so, keep these in view when editing RDs under live services. 
DaCHS will *not* pick up changed RDs under some circumstances: * Built-in ``__system__`` RDs are not controlled. The main reason here is that these may actually have archive members rather than disk files behind them. * Only the file the RD was loaded from is checked. This means that if you override a built-in ``__system__`` RD with your own version in ``inputs/__system__``, DaCHS will not automatically pick that up. * Of course, if you edit a file only used by an RD (e.g., a custom core), DaCHS will also not notice a change; it only looks at the RD. * If you access an RD with no corresponding file and create that file afterwards, that change will also not be picked up automatically. When DaCHS doesn't notice a change to the RD, you can still expunge it from the server using ``dachs serve exp`` if you have set `The Admin Password`_). Admin Interfaces ---------------- Admin Web Interfaces '''''''''''''''''''' Some administrative operations on DaCHS can (or have to) be done from its web interface. To use these features, you have to set `The Admin Password`_. You can then use the “Log in” link in the side bar using ``gavoadmin`` as the user name. If you are logged in as gavoadmin, you should see an “Admin me”-link in the side bar of services. The page behind that link lets you block all services on the respective RD – where blocking means all requests are rejected until the RD is reloaded – and reload the RD (this is equivalent to running ``dachs serve exp``). In the form, you can also set scheduled down times. This is for VOSI, an interface clients could use to figure out whether a service can reasonable be expected to work. Since there don't seem to be clients exploiting the VOSI endpoints for such purposes so far, you probably don't need to bother. You can directly access the administration panel for an RD by accessing ``/seffe/``, e.g , ``seffe/__system__/services``. There are several more or less introspective resources within DaCHS that do not need authentication. Among those, there you should know ``/browse``. This gives a list of all RDs that have (ivo or local) published services or data in them. Links on the RD ids lead to info pages on the RDs, in particular giving tables and services within the RD. Admin CLI Interfaces '''''''''''''''''''' You can also perform various housekeeping operations using ``dachs admin``. Try ``dachs admin --help``. This includes `User/Group Management`_, precomputing automatic previews, and creating registry records for deleted services that got lost. Sometimes people execute stupid TAP queries on your machine (e.g., constraining ra and dec in separate intervals, which is both slow and physically at least doubtful) and you would like to give them a friendly nudge on how to do it better. This is when ``dachs admin tapabort`` is your friend. Call it with a TAP job id (which you can obtain from ``/tap/async`` unless people authenticated; if they did authenticate, you will have to check the ``tap_schema.tapjobs`` table directly in postgres) and a helpful (!) text to abort a TAP job and set it to an error state giving a short explanation what happened and the helpful text. robots.txt ---------- DaCHS answers to requests for robots.txt with a built-in resource that forbids to index URLs starting with ``/seffe`` and ``/login``. You may want to keep other pages out of indices. 
To exclude those, add a file ``robots.txt`` in your ``webDir`` (``/var/gavo/web`` by default) and add lines like, say:: Disallow: /browse The built-in rules will be prepended to whatever you specify in your user robots.txt. For more information on what you can put into robots.txt, see `Robot exclusion standard`_. .. _Robot exclusion standard: http://www.robotstxt.org/orig.html Two-Server Operation -------------------- When you want to run the database server on a different machine than the DaCHS server, the setup becomes a bit more involved. First, you only install python-gavodachs2 (rather than gavodachs2-server) on the application server. Then, you need to configure the two servers to talk to each other. Details depend on your setup, and you may want to apply extra hardening over the following. This assumes the machine DaCHS runs on is called appserver.local, the database server is dbserver.local To enable remote queries on the database server, on dbserver.local do something like:: $ cd /etc/postgres//main $ sudo vi posgresql.conf # edit so that listen_addresses = '*' # -- without that, the server only listens on the loopback address $ sudo vi pg_hba config # add a line like # host all all appserver.local/32 md5 # below the access line enabling access for 127.0.0.1 # you can be more restrictive (e.g., "gavo" instead of the first "all", # but that's left as an exercise to the reader $ sudo service postgresql restart # the following assumes you have the same username on appserver and on # dbserver, otherwise use the username on *appserver*. $ sudo -u postgres createuser -sP `id -nu` $ createdb -Ttemplate0 --encoding=UTF-8 --locale=C gavo After that, you have a database DaCHS can configure from your account on appserver, using the password you gave in the createuser step. To do that, on appserver.local say:: $ dachs init -d "host=dbserver.local user= password= dbname=gavo" After that, everything should work as in the one-machine case. Hint: do something like:: export PGHOST=dbserver.local export PGUSER= (or gavoadmin, if you prefer) in your appserver's ``.bashrc`` (or equivalent), and ``psql gavo`` will give you a database shell on dbserver. If you trust your appserver, you can even give your password in .pgpass. But don't complain if this bites you later... Enabling HTTPS -------------- DaCHS can natively speak HTTPS; you have to let it claim port 443 on all the network interfaces you bind it to, though. If, on the other hand, you use an external HTTPS termination (usually a reverse proxy), your best shot is probably to configure the reverse proxy's HTTPS URL as serverURL. You cannot currently have parallel HTTP in such a configuration, which is a bit unfortunate (see `HTTP preferred`_). With external HTTPS termination, also set ``adaptProtocol`` to ``False`` in ``/etc/gavo.rc``:: [web] serverURL: https://terminating.pro.xy adaptProtocol: False When, on the other hand, having DaCHS speak HTTPS natively, all you need to do is give it a secret key and a certificate (which really is a public key with a CA's signature); these need to come concatenated in PEM format in a file called ``/var/gavo/hazmat/bundle.pem``. Just so the whole crypto thing doesn't become a total carnival, you should have the hazmat directory owned by root and with permissions 700. DaCHS will only read it when starting up using ``dachs serve`` and before dropping privileges. That way, if there's a minor security issue in DaCHS, you at least don't necessarily lose your private key. 
Note that DaCHS only supports https on port 443 because that's all you are likely to get certificates for. Exception: ``dachs serve debug`` is intended to be run with user privileges and hence will listen to 40433 if it can read bundle.pem (which it shouldn't) – of course, you will get certificate errors when accessing that. But that's for debugging only anyway. If you really don't want to use letsencrypt (see the next section) and get your own key-certificate pairs, here's what will likely work:: (become root) mv my-key.key my-key.crt /var/gavo/hazmat (that's a move because you don't want secret keys linger around all over the place) cd /var/gavo/hazmat cat my-key.key my-key.crt > bundle.pem (exit root shell, restart DaCHS) In case you got a bundle of intermediate certificates on top, just append that to your /var/gavo/hazmat/bundle.pem. Letsencrypt ''''''''''' To get a certificate that is (hopefully) widely accepted, we strongly recommend you use letsencrypt. DaCHS has built-in support to update these certificates in time. Here's how to go about it:: # DaCHS internally calls a tool called acme-tiny; we much rather trust # it than any crypto code we'd hack together $ sudo apt install acme-tiny # now create the directory with the key material; we'll do all of this # as root; if you don't like this, at least make it a user distinct # from gavo and gavoadmin $ sudo bash # cd `dachs config rootDir` # mkdir hazmat # chmod 700 hazmat # cd hazmat # generate an account key; this is what identifies you against # letsencrypt # openssl genrsa 4096 > account.key # Generate your secret key; these are the holy bits if you believe # in https. # openssl genrsa 4096 > server.key # now tell dachs to have the certificate signed; warning: this will # try to restart the server # dachs serve updateCertificate # Done -- hit ^D to exit the root shell. Letsencrypt certificates are only good for three months. It's a good idea to renew them every two months. And that won't fly if you don't automate it. Automating it means your server will restart without manual intervention. That's bad because it might be down for a significant time (namely if someone is doing a large download at that time). As said below: HTTPS is an operational liability. Anyway, add the following to root's crontab (if you installed DaCHS from source, you'll probably have to use the full path to /usr/local/bin/dachs here):: 21 4 1 1,3,5,7,9,11 * dachs serve updateCertificate -- adapt the time (here, 4:21 local time) to avoid times when it's likely you'll have many users. Also, make sure you receive mails from cron from the machine that does this, because there's a lot that can go wrong here. When updating keys, DaCHS will just write error messages to stderr for cron to transport them. That's probably not a big additional liability, as you should teach the DaCHS server to send mails also for several other DaCHS subsystems. Finally, to register your https endpoints, add:: registerAlternative: True to the ``ivoa`` section in your ``/etc/gavo.rc``. To propagate the information, restart the server and then say ``dachs pub -a`` to make the registries re-harvest your site. If your machine is reachable through different host names (our box in Heidelberg, for instance, is both dc.g-vo.org and dc.zah.uni-heidelberg.de), tell DaCHS about it by listing all the non-primary names (i.e., anything that's not in serverURL) in a comma-separated list in ``[web]alternateHostnames``. 
HTTP preferred '''''''''''''' We strongly recommend to have an unencrypted HTTP endpoint, and have that as the primary interface even when you support HTTPS *unless* you actually have authenticated services. In particular, please to not forcibly redirect to HTTPS versions of pages; it breaks POST requests (which are quite common in the VO) and quite a few other things, too. DaCHS will give you the same effect in a sane way because (since version 2.2.3) it honours the upgrade-insecure-requests header. That way, clients that are confident they can deal with HTTPS can inform the server, and if the server thinks it's safe to upgrade, it can redirect them. Anything not prepared to deal with or interested in HTTPS will be unconcerned, and everything is under user control, as it should be. Here's the reasoning for preferring HTTP in the VO: HTTPS is largely ineffective against most privacy breaches: Almost all nation-states can issue certificates that almost all user systems will accept; companies or institutions wanting to eavesdrop on their employees can force http proxies and install their own CA certificates on their employee's machines; and of course most privacy violations are side effects of platforms executing loads of Javascript sailing into the browsers with valid certificates. On the other hand, networks become a lot more brittle with HTTPS: On the server side, the certificates need to be managed and regularly refreshed. And that's even before operators employ secure key management (which would probably require extra hardware or at least the use of passphrases). Worse, the client side needs to keep their CA bundles (that's essentially lists of private keys they trust) up to date. With browsers that often carry these along and are updated quite regularly on most boxes, that's marginally manageable. For the VO, clients as a rule are not browsers but tools like TOPCAT or Aladin that may use a CA bundle coming with the Java VM, which often is not as well maintained. And once people start using curl or pyVO, the operating system's CA bundle is used, which on many systems is a mess. The net effect is that a given service may appear to work in the browser and in a script, but not from TOPCAT. Oh my. Given the miniscule benefits and the serious operational implications, you should provide HTTP endpoints and register your those. If HTTPS is available, DaCHS will tell clients via the Registry's mirrorURL feature. If clients think HTTPS is worth it, they in this way can learn about support for it and use that as they see fit; and for browsers, see above on upgrade-insecure-requests. Upgrading --------- Upgrading DaCHS ''''''''''''''' In general, we try to make upgrades painless, but with a system allowing people to play tricks with intestines like DaCHS guarantees are hard. If you are running DaCHS for the longer term (i.e., more than 2 year so or), be sure to subscribe to `DaCHS-users`_. We announce new releases there, together with brief release notes pointing to possible spots of trouble. Ideally, you will have a development system and regression tests in place that let you diagnose problems before going to production. .. _DaCHS-users: http://lists.g-vo.org/cgi-bin/mailman/listinfo/dachs-users Upgrading Installations from Debian Packages ............................................ 
(1) Make sure you have enabled the intended distribution (*release* or *beta*) in your ``/etc/apt.sources`` (or equivalent) (2) Make sure all RDs DaCHS sees are in order:: dachs val -vc ALL_RECURSE (make that ``ALL`` before DaCHS 2.2). Warnings you can usually ignore, but try to the understand messages you get. You can hide known-broken RDs from DaCHS by dropping a file named DACHS_PRUNE into their directories. Broken RDs are very likely to break upgrades. Fix them or remove them. (3) Do the actual upgrade:: apt update apt upgrade (4) Run:: dachs val -tc ALL This will complain if any of our changes break your services. If that is true and the Changelog did not alert you to the issue, please complain immediately. We may still be able to fix things for other people. ``dachs val`` with a ``-c`` option might complain about mismatches between RD and on-disk metadata; there are several reasons why that may happen, including dumb or clever things we've done in the software. In any case, you should fix the problem, most easily by re-importing the respective table. Upgrading Installations from SVN ................................ If you run from a checkout of our version control system (or, for that matter, from our tarballs), the most important thing to remember is: After every update, do, as a user with ingestion privileges, run the restart sequence:: $ dachs val -c ALL $ sudo service dachs stop # (or whatever you use to stop the server) $ dachs upgrade $ sudo service dachs start # (or whatever you use to start the server) $ dachs test ALL In principle, ``dachs upgrade`` can run while the server is active, and with most updates, users won't even see errors, but since you need to restart anyway, why bother. On possible failures of the ``dachs val`` command, see the respective text for the packaged upgrade case. The steps to update depend on what you did to install out of the subversion checkout. If you initially said ``setup.py develop`` (which we recommend), all it takes to upgrade is:: $ cd $ svn update If you instead initially said ``setup.py install``, do:: $ cd $ svn update $ sudo python setup.py install Upgrading Postgres '''''''''''''''''' Postgres' on-disk formats change from version to version. This makes upgrades a bit of an adventure, in particular if you have large databases that take days to export and ingest. However, Debian helps a bit; even if you don't run Debian, the following might be helpful. The text assumes you are using the default cluster, which is called main. If – and that's likely if you installed DaCHS in the olden days – yours has a different name (we used to suggest “pgdata”), you will have to replace “main” with that name throughout. In case you do not remember, at least in Debian you can just ``ls /etc/postgresql/`` – what you see there is the name of your cluster. As it is easy to destroy all your tables at once if you mistype things here, it is wise to dump your database somewhere before starting with this:: pg_dump -f (some descriptive name).pgdump -F c gavo If you are using the Debian package, do that as ``dachsroot``; that user has the necessary privileges. The remainder of this is assumed to be executed as root. Let's say you are upgrading from 11 to 12. 
Since we will need these numbers quite a few times, set them in variables – but be *very* careful: if you get this wrong, the commands further down will nuke your database:: export OLDVER=13 export NEWVER=15 To upgrade, first install the new engine and extensions, for instance:: apt-get install postgresql-${NEWVER}-pgsphere postgresql-${NEWVER}-q3c postgresql-${NEWVER} If you use other extensions in your database (e.g., postgis), also install them now. To see what extensions you have installed, a command like:: dpkg-query -W "postgresql-${OLDVER}-*" is probably helpful. Other versioned packages you may want to install include: * ``postgresql-server-dev-$NEWVER`` if you use locally-built extensions. Rebuild and install those extensions *before* proceeding with the upgrade. * ``postgresql-plpython-$NEWVER`` if you use python DB extensions * ``postgresql-contrib-$NEWVER`` for several interesting index schemes and the like. You will probably first have to drop the new cluster the Debian maintainer scripts already have created, as pg_upgradecluster wants a clean slate:: pg_ctlcluster $NEWVER main stop # this is a good time to see if your installation still works; if not, # you stopped the currently active cluster and should *not* drop it pg_dropcluster $NEWVER main If these give errors to the effect that the “specified cluster does not exist”, that's fine. During the database update, postgres will be down, and hence DaCHS will not work. For a large database and a complicated upgrade, the update can take several hours (though, really, the last few upgrades for us were minute jobs although we have about 5 TB of tables). You should therefore let people know what is going on. As long as nobody really does anything with VOSI availability, the next best thing is to put your DaCHS installation in panic mode. To do this, place a file called MAINT into DaCHS' stateDir, perhaps like this:: cat > `dachs config stateDir`/MAINT << EOF We're currently upgrading our database and cannot properly respond to requests. We expect to be back online late afternoon UTC, 2021-08-11. EOF Then say:: pg_upgradecluster -k -m upgrade -v $NEWVER $OLDVER main If you want to keep your database in a non-standard place, add another argument here (e.g., ``/big_partition/postgres/$NEWVER``). If you forgot how you did it, you can find the old location using:: grep data_directory /etc/postgresql/$OLDVER/main/postgresql.conf Do *not* use that old directory for the new cluster, but do keep the postgres version in whatever path you use. Note the ``-k`` option to the above command: this makes pg_upgrade use hard links instead of copies. This saves a lot of space and time. The downside is that once the upgrade has succeeded, the old server may go haywire because its files have been changed to fit the new version. In our experience, that's a price worth paying. If this fails, **do not panic**. Most of the time it's some extension you forgot to install, and once you've done that, everything will be fine. The upgrade script points you to the logs of the operation, which should let you figure out what to do. When all is done, pg_upgradecluster configures the old cluster to no longer start automatically and the new cluster to listen where the old one was. You can now probably run your services again, so, remove MAINT in the stateDir, restart the DaCHS server and run your regression tests (you wrote some, didn't you?).
However, quite typically the table stats are wrong or not understandable to the new version, so for large tables you may see timeouts and the like. Pre-15 versions of pg_upgradecluster created a shell script to update the statistics. For these versions, see the command's output for how to call that script (just be sure to execute it as the postgres user). In newer postgreses, simply run:: sudo -u postgres /usr/lib/postgresql/$NEWVER/bin/vacuumdb --all --analyze Pg_upgrade has also printed a command that will remove the old cluster; execute it, and finally deinstall the old postgres packages to avoid confusion – locate them with something like:: dpkg -l | grep "postgres.*$OLDVER" What is left is one last step, that is, informing postgres about any upgrades of extensions that DaCHS uses. This needs to be run as a database superuser, and so DaCHS itself cannot do it. It can, however, produce the necessary commands. So, become ``dachsroot`` again and as that user, execute:: dachs upgrade -e | psql gavo When that's all done, take a deep breath and relax. Possible Issues during upgrade .............................. Frankly, there's lots. We'll update this section based on your feedback. (1) Some versions of pgsphere had bad declarations of negators and commutators. These manifest themselves as errors like:: pg_restore: [archiver (db)] could not execute query: ERROR: argument of negator must be a name Command was: CREATE OPERATOR @ ( PROCEDURE = "sline_contains_point_com", LEFTARG = "spoint", RIGHTARG = "sline", COMMUTAT... during upgrades. If you see something like this, you need to remove dangling references from the ``pg_operator`` system catalog before commencing. So, in the *old* cluster, run:: UPDATE pg_operator AS o SET oprcom=0 WHERE NOT EXISTS ( SELECT 1 FROM pg_operator AS i WHERE i.oid=o.oprcom) AND oprcom!=0 and:: UPDATE pg_operator AS o SET oprnegate=0 WHERE NOT EXISTS ( SELECT 1 FROM pg_operator AS i WHERE i.oid=o.oprnegate) AND oprnegate!=0 (2) Remove versioned tablespaces after failures. pg_upgrade in principle does the right thing with tablespaces: It has per-version names for the actual data directories. However, when an upgrade has failed, that extra data directory is not removed. Another attempt at upgrading then usually fails because that directory is already there. For instance, you'll see in the tablespace /media/db_overflow:: $ ls PG_9.4_201409291/ PG_9.6_201608131/ In that case, you will need to remove the 9.6 directory before proceeding. Topics in DaCHS Publishing ========================== This section discusses various DaCHS aspects in somewhat more detail than what is in `DaCHS Basics`_. The idea is that you can skim the material and return later when a given problem turns up. Metadata Specials ----------------- The reference manual has a fairly extensive section on :dachsref:`defining metadata`; much of it is tutorial-style, so we encourage you to skim that section at some point. This section takes a look at some metadata-related points that merit discussion on a wider scope. Ranking Schemas and Tables '''''''''''''''''''''''''' (all this since version 2.9.3) You can influence the sequence in which certain clients (like TOPCATs after 2024) display your schemas or tables by ranking them. In typical DaCHS deployments, where there may be numerous schemas with not terribly many tables, this is mainly interesting for schemas, i.e., RDs. To set an RD's “rank”, set its ``schema-rank`` meta:: 500 There is nothing wrong with having multiple RDs that have the same schema rank.
Clients will show these next to each other. We recommend choosing relatively large schema ranks; that way, when a data collection comes in that you would like to fit right between resource A and resource B, there will likely still be an integer between them. At the GAVO data centre, our scheme roughly is: * 10 for the built-in //tap and //adql RDs (but that's subject to change if it turns out TAP_SCHEMA near the top gets on people's nerves). * 20 for high-value (in the sense of: output of non-trivial funded projects), original (in the sense of: we are/were the first publishers of this data) data collections * 50 for well-known datasets that people will likely want for cross-matches and similar * 100 for lower-value original data collections that are likely still of high scientific relevance * 500 for rather technical, VO-infrastructure resources * 1000 for data we mainly keep for archiving Anything else is NULL, which means clients will sort them towards the end. Against that, in typical deployments the analogous ``table-rank`` meta probably is less interesting, and you will probably get by with just a few small integers even for fairly complex data. Not setting ranks will sort the respective schemas and tables to the end of these lists. If you play around with the ranks, use ``dachs imp -m`` to update the ranks in the TAP_SCHEMA. If you manipulate a lot of ranks at one time, you can sync the entire TAP_SCHEMA with your RDs by saying ``dachs imp //tap refresh``. Authors, or: Nested Sequential Meta ''''''''''''''''''''''''''''''''''' If you read :dachsref:`RMI-Style Metadata` in the reference, you will notice that the real meta key for the authors of the data is ``creator.name`` (please pay particular attention to what we are saying there about the format of that string). However, in the templates and our examples, you will see we are using ``creator`` throughout. The reason for this is that ``creator`` is complex metadata (there's name, logo, and possibly more), and setting multiple pieces of complex metadata according to the rules discussed in :dachsref:`Defining Metadata` has a nasty pitfall. Since this pitfall probably will not bite you outside of ``creator``, you might decide to skip this and just never write ``creator.name`` in your RDs. But it is safer to understand the issue, so read on. So, suppose you had more than one creator. What you then want is a metadata structure like this:: +-- creator -- name (Arthur) | +-- creator -- name (Berta) However, if you write:: creator.name: Arthur creator.name: Berta or, equivalently:: Arthur Berta by the rules of :dachsref:`Defining Metadata`, you will get this:: +-- creator -- name (Arthur) | +------ name (Berta) i.e., one creator with two names, which is both invalid and wrong. To avoid this, you have to insert a new creator node in between, and write:: creator.name: Arthur creator: creator.name: Berta This yields the upper (correct) meta structure. Since the creator.name case is so common, DaCHS provides a shortcut, which you should simply always use to define the creators: if you set ``creator`` directly, DaCHS will expect a string of the form:: Last1, I.{; Last2, I.} (i.e., Last, I.-forms separated by semicolons, as in “Foo, X.; Bar, Q.; et al”) and split it up into the proper structure. You can mix the two notations, for instance if you want to set a logo on the first creator:: Chandrasekhar, S. http://sit.in/chandra.png Copernicus, N.; Galilei, G.
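If you prefer spelling this out as meta elements in the RD, the mixed notation from the example above might look roughly like the following sketch (the exact element spelling is our assumption here; the names and logo URL are just the placeholders from above)::

  <meta name="creator">Chandrasekhar, S.</meta>
  <meta name="creator.logo">http://sit.in/chandra.png</meta>
  <meta name="creator">Copernicus, N.; Galilei, G.</meta>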
The Products Table ------------------ DaCHS has a central place in which it keeps metadata on datasets it serves: The product table :dachsref:`dc.products`. It is used to assign media types (is it a FITS or a text file?), access control (who owns the file and should it be handed out to everybody?), associated thumbnails, and the table through which the product is actually published (i.e., the table underlying the SIAP, SSAP, or whatever service). This last part, the source table name, is crucial information, because if that table is torn down, the corresponding entries in the product table need to be deleted. Hence, if you ever get it wrong, you need to manually connect to the database and issue a command like ``DELETE FROM dc.products WHERE sourcetable=''``. This source table name is almost always declared in a :dachsref:`//products#define` rowfilter in the grammar feeding the table, which minimally looks like this:: "emi.main" Make sure you do not forget the quotes around the value here – as usual in DaCHS procs, what you bind here are python expressions. You can use macros, so if you're not sure what schema you will eventually choose, you could have written ``"\schema.main"``, too. Unless you are serving FITS files, you also have to give the media type here, for instance:: "emi.main" "image/jpeg" More on Tables -------------- Table Notes ''''''''''' Frequently, you need to say more about a column than is appropriate in the few-phrase description. In catalogue descriptions and VizieR, such situations are handled using notes, and DaCHS follows suit. The notes themselves are kept in meta elements belonging to tables. Since the notes tend to be markup-heavy, their default format is reStructuredText. When entering notes in RDs, there is an attribute ``tag`` on these meta items::

   ...
   The meaning of the flag is as follows:

   ===== ==========
   value meaning
   ===== ==========
   1     value is 2
   2     value is 1
   ===== ==========
   ...
To associate a column with a note, use the column's note attribute:: As tag, you may use basically any string, but it's a good idea to keep it to numbers or at least characters not requiring URL encoding. The notes will be present in HTML table heads, table and service descriptions, etc. If you need to link to one, there is the built-in tablenote renderer that takes the table and the note from its query path. The most convenient way to use it is through the built-in vanity name tablenote, where you would access the note above using a URL like ``http://your.server/tablenote/demoschema.demo/1``. Space-Time Metadata ''''''''''''''''''' As soon as you have coordinates, you will want to declare coordinate metadata on them, i.e., reference frames, roles played by columns (say, x is the time derivative of y, and x1 is a galactic latitude), the position for which times and positions are given, and so on. Collectively, this is known as declaring space-time coordinates or STC for short. The problem here is that this got a bad start in the VO and went downhill from there. And that is why there is, to this day, no real model for how to express these things. DaCHS has been using a language called STC-S to let operators declare such metadata. It is clear that we will have to change this at some point, and for time metadata, we are already using something else that will probably be roughly stable. Apologies for this state of affairs. We've tried to fix it, but so far without much success. Legacy STC-S definitions ........................ STC-S was an attempt to define a compact notation for describing STC structures. While `the STC-S note`_ still exists, it is clear that neither the underlying model nor the mechanism as such will be retained. Also, few clients interpret the declarations DaCHS makes from this (Aladin does), so it may not be worth the effort to put these declarations in. On the other hand, if you do it now, it will be a lot simpler to add good declarations when there finally is a usable STC model in the VO. .. _the STC-S note: http://www.ivoa.net/Documents/Notes/STC-S/STC-S-20090527.html The basic idea is that STC-S strings are defined in children of table elements, with Note-style STC extended to allow references to atomic table columns in quoted strings (and to geometry-valued columns in square brackets), for instance:: Position ICRS "ra" "dec" Error "e_ra" "e_dec" Position FK4 Epoch J1950.0 "ra_orig" "dec_orig" You do not need to change anything in the column definitions themselves; this is pure annotation. If you refer to non-existing columns, RD parse errors will be thrown. The full STC-S grammar is humongous and you don't want to bother.
Until a proper STC model comes around, we hope the following examples, taken from various `RDs used in Heidelberg`_, will give you enough material to see what to do:: Position ICRS Epoch J2015.0 "ra" "dec" Error "ra_error" "dec_error" Velocity "pmra" "pmde" Error "sigpmr" "sigpmd" Position ICRS Epoch "epu" "ra_ucac" "de_ucac" Velocity "pmra" "pmde" Error "sigpmr" "sigpmd" Position Galactic "glon" "glat" Polygon ICRS [s_region] PositionInterval ICRS [bbox] Position J2000 "raj2000" "dej2000" Redshift OPTICAL "redshift" Time TT 2000-01-01T00:00:00 Position ICRS Epoch J2000 "alphaFloat" "deltaFloat" Error "RAErr" "DecErr" Velocity "pmRA" "pmDE" Error "PMRAErr" "PMDEErr" Time TT "detection_time" Position ICRS "raj2000" "dej2000" Spectral "energy_cor" unit keV TimeInterval UTC TOPOCENTER "time_min" "time_max" Position ICRS SPHER3 Epoch J2010 "alpha" "delta" "distance" Velocity "mualpha" "mudelta" "radialvelocity" Position ICRS SPHER3 "raj2000" "dej2000" "dist" Size "rcluster" "rcluster" -1 Velocity "pmra" "pmde" "rv" Error "e_pm" "e_pm" "e_rv" New-style SIL definitions ......................... Regrettably, the Coords data model is not really usable for annotating database tables. However, you can (and should) use an ad-hoc, DaCHS-specific DM to provide frames and such. You can do that using a little language called Simple Instance Language or SIL that you can use in ``dm`` children of table elements. The reference manual has an introduction to the syntax in :dachsref:`Annotation Using SIL`. Since version 2.7.3, DaCHS interprets time and space children from votable:Coords instances. They are defined in ``dm`` children in tables in this way:: (votable:Coords) { time: (votable:TimeCoordinate) { frame: (votable:TimeFrame) { timescale: TCB refPosition: BARYCENTER time0: 0 } location: @obs_time } space: (votable:SphericalCoordinate) { frame: (votable:SpaceFrame) { orientation: ICRS refPosition: TOPOCENTER epoch: "J2000.0" } longitude: @ra latitude: @dec distance: @dist pm_longitude: @pmra pm_latitude: @pmdec rv: @rv } } To adjust this to your tables, pick a timescale from http://www.g-vo.org/rdf/timescale, a reference position from http://www.g-vo.org/rdf/refposition, and change time0 to the JD of the epoch of your calendar; 0 is for JD itself or 2400000.5 for MJD. If you have some other time origin, DaCHS can help you. For instance:: >>> from gavo import api >>> import datetime >>> api.dateTimeToJdn(datetime.datetime(1970, 1, 1)) 2440587.5 computes the JD of the Unix epoch. Finally, to set the location, reference the column with a leading @. If you use a literal here, the implicit unit is day. See also :dachsref:`STC annotation`. At this point, DaCHS produces VOTable COOSYS and TIMESYS elements from this data. We hope that if and when there is a suitable annotation based on an IVOA data model, we can translate this to whatever it is. Defining Views '''''''''''''' It is quite regularly preferable to represent data in multiple tables (for instance, for normal form reasons); however, outside of TAP you cannot expose such table structures in VO protocols unless you re-join the various tables to some de-normalised table or, in SQL terms, a view. Other reasons to define views include the mapping of nonstandard data to standard table structures, which is what DaCHS does internally for `SSAP tables`_ and when `publishing anything through obscore`_. If you want to manually define a view, declare a table as usual and then give it a ``viewStatement`` – done.
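As a first orientation, such a view definition might look like the sketch below (the table and column names are invented for this illustration, and we are assuming ``viewStatement`` can be written as a child element of ``table``, as is usual for longer attributes)::

  <table id="matched" onDisk="True">
    <column name="catno" type="integer"/>
    <column name="raj2000" unit="deg"/>
    <column name="dej2000" unit="deg"/>
    <viewStatement>
      CREATE VIEW \curtable AS (
        SELECT \colNames
        FROM \schema.master
          JOIN \schema.identified USING (catno))
    </viewStatement>
  </table>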
In practice, there are a few things you should keep in mind to make such a view definition maintainable. First, you will obviously define the tables involved, for instance::
The stars from the gfh table having counterparts in the master catalogue, together with those counterparts. When just copying columns, `active tags`_ come in handy to copy over column definitions from the two tables; note the use of ``.`` in referencing:: for col in context.getById("gfh"): yield {'item': col.name} As shown in the second loop, you can go as far as to programmatically obtain the column names for your source tables. Only do that if you are sure you actually want to follow changes in source table structure in the view. After declaring the columns, what is left is actually defining the view (see above for the FEED):: CREATE VIEW \curtable AS ( SELECT \colNames FROM (SELECT catno, raj2000, dej2000, pmra AS pmraMaster, pmde AS pmdeMaster, mv AS mvMaster, mb AS mbMaster, component AS compMaster FROM \schema.master) AS m JOIN \schema.identified AS idf ON (masterNo=catno) JOIN \schema.gfh USING (catid, catan))
The viewStatement essentially is more or less arbitrary SQL. For robustness against changes of table structure or schema or table name changes (as in: you should get errors quickly if things are broken), however, you should probably always start with:: CREATE VIEW \curtable AS ( SELECT \colNames FROM The macro ``\curtable`` will automatically adjust to whatever schema and table name is given in the RD, and ``\colNames`` gives all the names of the columns; using it, you will get errors instead of silent failures if you add or remove columns in the table definition but fail to adjust the view statement. When making these tables, it pays to be a bit careful with the ``data`` elements, as of course the view depends on the source tables. In particular, when you re-import one of the source tables, the view will get dropped, and it is a good idea to tell DaCHS to automatically re-make it. The recommended setup for this looks like this:: ... ... – essentially, you make the view in a non-auto data which gets imported every time the source tables are re-made. Note that making the ``data`` for one source table when the other doesn't exist and will not be made in the same import will lead to an error in the view creation. This is harmless for the import of the table you imported, as ``data`` creation on ``recreateAfter`` happens in a separate transaction. Also note that ``dachs limits``, when run on an RD, will ignore views, partly because their columns will typically already have statistics in the originating tables, partly because Postgres cannot TABLESAMPLE views, which may make column statistics really expensive. If your views contain, for instance, unions, relying on the source tables' statistics will yield table metadata that is seriously wrong. While you can update the statistics by passing the identifier of the table descriptor into a dachs limits call, it is preferable to note this in the RD. For that, add:: True to the table element describing the view. Materialised Views '''''''''''''''''' Materialised views work like regular views, but instead of re-writing the query every time, an actual table is created. This *sometimes* is faster, and you can create indices over the results. The disadvantage is that as the underlying tables change, the materialised view will not change automatically. And of course materialised views take up disk space, and there are quite a few situations in which they actually are slower, sometimes by large factors. There still are situations in which materialised views are useful. To define materialised views in DaCHS you do as for `defining views`_, except that instead of ``CREATE VIEW`` you write ``CREATE MATERIALIZED VIEW`` in the view statement. Of course, any indices you may have on the source tables will not help queries against the materialised view. Hence, you will have to add ``index`` elements (again) as necessary. ``dachs limits`` will skip materialised views, too, by default. For them, that makes a lot less sense, so you will almost always want to add:: 1 in materialised views. You can update materialised views from the postgresql command line, using something like:: REFRESH MATERIALIZED VIEW myschema.matview This will lock the relation to concurrent queries, which will wait until the view is updated. There is a CONCURRENTLY version of this avoiding that lock-out, but it will need to be used with care. See the postgres docs if you think that matters a lot for your use of materialised views.
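For reference, the concurrent variant might look like this; postgres only allows it once the materialised view has at least one unique index, and the view and column names below are made up::

  -- a unique index is a precondition for REFRESH ... CONCURRENTLY
  CREATE UNIQUE INDEX matview_key ON myschema.matview (main_id);
  REFRESH MATERIALIZED VIEW CONCURRENTLY myschema.matview;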
You could tell DaCHS to regularly update a materialised view if necessary, for instance with something like:: with base.getWritableAdminConn() as conn: conn.execute("REFRESH MATERIALIZED VIEW \schema.matview") See :dachsref:`Element execute` for how to specify more complex cadences, and note that DaCHS lives on UTC if you want to keep such updates outside of your local rush hours. More on Importing Recipes ------------------------- Doing Incremental Imports ''''''''''''''''''''''''' When running archives of ongoing projects, it is usually desirable to import often, but ignore items that are already in the database to make the imports quick and cheap. The key ingredient to accomplish that is the :dachsref:`Element ignoreSources`, which lets you tell DaCHS which files of those it found in ``sources`` it should not ingest (again). What this deals with are paths relative to DaCHS' inputsDir. Unless you did something special, the accref column in tables having data products contains exactly these inputsDir-relative paths. There are various ways to communicate accrefs to ignore: from lines in a file, by a pattern, or from the database. This last case is what is most commonly used in incremental imports, in the form of the ``fromdb`` attribute of ``ignoreSources``. However, when you import something, DaCHS will by default tear down all tables being made, so just ignoring some sources will not be sufficient. To make DaCHS keep tables mentioned in ``make``, give the :dachsref:`Element data` an ``update`` attribute. In sum, the pattern is:: ... Of course, you have to adapt the source pattern and the table to get the accrefs from. For non-data product sources (e.g., files you feed into catalogs), you will have to use some other technique, because these do not have accrefs and you will have to manage the sources already processed manually. An example for how this can be done is the ``rr.imported`` table in :samplerd:`rr/q`. Even more on this is found in the reference documentation's :dachsref:`Updating Data Descriptors`. Creating pgSphere Geometries '''''''''''''''''''''''''''' DaCHS lets you store spherical geometries (polygons, circles, and MOCs) in database columns. Obscore and EPN-TAP have the ``s_region`` column holding such geometries, and you can have custom columns with geometry types that facilitate writing interesting TAP queries. Also, DaCHS is moving towards using pgSphere's spoint for spatial indices (instead of q3c, which has been the preferred spatial index until DaCHS2). To feed such columns, you first have to choose a type for the column; DaCHS' geometry columns are type-clean, i.e., if a column has a polygon in one row, all other rows will have to have a polygon (or NULL) there, too – rather than, say, a circle or a point. The choice of types is: * ``spoint`` – a simple point. You should not have spoints in ``s_region``, as common queries will run contains or intersects against them, and that rarely does what users expect with points. * ``scircle`` – a spherical circle. pgSphere limits those to radii smaller than pi/2. * ``spoly`` – a spherical polygon, defined through a number of edges. pgSphere currently has rather severe limitations on what these can be; in particular, you cannot have intersecting edges, and their maximum size has limits, too. For complex geometries, prefer smoc. * ``smoc`` – a Multi-Order Coverage map (MOC). MOCs are, conceptually, lists of HEALPixes and thus can represent arbitrary areas on the sky.
If you have possibly large, possibly non-convex, possibly non-simply connected areas, they are what you are looking for. Creating values of these types in DaCHS would usually use classes from the ``gavo.utils.pgsphere`` module, which is available as ``pgsphere`` in rowmakers. These are usually constructed with arguments in rad; for this and other reasons, we recommend using their ``fromDALI`` class methods. These take the canonical representations foreseen by DALI_ for VOTables, which are usually just arrays of floating point values in degrees. So, to fill an spoint column, you would say something like:: pgsphere.SPoint.fromDALI([ float(@centerLong), float(@centerLat)]) A circle with a radius given in arcminutes would be:: pgsphere.SCircle.fromDALI([ float(@ra), float(@dec), float(@radius)/60]) A polygon is constructed with longitudes and latitudes interleaved:: pgsphere.SPoly.fromDALI([@x1, @y1, @x2, @y2, @x3, @y3]) The DALI representation of a MOC is a string of whitespace-separated order/pixel lists. The details are in section 2.3.2 of the MOC specification. There are various ways in which you can come up with such strings. If you have a list of points, decide on the order you want your MOC to give the coverage on (6 is about one degree, and each order gives you a factor of two in linear resolution). Then, make circles of about that resolution around your points, turn these into MOCs and join them. For instance:: 10 circles = [] for x, y in @points: circles.append( pgsphere.SCircle.fromDALI([x, y, 6/order])) m = circles.pop().asSMoc(order=order) for c in circles: m.moc += c.asSMoc(order=order) @s_region = m This is based on the fact that a ``pgsphere.SMoc``'s ``moc`` attribute (currently) contains a pymoc.MOC instance; even if we change the backend implementation (which is likely), the pattern should keep working. Similarly, for a bunch of polygons, you could do this:: 10 @s_region = pgsphere.SPoly.fromDALI(polys[0] ).asSMoc(order) for vertices in polys[1:]: @s_region.moc += pgsphere.SPoly.fromDALI(vertices ).asSMoc(order) Note that the HEALPix library DaCHS uses currently (2020) cannot deal with non-convex polygons and may slightly over- or underestimate the coverage of circles and polygons. If either limitation bothers you: The `CDS HEALPix library`_ provides a nice alternative for which there is a convenient wrapper in `CDS-HEALPix-python`_. If you produce an ASCII MOC from whatever you compute using these tools, you can pass it to ``pgsphere.SMoc.fromASCII``. See also :samplerd:`openngc/q` for sample usage of the relatively naked rust HEALPix library. .. _CDS HEALPix library: https://github.com/cds-astro/cds-healpix-rust/ .. _CDS-HEALPix-python: https://github.com/cds-astro/cds-healpix-python/ Skipping Things ''''''''''''''' Sometimes there are a few “broken” things in material that is otherwise just fine: A couple of broken FITSes, or some lines of a CSV that should not make it into a database table. While the ``-c`` flag to ``dachs import`` will let you conveniently ignore individual sources (note that this is rather certainly not what you want with the bad rows), it will also hide genuine errors and thus should only be used in development. In production, you should skip the bad sources or rows. Skipping Sources ................ You can skip known-bad sources by accref using :dachsref:`Element ignoreSources`, using the ``fromfile`` attribute.
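A minimal sketch of what that might look like (the pattern and file name are invented; the file would simply list one inputsDir-relative accref per line, and we are assuming ``ignoreSources`` sits, as usual, inside the ``sources`` element)::

  <sources pattern="data/*.fits">
    <ignoreSources fromfile="res/knownbad-accrefs.txt"/>
  </sources>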
More commonly, however, you can tell sources to skip by criteria like certain items in the file names, the presence or absence of certain FITS headers, or similar. The preferred method to do that these days is raising ``SkipThis`` in a row filter. For instance, to skip all files that have ``_failed.`` in their name:: if "_failed" in rowIter.sourceToken: raise SkipThis("Ignoring failed observation") yield row ... While you could just as well make the ``yield`` conditional on the negation of your condition, raising ``SkipThis`` makes the intention clearer and may (some day) give clearer indications of what is going on in the UI. If you want to skip whenever a certain header is missing, do something like:: if "CD1_1" not in row: raise SkipThis("Ignoring uncalibrated observation") yield row ... In case you need to ward off totally broken FITS (or other) files that crash the FITS parser, you cannot use a row filter because that will only run once the parser has run. However, you can abuse the `Source Fields`_ computer to skip them before the FITS parser sees them:: try: _ = pyfits.open( os.path.join( base.getConfig("inputsDir"), sourceToken)) except Exception as ex: raise SkipThis("Skipping utterly broken {}: {}".format( sourceToken, ex)) return {} – though my advice in such a situation would be to fix the broken files using an external processor (see :dachsdoc:`processors.html`). DaCHS still contains a declarative facility for skipping sources in the form of :dachsref:`Element ignoreOn` and :dachsref:`Triggers`. Although experience has shown these to be too inflexible to be worth it, still feel free to use them – we will not remove them any time soon. Skipping Rows ............. To skip individual rows of a source, raise ``IgnoreThisRow`` in a procedure application, like this:: if int(@colX)>22: raise IgnoreThisRow() Note that if you raised ``SkipThis`` here, you would skip the entire rest of the source, which is probably not what you want. Row makers also admit the :dachsref:`Element ignoreOn` mentioned in `Skipping Sources`_; as for them, we believe the declarative means rather typically are not flexible enough, and skipping using Python code is usually about as clear and evolvable. More on Grammars ---------------- As discussed in `Parsing Input Data`_, grammars are what turn just about any kind of input into neat sequences of Python dictionaries that are then processed (and possibly multiplied) in row filters; whatever comes out of this is then handed over to row makers and turned into database rows. There are many :dachsref:`grammars available` in DaCHS. This section discusses some of them in a more leisurely fashion than the terse descriptions in the reference documentation, and it covers a few additional aspects of what can go on in any grammar. Source Fields ''''''''''''' It is a fairly common situation that you want to make available certain properties of a source file to the row makers; the most common case is that the data providers have encoded information like object names, times, or whatever, in file names, or that there is some auxiliary information in different files. In these situations, it is often efficient to use :dachsref:`Element sourceFields`. It contains a standard procedure definition (i.e., you could predefine those and bind parameters), but usually you will just fill in the code. This code is called once for each source processed, and receives the ``sourceToken`` as argument.
It must return a dictionary, the key/value pairs of which will be added to all rows returned by the row iterator. In addition to the ``sourceToken``, you also have access to the data that will be fed from the grammar. This can be used to, e.g., retrieve the resource directory (``data.dd.rd.resdir``) or data descriptor properties (``data.dd.getProperty("whatever")``). As a trivial example, here is how to only parse a field identifier from a source name once per source (rather than have this expression in a row maker and do that once per row):: return {"field": int(sourceToken.split("bul_sc")[-1].split(".pm.gz")[0])} A more complex example, taken from :samplerd:`lightmeter/q`, looks up some calibration values from a different table based on an identifier again pulled from the source file – clearly, you wouldn't want to do one query per row when there's thousands of rows coming from one source:: stationId = os.path.split(os.path.split(sourceToken)[0])[1] if re.match("[0-9]+$", stationId): raise base.SkipThis("Archive directory") try: with base.getTableConn() as conn: stationPars = list(conn.query( "SELECT calibA, calibB, calibC, calibD, timeCorrection" " FROM lightmeter.stations" " WHERE stationId=%(stationId)s", locals()))[0] calibA, calibB, calibC, calibD, timeCorrection = stationPars except IndexError: raise base.ValidationError("No station data for %s. Import" " stationsdata first." %stationId, "stationId", stationId) if calibA is None: raise base.SkipThis("No calibration for %s."%stationId) srcKey = "/".join(utils.getRelativePath(sourceToken, base.getConfig("inputsDir"), liberalChars=True).split("/")[2:]) return locals() On getTableConn, see :dachsref:`Database Queries`. Note, however, that you need a special pattern when you want to query the table that you are filling. This is because ``sourceFields`` code runs in the midst of a transaction updating the table. So, something like:: with base.getTableConn() as conn: conn.queryToDicts("select ... from ") will wait for the transaction to finish. But the transaction is waiting for data that will only come when the query finishes – this is a deadlock, and dachs imp will just sit there and wait (see also :dachsdoc:`commonproblems.html#deadlocks`). To get around this, you need to query using the data's connection. So, instead write:: yield data.connection.queryToDicts(...) preFilters '''''''''' DaCHS Grammars parsing from files have a ``preFilter`` attribute. This can contain a pipeline that has to read from stdin and produce its output on stdout. The typical use case is to decompress input files without bothering DaCHS, e.g., using ``preFilter="zcat"`` (for which there's the historical ``gunzip="True"`` alias – don't use that any more), ``preFilter="bzcat"``, or ``preFilter="xzcat"``. However, if you have some weird format for which there is a dumping tool that produces something that can be turned into CSV from a stream coming in from stdin, something like the following provides an elegant way to get the data in with minimal code:: reGrammars '''''''''' The :dachsref:`Element reGrammar` is the Swiss army knife of text-parsing grammars when neither :dachsref:`Element columnGrammar` nor `csvGrammars`_ will do. The idea here is that you give one regular expression each to separate the file into records and to split the records into fields, and that you simply enumerate the names used in the mapping.
In the simplest case – whitespace separated columns in lines containing no other whitespace, with one comment line at the top –, such a grammar could be specified like this:: raMin, raMax, decMin, decMax, EVI, AV, AI reGrammars have a few tricks built-in to make them a bit more versatile. The following lets you simulate properly parsing some bzipped database dump by making some strong assumptions:: "?,(\s*")? \((.*)\) primflag, measure_no, paper_id, source_no, source_name The ``recordCleaner`` used here is an RE that is matched against every record and that contains exactly one capturing group. If it matches, the group will be used to apply the separators; in this case, it simply discards the parentheses you have in your typical database dump. Records not matching at all are discarded (which, admittedly, makes it easy to miss errors in ways that will lose data silently). An alternative, and somewhat safer, way to get rid of junk in the input is to use the ``commentPat`` attribute. That is an RE that is removed from the record before trying to parse it. The reference documentation mentions the classic case: ``(?m)^#.*$`` will throw away lines starting with a hash. Another gung-ho feature of reGrammars is the ``lax`` attribute. Normally, reGrammars will fail on records that do not have exactly one field per name you put in. Some ugly input may skip “optional” fields at the end – with ``lax="True"``, DaCHS will map those to empty strings. Finally, sometimes files have junk at the end. This is when ``stopPat`` comes in handy: When a record matches that RE, the entire remainder of the input is ignored. fitsProdGrammars '''''''''''''''' This grammar exposes FITS headers as rawdicts. Since both represent essentially the same data structure – sequences of key-value pairs –, you can get away with just:: – and that would cover a lot of use cases that read FITS images. If you have FITS tables, this grammar is not for you; use :dachsref:`Element fitsTableGrammar` instead, and if you have large FITS tables, have a look at `Grammars in C`_ (even if you don't know C). DaCHS uses pyfits to parse the headers and thus supports all the major conventions (CONTINUE cards are transparent, as are HIERARCH-style long keywords). Neither HISTORY nor COMMENT cards are present in the rawdicts. There are some snags, anyway. For example, DaCHS by default reads the header bytes using specialised code that is somewhat more robust and has more desirable behaviour with odd files than the pyfits one (you can deselect it by setting ``qnd="False"`` if necessary). This gives up collecting header blocks if it has not found the end card after ``maxHeaderBlocks`` blocks. The default ``maxHeaderBlocks`` is chosen to be 40, which is reasonable for reasonable FITSes. However, we have seen in the wild massive abuse of comment cards to hold entire tables. If you're unlucky enough to have to handle such files, you may have to raise this. You also have to use ``qnd="False"`` if you parse from compressed FITS images. It is fairly common for FITS keywords to contain a dash (``-``). Since rawdict keys are supposed to be python identifiers (e.g., for @-referencing), fitsProdGrammars translate these to underscores. If further cleanup is necessary, there is :dachsref:`Element mapKeys` that lets you write things like:: PROP NAM The element should only be used to fix crazy names. Actual mapping of names should be performed in rowmakers, because that's where you will look for them in two months.
FITS files are somewhat more complex, and fitsProdGrammars expose some of this. If you need to parse from a header other than the primary one, give the 0-based extension number in the ``hdu`` attribute. For the full power of pyfits (but also all incompatibilities that are introduced by using that power), there is the ``hdusField`` attribute. This gives the key under which the pyfits HDU is visible in the rawdict. Be warned that using this might make your grammar *dramatically* slower, in particular when it operates on gzipped FITS files (which are, incidentally, transparently supported if their file names end in “.gz”). csvGrammars ''''''''''' csvGrammar parses from files containing comma separated values. It actually is a thin wrapper around python's csv module, and thus it is fairly forgiving about the idiosyncrasies of CSV (e.g., quotes around all, some, or no values). You can configure the delimiter character via the same-named attribute (it defaults, as you would expect, to comma). CsvGrammars by default expect input files to follow the convention that the first row contains the column names, and it will use whatever is there as rawdict keys. If these names are not usable as Python identifiers, use :dachsref:`Element mapKeys` to map them to other keys so you can have nice @-references in your row maker. If the first CSV line does not contain the names, you must use the ``names`` child to give these names, like this:: The csvGrammar will not complain if there are more fields in the file than you give names for; any extra values are reachable through ``@NOTASSIGNED`` (which should probably only be used to catch exceptional circumstances). If your input files follow the (silly) convention of having the column names in a comment in the first line, like this:: # ra, dec, mag 13.2, 14.45, -0.33 you have to give ``names`` as above and make the csvGrammar ignore the first line with ``topIgnoredLines="1"``. There is currently no other way to ignore comments; if you need that, you should probably retrofit that functionality in Python's csv module. Python-defined grammars ''''''''''''''''''''''' Sometimes inputs are so nasty that DaCHS' grammars really can't be used to parse them. For these situations, DaCHS has :dachsref:`Element customGrammar` and :dachsref:`Element embeddedGrammar` that let you just slap in pieces of Python yielding dictionaries as grammars. First off: before you go there, think hard about whether one of DaCHS' built-in grammars really can't do the job (including :dachsref:`Element binaryGrammar`). Long experience has shown that custom code is much more likely to break in the evolution of your system, Python, and DaCHS itself. Custom grammars come in modules of their own, which is a good thing because you can test them outside of DaCHS. But they break the principle of “all you need is in the RD”, so don't use them unless you must. An introduction to them is in :dachsref:`Writing Custom Grammars` in the reference documentation. Embedded grammars are somewhat more light-weight. They in particular come in handy when you need to generate a few rows, like this:: "a"*4 for i in self.sourceToken: yield {'index': i, 'data': testData} Essentially, they are usual DaCHS procedures that ``yield`` rawdicts in whatever way you like. They see the source to be parsed in ``self.sourceToken``. While this usually is a file name (that you would then pass to ``open``), you can also use the string for something else, as is done here.
Combined with:: abcd5 the grammar above would produce the rawdicts:: {'index': 'a', 'data': 'aaaa'} {'index': 'b', 'data': 'aaaa'} {'index': 'c', 'data': 'aaaa'} {'index': 'd', 'data': 'aaaa'} {'index': '5', 'data': 'aaaa'} Embedded grammars are intended to be used for small and simple tasks, and if anything goes wrong, the error messages produced by default are not very useful. So, when things get a bit more complicated, program defensively and keep a note of where in the source you are, somewhat like this:: with open(self.sourceToken) as f: for line_number, line in enumerate(f): try: ... except Exception as ex: raise utils.SourceParseError( "Bad line: {}".format(ex), offending=line, location=str(line_number+1), source=self.sourceToken) Grammars in C ''''''''''''' If you have large data collections (millions to billions of rows), going through all the steps of parsing, row making, and postgres ingestion with lots of Python in between can become prohibitively slow. In such cases, you can use :dachsref:`Element directGrammar` (or “a booster” for short). They let you bypass everything in DaCHS and almost everything in Postgres and produce binary dump material that is just slurped into the database at maximum speed. This normally requires some knowledge of C, although DaCHS tries to help you if you're not terribly C-savvy. The one exception is FITS binary tables; DaCHS can generate C code for pulling them in without further intervention. Boosters are a longer story, which we tell in :dachsdoc:`booster.html`. More on CondDescs ----------------- CondDescs were introduced in `Service Definitions`_ above. Since they are powerful tools when building service interfaces, we cast a second look at them here. Automatic and manual control '''''''''''''''''''''''''''' The standard idiom introduced above:: adapts to protocols; for instance, if somecol is a floating point variable, the condDesc will accept VizieR-like expressions in HTML forms, PQL-like intervals (“min/max”) in legacy DAL protocols and DALI-like intervals (“min max”) in modern DAL protocols. Sometimes you need more control, for instance, when there should always be just one single string regardless of what renderer the service is used through. In that case, define the input key directly:: The extra attribute is there to make sure that even a column that is required in the table remains optional in the query (you can skip that if your column is not required in the first place). You can also define the input key from scratch:: If you add an enumeration of possible values, like this:: DaCHS will try to communicate the choices to clients; in HTML forms, for instance, this will translate into a pop-up menu. Limiting condDescs by renderer '''''''''''''''''''''''''''''' In order to let you emulate ``buildFrom`` behaviour of making different input key types depending on the renderer even when you cannot use ``buildFrom``, there are two properties defined on inputKeys to only show inputs for certain renderers, viz., ``onlyForRenderer`` and ``notForRenderer``. Both have single strings as values. This is intended mainly for cases like SIAP and SCS where there are “human-oriented” versions of the input fields available. The built-in SCS and SIAP conditions already do that, so you can give both scs and humanSCS conditions in a core.
Here is how you would define an input key that is only used for the form renderer:: form Phrase makers ''''''''''''' In all the condDescs we have seen so far, the SQL generated becomes a disjunction (i.e., an OR expression) of the condDesc's input keys' individual SQL expressions – unless you set the ``joiner`` attribute to AND, that is (incidentally, constraints from different condDescs are always combined using AND in DaCHS). The SQL expression derived from an input key is a simple comparison for equality for plain types, a test for overlap for interval-valued parameters, and even more complex expressions for VizieR-like expressions in the form renderer. For complete control over what SQL is generated, condDescs may contain code in :dachsref:`Element phraseMaker`. This, again, is a procedure application, quite like with rowmaker procs, except that the signature of condDesc code is different. Phrase maker code has the following names available: * inputKeys – the list of input keys for the parent CondDesc * inPars – a dictionary mapping inputKey names to the values provided by the user * outPars – a dictionary that is later used as the parameter dictionary to the query. The code should amend the outPars dictionary with the keys mentioned in the conditions. The conditions themselves are yielded. So, a very simple condDesc with generated SQL could look like this:: outPars["xxyy"] = "x"*inPars.get("val", 20) yield "someColumn=%(xxyy)s" However, using fixed names in outPars is not recommended, if only because condDescs could be used multiple times. The recommended way uses the ``base.getSQLKey(name, value, outPars)`` function, where name is more or less arbitrary but should usually be the input key name, value is what you would like to see in the SQL, and outPars is a dictionary that you get passed into your phrase maker. All that you should do with outPars is pass it to this function. ``getSQLKey`` will return a key unique to the query in question and enter the value into the outPars dictionary under that key. While that sounds complicated, it is actually rather harmless, as shown in the following real-world example that lets users input date, time and an interval in split-up form (e.g., when you cannot hope anyone will try to write the equivalent vizier-like expressions):: baseTS = datetime.datetime.combine(inPars["date"], inPars["time"]) dt = datetime.timedelta(minutes=inPars["within"]) yield "date BETWEEN %%(%s)s AND %%(%s)s"%( base.getSQLKey("date", baseTS-dt, outPars), base.getSQLKey("date", baseTS+dt, outPars)) STC coverage ------------ Most resources you will publish have a certain coverage in space, time and spectrum. The Registry can represent this coverage, and this is obviously really helpful in many discovery operations. So, if you can, declare your resources' coverages (or “footprints”). In DaCHS, you can do that manually using the children of :dachsref:`Element coverage`. These are: * ``spatial``: This contains something called an ASCII MOC, which is explained in the `MOC Recommendation`_. Coming up with those essentially always requires some tooling, so unless you really want to say “all sky” (which is ``0/0-11``) you should probably let DaCHS work it out (see below). * ``temporal``: You can have multiple of those; each needs to be a space-separated pair of either floats (interpreted as MJD) or DALI-conforming ISO strings (YYYY-MM-DD).
For many tables, DaCHS can automatically work this out, but it is not very diligent and will just use the minimal and maximal time within the resource. In particular when you have resources with multiple observation runs, it is preferable to manually enter them. * ``spectral``: Again, DaCHS lets you have multiple of those, and they are pairs of floats giving spectral coordinates in Joule (in case you are wondering: it's energy so it properly works for non-EM messengers as well, and it's Joules because that sucks for everyone except ultra-high-energy cosmic ray people who deserve a break because they have so few messengers). Giving these manually is a particularly good idea if you have several narrow bands, because DaCHS will not notice that when automatically determining the coverage. For many common table types (SSAP, Obscore, SIAP, SCS), DaCHS can automatically work out the coverages (with the caveats mentioned above). To make it do that, use :dachsref:`Element updater`, which lets you define the table DaCHS should derive the coverage from (or, for odd cases, several tables for the various axes). The results of these coverage estimations are stored in the database table ``dc.rdmeta``. Note that there can only be one coverage per RD. This means that if two things have different coverage, they must be in two different RDs. You can mix and match manual and automatic coverages; DaCHS will not overwrite manually given coverage information. Hence, if you say:: 3.02e-19 3.03e-19 1.63e-18 1.64e-18 you would let DaCHS try to determine the temporal and spatial coverage from the ``main`` table, but you would say that your spectral coverage is around Lyman α and Balmer α. .. _MOC Recommendation: http://ivoa.net/documents/MOC/ Writing Examples ---------------- As part of the `DALI standard`_, there is a convention for giving examples for service usage; the idea is to produce HTML that is useful for human consumption with additional, RDFa-based, markup to let clients figure out how to fill their interface forms; the prime example of where that works nicely is in TOPCAT's TAP client (check the examples button, the service-provided entry). DaCHS lets you write such examples in ReStructuredText with some extra markup that is turned into the machine-readable semantics. TAP examples '''''''''''' The most immediately useful examples are those for TAP/ADQL. While examples normally reside in the service elements they illustrate, that doesn't work for TAP, because the service is defined in the built-in RD ``//tap``. Rather than change that (with all the trouble that would give when upgrading DaCHS), instead add the TAP examples in userconfig.rd; to prepare for that, see `The userconfig RD`_. Once you have your userconfig.rd, locate the STREAM with the id ``tapexamples``. One example is already given in the userconfig.rd of the distribution; take that as a template for your own examples. Each example resides in an ``_example`` meta element. This has a mandatory ``title`` attribute, and its content format is fixed to ReStructuredText. The built-in example looks like this:: To locate columns “by physics”, as it were, use UCD in :taptable:`tap_schema.columns`. For instance, to find everything talking about the mid-infrared around 10µm, you could write: .. tapquery:: SELECT * FROM tap_schema.columns WHERE ucd LIKE '%em.IR.8-15um%' In addition to title, the only thing mandatory is a block marked ``.. tapquery::`` that contains the ADQL query you want to show.
A typical client would fill this into whatever its UI provides to write queries. Of course, you should explain what the query does and how it does this. ReStructuredText lets you have basic markup, including, if necessary, section headings if your example gets long enough. Optionally, you can give the client a hint as to what table the example pertains to using the ``:taptable:`` interpreted text role. A client would typically use that to restrict the display of the example to states in which it assumes the user wishes to run queries against that particular table. Note again that DaCHS does not pick up changes to ``userconfig`` automatically. Hence, after adding or changing examples, you have to run:: dachs serve exp % //tap on the server (if you have set `The Admin Password`_). To see your shiny new example(s), point your browser to ``/tap/examples``. Datalink examples ''''''''''''''''' DaCHS also contains provisional support for examples associated with datalink services. We cannot recommend a client to try them with at this point, but of course having services with useful examples will make client support more attractive. To add an example to a datalink service, add an ``_example`` meta with a title attribute to the service definition; for instance:: FEROS Datalink Service On published datasets like :dl-id:`ivo://org.gavo.dc/~?feros/q/f04031.bdf`, this service lets you do cutouts, translations into FITS binary tables, ASCII, and possibly more, as well as simple recalibration. There is only one interpreted text role in there so far, ``dl-id``. That's a PubDID the service will generate a datalink document for. To see the example, point your browser to the service URL with the examples renderer. If the above fragment were in the RD ``flash/q``, the URL would thus work out to be ``/flash/q/sdl/examples``. Since DaCHS picks up changes to “normal” RDs automatically, you will immediately see changes to your example text. Generic examples '''''''''''''''' The `DALI standard`_ defines a term *generic-parameter* which can be used to annotate all kinds of services; this may come in particularly handy with the api renderer. To use it, you can write something like:: Publisher dataset identifiers have a query part, but the IVORN part still has to resolve: :genparam:`uri(ivo://org.gavo.dc/~?feros/data/f89411.vot)` New-style standardIds use fragments to refer to standard keys within vstd:Standard records, as in :genparam:`uri(ivo://org.gavo.dc/std/glots#tables-1.0)` As you can see, generic parameters are defined using a ``genparam`` interpreted text role; the content of that role is the parameter name and then, in parentheses, the parameter value. Don't worry if your parameter value has a closing parenthesis in it; DaCHS only recognises the parenthesis that ends the genparam if it is at the end of the interpreted text. Note that such examples best sit in the service rather than top-level in the RD; if they were direct children of the RD, they would appear in all services defined in the RD. DaCHS does not yet have support for the *capability* property defined by DALI. Ask if you need it. Hierarchical Examples ''''''''''''''''''''' In particular for TAP, you may collect so many examples over time that a flat list becomes unwieldy. In that case, you can collect examples in stand-alone DALI examples documents (which are XHTML) or other services and reference these extra examples collections from, say, the TAP examples.
You can co-locate these within other services or create an extra examples-only service, for instance::

  RegTAP examples

  This service has the examples from the RegTAP spec in the form of a
  DALI examples document.

  As the capability type is in :taptable:`rr.capability`, whereas the
  access URL can be found from :taptable:`rr.interface`, this requires a
  (natural) join.

  .. tapquery::

    SELECT ivoid, access_url
    FROM rr.capability
      NATURAL JOIN rr.interface
    WHERE standard_id like 'ivo://ivoa.net/std/tap%'
      AND intf_role='std'
      AND authenticated_only=0

  Translating the intention into “find all SIA services that mention
  spiral in either the subject (from :taptable:`rr.res_subject`), the
  description, or the title (which are in :taptable:`rr.resource`)“,
  the query would become:

  .. tapquery::

    SELECT ivoid, access_url
    FROM rr.capability
      NATURAL JOIN rr.resource
      NATURAL JOIN rr.interface
      NATURAL JOIN rr.res_subject
    WHERE standard_id like 'ivo://ivoa.net/std/sia%'
      AND intf_role='std'
      AND ( 1=ivo_nocasematch(res_subject, '%spiral%')
        OR 1=ivo_hasword(res_description, 'spiral')
        OR 1=ivo_hasword(res_title, 'spiral'))

If this were within the RD rr/q, you could then amend your ``tapexamples`` STREAM in the userconfig with::

  moreExamples: rr/q#ex
  moreExamples.title: RegTAP queries

Sufficiently advanced clients will pick this up and display it in something like a submenu. You can repeat moreExamples, and you can use URLs rather than service references in there (provided there are indeed DALI examples documents at those URLs). For instance, to include the Heidelberg obscore examples (which may work for you to some extent), you could add::

  moreExamples: https://dc.g-vo.org/static/obscore-examples.shtml
  moreExamples.title: ObsCore queries

Inspecting the Triples
''''''''''''''''''''''

The best way to make sure your RDFa actually says what you intend it to say is to transform it to the Turtle_ (alias N3) format, which is reasonably human-readable when you have grokked the basics of RDFa (`recommended reading`_).

.. _Turtle: https://en.wikipedia.org/wiki/Turtle_(syntax)
.. _recommended reading: http://eprints.gla.ac.uk/101484/

There used to be a `simple web service`_ for this kind of thing, but it seems it has fallen into disrepair, and in general RDFa tooling at this point seems somewhat marginal. In early 2023, what I did was create a virtual python environment (see virtualenv) and then run::

  pip install pyrdfa requests

.. _simple web service: http://www.w3.org/2012/pyRdfa/Overview.html

(don't do that with sudo: only use pip with virtual environments). Then you can run a program like this::

  from rdflib import Graph

  g = Graph()
  g.parse("http://dc.g-vo.org/tap/examples", format='rdfa')
  print(g.serialize(format="turtle"))

(changing the URI as appropriate).

Debugging
---------

Writing RDs is like programming, and sometimes it involves actual programming in Python. Hence, you'll get things wrong, and DaCHS probably will annoy you. Giving good error messages – neither drowning you in a deluge of mostly-useless information nor swallowing important pieces of data, neither claiming too much nor too little, not misleading you about what is actually going on – is a high art, and we are aware we should be doing better. Your feedback will help us improve.

Still, with a few hints and techniques figuring out what's wrong isn't much harder in DaCHS than with your average programming system. In the following, we collect some hints on what to do if things don't work.

General Hints
'''''''''''''

* If you get error messages, be sure to check our `hints on common problems`_ – many commonly encountered problems, in particular the ones that give particularly baffling messages, are explained there with suggestions for how to fix them.

* The dachs command takes a ``--hints`` switch. With it, error messages are frequently accompanied with – you guessed it – hints on what might cause the problem and possible solutions. Note that ``--hints``, like the other debugging switches, goes between ``dachs`` and the subcommand name, as in ``dachs --hints serve``.

* Validate your RD. This is, in general, a good idea before doing anything with the RD, since it will let you catch errors much more easily than through the (in all likelihood even more byzantine) error messages that may arise when something goes wrong later. The ``dachs val`` subcommand takes one or more RDs. If you don't understand its output, complain to gavo@ari.uni-heidelberg.de – the command is really intended to help you catch errors, and if it doesn't do so, it's a bug.

* Check the logs. They are in /var/gavo/logs by default, and there are dcErrors and dcInfos (you'll want to look at both).

* When hunting bugs, it's usually a good idea to enable the logging of (many) tracebacks by passing the ``--debug`` flag to ``dachs``.

* If you're trying to figure out server behaviour, don't run the server daemonized but use ``dachs serve debug`` instead; this won't detach and will log to stdout. The user executing the serve command needs to be in the ``gavo`` group, or you'll get permission problems.

* With this (or if the problem is in non-server DaCHS code in the first place), the Python debugger is your friend. The ``dachs`` command has an option ``--enable-pdb`` that will dump you into the debugger where an uncaught exception happened (as the server should never let an uncaught exception through, that's not useful with ``dachs serve``). If that doesn't help you, you can set breakpoints (e.g., in your own in-RD procedure definitions) by writing ``import pdb; pdb.set_trace()``.

* To see what SQL is actually sent to the database, set the GAVO_SQL_DEBUG environment variable to any value. This could look like this::

    env GAVO_SQL_DEBUG=1 dachs imp q create

  Don't be too alarmed by the deluge of queries you'll see while DaCHS starts up; this is just making sure that the tables DaCHS cannot live without are present.

* If ``dachs serve start`` doesn't actually cause the server to run, something went wrong after detaching from the controlling terminal. The messages are in the logs directory, in the file ``serverStderr``.

Debugging Rowmakers
'''''''''''''''''''

When debugging rowmakers, it helps to keep in mind that there are two dictionaries that play a role here: the incoming rawdict and the rowdict that is being built. Having a clear image of which key is where at which point is helpful. Also, it is important to remember that first all ``var`` elements are processed, then all ``apply`` elements, and finally all ``map`` elements. Within each class, elements are processed in the order of definition, with ``simpleMaps`` and ``idmaps`` after all explicit maps.

Since only applys contain larger amounts of code, debugging will mostly focus on them. If you want to look at one of your own applys, just set breakpoints as discussed above (``import pdb; pdb.set_trace()``). If you'd just like to have a look at what ``vars`` and ``results`` look like at some point, just add a ````. This will, again, drop you into a debugger.
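
To make the breakpoint technique a bit more concrete, here is a sketch of what the first lines of one of your own ``apply`` elements' ``code`` children might look like; the surrounding RD XML is omitted, and what you can inspect at the prompt is simply whatever DaCHS has in scope at that point (for instance the ``vars`` rawdict mentioned above)::

  # hypothetical first lines of an <apply>'s <code> child; dachs imp
  # will stop here and hand you a pdb prompt in the importing process
  import pdb; pdb.set_trace()
  # at the (Pdb) prompt, inspect the mappings discussed above, e.g.
  #   p sorted(vars)
  #   p vars["someRawKey"]
  # then continue with c or single-step with n as usual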

One thing that may baffle you when you write row makers is unexpected primary key clashes (or indeed, violations of other constraints enforced by the database) or just failures where invalid data is passed to the database. Since DaCHS ships out rather large batches of rows (5000 by default) to the database if you let it, the errors will typically occur far from the lines that cause them. In such a situation, pass ``-b 1`` to ``dachs imp`` – this will slow down the import quite dramatically, but the import will fail exactly at the source or row that causes the problem.

Debugging Services
''''''''''''''''''

Debugging services is of course an extra challenge since code runs deep in the bowels of a complex system, with timeouts, threads, and all kinds of nastiness involved.

The first advice is: Pull whatever is mysterious into your development system and run ``dachs serve debug``. You can even drop into the python debugger while the server processes a request (the server will of course block while you are in the debugger). It's a bit tricky to communicate with the debugger in between the log messages of ``dachs serve debug``, and there's no readline support since pdb doesn't think it's running within a terminal there, but it's definitely doable.

Note that in python3, the debugger by default changes SIGINT processing so pressing Control-C will drop you back into the debugger. With ``dachs serve debug``, that behaviour is unwelcome, because it makes terminating the server a bit of a steeplechase. Hence, when setting breakpoints in server code, use::

  import pdb;pdb.Pdb(nosigint=True).set_trace()

Another challenge is that sometimes problems manifest themselves in a running server. In that case it's sometimes useful to open a manhole into the server. One reasonably convenient way to do this is to put a special RD somewhere (e.g., ``__tests/manhole.rd``) and use some RD-embedded code to introspect the server. We usually use a datalink service for this since it keeps things nicely self-contained – an obvious alternative with less bending could be a python core returning a pair of ``text/plain`` and a string. Here's what something that fiddles out a column property could look like::

  prop

  from gavo import base

  col = base.resolveCrossId("arihip/q#main.hipno")
  payload = "Result: %s"%(
    getattr(col, inputTable.getParam("prop"), "(err)"))
  return ("text/plain", payload.encode("ascii"))

See :dachsref:`the qp renderer` for how the ``queryField`` magic is to be understood. The net effect is that looking at http://localhost:8080/manhole/cq/qp/type you can find out what DaCHS thinks the type of arihip.main's hipno column is – and you can try description, ucd, or whatever in that last URI segment, too. Note that you can edit manhole.rd, save it, and reload the page; the server will notice your changes and display the new result without a restart.

Case Studies
''''''''''''

In this section we will damage the arihip RD in various ways, show you how the errors manifest themselves, and how you could try and figure them out. It's highly recommended to play through the scenarios; it's very well possible that the actual messages will look different for you if we've changed and hopefully improved the software. We're thankful if you point us to outdated examples.

XML syntax errors
.................

These are mostly easy to diagnose (and fix), except that sometimes the errors show up too late. For an example of a dramatic failure, delete the closing tag of the ``source`` meta.
The result then is::

  arihip > dachs imp q
  *** Error: In /home/msdemlei/gavo/inputs/arihip/q.rd: mismatched tag: line 674, column 2

In this particular case, a dedicated XML validator gives more helpful diagnostics::

  /arihip > xmlstarlet val -e q.rd
  q.rd:674.12: Opening and ending tag mismatch: meta line 72 and resource
  ^
  q.rd - invalid

The reason DaCHS does so badly here is that ``meta`` elements are allowed to have all kinds of children (for typed meta). DaCHS does better when you add some random XML fragment, e.g., the wonder element in the next example::

  <wonder>foo</wonder>

This time, DaCHS gets it pretty much right::

  msdemlei@victor:/home/msdemlei/gavo/inputs/arihip > dachs imp q
  *** Error: At /home/msdemlei/gavo/inputs/arihip/q.rd, (112, 4): table elements have no wonder attributes or children.

Well-formedness problems frequently turn up with embedded python code or reStructuredText markup. To see what happens then, delete the opening CDATA sequence in front of the embedded reStructuredText table; the import then gives::

  dachs imp q
  *** Error: In /home/msdemlei/gavo/inputs/arihip/q.rd: not well-formed (invalid token): line 411, column 24

If you look up the indicated line, you will see some table markup. Now that you are warned, you will probably see immediately that there is a less-than sign there, and that is not allowed in XML parsed character data. While writing reStructuredText (or Python, for that matter), it is easy to forget that < and & are magic to XML. CDATA is your friend for embedded formal languages.

Now for a particularly nasty syntax error that is not XML-related at all. To trigger it, go to the note meta with the ``src`` tag and remove whitespace from the second line of its content like this::

    The srcSel field indicates which catalogue the astrometric solution
  was taken from using the following codes:

That results in::

  arihip > dachs --debug imp q
  *** Error: At /home/msdemlei/gavo/inputs/arihip/q.rd, (317, 4): Bad text in meta value (Bad indent in line u' was taken from using the following codes:')

If you go to the position given, you'll notice it's the end of the meta element. What bugs DaCHS there is somewhat pythonesque. Both Python and reStructuredText are indentation-sensitive. In particular, even if you, say, indent all lines in a Python program by two blanks, the result will be invalid source code. Hence, DaCHS removes leading indentation from the content of all elements containing such material, and the amount of indentation is governed by the second line, where the line with the opening tag counts; in the example above, DaCHS tries to subtract from every subsequent line the indentation of the line starting with “The srcSel field” (the first line is the one containing the tag). This kind of problem is particularly insidious if you are mixing blanks with tabs for indentation. Don't do this in python, don't do this in RDs.

All this means that RDs can break if XML processing tools normalize whitespace in certain elements. This is a bit unfortunate since the way RDs are written, they are not even completely wrong in doing this. So, the bottom line is: you need to give careful instructions to standard XML processors if you actually want to *write* RDs. However, if you feel the need to write RDs programmatically, you're probably doing it wrong: RDs already generate things, and if another generation layer seems necessary, that would indicate a design problem *somewhere* – possibly in DaCHS.

Rowmaker trouble
................

Since rowmakers are a fairly thin layer on top of Python, it is easy to elicit fairly confusing messages here. To see how this looks in an easy case, change the kbin mapping to::

  parsWithNull(@kbin, str, "9")

(note the missing e). The result is an error message that looks a bit frightening with rawdicts as large as in this case::

  arihip > dachs imp q
  Making data arihip/q#import
  Starting /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz
  Done /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz, read 1
  *** Error: Row alphaHMS -> '0 0 0.216002'
  ddeHIP -> '+ 0.93'
  [...lots of lines...]
  t_ra_mod -> '91.45'
  vrad -> ''
  Field kbin: While building kbin in : name 'parsWithNull' is not defined

The many lines are a dump of the rawdict that is being processed; it is a bit bulky, but it frequently helps figuring out problems that only occur when there is an odd value in the incoming data. The “while building kbin” tells you where in the rowmaker something went wrong: at the mapping of kbin. The ```` part would be more informative if we had given the rowmaker element an id. Doing that obviously is advisable when you have multiple rowmakers in one RD, as for instance::

  <rowmaker id="make_main_row" idmaps="*">

That would improve the message to::

  Field kbin: While building kbin in make_main_row: name 'parsWithNull' is not defined

Now undo the changes and try another frequent problem by deleting the vrad mapping, i.e., removing the entire element::

  parseWithNull(@vrad, lambda a:float(killBlanks(a)), "")

This gives::

  /arihip > dachs imp -M 12 q
  [...]
  Field vrad: While building vrad in None: could not convert string to float: + 8.3

What's going on? Something is going on with vrad; specifically, the string '+ 8.3' cannot be turned into a float (which is because it's not a valid float literal). But since we are no longer mapping vrad, why is the machinery even trying that? Well, this is a consequence of the ``idmaps="*"`` of this rowmaker. When there is a key vrad in the rawdict and a column vrad is required, the default transformation from string to float is tried (and this fails in this case).

What happens if no such input key is present? To find out, additionally remove the line::

  vrad: 188-195

from the column grammar's ``colDefs`` element. The result is::

  arihip > dachs imp -M 12 q
  Making data import
  Starting /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz
  Source hit import limit, import aborted.
  Done /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz, read 13
  Shipped 13/13
  Create index Primary key on arihip.main
  Create index main_mv
  Create index main_q3c_main
  Rows affected: 13

– a clean import. However, all vrads in the table are now, of course, NULL::

  arihip > dachs info arihip/q#main
  col     min    avg    max    hasnulls
  [...]
  vrad    None   None   None   True
  [...]

– watch out for those. DaCHS does not consider it an error if a key is missing in the rowdict (which is in many situations desirable); the missing value is just replaced with None.

You can forbid NULLs using the ``required`` attribute on the column vrad, i.e., by adding ``required="True"`` to its ``column`` element. This now yields an error; note that the message is fairly generic, but if you consider that rawdicts are mappings, you will at least see the logic in it::

  arihip > dachs imp -M 12 q
  Making data arihip/q#import
  Starting /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz
  Done /home/msdemlei/gavo/inputs/arihip/data/data.txt.gz, read 1
  *** Error: Row alphaHMS -> '0 0 0.216002'
  [...]
  t_ra_mod -> '91.45'
  Field vrad: While building vrad in : Key 'vrad' not found in a mapping.

Incidentally, the ``--enable-pdb`` main option to ``dachs`` sometimes is helpful in such situations, as it will dump you in the Python debugger at the place of the problem. Undo all changes to the arihip RD to continue.

If you embed code yourself, the potential for challenging bugs is yet larger. To see how basic problems are reported, add::

  ddt

somewhere within the rowmaker. This yields::

  arihip > dachs imp q
  [...]
  Field procf11fed4c: While executing procf11fed4c in : name 'ddt' is not defined

The message coming from the bowels of python is clear enough (``'ddt' is not defined``), but the alphabet soup surrounding it is not.
This name was invented by DaCHS since no ``name`` (it's not id as the same name might be used in two different rowmakers) attribute was given to ``apply``. Try it; the error message will be much more palatable with it.

The error message still could be more helpful, however; consider code like::

  a = 1
  b = 2/a
  c = 3/(a+b)
  d = 4/(a+b+c+1)
  e = 5/d

The error message here is::

  While executing proc985706c in None: integer division or modulo by zero

So – where does this happen? To get an idea, you can pass the ``--debug`` flag to ``dachs``, which in the ``dcInfo`` log file at least yields::

  2013-09-23 15:57:37,304 [INFO 24430] Swallowed the exception below, re-raising
  Field proc985706c: While executing proc985706c in None: integer division or modulo by zero
  Traceback (most recent call last):
    File ".../gavo/rscdef/rmkdef.py", line 583, in __call__
      exec self.code in self.globals, locals
    File "generated mapper code", line 13, in
    File "", line 9, in proc985706c
  ZeroDivisionError: integer division or modulo by zero

The trouble is that the source code line isn't given, and the source this refers to isn't yet visible to you in this case. We promise to improve the management of the source code. Until then, frankly, the nicest way to debug stuff like this is to write::

  import pdb; pdb.set_trace()
  ...

and then use the python debugger to figure things out.

Deleting Resources
------------------

Sometimes data collections become outdated, typically because new data has come in and an improved reduction even of existing data is made available (a “data release”). Our recommendation is in most cases to overwrite the previous data in order to avoid making people use known-bad data.

Not infrequently, however, one wants to keep old data releases around, perhaps for easy comparisons, perhaps because software that is still in use uses the outdated resources. As an aside, the commonly advocated reason for keeping outdated data around (that some archived workflow still produces identical results) is, I claim, a weak reason to keep around terabytes of data.

Be that as it may, at some point you should make stale data less discoverable, and perhaps at some point drop it altogether. As a general policy, we recommend taking data from old data releases out of obscore and similar globally discoverable resources, with the argument that whoever wants old data should at least explicitly look for it. Keeping SSAP or SIAP services one or two releases back published may be a nice service at this point, but an attractive alternative that puts outdated data out of casual users' view is to drop the old services and keep the data available through datalink on the successors.

When dropping services and other published resources for one reason or another, you should give people a chance to find the updates. From a registry point of view, this is not trivial, because deleted services (which may be kept in the registry forever) do not have metadata. You can, however, stretch the rules a bit and give, in the updated service, a *Continues* relationship to the old ivoid. To set this up, look for all ``publish`` elements in the old RD, and look for their counterparts in the new RD.
Then figure out the ivoids of the published resources in the old RD (the ``/browse`` trick introduced in `Trying the services`_ may help you do that), and in each counterpart add a *Continues* meta with the old ivoid in the ``ivoId`` attribute and the old title as a body, for instance::

  Gaia Data Release 1 (DR1) gaia_source

Then delete all ``publish`` elements in the old resource and ``dachs pub`` both the old and the new RD. After that, you can ``dachs drop`` the old RD; note that this will only drop data items automatically made. If you have DDs with ``auto="False"`` that are building tables (cf. ``dachs show dds``), you will have to explicitly name them in the drop. Do *not* ``dachs drop -f`` such RDs; that will remove the RD from DaCHS' memory and prevent the creation of deleted records, which in turn will lead to zombie services hanging around in the Registry.

In case you're wondering why this is bending the rules: Well, you're not supposed to hand out ivoids of things that are not in the Registry, and the ivoid of the deleted service by definition isn't in the Registry any more. We will have to think about it a bit more in the VO.

Now that you've done the right thing I can tell you that regrettably, very few people will actually notice it, because few people look for ivoids in the Registry to figure out updates. Instead, they'll just point their user agents at your old URIs, and after you've dropped the underlying database tables, they will simply 404 on your users without further comment. To fix this, give the old RD a ``superseded`` meta that ought to point to wherever the updated material is, for instance::

  We do not publish Gaia DR1 data here any more.  If you actually need
  DR1 data, refer to the full Gaia mirrors, for instance `the one at
  ARI`_.  Otherwise, please use more recent data releases, for instance
  `eDR3`_.

  .. _the one at ARI: http://gaia.ari.uni-heidelberg.de
  .. _eDR3: /browse/gaia/q3

Whatever you write there will be shown alongside the 404s when people try to access the services, and it will be prominently shown in the resource info. Regrettably, the message will not be shown when people follow links to tableinfos – for a dropped table, DaCHS does not remember where it might once have been defined, and thus it will not see the RD and just return an unspecific 404. Changing this would require quite a bit of bookkeeping which seems disproportionate for a relatively exotic case.

If you just supersede a single service in a larger RD, you can also attach the ``superseded`` meta to a single service (which means you would keep its ``service`` element around, but you can give it a ``nullCore``).

Restricting Access
------------------

For one reason or another, you may need to restrict access to services or data products for a while. To support this, DaCHS has a simple user and group management and lets you declare individual services or datasets as requiring authorisation.

Right now, this is rather basic and only provides what is known as “mild security”, since it is based on HTTP basic authentication, which, unless you run HTTPS only, means that eavesdroppers can trivially find out passwords. Given that we are not dealing with sensitive information and snooping attacks on our connections don't sound likely, we consider that marginally ok. Still, admonish your users to not use valuable passwords in DaCHS. Or restrict your site to HTTPS access.
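
By the way, to get a feeling for what basic authentication means for your users in practice, this is roughly what fetching a protected dataset from a script looks like; the URL and the credentials here are placeholders and not anything DaCHS creates for you::

  # a minimal sketch; adapt URL, user, and password to your site
  import requests

  resp = requests.get(
      "https://your.site.example/getproduct/danish/data/frame0001.fits",
      auth=("someuser", "somepass"))
  resp.raise_for_status()
  with open("frame0001.fits", "wb") as f:
      f.write(resp.content)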

Given enough client support, however, we will certainly add support for fancier authentication schemes (OAuth2?); the authorisation mechanisms discussed here should not be (materially) affected.

At this point, recent versions of TOPCAT_, Aladin_, pyVO, and Splat_ support HTTP basic authentication, so if you can live without federation and accept that the various tools will request the credentials separately, your users can probably work with proprietary data reasonably well.

User/Group Management
'''''''''''''''''''''

The DaCHS user administration is fashioned a bit after the Unix user/group model, except that there always is a group corresponding to a user. To create a user and its group, use ``dachs admin adduser``, like this::

  dachs admin adduser kroisos notsecret “Remove when xy is public”

This command adds the user kroisos with the password notsecret and an optional comment reminding future operators what to do with the identity.

To add existing users to groups, use ``dachs admin addtogroup``, like this::

  dachs admin addtogroup kroisos happy

– this adds kroisos to the happy group, and whoever can authenticate as kroisos will be allowed access to any products or services restricted to happy.

To discover further commands manipulating the user table, try::

  dachs admin --help

*Important*: When you use authentication, please set the ``[web]realm`` configuration item to some string reasonably characteristic for your site. DaCHS sends that realm with the authentication challenge, and even if it's not terribly common to show it any more these days (it's been abused in various ways), it may still be shown to users.

Protecting Services
'''''''''''''''''''

To password-protect entire services, use the ``limitTo`` child of the ``service`` element, for instance::

  ...

Any access to the service will then challenge the client to authenticate as a user belonging to the group given in ``limitTo``. Only one such group can be given. If you foresee the need for complex authorization schemes (rather than “there's one user on my system, and whoever's authorized gets its credentials”), it is probably a good idea to create one user per service and add “real” users to the corresponding group as necessary.

Embargoing Products
'''''''''''''''''''

DaCHS' products subsystem has the notion of owners and embargo periods, which allows public services to deliver metadata on products during their proprietary period, while handing out the data itself only to authorized clients. The embargo will automatically be lifted once the proprietary period is over.

To make a “product” (e.g., spectrum or image) proprietary, in the :dachsref:`//products#define` rowfilter, set the ``owner`` and ``embargo`` keys. Owner is the name of a user created as described above; embargo must eventually become a timestamp, so you will in general come up with an ISO datetime string or a Python ``datetime.datetime`` instance. Here is an example that says images become public a year after the observation::

  parseTimestamp(row["DATE_OBS"]
    )+datetime.timedelta(days=365)
  "danish"
  "danish.data"

This is, in our view, an acceptable policy, but many observers want weird policies (try to talk them out of it, since such behaviour is not nice, and it leads to a bad user experience in the VO as a whole).
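
If you want to double-check what such an embargo expression evaluates to, here is the same arithmetic in plain Python (the DATE_OBS value is made up, and inside the rowfilter DaCHS' own ``parseTimestamp`` does the parsing)::

  import datetime

  date_obs = "2023-04-01T22:15:00"   # a made-up DATE_OBS value
  obs_time = datetime.datetime.strptime(date_obs, "%Y-%m-%dT%H:%M:%S")
  embargo = obs_time + datetime.timedelta(days=365)
  print(embargo.isoformat())         # 2024-03-31T22:15:00

Anything beyond such a simple offset is where custom code comes in.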
You can get as fancy (or antisocial) as you like using custom rowfilters, as in the following example that sets a default embargo for the end of 2008, except for calibration frames and the observations of two objects made in 2003:: "maidanak" getEmbargo(row) "maidanak.rawframes" An embargoed product can only be retrieved by the owner until the embargo period is over. What you give as ``owner`` is a group name; if someone can authenticate as the member of a group, they can access the data – see above for details on how to create users and groups. Active Tags ----------- Sometimes RDs need to contain material in several places, or there are tags that basically are copies of each other with minimal variation; we have already introduced the case in the DaCHS Basics as `Metaprogramming: Macros and LOOPs`_. In the spirit of DRY_, it is desirable to avoid the repetition, if only because when the RD develops, you may edit some repeated item in one place but not the other, at which point you'll have a bug. .. _DRY: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself To avoid repetition (and some other minor things), DaCHS has active tags, which mark up elements within resource descriptor XML that do not directly contribute to result tree. Their typical use is to “record” event sequences and replay them later. In DaCHS, you can tell active tags from others by their all-uppercase names. Normal elements in DaCHS always have names starting with lower case letters. STREAM and FEED ''''''''''''''' When you have a sequence of elements that you need in several places, you can put these elements into an :dachsref:`Element STREAM` and replay them with :dachsref:`Element FEED` in as many places as you want. You have seen several examples of that in this tutorial, for instance, in:: To see what this plays back, look for an element with id ``coreDescs`` in the RD ``scs``, which in turn you can most easily access with:: dachs adm dumpDF //scs | less You will see that this element looks like this:: [...] The ``doc`` element is local to the STREAM; make sure that you use it to drop a few words on what the stream is supposed to do – your future self will be grateful. All other elements will end up exactly like typed there whereever there is a corresponding FEED. Another example where a simple duplication of elements is useful is in :samplerd:`fk6/q`, where a catalogue has been split into two parts. Both parts share a lot of the columns, but there are some extra columns specific to each part. To deal with this, the RD defines a STREAM:: //scs#pgs-pos-index Part I of the FK6 (successor to Basic FK5) 878 ...
...

However, more commonly there will be slight variations between the replayed elements. Consider, for example, the license declaration we have seen above::

  <FEED source="//procs#license-cc0" what="ARIHIP"/>

Since CC-0 recommends that you explicitly state what you distribute, we have to put in something different every time we replay the stream. If you look at the stream (``adm dump`` is your friend again), you will see how this works::

  [...]
  \\RSTcc0{\what}
  http://creativecommons.org/publicdomain/zero/1.0/

The basis here is that all attributes to ``FEED`` are available for macro expansion. On the other hand, ``FEED`` does one level of macro expansion, which explains the double backslash in front of ``RSTcc0``: after the first macro expansion, this becomes a single backslash. Hence with ``what="ARIHIP"`` as above, the ``FEED``'s net effect is the same as if someone had typed::

  \RSTcc0{ARIHIP}
  http://creativecommons.org/publicdomain/zero/1.0/

Another example for a parameterised ``STREAM`` is in :samplerd:`sp_ace/q`, where multiple output fields of a code have possibly asymmetric confidence intervals::

  A param with a confidence interval (give name, ucd, unit, tablehead
  and description and get three params including asymmetric error bars).

This is used later on like this::

LOOP
''''

:dachsref:`Element LOOP` takes the idea of having multiple ``FEED`` elements further and makes it easy to generate lots of fillers, possibly even programmatically.

The simplest way to use ``LOOP`` is by giving a space-separated list of “items”::

The ``events`` child of the ``LOOP`` element is something like an immediate ``STREAM``; you could use a stream instead and reference it in a ``source`` attribute, but that's rarely easier to read. These events are then replayed to the parser for each item in the LOOP's ``listItems`` attribute. Each occurrence of the ``\item`` macro is replaced with the current item. So, in the resulting RD tree, the fragment above will have the same result as::

Sometimes the list items are used in multiple places in the same document. To avoid having to maintain multiple lists, you can define macros using RD's ``macDef`` element; this could look like this::

  U B V R
parseFromString(MAG_\item) ...
Note that macro names must be at least two characters long. Frequently, the loop variable should not just take on a single string. For such cases you can feed in tuples. The most convenient way to do this is LOOP's ``csvItems`` attribute. The content of this is a string literal containing comma separated values *with labels*, i.e., parsable with Python's ``csv.DictReader``. In your events, you can then refer to the labeled items using macros. For example:: band,source U 10-12 V 13-16
\source
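
If you want to check how such a ``csvItems`` literal is split up, you can run it through the standard library facility just mentioned; this is only an illustration of the format, DaCHS does the equivalent internally::

  import csv
  import io

  # essentially the content of the csvItems attribute above
  csv_items = "band,source\nU,10-12\nV,13-16"

  for row in csv.DictReader(io.StringIO(csv_items)):
      print(row)
  # {'band': 'U', 'source': '10-12'}
  # {'band': 'V', 'source': '13-16'}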

You can take this even further and generate the replacements with Python code. For instance, in :samplerd:`califa/q3`, I needed to pull dozens of columns through a ``parseWithNull`` in a row maker. To maintain the list of such columns was already a challenge, and so I had the computer do it by pulling the table out of the parse context (which is a complex object you usually don't see; ``codeItems`` runs while the RD is being parsed, and so there are additional limits to what you can see) and retrieving all “flag” columns. Those are the ones that need a uniform parsing::

  for col in context.getById("cubes"):
    if col.name.startswith("flag_"):
      yield {"item": col.name}

  parseWithNull(@\item, int, -1)

Programmatically Filling RD Elements
------------------------------------

The LOOP_ active tag just introduced can be (ab-) used when you need to move around element content in ways not supported by DaCHS otherwise. The basic idea is to use ``codeItems`` to fetch information and hand it over through macros (which are defined by yielding dictionaries from the code items as detailed above). This section collects a few examples of how this technique can be applied.

Copying Coverage
''''''''''''''''

Say you have a resource A that really is just some sort of “value-added” service on top of another resource B. It stands to reason that A should declare the same coverage as B. If the coverage in B were given literally, you could simply reference it from A (via the ``original`` attribute) if you had furnished B's coverage element with the id ``cov``. But usually, the coverage will have an updater rather than actual, copyable elements. The updater will not work in A, and even if it did, computing the coverage usually is expensive, and it's wise to avoid computing it twice.

Fortunately, using ``LOOP``, you can get B's computed coverage into A. The first thing you need to figure out is that the computed coverage is kept in the :dachsref:`dc.rdmeta` table (it's per-resource metadata, after all). Inspecting this table, you see that spatial is just text (a MOC, ready for inclusion into an RD), whereas temporal and spectral are arrays of double, which you will have to serialise before putting them into the RD. In DaCHS' current implementation these are always 2-arrays, which makes this a good deal simpler. This way, I only need to produce one ``temporal`` and ``spectral`` child each in the :dachsref:`Element coverage`. If there were multiple intervals, I would need separate loops for those (and that may happen in the future). As long as that is not the case, you can re-use a coverage like this::

  with base.getTableConn() as conn:
    for row in conn.query("SELECT spatial, temporal, spectral"
        " FROM dc.rdmeta"
        " WHERE sourcerd='__system__/obscore'"):
      yield {
        "spatial": row[0],
        "temporal": " ".join(str(v) for v in row[1]),
        "spectral": " ".join(str(v) for v in row[2])}

  \spatial
  \temporal
  \spectral

This pattern assumes you are sure coverage information is available on all axes. In case you want to be robust against NULLs, you could add clauses like ``"" if row[1] is None else`` or similar. Note how this naturally becomes a no-op if there is no obscore coverage: the code items will not yield anything then, and no coverage element will be produced in A. In case there are multiple result rows – this is impossible in this case as sourcerd is a primary key on rdmeta –, the last row would win due to DaCHS' overwriting rules; this is about as good as any other choice (except *perhaps* raising an error).
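
Before writing such a LOOP, it can help to peek at what ``dc.rdmeta`` actually holds; here is a little read-only sketch for doing that interactively on the server, using the same connection helper as the code above::

  from gavo import base

  # list the coverage DaCHS has computed so far, per source RD
  with base.getTableConn() as conn:
      for sourcerd, temporal, spectral in conn.query(
              "SELECT sourcerd, temporal, spectral FROM dc.rdmeta"):
          print(sourcerd, temporal, spectral)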

Creating Placeholders
'''''''''''''''''''''

When you build condDescs from multiple database columns as in `Example: Form-based obscore`_, DaCHS has no idea of the ranges on the resulting parameter(s). It's still not nice to have no placeholders in Web forms (say). You can, however, compute them based on the table statistics. DaCHS manages these in the system table :dachsref:`dc.simple_col_stats`. When building a parameter out of, say, ``t_min`` and ``t_max``, you could find the limits using a query like::

  SELECT column_name, min_value, max_value
  FROM dc.simple_col_stats
  WHERE tablename='ivoa.ObsCore'
    AND column_name in ('t_min', 't_max')

However, making a useful placeholder takes some extra manipulation. If, as is likely in such a case, the input is through VizieR-like expressions, you would need to format the MJDs that we get here to ISO time stamps. Also note that the values in simple_col_stats are not floats but VOTable literals in strings (this is so it can also represent, say, datetimes); you hence need to parse values you pull from that table. To set the actual placeholder, you use a property. The total input key would then be::

  from gavo import base, stc

  with base.getTableConn() as conn:
    limits = dict((r[0], r[1:])
      for r in conn.query("SELECT column_name, min_value, max_value"
        " FROM dc.simple_col_stats"
        " WHERE tablename='ivoa.ObsCore'"
        " AND column_name in ('t_min', 't_max')"))

  if "t_min" in limits:
    yield {"obscore_tmin": stc.mjdToDateTime(float(limits["t_min"][0])
        ).date().isoformat(),
      "obscore_tmax": stc.mjdToDateTime(float(limits["t_max"][1])
        ).date().isoformat()}

  \obscore_tmin .. \obscore_tmax

Contrary to our loop pattern in `Copying Coverage`_, here we need to explicitly handle the case of missing statistics. This is why we are checking for the presence of ``t_min`` in the results from the database.

Licensing
---------

Within the astronomical community, licensing issues have traditionally played a minor role – if you referenced properly, using data from other people was not only ok, it was encouraged. We should keep it that way, even in the days of easy transmission and copying. Still, formal statements about how your data may be used are required if, for instance, the data or parts thereof are re-distributed with software, or are aggregated and re-published. Such statements are called licences, and to facilitate re-use you should choose and declare a licence.

In DaCHS, you can do that manually using the following two meta items:

* rights – this is free text transmitted to the registry that should state the conditions under which the data can be used, shared, etc. This can include requests for acknowledgement, but don't be overly verbose here; if, for instance, you'd like to acknowledge all your contributors, rather put that into a ``_longdoc`` meta and reference that from rights.

* rightsURI – a URI of a common licence; this is so machines can figure out conditions if necessary. For now, only include it if your licence actually gives such a URI.

In practice, you should stick to a well-known licence and probably don't bother with these two items by hand. DaCHS comes with support for CC-0 (essentially, no restrictions; that's not precisely a licence, but it stands in well for one), CC-BY (state where you got the data from if re-publishing), and CC-BY-SA (as for CC-BY, but also require anyone who re-uses the data to share what they got from you at least as liberally as you shared it). Let us know if you want other licences supported, and we'll do something about it.

To apply them, replay one of the streams

* :dachsref:`//procs#license-cc0`
* :dachsref:`//procs#license-cc-by`
* :dachsref:`//procs#license-cc-by-sa`

into the RD and give a ``what`` attribute saying what is licensed (usually, the data). And yes, we know that CC-0 isn't a licence; still, this arrangement seemed logical, and it stands in for a licence, anyway. For instance::

  <FEED source="//procs#license-cc0" what="ARIHIP"/>

If you get to choose the licence but are not sure which one to pick: Our advice is to go for CC-0. This is even though we do believe in the GPL for software, which has strong traits of CC-BY-SA in that it prevents others from making your code proprietary. But data is not code. While it is probably unreasonable to combine more than a dozen pieces of source code from different parties into a single reasonable piece of software (excepting distributions, but these have special rules, and modern web browsers, which are not reasonable), it is totally reasonable to include data from thousands of sources. Even something as permissive as CC-BY then becomes a liability – if you published the result of such an effort, in your ``rights`` meta you would have to reproduce a thousand attributions, and that is absolutely not worth it.

In case your data providers are nervous about putting their work into, in effect, the public domain: Licensing has **nothing** whatsoever to do with requiring people to properly cite and/or acknowledge. That is part of good scientific practice and is unrelated to legal issues. Licensing is necessary for hassle-free re-use of the data in situations where it needs to be re-distributed; the classic example would be a star catalogue within a planetarium software.

Sometimes you want an additional acknowledging clause. In that case, just add another rights meta item::

  If you use this data, please acknowledge: "This work made use of the
  HDAP which was produced at Landessternwarte Heidelberg-Königstuhl
  under grant No. 00.071.2005 of the Klaus-Tschira-Foundation."

  The data in this service was calibrated using Astrometry.net,
  :bibcode:`2010AJ....139.1782L`.

Some Words on Times
-------------------

Among the messier data types in astronomical databases are dates and times – they come in lots of crazy input formats, they can be represented in lots of different ways in the database, they are expected in lots of crazy output formats, plus there's a host of exciting metadata on them, including time scales and reference positions.

With DaCHS, we recommend one of the following ways of storing dates and times (written as attributes of column):

* ``name="x_mjd" type="double precision" unit="d"`` – a modified julian date
* ``type="double precision" unit="d"`` – a julian date
* ``type="double precision" unit="s"`` – a unix timestamp
* ``type="double precision" unit="yr"`` – a Julian year with fractions
* ``type="timestamp"`` – a postgresql timestamp

All other things being equal, we recommend using MJDs; most VO data models and protocols employ them, and they are fairly easy to query against. In HTML forms, they are easily displayed as human-readable datetimes by using a ``displayHint="type=humanDate"`` (which you can do for the others, too, of course). And yes, historically DaCHS did use the column name to tell MJDs from JDs. We used to use the xtype for that (and that still works), but that is incompatible with the SODA/SIAv2 use of xtype. As of DaCHS version 2.8, you should add a full time annotation for MJD times (cf. `New-style SIL definitions`_).
This could look like this::

  (votable:Coords) {
    time: {
      frame: {
        timescale: UTC
        refPosition: TOPOCENTER
        time0: MJD-origin
      }
      location: @my_time
    }
  }

Admittedly, that is a bit verbose, and perhaps we will experiment with shortcuts later.

The Julian years are a good choice, too, and they are immediately human-readable to some extent. They are certainly the representation of choice for epochs and equinoxes. Note that the storage of Bessel years is strongly discouraged. Use the ``bYearToDateTime`` function to transform them to datetime instances which you can then map to any recommended representation.

While timestamps might sound like a good idea in that they are the proper native type to manipulate dates and times with, they usually are a bad choice. The main reason is that in ADQL there is basically no support for timestamps at all, which makes any manipulation of them in ADQL queries virtually impossible. If you're sure your table will never turn up on a TAP service, that doesn't hurt much, but can you be sure?

All this did not mention any UCDs or utypes that may apply. UCDs should not, in general, depend on the time format chosen; all of the above could be used for quantities like ``time.creation``, ``time.end``, ``time.epoch``, ``time.equinox``, ``time.processing``, ``time.release``, ``time.start``, and more. The SIAP version 1 protocol made a funky exception there, defining a ``VOX:Image_MJDateObs`` UCD; everything about that UCD is horrible, and it is generally accepted in the VO that this was an error.

Finally, there is advanced metadata, in particular time zones, time scales (i.e., how does the time pass?) and reference positions (i.e., where is the clock positioned?). Time zones are not supported at all in the VO. All times are for the Greenwich meridian (i.e., they should be close to UTC). The time scales are important on the level of seconds; time scales known to the VO are listed on http://www.g-vo.org/rdf/timescale. To understand a bit better where all this comes from, we recommend a look at the fairly readable explanation at http://stjarnhimlen.se/comp/time.html.

The reference positions impact arrival times by simple geometry; common reference positions include “roughly earth” and “roughly sun”, which leads to differences on the order of ten minutes. Additionally, there are relativistic effects at a level of milliseconds or below; they need to be declared for high precision work since a clock in the barycenter of the solar system will (evaporate but before that) run slower than one on Pluto due to relativistic effects of various sorts. Obviously, any number of reference positions might be necessary at some point (“a clock on the Voyager II space probe”). In the VO, we try to stay on top of this by collecting “well-known” reference positions on http://www.g-vo.org/rdf/refposition. This list will grow over time, but it is unlikely that many clients will be able to do much with anything except GEOCENTER, EMBARYCENTER, BARYCENTER, and HELIOCENTER. That does not mean that it is not a good idea to declare something else. Good clients might at least adjust estimates of systematic (or even random) errors based on such metadata. To declare them, use the procedure described in `New-style SIL definitions`_ for now. Apologies that this is so obscure, but for once that's really not our fault.
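
To make the MJD convention recommended above a bit more tangible, here is a plain-Python sketch of the conversion between MJDs and civil dates; real code should use DaCHS' own converters (e.g. ``gavo.stc.mjdToDateTime``) or astropy, which also know about time scales::

  import datetime

  # MJD counts days from 1858-11-17T00:00:00
  MJD_EPOCH = datetime.datetime(1858, 11, 17)

  def mjd_to_datetime(mjd):
      return MJD_EPOCH + datetime.timedelta(days=mjd)

  def datetime_to_mjd(dt):
      return (dt - MJD_EPOCH).total_seconds()/86400.

  print(mjd_to_datetime(59580.0))                         # 2022-01-01 00:00:00
  print(datetime_to_mjd(datetime.datetime(2000, 1, 1)))   # 51544.0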

.. _DALI: http://ivoa.net/documents/DALI/
.. _EPN-TAP proposed specification: https://voparis-confluence.obspm.fr/display/VES/EPN-TAP+v2+parameter+description
.. _EuroVO registry: http://registry.euro-vo.org
.. _IVOA identifiers: http://www.ivoa.net/documents/IVOAIdentifiers/20150709/index.html
.. _OAI-PMH: http://www.openarchives.org/OAI/openarchivesprotocol.html
.. _PyPDS: https://github.com/RyanBalfanz/PyPDS
.. _RofR: http://rofr.ivoa.net/
.. _Virtual Observatory: http://ivoa.net
.. _VOTT: http://dc.g-vo.org/VOTT
.. _aladin: http://aladin.u-strasbg.fr/aladin.gml
.. _dachs-support: http://lists.g-vo.org/cgi-bin/mailman/listinfo/dachs-support
.. _dali standard: http://ivoa.net/documents/DALI
.. _gavo ucd resolver: http://dc.g-vo.org/ucds/ui/ui/form
.. _gavo's adql short course: http://www.g-vo.org/adql
.. _hints on common problems: http://docs.g-vo.org/commonproblems.html
.. _installation guide: http://docs.g-vo.org/DaCHS/install.html
.. _manually add our repository: http://soft.g-vo.org/repo
.. _NAVO Registry: http://vao.stsci.edu/vo-directory/publishing/
.. _Obscore: http://ivoa.net/documents/ObsCore/
.. _pgsphere: http://pgsphere.projects.pgfoundry.org/
.. _RDs used in Heidelberg: http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/
.. _reStructuredText: http://docutils.sourceforge.net/rst.html
.. _reference documentation: http://docs.g-vo.org/DaCHS/ref.html
.. _splat: http://star-www.dur.ac.uk/~pdraper/splat/splat-vo
.. _topcat: http://www.star.bris.ac.uk/~mbt/topcat/
.. _vounits: http://www.ivoa.net/documents/VOUnits/index.html

.. [#port] Another reason this may fail is if you have something else
   listening on port 8080 already, in which case you can create a file
   ``.gavorc`` in your home directory and add::

     [web]
     port: 4040

   there. Of course, you have to adjust the URLs given in the tutorial
   then.

.. [#binding] This assumes you are doing this on your local machine
   (which we recommend to get started). If you do this on a remote
   machine, you will obviously have to replace the localhost in the url
   with the machine's host name. However, that still will not work as,
   by default, DaCHS only binds to the loopback address. To change
   this, edit or create the file ``/etc/gavo.rc`` to include at least::

     [web]
     bindAddress:

   (yes, there's nothing behind “bindAddress:”). Restart the server and
   you should see the output.

.. [#rr] If you are running DaCHS on a reasonably stable platform, you
   are most welcome to join it. You would have to check out the resdir
   at http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/rr/, follow the
   instructions in the README (after which you have a fully functional
   RegTAP registry). To be part of the machines that provide fallbacks
   in case of failures, please contact the Heidelberg maintainers.
   Incidentally, you can also run a mirror of the web interface to the
   registry then; you would just have to check out
   http://svn.ari.uni-heidelberg.de/svn/gavo/hdinputs/wirr/.

.. [#belim] ``dachs limits`` on DaCHS 2.3 and earlier does something
   somewhat different, and you should ignore everything we say here if
   you're still on DaCHS 2.3. Additionally, in the DaCHS version that is
   in Debian bullseye (which is also a 2.3) the limits subcommand is
   broken. Bottom line: If you want dachs limits, use 2.4 or newer.

.. [#prevepn] DaCHS has had support for version 0.37, which is in the
   //epntap RD. Do not use that for new projects.

.. |date| date::

.. _CC-0: http://creativecommons.org/publicdomain/zero/1.0