DACHS(1) | General Commands Manual | DACHS(1) |
dachs - data publishing infrastructure for the Virtual Observatory (VO)
dachs [global-options] <subcommand> [options] function-argument ...
dachs provides support for data ingestion and publishing, for metadata handling, and for a variety of VO protocols and standards, e.g. the Table Access Protocol (TAP) or the Simple Cone Search (SCS).
There are numerous sub-commands covering the various tasks (importing, controlling the server, running tests, etc).
Subcommand names can be abbreviated to the shortest unique prefix.
A central concept of DaCHS is the Resource Descriptor (RD), an XML description of a data collection including metadata, ingestion rules, service definitions, and regression tests. RDs are usually referenced through their RD ids, which are the relative paths from DaCHS' inputs directory to the file containing the RD, with the conventional extension .rd stripped. For instance, in a default install, the file /var/gavo/inputs/myrsc/q.rd would have myrsc/q as its RD id.
Most commands dealing with RD ids will also pick up RDs if referenced by path; in the example above, if you are in /var/gavo/inputs/myrsc, you could also reference the RD as either q or q.rd.
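As an illustration (using the example install above), the following invocations would all operate on the same RD:

```shell
# from anywhere, using the RD id:
dachs imp myrsc/q
# from within /var/gavo/inputs/myrsc, referencing by path:
dachs imp q
dachs imp q.rd
```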
Several commands take references to RD elements (table definitions, exec items, direct grammar, etc). These consist of an RD id as just discussed, a hash mark, and the XML id of the target element. Tables have an id automatically, for other elements you may have to add an artificial id.
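For example, assuming the RD myrsc/q defines a table with the XML id main, an element reference to that table would be written as follows (the ids here are illustrative):

```shell
# RD id, hash mark, XML id of the target element:
dachs limits myrsc/q#main
```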
Global options are given before the subcommand name.
Synopsis:
dachs admin [-h] subsubfunction [subfunction-arguments ...]
This is a somewhat random collection of commands related to administering a data center. In particular, this is where you create and edit accounts.
Subsubcommands can be abbreviated as long as the abbreviation is unique. For instance, dachs adm xsd will do an XSD validation.
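A sketch of such an invocation (the file name is illustrative):

```shell
# XSD-validate an RD; "xsd" is the abbreviated subsubcommand:
dachs admin xsd q.rd
# abbreviating the subcommand itself works, too:
dachs adm xsd q.rd
```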
For more information on the subsubfunctions, pass a -h flag.
Synopsis:
dachs config [section-name] config-key
This outputs values of DaCHS' configuration to stdout. section-name defaults to general. This is most commonly used to make external components aware of DaCHS' file locations, e.g., through inputs_dir=$(dachs config inputsDir).
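For instance (the keys shown are from a default configuration; check your gavo.rc for what applies to your site):

```shell
# key from the default section, general:
dachs config rootDir
# key from an explicitly named section:
dachs config web serverPort
# typical use in scripts:
inputs_dir=$(dachs config inputsDir)
```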
See the operator's guide for documentation of DaCHS' configuration options.
Synopsis:
dachs datapack [-h] {create,load} ...
Management of full DaCHS resources in the frictionless data package format. Note that while DaCHS-created data packages should work as normal data packages, DaCHS can only (automatically) load data packages generated by DaCHS itself -- there simply is not enough metadata in generic data packages to produce usable resources.
Synopsis:
dachs datapack create [-h] id dest
Positional arguments:
The command creates a data package containing the RD, a README in the resource directory if it exists, auxiliary files (like a booster grammar or a custom core) the RD may have, and all data files that the sources in the RD yield; for collecting these files, sources elements with ignoreSources children reading from the database are ignored, because that would make behaviour a bit too unpredictable.
To include further files, you can use the datapack-extrafiles property on the RD. This must contain a json sequence literal with resdir-relative shell patterns that should be included as well. Non-existing paths and directories in this list are silently ignored. You cannot include files outside of the resource directory in a data package.
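A sketch of creating a package, with an illustrative extrafiles property (the RD id and destination name are examples):

```shell
# the property, set in the RD, is a JSON sequence of
# resdir-relative shell patterns, e.g.:
#   <property name="datapack-extrafiles">["extra/*.txt"]</property>
dachs datapack create myrsc/q myrsc-package.zip
```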
Synopsis:
dachs datapack load [-h] [-t] [--force] source
Positional argument:
Optional arguments:
This unpacks a DaCHS-produced data package, imports it, and runs tests on it (which you will want to suppress if you do not have a DaCHS server running locally).
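A minimal invocation might look like this (the file name is illustrative; see dachs datapack load -h for the option that suppresses the regression tests when no local server is running):

```shell
dachs datapack load myrsc-package.zip
```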
Synopsis:
dachs drop [-h] [-s] [--all] rd-id [dd-id ...]
This is the reverse of import: tables created by a dachs imp with identical arguments are torn down by dachs drop. This will not work reliably if the RD has been modified between the imp and the drop, in particular if the RD has been deleted. In such situations, you can use the --all flag, which unconditionally tears down everything DaCHS has recorded as coming from the referenced RD.
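A sketch of both modes (RD id illustrative):

```shell
# tear down what a plain "dachs imp myrsc/q" created:
dachs drop myrsc/q
# unconditional teardown, e.g., after the RD has changed or vanished:
dachs drop --all myrsc/q
```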
Synopsis:
dachs dump [-h] {load,create,ls} ...
This is an interface to dumping database tables and inspecting and restoring the generated dumps.
This is mainly intended for small to medium tables that are just kept in the database, e.g., DaCHS' administrative tables and the like. For normal user tables, built from science data, doing re-imports is the recommended way to deal with data loss.
In particular, this command is not designed (at this point) for really large tables. For technical reasons (that could be overcome), currently the individual dumps are kept in memory during creation (but not during loading).
Before loading, the target tables are dropped if they are already present; note that this will also drop any views they might be part of, as well as any rows that have foreign keys on this table. After loading, indices and primary keys on the restored tables will be recreated, but no scripts or similar are run. Hence, you will have to manually re-create any dependent resources after a restore.
This also means that, for example, restoring a table containing products without also restoring //products#products will remove that table's previous entries from the products table.
Synopsis:
dachs dump create [options] dumpFile ids [ids ...]
Dump one or more tables to DaCHS' dump format. When you pass in RD ids, all onDisk-tables defined in the RD will be dumped.
Positional arguments:
Options:
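Illustrative invocations (the table and file names are examples):

```shell
# dump specific tables, referenced as RD elements:
dachs dump create mytables.dump myrsc/q#main
# passing an RD id dumps all onDisk tables defined in it:
dachs dump create myrsc.dump myrsc/q
```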
Synopsis:
dachs dump load [-h] source
Restore table(s) from a file created earlier by the dump create subcommand.
Positional argument:
Synopsis:
dachs dump ls [-h] source
List tables and dump metadata from a DaCHS dump.
Positional arguments:
Synopsis:
dachs gencol [-h] [-f {fits,vot,viz}] FILE
The gencol subcommand quickly generates a set of basic column elements for inclusion in RDs. It is particularly convenient together with dachs start.
This currently understands FITS binary tables, VOTables and Vizier-Style byte-by-byte descriptions (the latter somewhat haphazardly). It assumes there is only one table in each file (and will ignore others in the FITS and VizieR case).
It will extract what metadata there is and emit formatted RD XML to stdout; this should work just fine for cut-and-paste into <table> elements in RD. Note that usually, important metadata will be missing (e.g., UCDs for FITS and VizieR), and usually data providers are not terribly careful when writing metadata. Hence, you should definitely thoroughly review the elements created before using them in a public service.
The program tries to guess the applicable parser from a file name (e.g., README or anything ending in .txt will be treated as Vizier-like). Where this fails, use the -f option.
For Vizier-like descriptions, the program also dumps colDefs material for use in columnGrammar-s.
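Illustrative invocations (file names are examples):

```shell
# format guessed from the file name:
dachs gencol mydata.fits
# force VizieR byte-by-byte parsing where guessing fails:
dachs gencol -f viz ReadMe.txt
```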
Synopsis:
dachs import [options] rd-name [data-id]
This subcommand is used to ingest data described by an RD. For special applications, ingestion can be restricted to specific data items within an RD.
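For instance (RD id and data id illustrative):

```shell
# ingest all data items described in the RD:
dachs imp myrsc/q
# restrict ingestion to the data item with XML id "main":
dachs imp myrsc/q main
```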
Synopsis:
dachs info [-h] table-id
This displays column statistics about the table referred to in the argument (which must be a fully qualified table name resolvable by the database system).
Synopsis:
dachs init [-h] [-d DSN] [--nodb]
This initialises DaCHS' file system and database environment. Calling dachs init on an existing site should not damage anything. It might, however, fix things if, for instance, permissions on some directories went funny.
Synopsis:
dachs limits [-h] [-t] [-s P] [-d] itemId [itemId ...]
This creates or updates DaCHS' estimates of various metadata of RDs or tables, in particular the STC coverage, the size of the tables and, unless -t is given, the column statistics. This may take a long time for large tables, in which case you may use the --sample-percent option, which makes DaCHS only look at a subset of the full data.
When running dachs limits on an RD, it will skip views under the assumption that in the typical case, column metadata for the interesting columns will already have been obtained for the source tables. For views for which this assumption is wrong, set the forceStats property to True.
You should in general run dachs limits before publishing a resource.
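Typical invocations might look like this (ids illustrative):

```shell
# full metadata update for everything in the RD:
dachs limits myrsc/q
# skip the (potentially slow) column statistics:
dachs limits -t myrsc/q
# look at only a sample of a large table:
dachs limits -s 5 myrsc/q#main
```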
Synopsis:
dachs mkboost [option] <id-of-directGrammar>
This writes a C source skeleton for using the direct grammar referenced to fill a database table. See the Guide to Write Booster Grammars in the DaCHS documentation for how to use this command.
Synopsis:
dachs mkrd [option] sample
Rudimentary support for generating RDs from data. This was probably a bad idea. Better use dachs start.
Synopsis:
dachs pub [-h] [-a] [-m] [-k] [-u] rd [rd ...]
This marks data and/or services contained in an RD as published; this will make them displayed in DaCHS' portal page or pushed to the VO registry through DaCHS' OAI-PMH endpoint. See the Operator's Guide for details.
Synopsis:
dachs purge [-h] tablename [tablename...]
This will delete tables in the database and also remove their metadata from DaCHS' internal tables (e.g., TAP_SCHEMA, the table of published records). Use this if dachs drop fails to remove some table for one reason or another.
Synopsis:
dachs serve [-h] {debug | reload | restart | start | stop}
This exposes various functionality for managing DaCHS' server component. While these are usually called through init scripts or systemd units, the debug subfunction is very convenient during service development off the production environment.
serve start, stop, and restart manage a PID file, which DaCHS does not do when running under systemd. Hence, do not use these subcommands when systemd manages your DaCHS. expireRD is fine, though.
The start subcommand accepts a --foreground/-f option. If you pass it, the server will not double-detach and not redirect stderr to a log file and instead stay in the foreground; it will also not check for PID files or create them. In contrast to debug, it will change users, etc., and not put the DaCHS into debug mode. This option is primarily useful for systemd units.
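A sketch of the two development-relevant modes:

```shell
# run in the foreground without detaching (e.g., from a systemd unit):
dachs serve start -f
# during development, run with debugging conveniences:
dachs serve debug
```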
Synopsis:
dachs start [-h] (list|<data-type-tag>)
The start subcommand generates a template RD for a certain type of data that you can then fill out. The data-type-tag can be something like scs (for catalogs), siap (for images), or ssap (for spectra). Pass list to see what is available.
The template uses the name of the current directory as resdir and schema name. That means that when starting a data collection, you should create its resdir as a child of GAVO_ROOT/inputs and execute dachs start in that directory.
To fill out the template RD, load it into a text editor and, in a first go, search for the pattern %.*%. You should see enough hints from what is between the percent signs and the environment to get a reasonable shot at filling things out. Then read the comments; very typically, you can get an extremely basic data publication without them, but a good service will normally require some extra work beyond filling things out.
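The workflow sketched above might look like this for a new catalogue (the directory name is illustrative):

```shell
# create the resource directory below the inputs dir:
cd $(dachs config inputsDir)
mkdir mycat
cd mycat
# generate a cone search template RD, then edit q.rd:
dachs start scs
```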
Synopsis:
dachs test [-h] [-v] [-V] [-d] [-t TAG] [-R N] [-T SECONDS] [-D FILE]
[-w SECONDS] [-u SERVERURL] [-n NTHREADS]
[--seed RANDOMSEED] [-k KEYWORDS]
id
This runs regression tests embedded in whatever is referenced by id (which can be an RD, a regression suite, or a single regression test). For details, see the chapter on regression testing in the DaCHS Reference Manual.
Synopsis:
dachs validate [-h] [-x] [-v] rd [rd...]
This checks RDs for well-formedness and some aspects of VO-friendliness.
Synopsis:
dachs upgrade
Each DaCHS version has an associated database schema version, encoding the structure of DaCHS' (and the implemented protocol versions') ideas of how system and user tables should look. dachs upgrade attempts to work out how to change the database to match the expectations of the current version and executes the respective code. It will not touch its data structures if it determines that the installation is up to date.
Operating system packages will usually try to run dachs upgrade as part of their management operations. This may fail when dachs upgrade requires manual intervention, in which case operators need to run dachs upgrade by hand.
Operators keeping a manually installed DaCHS should run dachs upgrade after each svn update or update from tar.
dachs upgrade cannot perform actions requiring superuser privileges, since none of its roles have those. Currently, this is mainly updating postgres extensions DaCHS uses (if you use extra ones, you can configure DaCHS' watch list in [db]managedExtensions). dachs upgrade -e will attempt to figure out the instructions necessary to update extensions and write them to stdout. Hence, operators should execute something like dachs upgrade -e | psql gavo from a database superuser account after upgrading postgres extensions.
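The extension-update sequence described above, run from a database superuser account:

```shell
# normal upgrade, as the usual DaCHS operator:
dachs upgrade
# after upgrading postgres extensions, emit and execute
# the necessary ALTER EXTENSION statements:
dachs upgrade -e | psql gavo
```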
Synopsis:
dachs adql query
This subcommand executes ADQL queries locally and writes the resulting VOTable to stdout. We consider removing it.
The subcommands show and stc are deprecated and not documented here. They may disappear without further notice.
The subcommands taprun, dlrun, uwsrun, gendoc, and raise are used internally and should not be directly used by DaCHS operators.
To report bugs and request support, please use our support mailing list http://lists.g-vo.org/cgi-bin/mailman/listinfo/dachs-support.
Comprehensive, if always incomplete, documentation on DaCHS is available in several documents available at http://docs.g-vo.org/DaCHS/ (upstream site with PDF downloads and the formatted reference documentation) and http://dachs-doc.readthedocs.io/en/latest/index.html (with facilities for updating the documents).
Copyright © 2017 The GAVO project. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
Markus Demleitner <gavo@ari.uni-heidelberg.de>
2021-02-04 | 2.3 |