Author: | Markus Demleitner |
---|---|
Email: | gavo@ari.uni-heidelberg.de |
Contents
TAP ("Table Access Protocol") is a protocol that lets you send database queries and data to servers within the Virtual Observatory (VO). Like the WWW, TAP is built on HTTP, but working it with a web browser is tedious to impossible.
You can use the tapquery module from GAVO's votable package, but for ad-hoc work and quick queries, something shell-like is much nicer; therefore, we have wrapped tapquery into the TAP shell. It comes with job and server management, SAMP integration, and command line completion.
For installation instructions, refer to the tapsh distribution page. Since tapsh does not include a results viewer, you should install topcat alongside.
This document should suffice to get you up to speed with TAP. In particular, you should never need to read IVOA's TAP specification. It is aimed at implementors, not users, and cannot be understood without largish body additional specifications. If you miss information on TAP in this document, we consider this as a bug and will gladly fix it.
TAP transmits queries in the Astronomical Data Query Language (ADQL). This document is not an introduction to ADQL. The IVOA ADQL specification may be helpful to people already proficient in SQL. Unfortunately, there is not yet a real "introduction" to ADQL; you could peruse the lecture notes for a talk on ADQL; send us further questions (and, of course, answers) to expand them into a real introduction.
To see tapsh in action, open a terminal and type "tapsh" (without the quotes). If the command cannot be found, you will need to amend your shell's search path; if you cannot figure out how to do this locally, you are welcome to ask us. Then, start topcat (the sequence is not that critical, but it's easier this way).
After these preliminaries, type into the terminal window with tapsh's prompt:
server ivo://org.gavo.dc<tab> select table_name from tap_schema.tables! select * from tap_schema.tables;
(<tab> means "hit the Tab key").
The first select should show you the table names available on the remote server in the tapsh itself. The second select command should send its result to topcat.
At any time, tapsh has at most one current server and at most one current job. Both are persistent over sessions (where we do not define the behavior if you run several tapshs in parallel; there should be no reason to do so).
In tapsh, servers are identified by their IVOA identifiers; those are URIs (in effect, strings) starting with ivo://. To set the current server, use server command. Of course, nobody expects you to memorize the rather clumsy identifiers, so tapsh has tab-completion. Simply hit the tab key after server to see the servers known to tapsh; type a few characters and hit tab again until the IOVA id is complete.
To create a new job, type in your query ("select ..."). This job then becomes the current job. The job command lets you switch to another job (again, you can use tab completion). To see a list of all jobs known to tapsh at any given moment, use the ls command.
A TAP job has a phase on the server side. This phase is one of
TAP is a one-way protocol, meaning that all communication is initiated by the client. In consequence, tapsh has to explicitely ask the server for the phase of a job. It does so on certain commands (e.g., job and update). The result is cached in the last known phase. This is what the ls command shows, and it may be out of date. Use the job command to obtain the real remote status at any given time.
The phases ABORTED, ERROR, and COMPLETED are final phases.
Jobs have server-side identifiers. Since these may be weird strings (and might not be unique across servers), tapsh gives every job a nickname. These are locally unique strings used as (mostly optional) arguments to many of tapsh's commands. Tapsh auto-generates a hopefully cute-sounding six-letter nickname when you create a job. You can use the nick command to change the current job's nickname. This may be convenient to assign mnemonic names to jobs, and it is particularly useful in scripts. Nicknames must be composed of (ASCII) alphabetic characters exclusively.
VO-enabled applications communicate using a protocol called SAMP designed to let them exchange tables and other data. tapsh is a SAMP client (and will start a hub if none is running yet). In conjunction with tapsh, you should at least run topcat to have a way to (conveniently) inspect the result tables.
Although many setups are conceivable, for now you should first start the tapsh and then topcat. You can see everything is working properly if you see an icon of a tap (ok, faucet) in topcat's SAMP panel. If you don't, click the connection icon in the lower right corner of topcat's main window, wait for the disconnection, and then click on it again to reconnect. If then there's no tap logo, complain to us.
The first step in crafting a query is figuring out which tables contain data relevant to what you are interested in. Tapsh relies on a supporting service in the GAVO data center to do this, the Global TAP Schema or GloTS for short. The metasearch command opens a web browser with a form letting you search for columns by UCD and table and column keywords. The result table shows, among other things, the IVOA id of the service containing this column. Use tapsh's server command (either copying and pasting or tab completion) to set the current server.
The GloTS result page has links to the table metdata that you will need to craft your query; at any time, you can use the tables command to get a browser window giving the tables available on the current server.
Simply type in a query to create a new job. This does not start your query. After a select command, the job is is PENDING. This is the time to execute the keepfor, limit, and upload commands.
To actually start a job use the start command to immediately regain a prompt (e.g., to start further queries, and of course to check your job's progress), or the run command to have tapsh block and poll the remote status now and then and return when the job has reached a final state.
When a job has successfully finished (i.e., ends up in the COMPLETED phase), a result is made available on the server. This result always is in VOTable format for jobs created by tapsh.
To view it, send it to SAMP client, typically topcat. You can use the broadcast command to send it to all registered clients, or the send command to send it to a specific client. In the typical case, you would say send to topcat.
You can also save the result to a local file using the save command. Note that the local file name must always be in double quotes; you would thus say save "result table".
Since the select.../run/send to topcat sequence is so common, there is a shortcut: Simply append a semicolon to the query, and tapsh will do the run and send for you.
For simple result sets, changing to topcat (or some other SAMP client) may be too tedious. In those cases, you can get an ASCII rendition of the result set within tapsh using the dump command. Again, there's a shortcut for select. If you append an exclamation mark rather than a semicolon to the query, the result will be dumped rather than sent.
Not that sending, saving or dumping a result does not consume it; it needs to be retransferred from the server for each operation, though.
When done with a job, you should use the delete command to remove it from both tapsh's job list and the server.
Most of the job manipulation commands optionally accept a nickname argument and will then operate on the job selected.
One of the most powerful features of TAP is the ability to upload data into (temporary) database tables. Not all servers support this feature, though.
Tapsh's upload command has the form upload <something> as <tablename> and is issued after a select and before start or run; the somethings are discussed below, tablename is an SQL identifier (i.e., roughly an alphabetic character followed by alphanumeric characters). The contents of the uploaded file will appear to the query as tap_upload.<tablename>. You can upload more than one file.
To upload data, it must be in VOTable format. This is always true for tapsh's jobs, so -- and this probably is the most common use -- you can upload the last job's (or the last jobs') result(s). To do this, say upload result <nickname> as <tablename>.
Another source of VOTables is the interprocess communication SAMP. You can, for example, come up with some data in VODesktop or manipulate a table in topcat and then use SAMP to send it to tapsh. After that, you can say upload samp as <tablename>. tapsh will always use the last table sent to it in this form.
Many VOTables may have a (http) URL. In particular, all VO cone searches end up having GETtable URLs. To upload from some URL, just enter the URL like upload http://xxx as <tablename>. Only http URLs are permitted at this point.
Finally, you can upload local data, where the file name needs to be enclosed in double quotes: upload "<path to file>" as <tablename>. If your data is something other than a VOTable, topcat will help you convert it to VOTables. Note that no database server will handle sexagesimal coordinates (minutes and seconds detached). If you have those, convert them to decimal degrees first -- again, TOPCAT will usually help you there with its hmsToRadians or dmsToRadians functions.
Since the expression for a simple, positional crossmatch is quite verbose, you can type Escape-M (for "match") to insert a template. If you want further macros of this sort, use the readline facilities.
You could, for example, add:
$if Python Control-n: "select * from " $endif
to your ~/.inputrc; this would make Control-N insert "insert select * from into" tapsh's command line. Yes, you cannot now use different bindings for the tapsh and the python interpreter. We'll retrofit that on demand.
Tapsh supports a limited form of scripting, with the idea that you can write your queries in files (e.g., for documentation purposes) and then execute these files. In scripts, you can use all tapsh commands; in addition, you can use continuation lines to split long command lines for readability. Just have a backslash as the last character of a line, like this:
select field1, field2, field3\ from foo\ join bar\ where field4 between 4 and 5\ and field6<field2
To execute the script, just pass its name to tapsh. If you insist, you can even write:
#!/usr/bin/env tapsh
in the first line of your script, make it executable and then run the tapsh script as a command.
Note that we will not add control structures (loops, selection, etc) to tapsh. If you need this, you should use python. If you run tapsh, you already have the votable tapquery module on your system that you can use for that. Further scripting (e.g., of SAMP) could be effected by using tapsh itself as a python module. We will document this operation mode on request.
This section contains a synopsis of the online help; you can find this information also by typing help and the command name.
addserver <id> <accessurl> -- adds a non-registred server to the server list
To use non-registred servers, you can manually add them using their access URL, which is the URL of the query endpoint without any sync or async.
The server id can be any string not containing whitespace.
Note that manually added servers will be lost when you use the refreshServers command. You could write a short script to add your preferred local servers if you really wanted to, but the correct way is to bug the server operators to register their servers. There's no documented way to remove a server locally added except by removing them all using refreshServers.
broadcast [<nick>] -- broadcasts a result to all connected SAMP clients.
The URL of the current or selected job's result will be sent to all SAMP clients connected and listening to table.load.votable. The function will error out if SAMP has been disabled or if no result currently exists.
delete [<nick>] -- destroys a job server-side and locally
Without an argument, the current job is deleted, otherwise the named job is deleted. If the job is running, it is aborted.
It is generally considered nice to delete a job when done with it. You can use the purge command to delete many jobs at a time for convenience.
dump [<nick>] -- dumps the query result to the console
If a job has a result, dump retrieves it and formats it as a rough ASCII table. The result is not beautiful, and no attempt is made to accomodate to screen widths. Thus, for nontrivial result sets, you should really use "send to topcat" or similar.
error [<nick>] -- give error message of current or selected job
TAP servers give error messages under a special resource. The error command retrieves that resource, tries to parse it as a VOTable and returns TAP error messages contained in that VOTable if successful. Maformed responses will be printed verbatim.
Without an argument, the error command shows the message for the current job, otherwise for the job specified using its nickname.
The behavior of this command is unspecified if the job is not in the ERROR phase.
exec <file name> -- creates a query from the content of file name
exec lets you prepare and edit queries in your favourite editor.
help -- gives a short synopsis of available commands
Without an argument, help prints a synopsis of the commands available. With an argument, it will print a longer explanation of what the command does.
job [<nick>] -- gets/sets current job
A TAP job has rich server-side metadata, in particular the query issued. The job command shows a synopsis of this data, retrieved from the server. It updates the "last known" phase information, too.
Without an argument, job shows the current job's metadata, otherwise that of the named job.
Certain DSA servers produce malformed responses that lead tapsh to believe the job was dead on the remote side. Don't worry about this, that's just a harmless protocol mismatch.
keepfor <num> -- ask the server to keep the job for another <num> days
TAP servers only keep jobs around for a limited period. To find out when they are scheduled for destruction, just the job command.
The keepfor command requests to change that destruction time to <num> days in the future. TAP servers are free to ignore or modify your request, which is why the command outputs the new destruction time.
limit <num> -- tell the server to let the job run for up to <num> seconds
TAP servers (usually) enforce limits on the execution time of queries, such that a job is aborted after, say, an hour or a day. For long-running queries, that period may be too short.
With the limit command, you can ask the server to raise (or lower) this time limit to the number of seconds specified in the argument. The server is free to discard or modify this request. Therefore, the new time limit is printed by the command.
ls -- lists known jobs
tapsh keeps a list of jobs started but not deleted. The ls command lets you inspect that list. From this list, you can see what jobs you can make the current job again using the job command.
The phases given in ls are the phases it found when tapsh last looked. To have ls show current remote phases, use the update command.
metasearch [<colstuff> [/ <tablewords>]] -- search for queriable columns
GAVO's Global TAP Schema GloTS offers a web interface for locating tables accessible using TAP. The metasearch command opens a web browser window letting you search by UCDs, table keywords and column keywords.
metasearch by itself just opens a browser window with GloTS' form interface. If you give argument(s), you will be sent to a result page. A first argument that contains dots or a semicolon will be interpreted as a UCD pattern (* and ? work as in shell patterns), anything else will be column description words.
nick <nick name> -- sets the current job's nick name
Using nick, you can assign mnemonic names to the current job. This is particularly useful in scripts, since it lets you refer to jobs previously created in, e.g., upload commands.
If a job other than the current one already has the given nick, it is deleted on the server. Again, this is what you want in scripts.
Nicknames must consist exclusively of ASCII alphabetic characters (a-z, A-Z). The automatically assigned nicknames are always six lowercase characters long, with consonants and vowels alternating (in the hope that the result is pronounceable). It is a good idea to use names constructed differently for your fixed nicks.
Example:
select * from table1 nick FirstRes run select * from table1 join tap_upload.uploaded upload result FirstRes as uploaded run
purge |failed|all -- deletes server-gone/failed/all jobs
Manually deleting jobs becomes tedious, in particular when experimenting. The purge command lets you delete all job removed on the server side (default; you will need to run update to make tapsh aware of server-side destructions), or additionally jobs in error phases (i.e., those in ERROR or ABORTED phases) when the failed keyword is given.
purge all destroys all known jobs. This, in particular, means you lose all results still stored server-side.
quit -- shuts down tapsh
You don't need help for this, do you?
refreshservers -- update the list of TAP servers from the GAVO DC
tapsh relies on the resources of GAVO's global TAP schema (GloTS), a registry of TAP services and the tables contained within them. When run for the first time, it will retrieve a list of TAP services known to GloTS and keep a local copy.
Since new TAP servers come around now and then, you should rund refreshservers periodically to keep up to date. A clear sign you should do so is when the server command complains about unknown servers.
run [<nick>] -- starts a job and blocks until finished
Without an argument, run starts the current job, otherwise the named job is run. Tapsh will block until the job has reached a final phase.
You can abort the job started by run by hitting ^C. There currently is no way to "background" a running job. Pester us if you want this.
Use the start command to start a job and immediately have a prompt again.
After run, the job will be in a final state. Use the job command to see which it is.
save "<file name>" [<nick>] -- save the job's result to a local file.
The save command lets you retrieve a completed job's result and store it in a local file. You must enclose the local file name in double quotes, giving full paths is legal.
select <query tail> -- creates a new query on the current server
The select command creates a new job on the current server. This does not start the execution of the query, it just creates a job. No syntax checks are performed locally, and the servers will usually only check the syntax when the query is actually executed.
The job created will be the new current job.
tapsh will automatically make up a nickname for the job. You can change the local nickname using the nick command.
Since it is such a common operation, there is a shortcut to immediately process the job and send its result to topcat on successful completion: Just append a semicolon (;) to the query. This is equivalent to the sequence:
select foo from bar run send to topcat
Similarly, if you append a bang (!) to a query, the result is immediately retrieved and dumped.
Some TAP servers run DSA. In late 2010, those still speak an early dialect of ADQL. You will notice this when, on executing the select statement, you will receive some error message talking about ADQL/X for valid ADQL. Mostly, you can fix this by prefixing all column names with the table name, like so:
select t.ra from dsaControlledTable as t where t.dec<-89
send [<nick>] to <recpient> -- sends a result to a SAMP client <recipient>
Send works like broadcast, except that you can select the SAMP client to send to. Try the tab key to see the client names assigned by tapsh. The first topcat instance will always be called topcat, though.
The client list is only updated every 10 seconds, so you may have to wait a bit if you hit Tab and only then started the destination client.
server [<ivo-id>] -- gets/sets the current server (try TAB for ids)
tapsh almost always has a current server, the one new jobs are created on. Only TAP servers known to tapsh can be set (i.e., you cannot just grab an ivo id from somewhere). Use the refreshservers command to update tapsh's internal list of TAP servers from GAVO's global TAP schema.
The ids used by tapsh are IVOA identifiers (except where you assigned them yourself using addserver). These are, admittedly, somewhat unwieldy, but that's what tab completion is for.
setpar <parName> <value> -- sets a remote parameter.
Internally, TAP has a number of parameters. You can set them using this command. However, since most parameters are actually managed by other commands, you can break a lot here. So, only use this if you know what you're doing.
A useful example could be setPar FORMAT fits to ask a TAP server to return the result as a fits binary table. However, such a result would then not work as an upload.
start [<nick>] -- starts a job server-side and immediately returns
Without an argument, the current job is started, otherwise the named job is started. Contrary to the the run command, you immediately get a tapsh prompt again. You can check the job's status using the job command.
tables [<ivo-id>] -- inspect a server's tables in a web browser window
GAVO's global TAP schema offers a web interface for inspecting the tables on the current server. The table command opens a web browser window that shows the tables available and lets you inspect the table schemata.
update [all] -- updates phases for non-finished or all jobs
The update command causes tapsh to query the servers of all jobs not known to tapsh to be in a final phase (i.e., those in PENDING, QUEUED, or EXECUTING) for their phases and update their metadata accordingly.
With the optional all argument, even jobs in final phases are updated. This is mainly to detect jobs deleted server-side (e.g., because their destruction time has passed). These will then go to the (tapsh-internal) DESTROYED phase.
Note that tapsh will not monitor its jobs automatically, so the output of the ls command may be out of date. The "known" phase of a job will be updated by the run and job commands as well.
upload (result <nick>|samp|"local file"|URL) as <name> -- adds an upload to the current job
The upload command allows you to temporarily ingest your own data or previous results into the server database. Not all TAP servers support this.
The data uploaded must be in VOTable format and preferably only contain a single table.
You can upload a previous job's result by nickname (try upload result <Tab>). To upload a local file, put it in double quotes. HTTP URLs can also be uploaded, but they must be accessible to the local host, since it is downloaded from the source server and uploaded to the TAP server (this is because it is far more likely that the remote server cannot access a URL accessible to the client than vice versa).
The special upload samp form will upload the last table sent to tapsh (or broadcast) by SAMP clients. This is a convenient way to use data generated by other VO clients (e.g., VODesktop).
You must give the uploaded table a name consisting of alphabetic characters and an underscore. In your queries, you refer to the uploaded table as tap_upload.<tablename>.
You can upload more than one file in this way for complex joins.
wait <seconds> -- just waits for the specified number of seconds
The wait command is a cheap hack to allow scripts to pause a bit before exiting. The purpose of this is to give SAMP clients a chance to complete their dialogs with the tapsh. Admittedly a command wait_for_samp_to_complete would be better, but even for SAMP transactions that tapsh has initiated that is not trivial. Hence, this hack.
Tapsh uses the sampy module provided by the Italian VO.