18. Almost real world

Just so you get an idea how SQL expressions can evolve to span several pages:

Suppose you have a catalogue giving alpha, delta, and an epoch of observation sufficiently far away from J2000. To match it, you have to bring the reference catalogue on our side to the epoch of your observation. For larger reference catalogues, that would be quite an expensive endeavour. Thus, it’s usually better to just transform a smaller selection of candidate stars.

To do this, you decide how far one of your stars can have moved (in the example below 0.1 degrees, the inner crossmatch), and you generate a crossmatch there. From that crossmatch, you select the rows for which the transformed coordinates match to the precision you want.

To play this through, load matchme.vot from the HTML or PDF attachment into TOPCAT. The rough crossmatch with Gaia is standard fare:

select
  alpha, delta, epoch,
  source_id, ra, dec, pmra, pmdec
from tap_upload.t1
  join gaia.dr3lite
  on distance(alpha, delta, ra, dec)<0.1

That is returning some 10000 pairs, almost all of which are wrong (there are certainly fewer than 55 true matches, as there are just 54 rows in matchme). We will thus have to filter more strictly constraining the positions. For that, we have to apply proper motions.

There is nothing in ADQL’s core that can do that. For the small distances we are talking about here, you could write something like

    ra+pmra/cos(radians(dec))*(epoch-2016)
      as palpha,
    dec+pmde*(epoch-2016) AS pdelta,

as a workable approximation.

More and more TAP services, however, have an ADQL extension function (UDF; see TOPCAT’s “Service” tab for a per-service list of those) ivo_epoch_prop_pos that will do a precise job. We will use it here:

SELECT alpha, delta, parallax, pmra, pmdec, source_id
FROM (
SELECT
  alpha, delta, parallax, pmra, pmdec, source_id,
  ivo_epoch_prop_pos(ra, dec, parallax,
    pmra, pmdec, radial_velocity, 2016, epoch) as tpos
FROM tap_upload.t1
  JOIN gaia.dr3lite
  ON DISTANCE(alpha, delta, ra, dec)<0.1) AS q
WHERE DISTANCE(POINT(alpha, delta), tpos)<2/3600.

(don’t forget to adapt the table name behind tap_upload!).

If you’ve tried it, you’ll have noticed that 53 rows were returned for 54 input rows. For “real” data you’d of course not have this; there’d be objects not matching at all and probably objects matching multiple objects. The reason this worked so nicely in this case is that the sample data is artificial: I made that up using ADQL, too. The statement was:

select coord1(tpos) alpha, coord2(tpos) as delta, epoch from (
  select
    ivo_epoch_prop_pos(ra, dec, parallax,
      pmra, pmdec, radial_velocity, 2016, epoch) as tpos,
    epoch
  from (
    select d3l.*, 1900+75*rand() as epoch
    from gaia.dr3lite as d3l tablesample(1)
    where
      power(pmra,2)+power(pmdec,2)>500*500) as gs) as transgs

This is rather subquery-heavy and in addition uses two features that we have not seen yet. For one, rand() returns a random number between 0 and 1, which we use here to generate a random source epoch.

And there’s TABLESAMPLE; this is a prototype extension that may go into ADQL 2.2, perhaps somewhat modified. As used here, you pass in how many percent of the table you want to look at. Over a TOP 100 or so, this has the advantage that you get different rows every time you use it. It’s not some statistically valid sampling, though.

Still  we have lost one object. Can you find it? And can you guess why we have lost it?

Files


Markus Demleitner, Hendrik Heinl

Copyright Notice