gavo.user.info module

Infrastructure for obtaining metadata on tables and columns in the data center.

This has the command line interface for dachs info, and the annotation machinery is also used by dachs limits; the common functionality should probably move to rsc at some point (cf. rsc.dbtable.annotateDBTable).

The core here is annotateDBTable. This will gather various pieces of table metadata and fetch column metadata through _annotateColumns. That, in turn, constructs a query fetching the metadata from the database. Since writing this query is a bit involved, it is done in terms of a sequence of AnnotationMakers. These combine SQL making (through OutputFields) with pulling the results out of the database result (in their annotate methods). The end result is that keys are added to the columns’ annotations dictionaries.

class gavo.user.info.AnnotationMaker(column)[source]

Bases: object

A class for producing column annotations.

An annotation simply is a dictionary with some well-known keys. They are generated from DB queries. It is this class’ responsibility to collect the DB query result columns pertaining to a column and produce the annotation dictionary from them.

To make this happen, it is constructed with the column; then, for each property queried, addPropertyKey is called. Finally, addAnnotation is called with the DB result row (see annotateDBTable) to actually make and attach the dictionary.

annotate(resultRow)[source]

builds an annotation of the column from resultRow.

resultRow is a dictionary containing values for all keys registered through addPropertyKey.

If the column already has an annotation, only the new keys will be overwritten.

doesWork()[source]

returns a true value if the annotator will contribute a query.

getOutputFieldFor(propName, propFunc, nameMaker, extractor=None)[source]

returns an OutputField that will generate a propName annotation from the propFunc function.

propFunc for now has a %(name)s where the column name must be inserted.

nameMaker is something like a base.VOTNameMaker.

extractor can be a callable receiving the result of propFunc and the annotation dictionary; it then must modify the annotation dictionary to reflect the result (the default is to just add the result under propName).
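As a rough illustration of the pattern described for AnnotationMaker, the following sketch registers SQL expression templates containing %(name)s, builds a single query from them, and then distributes the result row into an annotation dictionary. This is not DaCHS code: the class SketchAnnotator, its method addProperty, the function annotateColumns, the annotation key names, and the use of sqlite3 in place of PostgreSQL are all assumptions made for the example.

```python
import sqlite3


class SketchAnnotator:
    """Hypothetical stand-in for an annotation maker; not the DaCHS class."""

    def __init__(self, columnName):
        self.columnName = columnName
        self.annotations = {}
        # maps result column name -> (SQL expression, extractor)
        self.properties = {}

    def addProperty(self, propName, propFunc, extractor=None):
        # propFunc is a template with %(name)s where the column name goes
        resultKey = "%s_%s" % (self.columnName, propName)
        if extractor is None:
            # default behaviour: store the raw result under propName
            def extractor(value, annotations, propName=propName):
                annotations[propName] = value
        expr = propFunc % {"name": self.columnName}
        self.properties[resultKey] = (expr, extractor)

    def doesWork(self):
        # true if this annotator contributes anything to the query
        return bool(self.properties)

    def getSelectItems(self):
        return ["%s AS %s" % (expr, key)
            for key, (expr, _) in self.properties.items()]

    def annotate(self, resultRow):
        # resultRow is a dictionary keyed by the result column names
        for key, (_, extractor) in self.properties.items():
            extractor(resultRow[key], self.annotations)


def annotateColumns(connection, tableName, annotators):
    """runs one query for all annotators and lets each pick its results."""
    active = [a for a in annotators if a.doesWork()]
    items = [item for a in active for item in a.getSelectItems()]
    if not items:
        return
    cursor = connection.execute(
        "SELECT %s FROM %s" % (", ".join(items), tableName))
    names = [d[0] for d in cursor.description]
    row = dict(zip(names, cursor.fetchone()))
    for a in active:
        a.annotate(row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (mag REAL, name TEXT)")
conn.executemany("INSERT INTO stars VALUES (?, ?)",
    [(3.5, "a"), (7.25, "b"), (None, "c")])

magAnn = SketchAnnotator("mag")
magAnn.addProperty("min_value", "MIN(%(name)s)")
magAnn.addProperty("max_value", "MAX(%(name)s)")
magAnn.addProperty("fill_factor",
    "AVG(CASE WHEN %(name)s IS NULL THEN 0.0 ELSE 1.0 END)")
nameAnn = SketchAnnotator("name")   # registers nothing, so it is skipped

annotateColumns(conn, "stars", [magAnn, nameAnn])
```

Collecting all expressions into one SELECT means the table is scanned once, however many properties are requested; that is presumably part of why the query construction is delegated to per-column objects.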

gavo.user.info.annotateDBTable(td, samplePercent=None, acquireColumnMeta=True)[source]

returns the TableDef td with domain annotations for its columns.

td must be an existing on-disk table. If acquireColumnMeta is False, only the size of the table is estimated.

samplePercent uses TABLESAMPLE SYSTEM to only look at about this percentage of the rows (which doesn’t work for views).
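To make the effect of samplePercent concrete, here is a minimal sketch of how a FROM clause with PostgreSQL’s TABLESAMPLE SYSTEM might be assembled. The helper makeSampledFromClause and the table and column names are hypothetical; the sketch does not reproduce how annotateDBTable builds its query.

```python
def makeSampledFromClause(tableName, samplePercent=None):
    """returns a FROM clause, optionally with a TABLESAMPLE SYSTEM suffix.

    Hypothetical helper for illustration only; tableName is assumed to
    come from a trusted table definition, not from user input.
    """
    if samplePercent is None or samplePercent >= 100:
        # no sampling requested: scan the whole table
        return "FROM %s" % tableName
    if samplePercent <= 0:
        raise ValueError("samplePercent must be positive")
    return "FROM %s TABLESAMPLE SYSTEM (%s)" % (
        tableName, float(samplePercent))


query = "SELECT MIN(ra), MAX(ra) " + makeSampledFromClause(
    "myschema.main", 10)
```

TABLESAMPLE SYSTEM picks whole storage pages, so the fraction of rows actually inspected is only approximately the requested percentage.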

The annotations come in a dictionary-valued attribute annotations on the column object. The keys in there correspond to column names from //dc_tables.

This will not attempt to annotate columns that already have min, max, or options in their values.

This will only look at columns that have appropriate types.

gavo.user.info.estimateTableSize(tableName, connection)[source]

returns an estimate for the size of the table tableName.

This is precise for tables that postgres guesses to be small (currently, 1e6 rows); for larger tables, we round up postgres’ estimate.

The function will return None for non-existing tables.

gavo.user.info.getAnnotators(td)[source]

returns a pair of output fields and annotators to gather column statistics for td.

The annotators are AnnotationMaker instances that will add the relevant annotations to td’s columns.

The rules applying are given in annotateDBTable.

gavo.user.info.getDefaultSamplePercent(tableSize)[source]

returns a hopefully sensible value for samplePercent depending on the tableSize.

This is based on the gut feeling that up to 1e6 rows, we can just scan all, whereas for 1e7 50% is fine, and then: 1e8: 20%, 1e9: 10%, 1e10: 5%. I think there might be a theory for that.
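The anchor points quoted above can be written down as a simple lookup. The following is only a sketch: the name sketchSamplePercent is hypothetical, and treating the values as a step function (with sizes between two anchors getting the percentage of the next larger anchor, and everything beyond 1e10 staying at 5%) is an assumption, since the text does not say how intermediate sizes are handled.

```python
import bisect

# anchor points taken from the description above
_SIZE_BOUNDS = [1e6, 1e7, 1e8, 1e9, 1e10]
_PERCENTS = [100, 50, 20, 10, 5]


def sketchSamplePercent(tableSize):
    """returns a sample percentage for a table of tableSize rows.

    Assumption: step function; sizes above the last anchor keep the
    smallest percentage.
    """
    idx = bisect.bisect_left(_SIZE_BOUNDS, tableSize)
    if idx >= len(_PERCENTS):
        return _PERCENTS[-1]
    return _PERCENTS[idx]
```

With these assumptions, a table of 5e6 rows would be sampled at 50%, and one of 1e11 rows at 5%.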

gavo.user.info.getMOCForStdTable(td, order=6)[source]

returns a MOC for a tableDef with one of the standard protocol mixins.

The function knows about SCS and SSAP for now; protocols are tested for in this order.

gavo.user.info.getMOCQuery(td, order)[source]

returns a MOC-generating query for a tableDef with standard columns.

(this is a helper for getMOCForStdTable)

gavo.user.info.getObscoreCoverageQuery(td, order)[source]

returns a database query for getting a MOC for tables with obscore columns.

This will return None if no such query can be built.

gavo.user.info.getSCSCoverageQuery(td, order)[source]

returns a database query for getting a MOC for a table suitable for cone search.

This will return None if no such query can be built.

gavo.user.info.getSIAPCoverageQuery(td, order)[source]

returns a database query for getting a MOC for a table using //siap#pgs (i.e., SIAv1).

This will return None if no such query can be built.

For SIAv2, no such thing is available yet; the obscore querier (getObscoreCoverageQuery) should work. However, we don’t really have standalone SIAv2 resources in DaCHS yet.

gavo.user.info.getSSAPCoverageQuery(td, order)[source]

returns a database query for getting a MOC for a table using one of our standard SSAP mixins.

This will return None if no such query can be built.

gavo.user.info.getSpectralLimitsExprs(td)[source]

returns the names of columns hopefully containing minimal and maximal spectral coverage.

As transformer function, we currently return the identity, as we’re only using IVOA standard columns anyway. Based on unit and ucd, we could pretty certainly do better.

If this doesn’t find any, it raises a NotFoundError.

gavo.user.info.getSpectralTransformer(sourceUnit)[source]

returns a function transforming a spectral value to joules, as required by VODataService.
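For orientation, converting a spectral value to an energy in joules uses E = h·ν for frequencies and E = h·c/λ for wavelengths. The sketch below shows one way such a unit-to-function lookup could look; the name sketchSpectralTransformer and the set of unit strings it accepts are assumptions and are not taken from DaCHS.

```python
# physical constants (exact values in the 2019 SI)
PLANCK_H = 6.62607015e-34         # J s
LIGHT_C = 299792458.0             # m/s
ELECTRON_VOLT = 1.602176634e-19   # J


def sketchSpectralTransformer(sourceUnit):
    """returns a function val -> energy in joules for sourceUnit.

    Hypothetical sketch; only a handful of unit strings are handled.
    """
    converters = {
        "J": lambda v: v,
        "eV": lambda v: v * ELECTRON_VOLT,
        "keV": lambda v: v * 1e3 * ELECTRON_VOLT,
        "Hz": lambda v: v * PLANCK_H,
        "m": lambda v: PLANCK_H * LIGHT_C / v,
        "Angstrom": lambda v: PLANCK_H * LIGHT_C / (v * 1e-10),
    }
    try:
        return converters[sourceUnit]
    except KeyError:
        raise ValueError(
            "No spectral conversion known for unit %r" % sourceUnit)
```

Note that for wavelength units the ordering is reversed: the smallest wavelength corresponds to the largest energy, so a caller computing limits has to swap minimum and maximum after transforming.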

gavo.user.info.getTimeLimitsExprs(td)[source]

returns the names of columns hopefully containing minimal and maximal time coverage of each row of a table defined by td.

As required by getScalarLimits, this will also return a function that (hopefully) turns the detected columns to Julian years.

This tries a couple of known criteria for columns containing times in some order, and the first one matching wins.

This will raise a NotFoundError if none of our heuristics work.

gavo.user.info.iterScalarLimits(td, columnsDeterminer)[source]

yields Interval instances for time or spectral coverage.

columnsDeterminer is a function td -> (mincol, maxcol, transformer) that is expected to raise a NotFoundError if no appropriate columns can be found. This is either getTimeLimitsExprs or getSpectralLimitsExprs at this point. transformer here is a function val -> val turning what’s coming back from the database into what’s expected by the coverage machinery (e.g., MJD -> jYear).

It’s conceivable that at some point we’ll give multiple intervals, and hence this is an iterator (that can very well yield nothing for a variety of reasons).
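The contract between the iterator and its columnsDeterminer can be sketched as follows. Everything here is illustrative: NotFoundError is a local stand-in for the exception mentioned above, the determiner works on a plain list of column names rather than a TableDef, the column names t_min and t_max are just examples, and fetchMinMax is a hypothetical callable replacing the actual database query.

```python
class NotFoundError(Exception):
    """Local stand-in for the NotFoundError mentioned above."""


def mjdToJYear(mjd):
    # Julian epoch: J2000.0 corresponds to MJD 51544.5, and a Julian
    # year has 365.25 days
    return 2000.0 + (mjd - 51544.5) / 365.25


def sketchTimeDeterminer(columnNames):
    """returns (mincol, maxcol, transformer) or raises NotFoundError."""
    if "t_min" in columnNames and "t_max" in columnNames:
        return "t_min", "t_max", mjdToJYear
    raise NotFoundError("No time columns found")


def sketchIterLimits(columnNames, columnsDeterminer, fetchMinMax):
    """yields (low, high) pairs; may yield nothing at all."""
    try:
        minCol, maxCol, transformer = columnsDeterminer(columnNames)
    except NotFoundError:
        return
    low, high = fetchMinMax(minCol, maxCol)
    if low is None or high is None:
        # e.g., empty table or all-NULL columns
        return
    yield transformer(low), transformer(high)


limits = list(sketchIterLimits(
    ["t_min", "t_max"],
    sketchTimeDeterminer,
    lambda minCol, maxCol: (51544.5, 51909.75)))
```

Writing this as a generator lets the caller simply loop over the results without special-casing tables for which no suitable columns or values exist.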

gavo.user.info.main()[source]
gavo.user.info.parseCmdline()[source]
gavo.user.info.printTableInfo(td, samplePercent=None)[source]

tries to obtain various information on the properties of the database table described by td.