Module texttricks

Formatting, text manipulation, string constants, and such.

Classes
	NameMap is a name mapper fed from a simple text file.

Functions

formatSize(val, sf=1)
returns a human-friendly representation of a file size.

makeEllipsis(aStr, maxLen=60)
returns aStr cropped to maxLen if necessary.

makeLeftEllipsis(aStr, maxLen=60)
returns aStr shortened to maxLen by dropping prefixes if necessary.

makeSourceEllipsis(sourceToken)
returns a string hopefully representative for a source token.

getFileStem(fPath)
returns the file stem of a file path.

formatSimpleTable(data, stringify=True, titles=None)
returns a string containing a text representation of tabular data.

getRelativePath(fullPath, rootPath, liberalChars=True)
returns rest if fullPath has the form rootPath/rest and raises an exception otherwise.

resolvePath(rootPath, relPath)
joins relPath to rootPath and makes sure the result really is in rootPath.

fixIndentation(code, newIndent, governingLine=0)
returns code with all whitespace from governingLine removed from every line and newIndent prepended to every line.

parsePercentExpression(literal, format)
returns a dictionary of parts in the %-template format.

parseAssignments(assignments)
returns a name mapping dictionary from a list of assignments.

hmsToDeg(hms, sepChar=None)
returns the time angle (h m s.decimals) as a float in degrees.

dmsToDeg(dmsAngle, sepChar=None)
returns the degree minutes seconds-specified dmsAngle as a float in degrees.

fracHoursToDeg(fracHours)
returns the time angle fracHours given in decimal hours in degrees.

degToHms(deg, sepChar=' ', secondFracs=3)
converts a float angle in degrees to a time angle (hh:mm:ss.mmm).

degToDms(deg, sepChar=' ', secondFracs=2)
converts a float angle in degrees to a sexagesimal string.

datetimeToRFC2616(dt)
returns a UTC datetime object in the format required by HTTP.

parseRFC2616Date(s)
returns seconds since unix epoch representing UTC from the HTTP-compatible time specification s.

timegm(tm, epoch=25203)

formatRFC2616Date(secs=None)
returns an RFC2616 date string for UTC seconds since unix epoch.

parseISODT(literal)
returns a datetime object for an ISO time literal.

parseDefaultDatetime(literal)

parseDefaultDate(literal)

parseDefaultTime(literal)

roundToSeconds(dt)
returns a datetime instance rounded to whole seconds.

formatISODT(dt)
returns some ISO8601 representation of a datetime instance.

replaceXMLEntityRefs(unicodeString)

ensureOneSlash(s)
returns s with exactly one trailing slash.

iterSimpleText(f)
iterates over ``(physLineNumber, line)`` in f with some usual conventions for simple data files.

getRandomString(length)
returns a random string of harmless printable characters.

safe_str(val)

parseAccept(aString)
parses an RFC 2616 accept header and returns a dict mapping media type patterns to their (unparsed) parameters.

Variables
	floatRE = `'[+-]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?'`
	dateRE = `re.compile(r'\d\d\d\d-\d\d-\d\d$')`
	datetimeRE = `re.compile(r'\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\dZ?$')`
	identifierPattern = `re.compile(r'[A-Za-z_][A-Za-z0-9_]*$')`
	isoTimestampFmt = `'%Y-%m-%dT%H:%M:%SZ'`
	isoTimestampFmtNoTZ = `'%Y-%m-%dT%H:%M:%S'`
	entityrefPat = `re.compile(r'&([^;])+;')`
	looksLikeURLPat = `re.compile(r'[a-z]{2,5}://')`
	xmlEntities = `{'amp': '&', 'apos': '\'', 'gt': '>', 'lt': '<',...`
	__package__ = `'gavo.utils'`

Function Details

makeEllipsis(aStr, maxLen=60)

returns aStr cropped to maxLen if necessary.

Cropped strings are returned with an ellipsis marker.

makeLeftEllipsis(aStr, maxLen=60)

returns aStr shortened to maxLen by dropping prefixes if necessary.

Cropped strings are returned with an ellipsis marker.

>>> makeLeftEllipsis("0123456789"*2, 11)
'...23456789'
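
The cropping behaviour of the two ellipsis functions can be pictured with a minimal sketch. The names crop_right and crop_left are hypothetical and this is not the DaCHS code; the sketch assumes a three-dot marker that counts toward maxLen, which is what the makeLeftEllipsis example suggests.

```python
def crop_right(text, max_len=60, marker="..."):
    """Return text cut to max_len characters, marker included (assumed)."""
    if len(text) <= max_len:
        return text
    return text[:max_len - len(marker)] + marker


def crop_left(text, max_len=60, marker="..."):
    """Return the tail of text, prefixed with marker, in max_len characters."""
    if len(text) <= max_len:
        return text
    keep = max_len - len(marker)
    return marker + text[len(text) - keep:]
```

Strings that already fit are returned unchanged, so callers can apply either helper unconditionally.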

makeSourceEllipsis(sourceToken)

returns a string hopefully representative for a source token.

Source tokens are, in particular, passed around within rsc.makeData. Usually, these are (potentially long) strings, but now and then they can be other things with appallingly long reprs. When DaCHS messages need to refer to such sources, this function is used to come up with representative strings.

getFileStem(fPath)

returns the file stem of a file path.

The file stem is what remains when you take the base name and split off the extension. The extension starts at the last dot in the file name; before it is determined, at most one common compression extension (.gz, .xz, .bz2, .Z, .z) is stripped off the end if present.

>>> getFileStem("/foo/bar/baz.x.y")
'baz.x'
>>> getFileStem("/foo/bar/baz.x.gz")
'baz'
>>> getFileStem("/foo/bar/baz")
'baz'

Decorators:

@codetricks.document
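
The stem rule for getFileStem can be sketched with the standard library. This is an illustration only, not the DaCHS implementation; file_stem and COMPRESSION_EXTS are hypothetical names, and the suffix set is simply the one listed in the description.

```python
import os

# Assumption: exactly the compression suffixes named in the description.
COMPRESSION_EXTS = {".gz", ".xz", ".bz2", ".Z", ".z"}


def file_stem(path):
    """Return the base name of path without its last extension."""
    name = os.path.basename(path)
    root, ext = os.path.splitext(name)
    if ext in COMPRESSION_EXTS:
        # Drop at most one compression suffix, then split off the
        # extension that remains.
        root, ext = os.path.splitext(root)
    return root
```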

formatSimpleTable(data, stringify=True, titles=None)

returns a string containing a text representation of tabular data.

All columns of data are simply stringified, then the longest member determines the width of the text column. The behaviour if data does not contain rows of equal length is unspecified; data must contain at least one row.

If you have serialised the values in data yourself, pass stringify=False.

If you pass titles, it must be a sequence of strings; they are then used as table headers. The shorter of data[0] and titles determines the number of columns displayed.
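
The column-width rule can be sketched as follows. This is a rough illustration under assumptions: simple_table is a hypothetical name, and the left alignment, single-blank separator, and absence of a header rule are guesses, since the actual layout of formatSimpleTable is not documented here.

```python
def simple_table(rows, titles=None):
    """Render rows (sequences of values) as left-aligned text columns."""
    cells = [[str(value) for value in row] for row in rows]
    if titles is not None:
        # The shorter of the first row and titles limits the column count.
        n_cols = min(len(cells[0]), len(titles))
        cells = [list(titles[:n_cols])] + [row[:n_cols] for row in cells]
    # Each column is as wide as its longest member.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]
    lines = [
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]
    return "\n".join(lines)
```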

getRelativePath(fullPath, rootPath, liberalChars=True)

returns rest if fullPath has the form rootPath/rest and raises an exception otherwise.

Pass ``liberalChars=False`` to make this raise a ValueError when URL-dangerous characters (blanks, ampersands, pluses, non-ASCII, and similar) are present in the result. This is mainly for products.

Decorators:

@codetricks.document
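
A minimal sketch of the prefix check in getRelativePath follows. The name relative_path is hypothetical, ValueError for a path outside the root is an assumption (the description only says "an exception"), and the set of characters treated as safe is a guess.

```python
import re


def relative_path(full_path, root_path, liberal_chars=True):
    """Return the part of full_path below root_path, or raise ValueError."""
    root = root_path.rstrip("/") + "/"
    if not full_path.startswith(root):
        raise ValueError("%r is not below %r" % (full_path, root_path))
    rest = full_path[len(root):]
    # Assumption: "safe" means ASCII letters, digits, and a few marks.
    if not liberal_chars and not re.match(r"[A-Za-z0-9_./,-]*$", rest):
        raise ValueError("%r contains URL-dangerous characters" % rest)
    return rest
```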

fixIndentation(code, newIndent, governingLine=0)

returns code with all whitespace from governingLine removed from every line and newIndent prepended to every line.

governingLine lets you select a line different from the first one for the determination of the leading white space. Lines before that line are left alone.

>>> fixIndentation("  foo\n  bar", "")
'foo\nbar'
>>> fixIndentation("  foo\n   bar", " ")
' foo\n  bar'
>>> fixIndentation("  foo\n   bar\n    baz", "", 1)
'foo\nbar\n baz'
>>> fixIndentation("  foo\nbar", "")
Traceback (most recent call last):
Error: Bad indent in line 'bar'

parsePercentExpression(literal, format)

returns a dictionary of parts in the %-template format.

format is a template with %<conv> conversions; no modifiers are allowed. Each conversion may match zero or more characters and matches stingily (i.e., non-greedily). Successive conversions without intervening literals aren't really supported. There's a hack for strptime-type times, though: H, M, and S just eat two characters each if there's no separator.

This is really only meant as a quick hack to support times like 25:33.

>>> r=parsePercentExpression("12,xy:33,","%a:%b,%c"); r["a"], r["b"], r["c"]
('12,xy', '33', '')
>>> sorted(parsePercentExpression("2357-x", "%H%M-%u").items())
[('H', '23'), ('M', '57'), ('u', 'x')]
>>> r = parsePercentExpression("12,13,14", "%a:%b,%c")
Traceback (most recent call last):
ValueError: '12,13,14' cannot be parsed using format '%a:%b,%c'
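
One way to picture the matching rules of parsePercentExpression is a translation of the template into a regular expression with non-greedy named groups. The sketch below is not the DaCHS implementation; parse_percent is a hypothetical name, and it assumes that every conversion is a distinct ASCII letter (so it can serve as a group name) and that "no separator" means the next item in the template is another conversion.

```python
import re


def parse_percent(literal, fmt):
    """Return a dict of the parts of literal matched by the %-template fmt."""
    pieces, pos = [], 0
    while pos < len(fmt):
        if fmt[pos] == "%" and pos + 1 < len(fmt):
            name = fmt[pos + 1]
            next_is_conversion = fmt[pos + 2:pos + 3] == "%"
            if name in "HMS" and next_is_conversion:
                # No separator follows: take exactly two characters.
                pieces.append("(?P<%s>..)" % name)
            else:
                # Non-greedy ("stingy") match of zero or more characters.
                pieces.append("(?P<%s>.*?)" % name)
            pos += 2
        else:
            pieces.append(re.escape(fmt[pos]))
            pos += 1
    match = re.match("".join(pieces) + "$", literal)
    if match is None:
        raise ValueError(
            "%r cannot be parsed using format %r" % (literal, fmt))
    return match.groupdict()
```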

parseAssignments(assignments)

returns a name mapping dictionary from a list of assignments.

This is the preferred form of communicating a mapping from external names to field names in records to macros: a string that contains ":"-separated pairs separated by whitespace, like "a:b b:c", where the incoming names are leading and the desired names are trailing.

If you need defaults to kick in when the incoming data is None, try _parseDestWithDefault in the client function.

This function returns a dictionary mapping original names to desired names.

>>> parseAssignments("a:b  b:c")
{'a': 'b', 'b': 'c'}
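
The mapping format is simple enough to sketch in a few lines. The name parse_assignments is hypothetical; how the real function treats items without a colon is not documented here, so this sketch just lets them fail with a ValueError.

```python
def parse_assignments(assignments):
    """Map incoming names to desired names from a string like "a:b b:c"."""
    mapping = {}
    for pair in assignments.split():
        # Split on the first colon only; incoming name first, desired second.
        incoming, desired = pair.split(":", 1)
        mapping[incoming] = desired
    return mapping
```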

hmsToDeg(hms, sepChar=None)

returns the time angle (h m s.decimals) as a float in degrees.

>>> "%3.8f"%hmsToDeg("22 23 23.3")
'335.84708333'
>>> "%3.8f"%hmsToDeg("22:23:23.3", ":")
'335.84708333'
>>> "%3.8f"%hmsToDeg("222323.3", "")
'335.84708333'
>>> hmsToDeg("junk")
Traceback (most recent call last):
ValueError: Invalid time with sepChar None: 'junk'

Decorators:

@codetricks.document
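
The arithmetic behind hmsToDeg is that one hour of time angle corresponds to 15 degrees. A minimal sketch, with the hypothetical name hms_to_deg, that handles only separated fields (the packed form selected with an empty sepChar in the third example is not covered):

```python
def hms_to_deg(hms, sep_char=None):
    """Convert an "h m s.decimals" time angle to degrees."""
    # sep_char=None splits on arbitrary whitespace.
    parts = [float(part) for part in hms.split(sep_char)]
    if not 1 <= len(parts) <= 3:
        raise ValueError("Invalid time: %r" % hms)
    # Pad missing minutes and seconds with zeros.
    hours, minutes, seconds = parts + [0.0] * (3 - len(parts))
    # 24 hours make a full circle, so one hour is 15 degrees.
    return (hours + minutes / 60.0 + seconds / 3600.0) * 15.0
```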

dmsToDeg(dmsAngle, sepChar=None)

returns the degree minutes seconds-specified dmsAngle as a float in degrees.

>>> "%3.8f"%dmsToDeg("45 30.6")
'45.51000000'
>>> "%3.8f"%dmsToDeg("45:30.6", ":")
'45.51000000'
>>> "%3.8f"%dmsToDeg("-45 30 7.6")
'-45.50211111'
>>> dmsToDeg("junk")
Traceback (most recent call last):
ValueError: Invalid dms value with sepChar None: 'junk'

Decorators:

@codetricks.document
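
For sexagesimal degrees the one subtle point is the sign, which applies to the whole angle rather than to the degrees field alone. A sketch under that assumption, using the hypothetical name dms_to_deg:

```python
def dms_to_deg(dms, sep_char=None):
    """Convert a "d m s.decimals" angle to degrees."""
    text = dms.strip()
    # Take the sign from the string so that "-0 30" stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = [float(part) for part in text.lstrip("+-").split(sep_char)]
    if not 1 <= len(parts) <= 3:
        raise ValueError("Invalid dms value: %r" % dms)
    # Pad missing minutes and seconds with zeros.
    degrees, minutes, seconds = parts + [0.0] * (3 - len(parts))
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)
```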

degToHms(deg, sepChar=' ', secondFracs=3)

converts a float angle in degrees to a time angle (hh:mm:ss.mmm).

>>> degToHms(0)
'00 00 00.000'
>>> degToHms(122.056, secondFracs=1)
'08 08 13.4'
>>> degToHms(-0.056, secondFracs=0)
'-00 00 13'
>>> degToHms(-1.056, secondFracs=0)
'-00 04 13'
>>> degToHms(359.2222, secondFracs=4, sepChar=":")
'23:56:53.3280'
>>> "%.4f"%hmsToDeg(degToHms(256.25, secondFracs=9))
'256.2500'
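
A sketch of the reverse conversion follows. It is not the DaCHS code; deg_to_hms is a hypothetical name. The design choice illustrated is to round once, in integer units of the smallest displayed fraction, so that a carry from the seconds propagates cleanly into minutes and hours.

```python
def deg_to_hms(deg, sep_char=" ", second_fracs=3):
    """Format an angle in degrees as a time angle string."""
    sign = "-" if deg < 0 else ""
    scale = 10 ** second_fracs
    # 15 degrees per hour, 3600 seconds per hour; the total is counted in
    # units of 1/scale seconds and rounded exactly once.
    total = int(round(abs(deg) / 15.0 * 3600.0 * scale))
    whole_seconds, frac = divmod(total, scale)
    minutes, seconds = divmod(whole_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    sec_text = "%02d" % seconds
    if second_fracs:
        # Zero-pad the fraction to the requested number of digits.
        sec_text += ".%0*d" % (second_fracs, frac)
    return sign + sep_char.join(["%02d" % hours, "%02d" % minutes, sec_text])
```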

degToDms(deg, sepChar=' ', secondFracs=2)

converts a float angle in degrees to a sexagesimal string.

>>> degToDms(0)
'+0 00 00.00'
>>> degToDms(-0.25)
'-0 15 00.00'
>>> degToDms(-23.50, secondFracs=4)
'-23 30 00.0000'
>>> "%.4f"%dmsToDeg(degToDms(-25.6835, sepChar=":"), sepChar=":")
'-25.6835'

datetimeToRFC2616(dt)

returns a UTC datetime object in the format required by HTTP.

This may break if you change the locale. In general, when handling "real" times within the DC, prefer unix timestamps over datetimes and use the other *RFC2616 functions.
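
Working on unix timestamps, HTTP dates can be produced and parsed without touching the locale by using the standard library's email.utils and calendar modules. The sketch below shows that approach; format_http_date and parse_http_date are hypothetical names, and this is not necessarily how formatRFC2616Date and parseRFC2616Date are implemented.

```python
import calendar
import email.utils


def format_http_date(secs=None):
    """Return an HTTP date string for UTC seconds since the unix epoch."""
    # usegmt=True yields the "GMT" zone label that HTTP expects; day and
    # month names are always English, whatever the locale. None means now.
    return email.utils.formatdate(secs, usegmt=True)


def parse_http_date(text):
    """Return seconds since the unix epoch for an HTTP date string."""
    parsed = email.utils.parsedate(text)
    if parsed is None:
        raise ValueError("Not an HTTP date: %r" % text)
    # timegm interprets the time tuple as UTC.
    return calendar.timegm(parsed)
```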

parseISODT(literal)

returns a datetime object for an ISO time literal.

There's no real timezone support yet, but we accept and ignore various ways of specifying UTC.

>>> parseISODT("1998-12-14")
datetime.datetime(1998, 12, 14, 0, 0)
>>> parseISODT("1998-12-14T13:30:12")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("1998-12-14T13:30:12Z")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("1998-12-14T13:30:12.224Z")
datetime.datetime(1998, 12, 14, 13, 30, 12, 224000)
>>> parseISODT("19981214T133012Z")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("19981214T133012+00:00")
datetime.datetime(1998, 12, 14, 13, 30, 12)
>>> parseISODT("junk")
Traceback (most recent call last):
ValueError: Bad ISO datetime literal: junk (required format: yyyy-mm-ddThh:mm:ssZ)

Decorators:

@codetricks.document
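
The literal forms accepted in the examples can be covered by a single regular expression. The following is a sketch only, with the hypothetical names ISO_PATTERN and parse_iso_datetime; it accepts and ignores the UTC markers shown in the examples, and its error message is abbreviated.

```python
import datetime
import re

ISO_PATTERN = re.compile(
    r"(\d{4})-?(\d\d)-?(\d\d)"              # date, dashes optional
    r"(?:T(\d\d):?(\d\d):?(\d\d)(\.\d+)?"    # time, colons optional
    r"(?:Z|\+00:?00)?)?$")                   # UTC markers, ignored


def parse_iso_datetime(literal):
    """Return a naive datetime for an ISO date or datetime literal."""
    match = ISO_PATTERN.match(literal)
    if match is None:
        raise ValueError("Bad ISO datetime literal: %s" % literal)
    year, month, day, hour, minute, second, frac = match.groups()
    # Fractional seconds become microseconds, capped at the valid maximum.
    micros = min(int(round(float(frac) * 1e6)), 999999) if frac else 0
    return datetime.datetime(
        int(year), int(month), int(day),
        int(hour or 0), int(minute or 0), int(second or 0), micros)
```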

roundToSeconds(dt)

returns a datetime instance rounded to whole seconds.

This also recklessly clears any time zone marker. So, don't pass in anything with a meaningful time zone.

formatISODT(dt)

returns some ISO8601 representation of a datetime instance.

The reason for preferring this function over a simple str is that datetime's default representation is too difficult for some other code (e.g., itself); hence, this code suppresses any microsecond part (rounding to whole seconds) and always adds a Z (where strftime works, utils.isoTimestampFmt produces an identical string).

The behaviour of this function for timezone-aware datetimes is undefined.

For convenience, None is returned as None.

>>> formatISODT(datetime.datetime(2015, 10, 20, 12, 34, 22, 250))
'2015-10-20T12:34:22Z'
>>> formatISODT(datetime.datetime(1815, 10, 20, 12, 34, 22, 250))
'1815-10-20T12:34:22Z'
>>> formatISODT(datetime.datetime(2018, 9, 21, 23, 59, 59, 640000))
'2018-09-22T00:00:00Z'
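
The rounding and formatting shown in the examples can be sketched together. The names round_to_seconds and format_iso are hypothetical, and rounding half a second upwards is an assumption; the examples only show that 640000 microseconds round up and 250 round down.

```python
import datetime


def round_to_seconds(dt):
    """Return dt rounded to whole seconds, with any tzinfo dropped."""
    if dt.microsecond >= 500000:
        dt = dt + datetime.timedelta(seconds=1)
    return dt.replace(microsecond=0, tzinfo=None)


def format_iso(dt):
    """Return dt as yyyy-mm-ddThh:mm:ssZ, or None for None."""
    if dt is None:
        return None
    # isoformat() omits the fraction once microsecond is zero and does not
    # depend on strftime, so years before 1900 are unproblematic.
    return round_to_seconds(dt).isoformat() + "Z"
```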

iterSimpleText(f)

iterates over ``(physLineNumber, line)`` in f with some usual 
conventions for simple data files.

You should use this function to read from simple configuration and/or
table files that don't warrant a full-blown grammar/rowmaker combo.
The intended use is somewhat like this::
        
        with open(rd.getAbsPath("res/mymeta")) as f:
                for lineNumber, content in iterSimpleText(f):
                        try:
                                ...
                        except Exception as exc:
                                sys.stderr.write("Bad input line %s: %s\n"%(lineNumber, exc))

The grammar rules are, specifically:

* leading and trailing whitespace is stripped
* empty lines are ignored
* lines beginning with a hash are ignored
* lines ending with a backslash are joined with the following line;
  to have intervening whitespace, have a blank in front of the backslash.

Decorators:

@codetricks.document
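
The four grammar rules of iterSimpleText can be sketched as a small generator. This is an illustration, not the DaCHS implementation; iter_simple_text is a hypothetical name, and two details are assumptions: the reported number is that of the last physical line of a joined line, and the comment test is applied to the joined line.

```python
def iter_simple_text(lines):
    """Yield (line_number, logical_line) pairs from an iterable of lines."""
    pending = ""
    number = 0
    for number, raw in enumerate(lines, 1):
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation: keep everything before the backslash, including
            # a blank directly in front of it.
            pending += line[:-1]
            continue
        line = pending + line
        pending = ""
        if not line or line.startswith("#"):
            continue
        yield number, line
    if pending:
        # Input ended in a continuation line; emit what was collected.
        yield number, pending
```

Since the function only iterates over its argument, it works on open files as well as on lists of strings.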

parseAccept(aString)

parses an RFC 2616 accept header and returns a dict mapping media type patterns to their (unparsed) parameters.

If aString is None, an empty dict is returned.

If we ever want to do fancy things with http content negotiation, this will be further wrapped to provide something implementing the complex RFC 2616 rules; this primitive interface really is intended for telling apart browsers (which accept text/html) from other clients (which hopefully do not) at this point.

>>> sorted(parseAccept("text/html, text/*; q=0.2; level=3").items())
[('text/*', 'q=0.2; level=3'), ('text/html', '')]
>>> parseAccept(None)
{}
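
The primitive parsing described for parseAccept amounts to splitting on commas and then on the first semicolon. A sketch with the hypothetical names parse_accept and wants_html; the second helper illustrates the browser check mentioned above and is not part of the module:

```python
def parse_accept(header):
    """Map media type patterns in an Accept header to their raw parameters."""
    if header is None:
        return {}
    result = {}
    for item in header.split(","):
        # Everything after the first semicolon is kept unparsed.
        media_type, _, params = item.partition(";")
        if media_type.strip():
            result[media_type.strip()] = params.strip()
    return result


def wants_html(header):
    """Crude browser check: does the client list text/html explicitly?"""
    return "text/html" in parse_accept(header)
```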

Variables Details

xmlEntities

Value:

{'amp': '&', 'apos': '\'', 'gt': '>', 'lt': '<', 'quot': '"'}
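
A table like xmlEntities is typically used with a regular-expression substitution. The sketch below shows one such use; it is not the code of replaceXMLEntityRefs. XML_ENTITIES copies the value above, ENTITY_REF is a hypothetical pattern that captures the whole entity name, and leaving unknown references untouched is an assumption.

```python
import re

XML_ENTITIES = {"amp": "&", "apos": "'", "gt": ">", "lt": "<", "quot": '"'}
# Assumption: the entity name is everything between "&" and ";".
ENTITY_REF = re.compile(r"&([^;\s&]+);")


def replace_entity_refs(text):
    """Replace the five predefined XML entity references in text."""
    def substitute(match):
        # Unknown references are left untouched.
        return XML_ENTITIES.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)
```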
