strup — string unpack

This Python package is for unpacking basic objects from a text string. The standard data types string, int, float and bool are supported.

The function unpack()

We may extract the objects from a text string text using the utility function unpack(fmt, text). Each format character in the string fmt indicates the data type for the corresponding object.

>>> from strup import unpack
>>> i, x, s, ok = unpack("ifs?", "5 2.3   ole  True")
>>> i, x, s, ok
(5, 2.3, 'ole', True)

The format characters for the data types are consistent with the syntax applied in the standard library module struct for handling of binary data. Characters in fmt are case sensitive.

Character    Data Object
i            int
f            float
s            string
?            bool
.            ignore this item

Each dot in fmt indicates that the corresponding item is excluded from the result.

>>> unpack("f..s", "2.3 ole 55   dole")
(2.3, 'dole')

For bool objects, the corresponding item of text must follow the convention applied in distutils.util.strtobool. Consequently, y, yes, t, true, on and 1 are interpreted as True, and n, no, f, false, off and 0 as False. The comparison is case insensitive. For all other values a ValueError exception is raised.

>>> unpack("?????s????", "NO 0 F off False ---  yes 1 ON TruE")
(False, False, False, False, False, '---', True, True, True, True)
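
The word lists above can be sketched as a stand-alone function. This is only an illustration of the convention; to_bool, TRUE_WORDS and FALSE_WORDS are hypothetical names and not part of strup.

```python
# Illustrative sketch; `to_bool` is a hypothetical helper, not part of strup.
TRUE_WORDS = {"y", "yes", "t", "true", "on", "1"}
FALSE_WORDS = {"n", "no", "f", "false", "off", "0"}


def to_bool(item):
    """Map an item to bool using the word lists above, ignoring case."""
    word = item.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    # Anything outside the two word lists is rejected.
    raise ValueError("invalid truth value {!r}".format(item))


print([to_bool(w) for w in "NO 0 F off yes 1 ON TruE".split()])
# [False, False, False, False, True, True, True, True]
```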

By default, the items considered are those returned by the standard str.split() method, that is text.split().

Only the first len(fmt) items of text.split() are considered. Trailing dots are not needed in fmt and should not be specified.
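
The rules so far (one format character per item, dots skip items, surplus items are not examined) can be illustrated with a minimal sketch. This is not strup's implementation; mini_unpack and CONVERTERS are hypothetical names, and the '?' character is left out for brevity.

```python
# Illustrative sketch only; not strup's actual implementation.
CONVERTERS = {"i": int, "f": float, "s": str}  # '?' omitted for brevity


def mini_unpack(fmt, text, sep=None):
    """Hypothetical stand-in for unpack(): convert the first len(fmt) items."""
    items = text.split(sep)
    result = []
    # zip() stops at the end of fmt, so surplus items in text are never examined.
    for char, item in zip(fmt, items):
        if char == ".":
            continue  # a dot skips the corresponding item
        result.append(CONVERTERS[char](item))
    return tuple(result)


print(mini_unpack("ifs", "5 2.3   ole  True"))   # (5, 2.3, 'ole')
print(mini_unpack("f..s", "2.3 ole 55   dole"))  # (2.3, 'dole')
```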

Optional Parameters

The optional argument sep, as defined for the standard Python method str.split(), is also applicable in this context.

>>> unpack("f..s", " 2.3 ,ole,55,   dole", sep=',')
(2.3, '   dole')
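
The leading spaces in '   dole' follow directly from how str.split() treats an explicit separator. The snippet below uses only the standard library to show the items that the conversion step receives.

```python
text = " 2.3 ,ole,55,   dole"

# With an explicit separator, surrounding whitespace stays in each item.
items = text.split(",")
print(items)            # [' 2.3 ', 'ole', '55', '   dole']

# float() tolerates the padding, while a string item keeps it.
print(float(items[0]))  # 2.3
print(repr(items[3]))   # '   dole'

# Without sep, runs of whitespace separate the items and the padding disappears.
print("2.3 ole 55   dole".split())  # ['2.3', 'ole', '55', 'dole']
```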

With the optional parameter none=True, zero-sized string items in text are interpreted as None regardless of the format character. By default none=False.

>>> unpack("fissi", "2.3,,, ,12", sep=',', none=True)
(2.3, None, None, ' ', 12)
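
A minimal sketch of this rule, assuming the check happens before the item is converted. convert_item is a hypothetical helper and not part of strup.

```python
# Illustrative sketch; `convert_item` is a hypothetical helper, not part of strup.
def convert_item(converter, item, none=False):
    """Return None for a zero-sized item when none=True, else convert it."""
    if none and item == "":
        return None
    return converter(item)


items = "2.3,,, ,12".split(",")   # ['2.3', '', '', ' ', '12']
converters = [float, int, str, str, int]
print(tuple(convert_item(c, it, none=True) for c, it in zip(converters, items)))
# (2.3, None, None, ' ', 12)
```

Note that the item ' ' is not zero-sized, so it is returned as a string rather than as None.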

String objects are often defined using quotes. The optional argument quote has default value None but may be " or '.

>>> unpack("isf", "100 'Donald Duck' 125.6", quote="'")
(100, 'Donald Duck', 125.6)

Quotes inside quoted strings are controlled by the optional argument quote_escape. With the default quote_escape=None, an internal quote is written in text as a doubled quote character:

>>> unpack("isf", "100 'She''s the best' 125.6", quote="'")
(100, "She's the best", 125.6)
>>> unpack("isf", '3 "A ""quote"" test"  93.4 ignored', quote='"')
(3, 'A "quote" test', 93.4)

However, other escape sequences are supported, such as quote_escape=r"\'" or quote_escape=r'\"':

>>> unpack("isf", r"100 'She\'s the best' 125.6", quote="'", quote_escape=r"\'")
(100, "She's the best", 125.6)
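
To make the default doubled-quote convention concrete, here is a minimal sketch of a whitespace splitter that keeps quoted items whole. It is based on a regular expression and is not how strup is implemented; split_quoted is a hypothetical helper, and custom quote_escape sequences are not handled.

```python
import re


# Illustrative sketch; `split_quoted` is a hypothetical helper, not part of strup.
def split_quoted(text, quote="'"):
    """Split on whitespace, keeping quoted items whole."""
    q = re.escape(quote)
    # Either a quoted item (a doubled quote stands for one literal quote)
    # or a run of non-whitespace characters.
    pattern = re.compile(r"{q}((?:[^{q}]|{q}{q})*){q}|(\S+)".format(q=q))
    items = []
    for match in pattern.finditer(text):
        quoted = match.group(1)
        if quoted is not None:
            # Collapse each doubled quote into a single quote character.
            items.append(quoted.replace(quote * 2, quote))
        else:
            items.append(match.group(2))
    return items


print(split_quoted("100 'She''s the best' 125.6"))
# ['100', "She's the best", '125.6']
```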

The class Unpack

All processing within the function unpack(), as described above, is handled by the class Unpack.

>>> from strup import Unpack

All arguments for the function unpack(), except text, are handled by the constructor of Unpack. This constructor also performs the preprocessing. Finally, Unpack.__call__() processes the actual text.

Consequently, when the same unpack pattern is applied in loops, we may benefit from utilizing Unpack directly.

>>> mydecode = Unpack('.s..f', quote='"')     # Preprocess the pattern
>>> for line in ['5.3 "Donald Duck" 2 yes 5.4',
...              '-2.2 "Uncle Sam" 4  no 1.5',
...              '3.3  "Clint Eastwood" 7 yes 6.5']:
...     mydecode(line)
('Donald Duck', 5.4)
('Uncle Sam', 1.5)
('Clint Eastwood', 6.5)
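
The benefit of preprocessing can be illustrated with a minimal sketch of a callable class that analyzes fmt once and reuses the result for every line. This is not strup's implementation; MiniUnpack and its plan attribute are hypothetical, and quotes and the '?' character are left out.

```python
# Illustrative sketch only; not strup's actual implementation.
class MiniUnpack(object):
    """Hypothetical stand-in for Unpack: analyze fmt once, apply it many times."""

    _converters = {"i": int, "f": float, "s": str}

    def __init__(self, fmt, sep=None):
        self.sep = sep
        # Preprocessing: keep only (position, converter) pairs for wanted items.
        self.plan = [(pos, self._converters[char])
                     for pos, char in enumerate(fmt) if char != "."]

    def __call__(self, text):
        items = text.split(self.sep)
        return tuple(convert(items[pos]) for pos, convert in self.plan)


decode = MiniUnpack(".s..f")
for line in ["5.3 Donald 2 yes 5.4", "-2.2 Sam 4 no 1.5"]:
    print(decode(line))
# ('Donald', 5.4)
# ('Sam', 1.5)
```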

Exception Handling

Exceptions    Description
ValueError    Input error with relevant error message

>>> w1, w2, ival, w3 = unpack("ssis", "you,need,some,help", sep=",")
Traceback (most recent call last):
   File "e:\repositories\github\jeblohe\strup\strup\unpack.py", line 85, in unpack
   raise ValueError(msg)
ValueError: strup.unpack()
fmt='ssis'
text='you,need,some,help'
argv=(), kwargs={'sep': ','}
Error decoding element 2:'some' of items=['you', 'need', 'some', 'help']
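
When many lines are processed, it is often convenient to collect the failures instead of stopping at the first one. The sketch below shows one way to do this around any parser that raises ValueError. parse_lines and to_pair are hypothetical helpers; to_pair is a stand-in parser so that the example runs without strup installed.

```python
# Illustrative sketch; `parse_lines` and `to_pair` are hypothetical helpers.
def parse_lines(parse, lines):
    """Apply parse to each line, collecting results and ValueError reports."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        try:
            good.append(parse(line))
        except ValueError as err:
            bad.append((number, line, str(err)))
    return good, bad


def to_pair(line):
    """Stand-in parser: a word followed by an int, comma separated."""
    word, value = line.split(",")
    return word, int(value)


good, bad = parse_lines(to_pair, ["you,1", "need,some", "help,3"])
print(good)                                    # [('you', 1), ('help', 3)]
print([(num, line) for num, line, _ in bad])   # [(2, 'need,some')]
```

With strup installed, a callable such as lambda line: unpack("si", line, sep=",") could be passed as parse in place of to_pair.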

API

Docstrings from the source code are provided here.

Considerations

A major goal of strup is to provide a clean and intuitive interface. If standard string methods are too low level and the re module adds too much complexity, then strup might be your compromise.

Backward compatibility of the API is strongly emphasized.

strup will not grow into a general purpose parser. Text processing is in general a comprehensive topic. For high volume text processing it is recommended to apply optimized packages like numpy and pandas.

Installation

This package is platform independent and available from PyPI and Anaconda.

To install strup from PyPI:

pip install strup           # For end users
pip install -e .[dev]       # For package development (from the root of your strup repo)

or from Anaconda:

conda install -c jeblohe strup

The source code is hosted on GitHub, with continuous integration at CircleCI. The code is extensively tested on Python 2.7, 3.4, 3.5, 3.6, 3.7, 3.8 and 3.9. The test coverage is reported by Coveralls.

License

This software is licensed under the MIT license.

Version

1.0.0 - 2020.10.24

First official release