strup
— API¶
strup is a package for unpacking basic data objects from a text string.
strup is written by Jens B. Helmers (c) 2020
SPDX-License-Identifier: MIT
The function unpack¶
-
strup.
unpack
(fmt, text, *args, **kwargs)¶ Extract basic data types from text based on fmt.
- Parameters
fmt (str) – fmt[i] defines the type of item i in the output tuple. fmt[i] must be ‘i’, ‘f’, ‘s’, ‘?’ or ‘.’. Item i will be ignored if fmt[i]==’.’.
text (str) – The text string to extract the objects from
- Other Parameters
*args (list, optional) – Additional variable length argument list to be submitted to the constructor of Unpack
**kwargs (dict, optional) – Additional keyword arguments to be submitted to the constructor of Unpack
- Returns
The tuple of objects parsed from text.
- Return type
tuple
- Raises
ValueError – If any parsing error occur.
Examples
>>> unpack("ifs?", "5 2.3 ole True") (5, 2.3, 'ole', True) >>> unpack("isf", "100 'Donald Duck' 125.6", quote="'") (100, 'Donald Duck', 125.6)
The class Unpack¶
-
class
strup.
Unpack
(fmt, sep=None, none=False, quote=None, quote_escape=None)¶ Unpack is a Python package for unpacking basic data types from a text string.
Each item is of type ‘int’, ‘float’, ‘string’ or ‘bool’ depending on a format code in the constructor.
-
__init__
(fmt, sep=None, none=False, quote=None, quote_escape=None)¶ Constructor for Unpack.
- Parameters
fmt (str) – fmt[i] defines the type of item i in the output tuple. fmt[i] must be ‘i’, ‘f’, ‘s’, ‘?’ or ‘.’. Item i will be ignored if fmt[i]==’.’.
sep (str or None, optional) – String to separate items. See string.split() method.
none (bool, optional) – If True: Zero-sized items are interpreted as None.
quote (str or None, optional) – String items are sometimes enclosed by quote characters. Quotes are mandatory if string items includes the sep or quote characters. A quote character inside an item must be escaped by ‘quote_escape’. (See example below). It is not possible to apply quotes if quote==’‘.
quote_escape (str or None, optional) – Typical values are ‘”“’, “’‘”, r’”’ or “’”. quote_escape = None is interpreted as quote_escape = quote*2
- Raises
ValueError – If any parsing error occur.
Examples
See the __call__() examples for application of these decoders:
>>> decode1 = Unpack('ifssis') >>> decode2 = Unpack('.fs', sep=',') >>> decode3 = Unpack('isfs', sep=' ', quote='"', quote_escape='""')
-
__call__
(text)¶ Extract the tuple of objects by parsing text based on self._fmt
- Parameters
text (str) – The text string to extract the objects from
- Returns
The tuple of objects parsed from text.
- Return type
tuple
- Raises
ValueError – If any parsing error occur.
Examples
decode1, decode2 and decode3 as defined in the __init__() examples:
>>> decode1("3 4.5 ole dole 5 doffen") (3, 4.5, 'ole', 'dole', 5, 'doffen') >>> decode2("3,4.5, ole,dole,5,doffen") (4.5, ' ole') >>> decode3('3 "A ""quote"" test" 93.4 knut ignored') (3, 'A "quote" test', 93.4, 'knut')
-
strup
— string unpack¶
This Python package is for unpacking basic objects from a text string.
The standard data types string
, int
, float
and bool
are supported.
The function unpack()
¶
We may extract the objects from a text string text
using the utility
function unpack(fmt, text)
.
Each format character in the string fmt
indicates the data type for the corresponding object.
>>> from strup import unpack
>>> i, x, s, ok = unpack("ifs?", "5 2.3 ole True")
>>> i, x, s, ok
(5, 2.3, 'ole', True)
The format characters for the data types are consistent with the syntax applied in the standard library module
struct for handling of binary data.
Characters in fmt
are case sensitive.
Character |
Data Object |
---|---|
i |
int |
f |
float |
s |
string |
? |
bool |
. |
ignore this item |
Each eventual dot inside fmt
indicates that the corresponding item should not be part of the result.
>>> unpack("f..s", "2.3 ole 55 dole")
(2.3, 'dole')
In case of bool objects, the actual item of text
must follow the convention applied in
distutils.util.strtobool.
Consequently, y, yes, t, true, on and 1 are interpreted as True and
n, no, f, false, off and 0 as False. For all other values
a ValueError
exception is raised.
>>> strup.unpack("?????s????", "NO 0 F off False --- yes 1 ON TruE")
(False, False, False, False, False, '---', True, True, True, True)
The set of items to consider from the string text
, is by default
the items returned from the standard library text.split()
method.
Only the len(fmt)
first items of text.split()
are considered. Trailing dots
are not needed in fmt
and should not be specified.
Optional Parameters¶
The optional argument sep
as defined in the standard Python string.split()
is
also applicable in this context.
>>> unpack("f..s", " 2.3 ,ole,55, dole", sep=',')
(2.3, ' dole')
By specifying the optional parameter none=True
, zero-sized string items in text
are interpreted as None
independent of the format character. By default none=False
.
>>> unpack("fissi", "2.3,,, ,12", sep=',', none=True)
(2.3, None, None, ' ', 12)
String objects are often defined using quotes. The optional argument quote
has default value
None
but may be "
or '
.
>>> unpack("isf", "100 'Donald Duck' 125.6", quote="'")
(100, 'Donald Duck', 125.6)
Eventual quotes inside quoted strings are controlled using the optional argument quote_escape
.
By default quote_escape=None
means that internal quotes are identified in text
using double quotes
>>> unpack("isf", "100 'She''s the best' 125.6", quote="'")
(100, "She's the best", 125.6)
>>> unpack("isf", '3 "A ""quote"" test" 93.4 ignored', quote='"')
(3, 'A "quote" test', 93.4)
However, other escape sequences are supported like quote_escape=r"\'"
or quote_escape=r'\"'
>>> unpack("isf", r"100 'She\'s the best' 125.6", quote="'", quote_escape=r"\'")
(100, "She's the best", 125.6)
The class Unpack
¶
All processing within the function unpack()
, as described above, is handled by the class
Unpack
.
>>> from strup import Unpack
All arguments for the function unpack()
, except text
, are handled by the constructor of Unpack
.
This constructor also performs preprocessing. Finally, Unpack.__call__()
process the actual text
.
Consequently, when the same unpack pattern is applied in loops, we may benefit from utilizing Unpack
directly.
>>> mydecode = Unpack('.s..f', quote='"') # Preprocess the pattern
>>> for line in ['5.3 "Donald Duck" 2 yes 5.4',
'-2.2 "Uncle Sam" 4 no 1.5',
'3.3 "Clint Eastwood" 7 yes 6.5']:
... mydecode(line)
("Donald Duck", 5.4)
("Uncle Sam", 1.5)
("Clint Eastwood", 6.5)
Exception Handling¶
Exceptions |
Description |
---|---|
|
Input error with relevant error message |
>>> w1, w2, ival, w3 = unpack("ssis", "you,need,some,help", sep=",")
Traceback (most recent call last):
File "e:\repositories\github\jeblohe\strup\strup\unpack.py", line 85, in unpack
raise ValueError(msg)
ValueError: strup.unpack()
fmt='ssis'
text='you,need,some,help'
argv=(), kwargs={'sep': ','}
Error decoding element 2:'some' of items=['you', 'need', 'some', 'help']
Considerations¶
A major goal with strup
is to provide a clean and intuitive interface.
If standard string methods
are too low level and the re-module
adds too much complexity, then strup
might be your compromise.
Backward compatibility of the API is strongly emphasized.
strup
will not grow into a general purpose parser.
Text processing is in general a comprehensive topic.
For high volume text processing it is recommended to apply optimized packages like
numpy and pandas.
Installation¶
This package is platform independent and available from PyPI and Anaconda.
To install strup
from PyPI:
pip install strup # For end users
pip install -e .[dev] # For package development (from the root of your strup repo)
or from Anaconda:
conda install -c jeblohe strup
The source code is hosted on GitHub. Continuous integration at CircleCI. The code is extensively tested on Python 2.7, 3,4, 3.5, 3.6, 3.7, 3.8 and 3.9. The test coverage is reported by Coveralls.
License¶
This software is licensed under the MIT-license.