Two approaches for working with TimeSeries objects are supported with the scikits.timeseries module. The first approach consists in using standard relational databases. The second is to use the PyTables package. Both approaches have their pros and cons.
Generally speaking, if concurrent access to a database is required, or if an interface with an online server is needed (for example, if time series charts must be generated in real-time for a web site), relational databases are probably more convenient.
Alternatively, if the only objective is to store data for a specific project, without having to deal with the administrative complexities of a relational database, then PyTables may be enough.
It is recommended to experiment with both approaches in order to find which one is the most performant or the most adequate to any application.
Storing and retrieving time series from standard relational databases is very simple once you know a few tricks. For these examples, we use the ceODBC database module which I have found to be more reliable and faster than the pyodbc module. However, I think these examples should work with the pyodbc module as well.
SQL Server 2005 Express edition is the database used in the examples. Other standard relational databases should also work, but I have not personally verified it.
Note that it is also possible to use this approach with sqlite databases using the sqlite3 module included with the standard python distribution. The below code will need to be modified a bit to work with sqlite, but the basic approach remains the same.
A database called “test” is assumed to have been created already along with a table called “test_table” described by the following query:
1 2 3 4 | CREATE TABLE test_table (
[date] [datetime] NULL,
[value] [decimal](18, 6) NULL
)
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | import ceODBC as odbc
import scikits.timeseries as ts
test_series = ts.time_series(range(50), start_date=ts.now('b'))
# lets mask one value just to make things interesting
test_series[5] = ts.masked
conn = odbc.Connection(
"Driver={SQL Native Client};Server=localhost;Database=test;Uid=userid;Pwd=password;")
crs = conn.cursor()
# start with an empty table for these examples
crs.execute("DELETE FROM test_table")
# convert series to list of (datetime, value) tuples which can be interpreted
# by the database module. Note that masked values will get converted to None
# with the tolist method. None gets translated to NULL when inserted into the
# database.
_tslist = test_series.tolist()
# insert time series data
crs.executemany("""
INSERT INTO test_table
([date], [value]) VALUES (?, ?)
""", _tslist)
# Read the data back out of the database.
# Explicitly cast data of type decimal to float for reading purposes,
# otherwise you will get decimal objects for your result.
crs.execute("""
SELECT
[date],
CAST(ISNULL([value], 999) AS float) as vals, -- convert NULL's to 999
(CASE
WHEN [value] is NULL THEN 1
ELSE 0
END) AS mask -- retrieve a mask column
FROM test_table
ORDER BY [date] ASC
""")
# zip(*arg) converts row based results to column based results. This is the
# crucial trick needed for easily reading time series data from a relational
# database with Python
_dates, _values, _mask = zip(*crs.fetchall())
_series = ts.time_series(_values, dates=_dates, mask=_mask, freq='B')
# commit changes to the database
conn.commit()
conn.close()
|
PyTables is basically a Python interface to HDF5, a portable format designed to manage large and complex hierarchical datasets.
PyTables is build on top of HDF5 and numpy and provides a convenient interface to directly read ndarrays from HDF5 tables, or write ndarrays to HDF5 tables.
The scikits.timeseries.lib.tstables sub-module defines two new objects, MaskedTable and TimeSeriesTable. These classes are designed to store MaskedArray and TimeSeries as tables in PyTables. The module also introduces new methods to the tables.File object.
Based on tables.Table
Stores a MaskedArray into a table. The initial MaskedArray is transformed into a structured flexible-type ndarray before storage:
If the initial MaskedArray has no named fields (standard type), the resulting ndarray has two named fields: _data and _mask. The _data field has the same type as the original array, while the _mask field is always boolean. Note that a mask is always created, even if the initial MaskedArray has no missing values.
If the initial MaskedArray object has named fields (structured array), the resulting ndarray has as many fields as the initial array, with the same name. Each field of the result has two nested sub-fields, _data and _mask. The _data subfield has the same dtype as the original field, while the _mask subfield will always be boolean.
For example, if the dtype of the initial array is:
>>> input_dtype = [('a', int), ('b', float), ('c', '|S3')]
The resulting ndarray will have a dtype:
>>> output_dtype = [('a', [('_data', int), ('_mask', bool)]),
... ('b', [('_data', float), ('_mask', bool)]),
... ('c', [('_data', '|S3'), ('_mask', bool)])]
When a MaskedTable is read, it is automatically retransformed into a MaskedArray.
Additional information about the MaskedArray (such as baseclass, fill_value...) is stored into a dictionary named special_attrs. This information can be accessed through the attr attribute of the table.
Reads the current MaskedTable.
Parameters: | start : {None, int}, optional
stop : {None, int}, optional
step : {None, int}, optional
field : {None, str}, optional
|
---|---|
Returns: | masked_array :
|
Based on MaskedTable
Stores a TimeSeries object into a table. The initial TimeSeries is transformed into a structured ndarray before storage:
If the initial TimeSeries has no named fields (standard dtype), the resulting ndarray has three named fields: _dates, _data and _mask. The _data field has the same dtype as the original object. The _mask field is always boolean. The _dates field is always integer, and corresponds to the integer representation of the underlying DateArray.
Note that a mask is always created, even if the initial TimeSeries has no missing values.
If the initial TimeSeries has named fields (flexible type), the resulting ndarray has the same named fields as the initial array, with the addition of an extra field _dates. The _dates field is always integer, and corresponds to the integer representation of the underlying DateArray. Each other field is composed of two nested sub-fields, _data and _mask. The _data subfield has the same dtype as the original field, while the _mask subfield is always boolean.
Additional information about the TimeSeries (such as baseclass, fill_value, or the dates frequency...) is stored into a dictionary named special_attrs. This information can be accessed through the attr attribute of the table.
To create a TimeSeriesTable, just use the File.createTimeSeriesTable method of a standard tables.File object.
Reads the current TimeSeriesTable.
Parameters: | coords : sequence
field : {None, str}, optional
|
---|---|
Returns: | time_series :
|