TimeSeries objects

class TimeSeries

A subclass of MaskedArray designed to manipulate time series.

Parameters:
data : {array_like}

Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here:

  • a sequence of objects (numbers, characters, objects);
  • an ndarray or one of its subclasses. In particular, MaskedArray and TimeSeries are recognized.
dates : {DateArray}

A DateArray instance storing the date information.

autosort : {True, False}, optional

Whether to sort the series in chronological order.

**optional_parameters :

All the parameters recognized by MaskedArray are also recognized by TimeSeries.

See also

MaskedArray

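A TimeSeries object is the combination of three ndarrays:

  • dates : a DateArray object storing the date information;
  • data : an ndarray holding the values;
  • mask : a boolean ndarray flagging missing or invalid values.

These three arrays can be accessed as attributes of a TimeSeries object. Another very useful attribute is series, which gives direct access to data and mask as a masked array.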

As TimeSeries objects subclass MaskedArray, they inherit all their attributes and methods, as well as the attributes and methods of regular ndarrays.

Attributes

... specific to TimeSeries

data
Returns a view of a TimeSeries as an ndarray. This attribute is read-only and cannot be directly set.
mask

Returns the mask of the object, as an ndarray with the same shape as data, or as the special value nomask (equivalent to False). This attribute is writable and can be modified.

If data has a standard dtype (no named fields), the dtype of the mask is boolean. If data is a structured array with named fields, the mask has the same structure as the data, but each field is atomically boolean.

In any case, a value of True in the mask indicates that the corresponding value of the series is invalid.
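
Because TimeSeries inherits these mask conventions from MaskedArray, they can be tried out with numpy.ma alone. The following is a minimal sketch under that assumption; the variable names are illustrative and no scikits.timeseries call is involved.

```python
import numpy as np
import numpy.ma as ma

# Standard dtype: the mask is a plain boolean array; True marks an invalid entry.
plain = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
print(plain.mask.dtype)   # boolean dtype
print(plain.mean())       # the masked entry (2.0) is left out of the mean

# Structured dtype: the mask mirrors the field layout, one boolean per field.
record = ma.array([(1, 2.0), (3, 4.0)],
                  mask=[(False, True), (False, False)],
                  dtype=[('a', int), ('b', float)])
print(record.mask.dtype.names)   # same field names as the data
print(record['b'].mask)          # mask of a single field
```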

series
Returns a view of a TimeSeries as a MaskedArray. This attribute is read-only and cannot be directly set.
dates
Returns the DateArray object of the dates of the series. This attribute is writable and can be modified. However, the size of the array must be zero or match either the size of the series or its length.
varshape
Returns the number of equivalent variables for each date. If varshape == (), the series has only one variable and is called a 1V-series.

Construction

To construct a TimeSeries object, the simplest method is to directly call the class constructor with the proper parameters.

However, the recommended way is to use the time_series factory function.

time_series(data, dates=None, start_date=None, length=None, freq=None, mask=False, dtype=None, copy=False, fill_value=None, keep_mask=True, hard_mask=False, autosort=True)

Creates a TimeSeries object.

The data parameter can be a valid TimeSeries object. In that case, the dates, start_date or freq parameters are optional: if none of them is given, the dates of the result are the dates of data.

If data is not a TimeSeries, then dates must be either None or an object recognized by the date_array function (used internally):

  • an existing DateArray object;
  • a sequence of Date objects with the same frequency;
  • a sequence of datetime.datetime objects;
  • a sequence of dates in string format;
  • a sequence of integers corresponding to the representation of Date objects.

In any of the last four possibilities, the freq parameter is mandatory.

If dates is None, a continuous DateArray is automatically constructed as an array of size len(data) starting at start_date and with a frequency freq.
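
The idea of a continuous date range built from a start date and a length can be sketched with the standard library alone. The helper continuous_dates below is hypothetical and is not part of scikits.timeseries; it assumes a fixed step between dates, which only approximates a true frequency such as 'M'.

```python
from datetime import date, timedelta

def continuous_dates(start, n, step=timedelta(days=1)):
    """Return n evenly spaced dates beginning at start (hypothetical helper)."""
    return [start + i * step for i in range(n)]

data = [1, 2, 3, 4]
# One date per data entry, starting at the chosen first date.
dates = continuous_dates(date(2009, 1, 1), len(data))
print(dates[0], dates[-1])
```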

Parameters:

data : array_like

Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here. data can also be a TimeSeries object.

dates : {None, var}, optional

A sequence of dates corresponding to each entry.

start_date : {Date}, optional

Date corresponding to the first entry of the data (index 0). This parameter must be a valid Date object, and is mandatory if dates is None and if data has a length greater than or equal to 1.

length : {integer}, optional

Length of the dates.

freq : {freq_spec}, optional

A valid frequency specification, as a string or an integer. This parameter is mandatory if dates is None. Otherwise, the frequency of the series is set to the frequency of the dates input.

See also

numpy.ma.masked_array
Constructor for the MaskedArray class.
scikits.timeseries.date_array
Constructor for the DateArray class.

Notes

  • All other parameters recognized by the numpy.ma.array constructor are also recognized by the function.
  • If data is zero-sized, only the freq parameter is mandatory.

Note

By default, the series is automatically sorted in chronological order. This behavior can be overridden by setting the keyword autosort=False.
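
What chronological sorting means for paired dates and values can be illustrated in plain Python. This is only a sketch of the concept, not the library's implementation; all names are illustrative.

```python
from datetime import date

# Dates given out of order, with one value attached to each date.
dates = [date(2009, 1, 3), date(2009, 1, 1), date(2009, 1, 2)]
values = [30, 10, 20]

# Indices that put the dates in chronological order.
order = sorted(range(len(dates)), key=dates.__getitem__)

# Apply the same reordering to dates and values so that pairs stay together.
sorted_dates = [dates[i] for i in order]
sorted_values = [values[i] for i in order]
print(sorted_values)
```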

Dates and data compatibility

The simplest example of a TimeSeries is a series of one variable, where a date is associated with each element of the array. In that case, the dates attribute is a DateArray with the same size as the underlying array.

For example, we can create a 4-element series:

>>> first_date = ts.Date('D', '2009-01-01')
>>> series = ts.time_series([1, 2, 3, 4], start_date=first_date)
>>> series
timeseries([1 2 3 4],
   dates = [01-Jan-2009 ... 04-Jan-2009],
   freq  = D)

Note that with the use of the start_date keyword, the size of the dates attribute is automatically adjusted by time_series to match the size of the input data.

The dates can now be modified in place. For example, they can be shifted by one week with the following command.

>>> series.dates += 7
>>> series
timeseries([1 2 3 4],
   dates = [08-Jan-2009 ... 11-Jan-2009],
   freq  = D)

The dates can also be changed by setting the dates attribute to another DateArray object. In that case, the size of the new dates must match the size of the series, or a TimeSeriesCompatibilityError is raised. Setting the dates attribute to an object of a different type raises a TypeError exception.

It is often convenient to manipulate a series of several variables at once. One possibility is to use a structured array as input, as illustrated by the following example:

>>> series = ts.time_series(zip(np.random.normal(0, 1, 10),
...                             np.random.uniform(0, 1, 10)),
...                         dtype=[('norm', float), ('unif', float)],
...                         start_date=ts.Date('D', '2001-01-01'))

In this example, series consists of two fields ('norm' and 'unif'). Here the two fields have the same type (float), but this is not a requirement. Each field can be accessed as an independent TimeSeries using series['norm'] and series['unif'].

In practice, each individual entry of series is a numpy.void object. The series as a whole behaves as a 1D masked array, as represented by the shape of the series: series.shape = (10,). Because series is a 1D array, the size of series.dates must match series.size.
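
The shape and entry type described here are properties of numpy structured arrays in general, so they can be checked without scikits.timeseries. The sketch below uses a plain ndarray named table (an illustrative name) and assumes Python 3, where zip must be wrapped in list().

```python
import numpy as np

dtype = [('norm', float), ('unif', float)]
# list() is needed on Python 3, where zip returns an iterator.
rows = list(zip(np.random.normal(0, 1, 10), np.random.uniform(0, 1, 10)))
table = np.array(rows, dtype=dtype)

print(table.shape)           # one-dimensional: one record per row
print(type(table[0]))        # each entry is a numpy.void record
print(table['norm'].dtype)   # a single field is an ordinary float array
```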

Despite the convenience of this approach to manipulate multi-variable series, it presents a serious disadvantage: structured arrays are usually not recognized by standard numpy functions.

An alternative is then to represent a series as a two-dimensional array, using columns as variables and rows as actual observations. In that case, all the variables must have the same type, and the size of the dates attribute must match the length of the series.

More generally, it is possible to create a multi-variable series as a nD array. The corresponding dates must then satisfy the condition series.dates.size == series.shape[0] or a TimeSeriesCompatibilityError is raised. The specific attribute varshape is then set to keep track of the number of variables.

For example, a series of 50 years of monthly data can be represented as a (600,)-array of observations at a monthly frequency, or as a (50,12)-array of observations at an annual frequency.

>>> start = ts.Date('M', '2001-01')
>>> data = np.random.uniform(-1, +1, 50*12).reshape(50, 12)
>>> mseries = ts.time_series(data, start_date=start, length=50*12)
>>> aseries = ts.time_series(data, start_date=start.asfreq('Y'), length=50)

Both series have the same shape, (50, 12), but mseries is a series of one variable, with mseries.varshape == (), while aseries is a series of 12 variables, aseries.varshape == (12,), each variable corresponding to a month.

>>> (mseries.shape, mseries.varshape)
((50, 12), ())
>>> (aseries.shape, aseries.varshape)
((50, 12), (12,))
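
The relation between the number of dates, the data shape and varshape can be summarized by a small rule. The function infer_varshape below is a hypothetical helper written for illustration, not the library's code, and it raises ValueError as a stand-in for TimeSeriesCompatibilityError.

```python
def infer_varshape(n_dates, shape):
    """Derive varshape from the number of dates and the data shape (hypothetical helper)."""
    size = 1
    for dim in shape:
        size *= dim
    if n_dates == size:
        # One date per element: a series of one variable.
        return ()
    if shape and n_dates == shape[0]:
        # One date per row: the remaining axes hold the variables.
        return tuple(shape[1:])
    raise ValueError("incompatible dates and data sizes")

print(infer_varshape(600, (50, 12)))   # same case as mseries
print(infer_varshape(50, (50, 12)))    # same case as aseries
```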

Because aseries is basically a 2D array, we can easily compute annual and monthly means. Monthly means over the whole 50 years can be calculated at once with the mean method, using axis=0 as parameter. We can also compute the equivalent of 50 years of annual data using the mean method, this time with axis=1.

>>> amean = aseries.mean(axis=1)
>>> amean.shape
(50,)
>>> mmean = aseries.mean(axis=0)
>>> mmean.shape
(12,)
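
The axis convention is the same as for a plain numpy array, so the resulting shapes can be checked without any dates attached. A minimal sketch, assuming only numpy:

```python
import numpy as np

# 50 rows (years) by 12 columns (months) of random data.
data = np.random.uniform(-1, +1, 50 * 12).reshape(50, 12)

mmean = data.mean(axis=0)   # collapse the years: one mean per month
amean = data.mean(axis=1)   # collapse the months: one mean per year
print(mmean.shape, amean.shape)
```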

Another example of a multi-variable series would be one year of daily (256x256) raster maps. This dataset can easily be represented as a (365,256,256)-array, and a corresponding series created with the following code:

>>> data = np.random.uniform(-1, +1, 365*256*256).reshape(365, 256, 256)
>>> newseries = ts.time_series(data, start_date=ts.now('D'))

Methods

Date information

The following methods access information about the dates attribute:

TimeSeries.get_steps()
Returns the time steps between consecutive dates, in the same unit as the instance frequency.
TimeSeries.has_missing_dates()
Returns whether the instance has missing dates.
TimeSeries.has_duplicated_dates()
Returns whether the instance has duplicated dates.
TimeSeries.is_full()
Returns whether the instance has no missing dates.
TimeSeries.is_valid()
Returns whether the instance is valid (that is, it has neither missing nor duplicated dates).
TimeSeries.is_chronological()
Returns whether the instance is sorted in chronological order.
TimeSeries.date_to_index
TimeSeries.sort_chronologically
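
The meaning of these checks can be sketched in plain Python on dates stored as integer codes, one unit per period of the frequency. The functions below are illustrative re-implementations, not the library's methods; in particular, sorting before taking differences is an assumption.

```python
def get_steps(dates):
    """Differences between consecutive dates, taken after sorting (assumption)."""
    ordered = sorted(dates)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def has_missing_dates(dates):
    # A step larger than one unit means at least one date was skipped.
    return any(step > 1 for step in get_steps(dates))

def has_duplicated_dates(dates):
    # A step of zero means the same date appears twice.
    return any(step == 0 for step in get_steps(dates))

def is_valid(dates):
    return not has_missing_dates(dates) and not has_duplicated_dates(dates)

def is_chronological(dates):
    return list(dates) == sorted(dates)

print(get_steps([1, 2, 4]), has_missing_dates([1, 2, 4]))
```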

Shape manipulation

For reshape, resize, and transpose, the single tuple argument may be replaced with n integers which will be interpreted as an n-tuple.

TimeSeries.flatten
TimeSeries.ravel
TimeSeries.reshape
TimeSeries.resize
TimeSeries.split
TimeSeries.squeeze
TimeSeries.swapaxes
TimeSeries.transpose
TimeSeries.T
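
The equivalence between a single tuple argument and n separate integers is inherited from ndarray, so it can be shown on plain numpy arrays. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(6)
by_tuple = a.reshape((2, 3))   # shape given as one tuple
by_ints = a.reshape(2, 3)      # same shape given as separate integers

cube = np.arange(24).reshape(2, 3, 4)
# transpose accepts the axis order in either form as well.
print(cube.transpose((2, 0, 1)).shape, cube.transpose(2, 0, 1).shape)
```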