lcdata is designed to handle large datasets of light curves. Light curves are represented as tables in ~astropy.table.Table format, and are very similar to the ones used in sncosmo. A dataset can be created in several ways. For example, we can create a dataset from a list of sncosmo-like light curves:

>>> import lcdata
>>> import sncosmo
>>> light_curves = [sncosmo.load_example_data() for i in range(5)]
>>> dataset = lcdata.from_light_curves(light_curves)

The individual light curves in this dataset can be accessed as dataset.light_curves.


The metadata associated with all of the light curves can be accessed from a common ~astropy.table.Table as dataset.meta:

>>> dataset.meta
      object_id        ra dec   type  redshift  x1  c          x0           t0
--------------------- --- --- ------- -------- --- --- ----------------- -------
lcdata_xbdwhv_0000000 nan nan Unknown      0.5 0.5 0.2 1.20482820761e-05 55100.0
lcdata_xbdwhv_0000001 nan nan Unknown      0.5 0.5 0.2 1.20482820761e-05 55100.0
lcdata_xbdwhv_0000002 nan nan Unknown      0.5 0.5 0.2 1.20482820761e-05 55100.0
lcdata_xbdwhv_0000003 nan nan Unknown      0.5 0.5 0.2 1.20482820761e-05 55100.0
lcdata_xbdwhv_0000004 nan nan Unknown      0.5 0.5 0.2 1.20482820761e-05 55100.0

lcdata enforces a consistent metadata format. All light curves are guaranteed to have the following keys in their metadata.

  • object_id: A unique identifer. Default: randomly assigned string

  • ra: The right ascension. Default: nan

  • dec: The declination. Default: nan

  • type: A string representing the type of the light curve. Default: Unknown

  • redshift: The redshift. Default: nan

Astronomical data comes in many different formats, and keyword usage is not standardized. lcdata will try to find all of these keys in the metadata using a list of known aliases.

>>> light_curve = sncosmo.load_example_data()
>>> light_curve.meta = {
...     'id': 'example_id',
...     'right_ascension': 1.,
...     'decl': 2.,
...     'class': 'Type Ia',
...     'other_var': 5.
... }
>>> dataset = lcdata.from_light_curves([light_curve])
>>> print(dataset.meta)
object_id   ra dec   type  redshift other_var
---------- --- --- ------- -------- ---------
example_id 1.0 2.0 Type Ia      nan       5.0

Light Curves

lcdata will standardize the format of light curves, similarly to how the metadata is standardized. Each light curve is guaranteed to have the following keys:

  • time: times at which the light curve was sampled. Converted to a 64-bit float.

  • flux: The flux at each point on the light curve. Converted to a 32-bit float.

  • fluxerr: The uncertainty on the flux. Converted to a 32-bit float.

  • band: A string representing bandpass that the light curve was observed in. We recommend using the sncosmo bandpass names here. Converted to a binary string.

Additional columns are left as is. If the light curve columns have different labels, lcdata will try to infer which ones are which using a set of aliases.

>>> light_curve = astropy.table.Table({
...     'bandpass': ['lsstu', 'lsstb', 'lsstr'],
...     'flux': [1., 2., 10.],
...     'mjd': [59000., 59010., 59020.],
...     'fluxerr': [1., 0.5, 3.],
...     'myvar': [1., 2., 5.],
... })
>>> print(dataset.light_curves[0])
  time  flux fluxerr  band myvar
------- ---- ------- ----- -----
59000.0  1.0     1.0 lsstu   1.0
59010.0  2.0     0.5 lsstb   2.0
59020.0 10.0     3.0 lsstr   5.0

Dataset Manipulation

Datasets can be manipulated in various ways.


>>> dataset = dataset1 + dataset2

Selecting a subset:

>>> dataset = dataset[5:10]

Saving a Dataset in HDF5 format

lcdata has an optimized HDF5 reader/writer that can be used to rapidly load very large light curve datasets.

Datasets can be read from and written out to disk in HDF5 format.

>>> dataset.write_hdf5('./dataset.h5')
>>> dataset = lcdata.read_hdf5('./dataset.h5')

A dataset on disk can be appended to:

>>> dataset_2.write_hdf5('./dataset.h5', append=True)

Some datasets are too large to fit in memory all at once. lcdata can load only the metadata of a dataset into memory, and then access the light curves themselves on demand.

>>> # Read only the metadata
>>> dataset = lcdata.read_hdf5('./dataset.h5', in_memory=False)
>>> # Read a specific light curve
>>> light_curve = dataset.light_curves[10]
>>> # Select a subset of the dataset and load all of its light curves into memory.
>>> subset = dataset[1000:2000].load()

A common use case for this functionality is to process all of the light curves in the dataset in chunks. lcdata provides a helper to do this:

>>> for chunk in dataset.iterate_chunks(chunk_size=1000):
...     # At each iteration, chunk is an lcdata Dataset with the next 1000
...     # light curves.