table Class Reference

#include <table.h>


Detailed Description

Data table.

A class to contain and manipulate several equally-sized columns of data. The purpose of this class is to provide a structure which allows one to refer to the columns using a name represented by a string. Thus for a table object named t with 3 columns (named "colx", "coly" and "colz") and three rows, one could do the following:

      // Set the 1st row of column "colx" to 1.0
      t.set("colx",0,1.0);
      // Set the 2nd row of column "colz" to 2.0
      t.set("colz",1,2.0);
      // Set the 3rd row of column "coly" to 4.0
      t.set("coly",2,4.0);
      // This will print out 2.0
      cout << t.get("colz",1) << endl;

Note that the rows are numbered starting with 0 instead of 1. To output all the rows of an entire column, one can use

      for(size_t i=0;i<t.get_nlines();i++) {
        cout << i << " " << t.get("colx",i) << endl;
      }

To output all the columns of an entire row (in the following example, the second row), labeled by their column names, one can use:

      for(size_t i=0;i<t.get_ncolumns();i++) {
        cout << t.get_column_name(i) << " ";
      }
      cout << endl;
      for(size_t i=0;i<t.get_ncolumns();i++) {
        cout << t.get(i,1) << " ";
      }
      cout << endl;

Methods are provided for interpolating columns, sorting columns, finding data points, and several other manipulations of the data.

Data representation

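Each individual column is just an ovector_view (or any descendant of an ovector_view). The columns can be referred to in one of two ways: by numerical index (for example, get(size_t icol, size_t row)) or by name (for example, get(std::string col, size_t row)).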

The columns are organized in both a <map> and a <vector> structure so that finding a column by its index ( string table::get_column_name(int index), and double table::get_column(int index) ) takes only constant time, and finding a column by its name ( int lookup_column() and double * table::get_column() ) is O(log(C)). Insertion of a column ( new_column() ) is O(log(C)), but deletion ( delete_column() ) is O(C). Adding a row of data can be either O(1) or O(C), but row insertion and deletion are slow, since all of the columns must be shifted accordingly.
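The combination of a map and a vector of map iterators can be sketched as follows. This is a minimal illustration under stated assumptions, not the O2scl implementation: the names mini_columns, tree, list and column_name are hypothetical, and each column is modelled as a plain std::vector<double> rather than an ovector_view.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the column storage; not the O2scl code.
struct mini_columns {
  typedef std::map<std::string, std::vector<double> > tree_t;
  tree_t tree;                         // name -> column data, O(log(C)) lookup
  std::vector<tree_t::iterator> list;  // creation order -> tree entry, O(1) lookup

  // Add a column; returns false if the name is already in use.
  bool new_column(const std::string &name, std::size_t nrows) {
    std::pair<tree_t::iterator, bool> res =
      tree.insert(std::make_pair(name, std::vector<double>(nrows, 0.0)));
    if (!res.second) return false;
    list.push_back(res.first);  // std::map iterators stay valid on insertion
    return true;
  }

  // Delete a column: finding it in the tree is O(log(C)),
  // but removing its entry from the list is O(C).
  bool delete_column(const std::string &name) {
    tree_t::iterator it = tree.find(name);
    if (it == tree.end()) return false;
    for (std::size_t i = 0; i < list.size(); i++) {
      if (list[i] == it) {
        list.erase(list.begin() + i);
        break;
      }
    }
    tree.erase(it);
    return true;
  }

  // Name of the column with index i, found in constant time.
  const std::string &column_name(std::size_t i) const {
    assert(i < list.size());
    return list[i]->first;
  }
};
```

In this sketch, deleting a column shifts the indices of all later columns down by one, which is one reason deletion costs O(C) even though the map operation itself is logarithmic.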

Ownership of any column may be changed at any time, but care must be taken to ensure that memory allocation errors do not occur. These errors should not occur when no columns are owned by the user.

Because of this structure, the class is not suitable for matrix manipulation. The classes omatrix and umatrix are better used for that purpose.

Column size

The columns grow automatically (similar to the STL <vector>) in response to an attempt to call set() for a row that does not presently exist or in a call to line_of_data() when the table is already full. However, this forces memory rearrangements that are O(R*C). Columns which are not owned by the table are not modified, so the table will not allow an increase in the number of lines beyond the size of the smallest user-owned column. If the user has a good estimate of the number of rows beforehand, it is best to either specify this in the constructor, or in an explicit call to inc_maxlines().
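The benefit of specifying the number of rows in advance can be illustrated with a single-column model. This is only a sketch: growing_column and count_reallocations are hypothetical names, and the doubling growth policy is an assumption of the example, since this page does not state how much space the table adds when it grows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-column model of the nlines/maxlines bookkeeping.
struct growing_column {
  std::vector<double> data;  // data.size() plays the role of maxlines
  std::size_t nlines;        // rows currently in use
  std::size_t nrealloc;      // number of times the storage was enlarged

  explicit growing_column(std::size_t maxlines = 0)
    : data(maxlines, 0.0), nlines(0), nrealloc(0) {}

  // Set a row, enlarging the storage if the row does not exist yet.
  void set(std::size_t row, double val) {
    if (row >= data.size()) {
      // Doubling is an assumption of this sketch, not documented behavior
      std::size_t newmax = data.size() == 0 ? 1 : 2 * data.size();
      if (newmax < row + 1) newmax = row + 1;
      data.resize(newmax, 0.0);
      nrealloc++;
    }
    if (row >= nlines) nlines = row + 1;
    data[row] = val;
  }
};

// Fill n rows one at a time and report how often the storage grew.
std::size_t count_reallocations(std::size_t initial_max, std::size_t n) {
  growing_column c(initial_max);
  for (std::size_t i = 0; i < n; i++) c.set(i, 1.0 * i);
  assert(c.nlines == n);
  return c.nrealloc;
}
```

With enough space allocated at construction, count_reallocations(1000, 1000) reports no growth at all, while starting from an empty column requires repeated enlargements. In a table, each enlargement touches every column, which is where the O(R*C) cost comes from.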

Lookup, differentiation, integration, and interpolation
Lookup, differentiation, integration, and interpolation are automatically implemented using splines from the class smart_interp_vecp. A caching mechanism is implemented so that successive interpolations, derivative evaluations or integrations over the same two columns are fast.

Sorting

The columns are automatically sorted by name for speed; the results can be accessed with table::get_sorted_name(i). Individual columns can be sorted, or the entire table can be sorted by one column.

Allowable column names

In general, column names may be of any form as long as they don't contain whitespace, e.g. 123".#$xy~ is a legitimate column name. It is nevertheless recommended that column names contain only letters, numbers, and underscores and not begin with a digit.
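The two rules above (no whitespace in general; letters, numbers, and underscores with no leading digit for the restricted form) can be checked with a few lines of code. The function names below are hypothetical and are not part of the table class; this is a sketch of the rules as stated here, not of the checks the library performs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if the name contains any whitespace character.
bool has_whitespace(const std::string &name) {
  for (std::size_t i = 0; i < name.size(); i++) {
    if (std::isspace(static_cast<unsigned char>(name[i]))) return true;
  }
  return false;
}

// True if the name uses only letters, digits, and underscores
// and does not begin with a digit.
bool is_restricted_column_name(const std::string &name) {
  if (name.empty()) return false;
  if (std::isdigit(static_cast<unsigned char>(name[0]))) return false;
  for (std::size_t i = 0; i < name.size(); i++) {
    unsigned char c = static_cast<unsigned char>(name[i]);
    if (!std::isalnum(c) && c != '_') return false;
  }
  return true;
}
```

A name such as 123".#$xy~ passes the whitespace test but fails the restricted test, whereas colx and col_1 pass both.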

Thread-safety

Generally, the member functions are thread-safe in the sense that one would expect. Simple get() and set() functions are thread-safe, while insertion and deletion operations are not. It makes little sense to try to make insertion and deletion thread-safe. The interpolation routines are not thread-safe.

I/O and command-line manipulation

When data from an object of type table is output to a file through the collection class, the table can be manipulated on the command-line through the acol utility.

There is an example for the usage of this class given in examples/ex_table.cpp.

Todo:
Better testing of automatic resizing with user- and class-owned columns
Idea for future:
Be more restrictive about allowable column names
Idea for future:
Add interp() and related functions which avoid caching and can thus be const (This has been started with interp_const() )
Idea for future:
The nlines vs. maxlines and automatic resizing of table-owned vs. user-owned vectors could be reconsidered, especially now that ovectors can automatically resize on their own. 10/16/07: This issue may be unimportant, as it might be better to just move to a template based approach with a user-specified vector type. The interpolation is now flexible enough to handle different types. Might check to ensure sorting works with other types.
Idea for future:
The present structure, std::map<std::string,col,string_comp> atree and std::vector<aiter> alist; could be replaced with std::vector<col> list and std::map<std::string,int> tree where the map just stores the index of the column in the list
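The alternative layout in the last item might look like the sketch below. It is an illustration of the idea only: col_rec and index_columns are hypothetical names, the contents of the real col structure are not shown on this page, and returning -1 for a missing column is a choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical column record; the real col structure is not shown on this page.
struct col_rec {
  std::string name;
  std::vector<double> data;
};

struct index_columns {
  std::vector<col_rec> list;        // columns in index order
  std::map<std::string, int> tree;  // name -> index into list

  // Append a column; returns its index, or -1 if the name is taken.
  int new_column(const std::string &name, std::size_t nrows) {
    if (tree.count(name) != 0) return -1;
    col_rec c;
    c.name = name;
    c.data.assign(nrows, 0.0);
    list.push_back(c);
    int ix = static_cast<int>(list.size()) - 1;
    tree[name] = ix;
    return ix;
  }

  // Look up an index by name; returns -1 if the column is not present.
  int lookup_column(const std::string &name) const {
    std::map<std::string, int>::const_iterator it = tree.find(name);
    return it == tree.end() ? -1 : it->second;
  }

  // Delete a column and renumber the entries that followed it, O(C).
  bool delete_column(const std::string &name) {
    int ix = lookup_column(name);
    if (ix < 0) return false;
    assert(static_cast<std::size_t>(ix) < list.size());
    list.erase(list.begin() + ix);
    tree.erase(name);
    for (std::map<std::string, int>::iterator it = tree.begin();
         it != tree.end(); ++it) {
      if (it->second > ix) it->second--;
    }
    return true;
  }
};
```

Storing plain indices avoids keeping a list of iterators synchronized with the tree, at the price of renumbering the map entries whenever a column is deleted.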

Definition at line 197 of file table.h.


Public Member Functions

 table (int cmaxlines=0)
 Create a new table with space for nlines<=cmaxlines.
int set_interp (base_interp< ovector_view > &bi1, base_interp< ovector_const_subvector > &bi2)
 Set the base interpolation objects.
int read_generic (std::istream &fin)
 Read a generic data file.
Basic get and set methods


int set (std::string col, size_t row, double val)
 Set row row of column named col to value val - O(log(C)).
int set (size_t icol, size_t row, double val)
 Set row row of column number icol to value val - O(1).
double get (std::string col, size_t row) const
 Get value from row row of column named col - O(log(C)).
double get (size_t icol, size_t row)
 Get value from row row of column number icol - O(1).
int get_ncolumns () const
 Return the number of columns.
size_t get_nlines () const
 Return the number of lines.
int set_nlines (size_t il)
 Set the number of lines.
int set_nlines_auto (size_t il)
 Set the number of lines.
int get_maxlines ()
 Return the maximum number of lines.
ovector_view * get_column (std::string col)
 Returns a pointer to the column named col - O(log(C)).
const ovector_view * get_column_const (std::string col) const
 Returns a pointer to the column named col - O(log(C)).
ovector_view * get_column (size_t icol)
 Returns a pointer to the column of index icol - O(1).
const ovector_view * get_column (size_t icol) const
 Returns a pointer to the column of index icol - O(1).
const ovector_view & operator[] (size_t icol) const
 Returns the column of index icol - O(1) (const version).
ovector_view & operator[] (size_t icol)
 Returns the column of index icol - O(1).
const ovector_view & operator[] (std::string scol) const
 Returns the column named scol - O(log(C)) (const version).
ovector_view & operator[] (std::string scol)
 Returns the column named scol - O(log(C)).
int get_row (std::string col, double val, ovector &row) const
 Returns a copy of the row with value val in column col - O(R*C).
int get_row (size_t irow, ovector &row) const
 Returns a copy of row number irow - O(C).
Column manipulation


std::string get_column_name (size_t col) const
 Returns the name of column col - O(1).
std::string get_sorted_name (size_t col)
 Returns the name of column col in sorted order - O(1).
int new_column (std::string name)
 Add a new column owned by the table - O(log(C)).
int new_column (std::string name, ovector_view *ldat)
 Add a new column owned by the user - O(log(C)).
int lookup_column (std::string name, int &ix)
 Find the index for column named name - O(log(C)).
int rename_column (std::string olds, std::string news)
 Rename column named olds to news - O(C).
int copy_column (std::string src, std::string dest)
 Make a new column named dest equal to src - O(log(C)*R).
double * create_array (std::string col) const
 Create (using new) a generic array from column col.
int init_column (std::string scol, double val)
 Initialize all values of column named scol to val - O(log(C)*R).
int ch_owner (std::string name, bool ow)
 Modify ownership - O(log(C)).
bool get_owner (std::string name) const
 Get ownership - O(log(C)).
const gsl_vector * get_gsl_vector (std::string name) const
 Get a gsl_vector from column name - O(log(C)).
int check_synchro () const
 Return 0 if the tree and list are properly synchronized.
int add_col_from_table (std::string loc_index, table &source, std::string src_index, std::string src_col, std::string dest_col="")
 Insert a column from a separate table, interpolating it into a new column.
Row manipulation and data input


int new_row (size_t n)
 Insert a row before row n.
int copy_row (size_t src, size_t dest)
 Copy the data in row src to row dest.
int insert_data (size_t n, size_t nv, double *v)
 Insert a row of data before row n.
int insert_data (size_t n, size_t nv, double **v)
 Insert a row of data before row n.
int line_of_names (std::string newheads)
 Read a new set of names from newheads.
template<class vec_t>
int line_of_data (size_t nv, const vec_t &v)
 Read a line of data from an array.
Lookup and search methods


size_t ordered_lookup (std::string col, double val)
 Look for a value in an ordered column.
size_t lookup (std::string col, double val) const
 Exhaustively search column col for the value val - O(log(C)*R).
double lookup_val (std::string col, double val, std::string col2) const
 Search column col for the value val and return value in col2.
size_t lookup (int col, double val) const
 Exhaustively search column col for the value val - O(log(C)*R).
size_t mlookup (std::string col, double val, std::vector< double > &results, double threshold=0.0) const
 Exhaustively search column col for many occurrences of val - O(log(C)*R).
int lookup_form (std::string formula, double &maxval)
 Search for row with maximum value of formula.
Interpolation, differentiation, integration, max, and min


double interp (std::string sx, double x0, std::string sy)
 Interpolate x0 from sx into sy.
double interp_const (std::string sx, double x0, std::string sy) const
 Interpolate x0 from sx into sy.
double interp (size_t ix, double x0, size_t iy)
 Interpolate x0 from ix into iy.
int deriv (std::string x, std::string y, std::string yp)
 Make a new column yp which is the derivative y'(x) - O(log(C)*R).
double deriv (std::string sx, double x0, std::string sy)
 The first derivative of the function sy(sx) at sx=x0.
double deriv (size_t ix, double x0, size_t iy)
 The first derivative of the function iy(ix) at ix=x0.
int deriv2 (std::string x, std::string y, std::string yp)
 Make a new column yp which is the second derivative y''(x) - O(log(C)*R).
double deriv2 (std::string sx, double x0, std::string sy)
 The second derivative of the function sy(sx) at sx=x0.
double deriv2 (size_t ix, double x0, size_t iy)
 The second derivative of the function iy(ix) at ix=x0.
double integ (std::string sx, double x1, double x2, std::string sy)
 The integral of the function sy(sx) from sx=x1 to sx=x2.
double integ (size_t ix, double x1, double x2, size_t iy)
 The integral of the function iy(ix) from ix=x1 to ix=x2.
int integ (std::string x, std::string y, std::string ynew)
 Make a new column ynew which is the integral of the function y(x).
double max (std::string col) const
 Return column maximum. Makes no assumptions about ordering - O(R).
double min (std::string col) const
 Return column minimum. Makes no assumptions about ordering - O(R).
Subtable method


table * subtable (std::string list, size_t top, size_t bottom, bool linked=true)
 Make a subtable.
Add space


int inc_maxlines (size_t llines)
 Manually increase the maximum number of lines.
Delete methods


int delete_column (std::string scol)
 Delete column named scol - O(C).
int delete_row (std::string scol, double val)
 Delete the row with the value val in column scol.
int delete_row (size_t irow)
 Delete the row of index irow.
Clear methods


void zero_table ()
 Zero the data entries but keep the column names and nlines fixed.
void clear_table ()
 Clear the table and the column names.
void clear_data ()
 Remove all of the data by setting the number of lines to zero.
Sorting methods


int sort_table (std::string scol)
 Sort the entire table by the column scol.
int sort_column (std::string scol)
 Individually sort the column scol.
Summary method


int summary (std::ostream *out, int ncol=79) const
 Output a summary of the information stored.
Constant manipulation
virtual int add_constant (std::string name, double val)
 Add a constant.
virtual int set_constant (std::string name, double val)
 Set the value of a constant.
virtual double get_constant (std::string name)
 Get a constant.
virtual int remove_constant (std::string name)
 Remove a constant.

Protected Types

typedef struct table::col_s col
typedef struct table::sortd_s sortd
Iterator types
typedef std::map< std::string, col, string_comp >::iterator aiter
typedef std::map< std::string, col, string_comp >::const_iterator aciter
typedef std::vector< aiter >::iterator aviter

Protected Member Functions

int reset_list ()
 Set the elements of alist with the appropriate iterators from atree - O(C).
int make_fp_varname (std::string &s)
 Ensure a variable name does not match a function or contain non-alphanumeric characters.
int make_unique_name (std::string &col, std::vector< std::string > &cnames)
 Make sure a name is unique.
Column manipulation methods
aiter get_iterator (std::string lname)
 Return the iterator for a column.
col * get_col_struct (std::string lname)
 Return the column structure for a column.
aiter begin ()
 Return the beginning of the column tree.
aiter end ()
 Return the end of the column tree.

Static Protected Member Functions

static int sortd_comp (const void *a, const void *b)
 The sorting function.

Protected Attributes

std::map< std::string, double > constants
 The list of constants.
Actual data
size_t maxlines
 The size of allocated memory.
size_t nlines
 The size of presently used memory.
std::map< std::string, col, string_comp > atree
 The tree of columns.
std::vector< aiter > alist
 The list of tree iterators.
Interpolation
sm_interp_vec * si
 The interpolation object.
base_interp< ovector_view > * intp1
 A pointer to the interpolation object.
base_interp< ovector_const_subvector > * intp2
 A pointer to the subvector interpolation object.
cspline_interp< ovector_view > cintp1
 The default interpolation object.
cspline_interp< ovector_const_subvector > cintp2
 The default subvector interpolation object.
search_vec< ovector > se
 The vector-searching object.
bool intp_set
 True if the interpolation type has been set.
std::string intp_colx
 The last x-column interpolated.
std::string intp_coly
 The last y-column interpolated.

Data Structures

struct  col_s
 Column structure for table [protected]. More...
struct  sortd_s
 A structure for sorting in table [protected]. More...

Member Function Documentation

int set ( std::string  col,
size_t  row,
double  val 
)

Set row row of column named col to value val - O(log(C)).

This function adds the column col if it does not already exist and adds rows using inc_maxlines() and set_nlines() to create at least (row+1) rows if they do not already exist.

int set_nlines ( size_t  il  ) 

Set the number of lines.

This function is stingy about increasing the table memory space and will only increase it enough to fit il lines, which is useful if you have columns not owned by the table.

int set_nlines_auto ( size_t  il  ) 

Set the number of lines.

Todo:
Resolve whether set() should really use this approach. Also, resolve whether this should replace set_nlines() (It could be that the answer is no, because as the documentation in the other version states, the other version is useful if you have columns not owned by the table.)

ovector_view* get_column ( size_t  icol  )  [inline]

Returns a pointer to the column of index icol - O(1).

Note that several of the methods require reallocation of memory and pointers previously returned by this function will be incorrect.

Definition at line 278 of file table.h.

const ovector_view* get_column ( size_t  icol  )  const [inline]

Returns a pointer to the column of index icol - O(1).

Note that several of the methods require reallocation of memory and pointers previously returned by this function will be incorrect.

Definition at line 289 of file table.h.

const ovector_view& operator[] ( size_t  icol  )  const [inline]

Returns the column of index icol - O(1) (const version).

This does not do any sort of bounds checking and is quite fast.

Note that several of the methods require reallocation of memory and references previously returned by this function will be incorrect.

Definition at line 303 of file table.h.

ovector_view& operator[] ( size_t  icol  )  [inline]

Returns the column of index icol - O(1).

This does not do any sort of bounds checking and is quite fast.

Note that several of the methods require reallocation of memory and references previously returned by this function will be incorrect.

Definition at line 317 of file table.h.

const ovector_view& operator[] ( std::string  scol  )  const [inline]

Returns the column named scol - O(log(C)) (const version).

No error checking is performed.

Note that several of the methods require reallocation of memory and references previously returned by this function will be incorrect.

Definition at line 330 of file table.h.

ovector_view& operator[] ( std::string  scol  )  [inline]

Returns the column named scol - O(log(C)).

No error checking is performed.

Note that several of the methods require reallocation of memory and references previously returned by this function will be incorrect.

Definition at line 344 of file table.h.

int new_column ( std::string  name,
ovector_view *  ldat 
)

Add a new column owned by the user - O(log(C)).

This function does not modify the number of lines of data in the table.

Todo:
We've got to figure out what to do if ldat is too small. If it's smaller than nlines, obviously we should just fail, but what if its size is between nlines and maxlines?

int lookup_column ( std::string  name,
int &  ix 
)

Find the index for column named name - O(log(C)).

If the column is not present, this does not call the error handler, but quietly sets ix to zero and returns gsl_notfound.

int rename_column ( std::string  olds,
std::string  news 
)

Rename column named olds to news - O(C).

This is slow since we have to delete the column and re-insert it. This process in turn mangles all of the iterators in the list.

int init_column ( std::string  scol,
double  val 
)

Initialize all values of column named scol to val - O(log(C)*R).

Note that this does not initialize elements beyond nlines so that if the number of rows is increased afterwards, the new rows will have uninitialized values.

int ch_owner ( std::string  name,
bool  ow 
)

Modify ownership - O(log(C)).

Warning:
columns allocated using malloc() should never be owned by the table object since it uses delete instead of free().

int add_col_from_table ( std::string  loc_index,
table &  source,
std::string  src_index,
std::string  src_col,
std::string  dest_col = "" 
)

Insert a column from a separate table, interpolating it into a new column.

Given a pair of columns ( src_index, src_col ) in a separate table (source), this creates a new column in the present table named src_col which interpolates loc_index into src_index. The interpolation objects from the source table will be used. If there is already a column in the present table named src_col, then this will fail.

If there is an error in the interpolation for any particular row, then the value of src_col in that row will be set to zero.

size_t ordered_lookup ( std::string  col,
double  val 
)

Look for a value in an ordered column.

O(log(C)*log(R))
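The log(R) factor is what one expects from a bisection search over an ordered column. The sketch below shows such a search on a plain vector. It assumes ascending order and returns the index of the entry closest to val; both are assumptions of the example, since this page does not specify the ordering or the exact return convention of ordered_lookup(). The name ordered_lookup_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the entry closest to val in a column sorted in ascending order.
std::size_t ordered_lookup_sketch(const std::vector<double> &col, double val) {
  assert(!col.empty());
  // First entry which is not less than val, found by bisection in O(log(R))
  std::vector<double>::const_iterator it =
    std::lower_bound(col.begin(), col.end(), val);
  if (it == col.begin()) return 0;
  if (it == col.end()) return col.size() - 1;
  std::size_t hi = static_cast<std::size_t>(it - col.begin());
  std::size_t lo = hi - 1;
  // Choose whichever neighbor is nearer; ties go to the lower index
  return (val - col[lo] <= col[hi] - val) ? lo : hi;
}
```

The additional log(C) factor in the stated complexity comes from finding the column by name before the search begins.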

int lookup_form ( std::string  formula,
double &  maxval 
)

Search for row with maximum value of formula.

This searches the table for the row which maximizes the specified formula. For example, to find the row for which the column mu is closest to 2 and T is closest to 3 (for a table object named t), you can use

        double maxval;
        t.lookup_form("-abs(mu-2)-abs(T-3)",maxval);
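The formula in this example is largest (zero) when mu equals 2 and T equals 3, so maximizing it selects the closest row. The sketch below reproduces that search for two plain vectors, without the formula parser; closest_row and its parameters are hypothetical names used only for illustration and are not part of the table class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Return the row which maximizes -|mu-mu0|-|T-T0|, and store the
// maximum of that expression in maxval.
std::size_t closest_row(const std::vector<double> &mu,
                        const std::vector<double> &T,
                        double mu0, double T0, double &maxval) {
  assert(!mu.empty() && mu.size() == T.size());
  std::size_t best = 0;
  maxval = -std::fabs(mu[0] - mu0) - std::fabs(T[0] - T0);
  for (std::size_t i = 1; i < mu.size(); i++) {
    double score = -std::fabs(mu[i] - mu0) - std::fabs(T[i] - T0);
    if (score > maxval) {
      maxval = score;
      best = i;
    }
  }
  return best;
}
```

A maximum of exactly zero indicates that some row matches both values; a negative maximum measures how far the best row is from the requested point.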

double interp ( std::string  sx,
double  x0,
std::string  sy 
)

Interpolate x0 from sx into sy.

O(log(C)*log(R)) but can be as bad as O(log(C)*R) if the relevant columns are not well ordered.
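To make the phrase "interpolate x0 from sx into sy" concrete, the sketch below evaluates y at x0 given two columns x and y. It deliberately uses linear interpolation rather than the splines used by the class, assumes x is sorted in ascending order with distinct values, and extrapolates from the nearest interval when x0 lies outside the data. The name interp_linear is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear interpolation of y(x) at x0 for an ascending column x.
// This stands in for the spline interpolation used by the class.
double interp_linear(const std::vector<double> &x,
                     const std::vector<double> &y, double x0) {
  assert(x.size() == y.size() && x.size() >= 2);
  // Index of the first entry greater than x0, found in O(log(R))
  std::size_t i = static_cast<std::size_t>(
    std::upper_bound(x.begin(), x.end(), x0) - x.begin());
  // Clamp so that [i-1,i] is always a valid interval
  if (i == 0) i = 1;
  if (i == x.size()) i = x.size() - 1;
  double t = (x0 - x[i - 1]) / (x[i] - x[i - 1]);
  return y[i - 1] + t * (y[i] - y[i - 1]);
}
```

The bisection step relies on the x column being ordered, which is consistent with the remark that the cost degrades when the relevant columns are not well ordered.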

double interp_const ( std::string  sx,
double  x0,
std::string  sy 
) const

Interpolate x0 from sx into sy.

O(log(C)*log(R)) but can be as bad as O(log(C)*R) if the relevant columns are not well ordered.

double interp ( size_t  ix,
double  x0,
size_t  iy 
)

Interpolate x0 from ix into iy.

O(log(R)) but can be as bad as O(R) if the relevant columns are not well ordered.

double deriv ( std::string  sx,
double  x0,
std::string  sy 
)

The first derivative of the function sy(sx) at sx=x0.

O(log(C)*log(R)) but can be as bad as O(log(C)*R) if the relevant columns are not well ordered.

double deriv ( size_t  ix,
double  x0,
size_t  iy 
)

The first derivative of the function iy(ix) at ix=x0.

O(log(R)) but can be as bad as O(R) if the relevant columns are not well ordered.

double deriv2 ( std::string  sx,
double  x0,
std::string  sy 
)

The second derivative of the function sy(sx) at sx=x0.

O(log(C)*log(R)) but can be as bad as O(log(C)*R) if the relevant columns are not well ordered.

double deriv2 ( size_t  ix,
double  x0,
size_t  iy 
)

The second derivative of the function iy(ix) at ix=x0.

O(log(R)) but can be as bad as O(R) if the relevant columns are not well ordered.

double integ ( std::string  sx,
double  x1,
double  x2,
std::string  sy 
)

The integral of the function sy(sx) from sx=x1 to sx=x2.

O(log(C)*log(R)) but can be as bad as O(log(C)*R) if the relevant columns are not well ordered.

double integ ( size_t  ix,
double  x1,
double  x2,
size_t  iy 
)

The integral of the function iy(ix) from ix=x1 to ix=x2.

O(log(R)) but can be as bad as O(R) if the relevant columns are not well ordered.

int integ ( std::string  x,
std::string  y,
std::string  ynew 
)

Make a new column ynew which is the integral of the function y(x).

O(log(R)) but can be as bad as O(R) if the relevant columns are not well ordered.

table* subtable ( std::string  list,
size_t  top,
size_t  bottom,
bool  linked = true 
)

Make a subtable.

Uses the columns specified in list from the row top to the row of index bottom. If linked is false, then the data will be independent of the original table.

int delete_column ( std::string  scol  ) 

Delete column named scol - O(C).

This is slow because the iterators in alist are mangled and we have to call reset_list to get them back.

void clear_data (  )  [inline]

Remove all of the data by setting the number of lines to zero.

This leaves the column names intact and does not remove the constants.

Definition at line 724 of file table.h.

int summary ( std::ostream *  out,
int  ncol = 79 
) const

Output a summary of the information stored.

Outputs the number of constants, the number of columns, a list of the column names, and the number of lines of data.

int reset_list (  )  [protected]

Set the elements of alist with the appropriate iterators from atree - O(C).

Generally, the end-user shouldn't need this method. It is only used in delete_column() to rearrange the list when a column is deleted from the tree.


The documentation for this class was generated from the file table.h.

Documentation generated with Doxygen and provided under the GNU Free Documentation License. See License Information for details.
