PXLab Manual



Data Processing

Starting with version 2.1, PXLab includes some basic data processing tools. Most data processing will still be done outside of PXLab with standard statistics software. A major task of PXLab's data processing tools is therefore to prepare data files for such software packages.

Data Processing Objects

In addition to the display list objects which present stimuli, the Context() section of a design file may also contain data processing list objects, such as SessionData() and ProcedureData(). Each of these is executed immediately after the corresponding presentation list has been run. For example, the SessionData() list is executed after its Session() and SessionEnd() lists have been run, including all blocks and trials contained in the Session(). The SessionData() list gets all data from its preceding Session() to work on. The ProcedureData() list is executed after the Procedure() and ProcedureEnd() lists, including all Session() nodes, have been run.

The ExperimentData() list is special, since it is never executed in a data collection session; it can only be run separately by the application ExStat. It can use multiple data files contained in a directory and combine them into a single data table.

A data processing list like SessionData() does not do any processing itself. Computations are done by the data processing objects contained in the list. Here is an example of a SessionData() list:

    SessionData(SubjectCode, TrialFactor, Direction, Trial.Arrow.ResponseTime) {
      Statistics(Trial.Arrow.ResponseTime) {
        Include = 1;
        Exclude = (Trial.Arrow.ResponseTime < 100) || (Trial.Arrow.ResponseTime > 800);
        Stats = de.pxlab.pxl.StatCodes.N
              | de.pxlab.pxl.StatCodes.MEAN
              | de.pxlab.pxl.StatCodes.STDDEV
              | de.pxlab.pxl.StatCodes.STDERR;
      }
    }
The list contains a single data analysis object, Statistics, which computes descriptive statistics of a single dependent variable. In this example the dependent variable is Trial.Arrow.ResponseTime; it is specified as the object's argument, in the same way as arguments are given to display objects.

Note that the SessionData() declaration also contains a list of arguments. These arguments are the parameters whose values make up the columns of the grand data table, which serves as input for the data processing objects in the list.

A data processing list can contain an arbitrary number of data processing objects. Multiple instances of the same class must be distinguished by an instance suffix, such as Statistics:A and Statistics:B, in the same way as multiple display objects in a display presentation list.

A new input data table is created for each data processing object. Thus the Include and Exclude parameters can be applied independently for every data processing object. However, the columns of the input data table are the same for every data processing object, since they are defined by the data processing list arguments.

Data processing object declarations have the same syntax as display object declarations. They may also have parameters, whose values are set by assignments.

The data processing mechanism can be summarized as follows. It is important to remember that the columns of each data table are defined by the data processing list arguments or by the factors declaration of an experiment, while the rows are selected by the Include and Exclude conditions of the data processing object.
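The following Python sketch illustrates this mechanism under simple assumptions: trials are dictionaries of parameter values, and Include and Exclude are plain predicates. The function build_object_table and the sample trials are hypothetical and not part of PXLab. Since the Statistics example in this chapter uses StoreData in its Include condition although StoreData is not a list argument, the sketch evaluates the conditions on all of a trial's parameter values, while the resulting table keeps only the selected columns. An object argument list, as in Anova(SubjectCode, Luminance, ResponseTime), is modeled as a subset of the list arguments.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def build_object_table(
    trials: Sequence[Row],
    list_args: Sequence[str],
    object_args: Optional[Sequence[str]] = None,
    include: Callable[[Row], object] = lambda t: True,
    exclude: Callable[[Row], object] = lambda t: False,
) -> List[Row]:
    """Hypothetical sketch: build a fresh input table for one data processing object.

    Columns: the object's own arguments if given, else the list arguments.
    Rows: trials for which include() is true and exclude() is false.
    """
    columns = list(object_args) if object_args else list(list_args)
    unknown = [c for c in columns if c not in list_args]
    if unknown:
        raise ValueError(f"not defined by the data processing list: {unknown}")
    table = []
    for trial in trials:
        # Include is evaluated first, then Exclude, on the full trial.
        if include(trial) and not exclude(trial):
            table.append({name: trial[name] for name in columns})
    return table


# Hypothetical trials; StoreData is a trial parameter but not a list argument.
trials = [
    {"SubjectCode": "s1", "Luminance": 8, "ResponseTime": 350, "StoreData": 1},
    {"SubjectCode": "s1", "Luminance": 90, "ResponseTime": 90, "StoreData": 1},
    {"SubjectCode": "s1", "Luminance": 8, "ResponseTime": 420, "StoreData": 0},
    {"SubjectCode": "s1", "Luminance": 90, "ResponseTime": 300, "StoreData": 1},
]
list_args = ["SubjectCode", "Luminance", "ResponseTime"]


def invalid(t: Row) -> bool:
    return t["ResponseTime"] < 100 or t["ResponseTime"] > 500


# Two objects with independent Include/Exclude conditions, same columns.
table_a = build_object_table(trials, list_args,
                             include=lambda t: t["StoreData"] and t["Luminance"] == 8,
                             exclude=invalid)
table_b = build_object_table(trials, list_args,
                             include=lambda t: t["StoreData"] and t["Luminance"] == 90,
                             exclude=invalid)
```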

Data Processing Applications

Data processing can be run by ExRun during data collection, or after data collection by using option '-Z' of ExRun. There is also a special application, ExStat, which does not run experiments but only runs the data processing objects of a design file. ExStat reads multiple data files and runs the data processing objects contained in these files; it can also process the data according to a separate data processing design file given to ExStat as an argument. The data files read by ExStat may be data tree files generated by ExRun, but may also be raw data files of arbitrary origin. The option '-r' tells ExStat to expect raw data files.

Where Processed Data are Stored

There are several ways to control where processed data are stored. The name of the processed data file may be given explicitly on the command line of the applications ExRun and ExStat using option '-t'. If this option is not present, the processed data file name is determined automatically. The way the file name is derived differs between ExRun and ExStat.

The Processed Data File Name in ExRun

Since every run of ExRun uses a subject identification code, the processed data file name is also derived from parameter SubjectCode. It has the same root as the standard data file, and its extension is defined by parameter ProcessedDataFileExtension, which by default is 'html'. The destination of the standard data file is defined by parameter DataFileDestination and its subdirectory by parameter TrialDataDirectory. The subdirectory for processed data files is defined by parameter ProcessedDataDirectory. Thus, if all parameters are set, the processed data file is stored in subdirectory ProcessedDataDirectory of directory DataFileDestination, and its name is DataFileName with extension ProcessedDataFileExtension.
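The naming rule above can be sketched in Python as follows. The function processed_data_path is hypothetical; its arguments stand for the parameters DataFileDestination, ProcessedDataDirectory, DataFileName, and ProcessedDataFileExtension. The sketch assumes that an extension already present in DataFileName is replaced, and it accepts the extension with or without a leading dot.

```python
import os.path


def processed_data_path(data_file_destination: str,
                        processed_data_directory: str,
                        data_file_name: str,
                        extension: str = "html") -> str:
    """Hypothetical sketch of how ExRun could build the processed data file path."""
    # Same root as the standard data file; any existing extension is dropped.
    root, _ = os.path.splitext(data_file_name)
    # Accept both 'html' and '.html' as the extension parameter.
    ext = extension.lstrip(".")
    return os.path.join(data_file_destination, processed_data_directory,
                        root + "." + ext)
```

For a hypothetical subject file "s01.dat" with DataFileDestination "data" and ProcessedDataDirectory "processed", the sketch yields data/processed/s01.html on a POSIX system.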

If ExRun is used only to process an already existing data tree file, using option '-Z', then no data are collected and only the data processing nodes of the design tree are executed. Destination files are determined exactly as described in the previous paragraph.

The Processed Data File Name in ExStat

The behavior of ExStat depends on whether an explicit data processing design file containing an ExperimentData() node is given on the command line (option '-f'). If no explicit design file is given, or if the design file does not contain an ExperimentData() node, then ExStat behaves exactly like ExRun with option '-Z' for every data tree file it processes.

ExStat determines processed data file names differently if an ExperimentData() node has to be processed. If a processed data file name has been given on the command line, it is used. Otherwise the file name is derived from the name of the design file which contains the ExperimentData() node, with the extension defined by ProcessedDataFileExtension. No destination directory mechanism is used in this case.

An exception to this rule is a data processing object which defines its own output file name. This can be done for every data processing object with parameter FileName.
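The precedence of these rules for an ExperimentData() node can be summarized in a small Python sketch. The function exstat_output_file and the file names used with it are hypothetical; the order (object FileName first, then the command line option '-t', then the design file name) is read from the description above.

```python
import os.path
from typing import Optional


def exstat_output_file(object_file_name: Optional[str],
                       command_line_file: Optional[str],
                       design_file: str,
                       extension: str = "html") -> str:
    """Hypothetical sketch of ExStat's output file choice for ExperimentData()."""
    # 1. A FileName parameter of the data processing object always wins.
    if object_file_name:
        return object_file_name
    # 2. Otherwise use a file name given on the command line (option '-t').
    if command_line_file:
        return command_line_file
    # 3. Otherwise derive it from the design file containing ExperimentData();
    #    no destination directory mechanism is applied here.
    root, _ = os.path.splitext(design_file)
    return root + "." + extension.lstrip(".")
```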

General Properties of Data Processing Objects

Global Parameters

The following global parameters refer to data processing objects:

DataProcessingEnabled
This flag controls whether data processing objects are run during data collection. If it is true, the data processing objects defined in the design file are run; otherwise no data processing is done. By default this parameter is 0.
DataProcessingDirectory
a subdirectory of DataFileDestination for data processing output files.
DataProcessingFileExtension
the extension for data processing output files. By default this is '.html'.

Result Files

The data generated by data processing objects are sent to their own data destination. This destination depends on whether data processing is run during a data collection session by ExRun or as a separate data processing session by ExStat. In the first case the data processing results destination directory is determined by the value of parameter DataProcessingDirectory. The file name root is identical to the raw data file name root and the file name extension is defined by parameter DataProcessingFileExtension which by default is '.html', since most data processing objects by default generate HTML output.

If data processing objects are run by ExStat after data collection has been finished then the default destination file should be given using command line option '-t'.

All data processing objects have a parameter named FileName. This may be used to define individual data file names for every data processing object.

Common Parameters of Data Processing Objects

Every data processing object has the following parameters:

Include
This flag controls the inclusion of trials in the data table. A trial is included in the data table only if the Include parameter evaluates to true for this trial's parameter values.
Exclude
After Include has been evaluated this flag controls the exclusion of trials. Trials are excluded if the Exclude parameter evaluates to true for this trial's parameter values.
ResultFormat
This string defines the output format. If it is undefined, a default HTML output format is used. If ResultFormat is defined, it is used as the output string.
HTMLFormat
Some data processing objects can switch between unformatted text output and HTML formatted text output. If this parameter is 1 then HTML format is used.
FileName
If this parameter is defined then the output data are written to the respective file. If the parameter is not defined then the default file name mechanism described earlier is used.

Available Data Processing Objects

Export
export a data table. Export can write either raw data or factorial data, in which multiple observations at the same factor level combination are replaced by their mean. In factorial tables, missing values are also replaced by the mean of the same non-random factor level combination across the levels of the random factor. The following example exports a factorial frequency table: for each factor level combination of the factors SubjectCode, Trial.SyntheticSound:A.WavePars, Trial.SyntheticSound:B.WavePars, and Trial.Message:C.ResponseCode it writes the number of cases found.
Experiment() {
  Context() {
    ExperimentData(
        SubjectCode,
        Trial.SyntheticSound:A.WavePars,
        Trial.SyntheticSound:B.WavePars,
        Trial.Message:C.ResponseCode) {
      Export() {
        Include = StoreData;
        Exclude = 0;
        DataType = de.pxlab.pxl.ExportCodes.FACTORIAL_FREQUENCY_DATA;
        FileName = "pmfk_table.dat";
      }
    }
  }
}
The Export object can export these types of data files:
RAW_DATA
a data file with a single row for every trial, similar to the 'dat'-file.
FACTORIAL_DATA
a data file with a single row for every factor level combination of all factors, including random, independent, and dependent factor. If the original data table contains repetitions for a factor level combination then the factorial data file contains the arithmetic mean of these as its dependent factor value. If the original data table contains dependent values which are not convertible to a number then the respective value is replaced by the arithmetic mean of all other levels of the random factor for this factor level combination of independent factors.
FACTORIAL_FREQUENCY_DATA
this data format is derived from the factorial data format. It treats the dependent factor as an independent factor and has an additional column which contains the number of cases of each factor level combination in the factorial data table. Thus, if the dependent factor level is the value of a response category, the factorial frequency table shows how often each category has been observed at each factor level combination.
REPEATED_MEASURES_DATA
this table may be used as input to commercial statistics packages which treat repeated measures as multiple dependent variables, like SYSTAT or SPSS do. This table contains a column for every random and independent factor which is a between groups factor. If the design contains within or repeated measurement factors then the values of the dependent factor for all combinations of within factors are added as additional columns to the table. See Export for a more detailed description of the output.
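To make the three factorial export formats more concrete, here is a Python sketch of how such tables could be computed from a raw table of tuples. It is an interpretation of the descriptions above, not PXLab's implementation: it assumes the first column is the random factor, the last column is the dependent factor, the columns in between are independent factors, and (for the repeated measures table) that all independent factors are within factors. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple


def _number(value: object) -> Optional[float]:
    """Return value as a float, or None if it is not convertible to a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def factorial_data(rows: Sequence[Tuple]) -> Dict[Tuple, float]:
    """FACTORIAL_DATA sketch: one mean dependent value per factor level combination."""
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[:-1])].append(_number(row[-1]))
    observed = {}
    for key, values in cells.items():
        numeric = [v for v in values if v is not None]
        if numeric:
            observed[key] = mean(numeric)  # arithmetic mean over repetitions
    result = dict(observed)
    for key in cells:
        if key not in observed:
            # Replace a non-numeric cell by the mean of the other random-factor
            # levels at the same combination of independent factors.
            others = [v for k, v in observed.items() if k[1:] == key[1:]]
            if others:
                result[key] = mean(others)
    return result


def factorial_frequency_data(rows: Sequence[Tuple]) -> Counter:
    """FACTORIAL_FREQUENCY_DATA sketch: the dependent value is treated as one
    more factor, and each full factor level combination gets a case count."""
    return Counter(tuple(row) for row in rows)


def repeated_measures_data(cells: Dict[Tuple, float]) -> Dict[object, Dict[Tuple, float]]:
    """REPEATED_MEASURES_DATA sketch: one row per random-factor level and one
    column per combination of within factors."""
    wide: Dict[object, Dict[Tuple, float]] = defaultdict(dict)
    for key, value in cells.items():
        wide[key[0]][key[1:]] = value
    return dict(wide)
```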

Export may also be used to transform data using the built-in functions of PXLab. Here is an example of a design file which transforms a raw data file of CIE Yxy chromaticity coordinates into RGB coordinates.

Experiment() {
  Context() {
    AssignmentGroup() {
      new a = 0;
      new b = 0;
      new c = 0;
      new c1 = arrayOf3(a, b, c);
      new c2 = toDevRGB(c1);
    }
    ExperimentData(a, b, c, c1, c2) {
      Export(a, b, c, c2) {
        FileName = "color_transformed.tmp";
        DataType = de.pxlab.pxl.ExportCodes.RAW_DATA;
      }
    }
    Trial(a, b, c) {
    }
  }
}

Note that the Trial() declaration is used here to define names for the input data columns, since we expect a raw data table which does not contain parameter name declarations. This also implies that all parameters must be declared with new in the design file. Here is an example of an input file for this design:

   10.089509  0.454104  0.458863
   11.6268    0.5049    0.431933
   ...
   7.107370   0.393992  0.273389

And here is the corresponding output:

   10.089509  0.454104  0.458863  [112, 86, 21]
   11.6268    0.5049    0.431933  [138, 84, 10]
   ...
   7.107370   0.393992  0.273389  [117, 57, 82]
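The first step of such a transformation, converting Yxy into CIE XYZ tristimulus values, follows the standard relations X = xY/y and Z = (1 - x - y)Y/y. The Python sketch below shows only this step, assuming that the three input columns a, b, and c hold Y, x, and y in that order. The further conversion to device RGB performed by toDevRGB depends on the display calibration and is not reproduced, so the sketch does not yield the RGB values shown above.

```python
from typing import Tuple


def yxy_to_xyz(Y: float, x: float, y: float) -> Tuple[float, float, float]:
    """Convert CIE Yxy (luminance Y, chromaticity x, y) to CIE XYZ."""
    if y == 0:
        # Convention chosen for this sketch: treat y == 0 as black.
        return (0.0, 0.0, 0.0)
    return (x * Y / y, Y, (1.0 - x - y) * Y / y)
```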

If no Trial() declaration is contained in the design file and a raw data file is being processed then the columns are defined by the arguments of the data processing list node. In this case the number of columns and the number of arguments in the data processing list node have to be identical.
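The column mapping rule can be sketched as follows: each line of a raw data file is split into fields, and the fields are matched in order to the declared names, with an error if the counts differ. The function read_raw_table is hypothetical, and splitting on whitespace is an assumption about the raw file format.

```python
from typing import Dict, List, Sequence


def read_raw_table(text: str, names: Sequence[str]) -> List[Dict[str, str]]:
    """Map the columns of a raw data file to parameter names, in order."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(names):
            raise ValueError(
                f"line {line_no}: {len(fields)} columns, but {len(names)} names declared")
        rows.append(dict(zip(names, fields)))
    return rows
```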

Statistics
compute descriptive statistics for a single variable. This example computes descriptive statistics for the two levels of 'Luminance' separately. Invalid responses are excluded.
Experiment() {
  Context() {
    AssignmentGroup() {
      new invalid = (ResponseTime < 100) || (ResponseTime > 500);
    }
    ExperimentData(SubjectCode, TrialCounter, Luminance, ResponseTime) {
      Statistics:A(ResponseTime) {
        Include = StoreData && (Luminance == 8);
        Exclude = invalid;
        Stats = de.pxlab.pxl.StatCodes.ALL;
      }
      Statistics:B(ResponseTime) {
        Include = StoreData && (Luminance == 90);
        Exclude = invalid;
        Stats = de.pxlab.pxl.StatCodes.ALL;
      }
    }
  }
}
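The computation performed by the two Statistics objects can be sketched in Python. The function describe and the sample trials are hypothetical; only the statistic names N, MEAN, STDDEV, and STDERR are taken from de.pxlab.pxl.StatCodes as used earlier in this chapter. The sketch assumes the sample standard deviation (divisor n - 1) and STDERR = STDDEV / sqrt(N); PXLab's exact definitions may differ.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Return N, MEAN, STDDEV and STDERR for one dependent variable."""
    n = len(values)
    if n == 0:
        return {"N": 0, "MEAN": math.nan, "STDDEV": math.nan, "STDERR": math.nan}
    sd = stdev(values) if n > 1 else math.nan  # sample SD (n - 1), an assumption
    return {"N": n, "MEAN": mean(values), "STDDEV": sd, "STDERR": sd / math.sqrt(n)}


def invalid(rt: float) -> bool:
    return rt < 100 or rt > 500


# Hypothetical trials: (Luminance, ResponseTime, StoreData)
trials = [(8, 300, 1), (8, 320, 1), (8, 340, 1), (8, 650, 1),
          (90, 280, 1), (90, 300, 0), (90, 310, 1)]

# Statistics:A and Statistics:B: same column, different Include conditions.
stats_a = describe([rt for lum, rt, store in trials
                    if store and lum == 8 and not invalid(rt)])
stats_b = describe([rt for lum, rt, store in trials
                    if store and lum == 90 and not invalid(rt)])
```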
Anova
compute factor level statistics and an analysis of variance for an arbitrary number of factors. Factor types are determined automatically.

Here is an example of using Anova to compute an analysis of variance for the data of an experiment on the horizontal-vertical illusion. 'SubjectCode' is the random factor, the parameters Trial.HorizontalVerticalIllusion.Orientation, Trial.HorizontalVerticalIllusion.CutRatio, and FromWhere are independent variables, and Trial.HorizontalVerticalIllusion.CutLine is the dependent variable. Note that the Anova object has no argument list, which means that its parameters are the same as those of the ExperimentData node. The Include and Exclude conditions do not restrict the trials, and the default format is used. This design file can be used by the application ExStat to process multiple data files collected in a single directory. The data files must have been generated by the same design file.

Experiment() {
  Context() {
    ExperimentData(SubjectCode,
        Trial.HorizontalVerticalIllusion.Orientation,
        Trial.HorizontalVerticalIllusion.CutRatio,
        FromWhere,
        Trial.HorizontalVerticalIllusion.CutLine) {
      Anova() {
        Include = 1;
        Exclude = 0;
      }
    }
  }
}

Here is another example which also uses a new parameter, invalid, defined in a separate AssignmentGroup node. Only trials which have the StoreData parameter set are included in the analysis, and trials for which invalid is true are excluded.

Experiment() {
  Context() {
    AssignmentGroup() {
      new invalid = (ResponseTime < 100) || (ResponseTime > 500);
    }
    ExperimentData(SubjectCode, TrialCounter, Luminance, ResponseTime) {
      Anova(SubjectCode, Luminance, ResponseTime) {
        Include = StoreData;
        Exclude = invalid;
      }
    }
  }
}
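Anova determines factor types automatically and handles an arbitrary number of factors. For the simplest case, the second example (SubjectCode as random factor, Luminance as a single within factor, ResponseTime as dependent variable), a one-factor repeated-measures analysis of variance can be sketched as follows. The function name rm_anova_one_factor is hypothetical, and averaging repeated trials within each subject and level first (as the FACTORIAL_DATA export does) is an assumption about how Anova treats repetitions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, Sequence, Tuple


def rm_anova_one_factor(rows: Sequence[Tuple[Hashable, Hashable, float]]) -> Dict[str, float]:
    """One-factor repeated-measures ANOVA on (subject, level, value) rows."""
    # 1. Average repetitions within each subject x level cell.
    cells = defaultdict(list)
    for subject, level, value in rows:
        cells[(subject, level)].append(value)
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    subjects = {s for s, _ in cell_mean}
    levels = {lv for _, lv in cell_mean}
    n, k = len(subjects), len(levels)
    if len(cell_mean) != n * k or n < 2 or k < 2:
        raise ValueError("need a complete design with at least 2 subjects and 2 levels")
    # 2. Partition the sum of squares of the cell means.
    grand = mean(cell_mean.values())
    subject_means = [mean(cell_mean[(s, lv)] for lv in levels) for s in subjects]
    level_means = [mean(cell_mean[(s, lv)] for s in subjects) for lv in levels]
    ss_total = sum((v - grand) ** 2 for v in cell_mean.values())
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_levels = n * sum((m - grand) ** 2 for m in level_means)
    ss_error = ss_total - ss_subjects - ss_levels  # subject x level interaction
    # 3. Test the factor against the subject x level interaction.
    df_levels, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_levels / df_levels) / (ss_error / df_error)
    return {"SS_levels": ss_levels, "SS_subjects": ss_subjects, "SS_error": ss_error,
            "df_levels": df_levels, "df_error": df_error, "F": f}
```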
Regression
compute correlations and multiple linear regression. Here is an example which computes regression lines for the positive and negative items in a Sternberg paradigm. Note that 'NegPos' is a parameter which identifies an item as a negative or positive item. False responses are excluded.
Experiment() {
  Context() {
    ExperimentData(
        TrialCounter,
        NegPos,
        Trial.SerialLearningList.MemorySetSize,
        Trial.Feedback.Response,
        Trial.Message.ResponseTime) {
      Regression:Neg(Trial.Message.ResponseTime, Trial.SerialLearningList.MemorySetSize) {
        Include = NegPos;
        Exclude = Trial.Feedback.Response;
      }
      Regression:Pos(Trial.Message.ResponseTime, Trial.SerialLearningList.MemorySetSize) {
        Include = !NegPos;
        Exclude = Trial.Feedback.Response;
      }
    }
  }
}
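Each Regression object in the example fits a line relating response time to memory set size. A minimal least-squares sketch for a single predictor is shown below. It assumes that the first argument of Regression is the dependent variable and the second the predictor, which the example suggests but the text does not state; the function name regression_line is hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def regression_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (intercept, slope, r) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Pearson correlation; undefined if the dependent variable is constant.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else math.nan
    return intercept, slope, r


# Hypothetical data: memory set sizes and mean response times (ms).
intercept, slope, r = regression_line([1, 2, 3, 4], [400, 440, 480, 520])
```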
EllipseEstimation
estimate parameters of a rotated ellipse. May be used to estimate confusion ellipses in color discrimination.
VisualGammaEstimation
estimate parameters of a gamma function from visual matching data.

[This file was last updated on July 15, 2010, 12:07:01.]
