analysis3d2d

developed by amoeba.blue

3D Regression via Pythagorean Geometric Distance

This page enables the evaluation and comparison of datasets consisting of three dimensions.

methods

- Linear regression models are used to evaluate the x and y sets individually against the z set.
- Because x and y may contain negative or zero values, analysis3d2d does not attempt to construct logarithmic models based on the original data.
- x and y values are each redefined as absolute linear distances (dx and dy) from the x or y value of the point with the greatest z.
- In other words, z declines as it is distributed along the 1d dx and dy axes.
- Because dx and dy are absolute, this redefinition ignores the direction of the original relationship between two points within a dimension.
- Taking absolute values also ensures that logarithmic regression models can be calculated.
- The highest z value is necessarily associated with dx = 0 and dy = 0.
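The dx and dy redefinition above can be sketched in Python as follows. This is a minimal illustration, not the page's own code: the function name `axis_distances` and the representation of rows as `(x, y, z)` tuples are assumptions, and ties for the greatest z are resolved by taking the first such row, which the page does not specify.

```python
def axis_distances(points):
    """Return (dx, dy, z) for each (x, y, z) row, measured from the row with the greatest z."""
    # Reference point: the row with the maximal z (max() keeps the first row on ties; an assumption).
    x0, y0, _ = max(points, key=lambda p: p[2])
    # Absolute distances discard direction; the reference row itself maps to dx = dy = 0.
    return [(abs(x - x0), abs(y - y0), z) for x, y, z in points]


rows = [(1.0, -2.0, 3.0), (4.0, 2.0, 10.0), (0.0, 0.0, 5.0)]
print(axis_distances(rows))  # [(3.0, 4.0, 3.0), (0.0, 0.0, 10.0), (4.0, 2.0, 5.0)]
```

Negative x and y values pose no problem here, since only their differences from the reference point are kept.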

- Linear and logarithmic regression models are then used to independently evaluate dx and dy in comparison to z.
- Geometric (Pythagorean) distance (dg) is also calculated from both x and y: dg = sqrt(dx^2 + dy^2).
- dg is defined relative to the xy pair corresponding to the highest z value.
- In other words, z declines as it is distributed across the 2d xy plane.
- Because dg is absolute, this redefinition ignores the direction of the original relationship between two points in both dimensions.
- The highest z value is necessarily associated with a dg of zero.
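A sketch of the dg calculation, under the same assumptions as for dx and dy (rows as `(x, y, z)` tuples, first maximal-z row used on ties); `geometric_distances` is a hypothetical name. Python's `math.hypot` computes the Pythagorean distance directly.

```python
import math


def geometric_distances(points):
    """Return (dg, z) for each (x, y, z) row, where dg is measured from the row with the greatest z."""
    # Reference xy pair: the row with the maximal z (first occurrence on ties; an assumption).
    x0, y0, _ = max(points, key=lambda p: p[2])
    # math.hypot(a, b) returns sqrt(a*a + b*b); the result is never negative,
    # so the direction from the reference point is discarded.
    return [(math.hypot(x - x0, y - y0), z) for x, y, z in points]


rows = [(1.0, -2.0, 3.0), (4.0, 2.0, 10.0), (0.0, 0.0, 5.0)]
for dg, z in geometric_distances(rows):
    print(f"dg = {dg:.3f}, z = {z}")
```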

- Linear and logarithmic regression models are then used to evaluate dg in comparison to z.
- Because dx, dy, and dg are calculated relative to the maximal x,y,z point, the logarithmic models and data do not redundantly include this point (0,0,max); its distance of zero has no logarithm.
- Because the logarithmic models of dx, dy, and dg necessarily present a higher degree of fit (R^2) than the linear models, secondary data comparisons are performed against the best-fitting of these three regressions.
- The secondary comparison set is redefined using the primary d method (dx, dy, or dg) associated with the greatest R^2.
- The standard error of the primary regression line is calculated for the primary input: fd1(d1).
- Standard errors are then calculated, using two methods, for the secondary input relative to the primary regression line.
- A distinct standard error is calculated using only the data from the secondary input: fd1(d2).
- An integrated standard error is calculated using the data from the secondary input combined with the primary set: fd1(d2+d1).
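The modelling and comparison steps above can be sketched in Python. This is a hedged illustration resting on assumptions the page does not state: the names `fit`, `std_error`, and `most_fit` are hypothetical; models are ordinary least squares of z on d (linear) or on ln(d) (logarithmic); and "standard error" is taken to be the standard error of the estimate, sqrt(SSE / (n - 2)), with the same formula applied to secondary data. The page may define it differently.

```python
import math


def _prepare(pairs, log):
    """Return (u, z) pairs, where u = ln(d) for logarithmic models and u = d for linear ones."""
    if log:
        # ln(0) is undefined, so zero distances (the maximal point) are left out of log models.
        return [(math.log(d), z) for d, z in pairs if d > 0]
    return list(pairs)


def fit(pairs, log=True):
    """Least-squares fit of z = a + b*u on (d, z) pairs; returns (a, b, r2, log)."""
    pts = _prepare(pairs, log)
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_uz = sum((u - mean_u) * (z - mean_z) for u, z in pts)
    b = s_uz / s_uu
    a = mean_z - b * mean_u
    ss_res = sum((z - (a + b * u)) ** 2 for u, z in pts)
    ss_tot = sum((z - mean_z) ** 2 for _, z in pts)
    return a, b, 1.0 - ss_res / ss_tot, log


def std_error(model, pairs):
    """Spread of (d, z) pairs about a fitted line: sqrt(SSE / (n - 2)) (assumed definition)."""
    a, b, _, log = model
    pts = _prepare(pairs, log)
    sse = sum((z - (a + b * u)) ** 2 for u, z in pts)
    return math.sqrt(sse / (len(pts) - 2))


def most_fit(models):
    """models maps 'dx', 'dy', 'dg' to fitted models; return the key with the greatest R^2."""
    return max(models, key=lambda name: models[name][2])


# Synthetic, illustrative (d, z) pairs; not data from the page.
primary = [(0.0, 12.0), (1.0, 10.0), (2.0, 8.7), (4.0, 7.1), (8.0, 5.9)]
secondary = [(1.5, 9.0), (3.0, 7.9), (6.0, 6.2)]
fd1 = fit(primary)                            # primary logarithmic regression
se_d1 = std_error(fd1, primary)               # fd1(d1)
se_d2 = std_error(fd1, secondary)             # fd1(d2)
se_d2_d1 = std_error(fd1, secondary + primary)  # fd1(d2+d1)
print(se_d1, se_d2, se_d2_d1)
```

In use, `primary` and `secondary` would hold the distances produced by whichever method `most_fit` selects. The `n - 2` denominator means each set needs at least three usable points.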

- The ratio is provided for each pair of standard errors: (fd1(d2)/fd1(d1)) and (fd1(d2+d1)/fd1(d1))
- Ratios of less than 1 indicate that the secondary set exhibits less variation relative to the primary regression.
- Ratios of greater than 1 indicate that the secondary set exhibits more variation relative to the primary regression.
- In practice, it may be difficult to obtain error ratios less than 1 with a large secondary dataset.
- Accordingly, an error ratio greater than 1 does not necessarily indicate a poor match between the datasets.
- The results of this comparison must be interpreted in the context of the data type with a sufficient understanding of variations within the sample sets.
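The two ratios and their first reading can be written out as below. This is a minimal sketch: the function names are hypothetical, and the three standard errors are assumed to have been computed already. As stated above, the labels are only a starting point for interpretation, not a verdict on whether the datasets match.

```python
def error_ratios(se_d1, se_d2, se_d2_d1):
    """Return (fd1(d2)/fd1(d1), fd1(d2+d1)/fd1(d1)) from three standard errors."""
    return se_d2 / se_d1, se_d2_d1 / se_d1


def describe(ratio):
    """Plain-language reading of one ratio, following the rules stated above."""
    if ratio < 1:
        return "secondary set varies less about the primary regression"
    if ratio > 1:
        return "secondary set varies more about the primary regression"
    return "secondary set varies as much as the primary set"


distinct, integrated = error_ratios(2.0, 1.0, 3.0)
print(distinct, describe(distinct))      # 0.5 secondary set varies less about the primary regression
print(integrated, describe(integrated))  # 1.5 secondary set varies more about the primary regression
```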

directions

- Copy the dataset into the input field as comma-separated (csv) or tab-delimited (txt) values.
- To avoid errors, ensure that tabs are not present in csv input and that commas are not present in txt input.
- Headers will not affect results. They will be interpreted as invalid data and removed from the dataset.
- For analysis, your data will be labeled as x, y, and z.
- Data must be input according to the scheme below.
- x,y,z
- x,y,z
- x,y,z
- Select the 3d2d button.
- The log field will note any issues that arise during processing.
- The results field provides a number of descriptive parameters.
- Data fields present the input and calculated values.
- The visualization feature displays the 3d data both as input and as converted to distances.
- The graphing feature plots 2d comparisons of z against the distances calculated for x (dx) and y (dy), as well as the calculated geometric distance (dg).
- The comparison feature evaluates a secondary 3d dataset against the primary 3d set and indicates the similarity of the two.
- Using the input format described above, copy the secondary dataset into the compare field.
- Select the 3d2d2 button to calculate the similarity between the two sets.
- Any issues found in the secondary data are noted in the log field.
- To construct models of the secondary set, copy this set to the input field and select the 3d2d button to start over.
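The input rules above (csv or tab-delimited rows of x,y,z, with headers discarded as invalid data) can be approximated by the parser below. It is a sketch, not the page's code: `parse_xyz` is a hypothetical name, the delimiter is assumed to be a tab whenever the input contains one and a comma otherwise, and the log messages are invented for illustration.

```python
import math


def parse_xyz(text):
    """Parse x,y,z rows; return (points, log), dropping rows that are not three finite numbers."""
    # Assumption: tab-delimited if any tab appears in the input, otherwise comma-separated.
    sep = "\t" if "\t" in text else ","
    points, log = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split(sep)]
        try:
            if len(fields) != 3:
                raise ValueError(f"expected 3 values, found {len(fields)}")
            x, y, z = (float(f) for f in fields)
            if not all(math.isfinite(v) for v in (x, y, z)):
                raise ValueError("non-finite value")
        except ValueError as exc:
            # Headers such as "x,y,z" fail float() and are dropped here.
            log.append(f"line {lineno}: removed ({exc})")
            continue
        points.append((x, y, z))
    return points, log


points, log = parse_xyz("x,y,z\n1,2,3\n4,-5,6\n")
print(points)  # [(1.0, 2.0, 3.0), (4.0, -5.0, 6.0)]
print(log)
```

Mixing delimiters in one input (the case the directions warn against) makes every row fail the three-field check under this assumption.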

- An example input csv file can be downloaded here: example.csv
- To perform only a 2d analysis, add a column of constant x or y values to the 2d set.
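Why a constant column reduces the analysis to 2d follows from the Pythagorean distance: with x constant, every dx is 0, so dg = sqrt(0^2 + dy^2) = dy. The short check below illustrates this; `pad_constant_x` and `dg_and_dy` are hypothetical helpers, and the 2d rows are assumed to be (y, z) pairs.

```python
import math


def pad_constant_x(rows, x_value=0.0):
    """Turn 2d (y, z) rows into 3d (x, y, z) rows by adding a constant x column."""
    return [(x_value, y, z) for y, z in rows]


def dg_and_dy(points):
    """Return (dg, dy) for each row, both measured from the row with the greatest z."""
    x0, y0, _ = max(points, key=lambda p: p[2])
    return [(math.hypot(x - x0, y - y0), abs(y - y0)) for x, y, _ in points]


rows3d = pad_constant_x([(1.0, 2.0), (3.0, 9.0), (7.0, 4.0)])
# With x constant, dx is 0 for every row, so each dg matches its dy.
for dg, dy in dg_and_dy(rows3d):
    print(dg, dy)
```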

- In the visual and graph outputs, points are colored by c = ( z(val) - z(min) ) / ( z(max) - z(min) ), grouped into the bins c < 0.2, c < 0.4, c < 0.6, c < 0.8, and c < 1.0.
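The color mapping can be sketched as follows. The formula for c is the one given above; treating each bin as the first `c < edge` bound that holds, placing c = 1.0 (the maximal z, which satisfies none of the strict bounds) in the last bin, and rejecting the case z(max) = z(min), where the formula is undefined, are all assumptions. The function names are hypothetical.

```python
def color_value(z, z_min, z_max):
    """c = ( z(val) - z(min) ) / ( z(max) - z(min) ), scaled to the range 0..1."""
    if z_max == z_min:
        raise ValueError("color is undefined when all z values are equal")
    return (z - z_min) / (z_max - z_min)


def color_bin(c, edges=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Index of the first legend bin whose bound 'c < edge' holds; c = 1.0 falls in the last bin."""
    for index, edge in enumerate(edges):
        if c < edge:
            return index
    return len(edges) - 1


zs = [3.0, 10.0, 5.0]
for z in zs:
    c = color_value(z, min(zs), max(zs))
    print(z, round(c, 3), color_bin(c))
```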
