analysis3d2d

developed by amoeba.blue

3D Regression via Pythagorean Geometric Distance
This page enables the evaluation and comparison of datasets consisting of three dimensions.

methods

• Linear regression models are used to evaluate both x and y sets individually in comparison to the z set.
• Because x and y sets may contain negative or zero values, analysis3d2d does not attempt to construct logarithmic models based on the original data.
• x and y values are each redefined as linear distances (dx and dy) from the x or y value associated with the greatest z point.
• In other words, z declines as it is distributed along the 1d dx and dy axes.
• Because dx and dy are absolute values, this redefinition ignores the direction of the original relationship between two points within a dimension.
• Taking absolute values also ensures that logarithmic regression models can be calculated.
• The highest z value is necessarily associated with a dx and dy equal to zero.
• Linear and logarithmic regression models are then used to independently evaluate dx and dy in comparison to z.
• Geometric (Pythagorean) distance (dg) is also calculated using both x and y.
• dg is defined relative to the xy pair corresponding to the highest z value.
• In other words, z declines as it is distributed across the 2d xy plane.
• Because dg is absolute, this redefinition ignores the direction of the original relationship between two points in the xy plane.
• The highest z value is necessarily associated with a dg of zero.
• Linear and logarithmic regression models are then used to evaluate dg in comparison to z.
• Because dx, dy, and dg are calculated relative to the maximal x,y,z point, the logarithmic models and data do not redundantly include this point (0,0,max); its distance of zero also has no defined logarithm.
• Because the logarithmic models of dx, dy, and dg typically present a higher degree of fit (R^2) than the other model types, secondary data comparisons are performed relative to the best-fitting of these three regressions.
• The secondary comparison set is redefined using the same primary distance method (dx, dy, or dg) that produced the greatest R^2.
• The standard error is calculated for the regression line relative to the primary input: fd1(d1)
• Standard errors are then calculated, using two methods, for the secondary input relative to the primary regression line.
• A distinct standard error is calculated using only the data from the secondary input: fd1(d2)
• An integrated standard error is calculated using the data from the secondary input in combination with the primary set: fd1(d2+d1)
• The ratio is provided for each pair of standard errors: (fd1(d2)/fd1(d1)) and (fd1(d2+d1)/fd1(d1))
• Ratios of less than 1 indicate that the secondary set exhibits less variation relative to the primary regression.
• Ratios of greater than 1 indicate that the secondary set exhibits more variation relative to the primary regression.
• In practice, it may be difficult to obtain error ratios less than 1 with a large secondary dataset.
• Accordingly, an error ratio greater than 1 does not necessarily indicate a poor match between the datasets.
• The results of this comparison must be interpreted in the context of the data type with a sufficient understanding of variations within the sample sets.
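The distance conversion, model fitting, and standard-error comparison described above can be sketched in Python as follows. This is a minimal sketch, not the analysis3d2d source, and all function names are hypothetical. The logarithmic model form z = a + b*ln(d), dropping every zero distance from the logarithmic fits, the n - 2 degrees of freedom in the standard error, and redefining the secondary set relative to its own maximal z point are assumptions.

```python
import math

def distances(points):
    """Convert (x, y, z) rows into (dx, dy, dg, z) relative to the max-z point."""
    x0, y0, _ = max(points, key=lambda p: p[2])  # first point with the greatest z
    rows = []
    for x, y, z in points:
        dx, dy = abs(x - x0), abs(y - y0)
        rows.append((dx, dy, math.hypot(dx, dy), z))  # dg = sqrt(dx^2 + dy^2)
    return rows

def fit_line(u, z):
    """Ordinary least squares z = a + b*u; returns (a, b, r2)."""
    n = len(u)
    mu, mz = sum(u) / n, sum(z) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suz = sum((ui - mu) * (zi - mz) for ui, zi in zip(u, z))
    b = suz / suu
    a = mz - b * mu
    ss_res = sum((zi - (a + b * ui)) ** 2 for ui, zi in zip(u, z))
    ss_tot = sum((zi - mz) ** 2 for zi in z)
    return a, b, (1 - ss_res / ss_tot) if ss_tot else 1.0

def log_pairs(d, z):
    """Assumption: drop every d == 0 (ln 0 is undefined); return (ln d, z) lists."""
    kept = [(di, zi) for di, zi in zip(d, z) if di > 0]
    return [math.log(di) for di, _ in kept], [zi for _, zi in kept]

def fit_models(points):
    """Fit linear and logarithmic (z = a + b*ln d) models for dx, dy and dg."""
    rows = distances(points)
    z = [r[3] for r in rows]
    models = {}
    for i, name in enumerate(("dx", "dy", "dg")):
        d = [r[i] for r in rows]
        models[(name, "linear")] = fit_line(d, z)
        models[(name, "log")] = fit_line(*log_pairs(d, z))
    return models

def std_error(a, b, d, z):
    """Standard error of z about the log line (assumption: n - 2 degrees of freedom)."""
    u, zz = log_pairs(d, z)
    ss = sum((zi - (a + b * ui)) ** 2 for ui, zi in zip(u, zz))
    return math.sqrt(ss / (len(u) - 2))

def compare(primary, secondary):
    """Return the chosen d method and the ratios fd1(d2)/fd1(d1), fd1(d2+d1)/fd1(d1)."""
    models = fit_models(primary)
    # Best of the three logarithmic fits by R^2.
    name = max(("dx", "dy", "dg"), key=lambda m: models[(m, "log")][2])
    a, b, _ = models[(name, "log")]
    i = ("dx", "dy", "dg").index(name)
    # Assumption: each set is converted relative to its own max-z point.
    p, s = distances(primary), distances(secondary)
    d1, z1 = [r[i] for r in p], [r[3] for r in p]
    d2, z2 = [r[i] for r in s], [r[3] for r in s]
    se1 = std_error(a, b, d1, z1)              # fd1(d1)
    se2 = std_error(a, b, d2, z2)              # fd1(d2)
    se21 = std_error(a, b, d2 + d1, z2 + z1)   # fd1(d2+d1)
    return name, se2 / se1, se21 / se1
```

Selecting only among the three logarithmic fits follows the description of the secondary comparison; the linear fits are still computed so that both model types can be reported.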

directions

1. Copy the dataset into the input field as comma-separated (csv) or tab-delimited (txt) values.
2. To avoid errors, ensure that tabs are not present in csv data and that commas are not present in txt data.
3. Headers will not affect results. They will be interpreted as invalid data and removed from the dataset.
4. For analysis, your data will be labeled as x, y, and z.
5. Data must be input according to the scheme below.
• x,y,z
• x,y,z
• x,y,z
6. Select the 3d2d button.
7. The log field will note any issues that arise during processing.
8. The results field provides a number of descriptive parameters.
9. Data fields present input and calculated values.
10. The visualization feature displays the 3d data both as input and as converted to distances.
11. The graphing feature plots 2d comparisons of z relative to the distance calculated for x (dx) and y (dy), as well as the calculated geometric distance (dg).
12. The comparison feature enables the evaluation of a secondary 3d dataset in comparison to the primary 3d set and indicates the similarity of the two.
• Using the input format defined above, copy the secondary dataset into the compare field.
• Select the 3d2d2 button to derive a similarity between the two.
• Any issues found in the secondary data are noted in the log field.
• To construct models of the secondary set, copy this set to the input field and select the 3d2d button to start over.
13. As an example, an input csv file can be downloaded here: example.csv
14. To perform only 2d analysis, add a column of constant x or y values to the 2d set.
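Reading input that follows the directions above might look like the sketch below. The function name parse_xyz is hypothetical, and treating any input that contains a tab as tab-delimited (txt) data is an assumption about how csv and txt input are told apart. Rows that are not exactly three numbers, such as a header, are skipped and their line numbers returned so they could be reported, much as the log field notes issues.

```python
def parse_xyz(text):
    """Parse x,y,z rows from comma- or tab-delimited text.

    Returns (rows, skipped): rows is a list of (x, y, z) floats and skipped
    lists the 1-based line numbers that were not three numeric values.
    """
    # Assumption: a tab anywhere in the input means tab-delimited (txt) data.
    delimiter = "\t" if "\t" in text else ","
    rows, skipped = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split(delimiter)]
        try:
            if len(fields) != 3:
                raise ValueError("expected exactly three values")
            x, y, z = (float(f) for f in fields)
        except ValueError:
            skipped.append(line_no)  # e.g. a header row such as "x,y,z"
            continue
        rows.append((x, y, z))
    return rows, skipped
```

For 2d-only analysis, a constant column can be supplied in the input itself, for example 0 as every y value.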

input

output

color (c) = (z(val) - z(min)) / (z(max) - z(min)); legend bins: c < 0.2, c < 0.4, c < 0.6, c < 0.8, c < 1.0
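The color scale normalizes each z value to c between 0 and 1 and assigns it to one of five legend bins. In this minimal sketch the names color_bin and LEGEND_EDGES are hypothetical. Placing the maximal point (c = 1.0) in the top bin and returning c = 1.0 when all z values are equal are assumptions, since the legend lists only c < 1.0.

```python
LEGEND_EDGES = (0.2, 0.4, 0.6, 0.8, 1.0)  # upper edges of the five legend bins

def color_bin(z, z_min, z_max):
    """Return (c, bin_index) for one z value on the min-max color scale."""
    if z_max == z_min:
        c = 1.0  # assumption: a constant z set is drawn at the top of the scale
    else:
        c = (z - z_min) / (z_max - z_min)
    # First bin whose upper edge exceeds c; c == 1.0 (the maximal z) has no
    # such edge, so it is placed in the top bin (assumption).
    index = next((i for i, edge in enumerate(LEGEND_EDGES) if c < edge),
                 len(LEGEND_EDGES) - 1)
    return c, index
```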