I'm preparing to implement the squarified treemap algorithm in an unfamiliar package, so I wanted a clear and complete description of the process. The original paper by Bruls, Huizing, and van Wijk is written procedurally and relies on global state.
I've written an extended explanation of the algorithm, but I've chosen to implement the algorithm in a functional manner. The code is in Mathematica, but should be readily adaptable to other functional languages like Schema, Scala, F#, or JavaScript/D3.
Format: PDF NB Github
Mathjax
Thursday, February 27, 2014
Friday, February 21, 2014
Axis/Scale Boundaries
When charting unknown data, we would like to set the axis scale such that the details in the data are clear and the end points on the scale are aesthetically pleasing.
D3 and Protovis use the "nice" algorithm for linear scales (and a modification for logarithmic scales). The pull request implementing the algorithm in JavaScript is on github, but, mathematically, for an array of data containing a \(min\) and \(max\) value:
Let $$step = 10^{round(log_{10}(max - min)) - 1}$$
then,
$$scale_{min} = \lfloor\frac{min}{step}\rfloor \times step$$
$$scale_{max} = \lceil\frac{max}{step}\rceil \times step$$
If the data has a range of 0.002454 to 0.1455, the scale will extend from 0.002 to 0.015. Visually, the bounding boxes (tan and green) will look like:
D3 and Protovis use the "nice" algorithm for linear scales (and a modification for logarithmic scales). The pull request implementing the algorithm in JavaScript is on github, but, mathematically, for an array of data containing a \(min\) and \(max\) value:
Let $$step = 10^{round(log_{10}(max - min)) - 1}$$
then,
$$scale_{min} = \lfloor\frac{min}{step}\rfloor \times step$$
$$scale_{max} = \lceil\frac{max}{step}\rceil \times step$$
If the data has a range of 0.002454 to 0.1455, the scale will extend from 0.002 to 0.015. Visually, the bounding boxes (tan and green) will look like:
Wednesday, February 12, 2014
Minimizing Autocorrelation of a List
As part of a graph rendering algorithm, I needed to take a list of y-values and disperse the points within a xy-range. (Within the list, the x coordinate has no meaning. I'm not re-ordering a time series.) To model the dispersion, I decided to use the autocorrelation metric. Autocorrelation measures the correlation of a list to itself. Put another way, autocorrelation measures how much the prior value can be used to predict the next value. (For this discussion, I'm using 1 as the lag parameter. Additionally, minimizing the autocorrelation is the same as minimizing the covariance.)
For example, the figure below shows a sinusoid (sampled at a regular interval). The autocorrelation of the figure is 0.994705 --- almost the maximum value of 1.
In contrast, the figure below has been modified by the autocorrelation minimization algorithm. The two figures have the same y-values, but the x-values have been changed resulting in a new correlation of -0.999955.
The Algorithm
The formula for calculating autocorrelation includes a number of terms, but the only terms that are impacted by changing the order of the list can be reduced to the sum:
$$ \Sigma(x_i - \mu)(x_{i+1} - \mu) $$
where \(\mu\) is the arithmetic mean of the list.
The approach is to maximize occurrences of large negative values.
Updated: Algorithm simplified
For example, the figure below shows a sinusoid (sampled at a regular interval). The autocorrelation of the figure is 0.994705 --- almost the maximum value of 1.
In contrast, the figure below has been modified by the autocorrelation minimization algorithm. The two figures have the same y-values, but the x-values have been changed resulting in a new correlation of -0.999955.
The Algorithm
The formula for calculating autocorrelation includes a number of terms, but the only terms that are impacted by changing the order of the list can be reduced to the sum:
$$ \Sigma(x_i - \mu)(x_{i+1} - \mu) $$
where \(\mu\) is the arithmetic mean of the list.
The approach is to maximize occurrences of large negative values.
- Create a list terms with the values of the list minus the mean of the list.
- Sort the list terms.
- Initialize the merged list with the head and tail of terms.
- While elements remain in terms,
- Take an element from terms (either the head or tail) such that the element minus the mean multiplied by the head or tail of merged has minimal value.
- Add the mean to each element of merged. Return that list.
Updated: Algorithm simplified
Subscribe to:
Posts (Atom)