Jekyll2019-04-24T20:45:09-07:00/Art is Never Finished...Only Abandoned. – Leonardo da VinciNodemon HTTP Server2019-03-27T00:00:00-07:002019-03-27T00:00:00-07:00/blog/nodemon-http-server<p>While playing with <a href="https://threejs.org">three.js</a> today, I needed a hot-reloading JS webserver in a real hurry to serve out content. Here’s how to start a server at <a href="http://localhost:8080">http://localhost:8080/</a> in just three lines:</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code>npm install -g http-server nodemon <span class="c"># Install the http-server and nodemon tools</span>
<span class="nb">cd</span> /to/some/directory <span class="c"># Change to the directory you want to serve content from</span>
nodemon <span class="sb">`</span>which http-server<span class="sb">`</span> <span class="c"># Use nodemon to restart the http-server when any content changes</span>
</code></pre>
</div>
<p>If you don’t need a server that restarts, consider using:</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code>http-server . -p 8080
</code></pre>
</div>
Preferring Quantiles to Histograms 2019-03-12T00:00:00-07:00 /blog/percentiles-and-histograms
<p>The histogram is most scientists’ tool of choice for viewing the distribution of values of a single variable. But lately I have been exploring an alternative: quantiles (e.g. deciles and percentiles). Although viewing data in this way is not perfect, it has several advantages over histograms, including robustness to outliers and freedom from <em>a priori</em> assumptions about the range or smoothing level of the data.</p>
<p>This post has some code written in <a href="https://www.python.org">Python</a>. If you don’t have it already, you may want to follow along by <a href="../jupyter-quick-install/">installing Jupyter</a> and opening up a notebook.</p>
<h2 id="generating-artificial-data">Generating Artificial Data</h2>
<p>We’ll begin by inventing an unusual probability distribution with characteristics that will highlight some problems with histograms. We’ll choose a <a href="https://en.wikipedia.org/wiki/Cumulative_distribution_function">cumulative distribution function</a> (CDF), <a href="https://en.wikipedia.org/wiki/Probability_density_function">probability density function</a> (PDF), and inverse CDF that are easy to evaluate analytically. The distribution will be jagged, bimodal, and have a small, blocky sideband to represent outliers.</p>
<p>Understanding the Python code used to produce artificial data is not essential to understanding the rest of this article, but the basic trick used in the following code (known as inverse transform sampling) is to pick random points on the Y axis between 0 and 1, draw a horizontal line from each point over to where it intersects the cumulative distribution function, and then use the intersection’s X coordinate as your random sample. You can produce samples from any distribution whose CDF you can invert with this technique.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="c"># The quadratic segment ends at x = sqrt(0.98), where the CDF reaches 0.99.</span>
<span class="c"># The sideband on [1.96, 2) carries the remaining 0.01 of probability mass.</span>
<span class="k">def</span> <span class="nf">unusual_distribution_cdf</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="s">'''Returns the cumulative distribution function of an unusual distribution.'''</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span> <span class="k">return</span> <span class="mi">0</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mi">0</span><span class="p">):</span> <span class="k">return</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="mf">0.5</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mf">0.98</span><span class="p">)):</span> <span class="k">return</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mf">0.5</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mf">1.96</span><span class="p">):</span> <span class="k">return</span> <span class="mf">0.99</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mi">2</span><span class="p">):</span> <span class="k">return</span> <span class="mf">0.25</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="mf">0.5</span>
    <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="mi">1</span>

<span class="k">def</span> <span class="nf">unusual_distribution_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
    <span class="s">'''Returns the probability density function of an unusual distribution.
    Derived analytically from unusual_distribution_cdf(). '''</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span> <span class="k">return</span> <span class="mi">0</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mi">0</span><span class="p">):</span> <span class="k">return</span> <span class="mf">0.5</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mf">0.98</span><span class="p">)):</span> <span class="k">return</span> <span class="n">x</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mf">1.96</span><span class="p">):</span> <span class="k">return</span> <span class="mi">0</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="mi">2</span><span class="p">):</span> <span class="k">return</span> <span class="mf">0.25</span>
    <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="mi">0</span>

<span class="k">def</span> <span class="nf">unusual_distribution_inverse_cdf</span><span class="p">(</span><span class="n">y</span><span class="p">):</span>
    <span class="s">'''Returns the inverse cumulative distribution function of an unusual distribution.
    If you feed random numbers from 0 to 1 into this, it produces numbers that are
    effectively sampled from the probability density function.'''</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">y</span> <span class="o"><</span> <span class="mi">0</span><span class="p">):</span> <span class="k">return</span> <span class="bp">None</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">y</span> <span class="o"><</span> <span class="mf">0.5</span><span class="p">):</span> <span class="k">return</span> <span class="mi">2</span><span class="o">*</span><span class="n">y</span><span class="o">-</span><span class="mi">1</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">y</span> <span class="o"><</span> <span class="mf">0.25</span><span class="o">*</span><span class="p">(</span><span class="mf">0.98</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span><span class="o">+</span><span class="mf">0.5</span><span class="p">):</span> <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="c"># threshold is 0.99 = 0.5*0.98 + 0.5</span>
    <span class="k">elif</span> <span class="p">(</span><span class="n">y</span> <span class="o"><</span> <span class="mi">1</span><span class="p">):</span> <span class="k">return</span> <span class="mi">4</span><span class="o">*</span><span class="n">y</span><span class="o">-</span><span class="mi">2</span>
    <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="bp">None</span>

<span class="k">def</span> <span class="nf">calc_true_cdf</span><span class="p">(</span><span class="n">xs</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">unusual_distribution_cdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">xs</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">unusual_distribution_pdf</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">xs</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">sample_from_pdf</span><span class="p">(</span><span class="n">n_samps</span><span class="p">):</span>
    <span class="c"># Inverse transform sampling: push uniform random numbers through the inverse CDF</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">unusual_distribution_inverse_cdf</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n_samps</span><span class="p">)]</span>
</code></pre>
</div>
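<p>The same trick works for any distribution whose CDF can be inverted in closed form. As a minimal sketch (not part of the original notebook), here is inverse transform sampling for an exponential distribution. The function name <code class="highlighter-rouge">exponential_inverse_cdf</code>, the rate <code class="highlighter-rouge">lam</code>, and the seed are illustrative assumptions:</p>

```python
import numpy as np

def exponential_inverse_cdf(y, lam=2.0):
    """Inverse of the exponential CDF F(x) = 1 - exp(-lam * x)."""
    return -np.log(1.0 - y) / lam

rng = np.random.default_rng(seed=0)    # seeded so the sketch is repeatable
ys = rng.uniform(size=100_000)         # random heights on the Y axis, in [0, 1)
samples = exponential_inverse_cdf(ys)  # X coordinates where each height meets the CDF

# The mean of an exponential distribution is 1/lam, so this should land near 0.5
sample_mean = samples.mean()
print(sample_mean)
```

<p>If the sample mean lands near <code class="highlighter-rouge">1/lam</code>, the sampler is doing its job.</p>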
<p>Most people will prefer to simply plot the CDF and PDF and see what they look like:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">xs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_cdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True CDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
</code></pre>
</div>
<p class="center"><img src="fig-01-true-distribution.png" alt="fig-01-true-distribution.png" title="The hidden, true distribution used to generate the samples." /></p>
<p>We won’t need to show the CDF again – it was merely a visual guide to help see how to generate samples from the distribution.</p>
<p>For the rest of this article, we’ll use two collections of samples, which we’ll call A and B:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">n_samps</span> <span class="o">=</span> <span class="mi">200</span>
<span class="n">A</span> <span class="o">=</span> <span class="n">sample_from_pdf</span><span class="p">(</span><span class="n">n_samps</span><span class="p">)</span>
<span class="n">B</span> <span class="o">=</span> <span class="n">sample_from_pdf</span><span class="p">(</span><span class="n">n_samps</span><span class="p">)</span>
</code></pre>
</div>
<h2 id="problems-with-histograms">Problems with Histograms</h2>
<p>Building histograms on data A and data B will immediately highlight some of the problems with histograms:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">num_bins</span> <span class="o">=</span> <span class="mi">30</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram for A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram for B"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-02-different-bins.png" alt="fig-02-different-bins.png" title="Samples A and B have different bin sizes." /></p>
<p>Several annoyances are obvious:</p>
<ul>
<li>
<p><strong>Histograms require assumptions about the range of the distribution.</strong> Histograms usually require that you choose a minimum, maximum, and number of bins with which to tile that range. When you don’t know the min and max <em>a priori</em>, it is common to use the limits of the sampled data, specify the number of bins, and then calculate the bin edges automatically. Unfortunately, this means that it is almost guaranteed that two sets of sampled data, even from the same distribution, will have different bin boundaries. In the above figure, you’ll note that the bin widths for A and B are different, because in sample set B, there were no points sampled around <script type="math/tex">x=2</script>.</p>
</li>
<li>
<p><strong>Histograms are hard to combine or build incrementally.</strong> Let’s pretend A is your existing estimate of the distribution, and B is a new batch of data that just came in. Furthermore, let’s imagine that you don’t want to recompute your histogram from scratch, but want to combine the histograms of A and B to produce a new histogram. How would you combine two histograms that have different bin widths and numbers of bins? It is possible, but I hope you can see that it would be awkward and leave much to be desired.</p>
</li>
<li>
<p><strong>Histograms obscure the shape of the data when there are large outliers.</strong> For illustration purposes, the little “sideband” around <script type="math/tex">x=2.0</script> is not that far from the main distribution. But you can imagine that if those outliers were at <script type="math/tex">x = 30</script>, and the number of bins stayed fixed at <code class="highlighter-rouge">num_bins=30</code>, the range of the data would be so large that almost all of the data would fall in the first couple of bins, and nearly all the others would be empty. Relatedly, when you have lots of data in a single bin, you should theoretically be able to infer more about the internal structure of that bin. Unfortunately, when using a histogram, one can never resolve anything finer than a bin width.</p>
</li>
</ul>
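<p>The first and third annoyances are easy to reproduce. The sketch below is illustrative only: it uses normally distributed stand-in data rather than samples A and B, and all variable names are made up for this example.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=1)
a = rng.normal(size=200)   # stand-in for sample set A
b = rng.normal(size=200)   # stand-in for sample set B

# With no explicit range, np.histogram tiles [min, max] of each sample,
# so two samples from the same distribution get different bin edges.
counts_a, edges_a = np.histogram(a, bins=30)
counts_b, edges_b = np.histogram(b, bins=30)
same_edges = np.allclose(edges_a, edges_b)

# A single far outlier stretches the range, so the bulk of the data
# is squeezed into a handful of bins and the rest stay empty.
with_outlier = np.append(a, 30.0)
counts_out, edges_out = np.histogram(with_outlier, bins=30)
occupied_bins = int(np.count_nonzero(counts_out))

print(same_edges, occupied_bins)
```

<p>Passing an explicit <code class="highlighter-rouge">range=</code> to both calls would align the bin edges, but only if you already know a sensible range, which is exactly the assumption in question.</p>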
<p>Another, more subtle problem that we won’t get into today is that a histogram has a varying number of data points per bin. This means that your error bars (or confidence interval) on each bin will have different magnitudes. It is disappointingly rare for error bars on histograms to be plotted – but without error bars, it’s not science!</p>
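<p>For the curious, here is a rough sketch of what per-bin error bars could look like. It assumes each bin count is approximately Poisson-distributed, so its standard error is the square root of the count. That is a common approximation, not something derived in this post, and the data and variable names are stand-ins.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=2)
data = rng.normal(size=200)   # stand-in data

counts, edges = np.histogram(data, bins=30)
widths = np.diff(edges)

# Poisson assumption: the standard error of a bin count is sqrt(count).
count_err = np.sqrt(counts)

# Convert counts and errors to density units (the same scaling as density=True).
density = counts / (counts.sum() * widths)
density_err = count_err / (counts.sum() * widths)

# Relative error is 1/sqrt(count), so sparse bins are much less certain than full ones.
nonempty = counts > 0
relative_err = count_err[nonempty] / counts[nonempty]
print(relative_err.min(), relative_err.max())
```

<p>The arrays <code class="highlighter-rouge">density</code> and <code class="highlighter-rouge">density_err</code> could then be passed to matplotlib’s <code class="highlighter-rouge">plt.errorbar</code> at the bin centers.</p>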
<h2 id="quantiles">Quantiles</h2>
<p>We will now explore using quantiles to represent the sampled distribution, and see how that differs from the histogram representation. For those of you who have forgotten what a “quantile” is, it is just a generalization of the concept of a percentile. The basic concept of quantiles is to sort some data by value, and then sequentially group the sorted data into <script type="math/tex">N</script> equal-size groups. When <script type="math/tex">N=10</script>, we call the boundaries “deciles”. When <script type="math/tex">N=100</script>, we call them percentiles. A list of percentiles of family income would give us 101 numbers (from 0% to 100%). The 0th percentile would be the lowest income, the 50th percentile would be the median, and the 100th percentile would indicate the maximum family income in the data set.</p>
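<p>To make the definition concrete, here is a small sketch using <code class="highlighter-rouge">np.percentile</code>. The income values are made up purely for illustration.</p>

```python
import numpy as np

# Hypothetical family incomes (in thousands), invented for this example
incomes = np.array([12, 18, 25, 31, 40, 47, 52, 68, 90, 150, 400], dtype=float)

# N groups are delimited by N+1 boundary values
deciles = np.percentile(incomes, np.linspace(0, 100, 11))       # 11 boundaries, 10 groups
percentiles = np.percentile(incomes, np.linspace(0, 100, 101))  # 101 boundaries, 100 groups

lowest = percentiles[0]     # 0th percentile: the minimum
median = percentiles[50]    # 50th percentile: the median
highest = percentiles[100]  # 100th percentile: the maximum
print(len(deciles), len(percentiles), lowest, median, highest)
```

<p>Note that the single very large income only moves the last boundary; the other quantiles are unaffected by how extreme it is.</p>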
<p>Let’s see how you could use the percentiles to generate a histogram-like plot. To make the comparison between histograms and quantiles fair, we will use the same number of quantiles as we did histogram bins.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calc_quantiles</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">):</span>
    <span class="n">percentiles</span> <span class="o">=</span> <span class="p">[</span><span class="mf">100.0</span><span class="o">*</span><span class="n">p</span><span class="o">/</span><span class="n">num_quantiles</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="o">+</span><span class="mi">1</span><span class="p">)]</span>
    <span class="n">quantiles</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">percentile</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">percentiles</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">quantiles</span>
<span class="n">num_quantiles</span> <span class="o">=</span> <span class="n">num_bins</span>
<span class="n">quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="c"># To plot the percentiles as a histogram, we need to take the "derivative"</span>
<span class="c"># (actually, the first difference). Padding the widths with a huge final value</span>
<span class="c"># makes the last step of the plot drop to nearly zero.</span>
<span class="n">quantile_pdf</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="p">(</span><span class="n">num_quantiles</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">diff</span><span class="p">(</span><span class="n">quantiles</span><span class="p">),</span> <span class="mi">10</span><span class="o">**</span><span class="mi">10</span><span class="p">))</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">step</span><span class="p">(</span><span class="n">quantiles</span><span class="p">,</span> <span class="n">quantile_pdf</span><span class="p">,</span> <span class="n">where</span><span class="o">=</span><span class="s">'post'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Quantile-Estimated PDF of A"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"green"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-03-quantiles-v-hist.png" alt="fig-03-quantiles-v-hist.png" title="Quantiles and Histogram estimates of PDF compared." /></p>
<p>There are several immediate things to notice about the quantile estimate:</p>
<ol>
<li>
<p>There appears to be more inaccuracy between the Quantile PDF and the true PDF than with the histogram around the <script type="math/tex">x=1.0</script> area. We’ll discuss whether this is really true or not in a moment.</p>
</li>
<li>
<p>Rather than break up the estimate of the PDF into equal bin-widths on the X axis as the histogram did, <strong>the quantile-based estimate of the PDF divides the probability mass into equal-area rectangles</strong> that are squished and squashed to be short and wide or tall and thin. Around <script type="math/tex">x=0.1</script> there is a short and wide bin, and around <script type="math/tex">x=1.0</script> there are very tall and narrow rectangles.</p>
</li>
<li>
<p>The quantile representation assumes that the distribution is continuous, so the “outliers” on the far right have now been spread out over the whole range <script type="math/tex">% <![CDATA[
1.0 < x < 2.0 %]]></script>. It ends up predicting such a low value that it is not very visible on this plot, but it is nonzero.</p>
</li>
</ol>
<p>Point two is the most important factor to keep in mind. With histograms, we give each bin an equal amount of information, regardless of how many points fall into it. With quantiles, we give each fraction of data the same amount of information, which is then spread out.</p>
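<p>The equal-area property is easy to check numerically. This sketch uses stand-in normal data and made-up variable names, and applies the same width-to-height formula as the <code class="highlighter-rouge">quantile_pdf</code> line above:</p>

```python
import numpy as np

rng = np.random.default_rng(seed=3)
data = rng.normal(size=200)   # stand-in sample; any continuous data would do
num_quantiles = 30

edges = np.percentile(data, np.linspace(0, 100, num_quantiles + 1))
widths = np.diff(edges)                    # rectangle widths vary a lot
heights = 1.0 / (num_quantiles * widths)   # narrow rectangles become tall
areas = widths * heights                   # but every area is 1/num_quantiles

print(widths.min(), widths.max())
print(areas.min(), areas.max())
```

<p>The widths differ widely between the center and the tails, while every area stays at <code class="highlighter-rouge">1/num_quantiles</code>, so the rectangles always sum to one.</p>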
<h2 id="smoothing-with-gaussian-kernels">Smoothing with Gaussian Kernels</h2>
<p>We noted earlier that histograms generally sideline the issue of uncertainty in each bin estimate. We have done the same with the quantile technique here – while we have faithfully represented the probability mass of the samples, this may not produce a very good visual estimate of the PDF. Can we make it look better?</p>
<p>Each percentile represents 1/100th of the whole data. It is easy to imagine us placing 100 kernels of equal area, one over the barycenter of each percentile-bounded rectangle. Anything with unit area would work (rectangles, triangles, Gaussians), but because it is smooth, we will select a unit-area Gaussian as our kernel:</p>
<script type="math/tex; mode=display">g(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right) ^2 \right)</script>
<p>where</p>
<script type="math/tex; mode=display">\int_{-\infty}^{\infty} g(x) dx = 1</script>
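<p>As a quick numerical sanity check of the unit-area claim, the sketch below integrates two Gaussians of different widths with a simple Riemann sum. The helper name <code class="highlighter-rouge">gaussian_kernel</code> and the chosen values of <script type="math/tex">\mu</script> and <script type="math/tex">\sigma</script> are arbitrary choices for illustration.</p>

```python
import math
import numpy as np

def gaussian_kernel(x, mu=0.0, sigma=1.0):
    """Unit-area Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

xs_fine = np.linspace(-10.0, 10.0, 20001)   # fine grid covering both kernels
dx = xs_fine[1] - xs_fine[0]

narrow = gaussian_kernel(xs_fine, mu=0.0, sigma=0.25)
wide = gaussian_kernel(xs_fine, mu=1.0, sigma=1.5)

# Riemann-sum approximation of the integral of each kernel
area_narrow = np.sum(narrow) * dx
area_wide = np.sum(wide) * dx
peak_narrow = narrow.max()
peak_wide = wide.max()
print(area_narrow, area_wide, peak_narrow, peak_wide)
```

<p>Both areas come out very close to one, while the narrow kernel is much taller than the wide one.</p>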
<p>Graphically, the intuition looks like this:</p>
<p class="center"><img src="fig-04-smoothing.jpg" alt="fig-04-smoothing.jpg" title="Smoothing whiteboard. " /></p>
<p>As you can see in the figure above, the key intuition here is to scale the standard deviation <script type="math/tex">\sigma</script> according to the bin width. Short, wide bins produce short, wide unit gaussians. The code to do this is fairly straightforward:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># Halfway between each two quantiles, put a unit-area gaussian kernel with a stddev (sigma) equal</span>
<span class="c"># to half the distance betwneen the two quantiles, scaled by smoothing factor K. </span>
<span class="k">def</span> <span class="nf">gaussian</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="p">(</span><span class="n">sigma</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="p">)))</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="mf">0.5</span> <span class="o">*</span> <span class="p">((</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span><span class="o">/</span><span class="n">sigma</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">1.0</span><span class="p">):</span>
<span class="s">'''Returns a smoothed PDF estimate'''</span>
<span class="n">num_quantiles</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantiles</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full_like</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># Initialize "hi-rez" PDF to be zero</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_quantiles</span><span class="p">):</span>
<span class="n">p0</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">p1</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">K</span><span class="o">*</span><span class="p">(</span><span class="n">p1</span> <span class="o">-</span> <span class="n">p0</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">mu</span> <span class="o">=</span> <span class="p">(</span><span class="n">p1</span> <span class="o">+</span> <span class="n">p0</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">xs</span><span class="p">)):</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">xs</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="p">(</span><span class="n">mu</span> <span class="o">-</span> <span class="mi">5</span><span class="o">*</span><span class="n">sigma</span><span class="p">))</span> <span class="ow">or</span> <span class="p">(</span><span class="n">x</span> <span class="o">></span> <span class="p">(</span><span class="n">mu</span> <span class="o">+</span> <span class="mi">5</span><span class="o">*</span><span class="n">sigma</span><span class="p">)):</span>
<span class="k">continue</span> <span class="c"># Skip because it's >5 stddevs away</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">gaussian</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">pdf</span> <span class="o">/</span> <span class="p">(</span><span class="n">num_quantiles</span><span class="p">)</span> <span class="c"># Rescale to unit area</span>
<span class="k">return</span> <span class="n">pdf</span>
<span class="c"># Compare the smoothed estimates (K=1 and K=3) with the true PDF and the histogram</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">1.0</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed Quantile-Estimated PDF (K=1)"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed Quantile-Estimated PDF (K=3)"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"purple"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-04-smoothed-quantiles.png" alt="fig-04-smoothed-quantiles.png" title="Smoothed quantiles. " /></p>
<p>This approach to smoothing is intuitively attractive, and to my eye the <code class="highlighter-rouge">K=3</code> smoothing level looks very pleasant. Using <script type="math/tex">K>1.0</script> is basically expressing that we have some measurement uncertainty of our <script type="math/tex">x</script>’s that we would like smoothed out.</p>
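<p>As an aside, the double loop in <code class="highlighter-rouge">smoothed_pdf</code> can probably be collapsed with NumPy broadcasting. The following is only a minimal sketch of the same idea (one Gaussian per quantile interval, centered on the interval midpoint, with a width of <code class="highlighter-rouge">K</code> times half the interval). The name <code class="highlighter-rouge">smoothed_pdf_vectorized</code> and the demo data are assumptions for illustration, and unlike the loop version it does not skip points more than five standard deviations away, so the tails may differ very slightly:</p>

```python
import numpy as np

def smoothed_pdf_vectorized(xs, quantiles, K=1.0):
    """Sum one Gaussian per quantile interval, evaluated on the grid xs."""
    xs = np.asarray(xs, dtype=float)
    q = np.asarray(quantiles, dtype=float)
    mu = (q[1:] + q[:-1]) / 2.0           # Interval midpoints
    sigma = K * (q[1:] - q[:-1]) / 2.0    # Kernel widths (assumes strictly increasing quantiles)
    z = (xs[:, None] - mu[None, :]) / sigma[None, :]
    kernels = np.exp(-0.5 * z**2) / (sigma[None, :] * np.sqrt(2.0 * np.pi))
    # Every interval holds the same probability mass, so a plain mean rescales to unit area
    return kernels.sum(axis=1) / len(mu)

# Hypothetical demo data (not the samples used elsewhere in this post)
rng = np.random.default_rng(0)
demo_sample = rng.normal(size=5000)
demo_quantiles = np.quantile(demo_sample, np.linspace(0.0, 1.0, 21))
demo_xs = np.linspace(-8.0, 8.0, 3201)
demo_pdf = smoothed_pdf_vectorized(demo_xs, demo_quantiles, K=1.0)
area = float(np.sum(demo_pdf) * (demo_xs[1] - demo_xs[0]))  # Riemann-sum check of the area
```

<p>The intermediate array has shape <code class="highlighter-rouge">(len(xs), num_quantiles)</code>, so this trades memory for speed; for very fine grids the loop with its five-sigma cutoff may still be the better choice.</p>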
<p>But have we really solved the right problem? While we know the quantiles of the sampled data exactly, we are uncertain about the true quantiles of the underlying distribution. What if, instead of using a Gaussian smoothing kernel, we computed the uncertainty of our percentile estimates and generated an estimate based on that instead?</p>
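<p>For a rough sense of scale, the standard large-sample approximation for the standard error of a sample quantile at level <code class="highlighter-rouge">p</code> is <code class="highlighter-rouge">sqrt(p*(1-p)/n) / f(q_p)</code>, where <code class="highlighter-rouge">f</code> is the density at that quantile. The sketch below plugs in a finite-difference estimate of <code class="highlighter-rouge">f</code> taken from the quantiles themselves. The function name, the assumption of equally spaced quantile levels, and the demo data are all my own for illustration, and the approximation is not meaningful at the 0th and 100th percentiles:</p>

```python
import numpy as np

def quantile_standard_errors(quantiles, n_samps):
    """Asymptotic standard error of each quantile, using a finite-difference density."""
    q = np.asarray(quantiles, dtype=float)
    levels = np.linspace(0.0, 1.0, len(q))   # Assumes equally spaced quantile levels
    density = np.gradient(levels, q)         # f(q_p) is roughly dp/dq
    return np.sqrt(levels * (1.0 - levels) / n_samps) / density

# Hypothetical demo: deciles of 5000 standard-normal draws
rng = np.random.default_rng(0)
demo_n = 5000
demo_deciles = np.quantile(rng.normal(size=demo_n), np.linspace(0.0, 1.0, 11))
demo_se = quantile_standard_errors(demo_deciles, demo_n)
```

<p>Because the density estimate is inversely proportional to the local spacing of the quantiles, the resulting uncertainty grows with the width of the surrounding intervals, which is at least suggestive of why interval-scaled kernels are a reasonable starting point.</p>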
<p>One way to explore the uncertainty of a sample is the classic bootstrap technique. The following code generates bootstraps (from the quantile-estimated PDF, not from the original distribution!); we can then use the resulting uncertainty to scale the Gaussian kernel widths:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">quantile_bootstrap</span><span class="p">(</span><span class="n">quantiles</span><span class="p">,</span> <span class="n">num_points_to_sample</span><span class="p">):</span>
<span class="s">'''Returns a new set of data points bootstrapped from the quantiles.'''</span>
<span class="n">N</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantiles</span><span class="p">)</span>
<span class="n">samps</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">num_points_to_sample</span><span class="p">)</span>
<span class="n">bootstrap</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">(</span><span class="n">num_points_to_sample</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">samps</span><span class="p">):</span>
<span class="n">i_bot</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="n">i_top</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="n">p_bot</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i_bot</span><span class="p">]</span>
<span class="n">p_top</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i_top</span><span class="p">]</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="o">-</span> <span class="n">i_bot</span><span class="p">)</span>
<span class="n">bootstrap</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">p_bot</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">alpha</span><span class="p">)</span> <span class="o">+</span> <span class="n">p_top</span><span class="o">*</span><span class="p">(</span><span class="n">alpha</span><span class="p">)</span> <span class="c"># Linearly interpolate</span>
<span class="k">return</span> <span class="n">bootstrap</span>
<span class="k">def</span> <span class="nf">bootstrapped_quantile_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">n_samps</span><span class="p">,</span> <span class="n">n_bootstraps</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">confidence_interval</span><span class="o">=</span><span class="mf">0.95</span><span class="p">):</span>
<span class="s">'''Returns a smoothed PDF estimate that uses bootstraps of samples from the quantiles.'''</span>
<span class="n">num_quantiles</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantiles</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full_like</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># Initialize "hi-rez" PDF to be zero</span>
<span class="n">bootstraps</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full</span><span class="p">((</span><span class="n">n_bootstraps</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="mf">0.0</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_bootstraps</span><span class="p">):</span>
<span class="n">bs</span> <span class="o">=</span> <span class="n">quantile_bootstrap</span><span class="p">(</span><span class="n">quantiles</span><span class="p">,</span> <span class="n">n_samps</span><span class="p">)</span>
<span class="n">bootstraps</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">bs</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="n">bs</span> <span class="o">=</span> <span class="n">bootstraps</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bs</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">n_bootstraps</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantiles</span><span class="p">))</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">xs</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="c"># Count number of values in range:</span>
<span class="n">x0</span> <span class="o">=</span> <span class="n">xs</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
<span class="n">x1</span> <span class="o">=</span> <span class="n">xs</span><span class="p">[</span><span class="n">j</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="n">dx</span> <span class="o">=</span> <span class="n">x1</span> <span class="o">-</span> <span class="n">x0</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">x0</span> <span class="o"><</span> <span class="n">bs</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">bs</span> <span class="o"><</span> <span class="n">x1</span><span class="p">))</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span> <span class="o">/</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">bs</span><span class="p">)</span> <span class="o">*</span> <span class="n">dx</span><span class="p">)</span>
<span class="k">return</span> <span class="n">pdf</span>
<span class="c"># Compare the bootstrapped estimate with the K=1 smoothed estimate, the true PDF, and the histogram</span>
<span class="n">xs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="n">quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">bootstrapped_quantile_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">n_samps</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Bootstrapped PDF of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"green"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">1.0</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed PDF Estimate (K=1)"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-04-bootstrapped-quantiles.png" alt="fig-04-bootstrapped-quantiles.png" title="Bootstrapped quantiles. " /></p>
<p>Huh! That’s surprising. If I did that correctly (and I am not 100% sure that I did), it appears that the uncertainty about each percentile is indeed very nearly Gaussian, since the bootstrapped estimate so closely resembles the <code class="highlighter-rouge">K=1</code> case. There is some small variation around the <script type="math/tex">x=-1.0</script> and <script type="math/tex">x=2.0</script> points, but otherwise it appears that we can just stick with simple Gaussian kernel smoothing. Arguably, it would be nice to correct those edge effects, so let’s do that now:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">gaussian</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="p">(</span><span class="n">sigma</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="p">)))</span> <span class="o">*</span> <span class="n">math</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="mf">0.5</span> <span class="o">*</span> <span class="p">((</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span><span class="o">/</span><span class="n">sigma</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">1.0</span><span class="p">):</span>
<span class="s">'''Returns a smoothed PDF estimate of the quantiles.'''</span>
<span class="n">num_quantiles</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantiles</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">full_like</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># Initialize "hi-rez" PDF to be zero</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_quantiles</span><span class="p">):</span>
<span class="n">p0</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">p1</span> <span class="o">=</span> <span class="n">quantiles</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">K</span><span class="o">*</span><span class="p">(</span><span class="n">p1</span> <span class="o">-</span> <span class="n">p0</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">mu</span> <span class="o">=</span> <span class="p">(</span><span class="n">p1</span> <span class="o">+</span> <span class="n">p0</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">xs</span><span class="p">)):</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">xs</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o"><</span> <span class="p">(</span><span class="n">mu</span> <span class="o">-</span> <span class="mi">5</span><span class="o">*</span><span class="n">sigma</span><span class="p">))</span> <span class="ow">or</span> <span class="p">(</span><span class="n">x</span> <span class="o">></span> <span class="p">(</span><span class="n">mu</span> <span class="o">+</span> <span class="mi">5</span><span class="o">*</span><span class="n">sigma</span><span class="p">)):</span>
<span class="k">continue</span> <span class="c"># Skip because it's very small</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">gaussian</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="c"># Wrap left side</span>
<span class="n">idxs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argwhere</span><span class="p">(</span><span class="n">xs</span> <span class="o"><</span> <span class="n">quantiles</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">flipped_idxs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">flip</span><span class="p">(</span><span class="n">idxs</span><span class="p">)</span>
<span class="n">j</span> <span class="o">=</span> <span class="n">flipped_idxs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">flipped_idxs</span><span class="p">:</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">pdf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">j</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="c"># Wrap right side</span>
<span class="n">idxs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argwhere</span><span class="p">(</span><span class="n">xs</span> <span class="o">></span> <span class="n">quantiles</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">j</span> <span class="o">=</span> <span class="n">idxs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">idxs</span><span class="p">:</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">+=</span> <span class="n">pdf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">pdf</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">j</span> <span class="o">-=</span> <span class="mi">1</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">pdf</span> <span class="o">/</span> <span class="p">(</span><span class="n">num_quantiles</span><span class="p">)</span> <span class="c"># Rescale to unit area</span>
<span class="k">return</span> <span class="n">pdf</span>
<span class="c"># Re-plot the edge-corrected estimates against the true PDF and the histogram</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">1.0</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed Quantile-Estimated PDF (K=1)"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed Quantile-Estimated PDF (K=3)"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"green"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-05-quantiles-v-hist-revised.png" alt="fig-05-quantiles-v-hist-revised.png" title="Quantiles and Histogram estimates of PDF compared." /></p>
<p>That looks nicer to my eye.</p>
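<p>The two index-mirroring loops in the revised <code class="highlighter-rouge">smoothed_pdf</code> fold any density that leaks past the first and last quantile back inside the range. The same folding can be sketched with slices. This is only an illustration: the name <code class="highlighter-rouge">reflect_edges</code> is hypothetical, it assumes <code class="highlighter-rouge">xs</code> is sorted and evenly spaced, and it silently drops mass if a tail contains more grid points than there is room to mirror into:</p>

```python
import numpy as np

def reflect_edges(xs, pdf, lo, hi):
    """Fold density outside [lo, hi] back inside by mirroring grid indices."""
    xs = np.asarray(xs, dtype=float)
    out = np.array(pdf, dtype=float)         # Work on a copy
    left = np.flatnonzero(xs < lo)
    if left.size:
        k = left[-1] + 1                     # First index at or beyond lo
        n = min(left.size, out.size - k)     # Guard against running off the grid
        out[k:k + n] += out[k - 1::-1][:n]   # Index k-1-m is added to index k+m
        out[:k] = 0.0
    right = np.flatnonzero(xs > hi)
    if right.size:
        k = right[0]                         # First index beyond hi
        n = min(right.size, k)
        out[k - n:k] += out[k:k + n][::-1]   # Index k+m is added to index k-1-m
        out[k:] = 0.0
    return out

# Hypothetical demo: fold the tails of a standard normal density into [-1.005, 1.505]
demo_grid = np.linspace(-3.0, 3.0, 601)
demo_density = np.exp(-0.5 * demo_grid**2) / np.sqrt(2.0 * np.pi)
lo, hi = -1.005, 1.505
folded = reflect_edges(demo_grid, demo_density, lo, hi)
```

<p>Since the folding only moves values between grid cells, the total area is unchanged whenever both tails fit inside the interior, which is a cheap sanity check to run after the correction.</p>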
<h2 id="combining-quantiles">Combining Quantiles</h2>
<p>Let us now imagine that we want to combine samples A and samples B into a new estimate. How would we achieve this?</p>
<p>Besides just combining two sets of samples into a single one and creating a new PDF estimate from scratch, I can think of two ways we could combine them:</p>
<ol>
<li>Take the (weighted) mean of all the estimated PDFs.</li>
<li>Combine each set of quantiles into a single set of quantiles.</li>
</ol>
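<p>The first option is simple enough to sketch in a few lines. The version below assumes every estimated PDF has been evaluated on the same grid and weights each one by its sample count; <code class="highlighter-rouge">combine_pdfs</code> and the two demo densities are hypothetical names and data used only for illustration:</p>

```python
import numpy as np

def combine_pdfs(list_of_pdfs, list_of_n_samps):
    """Sample-size-weighted mean of PDFs that share one evaluation grid."""
    pdfs = np.asarray(list_of_pdfs, dtype=float)       # Shape: (num_estimates, len(xs))
    weights = np.asarray(list_of_n_samps, dtype=float)
    return np.average(pdfs, axis=0, weights=weights)

def normal_pdf(x, mu, sigma):
    """Gaussian density, used here only to build demo inputs."""
    return np.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical demo: two estimates built from 1000 and 3000 samples
grid = np.linspace(-6.0, 8.0, 1401)
pdf_a = normal_pdf(grid, 0.0, 1.0)
pdf_b = normal_pdf(grid, 2.0, 1.0)
combined = combine_pdfs([pdf_a, pdf_b], [1000, 3000])
combined_area = float(np.sum(combined) * (grid[1] - grid[0]))
```

<p>A weighted mean of unit-area curves is itself unit-area, so no extra rescaling step should be needed with this approach.</p>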
<p>Let’s explore both of those:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">gaussian_cdf</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">return</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">scipy</span><span class="o">.</span><span class="n">special</span><span class="o">.</span><span class="n">erf</span><span class="p">((</span><span class="n">x</span><span class="o">-</span><span class="n">mu</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">sigma</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span><span class="p">))))</span>
<span class="k">def</span> <span class="nf">flatten_lists</span><span class="p">(</span><span class="n">l</span><span class="p">):</span>
<span class="k">return</span> <span class="p">[</span><span class="n">item</span> <span class="k">for</span> <span class="n">sublist</span> <span class="ow">in</span> <span class="n">l</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">sublist</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">combine_quantiles</span><span class="p">(</span><span class="n">list_of_quantiles</span><span class="p">,</span> <span class="n">list_of_n_samps</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">):</span>
    <span class="s">'''Combines a list of quantiles of equal length into a single quantile set of length num_quantiles.'''</span>
    <span class="n">N_tot</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">list_of_n_samps</span><span class="p">)</span>
    <span class="n">N_qs</span> <span class="o">=</span> <span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">qs</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">for</span> <span class="n">qs</span> <span class="ow">in</span> <span class="n">list_of_quantiles</span><span class="p">]</span>
    <span class="n">per_element_weights</span> <span class="o">=</span> <span class="p">[</span><span class="n">N</span> <span class="o">/</span> <span class="p">(</span><span class="n">N_tot</span> <span class="o">*</span> <span class="n">N_q</span><span class="p">)</span> <span class="k">for</span> <span class="n">N</span><span class="p">,</span> <span class="n">N_q</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">list_of_n_samps</span><span class="p">,</span> <span class="n">N_qs</span><span class="p">)]</span>
    <span class="c"># Put all the values (except the 0th percentile) into sorted_quantiles, with weights</span>
    <span class="n">q_n_pairs</span> <span class="o">=</span> <span class="p">[[(</span><span class="n">q</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="k">for</span> <span class="n">q</span> <span class="ow">in</span> <span class="n">qs</span><span class="p">[</span><span class="mi">1</span><span class="p">:]]</span> <span class="k">for</span> <span class="n">qs</span><span class="p">,</span> <span class="n">w</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">list_of_quantiles</span><span class="p">,</span> <span class="n">per_element_weights</span><span class="p">)]</span>
    <span class="n">sorted_quantiles</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">flatten_lists</span><span class="p">(</span><span class="n">q_n_pairs</span><span class="p">),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">tup</span><span class="p">:</span> <span class="n">tup</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="c"># For every new quantile...</span>
    <span class="n">xp</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">flatten_lists</span><span class="p">(</span><span class="n">list_of_quantiles</span><span class="p">))</span>
    <span class="n">combined</span> <span class="o">=</span> <span class="p">[</span><span class="n">xp</span><span class="p">]</span>
    <span class="c"># Uncomment for a double check:</span>
    <span class="c"># print("This should be very close to 1.0: ", sum([w for q, w in sorted_quantiles]))</span>
    <span class="k">for</span> <span class="n">y_des</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)[</span><span class="mi">1</span><span class="p">:]:</span>
        <span class="c"># ...find the stairstepped point just before y_des...</span>
        <span class="n">y_stairs</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">q</span><span class="p">,</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">sorted_quantiles</span><span class="p">:</span> <span class="c"># TODO: Start search where left off</span>
            <span class="n">y_next</span> <span class="o">=</span> <span class="p">(</span><span class="n">y_stairs</span> <span class="o">+</span> <span class="n">w</span><span class="p">)</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">y_next</span> <span class="o"><=</span> <span class="n">y_des</span><span class="p">):</span>
                <span class="n">xp</span> <span class="o">=</span> <span class="n">q</span>
                <span class="n">y_stairs</span> <span class="o">=</span> <span class="n">y_next</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">break</span>
        <span class="c"># ...find the slope of the line that combines all other ramps after X</span>
        <span class="c"># y1 = A1 x + b1 # Find b1 such that y1 = A1 (x - q0)</span>
        <span class="c"># y2 = A2 x + b2 # Find b2 such that y2 = A2 </span>
        <span class="c"># ----------------------</span>
        <span class="c"># y = (A1+A2)x + (b1+b2)</span>
        <span class="c"># x = (y - (b1+b2)) / (A1+A2)</span>
        <span class="c"># To find the b1, b2, etc, we'll use the fact that at x = 0, y = (q0 - xp)</span>
        <span class="c"># y = A x + b </span>
        <span class="c"># q0-xp = A 0 + b</span>
        <span class="c"># b = q0-xp</span>
        <span class="n">slope</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">qs</span><span class="p">,</span> <span class="n">w</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">list_of_quantiles</span><span class="p">,</span> <span class="n">per_element_weights</span><span class="p">):</span> <span class="c">## TODO: Search where left off</span>
            <span class="c"># Find the index of the first quantile strictly greater than xp (qs must be a NumPy array)</span>
            <span class="n">i</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">qs</span><span class="o">></span><span class="n">xp</span><span class="p">)</span> <span class="c"># qs[i-1] is then the last quantile at or below xp</span>
            <span class="n">q0</span> <span class="o">=</span> <span class="n">qs</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">q1</span> <span class="o">=</span> <span class="n">qs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
            <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">q1</span> <span class="o">-</span> <span class="n">q0</span><span class="p">)</span> <span class="o">/</span> <span class="n">w</span>
            <span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="n">q0</span> <span class="o">-</span> <span class="n">xp</span><span class="p">)</span>
            <span class="n">slope</span> <span class="o">+=</span> <span class="n">a</span>
            <span class="n">offset</span> <span class="o">+=</span> <span class="n">b</span>
        <span class="c"># Now invert the cumulative set of lines to get x_des</span>
        <span class="n">x_des</span> <span class="o">=</span> <span class="p">((</span><span class="n">y_des</span> <span class="o">-</span> <span class="n">y_stairs</span><span class="p">)</span> <span class="o">-</span> <span class="n">offset</span><span class="p">)</span> <span class="o">/</span> <span class="n">slope</span> <span class="c"># x = (y - (b1+b2)) / (A1+A2)</span>
        <span class="c"># print("{:5f} + {:5f} = {:5f}".format(xp, x_des, xp+x_des))</span>
        <span class="n">combined</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">xp</span><span class="o">+</span><span class="n">x_des</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">combined</span>
<span class="k">def</span> <span class="nf">rastered_combination</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">list_of_quantiles</span><span class="p">):</span>
    <span class="n">pdf</span> <span class="o">=</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">list_of_quantiles</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">for</span> <span class="n">quantiles</span> <span class="ow">in</span> <span class="n">list_of_quantiles</span><span class="p">[</span><span class="mi">1</span><span class="p">:]:</span>
        <span class="n">pdf</span> <span class="o">+=</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">)</span>
    <span class="n">pdf</span> <span class="o">=</span> <span class="n">pdf</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">list_of_quantiles</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">pdf</span>
<span class="n">A_quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="n">B_quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="n">AB_quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">]),</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="n">combined_quantiles</span> <span class="o">=</span> <span class="n">combine_quantiles</span><span class="p">([</span><span class="n">A_quantiles</span><span class="p">,</span> <span class="n">B_quantiles</span><span class="p">],</span> <span class="p">[</span><span class="n">n_samps</span><span class="p">,</span> <span class="n">n_samps</span><span class="p">],</span> <span class="n">num_quantiles</span><span class="p">)</span>
<span class="c">#print(combined_quantiles)</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">A_quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Dataset A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">B_quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Dataset B"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"yellow"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">AB_quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Dataset AB"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"green"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">rastered_combination</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="p">[</span><span class="n">A_quantiles</span><span class="p">,</span> <span class="n">B_quantiles</span><span class="p">]),</span> <span class="n">label</span><span class="o">=</span><span class="s">"A+B Estimate"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">combined_quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Combined Estimate"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"purple"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram of A"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre>
</div>
<p class="center"><img src="fig-05-combinations.png" alt="fig-05-combinations.png" title="Combining data sets A and B." /></p>
<p>Apologies for the busy plot, but there’s a lot to see here, and it may be worth studying further. I’m not sure which combination method should be used in general – it may depend on your application.</p>
<h2 id="what-happens-as-you-get-more-data">What happens as you get more data?</h2>
<p>Rather than do much math on this one, let us simply keep the number of histogram bins and quantiles constant at 50, increase the number of data points, and see what happens.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">n</span> <span class="o">=</span> <span class="mi">200</span>
<span class="n">num_bins</span> <span class="o">=</span> <span class="mi">50</span>
<span class="k">while</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">20000</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">sample_from_pdf</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
    <span class="n">quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">)</span>
    <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Quantile-Estimated PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s">'{} Data Points, {} Bins (or Quantiles)'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">))</span>
    <span class="n">n</span> <span class="o">=</span> <span class="n">n</span><span class="o">*</span><span class="mi">2</span>
</code></pre>
</div>
<p class="center"><img src="fig-06-200-samps.png" alt="fig-06-200-samps.png" title="200 data points." /></p>
<p class="center"><img src="fig-06-400-samps.png" alt="fig-06-400-samps.png" title="400 data points." /></p>
<p class="center"><img src="fig-06-800-samps.png" alt="fig-06-800-samps.png" title="800 data points." /></p>
<p class="center"><img src="fig-06-1600-samps.png" alt="fig-06-1600-samps.png" title="1600 data points." /></p>
<p class="center"><img src="fig-06-3200-samps.png" alt="fig-06-3200-samps.png" title="3200 data points." /></p>
<p class="center"><img src="fig-06-6400-samps.png" alt="fig-06-6400-samps.png" title="6400 data points." /></p>
<h2 id="what-happens-if-we-try-fewer-and-fewer-bins">What happens if we try fewer and fewer bins?</h2>
<p>This is another easy one to try:</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">n</span> <span class="o">=</span> <span class="mi">1600</span>
<span class="n">num_bins</span> <span class="o">=</span> <span class="mi">50</span>
<span class="k">while</span> <span class="n">num_bins</span> <span class="o">></span> <span class="mi">5</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">sample_from_pdf</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
    <span class="n">quantiles</span> <span class="o">=</span> <span class="n">calc_quantiles</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">)</span>
    <span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">calc_true_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'gray'</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">smoothed_pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">quantiles</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Quantile-Estimated PDF"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">histtype</span><span class="o">=</span><span class="s">'step'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Histogram"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">1.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s">'{} Data Points, {} Bins (or Quantiles)'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">num_bins</span><span class="p">))</span>
    <span class="n">num_bins</span> <span class="o">=</span> <span class="n">num_bins</span> <span class="o">-</span> <span class="mi">10</span>
</code></pre>
</div>
<p class="center"><img src="fig-07-50-bins.png" alt="fig-07-50-bins.png" title="50 bins." /></p>
<p class="center"><img src="fig-07-40-bins.png" alt="fig-07-40-bins.png" title="40 bins." /></p>
<p class="center"><img src="fig-07-30-bins.png" alt="fig-07-30-bins.png" title="30 bins." /></p>
<p class="center"><img src="fig-07-20-bins.png" alt="fig-07-20-bins.png" title="20 bins." /></p>
<p class="center"><img src="fig-07-10-bins.png" alt="fig-07-10-bins.png" title="10 bins." /></p>
<h2 id="expressing-uncertainty-and-confidence-interval">Expressing Uncertainty and Confidence Interval</h2>
<p>I haven’t tried making confidence intervals for the histogram yet, but my guess is that we could express uncertainty by taking a lot of bootstrap resamples, generating a CDF from each one, and then taking the 95th and 5th percentile values at each X coordinate.</p>
<p>TODO.</p>
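<p>Until that TODO is filled in properly, here is a minimal sketch of what the bootstrap idea might look like. It is only a guess at the approach described above, not tested against the plots in this post: the function name <code class="highlighter-rouge">bootstrap_cdf_band</code>, its parameters, and the choice of 500 resamples are all assumptions made for illustration.</p>

```python
import numpy as np

def bootstrap_cdf_band(data, xs, n_boot=500, lower_pct=5.0, upper_pct=95.0, seed=0):
    """Hypothetical helper: pointwise percentile band for the empirical CDF.

    Resamples `data` with replacement `n_boot` times, evaluates the empirical
    CDF of every resample at each coordinate in `xs`, and returns the
    `lower_pct` and `upper_pct` percentiles of those CDF values at each x.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    xs = np.asarray(xs, dtype=float)
    cdfs = np.empty((n_boot, xs.size))
    for i in range(n_boot):
        resample = np.sort(rng.choice(data, size=data.size, replace=True))
        # Empirical CDF at x = fraction of resampled points that are <= x
        cdfs[i] = np.searchsorted(resample, xs, side="right") / data.size
    lower = np.percentile(cdfs, lower_pct, axis=0)
    upper = np.percentile(cdfs, upper_pct, axis=0)
    return lower, upper

# Usage with synthetic data (a stand-in for the samples used in this post)
demo_data = np.random.default_rng(1).normal(size=500)
demo_xs = np.linspace(-3.0, 3.0, 25)
demo_lower, demo_upper = bootstrap_cdf_band(demo_data, demo_xs)
```

<p>The band is computed independently at each X coordinate, so it is a pointwise band rather than a simultaneous one. Whether that is the right notion of uncertainty here is an open question.</p>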
<h2 id="conclusion">Conclusion</h2>
<p>The difference between the histogram and quantile technique is primarily in how information is used. In histograms, we give each bin an equal amount of information (i.e. one number per bin), regardless of how many points fall into it. In quantiles, we give each fraction of data the same amount of information (i.e. about one number per percentile).</p>
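<p>To make that contrast concrete, here is a small sketch that bins the same sample both ways. It calls <code class="highlighter-rouge">np.histogram</code> and <code class="highlighter-rouge">np.quantile</code> directly rather than the helper functions used earlier in this post, and the name <code class="highlighter-rouge">compare_binning</code> and the use of <code class="highlighter-rouge">num_bins + 1</code> quantile edges are assumptions for illustration.</p>

```python
import numpy as np

def compare_binning(data, num_bins):
    """Hypothetical helper: equal-width histogram bins vs. equal-mass quantile bins."""
    data = np.asarray(data, dtype=float)
    # Histogram: bin edges are evenly spaced in x, so the counts vary.
    hist_counts, hist_edges = np.histogram(data, bins=num_bins)
    # Quantiles: every bin holds about the same fraction of the data, so the widths vary.
    quantile_edges = np.quantile(data, np.linspace(0.0, 1.0, num_bins + 1))
    quantile_counts, _ = np.histogram(data, bins=quantile_edges)
    # Piecewise-constant density estimate: (fraction per bin) / (bin width).
    quantile_density = (1.0 / num_bins) / np.diff(quantile_edges)
    return hist_counts, hist_edges, quantile_counts, quantile_edges, quantile_density

sample = np.random.default_rng(0).normal(size=1000)
hist_counts, hist_edges, quantile_counts, quantile_edges, quantile_density = compare_binning(sample, 10)
print("histogram counts:", hist_counts)
print("quantile counts: ", quantile_counts)
print("quantile widths: ", np.round(np.diff(quantile_edges), 3))
```

<p>The histogram counts should be lopsided and the quantile counts nearly equal, with the quantile bins narrow where the data is dense and wide in the tails.</p>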
<p>The quantile method has a few advantages:</p>
<ol>
<li>It better captures the structure of the distribution in high-density areas.</li>
<li>It allows variable-density smoothing.</li>
<li>It can help reduce the visual effect of outliers.</li>
</ol>
<p>The quantile method also has disadvantages:</p>
<ol>
<li>It cannot represent truly disconnected distributions.</li>
<li>Visually, even though it is technically better, it can be misleading – the relatively higher density of control points compared to the histogram means that quantile-based estimates of the PDF can have more visible variation.</li>
</ol>
<p>If this looks correct to y’all, then perhaps we could convert this code into a class or library that could be easily packaged and re-used by others?</p>
<h2 id="references">References</h2>
<ol>
<li><a href="https://suchideas.com/articles/maths/applied/histogram-errors/">https://suchideas.com/articles/maths/applied/histogram-errors/</a></li>
<li><a href="https://arxiv.org/pdf/1112.2593v3.pdf">https://arxiv.org/pdf/1112.2593v3.pdf</a></li>
<li><a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">Bootstrap Hypothesis Testing</a></li>
</ol>
Jupyter Quick Install for Scientists
2019-03-11T00:00:00-07:00
/blog/jupyter-quick-install
<p><a href="https://www.python.org">Python</a> is a good language, but its tooling (like <code class="highlighter-rouge">pip</code>, <code class="highlighter-rouge">conda</code>, <code class="highlighter-rouge">setup.py</code>, <code class="highlighter-rouge">setup.cfg</code>, <code class="highlighter-rouge">requirements.txt</code>, and the awkward support for binaries needed for libraries like scipy, keras, etc.) leaves much to be desired. And for most scientists, it’s simply not interesting using all these build tools when you just want to get coding quickly on a new machine.</p>
<p>The simplest, most reliable way I’ve found to get Jupyter installed is to first <a href="https://www.docker.com/get-started">install docker</a>. Then we just run a single command – but before you do so, please replace <code class="highlighter-rouge">/path/to/my/notebooks</code> with the local directory where you have been storing notebooks.</p>
<div class="highlighter-rouge"><pre class="highlight"><code>docker run --rm -p 8888:8888 --mount type=bind,source=/path/to/my/notebooks,target=/home/jovyan/work --name my-py-notebook jupyter/scipy-notebook
</code></pre>
</div>
<p>You’ll see some stuff printed out, but the only thing you care about is the “Token”. Now browse to <a href="http://localhost:8888">http://localhost:8888</a>, and paste the token into the password field. Done!</p>
<p>Because we bind-mounted <code class="highlighter-rouge">/path/to/my/notebooks</code>, your notebooks will be persisted on your own machine, but the docker container itself will be removed once it stops (that is what the <code class="highlighter-rouge">--rm</code> flag does). Stop the container by typing this in a new terminal:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>docker stop my-py-notebook
</code></pre>
</div>
<h2 id="references">References</h2>
<ol>
<li><a href="https://github.com/jupyter/docker-stacks">https://github.com/jupyter/docker-stacks</a></li>
</ol>
Automatic CSV to SQL ETL in Clojure
2019-03-03T00:00:00-08:00
/blog/csv-etl-in-clojure
<p>This is another basic example to help beginners get started in <a href="https://clojure.org">Clojure</a>, which is still my favorite programming language, almost 10 years after I first started using it. Like all languages, Clojure has some warts. But it also has a lot of very powerful and abstract concepts that many average programmers have not seen before, and like most mathematical concepts, the power of such abstractions is greatly underestimated by many people. Even fancy-sounding but easy-to-understand concepts like <a href="https://en.wikipedia.org/wiki/Homoiconicity">homoiconicity</a> are still really underappreciated in the programming community.</p>
<p>Compared with other mathematically-oriented languages like Haskell, Clojure is also arguably a more practical choice for everyday tasks – it’s designed for massive concurrency, and has more libraries than any other language. This is a bold but probably true statement because, in addition to its own libraries, Clojure can use Java libraries on the back-end and JavaScript libraries on the front-end, and those are two of the most popular languages in existence.</p>
<h2 id="tl-dr">TL; DR</h2>
<p>In this article, I’ll present a simple example of setting up a SQL database from a bunch of CSV files. This is called an ETL job (Extract, Transform, Load), which is a really common pattern in software.</p>
<p>Since most ETL jobs like this are quite simple, let’s make a trivial example slightly more realistic by trying to infer the database schema automatically. Let’s also try to make it fast enough to be usable for files with a few million rows in them.</p>
<p>An overview of the approach that we will follow is:</p>
<ol>
<li>
<p><strong>Group CSVs by directory</strong>, such that each directory corresponds to a SQL table that we would like to create.</p>
</li>
<li>
<p><strong>Scan the directory tree</strong> to create a list of the CSV files we want to process.</p>
</li>
<li>
<p><strong>Autodetect the data type of each column</strong> across all files in each directory, and store the schema just outside the directory, so that it can be modified as needed.</p>
</li>
<li>
<p><strong>Create the table from the autodetected schema</strong>.</p>
</li>
<li>
<p><strong>Load the CSV files into SQL tables</strong>.</p>
</li>
</ol>
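<p>Step 3 is the only part of this plan that is not routine, so here is a rough, language-agnostic sketch of the idea, written in Python rather than Clojure to keep it short. It is not the implementation developed in this article: the function names, the three-type ladder (<code class="highlighter-rouge">integer</code>, <code class="highlighter-rouge">double precision</code>, <code class="highlighter-rouge">text</code>), and the rule that empty cells are ignored are all assumptions made for illustration.</p>

```python
import csv
import io

# Hypothetical type ladder: a column may only be widened, never narrowed.
TYPE_RANK = {"integer": 0, "double precision": 1, "text": 2}

def infer_cell_type(value):
    """Return the narrowest type name that can represent one CSV cell."""
    for caster, name in ((int, "integer"), (float, "double precision")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "text"

def infer_schema(csv_file):
    """Scan every row once and keep the widest type seen in each column."""
    reader = csv.reader(csv_file)
    header = next(reader)
    types = ["integer"] * len(header)  # start narrow, widen as needed
    for row in reader:
        for i, cell in enumerate(row[:len(header)]):
            if cell == "":
                continue  # assumption: empty cells are NULLs and carry no type information
            cell_type = infer_cell_type(cell)
            if TYPE_RANK[cell_type] > TYPE_RANK[types[i]]:
                types[i] = cell_type
    return dict(zip(header, types))

# Usage on a tiny in-memory CSV (made-up data)
example = io.StringIO("id,price,name\n1,2.5,apple\n2,3,pear\n")
schema = infer_schema(example)
```

<p>To cover a whole directory, the same widening rule would be applied across every file before the table is created. Note that a column that is empty everywhere stays at the narrowest type in this sketch, which is probably not what you want in practice.</p>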
<p>In past articles, we discussed how to <a href="../loading-csvs-in-clojure">read in CSVs</a> and <a href="../docker-postgres">create a postgres database with docker</a>. I’ll just assume that you have read those already and that you have a Postgres instance running in docker, so that we can get on with the ETL-specific code.</p>
<h2 id="group-csvs-in-directories">1. Group CSVs in Directories</h2>
<p>Let’s download some census data from 2010. The <a href="http://census.ire.org/data/bulkdata.html">census.ire.org</a> website has a handy tool that lets you download census data for each state. Since this is an example, we will only use five states’ worth of data.</p>
<ol>
<li><a href="http://censusdata.ire.org/01/all_060_in_01.P1.csv">Alabama</a></li>
<li><a href="http://censusdata.ire.org/02/all_060_in_02.P1.csv">Alaska</a></li>
<li><a href="http://censusdata.ire.org/04/all_060_in_04.P1.csv">Arizona</a></li>
<li><a href="http://censusdata.ire.org/05/all_060_in_05.P1.csv">Arkansas</a></li>
<li><a href="http://censusdata.ire.org/06/all_060_in_06.P1.csv">California</a></li>
</ol>
<p>Just making one table from a few small CSVs isn’t going to fully demonstrate this example, so let’s also fetch a much bigger dataset…the past 15 years of crime reports from Los Angeles, courtesy of <a href="https://data.gov">data.gov</a>.</p>
<ol>
<li><a href="https://data.lacity.org/api/views/y8tr-7khq/rows.csv?accessType=DOWNLOAD">“Crime Data from 2010 to Present”</a></li>
</ol>
<p>Note that this file is MUCH larger than the census data: it is 6.8 million lines long, has about 20 columns, and is 1.5 GB in size. It will be a better benchmark of performance than those short census files.</p>
<p>Now place all the files in a directory tree like this:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>sqlcsv/
├── census
│   ├── all_060_in_01.P1.csv
│   ├── all_060_in_02.P1.csv
│   ├── all_060_in_04.P1.csv
│   ├── all_060_in_05.P1.csv
│   └── all_060_in_06.P1.csv
└── crimes
    └── Crimes_-_2001_to_present.csv
</code></pre>
</div>
<h2 id="scan-directories">2. Scan directories</h2>
<p>This is pretty easy using <code class="highlighter-rouge">file-seq</code>. All we have to do is create a few functions for listing files and subdirectories.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.files</span><span class="w">
</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">java-time</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">jt</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">clojure.java.io</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">jio</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">list-files</span><span class="w">
</span><span class="s">"Lists only the files in the directory string DIR."</span><span class="w">
</span><span class="p">[</span><span class="n">dir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nb">file-seq</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/file</span><span class="w"> </span><span class="n">dir</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">remove</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">.isDirectory</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">%</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">list-subdirectories</span><span class="w">
</span><span class="s">"Lists only the subdirectories of the directory string DIR."</span><span class="w">
</span><span class="p">[</span><span class="n">dir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nb">file-seq</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/file</span><span class="w"> </span><span class="n">dir</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">filter</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">.isDirectory</span><span class="w"> </span><span class="n">%</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">remove</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="n">%</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/file</span><span class="w"> </span><span class="n">dir</span><span class="p">)))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">has-suffix?</span><span class="w">
</span><span class="s">"Works on file object types."</span><span class="w">
</span><span class="p">[</span><span class="w"> </span><span class="o">^</span><span class="n">String</span><span class="w"> </span><span class="n">suffix</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">file</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">and</span><span class="w"> </span><span class="p">(</span><span class="nf">.isFile</span><span class="w"> </span><span class="n">file</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">re-find</span><span class="w"> </span><span class="p">(</span><span class="nb">re-pattern</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">".*\\."</span><span class="w"> </span><span class="n">suffix</span><span class="w"> </span><span class="s">"$"</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nf">.getName</span><span class="w"> </span><span class="n">file</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">list-files-of-type</span><span class="w">
</span><span class="s">"Lists all files in the directory with the extension ext."</span><span class="w">
</span><span class="p">[</span><span class="n">dir</span><span class="w"> </span><span class="n">ext</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nb">file-seq</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/file</span><span class="w"> </span><span class="n">dir</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">filter</span><span class="w"> </span><span class="p">(</span><span class="nb">partial</span><span class="w"> </span><span class="n">has-suffix?</span><span class="w"> </span><span class="n">ext</span><span class="p">))))</span><span class="w">
</span></code></pre>
</div>
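<p>As a rough illustration of how these listings tie back to step 1, the sketch below groups every CSV under a root directory by the name of its parent directory, which is the name we want for the SQL table. The helper <code class="highlighter-rouge">csvs-by-table</code> is a hypothetical name of mine, not part of the code above, and it assumes that directory names are unique across the tree.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(require '[clojure.java.io :as jio])

(defn csvs-by-table
  "Hypothetical helper: returns a map of directory name to a vector of the
   .csv files found directly inside that directory, for every CSV under ROOT."
  [root]
  (->> (file-seq (jio/file root))
       ;; keep only regular files whose name ends in .csv
       (filter (fn [^java.io.File f]
                 (and (.isFile f)
                      (.endsWith (.getName f) ".csv"))))
       ;; key each file by the name of the directory that contains it
       (group-by (fn [^java.io.File f]
                   (.getName (.getParentFile f))))))

;; With the tree from step 1, the keys should be "census" and "crimes":
;; (keys (csvs-by-table "sqlcsv"))
</code></pre>
</div>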
<p>Some string-cleaning utility functions will also come in handy:</p>
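<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.util</span><span class="w">
</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.string</span><span class="p">]))</span><span class="w">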
</span><span class="p">(</span><span class="nf">set!</span><span class="w"> </span><span class="n">*warn-on-reflection*</span><span class="w"> </span><span class="n">true</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">alphanumeric?</span><span class="w">
</span><span class="s">"TRUE when the string is completely alphanumeric."</span><span class="w">
</span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="p">(</span><span class="nb">apply</span><span class="w"> </span><span class="nb">str</span><span class="w"> </span><span class="p">(</span><span class="nb">re-seq</span><span class="w"> </span><span class="o">#</span><span class="s">"[a-z_A-Z0-9]"</span><span class="w"> </span><span class="n">string</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">spaces-to-underscores</span><span class="w">
</span><span class="s">"Converts spaces to underscores."</span><span class="w">
</span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">clojure.string/replace</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="o">#</span><span class="s">"\s"</span><span class="w"> </span><span class="s">"_"</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">periods-to-underscores</span><span class="w">
</span><span class="s">"Converts periods to underscores."</span><span class="w">
</span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">clojure.string/replace</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="o">#</span><span class="s">"\."</span><span class="w"> </span><span class="s">"_"</span><span class="p">))</span><span class="w">
</span></code></pre>
</div>
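<p>To make the intent of these helpers concrete, here is a small self-contained sketch of the same cleaning steps applied to a single column name. The names <code class="highlighter-rouge">clean-column-name</code> and <code class="highlighter-rouge">valid-column-name?</code> and the sample column names are made up for illustration. Note one deliberate difference: this check uses <code class="highlighter-rouge">re-matches</code> with <code class="highlighter-rouge">+</code>, so it rejects the empty string, which <code class="highlighter-rouge">alphanumeric?</code> above accepts.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(require '[clojure.string :as string])

(defn clean-column-name
  "Illustrative only: replaces periods, then whitespace, with underscores."
  [s]
  (-> s
      (string/replace #"\." "_")
      (string/replace #"\s" "_")))

(defn valid-column-name?
  "Illustrative only: true when S is one or more letters, digits or underscores."
  [s]
  (boolean (re-matches #"[a-zA-Z0-9_]+" s)))

(clean-column-name "Crime Code.1")      ;; => "Crime_Code_1"
(valid-column-name? "Crime_Code_1")     ;; => true
(valid-column-name? "Crime Code (old)") ;; => false
</code></pre>
</div>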
<p>We’ll also need some simple date-parsing functions:</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.dates</span><span class="w">
</span><span class="s">"Code for handling strings representing dates and datetimes."</span><span class="w">
</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">java-time</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">jt</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="nf">set!</span><span class="w"> </span><span class="n">*warn-on-reflection*</span><span class="w"> </span><span class="n">true</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">parse-date</span><span class="w">
</span><span class="s">"Parses a standard date, like 2019-02-17."</span><span class="w">
</span><span class="p">[</span><span class="n">s</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">jt/local-date</span><span class="w"> </span><span class="s">"yyyy-MM-dd"</span><span class="w"> </span><span class="n">s</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">parse-datetime</span><span class="w">
</span><span class="s">"Parses the datetime format that Python's pandas usually saves in."</span><span class="w">
</span><span class="p">[</span><span class="n">s</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">jt/local-date-time</span><span class="w"> </span><span class="s">"yyyy-MM-dd HH:mm:ss"</span><span class="w"> </span><span class="n">s</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">local-to-offset</span><span class="w">
</span><span class="s">"Converts a local date time to an offset date time. By default, it assumes
that the local time is UTC, but you may change this with optional arg TZ."</span><span class="w">
</span><span class="p">[</span><span class="n">local-date-time</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="p">[</span><span class="n">tz</span><span class="p">]]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">tz</span><span class="w"> </span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="n">tz</span><span class="w"> </span><span class="s">"UTC"</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">-></span><span class="w"> </span><span class="n">local-date-time</span><span class="w">
</span><span class="p">(</span><span class="nf">jt/zoned-date-time</span><span class="w"> </span><span class="n">tz</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">jt/offset-date-time</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">parse-RFC3339</span><span class="w">
</span><span class="s">"Assuming a UTC datestamp with T and Z separator, for example:
2019-01-17T22:03:16Z
2019-01-17T22:03:16.383Z
2019-01-17T22:03:16.111222333Z"</span><span class="w">
</span><span class="p">[</span><span class="n">s</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">local-to-offset</span><span class="w">
</span><span class="p">(</span><span class="nf">condp</span><span class="w"> </span><span class="nb">=</span><span class="w"> </span><span class="p">(</span><span class="nb">count</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="mi">20</span><span class="w"> </span><span class="p">(</span><span class="nf">jt/local-date-time</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ss'Z'"</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="mi">24</span><span class="w"> </span><span class="p">(</span><span class="nf">jt/local-date-time</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="mi">27</span><span class="w"> </span><span class="p">(</span><span class="nf">jt/local-date-time</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'"</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="mi">30</span><span class="w"> </span><span class="p">(</span><span class="nf">jt/local-date-time</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'"</span><span class="w"> </span><span class="n">s</span><span class="p">))))</span><span class="w">
</span></code></pre>
</div>
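<p>If you would rather not dispatch on string length, one alternative (a different technique from the <code class="highlighter-rouge">parse-RFC3339</code> above, sketched here with plain <code class="highlighter-rouge">java.time</code> interop) is to let <code class="highlighter-rouge">OffsetDateTime/parse</code> handle the optional fractional seconds. The function name <code class="highlighter-rouge">parse-rfc3339-alt</code> is hypothetical. Be aware that it also accepts non-UTC offsets such as <code class="highlighter-rouge">+01:00</code>, which may or may not be what you want.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(import '(java.time OffsetDateTime ZoneOffset))

(defn parse-rfc3339-alt
  "Alternative sketch: parses an ISO-8601 / RFC 3339 timestamp with an offset
   or a trailing Z, with or without fractional seconds. Throws a
   DateTimeParseException when the string does not match."
  [^String s]
  (OffsetDateTime/parse s))

(def parsed (parse-rfc3339-alt "2019-01-17T22:03:16.383Z"))

(.getNano parsed)                      ;; => 383000000
(= ZoneOffset/UTC (.getOffset parsed)) ;; => true
</code></pre>
</div>
<p>Because a failed parse throws an exception, this function can be dropped into the same try-and-catch style of type detection used in the next section.</p>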
<h2 id="autodetect-the-schema">3. Autodetect the Schema</h2>
<p>This is by far the most complex section of the program. For each element, the autodetector tries each of the parsing functions in <code class="highlighter-rouge">*sql-types-and-parsers*</code> in order, and the first one that works is taken as the inferred SQL type. As I experimented with this at the REPL, I realized that testing every SQL parser on every element was prohibitively slow, so I defined <code class="highlighter-rouge">guess-all-sql-types-in-column</code> to reduce the rate of failed tests by remembering which parser last worked for each column.</p>
<p>Another note on optimization: although I initially assumed that I could make the CSV loading faster by scanning only the first N lines of each file, this ended up being error-prone in general, so I relented and allowed it to scan the whole file.</p>
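<p>A toy example of why sampling goes wrong, using made-up data and a deliberately simplistic integer check (the name <code class="highlighter-rouge">integer-string?</code> is hypothetical): a column can look purely numeric for its first thousand rows and still contain a stray text value further down.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(defn integer-string?
  "Illustrative only: true when S looks like a (possibly negative) integer."
  [s]
  (boolean (re-matches #"-?\d+" s)))

;; A made-up column: 1000 integer-looking values followed by one text value.
(def column (concat (repeat 1000 "42") ["N/A"]))

(every? integer-string? (take 100 column)) ;; => true, the sample looks like INTEGER
(every? integer-string? column)            ;; => false, the full column does not
</code></pre>
</div>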
<p>Note that this code only handles integers, floats, dates, datetimes, and strings (text), but you could easily extend it by adding more rows to <code class="highlighter-rouge">*sql-types-and-parsers*</code>. An exception is thrown if the types of a column do not match across files.</p>
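<p>As a minimal sketch of what extending the type table might look like, here is a cut-down, self-contained version with a hypothetical <code class="highlighter-rouge">BOOLEAN</code> row. The names <code class="highlighter-rouge">types-and-parsers</code> and <code class="highlighter-rouge">first-working-type</code> are mine, not part of the real program. The one thing to watch is that a parser signals success by returning a truthy value, so a boolean parser must not return a bare <code class="highlighter-rouge">false</code> for the string "false".</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(def types-and-parsers
  ;; Tried in order; a parser "works" when it returns a truthy value
  ;; without throwing.
  [["INTEGER" #(Integer/parseInt %)]
   ;; Hypothetical extra row: returns the matched string (truthy) or throws.
   ["BOOLEAN" #(or (#{"true" "false"} %)
                   (throw (IllegalArgumentException. "not a boolean")))]
   ["TEXT"    str]])

(defn first-working-type
  "Returns the SQL type name of the first parser that accepts string S."
  [s]
  (some (fn [[sql-type parse-fn]]
          (when (try (parse-fn s)
                     (catch Exception _ false))
            sql-type))
        types-and-parsers))

(first-working-type "17")    ;; => "INTEGER"
(first-working-type "false") ;; => "BOOLEAN"
(first-working-type "maybe") ;; => "TEXT"
</code></pre>
</div>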
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.guess-schema</span><span class="w">
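</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.data.csv</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">clojure.java.io</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">clojure.set</span><span class="p">]</span><span class="w">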
</span><span class="p">[</span><span class="n">net.roboloco.dates</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">dates</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">net.roboloco.files</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">files</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">net.roboloco.util</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">util</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="nf">set!</span><span class="w"> </span><span class="n">*warn-on-reflection*</span><span class="w"> </span><span class="n">true</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">def</span><span class="w"> </span><span class="o">^</span><span class="no">:dynamic</span><span class="w"> </span><span class="n">*sql-types-and-parsers*</span><span class="w">
</span><span class="c1">;; This data structure defines all of the SQL data types, and the appropriate
</span><span class="w"> </span><span class="c1">;; function to use when parsing a string containing that data type.
</span><span class="w"> </span><span class="c1">;; Parsers will be tried in sequential order, and the first one that works is used.
</span><span class="w"> </span><span class="c1">;;
</span><span class="w"> </span><span class="c1">;; SQL String->CLJ Parser
</span><span class="w"> </span><span class="p">[[</span><span class="s">"NULL"</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="p">(</span><span class="nb">nil?</span><span class="w"> </span><span class="n">%</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">%</span><span class="p">))]</span><span class="w">
</span><span class="p">[</span><span class="s">"INTEGER"</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">Integer/parseInt</span><span class="w"> </span><span class="n">%</span><span class="p">)]</span><span class="w">
</span><span class="p">[</span><span class="s">"DOUBLE PRECISION"</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">Float/parseFloat</span><span class="w"> </span><span class="n">%</span><span class="p">)]</span><span class="w">
</span><span class="p">[</span><span class="s">"DATE"</span><span class="w"> </span><span class="n">dates/parse-date</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="s">"TIMESTAMPTZ"</span><span class="w"> </span><span class="n">dates/parse-RFC3339</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="s">"TEXT"</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">%</span><span class="p">)]])</span><span class="w"> </span><span class="c1">;; this is always true, so is the "default" value
</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">guess-sql-parser</span><span class="w">
</span><span class="s">"Given an unknown string, this fn runs through all of the SQL types & parsers in
sql-types-and-parsers and returns the first row with a working parser."</span><span class="w">
</span><span class="p">[</span><span class="n">string</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="p">[</span><span class="n">types-and-parsers</span><span class="w"> </span><span class="n">*sql-types-and-parsers*</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">when-let</span><span class="w"> </span><span class="p">[[</span><span class="n">sql-type</span><span class="w"> </span><span class="n">parse-fn</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">typerow</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">types-and-parsers</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">try</span><span class="w"> </span><span class="p">(</span><span class="nf">parse-fn</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">catch</span><span class="w"> </span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="n">false</span><span class="p">))</span><span class="w">
</span><span class="n">typerow</span><span class="w">
</span><span class="p">(</span><span class="nf">recur</span><span class="w"> </span><span class="p">(</span><span class="nb">next</span><span class="w"> </span><span class="n">types-and-parsers</span><span class="p">))))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">guess-all-sql-types-in-column</span><span class="w">
</span><span class="s">"Like guess-sql-parser, but an optimized version for looking at a whole column.
In practice, this really reduces the number of tests and exceptions trapped
over the simpler but much slower solution:
(set (map (comp first guess-sql-parser) seq-of-strings))"</span><span class="w">
</span><span class="p">[</span><span class="n">seq-of-strings</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="p">[</span><span class="n">strings</span><span class="w"> </span><span class="n">seq-of-strings</span><span class="w">
</span><span class="n">last-successful-parse-fn</span><span class="w"> </span><span class="n">nil</span><span class="w">
</span><span class="n">types-found</span><span class="w"> </span><span class="o">#</span><span class="p">{}]</span><span class="w">
</span><span class="p">(</span><span class="nb">if-let</span><span class="w"> </span><span class="p">[</span><span class="n">string</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">strings</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nf">try</span><span class="w"> </span><span class="p">(</span><span class="nf">last-successful-parse-fn</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">catch</span><span class="w"> </span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="n">false</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">recur</span><span class="w"> </span><span class="p">(</span><span class="nb">next</span><span class="w"> </span><span class="n">strings</span><span class="p">)</span><span class="w"> </span><span class="c1">; Previously successful parser worked again
</span><span class="w"> </span><span class="n">last-successful-parse-fn</span><span class="w">
</span><span class="n">types-found</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">if-let</span><span class="w"> </span><span class="p">[[</span><span class="n">sql-type</span><span class="w"> </span><span class="n">parse-fn</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">guess-sql-parser</span><span class="w"> </span><span class="n">string</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nf">recur</span><span class="w"> </span><span class="p">(</span><span class="nb">next</span><span class="w"> </span><span class="n">strings</span><span class="p">)</span><span class="w"> </span><span class="c1">; A new working parser was found
</span><span class="w"> </span><span class="n">parse-fn</span><span class="w">
</span><span class="p">(</span><span class="nb">conj</span><span class="w"> </span><span class="n">types-found</span><span class="w"> </span><span class="n">sql-type</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">recur</span><span class="w"> </span><span class="p">(</span><span class="nb">next</span><span class="w"> </span><span class="n">strings</span><span class="p">)</span><span class="w"> </span><span class="c1">; No working parser found, move to next string
</span><span class="w"> </span><span class="n">last-successful-parse-fn</span><span class="w">
</span><span class="p">(</span><span class="nb">conj</span><span class="w"> </span><span class="n">types-found</span><span class="w"> </span><span class="n">nil</span><span class="p">))))</span><span class="w">
</span><span class="n">types-found</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">clean-column-names</span><span class="w">
</span><span class="s">"Replaces whitespaces and periods in column names with underscores."</span><span class="w">
</span><span class="p">[</span><span class="n">columns</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="n">columns</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="n">util/periods-to-underscores</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">mapv</span><span class="w"> </span><span class="n">util/spaces-to-underscores</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">guess-csv-column-types</span><span class="w">
</span><span class="s">"Returns a map of column name to the guessed SQL column type. Reads every
row in the CSV, and returns all types found for each column. Works in
parallel and lazily on chunks of 10000 lines, to reduce the time to parse
very large files."</span><span class="w">
</span><span class="p">[</span><span class="n">csv-filepath</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Scanning:"</span><span class="w"> </span><span class="n">csv-filepath</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">with-open</span><span class="w"> </span><span class="p">[</span><span class="n">reader</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/reader</span><span class="w"> </span><span class="n">csv-filepath</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">rows</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.data.csv/read-csv</span><span class="w"> </span><span class="n">reader</span><span class="p">)</span><span class="w">
</span><span class="n">header</span><span class="w"> </span><span class="p">(</span><span class="nf">clean-column-names</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">rows</span><span class="p">))</span><span class="w">
</span><span class="n">data-rows</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="n">rows</span><span class="p">)</span><span class="w">
</span><span class="n">chunk-size</span><span class="w"> </span><span class="mi">10000</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="n">data-rows</span><span class="w">
</span><span class="p">(</span><span class="nf">partition-all</span><span class="w"> </span><span class="n">chunk-size</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">apply</span><span class="w"> </span><span class="nb">map</span><span class="w"> </span><span class="nb">vector</span><span class="w"> </span><span class="n">%</span><span class="p">))</span><span class="w"> </span><span class="c1">;; Convert list of rows into list of columns
</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">pmap</span><span class="w"> </span><span class="n">guess-all-sql-types-in-column</span><span class="w"> </span><span class="n">%</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="n">data</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="n">chunk-size</span><span class="w"> </span><span class="p">(</span><span class="nb">inc</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w"> </span><span class="s">"rows scanned"</span><span class="p">)</span><span class="w"> </span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">range</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">apply</span><span class="w"> </span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="o">&</span><span class="w"> </span><span class="n">args</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nb">reduce</span><span class="w"> </span><span class="n">clojure.set/union</span><span class="w"> </span><span class="n">args</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="nb">vector</span><span class="w"> </span><span class="n">header</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">into</span><span class="w"> </span><span class="p">{})))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">scan-csvdir-and-make-schema</span><span class="w">
</span><span class="s">"Scans every .csv file in CSVDIR, and returns a hashmap
containing the schema of all the columns in the directory.
If a non-alphanumeric string is found, raises an exception.
If the schema is inconsistent, raises an exception."</span><span class="w">
</span><span class="p">[</span><span class="n">csvdir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">csv-schemas</span><span class="w"> </span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nf">files/list-files-of-type</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="s">"csv"</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="n">guess-csv-column-types</span><span class="p">))</span><span class="w">
</span><span class="n">columns</span><span class="w"> </span><span class="p">(</span><span class="nb">set</span><span class="w"> </span><span class="p">(</span><span class="nf">flatten</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="nb">keys</span><span class="w"> </span><span class="n">csv-schemas</span><span class="p">)))</span><span class="w">
</span><span class="n">problematic-columns</span><span class="w"> </span><span class="p">(</span><span class="nb">remove</span><span class="w"> </span><span class="n">util/alphanumeric?</span><span class="w"> </span><span class="n">columns</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">when-not</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">problematic-columns</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">throw</span><span class="w"> </span><span class="p">(</span><span class="nf">Exception.</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">"Non-alphanumeric characters found in column names:"</span><span class="w">
</span><span class="p">(</span><span class="nb">apply</span><span class="w"> </span><span class="nb">str</span><span class="w"> </span><span class="p">(</span><span class="nf">interpose</span><span class="w"> </span><span class="s">", "</span><span class="w"> </span><span class="n">problematic-columns</span><span class="p">))))))</span><span class="w">
</span><span class="p">(</span><span class="nb">into</span><span class="w"> </span><span class="p">{}</span><span class="w"> </span><span class="p">(</span><span class="k">for</span><span class="w"> </span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="n">columns</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">all-types-for-col</span><span class="w"> </span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nf">vec</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">%</span><span class="w"> </span><span class="n">col</span><span class="p">))</span><span class="w"> </span><span class="n">csv-schemas</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">flatten</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">remove</span><span class="w"> </span><span class="nb">nil?</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">set</span><span class="p">))</span><span class="w">
</span><span class="n">nullable-suffix</span><span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">all-types-for-col</span><span class="w"> </span><span class="s">"NULL"</span><span class="p">)</span><span class="w">
</span><span class="s">" NULL"</span><span class="w">
</span><span class="s">""</span><span class="p">)</span><span class="w">
</span><span class="n">types</span><span class="w"> </span><span class="p">(</span><span class="nb">disj</span><span class="w"> </span><span class="n">all-types-for-col</span><span class="w"> </span><span class="s">"NULL"</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">cond</span><span class="w">
</span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">(</span><span class="nb">count</span><span class="w"> </span><span class="n">types</span><span class="p">))</span><span class="w"> </span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="n">nil</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">(</span><span class="nb">count</span><span class="w"> </span><span class="n">types</span><span class="p">))</span><span class="w"> </span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">types</span><span class="p">)</span><span class="w"> </span><span class="n">nullable-suffix</span><span class="p">)]</span><span class="w">
</span><span class="c1">;; If it's mixed integer and float, make everything float
</span><span class="w"> </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="o">#</span><span class="p">{</span><span class="s">"INTEGER"</span><span class="w"> </span><span class="s">"DOUBLE PRECISION"</span><span class="p">}</span><span class="w"> </span><span class="n">types</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">"DOUBLE PRECISION"</span><span class="w"> </span><span class="n">nullable-suffix</span><span class="p">)]</span><span class="w">
</span><span class="c1">;; If the default type of TEXT is in there, choose text
</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="s">"TEXT"</span><span class="p">)</span><span class="w"> </span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">"TEXT"</span><span class="w"> </span><span class="n">nullable-suffix</span><span class="p">)]</span><span class="w">
</span><span class="no">:otherwise</span><span class="w"> </span><span class="c1">;; Otherwise we have some weird error
</span><span class="w"> </span><span class="p">(</span><span class="nf">throw</span><span class="w"> </span><span class="p">(</span><span class="nf">Exception.</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">"Inconsistent types across files for column: "</span><span class="w">
</span><span class="n">col</span><span class="w"> </span><span class="p">(</span><span class="nf">vec</span><span class="w"> </span><span class="n">types</span><span class="p">))))))))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">parse-csv-rows-using-schema</span><span class="w">
</span><span class="s">"Lazily parse CSV-ROWS using the schema."</span><span class="w">
</span><span class="p">[</span><span class="n">schema</span><span class="w"> </span><span class="n">csv-rows</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">header</span><span class="w"> </span><span class="p">(</span><span class="nf">clean-column-names</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">csv-rows</span><span class="p">))</span><span class="w">
</span><span class="n">types</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="n">%</span><span class="p">)</span><span class="w"> </span><span class="n">header</span><span class="p">)</span><span class="w">
</span><span class="n">empty-string-to-nil</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">s</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">and</span><span class="w"> </span><span class="p">(</span><span class="nb">string?</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">s</span><span class="p">))</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="n">s</span><span class="p">))</span><span class="w">
</span><span class="n">raw-rows</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="n">empty-string-to-nil</span><span class="w"> </span><span class="n">%</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="n">csv-rows</span><span class="p">))</span><span class="w">
</span><span class="n">all-parsers</span><span class="w"> </span><span class="p">(</span><span class="nb">into</span><span class="w"> </span><span class="p">{}</span><span class="w"> </span><span class="n">*sql-types-and-parsers*</span><span class="p">)</span><span class="w">
</span><span class="n">row-parsers</span><span class="w"> </span><span class="p">(</span><span class="nf">mapv</span><span class="w"> </span><span class="o">#</span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">all-parsers</span><span class="w"> </span><span class="n">%</span><span class="p">)</span><span class="w"> </span><span class="n">types</span><span class="p">)</span><span class="w">
</span><span class="n">typed-rows</span><span class="w"> </span><span class="p">(</span><span class="k">for</span><span class="w"> </span><span class="p">[</span><span class="n">raw-row</span><span class="w"> </span><span class="n">raw-rows</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">parse-fn</span><span class="w"> </span><span class="n">element</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">when</span><span class="w"> </span><span class="p">(</span><span class="nb">and</span><span class="w"> </span><span class="n">parse-fn</span><span class="w"> </span><span class="p">(</span><span class="nb">not</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">element</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="nf">try</span><span class="w"> </span><span class="p">(</span><span class="nf">parse-fn</span><span class="w"> </span><span class="n">element</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">catch</span><span class="w"> </span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Schema:"</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Header:"</span><span class="w"> </span><span class="n">header</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Raw row:"</span><span class="w"> </span><span class="n">raw-row</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">throw</span><span class="w"> </span><span class="n">e</span><span class="p">)))))</span><span class="w">
</span><span class="n">row-parsers</span><span class="w">
</span><span class="n">raw-row</span><span class="p">))</span><span class="w">
</span><span class="n">cnt</span><span class="w"> </span><span class="p">(</span><span class="nf">atom</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w">
</span><span class="n">chunk-size</span><span class="w"> </span><span class="mi">1000</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">header</span><span class="w"> </span><span class="n">typed-rows</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">table-definition-sql-string</span><span class="w">
</span><span class="s">"Returns a string suitable for creating a SQL table named TABLE-NAME, given
a hashmap SCHEMA of column names to column types. The ENDING-STRING is appended
to the end of the create table statement, if needed. "</span><span class="w">
</span><span class="p">[</span><span class="n">table-name</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="p">[</span><span class="n">ending-string</span><span class="p">]]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">ending-string</span><span class="w"> </span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="n">ending-string</span><span class="w"> </span><span class="s">""</span><span class="p">)</span><span class="w">
</span><span class="n">col-defs</span><span class="w"> </span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="n">schema</span><span class="w">
</span><span class="p">(</span><span class="nb">sort-by</span><span class="w"> </span><span class="nb">first</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">remove</span><span class="w"> </span><span class="p">(</span><span class="nb">comp</span><span class="w"> </span><span class="nb">nil?</span><span class="w"> </span><span class="nb">second</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[[</span><span class="n">col</span><span class="w"> </span><span class="n">type</span><span class="p">]]</span><span class="w"> </span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"\t%s %s"</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="n">type</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="nf">interpose</span><span class="w"> </span><span class="s">",\n"</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">apply</span><span class="w"> </span><span class="nb">str</span><span class="p">))]</span><span class="w">
</span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"CREATE TABLE %s (\n%s %s\n);"</span><span class="w">
</span><span class="n">table-name</span><span class="w"> </span><span class="n">col-defs</span><span class="w"> </span><span class="n">ending-string</span><span class="p">)))</span><span class="w">
</span></code></pre>
</div>
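<p>The merging rules inside <code class="highlighter-rouge">scan-csvdir-and-make-schema</code> are easier to see in isolation. Here is a minimal sketch that applies the same <code class="highlighter-rouge">cond</code> to the set of types seen for a single column. The name <code class="highlighter-rouge">merge-column-types</code> is hypothetical and not part of the project; it only mirrors the logic above.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(defn merge-column-types
  "Hypothetical helper: collapses the set of SQL types seen for one column
  across several files into a single column type string (or nil)."
  [all-types]
  (let [nullable-suffix (if (contains? all-types "NULL") " NULL" "")
        types           (disj all-types "NULL")]
    (cond
      ;; Only NULLs were ever seen, so the type is unknown
      (empty? types) nil
      ;; Every file agrees on the type
      (= 1 (count types)) (str (first types) nullable-suffix)
      ;; Mixed integer and float: make everything float
      (= #{"INTEGER" "DOUBLE PRECISION"} types) (str "DOUBLE PRECISION" nullable-suffix)
      ;; TEXT is the fallback type, so it wins over anything else
      (contains? types "TEXT") (str "TEXT" nullable-suffix)
      :else (throw (ex-info "Inconsistent types" {:types types})))))

(merge-column-types #{"INTEGER"})                           ;; "INTEGER"
(merge-column-types #{"INTEGER" "DOUBLE PRECISION" "NULL"}) ;; "DOUBLE PRECISION NULL"
(merge-column-types #{"TEXT" "INTEGER"})                    ;; "TEXT"
(merge-column-types #{"NULL"})                              ;; nil
</code></pre>
</div>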
<p>We are now ready to actually do the autodetection! Let’s create our main namespace:</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.csv2sql</span><span class="w">
</span><span class="p">(</span><span class="no">:gen-class</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.data.csv</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">clojure.java.jdbc</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">sql</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">net.roboloco.guess-schema</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">guess</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">net.roboloco.files</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">files</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="nf">set!</span><span class="w"> </span><span class="n">*warn-on-reflection*</span><span class="w"> </span><span class="n">true</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">table-schema-filename</span><span class="w"> </span><span class="p">[</span><span class="n">dirname</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"%s-schema.edn"</span><span class="w"> </span><span class="n">dirname</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">table-sql-filename</span><span class="w"> </span><span class="p">[</span><span class="n">dirname</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"%s.sql"</span><span class="w"> </span><span class="n">dirname</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">autodetect-sql-schemas!</span><span class="w">
</span><span class="s">"Scans through the subdirectories of CSVDIR, infers the column data types,
and stores the inferred schema in CSVDIR so that you may manually edit it
before loading it in with MAKE-SQL-TABLES."</span><span class="w">
</span><span class="p">[</span><span class="n">csvdir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">doseq</span><span class="w"> </span><span class="p">[</span><span class="n">dir</span><span class="w"> </span><span class="p">(</span><span class="nf">files/list-subdirectories</span><span class="w"> </span><span class="n">csvdir</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nf">printf</span><span class="w"> </span><span class="s">"Autodetecting schema for: %s\n"</span><span class="w"> </span><span class="n">dir</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">tablename</span><span class="w"> </span><span class="p">(</span><span class="nf">.getName</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">dir</span><span class="p">)</span><span class="w">
</span><span class="n">schema</span><span class="w"> </span><span class="p">(</span><span class="nf">guess/scan-csvdir-and-make-schema</span><span class="w"> </span><span class="n">dir</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">when-not</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">table-sql</span><span class="w"> </span><span class="p">(</span><span class="nf">guess/table-definition-sql-string</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="n">schema</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="p">(</span><span class="nf">table-schema-filename</span><span class="w"> </span><span class="n">tablename</span><span class="p">))</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">spit</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="p">(</span><span class="nf">table-schema-filename</span><span class="w"> </span><span class="n">tablename</span><span class="p">))</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">spit</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="p">(</span><span class="nf">table-sql-filename</span><span class="w"> </span><span class="n">tablename</span><span class="p">))</span><span class="w"> </span><span class="n">table-sql</span><span class="p">))))))</span><span class="w">
</span></code></pre>
</div>
<p>You may note that I’m storing the SQL schemas for each subdirectory in the root <code class="highlighter-rouge">sqlcsv/</code> directory. This will let you hand-tune the schema as needed, if you want to make an index on one key or another, or make a particular column unique and required.</p>
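<p>Because the schema is written out with <code class="highlighter-rouge">spit</code>, the <code class="highlighter-rouge">.edn</code> file is just the printed hashmap, which makes it easy to edit by hand and read back in. Here is a minimal sketch of that round trip, assuming the file is later parsed with <code class="highlighter-rouge">clojure.edn/read-string</code>. The schema contents and the temporary file are made up for illustration.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(require '[clojure.edn :as edn])

;; A made-up schema with the same shape that scan-csvdir-and-make-schema returns
(def example-schema {"id" "INTEGER", "name" "TEXT NULL", "unused" nil})

;; Write it to a temporary file with spit, as autodetect-sql-schemas! does
(def schema-file (java.io.File/createTempFile "people-schema" ".edn"))
(spit schema-file example-schema)

;; After any hand edits, the file can be read back as plain EDN
(def reloaded-schema (edn/read-string (slurp schema-file)))

;; With no edits in between, the round trip preserves the schema
(= example-schema reloaded-schema)
</code></pre>
</div>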
<h2 id="create-the-autodetected-schema">4. Create the Autodetected Schema</h2>
<p>With the schema autodetected, we now need to create the tables. Continuing along with the <code class="highlighter-rouge">net.roboloco.csv2sql</code> namespace, and assuming that you are using the same <a href="../docker-postgres">postgres database</a> from a previous article:</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">def</span><span class="w"> </span><span class="n">default-db</span><span class="w"> </span><span class="p">{</span><span class="no">:dbtype</span><span class="w"> </span><span class="s">"postgresql"</span><span class="w">
</span><span class="no">:dbname</span><span class="w"> </span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="p">(</span><span class="nf">System/getenv</span><span class="w"> </span><span class="s">"POSTGERS_DB"</span><span class="p">)</span><span class="w"> </span><span class="s">"csv2sql"</span><span class="p">)</span><span class="w">
</span><span class="no">:user</span><span class="w"> </span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="p">(</span><span class="nf">System/getenv</span><span class="w"> </span><span class="s">"POSTGRES_USER"</span><span class="p">)</span><span class="w"> </span><span class="s">"postgres"</span><span class="p">)</span><span class="w">
</span><span class="no">:password</span><span class="w"> </span><span class="p">(</span><span class="nb">or</span><span class="w"> </span><span class="p">(</span><span class="nf">System/getenv</span><span class="w"> </span><span class="s">"POSTGRES_PASS"</span><span class="p">)</span><span class="w"> </span><span class="s">"mysecretpassword"</span><span class="p">)})</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">connection-ok?</span><span class="w">
</span><span class="s">"A predicate that tests if the database is connected."</span><span class="w">
</span><span class="p">[</span><span class="n">db</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="p">{</span><span class="no">:result</span><span class="w"> </span><span class="mi">15</span><span class="p">}</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="p">(</span><span class="nf">sql/query</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="p">[</span><span class="s">"select 3*5 as result"</span><span class="p">]))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">drop-existing-sql-tables!</span><span class="w">
</span><span class="s">"For each subdirectory in DIRNAME, drop any tables with the same name."</span><span class="w">
</span><span class="p">[</span><span class="n">db</span><span class="w"> </span><span class="n">csvdir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">doseq</span><span class="w"> </span><span class="p">[</span><span class="n">table-name</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">.getName</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">files/list-subdirectories</span><span class="w"> </span><span class="n">csvdir</span><span class="p">))]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">cmd</span><span class="w"> </span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"DROP TABLE IF EXISTS %s;"</span><span class="w"> </span><span class="n">table-name</span><span class="p">)</span><span class="w"> </span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/db-do-commands</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">cmd</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">make-sql-tables!</span><span class="w">
</span><span class="s">"Makes the SQL tables from whatever is in the database. "</span><span class="w">
</span><span class="p">[</span><span class="n">db</span><span class="w"> </span><span class="n">csvdir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">doseq</span><span class="w"> </span><span class="p">[</span><span class="n">sql-file</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">.getName</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">files/list-files-of-type</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="s">"sql"</span><span class="p">))]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">table-sql</span><span class="w"> </span><span class="p">(</span><span class="nb">slurp</span><span class="w"> </span><span class="n">sql-file</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="n">table-sql</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/db-do-commands</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">table-sql</span><span class="p">))))</span><span class="w">
</span></code></pre>
</div>
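<p>Note that <code class="highlighter-rouge">drop-existing-sql-tables!</code> formats the directory name directly into the SQL statement, so an oddly named directory would produce broken SQL. Here is a minimal sketch of a guard, assuming table names should be plain identifiers. The helper <code class="highlighter-rouge">drop-table-sql</code> is hypothetical and not part of the code above.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code>(defn drop-table-sql
  "Hypothetical helper: builds the DROP TABLE statement for TABLE-NAME,
  refusing any name that is not a plain identifier."
  [table-name]
  ;; Allow only letters, digits, and underscores, not starting with a digit
  (when-not (re-matches #"[A-Za-z_][A-Za-z0-9_]*" table-name)
    (throw (IllegalArgumentException. (str "Unsafe table name: " table-name))))
  (format "DROP TABLE IF EXISTS %s;" table-name))

(drop-table-sql "people") ;; "DROP TABLE IF EXISTS people;"
</code></pre>
</div>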
<h2 id="load-the-csv-files-into-sql">5. Load the CSV files into SQL</h2>
<p>The final step is to load in the CSV files. As we do so, we need to parse the strings from each CSV using the schema, so that every value is converted into the data type JDBC needs to insert it into Postgres.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">insert-csv!</span><span class="w">
</span><span class="s">"Inserts the rows of the CSV into the database, converting the rows to the appropriate
type as they are loaded. Lazy, so it works on very large files. If a column is not
found in the schema, it is omitted and not inserted into the database. "</span><span class="w">
</span><span class="p">[</span><span class="n">db</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="n">csvfile</span><span class="w"> </span><span class="n">schema</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">with-open</span><span class="w"> </span><span class="p">[</span><span class="n">reader</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/reader</span><span class="w"> </span><span class="n">csvfile</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">csv-rows</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.data.csv/read-csv</span><span class="w"> </span><span class="n">reader</span><span class="p">)</span><span class="w">
</span><span class="p">[</span><span class="n">header</span><span class="w"> </span><span class="n">typed-rows</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">guess/parse-csv-rows-using-schema</span><span class="w"> </span><span class="n">schema</span><span class="w"> </span><span class="n">csv-rows</span><span class="p">)</span><span class="w">
</span><span class="n">cnt</span><span class="w"> </span><span class="p">(</span><span class="nf">atom</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w">
</span><span class="n">chunk-size</span><span class="w"> </span><span class="mi">1000</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">doseq</span><span class="w"> </span><span class="p">[</span><span class="n">chunk-of-rows</span><span class="w"> </span><span class="p">(</span><span class="nf">partition-all</span><span class="w"> </span><span class="n">chunk-size</span><span class="w"> </span><span class="n">typed-rows</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">line-num</span><span class="w"> </span><span class="p">(</span><span class="nf">swap!</span><span class="w"> </span><span class="n">cnt</span><span class="w"> </span><span class="nb">inc</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Inserted"</span><span class="w"> </span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="n">chunk-size</span><span class="w"> </span><span class="n">line-num</span><span class="p">)</span><span class="w"> </span><span class="s">"rows"</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/insert-multi!</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="n">chunk-of-rows</span><span class="p">)))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">insert-all-csvs!</span><span class="w">
</span><span class="s">"Loads each subdirectory of CSVDIR as a table named after that subdirectory,
using the schema file previously generated for it. Subdirectories whose schema
file is empty are skipped."</span><span class="w">
</span><span class="p">[</span><span class="n">db</span><span class="w"> </span><span class="n">csvdir</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">doseq</span><span class="w"> </span><span class="p">[</span><span class="n">dirname</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">f</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nf">.getName</span><span class="w"> </span><span class="o">^</span><span class="n">java.io.File</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">files/list-subdirectories</span><span class="w"> </span><span class="n">csvdir</span><span class="p">))]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">filepath</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="s">"/"</span><span class="w"> </span><span class="p">(</span><span class="nf">table-schema-filename</span><span class="w"> </span><span class="n">dirname</span><span class="p">))</span><span class="w">
</span><span class="n">_</span><span class="w"> </span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="n">filepath</span><span class="p">)</span><span class="w">
</span><span class="n">schema</span><span class="w"> </span><span class="p">(</span><span class="nb">slurp</span><span class="w"> </span><span class="n">filepath</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nb">when-not</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">schema</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nf">files/list-files-of-type</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="n">csvdir</span><span class="w"> </span><span class="s">"/"</span><span class="w"> </span><span class="n">dirname</span><span class="p">)</span><span class="w"> </span><span class="s">"csv"</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">csvfile</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="p">(</span><span class="nf">format</span><span class="w"> </span><span class="s">"Loading: %s"</span><span class="w"> </span><span class="n">csvfile</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">insert-csv!</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">dirname</span><span class="w"> </span><span class="n">csvfile</span><span class="w"> </span><span class="n">schema</span><span class="p">)))</span><span class="w">
</span><span class="nb">doall</span><span class="p">)))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">-main</span><span class="w">
</span><span class="p">[]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">csvdir</span><span class="w"> </span><span class="p">(</span><span class="nf">System/getenv</span><span class="w"> </span><span class="s">"CSVDIR"</span><span class="p">)</span><span class="w">
</span><span class="n">db</span><span class="w"> </span><span class="n">default-db</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">when-not</span><span class="w"> </span><span class="p">(</span><span class="nf">connection-ok?</span><span class="w"> </span><span class="n">db</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">throw</span><span class="w"> </span><span class="p">(</span><span class="nf">Exception.</span><span class="w"> </span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="s">"Unable to connect to DB: "</span><span class="w"> </span><span class="n">db</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="nf">autodetect-sql-schemas!</span><span class="w"> </span><span class="n">csvdir</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">make-sql-tables!</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">csvdir</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nf">insert-all-csvs!</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="n">csvdir</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">println</span><span class="w"> </span><span class="s">"Done!"</span><span class="p">)))</span><span class="w">
</span></code></pre>
</div>
<p>Nothing left to do but try it out! The final step is to run <code class="highlighter-rouge">(-main)</code>, either directly at the REPL or from an uberjar. For the uberjar, add <code class="highlighter-rouge">gen-class</code> to the namespace, build the jar with <code class="highlighter-rouge">(-main)</code> set as the entry point (in <code class="highlighter-rouge">project.clj</code>), and then launch it with an environment variable that sets <code class="highlighter-rouge">CSVDIR</code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>lein uberjar
CSVDIR=/path/to/your/sqlcsv/ java -jar target/csv2sql-0.1.0-SNAPSHOT-standalone.jar
</code></pre>
</div>
<h2 id="conclusion">Conclusion</h2>
<p>The above is probably sufficient for this exercise – this ETL job will populate a database with a few million rows in a few minutes. On my laptop, it ingests about 5000-10000 rows per second, depending on the CSV.</p>
<p>Not bad for a couple hundred lines of code, though it could probably still be trimmed and simplified. The code above may be found in the <a href="https://github.com/ivarthorson/csv2sql">csv2sql repo</a> if you want to go further.</p>
<p>Some possible extensions to this would be:</p>
<ol>
<li>
<p>Warn the user if 99.9% of the elements of a column are of one type, but there are a few values that are of a different type.</p>
</li>
<li>
<p>Add support for JSONs, rather than just CSVs. This would probably involve flattening nested JSONs so that <code class="highlighter-rouge"><span class="p">{</span><span class="nt">"a"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nt">"b"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">}}</span></code> would become <code class="highlighter-rouge"><span class="p">{</span><span class="nt">"a.b"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">}</span></code>, and might involve generalizing the loader.</p>
</li>
<li>
<p>If your database supports tens or hundreds of millions of rows, add support for Parquet files, a common tabular data format for big data.</p>
</li>
</ol>
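<p>As a rough illustration of the flattening idea in the second item, here is a minimal sketch. It is written in Python rather than Clojure purely for brevity, the function name <code class="highlighter-rouge">flatten</code> is made up for this example, and it only descends into nested objects (lists and other values are copied through unchanged). It is not part of csv2sql.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in obj.items():
        # Build the dotted path, e.g. "a" then "a.b"
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys
            flat.update(flatten(value, path))
        else:
            # Leaf values (numbers, strings, lists, ...) are kept as-is
            flat[path] = value
    return flat


nested = json.loads('{"a": {"b": 2}, "c": 3}')
print(flatten(nested))
</code></pre>
</div>
<p>Each flattened key could then be treated like a CSV column name, which is why the loader might need to be generalized.</p>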
<p>This little program was focused on ETL, and we should probably stop at that. Rather than bolt an HTTP CRUD API on here, it might make more sense to keep that functionality in a separate app.</p>
<h2 id="references">References</h2>
<ol>
<li><a href="http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html">http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html</a></li>
<li><a href="https://docs.timescale.com/v1.2/using-timescaledb/writing-data">https://docs.timescale.com/v1.2/using-timescaledb/writing-data</a></li>
</ol>
This is another basic example to help beginners get started in Clojure, which is still my favorite programming language, almost 10 years after I first started using it. Like all languages, Clojure has some warts. But it also has a lot of very powerful and abstract concepts that many average programmers have not seen before, and like most mathematical concepts, the power of such abstractions is greatly underestimated by many people. Even fancy-sounding but easy-to-understand concepts like homoiconicity are still really underappreciated in the programming community.
Composite Robot Construction
2019-02-26T00:00:00-08:00
/blog/composite-robot-construction
<p>This article describes the progress of a fiber-reinforced polymer monopod robot that I built over a period of about 30 days. My hope is that seeing some of the successes and failures I encountered will help you build your own composite robots. Foam and fiberglass are very versatile materials that are just as accessible to garage-level workshops as they are to research institutions, as I hope you will see.</p>
<p>If this article interests you, perhaps look at the <a href="../../about/frp-seminar-slides.pdf">FRP tutorial slides I made</a>, or watch the <a href="https://youtu.be/x9cK6zSe-NQ">tutorial video</a>.</p>
<h2 id="project-background">Project Background</h2>
<p>I built a robot for my Ph.D. thesis while working at a research institution in Italy, the <a href="https://iit.it">Istituto Italiano di Tecnologia</a>. Italy is an absolutely lovely place to live – people are healthy, their eyes usually exude a great understanding of human compassion, the food is reliably delicious, and the weather is amazing. But every culture has its weak points, and it is probably fair to say that Italians are not well known for their planning and organization. So maybe it will not surprise you that, like so many other projects in Italy, my thesis robot was constructed in a rush and at the last minute.</p>
<p>Due to some miscommunications, bureaucratic delays, and funding hiccups, I did not even have the parts that I needed until <em>after</em> the end of my thesis research period. By way of background, beginning a Ph.D. in Europe usually requires that you have already earned a Master’s degree, so the durations are shorter than in the USA. In Italy, to prevent the abuse of vulnerable and underpaid Ph.D. students, there is even a legal limit to a Ph.D.’s duration, which is a period of about three to three and a half years, followed by a three to six month period of writing the thesis. In my case, after working for three years on compliant actuation, delays meant that the construction of my robot actually began in the final three-month period allocated for us students to write our theses, so I was in a real hurry to build, code, and write as fast as my fingers could possibly type!</p>
<p>The point of building the robot was to try to design a mechanical structure with mechanical dynamics that would naturally be very close to actual hopping motions, even without any control effort applied. Using electric motors in a hopping robot is difficult, as they have a fairly low power-to-weight ratio. I tried to solve the problem by designing custom actuators with transmissions built to match the simulated mechanical dynamics of a 3-link hopping robot. This meant big springs, and a mechanism for stretching them elastically.</p>
<h2 id="actuator-background">Actuator Background</h2>
<p>Some but not all of the actuator parts arrived in December 2011, all shiny and new:</p>
<p class="center"><img src="2011-12-13_Canon_EOS_REBEL_T1i_IMG_9896_0a51120.jpg" alt="2011-12-13_Canon_EOS_REBEL_T1i_IMG_9896_0a51120" title="The actuator parts laid out on a table, before assembly." /></p>
<p>I’m not going to go into how the actuator works – see my <a href="../../about/thorson-phd-defense.pdf">thesis slides</a> for that. The overview of the idea is that a center gear is driven by a differential, and as the gear rolls around the interior of the ring gear, it pulls on a bar that compresses the large spring. This can store a large amount of energy for a short time, and means the rotor needs to move less during hopping motions, so you can use a less powerful motor when hopping.</p>
<p class="center"><img src="2011-12-13_Canon_EOS_REBEL_T1i_IMG_9975_03176b2.jpg" alt="2011-12-13_Canon_EOS_REBEL_T1i_IMG_9975_03176b2" title="The guts of the mechanism." /></p>
<p class="center"><img src="2011-12-16_Canon_EOS_REBEL_T1i_IMG_09990_fcb25ff.jpg" alt="2011-12-16_Canon_EOS_REBEL_T1i_IMG_09990_fcb25ff" title="The whole motor, fully assembled, tips the scales at 2883g." /></p>
<h2 id="construction-plan">Construction Plan</h2>
<p>Even though I had made another <a href="https://youtu.be/o5AgR_gQ89A">larger, heavier version of this actuator</a> about a year earlier, and since then had reduced the weight of the motor significantly, the last iteration of the actuator still ended up weighing 2883g. I had simulated the robot extensively, and I knew that the whole robot had to weigh less than 10kg, at the very most. The weight of just these two actuators alone unfortunately took up over half of the mass of the robot, and as a result I was desperate to make the structure as light as possible. Although using aluminum construction would probably have been slightly simpler, I decided to go with foam-and-fiberglass construction, since I had used it before on other projects and knew it was robust and strong enough for a structure like this.</p>
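<p>To make the weight budget concrete, here is the arithmetic as a small Python sketch. The actuator mass and the 10kg limit are the figures quoted above; the variable names are only for illustration.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>ACTUATOR_MASS_G = 2883       # measured mass of one assembled actuator
NUM_ACTUATORS = 2            # the robot uses two of them
ROBOT_MASS_LIMIT_G = 10000   # upper bound on total mass, from simulation

# Mass already committed to the actuators
actuator_total_g = ACTUATOR_MASS_G * NUM_ACTUATORS
# What remains for everything else on the robot
remaining_budget_g = ROBOT_MASS_LIMIT_G - actuator_total_g
# Share of the limit consumed by the actuators
actuator_fraction = actuator_total_g / ROBOT_MASS_LIMIT_G

print(f"Actuators: {actuator_total_g} g ({actuator_fraction:.0%} of the limit)")
print(f"Left for everything else: {remaining_budget_g} g")
</code></pre>
</div>
<p>The two actuators come to 5766g, which leaves at most 4234g for the body, hip, thigh, shank, and anything else that has to ride along.</p>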
<p>The main point of building this robot was to test the actuators, but I also wanted to try a few different composite construction techniques. While I have used nomex honeycomb sheets for simply curved surfaces, and used molds for complex surfaces, I had never tried using hot wire techniques or an expanding urethane foam for complex curved parts. To test all the combinations, I planned to make the robot in four pieces:</p>
<ol>
<li>The body, which I would make from styrofoam and fiberglass, using moldless construction.</li>
<li>The hip joint, which attaches the body to the thigh, I would make from fiberglass and urethane foam, using inner and outer molds.</li>
<li>The thigh, which I would make from carbon fiber and urethane foam, using inner and outer molds.</li>
<li>The shank, which I would make from carbon fiber and urethane foam, using inner and outer molds.</li>
</ol>
<h2 id="day-1-mold-printing">Day 1: Mold-printing</h2>
<p>The first day, I designed and 3D-printed an inner “plug” and outer “mold” on a pair of 3D systems Stratasys printers. The colors are different because the printers had different filaments loaded when I printed them.</p>
<p>The “plug” defines the inside shape of the part that I want to make:</p>
<p class="center"><img src="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10597_6f43657.jpg" alt="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10597_6f43657" title="The two pieces of the plug." /></p>
<p>And the “mold” defines the outside shape:</p>
<p class="center"><img src="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10600_e4bd0f0.jpg" alt="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10600_e4bd0f0" title="The mold." /></p>
<p>I ended up mounting the plug (not shown) and the mold on small melamine boards so that I could handle them better.</p>
<p class="center"><img src="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10601_5353810.jpg" alt="2012-01-01_Canon_EOS_REBEL_T1i_IMG_10601_5353810" title="The mold, installed in a board (top side)." /></p>
<h2 id="day-2-hot-wiring">Day 2: Hot-Wiring</h2>
<p>I stole a technique from Burt Rutan’s moldless composite construction books, and used two sheets of metal and an electrically heated nichrome “hot wire” to cut the body of the robot from a block of foam.</p>
<p>Note: DO NOT USE URETHANE FOAM FOR HOTWIRING. Styrofoam fumes are still not good to breathe, but they are much less poisonous than urethane foam smoke.</p>
<p>The first thing to do is to make a couple of sheet metal templates. When cutting out the sheets of metal, it’s a good idea to bolt the two pieces together so they come out identically shaped.</p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10611_313d00b.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10611_313d00b" title="Two sheets of aluminum bolted together." /></p>
<p>The hot-wiring is pretty simple. Start with three blocks of styrofoam…</p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10606_0fc4ddf.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10606_0fc4ddf" title="Three blocks of styrofoam." /></p>
<p>…then glue them together with spray adhesive, stack a heavy weight on top, and wait a few hours.</p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10609_1e2f4b6.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10609_1e2f4b6" title="A heavy weight pressing the styrofoam blocks together." /></p>
<p>Clamp the metal cutouts to the block, and be careful to make sure they are not rotated relative to each other. We used machinist spacers to hold the templates a fixed distance from the tabletop when clamping them on. Likewise, make sure the templates are the correct distance from the front and back of the blocks. Be as precise as possible here; this is an important step.</p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10613_827e473.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10613_827e473" title="Clamp the metal cutouts to the block. (Left side)" /></p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10614_57f03c0.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10614_57f03c0" title="Clamp the metal cutouts to the block. (Right side)" /></p>
<p>Now heat up a nichrome wire with a few amps of current. Depending on the diameter of the wire, more or less current will be needed. I recommend doing some test cuts, and adding some type of tensioning mechanism, because the wire will stretch slightly as it heats and you want it to stay very taut. Here’s a short video of our test cuts:</p>
<p>TODO</p>
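<p>To get a feel for how the required current changes with wire diameter, here is a rough Python sketch. It assumes a typical resistivity for nichrome and a made-up target heating power per metre of wire, so treat the numbers as ballpark figures only and rely on your own test cuts. The function name is hypothetical.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>import math

# ASSUMED typical resistivity of nichrome; check the datasheet for your wire.
RESISTIVITY_OHM_M = 1.1e-6
# HYPOTHETICAL heating power per metre of wire, chosen only for illustration.
POWER_PER_METER_W = 50.0


def current_for_diameter(diameter_mm):
    """Current (A) that dissipates POWER_PER_METER_W in one metre of wire."""
    radius_m = diameter_mm / 1000 / 2
    area_m2 = math.pi * radius_m ** 2
    # R = rho * L / A, with L = 1 m
    resistance_per_m = RESISTIVITY_OHM_M / area_m2
    # P = I^2 * R, solved for I
    return math.sqrt(POWER_PER_METER_W / resistance_per_m)


for d in (0.25, 0.5, 1.0):
    print(f"{d} mm wire: about {current_for_diameter(d):.1f} A")
</code></pre>
</div>
<p>Under these assumptions, the current needed for the same heating power per metre grows in proportion to the wire diameter, which is why a thicker wire needs a beefier power supply.</p>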
<p>When you are done, you should be able to lift off the piece you just cut with a hot wire, and leave long spider-web-like strands behind. The strands are easily brushed off with your hand.</p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10639_00936b0.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10639_00936b0" title="Lifting off the recently-cut foam." /></p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10640_412fbf3.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10640_412fbf3" title="Spiderwebs" /></p>
<p class="center"><img src="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10642_b7ac046.jpg" alt="2012-01-02_Canon_EOS_REBEL_T1i_IMG_10642_b7ac046" title="The finished piece" /></p>
<p>This is one of the few times where getting a good result is as easy as it looks. I don’t recommend any sawing motions or anything – just be smooth and count along with a friend as you go over all of the tick marks around the shape. I see no reason why you couldn’t achieve some pretty interesting lofted shapes, if each of the metal templates were different shapes.</p>
<h2 id="day-5">Day 5:</h2>
<p>The body of the robot now looks like this.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11190_a82bca7.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11190_a82bca7" title="The piece, ready for fiberglassing." /></p>
<p>It’s time to apply fiberglass. The short version of what we are going to do is to lay fiberglass cloth over the top of the surface, spread epoxy through the fabric until it is fully wetted, and wait 8-24 hours for the epoxy to cure.</p>
<p>Cut out the fiberglass cloth using sharp scissors on a clean table. It’s important to use gloves and keep your oily hands off of the fabric – epoxy will not stick to oil. For common thicknesses of fiberglass fabric, you’ll only need a couple of layers to get a reasonably hard surface that can still be cut with a sharp knife. If you put 4-5 layers on, it will be heavier and much more resistant to puncture. If you want to keep it light and strong, cut out enough fabric to put 2 layers everywhere, and then reinforce the edges that are likely to strike the ground with 4-5 layers. This is additive manufacturing, and it’s acceptable to reinforce only the places that need it. If you are going to drill into the fiberglass, you’ll want to build up 10 or even 20 layers, depending on how you are attaching a bolt to it.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11192_0f6dda3.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11192_0f6dda3" title="Fiberglass cloth." /></p>
<p>Weigh out the appropriate amount of resin and hardener. Many formulations these days ask for equal amounts of resin and hardener, but be careful to check if the label is indicating equal units of volume or mass – they can be different!</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11193_3ed4692.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11193_3ed4692" title="Desc." /></p>
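<p>To see why the volume-versus-mass distinction matters, here is a small Python sketch. The two densities are hypothetical placeholders, not values for any real product, and the function name is made up; use the figures from your own resin’s datasheet.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code># HYPOTHETICAL densities in g/mL; read the real values from your datasheet.
RESIN_DENSITY = 1.15
HARDENER_DENSITY = 0.95


def hardener_mass_for(resin_mass_g, volume_ratio=1.0):
    """Hardener mass (g) when the label gives resin:hardener by VOLUME."""
    resin_volume_ml = resin_mass_g / RESIN_DENSITY
    hardener_volume_ml = resin_volume_ml / volume_ratio
    return hardener_volume_ml * HARDENER_DENSITY


resin_g = 100.0
# If the label means equal parts by mass, the hardener mass equals the resin mass
print(f"1:1 by mass:   {resin_g:.1f} g hardener")
# If the label means equal parts by volume, the scale should read something else
print(f"1:1 by volume: {hardener_mass_for(resin_g):.1f} g hardener")
</code></pre>
</div>
<p>With these placeholder densities, the two readings of “equal amounts” differ by well over ten percent, which is enough to matter for the cure.</p>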
<p>Set a timer for 30 minutes, and have a friend handy whose hands are NOT covered in sticky resin. Now wet out the fabric with a disposable paintbrush, work quickly, and stipple all the resin into the fabric.</p>
<p>TODO</p>
<p>In my case, I used a vacuum bag over the part so that atmospheric pressure pushes the fabric tightly against the foam. It also removes excess resin. Keeping the layers of fabric tight against one another and removing all excess resin generally makes slightly lighter and stronger parts, but you can get great results even without the vacuum bag. Note that if you don’t use enough resin, it will be a structural disaster, so tread carefully along that particular weight optimization pathway. As a rule of thumb, it’s better to have too much resin than too little for hobbyist projects.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11191_ee89bd7.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11191_ee89bd7" title="Vacuum pump and vacuum bag." /></p>
<p>Try to finish within the 30 minutes, then pull the vacuum. If you wait too long and the viscosity of the resin is too high, carefully use a hair dryer (or if you feel like living dangerously, a heat gun) to locally increase the temperature of the resin in specific areas to get the resin to flow more easily.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11196_69f122d.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11196_69f122d" title="Desc." /></p>
<p>Vacuum bagging usually has three layers between the fibers and the vacuum bag itself. The innermost is a “peel ply” made of non-stick vinyl, which you use directly on top of the fiberglass when you intend to bond something else to it at a later stage. The peel ply leaves a matte surface on the part, which is good for adhesion of subsequent layers. Then there is the “breather ply”, which is a smooth plastic that has tiny holes in it through which the resin flows. Finally, the outermost layer under the bag is a thick cotton or felt “bleeder ply”, which absorbs the excess resin, and then comes the vacuum bag itself. As the vacuum pumps down, you will see a bunch of dots in the bleeder ply as the resin flows through tiny holes in the breather ply.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11197_ded79bc.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11197_ded79bc" title="Desc." /></p>
<p>The whole thing is pretty funny looking at this point:</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11199_605601b.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11199_605601b" title="Desc." /></p>
<p>I left the body to cure under vacuum for 4-6 hours, and then for another 20 before “unbagging” the part. While I waited, I also started to work on the exterior surface mold for the shank:</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11200_ec10b6a.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11200_ec10b6a" title="Desc." /></p>
<p>The gray paste is automotive bondo, which you should apply at the minimum thickness needed to fill any gaps, and then sand as smooth as possible. Bondo dries fairly quickly and can help you fill gaps in the 3D printing texture that would otherwise show up on your molded part.</p>
<p class="center"><img src="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11202_d2741e9.jpg" alt="2012-01-05_Canon_EOS_REBEL_T1i_IMG_11202_d2741e9" title="Desc." /></p>
<p>The last step (not shown here, but you’ll see soon) is to paint the mold with a glossy black color.</p>
<h2 id="day-6">Day 6:</h2>
<p>This is what the body part looks like after unbagging, removing the breather ply and bleeder ply, and just having the peel ply:</p>
<p class="center"><img src="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11203_7b36269.jpg" alt="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11203_7b36269" title="Desc." /></p>
<p>As you remove the peel ply, you may see blue sparks and static electricity. This is normal, but I can’t explain why.</p>
<p>After removing the peel ply and trimming the fiberglass with a sharp knife, I had this:</p>
<p class="center"><img src="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11205_32a2c5f.jpg" alt="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11205_32a2c5f" title="Desc." /></p>
<p>Now it’s time to do the same trick again with the hot wire, and remove the interior foam of the body part…</p>
<p class="center"><img src="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11212_dc29dce.jpg" alt="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11212_dc29dce" title="Desc." /></p>
<p>Tada!</p>
<p class="center"><img src="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11215_bfadb38.jpg" alt="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11215_bfadb38" title="Desc." /></p>
<p>The weight is pretty good considering the size of the part. If I had to do it over again, I would make the foam half as thick: probably 1cm would have been sufficient instead of the 2cm that I left here.</p>
<p class="center"><img src="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11217_890f1d8.jpg" alt="2012-01-06_Canon_EOS_REBEL_T1i_IMG_11217_890f1d8" title="Desc." /></p>
<p>At this point, I also glued on a back side to the foam box that is being made:</p>
<p class="center"><img src="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11226_d0d2c09.jpg" alt="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11226_d0d2c09" title="Desc." /></p>
<p>After the glue dried, I rounded the corner with some sandpaper. Foam is a real easy thing to work with.</p>
<p class="center"><img src="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11223_5583143.jpg" alt="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11223_5583143" title="Desc." /></p>
<h2 id="day-7">Day 7:</h2>
<p>I don’t have as many photos as I should for this day, but here I’m assembling the 3D printed mold for the thigh.</p>
<p class="center"><img src="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11220_aba4b6b.jpg" alt="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11220_aba4b6b" title="Desc." /></p>
<p>Once again, I covered the inside of the mold with bondo. In this case, I used a yellow bondo rather than gray, for no particular reason. In this photo, it is not sanded yet.</p>
<p class="center"><img src="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11221_c01af7a.jpg" alt="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11221_c01af7a" title="Desc." /></p>
<p>Most of the structures in this robot are open boxes, so I made the “box lids” out of foam, and sandwiched them between two sheets of fiberglass on a sheet of glass, and then pumped a vacuum on them to make very flat, lightweight pieces. Here’s what the foam looks like, but I don’t have any photos of the parts themselves:</p>
<p class="center"><img src="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11224_e5b01d1.jpg" alt="2012-01-07_Canon_EOS_REBEL_T1i_IMG_11224_e5b01d1" title="Desc." /></p>
<p>Not shown today, I also did the vacuum bagged layup of the shank, which we will see tomorrow.</p>
<p>Very importantly, I probably spent most of the day waxing, waxing, and re-waxing the painted black surfaces of the mold. Some people take this to a religious level and do it a half dozen times or more, and it is indeed religiously important for big molds that have never been used before. But for small molds, three times is usually enough for me. Epoxy will stick to any poorly waxed thing like the world’s strongest leech, and your part will be ruined.</p>
<h2 id="day-8">Day 8:</h2>
<p>We now can see the first carbon fiber part to pop out of the mold. The molded surface will have a fairly shiny and smooth texture to it where it contacted the mold.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11235_a21fded.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11235_a21fded" title="Desc." /></p>
<p>While releasing the shank’s outer surface from the mold, I broke the mold. Good thing I’m only making one of these! Maybe I should have waxed more than 3x? Also note how I painted it black before waxing to better see bumps, imperfections, and the wax.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11227_9577d6e.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11227_9577d6e" title="Desc." /></p>
<p>You can see the texture of the carbon fiber weave left on the mold, as it was pushed into the mold by the fiberglass.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11228_2cead35.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11228_2cead35" title="Desc." /></p>
<p>The next thing is to trim the edge with a sharp knife, and wipe off any wax on the surface. Because this is a two-mold part, all wax needs to be removed so that another layer of carbon fiber can bond with the outside surface around the parting line.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11238_fb029c5.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11238_fb029c5" title="Desc." /></p>
<p>The weight of the outer layer is really pretty light. Note the matte texture of the inside – a result of the peel ply.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11243_01fc096.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11243_01fc096" title="Desc." /></p>
<p>The next thing to do will be to fill the gap between the outside surface and inside surface with expanding urethane foam. The next few frames show how the outer skin will fit over the plug.</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11248_052eb49.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11248_052eb49" title="Desc." /></p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11249_8219042.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11249_8219042" title="Desc." /></p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11250_975f786.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11250_975f786" title="Desc." /></p>
<p>Time to fill the inside with polyurethane foam… (I will regret this tomorrow!)</p>
<p class="center center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11252_9038b7e.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11252_9038b7e" title="Desc." /></p>
<p><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11254_b14bc7d.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11254_b14bc7d" title="Desc." /></p>
<p>I spent the rest of the day painting the interior of the thigh mold, and waxing a sheet of glass for use with the covering plates:</p>
<p class="center center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11256_b333c8c.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11256_b333c8c" title="Desc." /></p>
<p><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11257_82e4ffe.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11257_82e4ffe" title="Desc." /></p>
<p>I also did the layup of the fiberglass layer on the outside of the body box:</p>
<p class="center"><img src="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11245_0430ebc.jpg" alt="2012-01-08_Canon_EOS_REBEL_T1i_IMG_11245_0430ebc" title="Desc." /></p>
<h2 id="day-9">Day 9:</h2>
<p>The day dawned bright and exciting. Look at this great composite part, with a foam core all perfectly shaped.</p>
<p class="center"><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11260_fc92bf7.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11260_fc92bf7" title="Desc." /></p>
<p>Oh no! The urethane foam didn’t cure! I guess in my excitement I forgot that a 1-part expanding foam relies on exposure to air (strictly speaking, the moisture in it) to cure, so it won’t harden in an enclosed space like this mold.</p>
<p class="center center"><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11261_9dc62ca.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11261_9dc62ca" title="Desc." /></p>
<p><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11262_55ad3f6.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11262_55ad3f6" title="Desc." /></p>
<p>Discouraged by that dumb mistake, I ordered some different urethane foam, and spent the rest of the day waxing the thigh mold:</p>
<p class="center"><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11265_36033dc.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11265_36033dc" title="Desc." /></p>
<p>On the bright side, the body part fiberglass is coming along well.</p>
<p class="center center"><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11270_100195d.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11270_100195d" title="Desc." /></p>
<p><img src="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11272_f6ef1f7.jpg" alt="2012-01-09_Canon_EOS_REBEL_T1i_IMG_11272_f6ef1f7" title="Desc." /></p>
<h2 id="day-10">Day 10:</h2>
<p>Time for some carbon fiber! Carbon fiber is easier to cut than fiberglass because it’s stiffer and more brittle. I have it here between two sheets of plastic to keep it clean and dust-free.</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_1285_423078a.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_1285_423078a" title="Desc." /></p>
<p>The carbon fiber layup of the thigh looks a little weird before it is de-molded, though.</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11273_d33472e.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11273_d33472e" title="Desc." /></p>
<p>After trimming, it is looking better, but the outside texture is pretty dry, like I didn’t use as much resin as I should have. There should be a smoother texture on the mold-side of the carbon fiber. The only way to fix this is to go back and apply more resin to the outside with a paintbrush, and then sand it smooth later. It’s much more time-consuming than if I had simply used the right amount of resin the first time.</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11275_aa30acb.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11275_aa30acb" title="Desc." /></p>
<p>The robot is starting to take shape!</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11283_0c84a30.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11283_0c84a30" title="Desc." /></p>
<p>There’s one more part we have to make today: the hip-joint attachment for the body. I haven’t shown photos of this part yet, but it’s made the same way as everything else thus far.</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11287_9586f77.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11287_9586f77" title="Desc." /></p>
<p>I also did some trimming of the body of the robot, to cut the foam away from the edge of the fiberglass and make an internal “lip” that the box lid will fit into.</p>
<p class="center"><img src="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11292_0f34dfc.jpg" alt="2012-01-10_Canon_EOS_REBEL_T1i_IMG_11292_0f34dfc" title="Desc." /></p>
<h2 id="day-12">Day 12:</h2>
<p>The hip joint attachment demolded very well, but the surface texture isn’t very good yet.</p>
<p class="center"><img src="2012-01-12_Canon_EOS_REBEL_T1i_IMG_11295_d9eca45.jpg" alt="2012-01-12_Canon_EOS_REBEL_T1i_IMG_11295_d9eca45" title="Desc." /></p>
<p>Adding that to the robot now gives it a more or less complete outside surface:</p>
<p class="center"><img src="2012-01-12_Canon_EOS_REBEL_T1i_IMG_11296_ce333f0.jpg" alt="2012-01-12_Canon_EOS_REBEL_T1i_IMG_11296_ce333f0" title="Desc." /></p>
<h2 id="day-14">Day 14:</h2>
<p>Let’s try applying urethane foam again. Here I’ve got some two-part expanding foam. Because it’s a two-part system, it cures by the reaction between its two components rather than by exposure to air, and is suitable for use in a mold cavity.</p>
<p class="center"><img src="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11390_d9ae10d.jpg" alt="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11390_d9ae10d" title="Desc." /></p>
<p>Even so, a test is probably a wise idea….</p>
<p class="center"><img src="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11391_5884894.jpg" alt="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11391_5884894" title="Desc." /></p>
<p>Since the test went well, I jumped in with both feet and put the shank on its plug:</p>
<p class="center"><img src="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11399_dd97d80.jpg" alt="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11399_dd97d80" title="Desc." /></p>
<p>And likewise for the hip-attachment piece (not shown) and the thigh:</p>
<p class="center"><img src="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11401_7eb154e.jpg" alt="2012-01-14_Canon_EOS_REBEL_T1i_IMG_11401_7eb154e" title="Desc." /></p>
<h2 id="day-15">Day 15:</h2>
<p>This time the foam cured perfectly, and I was able to gently wiggle the foam off of the plug.</p>
<p class="center"><img src="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11404_fecc58c.jpg" alt="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11404_fecc58c" title="Desc." /></p>
<p class="center"><img src="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11406_fa164a0.jpg" alt="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11406_fa164a0" title="Desc." /></p>
<p>Adding the foam only added 108g - 72g = 36g of mass, yet it already made the part much stiffer. When we add the inside layer of fiberglass to the foam, it is going to become incredibly stiff. Let’s just imagine that I did that…</p>
<p class="center"><img src="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11416_cc208d1.jpg" alt="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11416_cc208d1" title="Desc." /></p>
<p>Although it was only a single ply of carbon fiber, unlike the two plies on the outside surface, the inside layer added another 175g - 108g = 67g. This is probably mostly due to the weight of the resin, since I did not use a vacuum bag, and the 5-layer thick reinforcements I put on the large end for a metal piece that will be bolted on.</p>
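<p>The mass bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the three scale readings are the ones quoted in the text, while the stage labels (in particular, reading 72g as the bare two-ply outer skin) and the function name <code>stage_deltas</code> are assumptions made for the example.</p>

```python
# Scale readings in grams, as quoted in the text for the shank.
# The stage labels are an interpretation, not the author's wording.
readings = [
    ("outer skin only", 72),
    ("after adding the foam core", 108),
    ("after adding the inner ply and reinforcements", 175),
]


def stage_deltas(readings):
    """Return (label, grams_added) for every stage after the first."""
    return [
        (label, mass - prev_mass)
        for (_, prev_mass), (label, mass) in zip(readings, readings[1:])
    ]


total = readings[-1][1]
for label, added in stage_deltas(readings):
    # Show each addition and its share of the final part mass.
    print(f"{label}: +{added} g ({added / total:.0%} of the final {total} g)")
```

<p>Keeping a running tally like this makes it easy to see which step dominates the final mass; here it is the hand-laid inner ply and its reinforcements rather than the foam.</p>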
<p>A little more bad news today as well. Apparently, I got sloppy and didn’t wax the hip attachment plug enough.</p>
<p class="center"><img src="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11407_fb8a61f.jpg" alt="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11407_fb8a61f" title="Desc." /></p>
<p class="center"><img src="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11414_3d50fce.jpg" alt="2012-01-15_Canon_EOS_REBEL_T1i_IMG_11414_3d50fce" title="Desc." /></p>
<h2 id="days-16-and-17">Days 16 and 17:</h2>
<p>Try, try again. At least this time I got the hip attachment foam to de-mold safely.</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11426_2e0b12a.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11426_2e0b12a" title="Desc." /></p>
<p>I applied some fiberglass cloth to the inside of this piece; fiberglass is pretty transparent when properly wetted out, so after curing, you can hardly see it:</p>
<p class="center"><img src="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11438_a8bf3c7.jpg" alt="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11438_a8bf3c7" title="Desc." /></p>
<p>I also added fiberglass cloth to the inside of the body box. Starting mass was 595g…</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11424_85baad6.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11424_85baad6" title="Desc." /></p>
<p>Adding the inside ply brought the mass up to 710g:</p>
<p class="center"><img src="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11443_40659be.jpg" alt="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11443_40659be" title="Desc." /></p>
<p>To back up a bit, when curing, I used the completed body lid, which weighs 134g…</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11425_e7eab05.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11425_e7eab05" title="Desc." /></p>
<p>To make sure the piece would fit perfectly, I sandwiched the lid between two sheets of non-stick plastic and put weights on top while it cured.</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11427_59e2891.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11427_59e2891" title="Desc." /></p>
<p>The hip attachment joint needs a metal insert on it, so I added some extra fiberglass under it, and put the metal itself in the vacuum bag so it would be pushed in tight and fit perfectly.</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11433_2afe8be.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11433_2afe8be" title="Desc." /></p>
<p>The last thing to do today is to apply a layer of carbon fiber to the inside of the thigh. This part was big enough I could use a vacuum bag.</p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11418_c2b5aa2.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11418_c2b5aa2" title="Desc." /></p>
<p class="center"><img src="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11429_3db0475.jpg" alt="2012-01-16_Canon_EOS_REBEL_T1i_IMG_11429_3db0475" title="Desc." /></p>
<h2 id="day-18">Day 18</h2>
<p>Things are really coming together! The nearly-finished thigh part weighs 463g now, and is incredibly rigid.</p>
<p class="center"><img src="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11434_91162b2.jpg" alt="2012-01-17_Canon_EOS_REBEL_T1i_IMG_11434_91162b2" title="Desc." /></p>
<p>After a bit of sanding on the sharp edges, and screwing the metal mounting pieces onto the shank and hip attachment points, both of those pieces also look good.</p>
<p class="center"><img src="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11445_39b722f.jpg" alt="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11445_39b722f" title="Desc." /></p>
<p class="center"><img src="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11449_d0e71c4.jpg" alt="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11449_d0e71c4" title="Desc." /></p>
<p>The motors fit beautifully in the thigh piece.</p>
<p class="center"><img src="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11447_bbb94d2.jpg" alt="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11447_bbb94d2" title="Desc." /></p>
<p>And after a little trimming, the body box lid also fits well.</p>
<p class="center"><img src="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11450_9727db0.jpg" alt="2012-01-18_Canon_EOS_REBEL_T1i_IMG_11450_9727db0" title="Desc." /></p>
<h2 id="day-20">Day 20</h2>
<p>I added a touch-up coat of resin to the shank, and it looks good!</p>
<p class="center"><img src="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11457_d4701cf.jpg" alt="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11457_d4701cf" title="Desc." /></p>
<p>Similarly, a little extra epoxy made the thigh look good as well.</p>
<p class="center"><img src="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11451_d61edfd.jpg" alt="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11451_d61edfd" title="Desc." /></p>
<p>Because of the stress concentration that occurs where the metal bolts to the aluminum, I actually sandwiched the lip of the part between two aluminum pieces:</p>
<p class="center"><img src="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11453_d43f5b7.jpg" alt="2012-01-20_Canon_EOS_REBEL_T1i_IMG_11453_d43f5b7" title="Desc." /></p>
<p>(Not shown, but today I did the last fiberglassing to connect the hip attachment joint to the body. You’ll see the results tomorrow.)</p>
<h2 id="day-22">Day 22</h2>
<p>Time for some glamor photos of the pieces thus far:</p>
<p class="center center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11459_1864a14.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11459_1864a14" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11460_621ccef.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11460_621ccef" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11462_3d0da06.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11462_3d0da06" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11470_2fc999a.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11470_2fc999a" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11473_32dc801.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11473_32dc801" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11474_8387840.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11474_8387840" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11475_ee1eeae.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11475_ee1eeae" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11476_9bc4044.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11476_9bc4044" title="Desc." /></p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11477_e32c4dd.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11477_e32c4dd" title="Desc." /></p>
<p><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11479_4104aa3.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11479_4104aa3" title="Desc." /></p>
<p>And of course, the brush that made it all possible:</p>
<p class="center"><img src="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11466_ee6f2e0.jpg" alt="2012-01-22_Canon_EOS_REBEL_T1i_IMG_11466_ee6f2e0" title="Desc." /></p>
<h2 id="day-23">Day 23</h2>
<p>Today was just assembly, mounting motors, running cables, and other fit and finish stuff.</p>
<p class="center center"><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11486_0197df7.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11486_0197df7" title="Desc." /></p>
<p class="center"><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11487_11e4a6f.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11487_11e4a6f" title="Desc." /></p>
<p class="center"><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11489_e50868a.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11489_e50868a" title="Desc." /></p>
<p class="center"><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11492_b09c36a.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11492_b09c36a" title="Desc." /></p>
<p class="center"><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11500_3ca568e.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11500_3ca568e" title="Desc." /></p>
<p><img src="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11501_c229f73.jpg" alt="2012-01-23_Canon_EOS_REBEL_T1i_IMG_11501_c229f73" title="Desc." /></p>
<h2 id="day-25">Day 25</h2>
<p>Another day of assembly.</p>
<p class="center center"><img src="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11503_2c4ddaf.jpg" alt="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11503_2c4ddaf" title="Desc." /></p>
<p><img src="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11504_1eb91f4.jpg" alt="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11504_1eb91f4" title="Desc." /></p>
<p>I added an emergency stop button to the top, so you can stop it quickly.</p>
<p class="center"><img src="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11507_6693d08.jpg" alt="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11507_6693d08" title="Desc." /></p>
<p class="center">And a power port on the back:</p>
<p><img src="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11508_7c06947.jpg" alt="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11508_7c06947" title="Desc." /></p>
<p>Cables are super important but easy to forget. I ran them axially out the shafts so that they could twist with minimal stress.</p>
<p class="center"><img src="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11510_6193025.jpg" alt="2012-01-25_Canon_EOS_REBEL_T1i_IMG_11510_6193025" title="Desc." /></p>
<h2 id="day-27">Day 27</h2>
<p>The robot is mostly assembled at this point. I added some rubber scuff strips to protect the edges because I expect the robot to fall quite a lot before I get the hopping algorithm dialed in.</p>
<p class="center"><img src="2012-01-27_Canon_EOS_REBEL_T1i_IMG_11520_aecf5d7.jpg" alt="2012-01-27_Canon_EOS_REBEL_T1i_IMG_11520_aecf5d7" title="Desc." /></p>
<p>Final weight, sans batteries, is 8.3kg. Heavier than I would like, but hopping should still be technically possible.</p>
<p class="center"><img src="2012-01-27_Canon_EOS_REBEL_T1i_IMG_11527_7c8c47f.jpg" alt="2012-01-27_Canon_EOS_REBEL_T1i_IMG_11527_7c8c47f" title="Desc." /></p>
<h2 id="day-28">Day 28</h2>
<p>No actual robot construction today. Instead, I built a moveable backboard out of some lumber, drew 10cm squares on it, and positioned the robot over the treadmill it runs on.</p>
<p class="center center"><img src="2012-01-28_Canon_EOS_REBEL_T1i_IMG_11533_d60c986.jpg" alt="2012-01-28_Canon_EOS_REBEL_T1i_IMG_11533_d60c986" title="Desc." /></p>
<p><img src="2012-01-28_Canon_EOS_REBEL_T1i_IMG_11537_e8d2a00.jpg" alt="2012-01-28_Canon_EOS_REBEL_T1i_IMG_11537_e8d2a00" title="Desc." /></p>
<h2 id="day-65">Day 65</h2>
<p>At this point, I had spent a whole month writing software, trying to tune the balance algorithm I had used in simulation, reverse-engineering the UDP packet structure of the Vicon camera system, and discovering bugs in the motor electronics control software. You can see the Vicon camera marker reflectors that I used to orient the robot in 3D. I waaaaay underestimated the amount of time it would take to tune the control systems and account for all of the communication lag. I also way underestimated the number of sources of friction and inefficiency; I was getting only about half of the torque that I expected from the motors.</p>
<p class="center"><img src="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12036_beb9e28.jpg" alt="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12036_beb9e28" title="Desc." /></p>
<p>During testing, I added a big foam “heel” to the robot, since that place was striking the ground occasionally with a lot more force than I intended.</p>
<p class="center"><img src="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12037_f727d37.jpg" alt="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12037_f727d37" title="Desc." /></p>
<p>The scuff pads have also gotten some wear:</p>
<p class="center"><img src="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12066_42d5ebb.jpg" alt="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12066_42d5ebb" title="Desc." /></p>
<p>As has the rubber sports ball I used as the foot tip:</p>
<p class="center"><img src="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12068_63fc50a.jpg" alt="2012-03-10_Canon_EOS_REBEL_T1i_IMG_12068_63fc50a" title="Desc." /></p>
<p>Another thing learned during testing is that, unsurprisingly, a foam and fiberglass box is a great heat insulator. I added some small fans to pull air into the body and thigh to keep the motors and electronics cool.</p>
<p class="center"><img src="2012-04-20_Canon_EOS_REBEL_T1i_IMG_12079_d18591f.jpg" alt="2012-04-20_Canon_EOS_REBEL_T1i_IMG_12079_d18591f" title="Desc." /></p>
<p class="center"><img src="2012-04-20_Canon_EOS_REBEL_T1i_IMG_12082_3bcf8a9.jpg" alt="2012-04-20_Canon_EOS_REBEL_T1i_IMG_12082_3bcf8a9" title="Desc." /></p>
<p>The robot usually looked like this in operation. I added some bungee cables at the top to make the falls less hard.</p>
<p class="center"><img src="2012-04-22_Canon_EOS_REBEL_T1i_IMG_12087_8dd1f37.jpg" alt="2012-04-22_Canon_EOS_REBEL_T1i_IMG_12087_8dd1f37" title="Desc." /></p>
<h2 id="conclusion">Conclusion:</h2>
<p>To my great disappointment, it didn’t work! Even after all of this work making a light structure, and as much tuning as I could do, the robot was still too heavy to jump properly. Given the 111W motors, had I more properly accounted for the motor inefficiency and backboard friction, a more realistic estimate on a reasonable jumping weight would have been 4-6kg instead of 8-10kg. Thankfully, I had enough work with the compliant actuators and other work that I was able to complete my Ph.D. even without the robot hopping. And the spring mechanism was a great success, even if the hopping robot itself was not.</p>
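<p>As a rough sanity check on that revised figure, here is a small Python sketch. It rests on a loud assumption that is not the model used for the design: that the mass the leg can hop with scales linearly with the torque the motors actually deliver. The function name <code>rescale_mass_range</code> is hypothetical; the numbers come from the text (an 8-10kg original estimate, about half the expected torque, 8.3kg as built).</p>

```python
def rescale_mass_range(planned_kg, torque_fraction):
    """Scale a (low, high) mass range by the fraction of expected torque delivered.

    Assumes liftable mass is proportional to delivered torque, which is a
    simplification for illustration only.
    """
    low, high = planned_kg
    return (low * torque_fraction, high * torque_fraction)


planned = (8.0, 10.0)   # kg, the original estimate quoted in the text
torque_fraction = 0.5   # "about half of the torque" expected from the motors
built_mass = 8.3        # kg, final weight without batteries

low, high = rescale_mass_range(planned, torque_fraction)
print(f"Rescaled range: {low:.1f}-{high:.1f} kg")
print(f"As built: {built_mass} kg, {built_mass / high:.1f}x the rescaled upper bound")
```

<p>Under that linear assumption the range lands at 4-5kg, in the same ballpark as the 4-6kg quoted above, and well below the mass of the finished robot.</p>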
<p class="center"><img src="2012-04-26_Canon_EOS_450D_IMG_0040_35da09c.jpg" alt="2012-04-26_Canon_EOS_450D_IMG_0040_35da09c" title="Desc." /></p>
<p>In closing, I would like to thank the Genovese for having shared their beautiful city with me for those years, and thank whatever Italian spirit of energy and passion kept me motivated during this fun and unusual project. I’m still just thankful that I had the experience.</p>
<p class="center"><img src="2012-08-24_Canon_PowerShot_SD1200_IS_IMG_3377_4624bd8.jpg" alt="2012-08-24_Canon_PowerShot_SD1200_IS_IMG_3377_4624bd8" title="Desc." /></p>
<h2 id="addendum-tutorial-photos">Addendum: Tutorial Photos</h2>
<p>Here are some other photos that you may find helpful.</p>
<p>Let’s start with the actual process of doing a fiberglass lay-up. When putting fiberglass onto a mold (or glass in this case), I find it helpful to paint the mold itself with epoxy to ensure there are no air bubbles trapped under the fabric.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_wet-resin-on-glass_e09103a.jpg" alt="2012-02-12_FinePix_S5700_S700_wet-resin-on-glass_e09103a" title="Desc." /></p>
<p>As you wet out the fabric (using the tip of the brush to “stipple” as needed), you will see the fiberglass become clear.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_wet-brush-second-layer_057e8d6.jpg" alt="2012-02-12_FinePix_S5700_S700_wet-brush-second-layer_057e8d6" title="Desc." /></p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_wet-brush-out-fabric_ee6b9c8.jpg" alt="2012-02-12_FinePix_S5700_S700_wet-brush-out-fabric_ee6b9c8" title="Desc." /></p>
<p>During a layup, it’s a good idea to have various things handy: a marker, lots of cups, a squeegee, measuring syringes, disposable paintbrushes, and roller tools to help squeeze out excess resin. But the most essential tools are plastic gloves, tons and tons of paper towels to help clean up, and a timer to let you know how many minutes you have until your resin gels.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_wet-layup-supplies_7dd1f7f.jpg" alt="2012-02-12_FinePix_S5700_S700_wet-layup-supplies_7dd1f7f" title="Desc." /></p>
<p>It’s usually a simple matter to peel 1-3 layers of carbon fiber off of a glass plate or mold. However, if you use more than 3 layers, or if the part is curved in a complicated way, the part will be more rigid and more difficult to remove if you have not waxed the glass properly.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_wet-peel-off-carbon_32af84f.jpg" alt="2012-02-12_FinePix_S5700_S700_wet-peel-off-carbon_32af84f" title="Desc." /></p>
<p>Some people just wonder what various fabrics look like when cured. From left to right, some samples of fiberglass, Kevlar, and carbon fiber, in thicknesses of 1, 2, and 3 plies of fabric.</p>
<p class="center"><img src="2012-02-07_FinePix_S5700_S700_trimmed-samples_04adea4.jpg" alt="2012-02-07_FinePix_S5700_S700_trimmed-samples_04adea4" title="Desc." /></p>
<p>Other people ask about core materials. As an experiment, I made a test piece of composite on a sheet of glass, and put three different cores on it. From left to right: Nomex honeycomb, PVC foam, and styrofoam. For thick cores like this, it’s important to bevel the edges as I did. Note that the layer on the glass is already cured. You should also add tiny pinholes through the foam, every 2cm or so, so that resin trapped under the foam cores can flow through them as the vacuum pressure squeezes out any excess resin underneath the core material.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_cores-three-types_a74a5f8.jpg" alt="2012-02-12_FinePix_S5700_S700_cores-three-types_a74a5f8" title="Desc." /></p>
<p>After applying another layer of carbon fiber and using a vacuum bag, you get this result:</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_cores-with-backside_0df68b3.jpg" alt="2012-02-12_FinePix_S5700_S700_cores-with-backside_0df68b3" title="Desc." /></p>
<p>After pulling it off the glass and flipping it over, the test piece looks like this:</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_cores-frontside_ca4066b.jpg" alt="2012-02-12_FinePix_S5700_S700_cores-frontside_ca4066b" title="Desc." /></p>
<p>Another trick you can use that is sometimes helpful is to pour resin over fabric, between two sheets of plastic. Then use a squeegee to wet out the fabric, and you have “homemade prepreg” (fabric with pre-impregnated resin) that will last for 30 minutes or so. Use some scissors to cut out a little piece of it and directly apply it to those hard-to-reach areas.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_poor-mans-prepreg_b5d40ad.jpg" alt="2012-02-12_FinePix_S5700_S700_poor-mans-prepreg_b5d40ad" title="Desc." /></p>
<p>An example of a vacuum bag setup. The gummy, clay-like vacuum bag tape is expensive, but well worth it for sealing leaks. Try at all costs to keep resin out of your vacuum hose.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_vacuum-bag_ce76fa0.jpg" alt="2012-02-12_FinePix_S5700_S700_vacuum-bag_ce76fa0" title="Desc." /></p>
<p>A close-up of the vacuum pump I used.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_vacuum-pump_06c25f5.jpg" alt="2012-02-12_FinePix_S5700_S700_vacuum-pump_06c25f5" title="Desc." /></p>
<p>Here’s a sample piece of the Nomex honeycomb core after being cut out. It makes a really, really light and stiff part. This thing is just 1 layer of carbon and some paper honeycomb, and it’s just barely breakable with your hands. If the edges were sealed (leaving the Nomex exposed to the air is a big no-no) as would be common in a non-sample part, it would be even harder to break or crush the core.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_weighing-honeycomb_0df929e.jpg" alt="2012-02-12_FinePix_S5700_S700_weighing-honeycomb_0df929e" title="Desc." /></p>
<p>Kevlar is surprisingly resin-hungry, and absorbs a lot of epoxy in my experience. Even just a single layer of Kevlar weighs as much as two layers of carbon fiber over Nomex honeycomb. If you can stand the extra weight, though, Kevlar is decidedly “tougher” and less brittle.</p>
<p class="center"><img src="2012-02-12_FinePix_S5700_S700_weighing-kevlar_f3ad879.jpg" alt="2012-02-12_FinePix_S5700_S700_weighing-kevlar_f3ad879" title="Desc." /></p>
<p>If this article interested you, perhaps look at the <a href="../../about/frp-seminar-slides.pdf">FRP tutorial slides I made</a>, or watch the <a href="https://youtu.be/x9cK6zSe-NQ">tutorial video</a>.</p>
<iframe width="708" height="398" src="https://www.youtube.com/embed/x9cK6zSe-NQ" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>This article describes the progress of a fiber-reinforced polymer monopod robot that I built over a period of about 30 days. My hope is that seeing some of the successes and failures I encountered will help you build your own composite robots. Foam and fiberglass are very versatile materials that are just as accessible to garage-level workshops as they are to research institutions, as I hope you will see.Loading CSVs in Clojure2019-02-16T00:00:00-08:002019-02-16T00:00:00-08:00/blog/loading-csvs-in-clojure<p>Loading CSVs in Clojure is really easy using <code class="highlighter-rouge">clojure.data.csv</code>. Depending on what you will do with the data, you can either represent it in tabular form (as a vector of vectors), or as a list of hashmaps. Both approaches have their advantages and disadvantages, and here are some very stock functions for how to achieve that:</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">net.roboloco.csvs</span><span class="w">
</span><span class="s">"Functions for loading and saving CSVs."</span><span class="w">
</span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.data.csv</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">csv</span><span class="p">]</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.java.io</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">empty-string-to-nil</span><span class="w">
</span><span class="s">"Returns a nil if given an empty string S, otherwise returns S."</span><span class="w">
</span><span class="p">[</span><span class="n">s</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">and</span><span class="w"> </span><span class="p">(</span><span class="nb">string?</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nf">empty?</span><span class="w"> </span><span class="n">s</span><span class="p">))</span><span class="w">
</span><span class="n">nil</span><span class="w">
</span><span class="n">s</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">dissoc-nils</span><span class="w">
</span><span class="s">"Drops keys with nil values, or nil keys, from the hashmap H."</span><span class="w">
</span><span class="p">[</span><span class="n">h</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">into</span><span class="w"> </span><span class="p">{}</span><span class="w"> </span><span class="p">(</span><span class="nb">filter</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[[</span><span class="n">k</span><span class="w"> </span><span class="n">v</span><span class="p">]]</span><span class="w"> </span><span class="p">(</span><span class="nb">and</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="n">k</span><span class="p">))</span><span class="w"> </span><span class="n">h</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">load-csv</span><span class="w">
</span><span class="s">"Returns a data structure loaded from a CSV file at FILEPATH."</span><span class="w">
</span><span class="p">[</span><span class="n">filepath</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">with-open</span><span class="w"> </span><span class="p">[</span><span class="n">reader</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/reader</span><span class="w"> </span><span class="n">filepath</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nf">csv/read-csv</span><span class="w"> </span><span class="n">reader</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="k">fn</span><span class="w"> </span><span class="p">[</span><span class="n">row</span><span class="p">]</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="n">empty-string-to-nil</span><span class="w"> </span><span class="n">row</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="nb">doall</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">save-csv</span><span class="w">
</span><span class="s">"Saves a vector of vectors VEC-OF-VECS (i.e. a CSV) to disk at FILEPATH."</span><span class="w">
</span><span class="p">[</span><span class="n">vec-of-vecs</span><span class="w"> </span><span class="n">filepath</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">with-open</span><span class="w"> </span><span class="p">[</span><span class="n">writer</span><span class="w"> </span><span class="p">(</span><span class="nf">clojure.java.io/writer</span><span class="w"> </span><span class="n">filepath</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nf">csv/write-csv</span><span class="w"> </span><span class="n">writer</span><span class="w"> </span><span class="n">vec-of-vecs</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">tabular->maps</span><span class="w">
</span><span class="s">"Converts a vector of vectors into a vector of maps. Assumes that the
first row of the CSV is a header that contains column names."</span><span class="w">
</span><span class="p">[</span><span class="n">tabular</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">header</span><span class="w"> </span><span class="p">(</span><span class="nb">first</span><span class="w"> </span><span class="n">tabular</span><span class="p">)]</span><span class="w">
</span><span class="p">(</span><span class="nf">->></span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="nb">zipmap</span><span class="w"> </span><span class="p">(</span><span class="nb">repeat</span><span class="w"> </span><span class="n">header</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="n">tabular</span><span class="p">))</span><span class="w">
</span><span class="p">(</span><span class="nf">mapv</span><span class="w"> </span><span class="n">dissoc-nils</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="k">defn</span><span class="w"> </span><span class="n">maps->tabular</span><span class="w">
</span><span class="s">"Converts a vector of maps into a vector of vectors. The first row is a header of sorted column names."</span><span class="w">
</span><span class="p">[</span><span class="n">rowmaps</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">[</span><span class="n">columns</span><span class="w"> </span><span class="p">(</span><span class="nf">vec</span><span class="w"> </span><span class="p">(</span><span class="nb">sort</span><span class="w"> </span><span class="p">(</span><span class="nb">into</span><span class="w"> </span><span class="o">#</span><span class="p">{}</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="nb">name</span><span class="w"> </span><span class="p">(</span><span class="nf">flatten</span><span class="w"> </span><span class="p">(</span><span class="nb">map</span><span class="w"> </span><span class="nb">keys</span><span class="w"> </span><span class="n">rowmaps</span><span class="p">))))))]</span><span class="w">
</span><span class="p">(</span><span class="nf">vec</span><span class="w"> </span><span class="p">(</span><span class="nb">conj</span><span class="w"> </span><span class="p">(</span><span class="k">for</span><span class="w"> </span><span class="p">[</span><span class="n">row</span><span class="w"> </span><span class="n">rowmaps</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nf">vec</span><span class="w"> </span><span class="p">(</span><span class="k">for</span><span class="w"> </span><span class="p">[</span><span class="n">col</span><span class="w"> </span><span class="n">columns</span><span class="p">]</span><span class="w">
</span><span class="p">(</span><span class="nb">str</span><span class="w"> </span><span class="p">(</span><span class="nb">get</span><span class="w"> </span><span class="n">row</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="s">""</span><span class="p">)))))</span><span class="w">
</span><span class="n">columns</span><span class="p">))))</span><span class="w">
</span><span class="p">(</span><span class="nb">comment</span><span class="w">
</span><span class="p">(</span><span class="k">def</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="p">(</span><span class="nf">tabular->maps</span><span class="w"> </span><span class="p">(</span><span class="nf">load-csv</span><span class="w"> </span><span class="s">"/path/to/mycsv.csv"</span><span class="p">)))</span><span class="w">
</span><span class="p">(</span><span class="nf">save-csv</span><span class="w"> </span><span class="p">(</span><span class="nf">maps->tabular</span><span class="w"> </span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="s">"/some/other/path.csv"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre>
</div>
<p>Note that the above functions are not lazy, which is a good choice for CSVs 90% of the time. If you find yourself working with very large datasets that cannot be loaded all at once, you might want to adjust <code class="highlighter-rouge">load-csv</code>, <code class="highlighter-rouge">tabular->maps</code>, and <code class="highlighter-rouge">maps->tabular</code> to work lazily and incrementally.</p>
A Localhost Swagger Editor (2019-02-14, /blog/localhost-swagger-editor)
<p>I’ve always edited Swagger docs using the online editor, but thanks to Docker, you can run it locally with equal ease:</p>
<p>From <a href="https://github.com/swagger-api/swagger-editor">https://github.com/swagger-api/swagger-editor</a>:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker pull swaggerapi/swagger-editor
docker run -d -p 80:8080 swaggerapi/swagger-editor
</code></pre>
</div>
<p>Then browse to port 80 on localhost. If you want more than one tab to work in, launch a few more containers on ports 81, 82, and 83.</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker run -d -p 81:8080 swaggerapi/swagger-editor
docker run -d -p 82:8080 swaggerapi/swagger-editor
docker run -d -p 83:8080 swaggerapi/swagger-editor
</code></pre>
</div>
<p>I can’t believe I never thought of this before.</p>
Connecting a Clojure REPL to a PostgreSQL Docker Container (2019-01-24, /blog/docker-postgres)
<p>In this tutorial, we’ll launch Postgres (a.k.a. PostgreSQL) in a Docker container, create a Postgres database, and connect to it from a Clojure REPL. Postgres is a great open source SQL database with a long history, and is a good choice for many small-to-medium scale projects.</p>
<p>First, we need to download the official Postgres image, named <code class="highlighter-rouge">postgres</code>, create a new container, and start the image running in that container. Remember, in the parlance of Docker, “images” are the recipe and “containers” are the cake(s) made from that recipe. By default, Docker containers can make connections to the outside world, but the outside world cannot connect to containers. So we need to “publish” a port that has been exposed in the Docker image.</p>
<p>Downloading the docker image, instantiating it, setting a password, and publishing a port can all be done in a single command:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker run --name my-postgres-container --env <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>mysecretpassword -p 5432:5432 --detach postgres
</code></pre>
</div>
<p>where</p>
<div class="highlighter-rouge"><pre class="highlight"><code>-p 5432:5432   Publish port 5432 in the container as 5432 on localhost, so you can connect
--name         Gives a name to this container
--env          Environment variable(s)
--detach       Run in the background and disconnect the terminal
</code></pre>
</div>
<p>An important note, as I mentioned in the article on <a href="../clojure-apps-in-docker">using Docker to deploy Clojure apps</a>, is that the <code class="highlighter-rouge">docker run</code> command <em>creates</em> a new container – it “bakes a new cake”. If you want to launch a container you have already baked, you’ll use the <code class="highlighter-rouge">docker start</code> command instead, as we’ll see below.</p>
<p>You can check that the Docker container is running with</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container ls
</code></pre>
</div>
<p>or more succinctly</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker ps
</code></pre>
</div>
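<p>Neither command tells you whether the published port is actually reachable from the host, which is the whole point of the <code class="highlighter-rouge">-p 5432:5432</code> flag. As a rough cross-check, a few lines of Python (standard library only) can attempt a TCP connection. This is only a sketch: the helper name <code class="highlighter-rouge">port_is_open</code> is made up for illustration, and it assumes the port was published on localhost as above.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if the port is refused or times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
</code></pre>
</div>
<p>Calling <code class="highlighter-rouge">port_is_open("localhost", 5432)</code> should return <code class="highlighter-rouge">True</code> once the container is up. Note that it only shows that something is listening on the port, not that Postgres is ready to accept logins.</p>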
<p>Next, let’s use another Docker trick to create a second <code class="highlighter-rouge">postgres</code> cake, this time running the <code class="highlighter-rouge">psql</code> command, and <code class="highlighter-rouge">link</code> it to the other container. We want this container to be deleted when we close it (<code class="highlighter-rouge">--rm</code>), and to be linked to the <code class="highlighter-rouge">my-postgres-container</code> so that it can access the server we just started. It is interesting that we can use <code class="highlighter-rouge">psql</code> without having it installed on our local machine; we’re using the command that is inside a container!</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker run -it --rm --link my-postgres-container:postgres postgres psql -h postgres -U postgres
</code></pre>
</div>
<p>In that terminal, let’s now create a test database:</p>
<div class="language-sql highlighter-rouge"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">testdb</span><span class="p">;</span>
</code></pre>
</div>
<p>You may now hit control-d to quit the terminal if you want to close it. In my case, I often leave it open for debugging.</p>
<p>Make a new Clojure project, and be sure to add these lines to the <code class="highlighter-rouge">:dependencies</code>:</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">[</span><span class="n">org.clojure/java.jdbc</span><span class="w"> </span><span class="s">"0.7.9"</span><span class="p">]</span><span class="w">
</span><span class="p">[</span><span class="n">org.postgresql/postgresql</span><span class="w"> </span><span class="s">"42.2.5"</span><span class="p">]</span><span class="w">
</span></code></pre>
</div>
<p>Now spin up a Clojure REPL in that project, and run the following commands, one at a time, and note the output of each.</p>
<div class="language-clojure highlighter-rouge"><pre class="highlight"><code><span class="p">(</span><span class="nf">ns</span><span class="w"> </span><span class="n">mytest</span><span class="w"> </span><span class="p">(</span><span class="no">:require</span><span class="w"> </span><span class="p">[</span><span class="n">clojure.java.jdbc</span><span class="w"> </span><span class="no">:as</span><span class="w"> </span><span class="n">sql</span><span class="p">]))</span><span class="w">
</span><span class="p">(</span><span class="k">def</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="p">{</span><span class="no">:dbtype</span><span class="w"> </span><span class="s">"postgresql"</span><span class="w"> </span><span class="no">:dbname</span><span class="w"> </span><span class="s">"testdb"</span><span class="w"> </span><span class="no">:user</span><span class="w"> </span><span class="s">"postgres"</span><span class="w"> </span><span class="no">:password</span><span class="w"> </span><span class="s">"mysecretpassword"</span><span class="p">})</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/query</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="p">[</span><span class="s">"SELECT 3*5 AS result"</span><span class="p">])</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/db-do-commands</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="p">(</span><span class="nf">sql/create-table-ddl</span><span class="w"> </span><span class="no">:testing</span><span class="w"> </span><span class="p">[[</span><span class="no">:data</span><span class="w"> </span><span class="no">:text</span><span class="p">]]))</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/insert!</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="no">:testing</span><span class="w"> </span><span class="p">{</span><span class="no">:data</span><span class="w"> </span><span class="s">"hahaha"</span><span class="p">})</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/insert!</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="no">:testing</span><span class="w"> </span><span class="p">{</span><span class="no">:data</span><span class="w"> </span><span class="s">"lol"</span><span class="p">})</span><span class="w">
</span><span class="p">(</span><span class="nf">sql/query</span><span class="w"> </span><span class="n">db-spec</span><span class="w"> </span><span class="p">[</span><span class="s">"SELECT * FROM testing"</span><span class="p">])</span><span class="w">
</span></code></pre>
</div>
<p>Now it’s time to develop that app to your heart’s content.</p>
<p>When you are done with your app, you can stop your container with:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container stop my-postgres-container
</code></pre>
</div>
<p>At later times, you can restart the container, and use its state, with:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container start my-postgres-container
</code></pre>
</div>
<p>You can delete the stopped container with:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container rm my-postgres-container
</code></pre>
</div>
<p>Note that if you want a second container for postgres – say, for a different project – you could make a second container called “my-postgres-container2” using the <code class="highlighter-rouge">docker run</code> command near the top of this post. Make as many containers as you need, and start and stop them as required.</p>
<p>Keeping the database in the container is fine during first steps and testing, but it can be slow and wasteful of disk. The recommended solution is to use <a href="https://docs.docker.com/storage/volumes/">Docker Volumes</a>, which also let you upgrade the Postgres version separately from the data itself.</p>
<p>An example of how to do that in this case is:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker volume create --name postgresvol
docker run --name my-postgres-container --env <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>mysecretpassword -p 5432:5432 --detach -v postgresvol:/var/lib/postgresql/data postgres
</code></pre>
</div>
<h1 id="references">References</h1>
<ul>
<li><a href="http://clojure-doc.org/articles/ecosystem/java_jdbc/home.html">The Clojure JDBC docs</a> were a helpful starting point for learning how to use JDBC.</li>
<li><a href="http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql.html">The Clojure JDBC SQL page</a> is also very good.</li>
<li><a href="https://docs.docker.com/get-started/">The Docker Getting Started Guide</a>. Docker documentation is detailed and helpful. The getting started guide is decent, but not as example-heavy and in-depth as I would have wanted.</li>
<li><a href="https://docs.docker.com/v17.09/engine/userguide/networking/default_network/binding/">Docker Documentation on Binding Container Ports to the Host</a>. This has a critical quote that helped me on a related problem: “By default Docker containers can make connections to the outside world, but the outside world cannot connect to containers.” Even though I had exposed the ports in Docker images with “EXPOSE”, they were not actually reachable from the host unless I launched the container with <code class="highlighter-rouge">docker run -P ...</code> or <code class="highlighter-rouge">docker run -p 5432:5432</code>. Kind of a gotcha for newbies, I feel.</li>
</ul>
Packaging and Running Clojure Apps Locally with Docker (2019-01-23, /blog/clojure-apps-in-docker)
<p>This is just a simple tutorial on how to package a Clojure application into a Docker image, which can then be deployed locally, on a server, or in the cloud.</p>
<p>We are going to follow a pretty classic pattern in Clojure:</p>
<ol>
<li><em>DEVELOP</em>. In this case, we’ll use <code class="highlighter-rouge">lein repl</code> and emacs (cider). For a mixed Clojure/Clojurescript app, <code class="highlighter-rouge">lein figwheel</code>.</li>
<li><em>BUILD UBERJAR</em>. Very simply, <code class="highlighter-rouge">lein uberjar</code> is all we need to do to package all java dependencies and app resources into a single file.</li>
<li><em>BUILD DOCKER IMAGE</em>. Because the uberjar does most of the work for us, we just need a Java JRE and the uberjar.</li>
<li><em>DEPLOY</em>. If you are running your own production system, Kubernetes would be a good choice here. Amazon, Google, and Microsoft all have their own container services as well.</li>
</ol>
<h2 id="develop">Develop</h2>
<p>It’s probably not the latest and greatest way of making a front end and back end, but I find for many of my small apps <a href="https://reagent-project.github.io/">Reagent</a> for the front end and <a href="https://github.com/ring-clojure">Ring</a> for the back end are enough. Creating a project is simple:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>lein new reagent myapp +cider
<span class="nb">cd </span>myapp
git init
</code></pre>
</div>
<p>And then you just develop like normal until you are happy with the way your app works. The only slightly tricky thing to remember when rolling up an app that you will put in a Docker container is that your web server should be configured to listen on 0.0.0.0 (which means to listen on all interfaces). Typically you just add <code class="highlighter-rouge">:host "0.0.0.0"</code> as an argument to whatever webserver you are starting, <code class="highlighter-rouge">jetty</code> in this case.</p>
<h2 id="build-the-uberjar">Build the UberJar</h2>
<p><code class="highlighter-rouge">lein</code> makes packaging up a web server, front end, and assets all together extremely easy:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>lein uberjar
</code></pre>
</div>
<p>The JAR file will appear in the <code class="highlighter-rouge">target/</code> directory. If you need to control the name of the output uberjar, adjust the <code class="highlighter-rouge">:uberjar-name</code> key in your <code class="highlighter-rouge">project.clj</code>.</p>
<h2 id="build-docker-image">Build Docker Image</h2>
<p>The steps for building a Docker image are stored in a special file called <code class="highlighter-rouge">Dockerfile</code>, which I typically place in the root directory of my project repo. Since all the assets are already stored in the uberjar, the contents of <code class="highlighter-rouge">Dockerfile</code> are simple:</p>
<div class="highlighter-rouge"><pre class="highlight"><code># Use https://hub.docker.com/_/oracle-serverjre-8
FROM java:8-alpine
# Make a directory
RUN mkdir -p /app
WORKDIR /app
# Copy only the target jar over
COPY app-standalone.jar .
# Open the port
EXPOSE 3000
# Run the JAR
CMD java -jar app-standalone.jar
</code></pre>
</div>
<p>During the build process, Docker needs a “context” directory that contains all of the files needed to build the image. Since we have already packaged assets in the JAR, and compiled the source code into bytecode, we do not need to copy the source over in the build process. We can let the Docker build process use the “target” directory only. This can speed up the Docker build, and saves space because it is not copying resources twice. The only downside is that we have to explicitly specify both the Dockerfile to use and the directory to use as the “context” directory. Run this from the root of the project directory:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker build --tag myapp -f Dockerfile target
</code></pre>
</div>
<p>And that’s it!</p>
<h2 id="deploy-locally-to-test">Deploy Locally to Test</h2>
<p>If your app has no state (and it shouldn’t, if you are making a <a href="https://12factor.net">12-factor app</a>), you can now create a new container from your Docker image, passing it whatever environment variables you need, and publishing internal port 3000 as external port 3000:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker run --name my-app-container --env <span class="nv">MY_ENV_VAR</span><span class="o">=</span>some_value -p 3000:3000 --rm myapp
</code></pre>
</div>
<p>Check to see that it is running:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container ls
</code></pre>
</div>
<p>or more concisely,</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker ps
</code></pre>
</div>
<p>I still find it slightly misleading that <code class="highlighter-rouge">run</code> actually means “create and start a container” in the language of Docker-ese. Stopping the container will not delete it in general, and that often means containers accumulating silently in the background. Hence the <code class="highlighter-rouge">--rm</code> flag, which tells Docker to delete the container when it exits.</p>
<p>If you don’t want the container to delete itself when done, omit the <code class="highlighter-rouge">--rm</code> option, and maybe consider instead the <code class="highlighter-rouge">--detach</code> option so you get your shell back. If you aren’t building a new container regularly and the container is lying around, starting and stopping the named container is as simple as you would expect:</p>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>docker container start my-app-container
docker container stop my-app-container
</code></pre>
</div>
<h2 id="resources-and-interesting-reading">Resources and Interesting Reading</h2>
<ol>
<li><a href="https://blog.jessfraz.com/post/docker-containers-on-the-desktop/">https://blog.jessfraz.com/post/docker-containers-on-the-desktop/</a></li>
<li><a href="https://docs.docker.com/get-started/">https://docs.docker.com/get-started/</a></li>
<li><a href="https://kubernetes.io/docs/getting-started-guides/ubuntu/installation/">https://kubernetes.io/docs/getting-started-guides/ubuntu/installation/</a></li>
<li><a href="https://marketplace.automic.com/details/clojure-official-docker-image">https://marketplace.automic.com/details/clojure-official-docker-image</a></li>
<li><a href="https://medium.com/@mprokopov/deployment-of-clojure-app-to-production-with-docker-9dbffeac6ef5">https://medium.com/@mprokopov/deployment-of-clojure-app-to-production-with-docker-9dbffeac6ef5</a></li>
<li><a href="https://medium.com/@divyum/building-a-simple-http-server-in-clojure-part-iii-dockerizing-clojure-application-1f53a6a90af2">https://medium.com/@divyum/building-a-simple-http-server-in-clojure-part-iii-dockerizing-clojure-application-1f53a6a90af2</a></li>
<li><a href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/">https://docs.docker.com/develop/develop-images/dockerfile_best-practices/</a></li>
<li><a href="https://devcenter.heroku.com/articles/clojure-web-application">https://devcenter.heroku.com/articles/clojure-web-application</a></li>
</ol>
Layered Tetrahedral Geometric Structures (2018-02-25, /blog/layered-dual-geometric-structures)
<p>This article describes another iteration of the dome-like structure that I <a href="../pentakis-dodecahedron-dome">discussed a few months ago</a>. I still don’t have a good vocabulary for describing why I think these things are neat, except to say that I somehow find it relaxing to work on toy engineering problems once or twice a month. I guess <em>de gustibus non est disputandum</em> – “there’s no accounting for (bad) taste!”</p>
<p>Lately, I have been playing around with simple structures that are subject to some unusual but not pointless constraints. Here are the ones I played with for this design:</p>
<ol>
<li>
<p>How lightweight could you make a building? All other things being equal, lighter implies fewer materials, lower cost, and less ecological impact. To make something light, you must use what little material there is in a structurally strong fashion.</p>
</li>
<li>
<p>How simple could the constituent pieces of the structure be? Could they be made of a few simple shapes, repeated over and over? By re-using the same shape over and over again, production becomes more efficient because a miniature economy of scale is created.</p>
</li>
<li>
<p>Could you make temporary housing that could be flat-packed into the bed of a truck? This might be useful when you want something stronger and warmer than a tent, but less permanent than a house.</p>
</li>
</ol>
<p>On to the structure itself! A teaser:</p>
<p class="center"><img src="layered-dual-poly-09.jpg" alt="layered-dual-poly-09" title="The external pentagonal skin, mostly covered." /></p>
<h1 id="background">Background</h1>
<p>Previously, when I was working on the <a href="../pentakis-dodecahedron-dome">reinforced pentakis dodecahedron dome</a> toy concept, I mostly just followed the engineering tradition of using triangles and tetrahedra to create light and strong truss structures. I did not spend time deeply studying what the list of possible starting polyhedra were, and at the end of the project there was one thing in my mind that was not really resolved:</p>
<blockquote>
<p>Are there other shapes that are closely-related to the pentakis dodecahedron, but that are simpler to construct or have more desirable properties? How many possible shapes could there be, and how would I know that I have found them all?</p>
</blockquote>
<p>Previously, I started with the icosahedron and then connected it to a triakis icosahedron and pentakis dodecahedron without really understanding from whence these solids come, or what properties they have. Let’s go a little further this time into understanding the properties of regular polyhedra.</p>
<h1 id="platonic-catalan-and-archimedian-solids">Platonic, Catalan, and Archimedean Solids</h1>
<p><a href="../pentakis-dodecahedron-dome">Last time</a>, I briefly discussed the 5 platonic solids, which you may remember are the same shapes as dice:</p>
<table class="center">
<thead>
<tr>
<th> </th>
<th>Platonic Solid</th>
<th>Vertex Figure</th>
<th>Face</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="platonic/Tetrahedron.gif" alt="Tetrahedron" title="Tetrahedron" /></td>
<td>Tetrahedron</td>
<td>3.3.3</td>
<td>Triangle</td>
</tr>
<tr>
<td><img src="platonic/Hexahedron.gif" alt="Hexahedron" title="Hexahedron" /></td>
<td>Cube</td>
<td>4.4.4</td>
<td>Square</td>
</tr>
<tr>
<td><img src="platonic/Octahedron.gif" alt="Octahedron" title="Octahedron" /></td>
<td>Octahedron</td>
<td>3.3.3.3</td>
<td>Triangle</td>
</tr>
<tr>
<td><img src="platonic/Dodecahedron.gif" alt="Dodecahedron" title="Dodecahedron" /></td>
<td>Dodecahedron</td>
<td>5.5.5</td>
<td>Pentagon</td>
</tr>
<tr>
<td><img src="platonic/Icosahedron.gif" alt="Icosahedron" title="Icosahedron" /></td>
<td>Icosahedron</td>
<td>3.3.3.3.3</td>
<td>Triangle</td>
</tr>
</tbody>
</table>
<p>What’s a <em>vertex figure</em>, I hear you ask? It is a way to unambiguously define a regular polyhedron. If you pick any vertex on the polyhedron, and then count the number of sides on each face that touches that vertex, moving in a clockwise or counterclockwise fashion, you will get a sequence of numbers that we call the vertex figure. For example, each corner of the cube touches three squares, so we can call it 4.4.4. We are going to use vertex figures to help us keep track of complicated shapes rather than memorize scores of names.</p>
<p>There is another advantage to using vertex figures: they are unambiguous. In fact, you can use vertex figures to find shapes described by nerds in other countries, even if you don’t speak their language, because the language of math is universal.</p>
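<p>To show how much information a vertex figure carries, here is a minimal Python sketch that reads one and works out how many vertices the solid has. It leans on a classical result that is not derived in this post, Descartes’ theorem on angular defect: the gaps left over at every vertex of a convex polyhedron add up to 720 degrees. The function names are hypothetical, and the sketch assumes regular faces and identical vertices.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>from fractions import Fraction


def interior_angle(n):
    """Interior angle, in degrees, of a regular polygon with n sides."""
    return Fraction(180 * (n - 2), n)


def vertex_count(figure):
    """Number of vertices of the convex solid with this vertex figure.

    figure is a string such as "4.4.4"; every vertex is assumed identical.
    """
    sides = [int(s) for s in figure.split(".")]
    # Angular defect: how far the faces around one vertex fall short of 360.
    defect = 360 - sum(interior_angle(n) for n in sides)
    # Descartes' theorem: the defects of all the vertices total 720 degrees.
    return int(720 / defect)


for figure in ["3.3.3", "4.4.4", "3.3.3.3", "5.5.5", "3.3.3.3.3"]:
    print(figure, vertex_count(figure))
</code></pre>
</div>
<p>For the five Platonic solids in the table above this prints 4, 8, 6, 20, and 12 vertices respectively. Exact fractions are used instead of floats so that the division comes out to a whole number.</p>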
<p>You may remember that Platonic solids have several important properties:</p>
<ol>
<li>Every edge is the same length.</li>
<li>Every face has the same number of sides.</li>
<li>Every face’s interior angles are equal.</li>
<li>Every vertex lies on the surface of a sphere.</li>
</ol>
<p>These constraints are so restrictive that there are only five solids that meet all the criteria. But what would happen if we removed one or more of those constraints? As it turns out, by relaxing the constraints we can create two new families of solids: the <strong>Archimedean solids</strong> and the <strong>Catalan solids</strong>.</p>
<table class="center">
<thead>
<tr>
<th> </th>
<th>All Edge Lengths Equal?</th>
<th>All Faces Same?</th>
<th>Quantity</th>
<th>Vertices on Sphere?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Platonic</td>
<td>Yes</td>
<td>Yes</td>
<td>5</td>
<td>Always</td>
</tr>
<tr>
<td>Archimedean</td>
<td>Yes</td>
<td>No</td>
<td>13</td>
<td>Always</td>
</tr>
<tr>
<td>Catalan</td>
<td>No</td>
<td>Yes</td>
<td>13</td>
<td>Not always</td>
</tr>
</tbody>
</table>
<p>We’ll now go into more detail on each of those families of solids. As a spoiler to all you nerds out there, Archimedean and Catalan solids are mathematically dual to each other, meaning that for each Catalan solid with X faces and Y vertices, there will be an Archimedean solid with Y faces and X vertices.</p>
<h1 id="archimedian-solids">Archimedean Solids</h1>
<p>First described by the Greek mathematician Archimedes, <strong>the Archimedean solids are the set of convex polyhedra in which all edges are the same length, every face is a regular polygon (though not all of the same kind), and every vertex looks the same</strong>. If you are gluing together equal-length matchsticks under those rules, and you leave aside the infinite families of prisms and antiprisms, there are only 13 possible shapes (or 15, if you count the chiral variants as different) that you can make:</p>
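<p>The face and vertex swap can be checked numerically. The Python sketch below is an illustration with hypothetical function names. It takes a vertex figure, counts vertices, edges, and faces using two standard results that are not derived in this post (Descartes’ angular defect theorem and Euler’s formula V - E + F = 2), and then swaps vertices and faces to get the counts for the dual. It assumes a convex solid with regular faces and identical vertices.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>from fractions import Fraction


def counts(figure):
    """Return (vertices, edges, faces) for the solid with this vertex figure."""
    sides = [int(s) for s in figure.split(".")]
    angles = sum(Fraction(180 * (n - 2), n) for n in sides)
    vertices = int(720 / (360 - angles))  # total angular defect is 720 degrees
    edges = vertices * len(sides) // 2    # each edge is shared by two vertices
    faces = 2 - vertices + edges          # Euler: V - E + F = 2
    return vertices, edges, faces


def dual_counts(figure):
    """Counts for the dual solid: vertices and faces trade places."""
    vertices, edges, faces = counts(figure)
    return faces, edges, vertices


print(counts("3.6.6"))       # truncated tetrahedron: (12, 18, 8)
print(dual_counts("3.6.6"))  # its Catalan dual: (8, 18, 12)
</code></pre>
</div>
<p>The edge count is the same on both sides of the duality; only the roles of faces and vertices are exchanged.</p>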
<table class="center">
<thead>
<tr>
<th> </th>
<th>Archimedean Solid</th>
<th>Vertex Fig</th>
<th>Symmetry</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="archimedian/3.6.6.truncatedtetrahedron.gif" alt="3.6.6.truncatedtetrahedron" title="3.6.6.truncatedtetrahedron.gif" /></td>
<td>Truncated Tetrahedron</td>
<td>3.6.6</td>
<td>Td</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="archimedian/3.4.3.4.cuboctahedron.gif" alt="3.4.3.4.cuboctahedron" title="3.4.3.4.cuboctahedron.gif" /></td>
<td>Cuboctahedron</td>
<td>3.4.3.4</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="archimedian/3.8.8.truncatedhexahedron.gif" alt="3.8.8.truncatedhexahedron" title="3.8.8.truncatedhexahedron.gif" /></td>
<td>Truncated Cube</td>
<td>3.8.8</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="archimedian/4.6.6.truncatedoctahedron.gif" alt="4.6.6.truncatedoctahedron" title="4.6.6.truncatedoctahedron.gif" /></td>
<td>Truncated Octahedron</td>
<td>4.6.6</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="archimedian/3.4.4.4.rhombicuboctahedron.gif" alt="3.4.4.4.rhombicuboctahedron" title="3.4.4.4.rhombicuboctahedron.gif" /></td>
<td>Rhombicuboctahedron</td>
<td>3.4.4.4</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="archimedian/4.6.8.truncatedcuboctahedron.gif" alt="4.6.8.truncatedcuboctahedron" title="4.6.8.truncatedcuboctahedron.gif" /></td>
<td>Truncated Cuboctahedron</td>
<td>4.6.8</td>
<td>Oh</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="archimedian/3.3.3.4.nubhexahedroncw.gif" alt="3.3.3.4.nubhexahedroncw" title="3.3.3.3.4 - Snub Hexahedron" />or<img src="archimedian/3.3.3.4.snubhexahedronccw.gif" alt="3.3.3.4.snubhexahedronccw" title="3.3.3.4.snubhexahedronccw.gif" /></td>
<td>Snub Cube</td>
<td>3.3.3.3.4</td>
<td>O (Chiral)</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="archimedian/3.5.3.5.icosidodecahedron.gif" alt="3.5.3.5.icosidodecahedron" title="3.5.3.5.icosidodecahedron.gif" /></td>
<td>Icosidodecahedron</td>
<td>3.5.3.5</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="archimedian/3.10.10.truncateddodecahedron.gif" alt="3.10.10.truncateddodecahedron" title="3.10.10 - Truncated dodecahedron" /></td>
<td>Truncated Dodecahedron</td>
<td>3.10.10</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="archimedian/5.6.6.truncatedicosahedron.gif" alt="5.6.6.truncatedicosahedron" title="5.6.6.truncatedicosahedron.gif" /></td>
<td>Truncated Icosahedron</td>
<td>5.6.6</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="archimedian/3.4.5.4.rhombicosidodecahedron.gif" alt="3.4.5.4.rhombicosidodecahedron" title="3.4.5.4.rhombicosidodecahedron.gif" /></td>
<td>Rhombicosidodecahedron</td>
<td>3.4.5.4</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="archimedian/4.6.10.truncatedicosidodecahedron.gif" alt="4.6.10.truncatedicosidodecahedron" title="4.6.10.truncatedicosidodecahedron.gif" /></td>
<td>Truncated Icosidodecahedron</td>
<td>4.6.10</td>
<td>Ih</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="archimedian/3.3.3.5.snubdodecahedroncw.gif" alt="3.3.3.5.snubdodecahedroncw" title="3.3.3.5.snubdodecahedroncw.gif" />or<img src="archimedian/3.5.5.snubdodecahedronccw.gif" alt="3.5.5.snubdodecahedronccw" title="3.5.5.snubdodecahedronccw.gif" /></td>
<td>Snub Dodecahedron</td>
<td>3.3.3.3.5</td>
<td>I (Chiral)</td>
</tr>
</tbody>
</table>
<p>If that table scares and confuses you, you aren’t the only one – I found this all very bewildering at first! What the heck do all of these weird names mean? Truncated? Rhombi-? Snub? Why is there a combination of a cube and octahedron called a cuboctahedron? How do the icosahedron and dodecahedron combine to make an icosidodecahedron? Also, what is a point symmetry group?</p>
<p>It’s really easy to “lose the forest for the trees” here, so please feel free to NOT learn the names of these things – some of these polyhedra even have multiple names, which can make studying them terribly confusing. In fact, I’m going to skip defining the truncation, rhombi-, and snub operations entirely, because I am wary of starting out by describing these shapes in terms of transformations. Instead, I like to focus on observing the most obvious <em>symmetries</em> that they have, and on learning the unambiguous vertex figure definitions.</p>
<p>To me, <strong>the important thing to note about polyhedral symmetries is that our 3D universe appears to force polyhedra into just a few point symmetry groups</strong>. If we are a little hand-wavy and ignore the special case of “chiral” polyhedra for now, we can say there are just two symmetries here:</p>
<ol>
<li>Shapes like cubes/octahedra (of which the tetrahedral symmetry can probably be considered a special case);</li>
<li>Shapes like dodecahedra/icosahedra.</li>
</ol>
<p>The point symmetry group, roughly speaking, refers to the ways in which you may rotate a polyhedron into a position such that its vertices look the same as when you started.</p>
<h1 id="catalan-solids">Catalan Solids</h1>
<p>Catalan solids, named for the Belgian mathematician <a href="https://en.wikipedia.org/wiki/Eug%C3%A8ne_Charles_Catalan">Eugène Catalan</a>, were originally described in 1865. Think about that for a moment – despite Archimedes having discovered important polyhedral shapes in ancient times and despite the fact that the Catalan solids are an incredibly closely related set of shapes, the Catalan solids went formally unrecognized or undiscovered for two millennia. It makes you wonder what other important facts of mathematics lie adjacent to our existing knowledge, but simply have not been discovered yet.</p>
<p>Catalan solids relax a different constraint than the Archimedean solids. Catalan solids do <em>not</em> have edges that are all the same length. Instead, they <em>do</em> have faces that are all the same shape. There are 13 (or 15) of these as well:</p>
<table class="center">
<thead>
<tr>
<th> </th>
<th>Catalan Solid</th>
<th>Face Polygon</th>
<th>Symmetry</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="catalan/v3.6.6.triakistetrahedron.gif" alt="v3.6.6.triakistetrahedron" title="v3.6.6.triakistetrahedron.gif" /></td>
<td>Triakis Tetrahedron</td>
<td>Isosceles<br /> V3.6.6</td>
<td>Td</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="catalan/v3.4.3.4.rhombicdodecahedron.gif" alt="v3.4.3.4.rhombicdodecahedron" title="v3.4.3.4.rhombicdodecahedron.gif" /></td>
<td>Rhombic Dodecahedron</td>
<td>Rhombus<br /> V3.4.3.4</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="catalan/v3.8.8.triakisoctahedron.gif" alt="v3.8.8.triakisoctahedron" title="v3.8.8.triakisoctahedron.gif" /></td>
<td>Triakis Octahedron</td>
<td>Isosceles<br /> V3.8.8</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="catalan/v4.6.6.tetrakishexahedron.gif" alt="v4.6.6.tetrakishexahedron" title="v4.6.6.tetrakishexahedron.gif" /></td>
<td>Tetrakis Hexahedron</td>
<td>Isosceles<br /> V4.6.6</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="catalan/v3.4.4.4.deltoidalicositetrahedron.gif" alt="v3.4.4.4.deltoidalicositetrahedron" title="v3.4.4.4.deltoidalicositetrahedron.gif" /></td>
<td>Deltoidal Icositetrahedron</td>
<td>Kite<br /> V3.4.4.4</td>
<td>Oh</td>
</tr>
<tr>
<td><img src="catalan/v4.6.8.disdyakisdodecahedron.gif" alt="v4.6.8.disdyakisdodecahedron" title="v4.6.8.disdyakisdodecahedron.gif" /></td>
<td>Disdyakis Dodecahedron</td>
<td>Scalene<br /> V4.6.8</td>
<td>Oh</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="catalan/v3.3.3.3.4.pentagonalicositetrahedron-ccw.gif" alt="v3.3.3.3.4.pentagonalicositetrahedron-ccw" title="v3.3.3.3.4.pentagonalicositetrahedron-ccw.gif" />or<img src="catalan/v3.3.3.3.4.pentagonalicositetrahedron-cw.gif" alt="v3.3.3.3.4.pentagonalicositetrahedron-cw" title="v3.3.3.3.4.pentagonalicositetrahedron-cw.gif" /></td>
<td>Pentagonal Icositetrahedron</td>
<td>Pentagon<br /> V3.3.3.3.4</td>
<td>O</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="catalan/v3.5.3.5.rhombictricontahedron.gif" alt="v3.5.3.5.rhombictricontahedron" title="v3.5.3.5.rhombictricontahedron.gif" /></td>
<td>Rhombic Triacontahedron</td>
<td>Rhombus<br /> V3.5.3.5</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="catalan/v3.10.10.triakisicosahedron.gif" alt="v3.10.10.triakisicosahedron" title="v3.10.10.triakisicosahedron.gif" /></td>
<td>Triakis Icosahedron</td>
<td>Isosceles<br /> V3.10.10</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="catalan/v5.6.6.pentakisdodecahedron.gif" alt="v5.6.6.pentakisdodecahedron" title="v5.6.6.pentakisdodecahedron.gif" /></td>
<td>Pentakis Dodecahedron</td>
<td>Isosceles<br /> V5.6.6</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="catalan/v3.4.5.4.deltoidalhexecontahedron.gif" alt="v3.4.5.4.deltoidalhexecontahedron" title="v3.4.5.4.deltoidalhexecontahedron.gif" /></td>
<td>Deltoidal Hexecontahedron</td>
<td>Kite<br /> V3.4.5.4</td>
<td>Ih</td>
</tr>
<tr>
<td><img src="catalan/v4.6.10.disdyakistricontahedron.gif" alt="v4.6.10.disdyakistricontahedron" title="v4.6.10.disdyakistricontahedron.gif" /></td>
<td>Disdyakis Triacontahedron</td>
<td>Scalene<br /> V4.6.10</td>
<td>Ih</td>
</tr>
</tbody>
<tbody>
<tr>
<td><img src="catalan/v3.3.3.3.5.pentagonalhexecontahedron-ccw.gif" alt="v3.3.3.3.5.pentagonalhexecontahedron-ccw" title="v3.3.3.3.5.pentagonalhexecontahedron-ccw.gif" />or<img src="catalan/v3.3.3.3.5.pentagonalhexecontahedron-cw.gif" alt="v3.3.3.3.5.pentagonalhexecontahedron-cw" title="v3.3.3.3.5.pentagonalhexecontahedron-cw.gif" /></td>
<td>Pentagonal Hexecontahedron</td>
<td>Pentagon<br /> V3.3.3.3.5</td>
<td>I</td>
</tr>
</tbody>
</table>
<p>What do the Face Polygon entries that start with a “V” mean? Well, for the Archimedean solids, we picked a corner and counted the number of sides of each face touching that corner. For the Catalan solids, we take a face, work our way around its vertices, and count the number of polygons meeting at each vertex.</p>
<h1 id="engineering-with-catalan-and-archimedian-solids">Engineering with Catalan and Archimedean Solids</h1>
<p>How does all this relate to <a href="../pentakis-dodecahedron-dome">what I tried before</a>? Well, in the Pentakis Dodecahedron concept, the internal structure was an icosahedron to which a bunch of tetrahedra were added, and this formed a new convex hull which turned out to be a Pentakis Dodecahedron, a Catalan solid. In other words, we started with a Platonic solid (icosahedron, or 3.3.3.3.3), took its dual to get another Platonic solid (dodecahedron, 5.5.5), and then connected all the vertices together to form a Catalan solid (V5.6.6). This happened to form very strong tetrahedra throughout the structure.</p>
<p>But now let’s revisit that important question:</p>
<blockquote>
<p>Were there other pairs of shapes we could have used to make a self-supporting dome entirely from tetrahedra, instead of the Icosahedron and Dodecahedron?</p>
</blockquote>
<p>The answer appears to be yes – one could begin with any Archimedean or Catalan solid, find its dual, and then connect all vertices together to make a dual-layer structure. It’s not guaranteed (i.e. I haven’t tried to prove it yet!) that the resulting structure will be made entirely of tetrahedra, so it might not be as rigid as the Pentakis Dodecahedron dome, but it would definitely be a dual-layer structure.</p>
<p>At first pass, there would seem to be good reasons to choose an Archimedean solid for the internal structure: all of its edges are the same length, so if you were making it out of 2x4s or dowels, every piece would be cut to the same length. And we might want to choose a Catalan solid for the exterior skin, so that all the exterior pieces would be the same shape and size. We also have some flexibility on the sizing of the internal polyhedron and external polyhedron, to provide more or less insulation between the inside and outside of the structure.</p>
<p>There are other considerations as well. For sealing edges or corners to form a waterproof skin, we’d like it if most of our external corners have only 3 polygons coming together at a point. This would exclude shapes like the Triakis Icosahedron (V3.10.10), which has many points at which 10 polygons come together.</p>
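<p>Here is a rough Python sketch of how one might rank the thirteen candidates by that waterproofing criterion. It uses the duality directly: each n-sided face of an Archimedean solid becomes a corner of its Catalan dual where n polygons meet, so the share of “easy” 3-way corners on the skin is just the share of triangular faces on the Archimedean solid. The face counts come from Descartes’ 720-degree angle-defect theorem (an outside fact, not something derived in this post), and the function names are hypothetical ones chosen for this example.</p>

```python
from collections import Counter
from fractions import Fraction

ARCHIMEDEAN_VERTEX_FIGURES = [
    "3.6.6", "3.4.3.4", "3.8.8", "4.6.6", "3.4.4.4", "4.6.8", "3.3.3.3.4",
    "3.5.3.5", "3.10.10", "5.6.6", "3.4.5.4", "4.6.10", "3.3.3.3.5",
]


def face_counts(vertex_figure):
    """Map polygon size -> number of such faces on the Archimedean solid."""
    sides = [int(n) for n in vertex_figure.split(".")]
    # Angle missing at each corner; all corners together miss 720 degrees.
    defect = 360 - sum(Fraction(180 * (n - 2), n) for n in sides)
    vertices = Fraction(720) / defect
    return {n: int(vertices * k / n) for n, k in Counter(sides).items()}


def three_way_corner_share(vertex_figure):
    """Fraction of the Catalan dual's corners where exactly 3 polygons meet."""
    faces = face_counts(vertex_figure)
    return Fraction(faces.get(3, 0), sum(faces.values()))


if __name__ == "__main__":
    ranked = sorted(ARCHIMEDEAN_VERTEX_FIGURES,
                    key=three_way_corner_share, reverse=True)
    for figure in ranked:
        share = three_way_corner_share(figure)
        print(f"V{figure:<10} {float(share):.0%} of corners join 3 polygons")
```

<p>By this measure V3.3.3.3.5 comes out on top, with 80 of its 92 corners joining just 3 polygons, while shapes like V5.6.6 have no 3-way corners at all.</p>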
<p>Let’s now work through an example of one of the thirteen dual double-layered domes that you can make from Catalan and Archimedean solids.</p>
<h2 id="example-33335-and-v33335">Example: 3.3.3.3.5 and V3.3.3.3.5</h2>
<p>Let’s combine the Snub Dodecahedron (Archimedean Solid 3.3.3.3.5) with its dual, the Pentagonal Hexecontahedron (Catalan Solid V3.3.3.3.5). We start by making simple tetrahedra:</p>
<p><img src="layered-dual-poly-01.jpg" alt="layered-dual-poly-01" title="A single tetrahedron." /></p>
<p>Then connect them in pairs…</p>
<p><img src="layered-dual-poly-02.jpg" alt="layered-dual-poly-02" title="Two tetrahedra, connected along one edge." /></p>
<p>And proceed adding on tetrahedron after tetrahedron until the inside forms the triangular faces of the Snub Dodecahedron:</p>
<p><img src="layered-dual-poly-04.jpg" alt="layered-dual-poly-04" title="The inside should begin to form a snub dodecahedron" /></p>
<p><img src="layered-dual-poly-03.jpg" alt="layered-dual-poly-03" title="The outside looks like a spiky dodecahedron." /></p>
<p>At this point, the outside should look like a bunch of tetrahedral spikes, but the whole structure will still be quite floppy. We now add other isosceles triangles to hold the spike tips a fixed distance from one another. Two of the edges of these isosceles triangles will be the same length as the slanted edges of the spikes, but the third edge has its own length, equal to the short edge of the outside pentagons (from the Pentagonal Hexecontahedron).</p>
<p><img src="layered-dual-poly-05.jpg" alt="layered-dual-poly-05" title="Exterior view of supporting structure." /></p>
<p>The resulting structure should now be fairly rigid and self-supporting, and look like this from the inside:</p>
<p><img src="layered-dual-poly-06.jpg" alt="layered-dual-poly-06" title="Inside view of supporting structure." /></p>
<p>The penultimate step is to connect the pentagons into five-piece petals:</p>
<p><img src="layered-dual-poly-07.jpg" alt="layered-dual-poly-07" title="Bottom view of petals" />
<img src="layered-dual-poly-08.jpg" alt="layered-dual-poly-08" title="Top view of petals" /></p>
<p>And finally, cover the top of the supporting structure with these petals:</p>
<p><img src="layered-dual-poly-09.jpg" alt="layered-dual-poly-09" title="One pentagonal petal left" />
<img src="layered-dual-poly-10.jpg" alt="layered-dual-poly-10" title="The completed structure." /></p>
<p>And the inside looks like this:
<img src="layered-dual-poly-11.jpg" alt="layered-dual-poly-11" title="Internal view" /></p>
<h2 id="paper-cut-outs-and-sizing-mathematics">Paper Cut Outs and Sizing Mathematics</h2>
<p>To make a fully spherical dual-layer polyhedron from paper, you will need to make triangles of 3 sizes, and also some irregular pentagons:</p>
<ul>
<li>80 of <script type="math/tex">(a,a,a)</script> triangles that will form the interior snub dodecahedron (3.3.3.3.5)</li>
<li>240 of <script type="math/tex">(a,b,b)</script> triangles that will form the tetrahedral spikes on the outside of the snub dodecahedron</li>
<li>160 of <script type="math/tex">(b,b,c)</script> triangles that will hold the tetrahedral spikes together</li>
<li>60 of <script type="math/tex">(c,c,c,d,d)</script> pentagons to form the exterior pentagonal hexacontahedron (V3.3.3.3.5)</li>
</ul>
<h3 id="equilateral-triangles-aaa">Equilateral Triangles (a,a,a)</h3>
<p>You can choose <script type="math/tex">a</script> however you want; this determines the interior size of the dome. The inner diameter <script type="math/tex">d_i</script> of the dome is:</p>
<script type="math/tex; mode=display">d_i \approx 4.07974 a</script>
<p>I made these by cutting out long strips of paper <script type="math/tex">\frac{\sqrt{3}}{2}a</script> wide, marking out a tick every <script type="math/tex">a</script> along one edge of the strip, ticks at <script type="math/tex">\frac{1}{2}a, \frac{3}{2}a, \frac{5}{2}a, \ldots</script> along the other edge, and then connecting the ticks in a zigzag to form equilateral triangles.</p>
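<p>If you would rather print or plot the strip than mark it by hand, the tick-and-zigzag layout can be sketched in a few lines of Python. This is only an illustration: <code>strip_triangles</code> is a name I invented, and it takes the triangle’s base and the strip’s width as separate inputs so the same layout also works for isosceles triangles.</p>

```python
import math


def strip_triangles(base, height, count):
    """Corner coordinates of `count` triangles cut zigzag from one strip.

    Ticks sit every `base` along the bottom edge (y = 0) and, shifted by
    half a `base`, along the top edge (y = height).
    """
    bottom = [(i * base, 0.0) for i in range(count // 2 + 2)]
    top = [((i + 0.5) * base, height) for i in range(count // 2 + 1)]
    triangles = []
    for k in range(count):
        i = k // 2
        if k % 2 == 0:    # point-up: two bottom ticks and one top tick
            triangles.append((bottom[i], bottom[i + 1], top[i]))
        else:             # point-down: one bottom tick and two top ticks
            triangles.append((top[i], bottom[i + 1], top[i + 1]))
    return triangles


if __name__ == "__main__":
    a = 1.0
    for triangle in strip_triangles(a, math.sqrt(3) / 2 * a, 4):
        print(triangle)
```

<p>Each entry is one triangle’s three corners, so you can feed the list straight into whatever drawing tool you like.</p>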
<h3 id="icoseles-triangles-abb">Isosceles Triangles (a,b,b)</h3>
<p>You can also choose <script type="math/tex">b</script> to be any size you wish, provided that <script type="math/tex">2b > a</script> so that triangles with edge lengths <script type="math/tex">(a,b,b)</script> exist. The larger <script type="math/tex">b</script> is, the larger the distance between the inner polyhedron and outer polyhedron.</p>
<p>Since we usually start with square paper, I find that cutting out little squares with edge lengths of <script type="math/tex">(a,a,a,a)</script> and then cutting along the two diagonals is a fast way to make four triangles. In this case,</p>
<script type="math/tex; mode=display">b = \frac{\sqrt{2}}{2} a \approx 0.707107 a</script>
<h3 id="icosceles-triangles-bbc">Isosceles Triangles (b,b,c)</h3>
<p>The value of <script type="math/tex">c</script> is constrained by the values of <script type="math/tex">a</script> and <script type="math/tex">b</script>. Rather than dive into all the math, we’ll just use some pre-derived constants to simplify our expressions.</p>
<p>We start by <a href="http://dmccooey.com/polyhedra/RsnubDodecahedron.html">looking up the triangle center radius</a> of the snub dodecahedron’s triangles:</p>
<script type="math/tex; mode=display">r_{snub\_tri} \approx 2.0770896597432085994 a</script>
<p>To this, we add the height of the tetrahedral spikes <script type="math/tex">h_{tet}</script> to get the radius of the pentagonal hexecontahedron’s 3-vertexes <script type="math/tex">r_{pent3vertexes}</script>. The value of <script type="math/tex">h_{tet}</script> is easily derived from the Pythagorean theorem. Each <script type="math/tex">b</script> edge runs from a corner of the equilateral triangle <script type="math/tex">(a,a,a)</script> up to the spike tip, which sits directly above the triangle’s center, and the distance from that center to a corner (the circumradius) is <script type="math/tex">\frac{\sqrt{3}}{3}a</script>. The spike only has a positive height when <script type="math/tex">b > \frac{\sqrt{3}}{3}a \approx 0.577 a</script>:</p>
<script type="math/tex; mode=display">h_{tet} = \sqrt{b^2 - (\frac{\sqrt{3}}{3}a)^2}</script>
<p>Thus</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray*}
r_{pent3vertexes} & = & r_{snub\_tri} + h_{tet} \\
& \approx & 2.0770896597432085994 a + \sqrt{b^2 - (\frac{\sqrt{3}}{3}a)^2} \end{eqnarray*} %]]></script>
<p>Finally, we can convert <script type="math/tex">r_{pent3vertexes}</script> into <script type="math/tex">c</script> by multiplying it by the ratio of the <a href="http://dmccooey.com/polyhedra/LpentagonalHexecontahedron.html">length constant of the short edges</a> of the pentagon to <a href="http://dmccooey.com/polyhedra/LpentagonalHexecontahedron.html">the radius of the 3-vertexes of a pentagonal hexecontahedron’s pentagons</a>:</p>
<script type="math/tex; mode=display">c = \frac{0.58289953474498241442}{2.1172098986276657420} r_{pent3vertexes}</script>
<p>That’s the general case solution.</p>
<p>If you are making the particular shape that I did (in which <script type="math/tex">a=1, b \approx 0.7071...</script>) then <script type="math/tex">h_{tet} \approx 0.40825</script> and <script type="math/tex">c \approx 0.68425</script>.</p>
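<p>The last two steps (add the spike height to the triangle center radius, then scale by the ratio of the two looked-up constants) are easy to wrap in a helper if you want to try several spike heights. This is a minimal sketch using only the three constants quoted above; the function name <code>tie_edge_length</code> is my own invention, and it takes the spike height <script type="math/tex">h_{tet}</script> as an input rather than deriving it.</p>

```python
# Constants quoted in the text, all for a snub dodecahedron with a = 1.
R_SNUB_TRI = 2.0770896597432085994    # center to the middle of a triangle
SHORT_EDGE = 0.58289953474498241442   # pentagon short-edge length constant
R_3VERTEX = 2.1172098986276657420     # center to a 3-edge corner


def tie_edge_length(a, h_tet):
    """Length c of the edge joining two neighbouring spike tips.

    `a` is the inner edge length and `h_tet` is the spike height, in the
    same units as `a`.
    """
    r_tip = R_SNUB_TRI * a + h_tet            # radius of the spike tips
    return SHORT_EDGE / R_3VERTEX * r_tip     # rescale radius to short edge


if __name__ == "__main__":
    for h_tet in (0.25, 0.5, 0.75):
        print(h_tet, tie_edge_length(1.0, h_tet))
```

<p>Since the conversion is a plain rescaling, every extra unit of spike height adds about 0.275 units to <script type="math/tex">c</script>.</p>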
<p>It’s important to note that not all of the pentagonal hexecontahedron’s points are the same distance from its center; the 12 vertexes where 5 edges come together are actually about 10% farther out than the 80 vertexes where 3 edges come together.</p>
<p>Once again, the fastest way to make these triangles is to cut out a strip of paper of width <script type="math/tex">h = \sqrt{b^2 - (c/2)^2}</script> (the height of the <script type="math/tex">(b,b,c)</script> triangle), and use the same trick we used when making the equilateral triangles earlier. That is, we mark off regular distances of <script type="math/tex">c</script> along one edge, do the same along the other edge offset by <script type="math/tex">\frac{1}{2}c</script>, connect the ticks diagonally, and cut out the resulting triangles.</p>
<h3 id="pentagons-c-c-c-d-d">Pentagons (c, c, c, d, d)</h3>
<p>Finding <script type="math/tex">d</script> from <script type="math/tex">c</script> is easy if you <a href="https://en.wikipedia.org/wiki/Pentagonal_hexecontahedron">look on Wikipedia</a> to find the ratio of edge lengths:</p>
<script type="math/tex; mode=display">d \approx 1.7498525667362 c</script>
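<p>When laying out the pentagon, don’t forget that the angle between the two <script type="math/tex">d</script> sides is about 67.45351 degrees, that the other four angles are each about 118.13662 degrees, and that the pentagon is mirror-symmetric about the line that bisects the angle between the <script type="math/tex">d</script> sides.</p>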
<p>I don’t have any rules of thumb on the construction of those pentagons, sorry! Just make a template and get to work, I guess.</p>
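<p>That said, if you have a printer or a plotter handy, the template itself can be computed. Here is a minimal Python sketch that places the five corners using only the two numbers quoted above (the <script type="math/tex">d/c</script> ratio and the angle between the two <script type="math/tex">d</script> sides) plus the assumption that the pentagon is mirror-symmetric. The name <code>pentagon_outline</code> and the choice to put the sharp corner at the origin are mine, purely for illustration.</p>

```python
import math

LONG_TO_SHORT = 1.7498525667362   # d / c, as quoted above
APEX_ANGLE_DEG = 67.45351         # angle between the two d sides


def pentagon_outline(c):
    """Corner coordinates of the (c, c, c, d, d) pentagon, apex at the origin.

    The pentagon is mirror-symmetric about the y axis. Corners are returned
    in order around the outline, starting from the apex, so the edges run
    d, c, c, c, d.
    """
    d = LONG_TO_SHORT * c
    half = math.radians(APEX_ANGLE_DEG) / 2
    x1, y1 = d * math.sin(half), d * math.cos(half)   # far end of a d side
    # The middle c edge is horizontal and centred on the axis; raise it until
    # the c edge from (x1, y1) to (c/2, y2) has length c as well.
    y2 = y1 + math.sqrt(c * c - (x1 - c / 2) ** 2)
    return [(0.0, 0.0), (x1, y1), (c / 2, y2), (-c / 2, y2), (-x1, y1)]


if __name__ == "__main__":
    for corner in pentagon_outline(1.0):
        print(corner)
```

<p>Scale <code>c</code> to whatever your own dome needs, print the outline once, and use it as the cutting template for all 60 pentagons.</p>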
<h2 id="conclusion">Conclusion</h2>
<p>So there you have it: a self-supporting, extremely rigid structure in which all the internal structural edges are the same length, and in which all the external polygons are the same. More generally, it appears you could apply this technique to make a dozen other dual-layer structures made of Catalan and Archimedean solids.</p>
<p>This 3.3.3.3.5 design does have a few nice qualities about it, though:</p>
<ul>
<li>The pentagonal gaps in the internal structure make excellent places for doors or windows.</li>
<li>The pentagonal floorplan enables one to see any distant point from at least 2 windows.</li>
<li>If you are willing to make the tetrahedral spikes shorter, you can reduce the volume of insulation needed.</li>
<li>Everything is self supporting, and any one piece can be replaced as the structure stands.</li>
<li>It is made of overlapping tetrahedra that form a strong and resilient structure.</li>
</ul>
<p>I hope you enjoyed this little excursion in geometry and origami with me.</p>This article describes another iteration of the dome-like structure that I discussed a few months ago. I still don’t have a good vocabulary for describing why I think these things are neat, except to say that I somehow find it relaxing to work on toy engineering problems once or twice a month. I guess de gustibus non est disputandum – “there’s no accounting for (bad) taste!”