Loading CSVs in Clojure is really easy using clojure.data.csv. Depending on what you will do with the data, you can represent it either in tabular form (as a vector of vectors) or as a list of hashmaps. Both approaches have their advantages and disadvantages, and here are some very stock functions for loading, saving, and converting between the two:
(ns net.roboloco.csvs
  "Functions for loading and saving CSVs."
  (:require [clojure.data.csv :as csv]
            ;; Loaded so that clojure.java.io/reader and /writer resolve.
            [clojure.java.io]))

(defn empty-string-to-nil
  "Returns a nil if given an empty string S, otherwise returns S."
  [s]
  (if (and (string? s) (empty? s))
    nil
    s))

(defn dissoc-nils
  "Drops keys with nil values, or nil keys, from the hashmap H."
  [h]
  ;; some? keeps false values; only nil keys and nil values are dropped.
  (into {} (filter (fn [[k v]] (and (some? k) (some? v))) h)))

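(defn load-csv
  "Returns a sequence of rows loaded from the CSV file at FILEPATH.
  Each row is a vector of strings, with empty strings replaced by nil."
  [filepath]
  (with-open [reader (clojure.java.io/reader filepath)]
    (->> (csv/read-csv reader)
         (map (fn [row] (mapv empty-string-to-nil row)))
         ;; Realize every row before with-open closes the reader.
         (doall))))
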
(defn save-csv
  "Saves a vector of vectors VEC-OF-VECS (i.e. a CSV) to disk at FILEPATH."
  [vec-of-vecs filepath]
  (with-open [writer (clojure.java.io/writer filepath)]
    (csv/write-csv writer vec-of-vecs)))

(defn tabular->maps
  "Converts a vector of vectors into a vector of maps. Assumes that the
  first row of the CSV is a header that contains column names."
  [tabular]
  (let [header (first tabular)]
    ;; ->> (not ->) so the row maps are passed as the last argument to mapv.
    (->> (map zipmap (repeat header) (rest tabular))
         (mapv dissoc-nils))))

(defn maps->tabular
  "Converts a vector of maps into a vector of vectors. The first row is a
  header containing the sorted column names; missing values become empty
  strings."
  [rowmaps]
  (let [columns (vec (sort (into #{} (mapcat keys rowmaps))))]
    (into [(mapv name columns)]
          (for [row rowmaps]
            (mapv (fn [col] (str (get row col ""))) columns)))))

(comment
  (def data (tabular->maps (load-csv "/path/to/mycsv.csv")))
  (save-csv (maps->tabular data) "/some/other/path.csv")
  )
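
To make the two representations concrete, here is a minimal sketch that uses only clojure.core and a small made-up table (the names and values are purely illustrative, not from a real file). It pairs each data row with the header and drops nil cells, which is roughly what tabular->maps does:

```clojure
;; Hypothetical in-memory table: a header row followed by data rows.
;; nil stands for an empty cell.
(def tabular
  [["name" "age"]
   ["Ada"  "36"]
   ["Bob"  nil]])

;; Pair each data row with the header, then drop entries whose value is nil.
(def rowmaps
  (let [[header & rows] tabular]
    (mapv (fn [row]
            (into {} (remove (comp nil? val)) (zipmap header row)))
          rows)))

;; rowmaps is now [{"name" "Ada", "age" "36"} {"name" "Bob"}]
```

The tabular form keeps the column order and is the shape that csv/write-csv expects. The map form makes lookups by column name easy, but a map carries no column order of its own, which is why maps->tabular sorts the column names.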
Note that the above functions are not lazy, which is a good choice for CSVs 90% of the time. If you find yourself working with very large datasets that cannot be loaded all at once, you might want to adjust load-csv, tabular->maps, and maps->tabular to work lazily and incrementally.
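
One possible way to do that, sketched here under some assumptions, is to hand a lazy sequence of row maps to a caller-supplied function while the reader is still open. The name process-csv and its argument order are hypothetical and not part of clojure.data.csv; the sketch also assumes the first row is a header.

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn process-csv
  "Opens the CSV at FILEPATH and calls F on a lazy sequence of row maps
  while the reader is still open. F must consume everything it needs
  before returning, because the reader is closed as soon as F returns.
  (Hypothetical helper, shown for illustration only.)"
  [filepath f]
  (with-open [reader (io/reader filepath)]
    (let [[header & rows] (csv/read-csv reader)]
      ;; map is lazy, so rows are parsed only as F asks for them.
      (f (map #(zipmap header %) rows)))))

(comment
  ;; Count the data rows one at a time instead of loading them all first.
  (process-csv "/path/to/mycsv.csv"
               (fn [rowmaps] (reduce (fn [n _] (inc n)) 0 rowmaps))))
```

The design choice is to keep with-open around the consumer rather than around the loader: returning an unrealized lazy sequence from inside with-open would fail once the reader has been closed, so F should return a fully realized value such as a count, a sum, or a vector.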