Portability | non-portable |
---|---|
Stability | experimental |
Maintainer | Edward Kmett <ekmett@gmail.com> |
Safe Haskell | None |
This module provides a Vector
-based Map
that is loosely based on the
Cache Oblivious Lookahead Array (COLA) by Bender et al. from
"Cache-Oblivious Streaming B-Trees".
Currently this Map
is implemented in an insert-only fashion. Deletions are left to future work
or to another derived structure in case they prove expensive.
Unlike the COLA, this version merely provides amortized complexity bounds as this permits us to
provide a fully functional API. However, even those asymptotics are only guaranteed if you do not
modify the "old" versions of the Map
. If you do, then while correctness is preserved, the
asymptotic analysis becomes inaccurate.
Reading from "old" versions of the Map
does not affect the asymptotic analysis and is fine.
Fractional cascading was originally replaced with the use of a hierarchical bloom filter per level containing the elements for that level, with the false positive rate tuned to balance the lookup cost against the costs of the cache misses for a false positive at that depth. This avoids the need to collect forwarding pointers from the next level, reducing pressure on the cache dramatically, while providing the same asymptotic complexity.
With either of these two techniques when used ephemerally, this Map
had asymptotic performance equal to that
of a B-Tree tuned to the parameters of your caches with requiring such parameter tuning.
However, the constants were still bad enough that the naive O(log^2 n) version of the COLA actually wins at lookups in benchmarks at the scale this data structure is interesting, say around a few million entries, by a factor of 10x! Consequently, we're currently not even Bloom filtering.
Compared to the venerable Data.Map
, this data structure currently consumes more memory, but it
provides a more limited palette of operations with different asymptotics (~10x faster inserts at a million entries)
and enables us to utilize contiguous storage.
NB: when used with boxed data this structure may hold onto references to old versions of things for many updates to come until sufficient operations have happened to merge them out of the COLA.
TODO: track actual percentage of occupancy for each vector compared to the source vector it was based on.
This would permit split
and other operations that trim a Map
to properly reason about space usage by
borrowing the 1/3rd occupancy rule from a Stratified Doubling Array.
- data Map k v
- empty :: Map k v
- null :: Map k v -> Bool
- singleton :: (Arrayed k, Arrayed v) => k -> v -> Map k v
- lookup :: (Ord k, Arrayed k, Arrayed v) => k -> Map k v -> Maybe v
- insert :: (Ord k, Arrayed k, Arrayed v) => k -> v -> Map k v -> Map k v
- fromList :: (Ord k, Arrayed k, Arrayed v) => [(k, v)] -> Map k v
Documentation
singleton :: (Arrayed k, Arrayed v) => k -> v -> Map k vSource
O(1) Construct a Map
from a single key/value pair.
lookup :: (Ord k, Arrayed k, Arrayed v) => k -> Map k v -> Maybe vSource
O(log^2 N) persistently amortized, O(N) worst case. Lookup an element.