Comments on: Linear Bloom Filters http://comonad.com/reader/2008/linear-bloom-filters/ types, (co)monads, substructural logic Sat, 29 Dec 2012 15:18:06 -0800 http://wordpress.org/?v=2.8.4 hourly 1 By: Darrell Teague http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-7126 Darrell Teague Wed, 08 Apr 2009 00:41:27 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-7126 Ed, I know you and some of your readers love Haskell so I thought I would share this funny (but real) programming language I just read about: http://compsoc.dur.ac.uk/whitespace/index.php Ed,
I know you and some of your readers love Haskell so I thought I would share this funny (but real) programming language I just read about:

http://compsoc.dur.ac.uk/whitespace/index.php

]]>
By: Edward Kmett http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-6604 Edward Kmett Tue, 10 Mar 2009 21:43:50 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-6604 Yes. If you have p bits of hash function available to you, and the top tier consumes q of them, then at one extreme q = 0, and you have all p bits available for use within the one bucket, cutting p up into k different hash functions yields the traditional Bloom filter. In the other extreme you have q = p, and you have the efficacy of a bloom table/bit-hash with one hash function. In the middle the false positive rate lies somewhere in between, but you only have to deal with data points within a single bucket to answer the query, so you can typically save a factor of k in paging or cache-thrashing costs. Yes.

If you have p bits of hash function available to you, and the top tier consumes q of them, then at one extreme q = 0, and you have all p bits available for use within the one bucket, cutting p up into k different hash functions yields the traditional Bloom filter.

In the other extreme you have q = p, and you have the efficacy of a bloom table/bit-hash with one hash function.

In the middle the false positive rate lies somewhere in between, but you only have to deal with data points within a single bucket to answer the query, so you can typically save a factor of k in paging or cache-thrashing costs.

]]>
By: Christian http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-6577 Christian Mon, 09 Mar 2009 11:07:58 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-6577 Thanks, Edward. If I get this right, the larger the single buckets, the closer the two-tier variant performs to the standard variant (in sense of the false-positive rate for a total length of m)? Thanks, Edward. If I get this right, the larger the single buckets, the closer the two-tier variant performs to the standard variant (in sense of the false-positive rate for a total length of m)?

]]>
By: Edward Kmett http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-6569 Edward Kmett Sun, 08 Mar 2009 23:00:19 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-6569 Christian you might want to check out: http://algo2.iti.uni-karlsruhe.de/singler/publications/cacheefficientbloomfilters-wea2007.pdf Although they concern themselves with clustering within a single cache line, the idea of clustering within a page follows similarly. IIRC they take up like ~32% extra space to get the same precision, but they use VERY small buckets of the same size as a cache line. So that should give you an upper bound for space comparison that you can back into a false positive rate comparison. Christian you might want to check out:

http://algo2.iti.uni-karlsruhe.de/singler/publications/cacheefficientbloomfilters-wea2007.pdf

Although they concern themselves with clustering within a single cache line, the idea of clustering within a page follows similarly.

IIRC they take up like ~32% extra space to get the same precision, but they use VERY small buckets of the same size as a cache line. So that should give you an upper bound for space comparison that you can back into a false positive rate comparison.

]]>
By: Christian http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-6564 Christian Sun, 08 Mar 2009 17:52:39 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-6564 Edward, are you aware of any specific paper discussing theoretical properties of the two-tier Bloom filter architecture? I am specifically interested in a comparison of false-positive rates of two-tier filters and standard filters. Edward, are you aware of any specific paper discussing theoretical properties of the two-tier Bloom filter architecture? I am specifically interested in a comparison of false-positive rates of two-tier filters and standard filters.

]]>
By: Darrell Teague http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-2378 Darrell Teague Tue, 05 Aug 2008 19:02:05 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-2378 Your article was well-written and is very interesting. I was really geeking out on it and I am not even a mathematician. I think the article would have made my ex-girlfriend hot. Anyway, all jokes aside, I am surprised to find Bloom Filter usage very sparse within the general software industry (I am a consultant-of-one running my own company but have large clients in the Fortune 100, start-ups, etc). The case you presented here for being able to efficiently serialize the filter has notable use in web-application farm environments where sessions and other state table information must be migrated between servers (depending on session management strategy employed). Indeed one challenge I am considering is somehow managing some Bloom Filter construct across a RAIN architecture without sacrificing the small memory footprint and performance advantages that Blooms provide. Thanks again. Your article was well-written and is very interesting. I was really geeking out on it and I am not even a mathematician. I think the article would have made my ex-girlfriend hot.

Anyway, all jokes aside, I am surprised to find Bloom Filter usage very sparse within the general software industry (I am a consultant-of-one running my own company but have large clients in the Fortune 100, start-ups, etc).

The case you presented here for being able to efficiently serialize the filter has notable use in web-application farm environments where sessions and other state table information must be migrated between servers (depending on session management strategy employed). Indeed one challenge I am considering is somehow managing some Bloom Filter construct across a RAIN architecture without sacrificing the small memory footprint and performance advantages that Blooms provide.

Thanks again.

]]>
By: links for 2008-06-06 http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-1618 links for 2008-06-06 Fri, 06 Jun 2008 01:28:19 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-1618 [...] The Comonad.Reader » Linear Bloom Filters (tags: algorithm algorithms article data math research programming) [...] [...] The Comonad.Reader » Linear Bloom Filters (tags: algorithm algorithms article data math research programming) [...]

]]>
By: Edward Kmett http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-1607 Edward Kmett Thu, 05 Jun 2008 03:30:51 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-1607 Hah! Sorry Mo. Unfortunately nothing I do these days makes for great screenshots. Maybe I should go back to voxels and stabbing line problems. ;) I suppose you could Bloom a scene's PVS or something, but its not all that useful for its primary purpose because then you wouldn't be able to extract the member cells. Hrmm. That said, I guess it could be used for a quick, "is this model that overlaps these cells probably visible?" tester. I almost referred to the hierarchical representation as a kind of Haar wavelet encoding, but I figured that would cause the last person in the audience to tune me out. P.S. The alpaca on your page still refers to blabberize as 'the technology of 2007' ;) Hah! Sorry Mo.

Unfortunately nothing I do these days makes for great screenshots. Maybe I should go back to voxels and stabbing line problems. ;)

I suppose you could Bloom a scene’s PVS or something, but its not all that useful for its primary purpose because then you wouldn’t be able to extract the member cells. Hrmm. That said, I guess it could be used for a quick, “is this model that overlaps these cells probably visible?” tester.

I almost referred to the hierarchical representation as a kind of Haar wavelet encoding, but I figured that would cause the last person in the audience to tune me out.

P.S. The alpaca on your page still refers to blabberize as ‘the technology of 2007′ ;)

]]>
By: Mo Kakwan http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-1604 Mo Kakwan Thu, 05 Jun 2008 02:28:19 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-1604 With all that talk of bloom filters and even a few mentions of mipmaps I thought you would have thrown in a few screenshots. Ha. With all that talk of bloom filters and even a few mentions of mipmaps I thought you would have thrown in a few screenshots. Ha.

]]>
By: Edward Kmett http://comonad.com/reader/2008/linear-bloom-filters/comment-page-1/#comment-1602 Edward Kmett Wed, 04 Jun 2008 07:01:19 +0000 http://comonad.com/reader/2008/linear-bloom-filters/#comment-1602 Bah. You don't count. You already have access to the original library. =) Bah. You don’t count. You already have access to the original library. =)

]]>