Fri 16 May 2008
Does anyone know of any work on "forgetful laziness"?
The basic idea: instead of overwriting each thunk with its answer as usual in call-by-need, you'd just write a forwarding pointer and allow GC to collect the answers over time.
This results in recalculation and may be subject to thrashing, so the obvious fix would be either
- a 'forget at most once' policy, which would only mitigate the kinds of memory leaks you get from laziness under limited conditions, but which has a worst-case cost of doubling the workload, or
- an exponential backoff on how often you'll try to re-collect a given value, which should preserve, for practical purposes, the asymptotic behavior of any algorithm, but with a much larger constant for pathological access patterns. [Edit: it may affect asymptotic behavior, because you could lose sharing]
This would eventually allow the re-collection of large CAFs, etc., once they had bitrotted for long enough.
I'm not sure if it's worth the cost of the recalculation and of storing a backoff counter, but most of the horror stories you hear about Haskell center around its occasional horrific memory usage profile.
Tuning points might include studying average thunk lifetimes to construct thunk access profiles, rather than using a naive exponential backoff.
It may also exacerbate the opposite problem, where naive code often builds up a tower of thunks it needs to evaluate all at once in order to provide an answer (i.e. when working with a lazy accumulating parameter).
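To make the bookkeeping above concrete, here is a minimal user-level sketch, modeling thunks explicitly with IORefs rather than inside the RTS where this would really live; Forgetful, delay, force and maybeForget are all made-up names, and a real collector hook would replace the explicit tick argument:

import Data.IORef

data Cell a = Cell
  { recipe    :: IO a     -- the suspended computation, environment and all
  , cached    :: Maybe a  -- Just the answer once forced; Nothing when forgotten
  , forgotten :: Int      -- how many times the answer has been discarded
  }

newtype Forgetful a = Forgetful (IORef (Cell a))

delay :: IO a -> IO (Forgetful a)
delay act = Forgetful <$> newIORef (Cell act Nothing 0)

-- Forcing caches the answer, but keeps the recipe so the work can be redone.
force :: Forgetful a -> IO a
force (Forgetful ref) = do
  c <- readIORef ref
  case cached c of
    Just a  -> pure a
    Nothing -> do
      a <- recipe c
      writeIORef ref (c { cached = Just a })
      pure a

-- Exponential backoff: drop the answer only every 2^forgotten collector
-- ticks, so a value we keep having to recompute gets ever harder to forget.
maybeForget :: Int -> Forgetful a -> IO ()
maybeForget tick (Forgetful ref) =
  modifyIORef' ref $ \c -> case cached c of
    Just _ | tick `mod` (2 ^ forgotten c) == 0 ->
      c { cached = Nothing, forgotten = forgotten c + 1 }
    _ -> c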
May 16th, 2008 at 6:37 pm
This is basically weak references, yes? I looked once upon a time at combining automatic memoization with weak references (useful for dynamic programming), but I never got around to completing it.
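Roughly what I had in mind, assuming a Map from keys to weak pointers (memoWeak is a made-up name, and a real version would need to prune dead entries and worry about GHC's exact weak-pointer semantics):

import Data.IORef
import qualified Data.Map as Map
import System.Mem.Weak

-- The table alone does not keep results alive: the GC may reclaim a result
-- at any time, and we simply recompute it on the next lookup that misses.
memoWeak :: Ord k => (k -> IO v) -> IO (k -> IO v)
memoWeak f = do
  tbl <- newIORef Map.empty
  pure $ \k -> do
    m   <- readIORef tbl
    hit <- maybe (pure Nothing) deRefWeak (Map.lookup k m)
    case hit of
      Just v  -> pure v
      Nothing -> do
        v  <- f k
        wk <- mkWeakPtr v Nothing            -- hold the result only weakly
        modifyIORef' tbl (Map.insert k wk)
        pure v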
May 16th, 2008 at 9:37 pm
It’s a little bit different. The goal is much the same, but a thunk is a delayed computation whose result you may need, along with a closure over the values needed to actually generate that result.
When you force the thunk, you usually (in call-by-need) update it in place to be the result, unless you know that you only need the result once.
Here the idea is to allow the thunk to ‘unforce’ itself over time, but to remember that it was unforced by tagging the thunk (or ticking a counter in it, akin to a mark count), causing it to become harder to ‘unforce’ in the future.
Weak references don’t keep their values alive at all, while here a thunk would slowly bitrot. Once you ‘forget’ a value behind a weak reference, the reference never becomes valid again; here, if you look at the same thunk again and regenerate the value, it becomes valid again, though of course thunks themselves are subject to garbage collection.
In some senses there is conceptual overlap between memoization and thunking, but a memo-table gives you some control over bounding the amount of stuff you hold on to, while the only time you get benefit from a thunk is when two computations reference the same thunk.
If you are interested in automatic memoization you might find some of Umut Acar’s papers interesting starting with something like: http://ttic.uchicago.edu/~umut/papers/ml05.pdf
January 7th, 2009 at 4:19 pm
> …and allow GC to collect the answers over time.
> …but most of the horror stories you hear about Haskell center around its occasional horrific memory usage profile.
Isn’t the common complaint about lazy evaluation that, together with the unevaluated thunks, you have to keep their environments around, thus preventing the constituent variables from being GC’ed even when they are not referenced from anywhere else? The answers/results aren’t so much a cause for surprise, since you also have to keep them around in a strictly evaluated language, which is what most people are used to. So wouldn’t your solution actually aggravate the problem in many cases, since even once you have evaluated a thunk, you have to keep the variables from its environment around just in case it needs to be reevaluated?
To be sure, I can think of a few cases where “forgetful laziness” or even call-by-name instead of call-by-need would reduce memory usage; the trouble is that a good compiler or GC heuristic for deciding when it might be appropriate seems far from obvious to me.
January 7th, 2009 at 5:45 pm
Well, there are two scenarios. In one you can get huge chains of unevaluated thunks in code that isn’t sufficiently strict, from simple accumulating (+1)-style modifications. That is pretty much what you describe. One potential remedy is to gamble on optimistic evaluation techniques like those in Robert Ennals’ dissertation, though I seem to recall hearing that he himself came to think lazy semantics isn’t a good default. Perhaps burnout from implementing all of the machinery needed to get optimistic evaluation to work in GHC?
Another leak happens with CAFs. This is the one I sought, in a fairly poorly-thought-out manner, to resolve here.
Because there always exists a reference to the object, if you have a CAF that is, say, an infinite list of primes and you ever evaluate the millionth prime, then you keep the entire spine of the list forever.
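Concretely (with the classic, hopelessly slow sieve just for illustration):

primes :: [Integer]
primes = sieve [2 ..]
  where sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

main :: IO ()
main = print (primes !! 999999)
-- Afterwards the whole evaluated prefix of the spine stays live forever,
-- because primes is a CAF and the CAF itself is always reachable.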
If I still thought this a viable idea, the answer would probably be to forget only stuff that was, by some measure, bigger than the environment contained in the thunk; but that leads to the question of what constitutes a robust metric for size in the presence of cycles, etc.
Alternately I suppose you could consider all references from an evaluated thunk to be weak references. That would let the garbage collector have its way with them, and you would only get to forget evaluations that still had fully intact environments.
January 8th, 2009 at 9:05 am
> …he himself seems to think lazy semantics isn’t a good default
For me it works quite well so far and doesn’t get in the way too often; I can’t complain. Is there really such a thing as a good default evaluation semantics, or are we just so used to the workarounds we employ in conjunction with eager evaluation that we no longer notice them?

Take forward iterators for walking through the results of a query on some container. Instead of a forward iterator we could just return a list with all the result values. With strict evaluation this might be a bit of a problem, since keeping the entire list in memory takes a bit of space, whereas with lazy semantics we no longer need any iterators, and we can work with all those wonderful list functions like foldr instead of duplicating them for iterators. Seen from this angle, forward iterators are just a workaround for the lack of lazy lists, although we usually don’t think of them that way. (In the iterator example, call-by-name as well as your “forgetful laziness” scheme might be even more of an improvement, in case we want to keep a reference to an earlier entry whilst traversing; lazy evaluation might keep too much of the spine in this case.)
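For instance, sketching the query example in Haskell (queryWhere is just a made-up stand-in for whatever the container offers):

-- The "query" simply returns a lazy list; no iterator protocol is needed,
-- and consumers force only as much of the result as they actually demand.
queryWhere :: (a -> Bool) -> [a] -> [a]
queryWhere = filter

firstTenSquares :: [Int]
firstTenSquares = take 10 (map (^ 2) (queryWhere even [1 ..]))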
> …Perhaps burnout from implementing all of that machinery to get optimistic evaluation to work in GHC?
Can’t comment on it, never took a look at the GHC source nor at his optimistic evaluation papers (although I always wanted to take a look at GHC to see if I could figure out a way to try some of my ideas).
> …huge chains of unevaluated … That is pretty much what you describe
Mostly, but I also had thunks like "a ! 1" in mind, where "a" would be some huge array that isn’t referenced from anywhere else. Once the thunk is evaluated, the GC could usually free the array. However, with forgetful evaluation we have to keep the array around, since the thunk might be reevaluated.
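Something along these lines (hypothetical fragment):

import Data.Array

-- The unevaluated thunk for (a ! 1) closes over the entire array. Once it
-- is forced, call-by-need lets the GC drop the array; forgetful laziness
-- would have to keep it around in case the thunk were ever re-evaluated.
firstElem :: Int
firstElem = a ! 1
  where
    a :: Array Int Int
    a = listArray (1, 1000000) [1 ..]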
> Because there always exists a reference to the object, if you have a CAF that is, say, an infinite list of primes and you ever evaluate the millionth prime, then you keep the entire spine of the list forever.
It depends, doesn’t it? Consider a list of naturals instead of a list of primes, defined by the following (admittedly somewhat unusual) code fragments:
nats1 = 1 : map (+1) nats1
nats100 = drop 99 nats where nats = 1 : map (+1) nats
print (nats1 !! 99)
print (nats100 !! 0)
As written, every list entry in nats1 and nats100 depends directly on the previous entry's result (which is somewhat simpler than for a prime sieve, where each entry depends directly on many previous ones). Now, forcing (nats1 !! 99) and (nats100 !! 0) basically do the same thing: return the 100th entry from a list of naturals. Except of course in the case of nats1 we keep the evaluated spine with the first 100 entries around for later reuse, whereas with "forgetful laziness" we could blissfully discard the spine and keep only the first thunk.

However, for nats100 we might get somewhat the opposite problem. With lazy evaluation, after forcing (nats100 !! 0) there is no reference left to "nats" (the "where" part of nats100); the head of nats100 contains the evaluated value, and the tail consists of a thunk that references the value in the head but no other previous values, so the GC can discard the entire front of the spine, from nats up to the head of nats100, keeping only the head of nats100 itself. With "forgetful laziness", if the GC starts to traverse the root pointers and first encounters nats100, it might discard the result in the head, thereby being obliged to keep the thunk, which references the previous element; the GC follows this reference, discards that result, and again needs to keep the reference to the next previous element in order to be able to recompute the result, and so on down to the first thunk at the head of "nats". So here we would end up with a cascading chain of thunks under "forgetful laziness", whereas normal laziness could collect most of the spine, keeping only the head. Given how similar (nats1 !! 99) and (nats100 !! 0) otherwise appear to us, their different behaviour under the two evaluation schemes might come as a bit of a surprise.
> Alternately I suppose you could consider all references from an evaluated thunk to be weak references. …you would only get to forget evaluations that still had fully intact environments.
This is interesting: if we had "nats100' = drop 99 nats1" and forced "nats1 !! 99" and "nats100' !! 0", might the space usage then depend on the order in which the GC encounters the roots of nats1 and nats100'? Might we get a cascading chain in one case and keep the evaluated spine in the other? I might have to think it through.
> … an infinite list of primes and you ever evaluate the millionth prime, then you keep the entire spine of the list forever.
It just occurred to me that you might have the same problem with the ‘forget at most once’ policy, but in a nondeterministic way. Suppose again a list of naturals as above, instead of primes, and you are evaluating ‘nats !! 999999’. If the GC starts collecting after the evaluation, then indeed the entire spine gets collected and only the head of nats is left. But imagine the GC starts collecting during the evaluation of ‘nats !! 999999’, say when the next-to-last value has just been computed; then the GC would collect all those elements and mark them as ‘collected once’. When the GC has finished, the computation of the last element would start, triggering recomputation of all the previous elements. The next time the GC runs, though, it can't recycle anything of the spine except the last element, since all the others have already been ‘forgotten once’.

Now if we had a program with a hundred such lists and cases, and let each fully evaluated list take up 100MB, then depending on when the GC starts, in the best case the program won't take up more than around 100MB, and in the worst case it might take 100*100MB if the GC collects at the most inopportune moments, thereby preventing all those spines from getting collected. So sometimes the program would thrash the memory and in other cases it would run just fine; you might not notice the problem until many runs later. This could be fun, think of the outcry: “My program runs just fine most of the time, but sometimes it completely thrashes the memory on the same input. I have now already spent weeks on debugging it but I just can't figure out the reason. Help!” I guess there are fewer surprises with lazy eval.
January 8th, 2009 at 7:52 pm
The non-deterministic problem with any forgetful laziness policy is the bit I added the [Edit:] block to the original post for, commenting on the possible loss of sharing. If you had relied on shared thunks for your asymptotic bounds, as in, say, dynamic programming, then forgetful laziness can be really bad. ;)
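The poster child for that kind of sharing being the lazily memoized Fibonacci list:

-- Each entry is a thunk referenced twice (by zipWith, through fibs and
-- tail fibs), so call-by-need computes fibs !! n with O(n) additions.
-- Forget part of the spine and the recomputation recurses, drifting back
-- toward the exponential cost of the naive definition.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)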
The problem I was alluding to with the CAF list was a little bit simpler than the environment problem you describe. If you have a CAF, it is always considered referenced and is hence uncollectable.
So something as simple as:
nats = [1..]
when used at the module level, will never get collected. This is mostly so things like global MVars can be created using unsafePerformIO, etc.
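e.g., the usual (if slightly dodgy) idiom for top-level mutable state, sketched with a made-up name:

import Control.Concurrent.MVar
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE globalCounter #-}  -- keep GHC from inlining and duplicating it
globalCounter :: MVar Int
globalCounter = unsafePerformIO (newMVar 0)
-- Collecting and later re-evaluating this CAF would silently conjure up a
-- fresh MVar, which is exactly why evaluated CAFs are kept reachable.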
The only way I was able to redeem the idea of forgetful laziness was to execute it in a setting that is fully hash-consed à la Linear Lisp, but there are issues even with that, mostly when it comes to finding a good hash function that can work recursively, since you _can_ get cycles in a lazy setting (unlike Linear Lisp) due to the presence of thunk evaluation.
It can work if you do something like modify GRIN or the STG to perform a hash-consing lookup on let binding, but that also has corner cases. Interestingly, in a GRIN setting, using hash-consed thunk construction effectively memoizes every function application that isn't inlined.
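Very roughly, that lookup amounts to interning nodes in a table. A toy first-order version (Expr and hashCons are made up, and the cycles mentioned above are precisely what this naive structural version cannot cope with):

import Data.IORef
import qualified Data.Map as Map

data Expr = Lit Int | Add Expr Expr
  deriving (Eq, Ord, Show)

type ConsTable = IORef (Map.Map Expr Expr)

-- Intern a node: structurally equal expressions come back as one shared
-- object, so an equal thunk is only ever built (and evaluated) once.
hashCons :: ConsTable -> Expr -> IO Expr
hashCons tbl e = do
  m <- readIORef tbl
  case Map.lookup e m of
    Just shared -> pure shared
    Nothing     -> do
      writeIORef tbl (Map.insert e e m)
      pure e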
Mixed with speculative evaluation to avoid explicit construction of small thunks, it has always struck me as an amusing toy.
It all runs into theoretical results w.r.t. optimal lambda calculus reduction strategies that prove that in the general case you can’t get it right all the time, but it is fun trying to get it close.