Garbage Collection and Virtual Memory

by Andreas Leitner (modified: 2007 Dec 04)

This looks like an interesting read: http://lambda-the-ultimate.org/node/2552. In short, GC overhead seems to be acceptable by itself. In combination with virtual memory, however, the GC causes a huge overhead, because a collection touches large parts of the heap and so constantly pages everything back in. Conclusion: garbage collectors have to be aware of virtual memory.

Andreas

Comments
  • Colin Adams (16 years ago 4/12/2007)

    A previous article

    The comments reference a previous article, which is interesting:

    Garbage collecting without paging

    I presume the ISE GC does not use such techniques. What about the Boehm collector (as used in gec)?

    In my youth, I was an MVS systems programmer. Assembler programs (I don't know about HLL programs) could call a system service that would fix pages in memory (so they weren't subject to paging). Doing this would be a simple solution to the problem, but I assume that Windows, Mac OS X, Linux etc. don't provide such a service. But maybe I'm wrong.

    Colin Adams

    • Colin Adams (16 years ago 4/12/2007)

      Obviously not

      Well, I can answer my own question: since the bookmarking collector requires a modification to Linux, obviously no existing GCs can be using this technique. Colin Adams
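
On the question of fixing pages in memory: POSIX systems such as Linux and Mac OS X do offer mlock() for this, and Windows has VirtualLock. The sketch below is only an illustration under assumptions, not something any of the collectors mentioned here is known to do. It assumes a Unix-like system where ctypes.CDLL(None) exposes the C library, and the helper name try_pin_one_page is made up for this example. The call can fail when the process's locked-memory limit is too low, so the function reports success or failure instead of raising.

```python
import ctypes
import mmap


def try_pin_one_page():
    """Try to lock one anonymous page into RAM with mlock(); return True on success.

    Hypothetical helper for illustration. Assumes a Unix-like system.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    buf = mmap.mmap(-1, mmap.PAGESIZE)        # one anonymous, page-aligned page
    view = ctypes.c_char.from_buffer(buf)     # gives access to the page's address
    try:
        addr = ctypes.addressof(view)
        if libc.mlock(addr, mmap.PAGESIZE) != 0:
            return False                      # e.g. locked-memory limit exceeded
        libc.munlock(addr, mmap.PAGESIZE)     # release the lock again
        return True
    finally:
        del view                              # drop the buffer export before closing
        buf.close()


pinned = try_pin_one_page()
print("page pinned:", pinned)
```

Pinning a whole heap this way would simply take that memory away from everything else on the machine, which is presumably why the bookmarking collector cooperates with the virtual memory manager instead.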

  • Manu (16 years ago 5/12/2007)

    I would say virtual memory needs to be improved so that it does not page when it is not needed. I can't speak for Windows, but on Linux the number of page faults has decreased considerably since the first releases. For example, compiling EiffelStudio 5.0 would trigger 200,000 page faults, and compiling 5.6 only 3795. This was on the same machine but using a different version of the Linux kernel (see $EIFFEL_SRC/Eiffel/doc/benchmarks/).
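
For anyone who wants to collect page fault counts like the ones above for their own programs, here is a minimal sketch using Python's standard resource module (Unix only). It is not how the EiffelStudio benchmarks were measured; the helper name page_faults and the 16 MB buffer are just choices made for this example.

```python
import resource


def page_faults():
    """Return (minor, major) page fault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


minor_before, major_before = page_faults()

# Allocate fresh memory and write to one byte in every 4096-byte stride,
# so that the kernel has to back the buffer with real pages.
data = bytearray(16 * 1024 * 1024)
for offset in range(0, len(data), 4096):
    data[offset] = 1

minor_after, major_after = page_faults()
print("minor faults:", minor_after - minor_before)
print("major faults:", major_after - major_before)
```

Minor faults are satisfied without disk access; major faults are the ones that actually page in from disk and are the expensive kind discussed in the article.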

    • Manu (16 years ago 7/12/2007)

      The real issue is CPU caches

      My feeling is that page faults and virtual memory are not really the issue. The main issue is locality of memory: since a GC cycle may traverse a lot of memory, you are penalized with a lot of cache misses if the traversal is not done sequentially. This slows down the garbage collection itself, but not really the application per se if you ignore the GC.

      Therefore the optimization should be done at the GC level, by reducing how often it intervenes in systems that use a lot of memory.
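
The locality argument can be illustrated with a toy model. The sketch below counts misses in a tiny, fully associative LRU cache when the same set of objects is visited once in address order and once in a shuffled order. Everything here is an assumption made for the example (64-byte lines, 8 cache lines, 8-byte objects laid out contiguously); it models neither a real CPU cache nor any real collector.

```python
import random
from collections import OrderedDict


def count_misses(addresses, line_size=64, cache_lines=8):
    """Count misses for an address trace in a small LRU cache (toy model)."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)          # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)    # evict the least recently used line
    return misses


# Hypothetical heap: 4096 objects of 8 bytes each, stored back to back.
addresses = [i * 8 for i in range(4096)]
sequential = count_misses(addresses)

# Same objects, visited in a fixed pseudo-random order.
shuffled = addresses[:]
random.Random(0).shuffle(shuffled)
scattered = count_misses(shuffled)

print("sequential misses:", sequential)
print("scattered misses:", scattered)
```

In this model the sequential walk misses exactly once per cache line (512 times), while the scattered walk misses far more often, which is the penalty described above for a traversal that jumps around the heap.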