enlivend @ 2008-05-08 15:53:00
Finally, I've got the OK to open source some of the work I was doing last year. First step is the fixes I made to jfli (Java Foreign Language Interface), now available for CVS download or as a tarball. It's the first time anyone has posted changes to this library in some years, so I took the bull by the horns and gave it version number 0.2.
For the record, this is what I have changed. It was an interesting object lesson in trying to get two GCs to be nice to each other.
Memory problems:
- add-special-free-action takes a symbol, not a function. If you give it a function it doesn't do anything. In this case, that meant that all global-refs were leaking on the Java side. That's a hell of a pile-up - run anything for long enough and Java will run out of memory.
- Lisp processes just accumulated in *process-envs*, which meant that the associated stacks etc. ended up leaking on _both_ sides of the fence, i.e. into both Java and Lisp. My first attempt at solving this involved a call to mp:ensure-process-cleanup but...
- Suppose a new thread allocates before mp:*current-process* has been set. Then delete-global-ref might be called, which invokes current-env for the first time on this thread, which calls ensure-process-cleanup with a null mp:process. SEGV. Farfetched? Well, it happened.
- The "access functions" (calls to anything seen in defvtable, i.e. all the JNI's calls into the JVM) failed to memoize a dereferenced foreign-slot-value which was the same every time, and so burned 56 unnecessary bytes per shot (in Lisp). This was reclaimed by the GC but it messed up the allocation figures when I was out hunting for real leaks.
Untimely Finalization
Consider this little problem which exhibited pathological behaviour from time to time:
- The special free actions are not run when a GC occurs inside mp:without-interrupts, because that could cause a deadlock if the action function claims any locks (e.g. uses hash-tables). Instead of freeing the flagged objects, the GC just keeps them alive with their special free actions intact.
- The system maintains a table of all of the objects marked for special free actions (so the GC can find them all easily). Unfortunately, flag-special-free-action takes O(n^2) time for n objects in the table. ["Ouch", says Nick.]
- This table is enlarged by flag-special-free-action, inside without-interrupts to make it atomic with respect to other calls to flag-special-free-action and finalization.
- If you're unlucky, all of the GC operations are triggered by the enlargement of this table.
This final aspect completed a vicious cycle: none of the special objects were ever freed, because all of the GC operations occurred inside without-interrupts and hence their special free actions could not be run at that time. Excessive allocation occurred, caused by the enlargement of the table which was always filled again quickly for the same reason. (By "excessive", I mean images bloating to over 1GB in very short order.)
The recommended solution from lispworks-support was to manually mark-and-sweep generation 0 every 1000 or so allocations. Without it, you'd occasionally get generation 0 trying to climb over 1GB while you're sat there wondering why your Emacs was running so slow.
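For illustration, here is a minimal sketch of how two of the memory fixes above might fit together: the free action is registered by symbol (with a guard so that passing a function object fails loudly instead of silently doing nothing), and a counter forces a generation 0 collection every so many allocations. This is not the code in jfli. The names jref, free-jref, register-free-action, note-allocation, make-tracked-jref and *sweep-interval* are made up for the sketch, and the hcl: calls are only reached when running under LispWorks.

```lisp
;;; Sketch only. JREF, FREE-JREF, REGISTER-FREE-ACTION, NOTE-ALLOCATION,
;;; MAKE-TRACKED-JREF and *SWEEP-INTERVAL* are hypothetical names, not jfli's.

(defstruct jref
  pointer)                          ; stands in for a JNI global reference

(defvar *released* '()
  "Pointers we would hand back to the JVM (recorded instead of calling JNI).")

(defun free-jref (object)
  "Special free action. It checks the type first, on the assumption that
it may be handed flagged objects that are not ours."
  (when (jref-p object)
    (push (jref-pointer object) *released*)))

(defun register-free-action (name)
  "Register NAME as a special free action, insisting on a non-null symbol."
  (check-type name (and symbol (not null)))
  #+lispworks (hcl:add-special-free-action name)
  name)

(defvar *sweep-interval* 1000
  "Allocations between forced generation 0 collections.")

(defvar *allocations-since-sweep* 0)  ; not thread-safe, fine for a sketch

(defun sweep-generation-0 ()
  "Collect generation 0. Does nothing outside LispWorks."
  #+lispworks (hcl:mark-and-sweep 0)
  nil)

(defun note-allocation (&optional (sweep #'sweep-generation-0))
  "Count one allocation and call SWEEP once every *SWEEP-INTERVAL* calls.
Returns true when a sweep ran. Meant to be called from ordinary code,
outside without-interrupts, so pending free actions get a chance to run."
  (when (>= (incf *allocations-since-sweep*) *sweep-interval*)
    (setf *allocations-since-sweep* 0)
    (funcall sweep)
    t))

(defun make-tracked-jref (pointer)
  "Wrap POINTER, flag the wrapper for the free action, count the allocation."
  (let ((ref (make-jref :pointer pointer)))
    #+lispworks (hcl:flag-special-free-action ref)
    (note-allocation)
    ref))
```

Typical use would be (register-free-action 'free-jref) once at startup, then make-tracked-jref wherever a global ref is wrapped. Calling (register-free-action #'free-jref) signals a type-error rather than quietly registering nothing.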
Non-memory problems:
- Exports from the JFLI package of box-integer and unbox-integer instead of the documented box-int and unbox-int - I restored the documented behaviour.
- I needed more configurable exception handling.
- No support for system building - you needed a live JVM connection in order to macroexpand source and so couldn't save the image (well, you could, but when you restarted it you wouldn't be able to connect).
Fixed by upgrading to LispWorks 5.1:
- Occurrences of java.lang.NullPointerException, java.lang.ArrayIndexOutOfBoundsException, etc. which had no explanation even after reading Sun Java sources. This turned out to be caused by a bug in the FLI, which could leave the CPU's direction flag set incorrectly in some cases. When the direction flag was set incorrectly, some optimized memory copying routines would corrupt adjacent objects. The bug, fixed in LispWorks 5.1, affected 32-bit x86 platforms running Linux, FreeBSD or Mac OS X (not Windows).