Sunday, December 30, 2012

Logging Key Events on Mac OS X with JNA

In the previous post I introduced the APIs available in Mac OS X to get access to key events. The task was to write a small Eclipse plugin that logs all keystrokes, including hotkeys, issued by the user while working in the IDE. To my knowledge, this is not possible using Java alone. The basic idea was to use JNA to call the native Objective-C APIs and log the key events.

Road Blocks

The first thing I tried was to write a simple call to Cocoa's NSEvent API (as outlined last time) via JNA. As a reminder, the goal was to implement this one-liner in JNA:

    [NSEvent addLocalMonitorForEventsMatchingMask:NSKeyDownMask 
                                          handler:
      ^NSEvent *(NSEvent * event) {
       /**
        * Here goes your logging...
        */
        return event;
    } ];

Easy, right?

Not quite so easy, as I had to find out. First of all, it's Objective-C, which JNA does not support out of the box. As you may know, a message in Objective-C is sent to a class (in Java terms: a static method is called) like this:

   [classname messagename:argument]

The compiler translates this into something that could also be expressed as:

   Class cls = objc_getClass("classname");
   SEL selector = sel_registerName("messagename:");
   objc_msgSend(cls, selector, argument);

It starts by looking up the class in the Objective-C runtime (line 1: objc_getClass). Then (line 2), sel_registerName registers a message name with the runtime. In our case, the message has already been registered (remember, we are trying to call an existing API), so this function just returns the so-called selector identifying the message. The last call actually sends the message to the class as the receiver.

This API exposes quite clearly the dynamic nature of Objective-C, which is based on message passing to receivers determined at runtime instead of just calling methods on entities defined at compile time.
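To make the three steps concrete, here is a toy model in plain C. It is not Apple's runtime: toy_getClass, toy_registerName and toy_msgSend are hypothetical stand-ins for objc_getClass, sel_registerName and objc_msgSend, and the "runtime" is just two static tables. It shows a class being found by name, selectors being interned so that equal names give the same pointer, and the implementation being chosen only when the message is sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy runtime, NOT Apple's: selectors are interned strings. */
typedef const char *ToySEL;
typedef long (*ToyIMP)(long arg);   /* every toy method takes and returns a long */

typedef struct {
    const char *name;               /* selector name, e.g. "twice:" */
    ToyIMP imp;                     /* implementation to run */
} ToyMethod;

typedef struct {
    const char *name;
    const ToyMethod *methods;
    size_t method_count;
} ToyClass;

#define MAX_SELECTORS 16
static const char *selector_table[MAX_SELECTORS];
static size_t selector_count = 0;

/* Like sel_registerName: the same name always yields the same pointer, so
   selectors can later be compared by identity instead of by content. */
ToySEL toy_registerName(const char *name) {
    for (size_t i = 0; i < selector_count; i++)
        if (strcmp(selector_table[i], name) == 0)
            return selector_table[i];
    assert(selector_count < MAX_SELECTORS && "toy selector table is full");
    selector_table[selector_count] = name;  /* name must outlive the table */
    return selector_table[selector_count++];
}

static long twice(long x) { return 2 * x; }
static long negate(long x) { return -x; }

static const ToyMethod calculator_methods[] = {
    { "twice:", twice },
    { "negate:", negate },
};

static const ToyClass class_table[] = {
    { "Calculator", calculator_methods,
      sizeof calculator_methods / sizeof calculator_methods[0] },
};

/* Like objc_getClass: find a class by its name at runtime. */
const ToyClass *toy_getClass(const char *name) {
    for (size_t i = 0; i < sizeof class_table / sizeof class_table[0]; i++)
        if (strcmp(class_table[i].name, name) == 0)
            return &class_table[i];
    return NULL;
}

/* Like objc_msgSend: pick the implementation for (receiver, selector) only
   when the message is sent. *found is set to 0 if nothing responds. */
long toy_msgSend(const ToyClass *receiver, ToySEL sel, long arg, int *found) {
    for (size_t i = 0; receiver != NULL && i < receiver->method_count; i++) {
        if (toy_registerName(receiver->methods[i].name) == sel) {
            *found = 1;
            return receiver->methods[i].imp(arg);
        }
    }
    *found = 0;
    return 0;
}
```

A call such as toy_msgSend(toy_getClass("Calculator"), toy_registerName("twice:"), 21, &found) mirrors the three lines above. In this model an unknown message merely reports found == 0; the real runtime instead goes through message forwarding and by default raises an "unrecognized selector" exception.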

You might wonder why these low-level functions are exposed in the Objective-C runtime at all. The reason is that they allow you to write all sorts of bridging layers between Objective-C and other languages. As a matter of fact, there used to be an Apple-provided Java-Objective-C bridge.

Well, not any more. Java on the desktop seems to be considered legacy technology by Apple. So we have to build all this by hand in JNA. Still no big deal, as we only need the three runtime functions mentioned above.

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public interface ObjC extends Library {

    ObjC INSTANCE = (ObjC) Native.loadLibrary("objc", ObjC.class);

    // id objc_msgSend(id self, SEL op, ...); to JNA, id, Class and SEL are opaque pointers
    Pointer objc_msgSend(Pointer receiver, Pointer selector, Object... args);

    // Class objc_getClass(const char *name);
    Pointer objc_getClass(String name);

    // SEL sel_registerName(const char *str);
    Pointer sel_registerName(String str);

}

All that is left to do now is to call [NSEvent addLocalMonitorForEventsMatchingMask:handler:] from Java using the functions now available via JNA:

  ObjC objc = ObjC.INSTANCE;
  Pointer cls = objc.objc_getClass("NSEvent");
  objc.objc_msgSend(cls, objc.sel_registerName("addLocalMonitorForEventsMatchingMask:handler:"), mask.getMask(), block);

Brilliant. Except it does not work. Why? Well, if you scroll to the very right in the code snippet (ignoring the mask.getMask() call that defines the event mask), you will notice that the last argument of the method is something I clumsily named block. That's because it is an Objective-C block acting as the callback when a key event is received. Surely this must work just like a function pointer, I thought. Not quite. Time to look at blocks in more detail.

Objective-C Blocks and JNA

Blocks are a C-level language extension supported by the compilers that have shipped with Mac OS X since 10.6. They are closures that capture the enclosing lexical scope. So when calling from Java, you cannot simply substitute a com.sun.jna.Callback and hope that it works.

One of the reasons why this won't work is that you would be passing a function pointer where a pointer to a block literal is expected.

A block literal is a C struct of the following form:

struct Block_literal_1 {
    void *isa; 
    int flags;
    int reserved; 
    void (*invoke)(void *, ...); //function pointer
    struct __block_descriptor_1 *descriptor;//omitted for brevity
    // imported variables go here
};

The important part for now is that it contains a pointer to a function with the logic you put into your block. You will find that this function pointer is located at offset 16 (assuming x86_64). The struct contains an isa pointer (8 bytes, so it's structurally also an Objective-C object!) and two integer fields (4 bytes each) followed by the function pointer.
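The offset arithmetic can be checked with a small C sketch. It mirrors the struct above in a local copy (not Apple's header; the descriptor type is left incomplete and renamed Block_descriptor_1 as in the Block ABI description) and lets the compiler compute where invoke lives via offsetof.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the block literal layout; only pointers to the
   descriptor are used, so its definition can stay incomplete. */
struct Block_descriptor_1;

struct Block_literal_1 {
    void *isa;                              /* 8 bytes on x86_64 */
    int flags;                              /* 4 bytes */
    int reserved;                           /* 4 bytes */
    void (*invoke)(void *, ...);            /* the block's code */
    struct Block_descriptor_1 *descriptor;
    /* imported variables would follow here */
};

/* Byte offset of the invoke pointer inside a block literal. */
size_t block_invoke_offset(void) {
    return offsetof(struct Block_literal_1, invoke);
}

/* With 8-byte pointers the two ints exactly fill the 8 bytes after isa,
   so no padding is needed and invoke lands at offset 16. */
static_assert(sizeof(void *) != 8 || offsetof(struct Block_literal_1, invoke) == 16,
              "invoke should sit at offset 16 with 8-byte pointers");
```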

If we then look at what a call of the block looks like in assembly, we find another difference from a regular function pointer. We assume that a pointer to the block literal is stored in %rax and a pointer to the argument for the block (in our case an instance of NSEvent) is held on the stack at -32(%rbp). An invocation of the block could look like this:

0x100001452:  movq   %rax, %rdx //copy the block literal pointer to %rdx
0x100001455:  movq   -32(%rbp), %rsi //copy the argument to %rsi
0x100001459:  movq   %rdx, %rdi // copy the block literal pointer to %rdi
0x10000145c:  callq  *16(%rax) //call the function contained in the block literal (offset 16 remember)

The block function is thus invoked with the block literal as the implicit first argument and all other explicit arguments follow shifted by one place. In our case it's just the pointer to the NSEvent instance that comes in %rsi instead of %rdi as you would expect for a regular function call.

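If you simply passed a plain function pointer, your function would never be executed: the calling code treats the argument as a pointer to a block literal and calls whatever address it finds at *16(%rax), i.e. bytes read from 16 bytes into your function's machine code. On top of that, the block function uses a different calling convention, with the block literal added as the first argument.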

So in order to make this work from JNA, you would need to fabricate a structure isomorphic to the ones created by the compiler for Objective-C blocks and pass that instead. This would require a change to JNA itself.
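Both properties can be imitated in plain C without the compiler's block support. This is a sketch under stated assumptions: struct fake_block, add_captured, call_block and make_adder are hypothetical names, invoke is given a concrete signature instead of the generic one in the ABI struct, and nothing here talks to the real Blocks runtime. It shows the header laid out like a block literal, a captured variable stored after the header, and the caller passing the literal itself as argument 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-built imitation of a block literal (no -fblocks, no runtime). */
struct fake_block {
    void *isa;
    int flags;
    int reserved;
    long (*invoke)(struct fake_block *self, long arg);  /* offset 16 on x86_64 */
    void *descriptor;
    long captured;      /* an "imported variable" stored after the header */
};

/* The block's body: captured state is reached through the implicit
   first argument, the way compiler-generated block functions do it. */
static long add_captured(struct fake_block *self, long arg) {
    return self->captured + arg;
}

/* What the caller does: load the function pointer from the literal and
   pass the literal as argument 0, shifting the real argument to slot 1
   (%rdi and %rsi respectively on x86_64). */
long call_block(struct fake_block *block, long arg) {
    assert(block != NULL && block->invoke != NULL);
    return block->invoke(block, arg);
}

/* Fabricate a literal with the expected layout around our own function. */
struct fake_block make_adder(long captured) {
    struct fake_block b = { NULL, 0, 0, add_captured, NULL, captured };
    return b;
}
```

A plain callback such as long f(long arg) would receive the literal's address where it expects arg, which is exactly why a com.sun.jna.Callback cannot stand in for a block.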

Two Solutions

If you are free to choose your dependencies, you could simply give up on JNA and either use JNI or give BridJ a try. BridJ seems relatively immature compared to JNA, but it implements support for Objective-C blocks. It does so by creating an empty block as a template and then manipulating the function pointer inside the struct.
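The template trick can be illustrated in spirit with plain C. This is not BridJ's code: struct block_header, empty_template, block_with_body and square_arg are hypothetical, and the "template" is a static struct standing in for an empty block the compiler would emit. The idea is to keep every field the compiler filled in and overwrite only invoke.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block-like header with the same field order as a block literal. */
struct block_header {
    void *isa;
    int flags;
    int reserved;
    long (*invoke)(struct block_header *self, long arg);
    void *descriptor;
};

/* Stand-in for an empty block produced by the compiler: valid header,
   body that does nothing useful. */
static long empty_body(struct block_header *self, long arg) {
    (void)self;
    (void)arg;
    return 0;
}
static const struct block_header empty_template = { NULL, 0, 0, empty_body, NULL };

/* Clone the template and swap in a different function pointer; isa, flags
   and descriptor stay exactly as the template had them. Caller frees. */
struct block_header *block_with_body(long (*body)(struct block_header *, long)) {
    assert(body != NULL);
    struct block_header *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy, &empty_template, sizeof *copy);
    copy->invoke = body;
    return copy;
}

/* Example body to patch in. */
static long square_arg(struct block_header *self, long arg) {
    (void)self;
    return arg * arg;
}
```

Starting from a compiler-made template avoids having to reproduce isa and flags values by hand, which is the fragile part of fabricating a block from scratch.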

But I did not want to introduce another dependency just for the Mac OS version of the plugin, nor resort to JNI. Consequently, my solution was to implement a Quartz event tap using JNA. As explained in the last post, this is a low-level C API intended for the implementation of assistive devices and the like. It works with JNA because its callback is a plain C function pointer rather than a block. If you are interested, the code can be found here.

Friday, December 28, 2012

Logging Key Events in Mac OS X

A friend asked me to look into writing a Mac version of a little Eclipse plugin he had been developing for Windows. It records all your keystrokes so that you can analyse your typing efficiency and "eliminate" unnecessary keystrokes :-)

I liked the idea because it was a bit out of the ordinary, especially since one of the requirements was to record the use of hotkeys, and all of this was supposed to be done from within a Java application (I will cover that aspect in a separate post).

As far as I know, there are three ways of achieving this goal with the public APIs provided by Apple:

  1. Using Cocoa's NSEvent class
  2. Carbon's InstallApplicationEventHandler API
  3. Core Graphics' low-level C API (Quartz Event Taps)

Cocoa

By far the easiest approach is to use the Cocoa API. It's basically a one-liner:

    [NSEvent addLocalMonitorForEventsMatchingMask:NSKeyDownMask 
                                          handler:
      ^NSEvent *(NSEvent * event) {

        // Here goes your logging...
        return event;
    } ];

This works great. It gives you all the information you would want about pressed modifier keys etc. The callback is implemented as an Objective-C block, which is nice and concise.

Carbon

If you can't use the Cocoa API for some reason, you could give the old Carbon API a go. Documentation is scarce as Carbon is considered legacy technology by Apple. But it's not hard to come up with something similar to this:

#include <Carbon/Carbon.h>

/**
 * The callback function
 */
pascal OSStatus KeyEventHandler(EventHandlerCallRef  nextHandler,
                                EventRef             theEvent,
                                void*                userData) {
    
    // here goes the logging
    return CallNextEventHandler(nextHandler, theEvent);
    
}

/** --------------snip ------------------**/

    EventTypeSpec eventType;
    EventHandlerUPP handlerUPP;
    
    eventType.eventClass = kEventClassKeyboard;
    eventType.eventKind = kEventRawKeyDown;
    handlerUPP = NewEventHandlerUPP(KeyEventHandler);
    InstallApplicationEventHandler(handlerUPP,
                                   1, 
                                   &eventType,
                                   NULL,
                                   NULL);

There is one small catch, though: it won't work from Cocoa apps, because the key events never reach the Carbon event handler. I haven't quite figured out why, but my suspicion is that you need a running Carbon event loop in order to receive raw key events. There is some bridging code in [NSApplication sendEvent:] that calls Carbon's SendEventToEventTarget, but it does so only for hotkey events, not for every raw key event.

Quartz

But all is not lost because there are still Quartz Event Taps to help you out. This API was originally designed to support the implementation of assistive devices. That's why the user has to enable the accessibility features in System Preferences for you to be able to install a Quartz Event Tap (or the process must run with root privileges).

The API is similar in structure to the Carbon version. You need a callback function:

#include <ApplicationServices/ApplicationServices.h>

CGEventRef keyDownCallback (CGEventTapProxy proxy, CGEventType type, CGEventRef event, void *refcon) {
    /* 
     * do something with the event here
     *  turn Xs into Us if you want ...
     */
    return event;
}

Being a low-level API, it gives you much more control over where you place your tap: you can get at the events as they enter the window server, as they enter the login session, or when they have been annotated to go to your application.

    CFMachPortRef keyDownEventTap = CGEventTapCreate(kCGHIDEventTap,
                                                     kCGHeadInsertEventTap,
                                                     kCGEventTapOptionListenOnly,
                                                     CGEventMaskBit(kCGEventFlagsChanged) | CGEventMaskBit(kCGEventKeyDown),
                                                     &keyDownCallback, NULL);
    if (keyDownEventTap == NULL) {
        // creation fails, e.g. when access for assistive devices is not enabled
        return;
    }

    CFRunLoopSourceRef keyDownRunLoopSourceRef = CFMachPortCreateRunLoopSource(NULL,
                                                                               keyDownEventTap,
                                                                               0);
    CFRelease(keyDownEventTap);
    CFRunLoopAddSource(CFRunLoopGetCurrent(),
                       keyDownRunLoopSourceRef,
                       kCFRunLoopDefaultMode);

    CFRelease(keyDownRunLoopSourceRef);

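The eventsOfInterest argument is a 64-bit mask with one bit per event type, and CGEventMaskBit just shifts 1 left by the event type. The sketch below mirrors that with local stand-ins (MyEventMask, MY_EVENT_MASK_BIT and the My* constants are hypothetical names) so it compiles without Apple's headers; the numeric values 10, 11 and 12 are those of kCGEventKeyDown, kCGEventKeyUp and kCGEventFlagsChanged. It shows why the tap above sees key-down and modifier changes (modifier keys on their own arrive as flags-changed events) but not key-up events.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring CGEventMask and CGEventMaskBit. */
typedef uint64_t MyEventMask;
enum { MyEventKeyDown = 10, MyEventKeyUp = 11, MyEventFlagsChanged = 12 };
#define MY_EVENT_MASK_BIT(type) ((MyEventMask)1 << (type))

/* Would an event of the given type be delivered to a tap with this mask? */
int mask_includes(MyEventMask mask, unsigned type) {
    assert(type < 64);
    return (mask & MY_EVENT_MASK_BIT(type)) != 0;
}
```

With the mask from the snippet above, MY_EVENT_MASK_BIT(MyEventFlagsChanged) | MY_EVENT_MASK_BIT(MyEventKeyDown) equals 0x1400 (bits 10 and 12); add the key-up bit as well if you also want to log key releases.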
It is also worth noting that the code shown above creates a global event tap that listens to all events, not just the ones going to your application. To tap only into the events targeted at a specific application, there is CGEventTapCreateForPSN:

CFMachPortRef CGEventTapCreateForPSN(void *processSerialNumber,
  CGEventTapPlacement place, CGEventTapOptions options,
  CGEventMask eventsOfInterest, CGEventTapCallBack callback,
  void *userInfo);