Enterprise Software Hackathons

Thursday, February 28th, 2013 - Posted by Eric Schrock

At Delphix, we just concluded one of our recurring Engineering Kickoff events where we get everyone together for a few days of collaboration, discussion, idea sharing, and fun. In this case it included, for the first time, an all-day hackathon event. To be honest, it was a bit of an experiment, and one where we were unsure of how it would be received. We had all read about, participated in, or heard praise of hackathons at other companies, but these companies were always more consumer-focused or had technologies that were more easily assembled into different creations. As an enterprise software company, we were concerned that even the simplest projects would be too complex to turn around over the course of a day. Given the potential benefit, however, it was clearly something we wanted to experiment with - any failure would also be a learning opportunity.

Some companies go big or go home when it comes to hackathons - week-long activities, physical hacks, etc. We wanted to preserve freedom but be a little more targeted. The directive was simple: spend a day doing something unrelated to your normal day job that in some way connects to the business. People volunteered ideas and mentorship ahead of time so that even the newest engineers could meaningfully participate. The result was a resounding success. Whether people were able to give a demo, sketch on a whiteboard, or just speak to their ideas and the challenges they faced, everyone pushed themselves in new directions and walked away having learned something through the experience.

The set of activities covered a wide swath of engineering, including:

  • Using D3.js for visualizing analytics data
  • "zero copy" iSCSI in illumos
  • web portal for customer data analysis
  • "zpool dump" to store pool metadata for offline zdb(1M) use
  • Real time engineering dashboard to aggregate commits, bugs, reviews, and more
  • "D++" DTrace syntactic sugar: function elapsed time, unrolled while loops, callers array
  • Mobile application to monitor Delphix alerts and faults
  • Global symbol tab completion for MDB
  • Network performance tool
  • Speeding up unit tests
  • Browser usage analytics
  • 'zfs send' to a POSIX filesystem
  • BTrace++ (a.k.a. CATrace) to make java tracing safe and easy
  • New V2P (virtual to physical) mechanisms in Delphix
  • Tools to more easily deploy changes to VMs

For myself, I put together a prototype of a hosted SSH/HTTP proxy for use by our support organization. This was my first real foray into the world of true PaaS cloud software - running Node.js, Redis, and CloudAMQP in a Heroku instance - and it's been incredibly interesting to finally play with all these tools I've read about but never had a reason to use. I will post details (and hopefully code) once I get it into slightly better shape.

Only a fraction of these are really what I would consider a contribution to the product itself, which is where our initial trepidation around a hackathon went awry. No matter how complex your product or how high the barriers to entry, engineers will find a way to build cool things and try out new ideas in a hackathon setting. Everything that people did, from learning how to make changes to our OS to improving our quality of life as engineers to testing new product ideas, will provide real value to the engineering organization. On top of that, it was incredibly fun and a great way to get everyone working together in different ways.

It's something we'll certainly look at doing again, and I'd recommend that every company, organization, or group find some way to get engineers together with the express purpose of working on ideas not directly related to their regular work. You'll end up with some cool ideas and prototypes, and everyone will learn new things while having fun doing it.

Your MDB fell into my DTrace!

Wednesday, October 26th, 2011 - Posted by Eric Schrock

Yesterday, several of us from Delphix, Nexenta, Joyent, and elsewhere convened before the OpenStorage summit as part of an illumos hackathon. The idea was to get a bunch of illumos coders in a room, brainstorm a bunch of small project ideas, and then break off to go implement them over the course of the day. That was the idea, at least - in reality we didn't know what to expect or how it would turn out. Suffice it to say that the hackathon was an amazing success. There were a lot of cool ideas, and a lot of great mentors in the room who could lead people through unfamiliar territory.

For my mini-project (suggested by ahl), I implemented MDB's ::print functionality in DTrace via a new print() action. Today, we have the trace() action, but the result is somewhat less than useful when dealing with structs, as it degenerates into tracemem():

# dtrace -qn 'BEGIN{trace(`p0); exit(0)}'
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 00 00 00 00 00 00 00 00 60 02 c3 fb ff ff ff ff  ........`.......
        10: c8 c9 c6 fb ff ff ff ff 00 00 00 00 00 00 00 00  ................
        20: b0 ad 14 c6 00 ff ff ff 00 00 00 00 02 00 00 00  ................
        ...

The results aren't pretty, and we end up throwing away all that useful proc_t type information. With a few tweaks to dtrace, and some cribbing from mdb_print.c, we can do much better:

# dtrace -qn 'BEGIN{print(`p0); exit(0)}'
proc_t {
    struct vnode *p_exec = 0
    struct as *p_as = 0xfffffffffbc30260
    struct plock *p_lockp = 0xfffffffffbc6c9c8
    kmutex_t p_crlock = {
        void *[1] _opaque = [ 0 ]
    }
    struct cred *p_cred = 0xffffff00c614adb0
    int p_swapcnt = 0
    char p_stat = '02'
    ....

Much better! Now, how did we get there from here? The answer was an interesting journey through libdtrace, the kernel dtrace implementation, CTF, and the horrors of bitfields.
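
To make concrete what the type information buys us, here is a minimal Python sketch (not the libdtrace code) that decodes the first 0x30 bytes of the hex dump above using a hand-written layout table. The member offsets, the little-endian byte order, and the 64-bit pointer size are assumptions inferred by lining the dump up against the print() output, not taken from the real proc_t definition, and the names RAW, FIELDS, decode, and render are purely illustrative.

```python
import struct

# The first 0x30 bytes of the tracemem()-style dump shown earlier.
RAW = bytes.fromhex(
    "0000000000000000" "6002c3fbffffffff"      # offset 0x00
    "c8c9c6fbffffffff" "0000000000000000"      # offset 0x10
    "b0ad14c600ffffff" "00000000" "02000000"   # offset 0x20
)

# (offset, struct format, declared type, member name).
# ASSUMPTION: offsets are inferred from the two outputs in this post.
# The nested p_crlock member at offset 0x18 is skipped to keep this flat.
FIELDS = [
    (0x00, "<Q", "struct vnode *", "p_exec"),
    (0x08, "<Q", "struct as *", "p_as"),
    (0x10, "<Q", "struct plock *", "p_lockp"),
    (0x20, "<Q", "struct cred *", "p_cred"),
    (0x28, "<i", "int", "p_swapcnt"),
    (0x2c, "<b", "char", "p_stat"),
]


def decode(raw, fields=FIELDS):
    """Return {member name: integer value} using the layout table."""
    return {name: struct.unpack_from(fmt, raw, off)[0]
            for off, fmt, _ctype, name in fields}


def render(raw, fields=FIELDS, type_name="proc_t"):
    """Format the decoded members roughly in the style of ::print."""
    values = decode(raw, fields)
    lines = [type_name + " {"]
    for _off, _fmt, ctype, name in fields:
        sep = "" if ctype.endswith("*") else " "
        value = values[name]
        # Simplification: every member, including char, is shown as an integer.
        shown = hex(value) if value else "0"
        lines.append(f"    {ctype}{sep}{name} = {shown}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(RAW))
```

The same bytes that looked like noise in the hex dump come back as named, typed members once a layout is available, which is the job CTF does for the real print() action.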

To action or not to action?

The first question I set out to answer was what the user-visible interface should be. It seemed clear that this should be an operation on the same level as trace(), allowing arbitrary D expressions, but simply preserving the type of the result and pretty-printing it later. After briefly considering printt() (for "print type"), I decided upon just print(), since this seemed like a logical choice. My first inclination was to create a new DTRACEACT_PRINT, but after some discussion with Adam, we decided this was extraneous - the behavior was identical to DTRACEACT_DIFEXPR (the internal name for trace), but just with type information.

Through the looking glass with types and formats

The real issue is that what we compile (dtrace statements) and what we consume (dtrace epids and records) are two very different things, and never the twain shall meet. At the time we go to generate the DIFEXPR statement in dt_cc.c, we have the CTF data in hand. We don't want to change the DIF we generate, simply do post-processing on the other side, so we just need some way to get back to that type information in dt_consume_cpu(). We can't simply hang it off our dtrace statement, as that would break anonymous tracing (and violate the rest of the DTrace architecture to boot).

Thankfully, this problem had already been solved for printf() (and related actions) because we need to preserve the original format string for the exact same reason. To do this, we take the action-specific integer argument, and use it to point into the DOF string table, where we stash the original format string. I simply had to hijack dtrace_dof_create() and have it do the same thing for the type information, right?

If only it could be so simple. There were two complications here. First, there is a lot of code that explicitly treats these as printf strings and parses them into internal argv-style representations; pretending our types were just format strings would cause all kinds of problems in this code. So I had to modify libdtrace to treat this more explicitly as raw 'string data' that is (optionally) used with the DIFEXPR action. Second, even with that in place, the formats I was sending down were not making it back out of the kernel. Because the argument is action-specific, the kernel needed to be modified to recognize this new argument in dtrace_ecb_action_add(). With that change in place, I was able to get the format string back in userland when consuming the CPU buffers.
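
The general idea, an integer action argument that points into a string table, where the string is a printf format for one action and a type name for another, can be modelled in a few lines of Python. This is only a toy model under stated assumptions: it is not the DOF on-disk layout, the StringTable class and describe() function are hypothetical names, and reserving offset 0 for the empty string (so that 0 can mean "no string attached") is borrowed from ELF-style string tables rather than confirmed from the DTrace source.

```python
class StringTable:
    """Append-only table of NUL-terminated strings addressed by byte offset."""

    def __init__(self):
        # ASSUMPTION: offset 0 holds the empty string, as in ELF string tables.
        self._data = bytearray(b"\0")
        self._offsets = {}

    def add(self, text):
        """Store text once and return the offset an action would carry."""
        if text in self._offsets:
            return self._offsets[text]
        offset = len(self._data)
        self._data += text.encode("ascii") + b"\0"
        self._offsets[text] = offset
        return offset

    def get(self, offset):
        """Read back the NUL-terminated string that starts at offset."""
        end = self._data.index(b"\0", offset)
        return self._data[offset:end].decode("ascii")


# Illustrative action kinds; the real constants live in the DTrace headers.
DIFEXPR, PRINTF = "DIFEXPR", "PRINTF"


def describe(action, strtab):
    """Say how a consumer would render a record, given only (kind, arg)."""
    kind, arg = action
    if kind == PRINTF:
        # The string is a format to be parsed into printf-style arguments.
        return "format with " + repr(strtab.get(arg))
    if kind == DIFEXPR and arg != 0:
        # The string is raw data (a type name) and must not be parsed as a format.
        return "pretty-print as " + strtab.get(arg)
    return "raw trace() output"


if __name__ == "__main__":
    table = StringTable()
    actions = [
        (PRINTF, table.add("%d %s")),
        (DIFEXPR, table.add("proc_t")),   # print(`p0)
        (DIFEXPR, 0),                     # plain trace(`p0)
    ]
    for action in actions:
        print(action, "->", describe(action, table))
```

Keeping the action kind next to the offset is what lets the consumer tell a format string from a type string, which mirrors the first complication described above.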

Bitfields, or why the D compiler cost me an hour of my life

With the trace data and type string in hand, I then proceeded to copy the mdb ::print code, first from apptrace (which turned out to be complete garbage) and then fixing it up bit by bit. Finally, after tweaking the code for an hour or two, I had it producing output pretty much identical to ::print. But when I fed it a klwp_t structure, I found that the user_desc_t structure bitfields weren't being printed correctly:

# dtrace -n 'BEGIN{print(*((user_desc_t*)0xffffff00cb0a4d90)); exit(0)}'
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN user_desc_t {
    unsigned long usd_lolimit = 0xcff3000000ffff
    unsigned long usd_lobase = 0xcff3000000
    unsigned long usd_midbase = 0xcff300
    unsigned long usd_type = 0xcff3
    unsigned long usd_dpl :64 = 0xcff3
    unsigned long usd_p :64 = 0xcff3
    unsigned long usd_hilimit = 0xcf
    unsigned long usd_avl :64 = 0xcf
    unsigned long usd_long :64 = 0xcf
    unsigned long usd_def32 :64 = 0xcf
    unsigned long usd_gran :64 = 0xcf
    unsigned long usd_hibase = 0
}

I spent an hour trying to debug this, only to find that the CTF IDs weren't matching what I expected from the underlying object. I finally tracked it down to the fact that the D compiler, by virtue of processing the /usr/lib/dtrace files, pulls in its own version of klwp_t from the system header files. But it botches the bitfields, leaving the user with subtly incorrect data. Switching the type to be genunix`user_desc_t fixed the problem.
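
For reference, here is a small Python sketch of what correct bitfield decoding of that descriptor word looks like. The member widths follow the standard x86 segment descriptor layout and are an assumption on my part rather than a quote from the illumos headers; the helper names are illustrative. The second decoder rounds each bit offset down to a byte boundary and treats every member as 64 bits wide, which happens to reproduce the numbers in the output above. It is offered as an illustration of the symptom, not as a diagnosis of what the D compiler actually does.

```python
# The 64-bit word from the output above (usd_lolimit was printed unmasked).
DESCRIPTOR = 0x00cff3000000ffff

# (member name, width in bits), packed from bit 0 upward.
# ASSUMPTION: widths taken from the x86 segment descriptor format.
USER_DESC_FIELDS = [
    ("usd_lolimit", 16), ("usd_lobase", 16), ("usd_midbase", 8),
    ("usd_type", 5), ("usd_dpl", 2), ("usd_p", 1),
    ("usd_hilimit", 4), ("usd_avl", 1), ("usd_long", 1),
    ("usd_def32", 1), ("usd_gran", 1), ("usd_hibase", 8),
]


def layout(fields):
    """Yield (name, bit offset, width) for consecutively packed bitfields."""
    offset = 0
    for name, width in fields:
        yield name, offset, width
        offset += width


def extract(word, offset, width):
    """Shift the member down to bit 0 and mask off everything above it."""
    return (word >> offset) & ((1 << width) - 1)


def decode_descriptor(word, fields=USER_DESC_FIELDS):
    """Decode every member using its exact bit offset and width."""
    return {name: extract(word, off, width)
            for name, off, width in layout(fields)}


def decode_ignoring_bits(word, fields=USER_DESC_FIELDS):
    """Byte-align each offset and ignore the width, mimicking the bad output."""
    return {name: extract(word, off // 8 * 8, 64)
            for name, off, _width in layout(fields)}


if __name__ == "__main__":
    good = decode_descriptor(DESCRIPTOR)
    bad = decode_ignoring_bits(DESCRIPTOR)
    for name, _width in USER_DESC_FIELDS:
        print(f"{name:12} correct={good[name]:#x} unmasked={bad[name]:#x}")
```

With exact offsets and widths, usd_type comes out as 0x13 and usd_dpl as 3, rather than both showing 0xcff3.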

What's next

Given the usefulness of this feature, the next steps are to clean up the code, get it reviewed, and push to the illumos gate. It should hopefully be finding its way to an illumos distribution near you soon. Here's a final print() invocation to leave you with:

# dtrace -n 'zio_done:entry{print(*args[0]); exit(0)}'
dtrace: description 'zio_done:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  42594                   zio_done:entry zio_t {
    zbookmark_t io_bookmark = {
        uint64_t zb_objset = 0
        uint64_t zb_object = 0
        int64_t zb_level = 0
        uint64_t zb_blkid = 0
    }
    zio_prop_t io_prop = {
        enum zio_checksum zp_checksum = ZIO_CHECKSUM_INHERIT
        enum zio_compress zp_compress = ZIO_COMPRESS_INHERIT
        dmu_object_type_t zp_type = DMU_OT_NONE
        uint8_t zp_level = 0
        uint8_t zp_copies = 0
        uint8_t zp_dedup = 0
        uint8_t zp_dedup_verify = 0
    }
    zio_type_t io_type = ZIO_TYPE_NULL
    enum zio_child io_child_type = ZIO_CHILD_VDEV
    int io_cmd = 0
    uint8_t io_priority = 0
    uint8_t io_reexecute = 0
    uint8_t [2] io_state = [ 0x1, 0 ]
    uint64_t io_txg = 0
    spa_t *io_spa = 0xffffff00c6806580
    blkptr_t *io_bp = 0
    blkptr_t *io_bp_override = 0
    blkptr_t io_bp_copy = {
        dva_t [3] blk_dva = [ 
            dva_t {
                uint64_t [2] dva_word = [ 0, 0 ]
            }, 
    ...