I started working with flash in 2006, fortunate timing as flash was just beginning to take hold in the enterprise. I began asking customers I’d visit about flash, and I’ll always remember the response from an early adopter when I asked how he planned to use the new, expensive storage: “We just bought it, and we have no idea.” It was a solution in search of a problem, the garbage can model at play.
Flash has evolved significantly since then from a raw material used on its own to a component in systems of increasing complexity. I wrote recently about the various techniques being employed to get the most out of flash; all share the basic idea of trading compute and IOPS (abundant commodities) for capacity (still more expensive for flash than hard drives). The ideal use cases are the ones that benefit most from that trade-off, ones where compression and dedup consume cheap compute cycles rather than expensive space on the NAND flash. Flash storage is best with data that contains high degrees of redundancy that clever software can squeeze out. With those loose criteria, it’s been amazing to me how flash storage vendors have flocked to the VDI use case. It’s certainly well-suited — big on IOPS with nearly identical data from different Windows installs that’s easily compressed and deduped — but seemingly every flash vendor has decided that it’s one part — if not the part — of the market they want to address. Take a look at the information on VDI from various flash storage vendors: Fusion, Nimble, Pure Storage, Tegile, Tintri, Violin, Virident, Whiptail — the list goes on and on.
I worked extensively with flash until I left Oracle for a startup in 2010. I didn’t stick with flash precisely because it was, and still is, such a crowded space: I’d happily bet on the space as a whole, but it was harder to pick one winner. One of the things that drew me to Delphix, though, was its compatibility with flash. At Delphix we create virtual database copies by sharing blocks; think of it as dedup before the fact, or dedup without the runtime tax. Creating a virtual copy happens almost instantaneously, saving tremendous amounts of administration time, unblocking developers, and accelerating projects; hence our credo of agile data. Unlike storage-based snapshots, Delphix virtual copies are database aware, and provisioning is fully integrated and automated. Those virtual copies also take up much less physical space, yet the aggregate of those copies serves as many IOPS or more. Sound familiar yet? One tenth the capacity with the same workload, call it 10x greater IOPS intensity, is ideally suited to flash storage.
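To make block sharing concrete, here’s a toy sketch in Python. It’s purely illustrative, not Delphix’s implementation, and the class and method names are mine; but it shows why a virtual copy is nearly free to create and only diverges from its parent as it’s written to.

# Toy illustration of block sharing with copy-on-write; not Delphix's
# implementation, just the general shape of the idea.

class VirtualCopy(object):
    def __init__(self, blocks):
        # Map of block number -> block contents. Building a new dict is
        # cheap; the block contents themselves are shared, not copied.
        self.blocks = dict(blocks)

    def clone(self):
        # "Dedup before the fact": a clone is just a new map pointing
        # at the same blocks, so it is nearly instantaneous.
        return VirtualCopy(self.blocks)

    def write(self, blkno, data):
        # Copy-on-write: only a written block diverges from the parent.
        self.blocks[blkno] = data

prod = VirtualCopy({0: 'system tablespace', 1: 'user data'})
dev = prod.clone()                        # a virtual copy for a developer
dev.write(1, 'scratch data')              # one block diverges
assert prod.blocks[1] == 'user data'      # the parent is untouched

Reads against prod and dev land on the same shared blocks, which is exactly why the IOPS load concentrates on a fraction of the physical capacity.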
Flash storage is best when clever software can squeeze out redundancies; Delphix is that clever software for databases. Delphix customers are starting to combine our product with their flash storage purchases. An all-flash array at 5x the $/TB of disk storage suddenly becomes half the price of disk storage when combined with Delphix, with substantially better performance. We as an industry still haven’t realized the full potential of flash storage. Agile data through Delphix fills in another big piece of the flash picture.
I wish that none of our customers encountered problems with our product, but they do, and when they do, our means of remotely accessing their systems is often a WebEx shared screen. We remotely control their Delphix server to collect data (often using DTrace). While investigating a customer issue recently I developed a couple of techniques to work around common problems; I thought I’d share them in case others hit similar snags, and as a note to my future self, who will certainly forget the specifics next time.
Copying and Pasting
WebEx makes it fairly easy to copy text from the remote system and paste it locally: just select the text, and that implicitly copies it to the clipboard. I do this very often as I write DTrace scripts to collect data and then want to record both the script and the output. To that end, the Mac OS X pbpaste(1) utility is unbelievably helpful; it emits the contents of the clipboard. For example, I’ll select text in the WebEx session and use pbpaste like this:
$ pbpaste | tee -a data.log
Doing that, I can both verify that I selected the right data and append it to the log of all data collected. Sometimes, though, the remote data is annoying to copy because I need to scroll up; the mouse latency over WebEx can make this an exasperating experience. In those cases where the text I want to transfer is longer than a page, I do the following on the remote system:
$ cat output | gzip -9c | uuencode /dev/stdin
begin 644 /dev/stdin
M'XL(`..C4E`"`]5:W7_;-A!_#Y#_@>@P),&0A,<O5=X2=&LWH`#T]K`^%9TK
M2DHB5#8>4[3C^UO'TG%L2D1,B1;4^4M09=F29-U>S],2TOE"HS19WL]GTVAQ
...
I then select the text, and back on my Mac do this to dump out the data:
$ pbpaste | uudecode -o /dev/stdout | gzip -cd
By compressing and uuencoding the data, even large chunks of output easily fit on one screen. Here are the results on a large-ish chunk of data I copied from a customer system:
$ cat customer.data.txt | wc -l
     234
$ cat customer.data.txt | gzip -9c | uuencode /dev/stdin | wc -l
      44
234 lines would have had me tearing my hair out as I tried to capture the output, scrolling backward with 250ms screen refresh latency; 44 lines wasn't bad at all. Depending on the exact text I seem to get an 80-90% reduction in lines to copy. Many thanks to Brendan Gregg who had mentioned this technique to me; I hadn't appreciated it fully until I absolutely needed it.
Screen Savers v. Thinking/Lunch
When diagnosing a problem on a customer system, we like to be as unobtrusive as possible, so it's annoying to have to disturb the customer to enter his or her password because the screen lock kicked in while I was thinking about the next step in the investigation or getting something to eat. Many enterprise environments are locked down so that the screen saver delay can't be changed. I spent a day a couple of weeks ago carrying my laptop to meetings and to lunch (and elsewhere) so that I could move the mouse at least every 15 minutes.
I didn't want to modify the customer system ("I let you remotely access my computer, and you're installing what?!"). Instead I wanted to programmatically move the mouse every so often on my system to ensure the remote system wouldn't lock the screen. I couldn't find anything pre-fab, but thanks to the tips at stackoverflow, I pieced something together that wiggles the cursor around if it hasn't moved in a little while. I could post it compressed and uuencoded in keeping with the theme above (it's just 17 lines!), but instead I've added a github repo: github.com/adamleventhal/wiggle.
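For the curious, the shape of it is roughly as follows. This is a sketch, not the repo's actual contents: it assumes the pyobjc Quartz bindings on the Mac side, and the idle threshold and polling interval are values I've picked for illustration.

#!/usr/bin/env python
# Sketch of a cursor wiggler; assumes the pyobjc Quartz bindings are
# installed. The thresholds are illustrative, not the repo's values.
import time
from Quartz import (CGEventCreate, CGEventCreateMouseEvent,
                    CGEventGetLocation, CGEventPost, CGPointMake,
                    CGEventSourceSecondsSinceLastEventType,
                    kCGAnyInputEventType, kCGEventMouseMoved,
                    kCGEventSourceStateCombinedSessionState,
                    kCGHIDEventTap, kCGMouseButtonLeft)

IDLE_LIMIT = 5 * 60    # wiggle after five minutes without input
POLL_INTERVAL = 30     # check for idleness every 30 seconds

def move_to(point):
    # Synthesize a mouse-moved event; unlike a bare cursor warp, this
    # looks like real input and gets relayed over the shared screen.
    event = CGEventCreateMouseEvent(None, kCGEventMouseMoved, point,
                                    kCGMouseButtonLeft)
    CGEventPost(kCGHIDEventTap, event)

while True:
    # Seconds since the user last generated any input event.
    idle = CGEventSourceSecondsSinceLastEventType(
        kCGEventSourceStateCombinedSessionState, kCGAnyInputEventType)
    if idle > IDLE_LIMIT:
        # Nudge the cursor one pixel and put it back.
        loc = CGEventGetLocation(CGEventCreate(None))
        move_to(CGPointMake(loc.x + 1, loc.y))
        move_to(loc)
    time.sleep(POLL_INTERVAL)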
I hope people find these tips useful. Given my penchant for looking up past tips on my own blog, I'm sure at least my future self will be thanking me at some point...
Tonight, my Delphix colleague Zubair Khan and I presented the integration we’ve done with git at the SF Bay Area Large-Scale Production Engineering meetup. When I started at Delphix, we were using Subversion — my ire for which the margins of this blog are too narrow to contain. We switched to git, and in the process I became an unabashed git fanboy.
Git is a powerful tool generally, but in particular it has some useful hook points that we use to enforce our code integration criteria and to do some handy things after we integrate. For this, we wrote some custom bash scripts and Python integrations with Bugzilla and Review Board. You can check out the slides, and we’ve open sourced it all on GitHub with the hope that it might help people with their own integrations.
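To give a flavor of the hook mechanism, here’s a minimal, hypothetical commit-msg hook; it’s not our actual script, and the convention that a message must start with a Bugzilla bug number is an assumption for illustration.

#!/usr/bin/env python
# Hypothetical commit-msg hook: git invokes it with the path to the
# proposed commit message and aborts the commit on a non-zero exit.
# The bug-number convention below is illustrative, not Delphix's.
import re
import sys

def main(msg_path):
    with open(msg_path) as f:
        msg = f.read()
    # Require something like "12345 fix the frobnitz" so every commit
    # can be tied back to a Bugzilla entry.
    if not re.match(r'^\d+\s+\S', msg):
        sys.stderr.write('commit message must start with a bug number\n')
        return 1
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1]))

Dropped into .git/hooks/commit-msg and marked executable, git runs it on every commit; the same idea extends to server-side pre-receive hooks that enforce criteria at integration time.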
It's my pleasure to welcome Matt Amdur to Delphix, to the world of DTrace, and -- just today -- to the blogosphere. Matt joined Delphix about two months ago after 10 years of software engineering, most recently at VMware. Matt and I met at Brown University in 1997, and we worked together closely for all four years. We've had in the back of our minds that our professional lives would converge at some point; I couldn't be happier to have my good friend onboard.
Matt's first blog post is a great war story describing his first use of DTrace on a customer system. It was vindicating to witness firsthand, both in how productive an industry vet could be with DTrace after a short period, and in what a great hire Matt was for Delphix. Working with him has been evocative of our collaboration in college, while making all the more apparent the significant distance we've both come. Welcome, Matt!
Today I took the train out to Long Island to meet up with our New York sales team for a visit with a prospective customer. You never know with an initial meeting, but this one was great. I thought I'd share a bit about what made these guys so excited, which is the same stuff that gets me excited about what we're doing at Delphix.
First, though: there are some engineers who have never spoken with a customer. There are engineering organizations in which requirements are collected from customers, correlated by product managers, handed to engineering managers, and given to engineers. It's a fine workflow, but it needs to be balanced against engineers engaging directly with customers, hearing their issues, and brainstorming solutions technologist to technologist. Engineers talking to a small number of customers may miss broad trends or fail to connect certain dots, but it's a complementary activity and part of being a holistic engineer.
I've heard software engineers groan that the right technical decision was trumped by business concerns. Those people might be good engineers, but they aren't great ones. Engineering can't stop at the boundaries of software; it must necessarily consider the whole ecosystem of the product and the company. Yes, we might not have architected the feature this way if we didn't have legacy customers to support, but we do (and we should be happy for it). (And, of course, this logic can be taken to the other extreme with equally bad results.) This doesn't mean that a great engineer collects all data firsthand, but the whole system must be considered, and walking into a customer's office from time to time is a reality check.
In today's meeting, the customer was learning about Delphix for the first time. And they got it right away. As with many enterprises, they have an initiative around virtualization to enable more self-service and more empowerment of their developers. The data in their relational databases is a big anchor weighing down those efforts; the time and effort required to copy and provision databases is a huge drag. Smart guys, they oscillated between how Delphix works -- a super-smart, database-optimized storage gateway -- and what Delphix does -- virtualizing their Oracle databases, bringing the agility and cost savings of other virtualization technologies. And the slide-ware, made real through a demo of the product GUI, elicited a terse expression of comprehension: "That's cool."
And maybe the best reason for engineers to get into the field is to witness customers who get how cool the product is.
It's rare to get software right the first time. I'm not referring to bugs in implementation requiring narrow fixes, but rather places in a design that simply missed the mark. Even if getting it absolutely right the first time were possible, it would be prohibitively expensive (and time-consuming), so we make the best decisions we can, hammer it out, and then wait. Users of a product quickly become experts on its strengths and weaknesses for them. Customers aren't beta testers -- again, I'm not talking about bugs -- rather, they expose use cases you never anticipated and present environments too convoluted to ever conceive of at a whiteboard.
When I worked in the Fishworks group at Sun, we learned more about our market in the first three months after shipping 1.0 than we had in the 30 months we spent developing it. We found the product both struggling in unanticipated conditions, and being used to solve problems we could have never predicted. Some of these we might have guessed earlier given more time, but some will never come to light until you ship. That you need to ship 1.0 before you can write 2.0 is a deeper notion than it appears.
I joined Delphix a couple of weeks before our formal launch at the DEMO conference. Since then, we've engaged in more proofs-of-concept (PoCs), more customers have rolled us into use, and we've continued to learn of new use cases for Delphix Server and to find the places where we needed to rethink our assumptions. And we knew this would be the case -- you can't get it right the first time. Over the past several months, we in Delphix engineering have been writing the second version of the most critical components in our stack, incorporating the lessons learned with our customers. The team has enjoyed the opportunity to revisit design decisions with new information; it's fun to feel like we're not just getting it done, but getting it right.
When building 1.0, you make a mental list of the stuff you'd fix if only you had the time. In the next release, you figure out all the whizzy stuff you now can build on the stable foundation. We're excited for the forthcoming 2.6 release -- more so even for new ideas we found along the way that will be the basis for our future work. We've got a great team working on a great product. Check in on the Delphix blogs in the coming months for details on the 2.6 release and the other stuff we've got in the works.