March 6th, 2011 |
Published in
Rant, Tech Talk | No Comments
I just installed Android Gingerbread (2.3) on my HTC Droid Eris. To understand the significance of this little achievement, we need to examine the wonderful world of smartphones, and why the market insists on doing a half-assed job.
The Eris was released November 6th, 2009. In the phone world, especially now that the smartphone market is heating up, this is somewhat ancient. It’s second-generation in a fifth and sixth generation world. It will never “officially” have anything higher than Android Eclaire (2.1). I’ve already been running Froyo (2.2) for months. It fixed a ton of bugs plaguing the official release, and improved the overall experience drastically.
How is this possible? Well, since Android is open-source, developers all over the world have been contributing to its advancement. This ancient phone, long since abandoned for the likes of the Droid Incredible and the upcoming Thunderbolt, has its own dedicated developers. There are several ROMs to choose from. Do I want to keep HTC’s Sense? There’s about half a dozen ROMs that use it. How about Froyo? At least that many. Back in December there was one Gingerbread release. Now there are about five, though only three are making any steady progress. I’m running a build of GingerShedBread released just yesterday. I only installed that because the developer of CELB Froyo has gone MIA for about a month, and I figured it was time to try the upgraded OS.
Why is it that a bunch of unpaid dudes can come out with OS ROMs faster than paid manufacturers making the damn phones in the first place? How can some guy who goes by workshed, and another who prefers the moniker tazzpatriot, whip out updated OS releases for a phone all but abandoned by its carrier? And most damning, why are brand new phones which haven’t even been released yet, still using Froyo? The Motorola Bionic, the first dual-core phone in Verizon’s lineup will be Android 2.2. The Thunderbolt? Android 2.2. Is this some kind of joke?
It’s also a foregone conclusion at this point that Android 2.4 will be out some time in April, mostly due to support for dual-core phones. Some like to speculate the lag is because manufacturers want their bloatware and skins on the OS, so they can offer added value. And of course, porting these apps and skins to a “new OS” is what takes so long. But that my friends, is unmitigated bullshit.
Android is a glorified Java application that runs on top of a Linux kernel. Lots of apps on the market have minimum OS requirements, after which they work just fine. One of the reasons a guy in his mom’s basement can release these so quickly is that, once a kernel is compiled for the phone, the rest of the OS works fine. The manufacturers already did the hard parts in porting the kernel and providing necessary drivers for the hardware in the phone itself. After that, it’s beta-testing for glitches. It’s not like Android changes its API with every release. HTC Sense and Motorola Blur will work just fine unaltered on 2.3 or 2.4. If they want to take advantage of the new API calls and features, they can send out OTA updates later.
Do I have to live on the bleeding edge? No, not really. Sure the Eris works better with the new versions because more attention is given to performance, but Froyo wasn’t a terrible OS. Why the push, then? Why not? The whole point of being a manufacturer or a carrier is that you have leverage the average person doesn’t. You can partner with companies and get exclusives, and in some cases, boast early releases nobody else has. You can pay whole teams of developers to build custom modules months before official releases. Smart companies would pay one or more of the XDA developers for each one of their phones in perpetuity, because the relative cost is negligible, and the customer goodwill for doing so is priceless.
EnterpriseDB employs several of the main PostgreSQL core developers. In doing this, they secure for themselves, features that may eventually make it into the official release. They leverage the experience of known entities to instantly attain brand recognition and lead the way in new functionality. Canonical did this with Linux to produce Ubuntu, which is the most popular Linux distribution in the world according to Distro Watch. Handset manufacturers could do this, but they don’t. Phone carriers could do this, but again, they either have no incentive or desire to do so. And why should they, when they already make dump-trucks of cash using current standard operating procedures?
I just wish that one… just one company would buck the trend and produce a viable and truly great entry in the handset market. GeeksPhone is a fucking relic compared to current hardware. Google’s Nexus is a single-carrier pipe dream. Though with Verizon rolling out LTE—which is really GSM redoux—this may be less of a problem in the near future.
Either way, Android proves just why it is the superior OS. Even when the carriers and manufacturers drag their feet, the XDA developers and similar groups can pick up the slack. This is not possible with iOS, Windows 7, WebOS, Blackberry, or any other phone-based OS. I’d like to claim people are tired of having decisions made for them and will flock to Android, but I can only cite my own experience in this regard. About the only real complaint I have about Android is that it’s written in slow, memory-hungry Java. Yuck.
It’s still the best choice for tinkerers, and I for one, give a tinker’s damn.
March 1st, 2011 |
Published in
Database, Tech Talk | No Comments
Ah, query hints. For all those times when the database isn’t doing what you want, they’re a useful tool for forcing the query optimizer to perform your bidding. But in this case, not only is the road to Hell paved with good intentions, it’s paved with a frictionless slide directly into a wood chipper that empties handily into an active volcano. With query hints, be careful what you wish for, because—to the detriment of all you hold dear—you just might get it.
In the Beginning
Before we realize why query hints are necessarily bad, let’s go over a few basics. Think about computer code. Clearly a computer can’t execute typed text as written! Either it has to be transformed into processor instructions, as seen in languages such as C or Pascal, or some intermediate byte-code suitable to an interpreter, such as Python or Java. This is normally the job of some kind of lexical scanner: to interpret the text and, through some convoluted grammatical magic, twist it into computer lingo.
But that’s only half of the story. More advanced compilers add another step, because just executing the raw translations would produce terrible performance. No, most compilers and interpreters have optimizers. They find unused instructions, loops that can be unrolled, processes that can be replaced with local assembler calls instead—any number of tweaks previously available only to experienced assembly programmers.
The query optimizer is doing this for your SQL. But unlike a compiler or interpreter which can target specific and highly derivative chipsets, it is forced to contend with an infinitely amorphous array of tables, views, stored procedures, triggers, constraints, indexes, and all the inherent combinations therein. Ideally, it has to know how much system memory is available, how fast the disks are, how much caching is available, what kind of IO scheduler the OS is using, the size and cardinality of every database object, heuristics on how that data is distributed, and what locks are necessary to preserve data being accessed in massive parallel.
Yet despite all this, query planners have long since exceeded all but the best hand-optimized plans. Few DBAs can possibly contend with all those moving parts, and that’s without accounting for the fact data never stops accumulating, and that too, will drastically alter expected execution paths.
WTF are Hints for, then?
So now we know what query planners are, and what they do. So what then, are hints? Take a look at this BS pseudocode:
SELECT *
FROM my_table USE INDEX (idx_table_last_updated)
WHERE last_updated > CURRENT_DATE - INTERVAL '1 week';
That ‘USE INDEX’ part is the hint. In this case, we’re telling the planner we want it to use the idx_table_last_updated index because we think having the last_updated column in our search makes it the best candidate. This is deceptively easy, especially since it seems so obvious.
Ignore the syntax for a moment, which is a horrible bastardization of PostgreSQL and MySQL. The point is that hints can override the planner’s normal path, because you either know something the planner doesn’t, or the planner is doing something horribly wrong.
For ad-hoc queries, this is actually great. Planners can be very stubborn, and are notorious for insisting their decisions are correct, you filthy human. And hints are just what their name implies: hints. Optimizers are free to ignore them if a hint is stupid or is literally impossible to apply. Besides that, the query optimizer actively hates you.
But in most cases, when a hint is necessary, there’s a very good reason. Maybe the collected statistics on a column just aren’t measuring up. Perhaps it’s reverse-scanning a date index and improperly ignoring a faster seek-based approach on an index you created to speed up a foreign key. Whatever the reason, it’s tempting to just scream “Fuck you, optimizer!” and lobotomize it with a hint or two.
This is where we start using our feet for target practice with a loaded AK47.
Target Practice
Now that your foot has been replaced with a bloody stump, let’s examine precisely what unhappiness transpired. For the sake of argument, try to remember that for most disk-based systems, even in a RAID-10 comprised of a dozen spindles, a random seek is about a hundred times slower than a sequential scan. Then note that, while an index seek is O(log(n)), rows from the table must be fetched independently, and that’s a seek and fetch for each match.
The Data Changed
Oh, unlucky you! Your application started getting really popular lately and it turns out one week of data is actually a tenth of the table. Your million-row table would have been a slow sequence scan, but now it’s an even slower 100,000 disk seeks. Since the table is now bigger than when you wrote the query, the results are coming back slower each time. Worse, the more popular your site gets, the more data is available in the last week, and the slower it’ll be. All because it’s fetching rows individually hundreds of thousands of times. Ouch!
That’s just one possible result. There are dozens more. Predicting how data will change and invalidate a well-intentioned hint is impossible. But consider the hint originally provided a best-case scenario; there’s nowhere to go but down.
A Better Index was Available
Hints have a nasty habit of making it into production code where the query is liable to be modified by someone who may overlook the hint. Our original query was changed to this:
SELECT *
FROM my_table USE INDEX (idx_table_last_updated)
WHERE last_updated > CURRENT_DATE - INTERVAL '1 week'
AND table_format = 'ugly';
The conscientious coder thoughtfully added an index on table_format because she knew it was highly selective. Table formats of ‘ugly’ only occur in 0.001% of that table, making it an index second in performance only to the primary key. The planner wants to use the index on table_format but it can’t because the hint explicitly told it not to.
The Planner got Better
Probably the most insidious of bullets to your foot are improvements in the optimizer or database engine. Suppose the database can now maintain in-memory bitmap indexes for highly accessed objects. Combined with a merge of the table contents, queries can get orders of magnitude faster with no work on your part at all. Unless you used a query hint, in which case, you get only a swift kick in the family jewels.
God Damn it!
Exactly. The development community of PostgreSQL is well aware of these shortcomings, and as a consequence, have not created a formal hint syntax. Sure, you can temporarily disable sequence scans, index scans, nested loops, and so on, but no query can have hardcoded rules to override the optimizer. It’s not that they don’t want hints, but it’s better to spend time making the optimizer better, fixing bugs in its assumptions, improving table and column statistics, and rewriting problematic queries when necessary.
And that really is the right answer. To a certain degree, they may come to a compromise and build hints that don’t override the planner, but give it more information. Yet one could still argue that collecting better statistics would have the same effect, and simultaneously make all queries faster. It’s mostly for this reason that nobody has actually proposed a hint syntax of any kind that wasn’t eventually abandoned in favor of more pressing issues.
But where does that leave a developer or DBA who needs a quick fix for some crazy edge case the planner simply won’t handle properly? Do they just wait for months until a planner fix or improvement comes down the line? No way. DBAs are better than that.
No matter how good the planner is, it can be tricked. Maybe an anti-join:
SELECT *
FROM my_table a
WHERE last_updated > CURRENT_DATE - INTERVAL '1 week'
AND EXISTS (
SELECT 1
FROM my_table b
WHERE a.table_id = b.table_id
AND table_format = 'ugly'
);
Or how about a sub-select?
SELECT *
FROM (
SELECT *
FROM my_table b
WHERE table_format = 'ugly'
)
WHERE last_updated > CURRENT_DATE - INTERVAL '1 week';
Maybe creating a new partial index would work:
CREATE INDEX idx_table_format_updated_format
ON my_table (CURRENT_DATE DESC)
WHERE table_format = 'ugly';
This last one is especially nice and the DESC will work in any PostgreSQL of 8.4 or later. It’s like having a two-column index, but in this case, only indexes the lower incidence ‘ugly’ match, and orders the dates in descending order. So when searching for the most recently updated ‘ugly’ tables, we get a huge speed boost. Why? Because the planner won’t use a reverse index scan, which runs at the same speed as a series of random seeks. Instead, it will use a sequential index scan over a much smaller index.
In the End
The real solution to a bad query is to try and tell the planner what you want and let it decide how to calculate the results. If that doesn’t work, statistics for one or more columns can be increased and that might give the planner a better idea. If that doesn’t work, indexes may be the answer. If that still fails, there is still a veritable plethora of alternatives that won’t make the query perform worse some time in the future because old hints no longer apply.
It’s tempting. It’s right there in the damn manual! Can you resist pushing the shiny, candylike button? All I can suggest is that you try. All that awaits hint users is wailing and gnashing of teeth.
Please don’t use hints. Your sanity and your query optimizer will thank you.
February 28th, 2011 |
Published in
Database, Tech Talk | No Comments
Ok, so I’ve already corrected gaudy and horrible behavior part and parcel with default PostgreSQL installs, but what about that… other open-source SQL database? Is it wrong too? Sure is!
Fixing Your Damn Server Again
My fake system still has 8GB of RAM, but we’ll be using more of it for a MySQL install. Why? Because MySQL’s planner makes different assumptions about memory allocation than PostgreSQL. Remember sysctl.conf? Put this in there:
vm.swappiness = 0
Then invoke it:
sudo sysctl -p
Why didn’t we do this last time? We only gave PostgreSQL about a quarter of available memory, but MySQL is getting three quarters. In doing that, the default swappiness setting will cause the kernel to occasionally swap parts of your running database to disk, causing tons of IO and terrible data performance.
And remember, this is a dedicated server, so we definitely don’t want it swapping out its main function because someone unzipped a log file or something.
Delusions of Grandeur
MySQL has a problem with labels. To be fair, PostgreSQL has the same problem. These databases came around when a gig of memory cost as much as a small car. What’s a huge amount of memory to MySQL? About 512MB.
Ok, I’ll wait until you’re done laughing.
I got that number from the my-huge.cnf file that comes as a documentation example on how you can improve MySQL’s performance by using it as a starting point instead of the default, which is a svelte 16MB. Let’s make this simple. In your /etc/my.cnf file, start with this:
[mysqld]
key_buffer = 256M
sort_buffer_size = 8M
tmp_table_size = 8M
query_cache_size = 128M
Wait, didn’t I just make fun of settings below 512MB? Yes indeed. But there’s a method to my madness. See, MySQL has two storage engines: MyISAM and InnoDB. The settings above are for MyISAM, and even though you shouldn’t be using that, it needs to be at some minimum amount just in case. Yes, that’s the minimum amount. Now for the real settings:
default-storage-engine = InnoDB
innodb_buffer_pool_size = 4G
innodb_additional_mem_pool_size = 32M
innodb_log_file_size = 512M
innodb_log_buffer_size = 32M
Woo! There we go. Yeah, I only gave half the memory straight to the buffer pool, but that’s because we had to leave room for incidentals allocated on a per-connection basis. That setting can go all the way to 6G with a little testing to make sure it won’t exhaust your system memory and cause swapping.
Oh, and so you don’t get tempted to use MyISAM tables, we set the default storage engine to InnoDB. Why? MyISAM tables are locked on a table-level basis. So, any time a record is inserted, updated, or deleted, nothing else can happen. This locking contention results in bad performance under high concurrency. InnoDB, for any of its faults, doesn’t have this limitation.
Friends don’t let friends use MyISAM.
The End!?
Look, I don’t know nearly as much about MySQL as I do PostgreSQL. But apparently there’s an InnoDB plugin that’s… different from the built-in one. It has some extra options that can help performance, but I couldn’t figure out how to even install the plugin. Every time I tried following the official instructions, MySQL wouldn’t even start.
But these settings will at least get MySQL bumped up in performance. Once again, the defaults assume a tiny machine that has several different services running on it. You’re a smart DBA, so you just have a single (or several) server dedicated purely to the database, and it has more RAM than you’ve ever seen. Hell, my 8GB assumption is outright laughable. Servers now should start at 32GB at least, in which case you’d have given 24GB to MySQL instead of between four and six.
But stay tuned… I’ve got an exciting rant planned against query hints!
February 23rd, 2011 |
Published in
Book, Review | No Comments
I just finished reading Daniel Suarez‘s Freedom (TM) in about three days. This is much more a statement of the novel’s quality than my own somewhat glacial reading pace.
There’s a lot in here I really love. Mr. Suarez has clearly done his research, even listing many of his sources, along with the universities and labs inventing the technology he brought to life. Back again are the AutoM8′s, the Razorbacks, and the rest of Loki’s arsenal. Back are the HUD glasses, the enhanced reality, and all the other disruptive technologies held down by corporate interests. And Suarez repeatedly puts them to excellent use.
The best part of the book is its brilliance as an outlet for frustration. Don’t like the idea of millions of micro-transactions from financial companies acting as glorified parasites on the markets? Tired of farmers getting sued for using heirloom seeds or having their fields contaminated? How about media manipulation? Corruption in government? Several harbingers of these questionable practices suffer messily for their transgressions.
That’s not to say it’s a one-sided battle. The Major is the foil all our heroes flail against, one way or another. His paramilitary contacts and corporate backers won’t go down without a fight, and even have a fairly solid plan to subjugate the Daemon itself to secure even more power than they’ve already wrested from the populace. At one point, they liken themselves to God. With enough military hardware and mercenaries to launch simultaneous attacks across the world, discounting them would be a huge mistake.
It’s a hodgepodge that works well in concert with all the moving parts. But there were some issues I had, which I’m not sure how to address. Suarez has a habit of speaking through his characters too freely. More than once, I found myself questioning dialog. Everyone is an expert in conceptual ideologies and theoretical applied technologies, apparently. And while Daemon stretched the credibility of Sobol’s genius, Freedom obliterates it. Nobody can anticipate all the variables he supposedly wrote into the Daemon, especially since they all depend on unreliable humans. Sobol only killed a handful of co-conspirators in Daemon, yet there are enough back doors and counter-programs to give a hundred coders migraines. And who recorded and edited all of his holograms basically outlining the entire plan? Remember, he could have framed any one of a dozen other inspectors other than detective Sebeck.
Ignoring that, it definitely kept me engaged. I work in some of the industries integral to the plot, and it’s easy to identify with his imaginative application of exciting research lurking over the horizon. I highly recommend this to anyone who liked Daemon or any of Neal Stephenson‘s early work.
February 21st, 2011 |
Published in
News, Poetry, Writing | No Comments
In the world that we despise,
are there times of loss or wonder?
Toiling ever, full of lies,
sick of writhing, going under.
In that bleakness waiting never,
'till no senseless drone became.
Wrath or sunder, thrash or sever,
breaking through with none to claim.
And that weakness sups upon us,
gibbers for our souls do slake.
With a thirst so vile and vicious,
we but shiver in its wake.
Thus all reason burns with malice,
shackled minds do shriek and wail.
It's the endless wrath of solace,
which expects all things to fail.
Our existence springs from chaos,
maelstrom forged and wrongly won.
And when entropy does claim us,
that is all we are: undone.
So the onslaught of tomorrows,
crushes bones of beneath the skies.
And the eons fall like shadows,
in the world that we despise.