I’m starting a little series here featuring interviews with industry luminaries. I figured people might like to hear what they have to say. My first subject is Joe Celko.
Q1: Tell us a little about yourself, in case somebody does not know you.
A1: I was a member of the ANSI X3H2 Database Standards Committee from 1987 to 1997 and helped write the ANSI/ISO SQL-89 and SQL-92 standards. I had been writing about Software Engineering in the trade press for years, so I just switched over to writing about the SQL standards as they emerged.
Two decades later, I have six books — #7 is in the works now — and around 800 columns in the computer trade and academic press, mostly dealing with data and databases.
Q2: What trends do you see in the Database Field?
A2: The obvious one is that databases are getting bigger. Just a few years ago, nobody used the prefix “Peta-” on anything in IT. A Terabyte was huge! Today, you are just starting to hear about Exabytes (EB) and worse!
Why are databases getting bigger? It’s not just more data; it is also more data sources. I have been teaching an intro RDBMS Design course online for MySQL AB for the past few months. One of the PowerPoint slides shows some of the storage engines MySQL can sit on top of. There are well over a dozen options, from tiny embedded databases to federated systems.
Another cause is that more and more programmers are working with a database. There was a survey in the trade press this month that said about 60% of all developers are using a database today, up from about 40% in the previous survey.
Q3: So how can we deal with all this extra data?
A3: The first thing is to get metadata in place. Without it, you cannot control all of the data sources pouring into your enterprise. In the old days, we did not have Google, so finding industry standards took a lot of legwork and mountains of paper. Today, you can look up anything in less than a day. When I am teaching a college RDBMS class, I hand out 3×5 cards, each naming something you might want to model in a database using a standardized scale of some kind: shoe sizes, tires, whatever. Nobody has ever failed to find one, and they usually come back with several scales or standards.
Q4: What is the big news in hardware?
A4: Parallelism. We are looking at multicore processors being the normal state of affairs in the next decade. Intel has already announced 80 processor cores on a single chip. But the problem is that we do not have the software for that architecture. This month’s DR. DOBB’S JOURNAL has an article about writing a graph-searching algorithm for the IBM Cell processor. The code is much larger than the traditional “mono-processor” solution, but it runs orders of magnitude faster. The authors had to handle all of the processor and register assignments by hand. You need a compiler that will do that for you.
SQL is the only major language that has a history of parallelism because it is based on sets. You can partition a set, do quite a few things with each partition, then union the results back together to get an answer for the whole set. This is the basis for several VLDB products, such as Teradata, Kognitio and SAND engines.
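The partition-then-union idea can be sketched in a few lines. This is a minimal illustration, not how Teradata, Kognitio or SAND actually work: it assumes hypothetical `(region, amount)` rows, hash-partitions them on the grouping key so every key lands in exactly one partition, runs the equivalent of `SUM(amount) ... GROUP BY region` on each partition in parallel, and then merges the disjoint partial results. The function names are made up for this example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical (region, amount) row


def partition_by_key(rows: List[Row], n: int) -> List[List[Row]]:
    """Hash-partition rows on the grouping key so each key lands in exactly one partition."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for region, amount in rows:
        parts[zlib.crc32(region.encode()) % n].append((region, amount))
    return parts


def sum_by_region(part: List[Row]) -> Dict[str, int]:
    """Per-partition work: the equivalent of SUM(amount) GROUP BY region on one partition."""
    totals: Dict[str, int] = {}
    for region, amount in part:
        totals[region] = totals.get(region, 0) + amount
    return totals


def parallel_group_sum(rows: List[Row], n: int = 4) -> Dict[str, int]:
    """Partition the set, aggregate each partition concurrently, then union the results."""
    parts = partition_by_key(rows, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(sum_by_region, parts))
    # Keys are disjoint across partitions, so the "union" is a plain merge.
    result: Dict[str, int] = {}
    for partial in partials:
        result.update(partial)
    return result


if __name__ == "__main__":
    rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
    print(parallel_group_sum(rows))
```

Threads keep the sketch short; in CPython you would swap in `ProcessPoolExecutor` (or separate machines) to get real CPU parallelism. The key design point is that the answer does not depend on how many partitions you use, which is exactly what makes set-based work easy to spread out.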
Remember “Maxwell’s Demon” from physics? It is an imaginary creature that only permits faster-moving molecules to go past it and thus can keep things hot on its side of the room. Well, we can have “Celko’s Demon”: a processor devoted to a single row, column or partition in a table. When a query comes into the engine, each demon decides whether he needs to ignore it or to pull up his data and pass it over to a higher-level demon.
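The demon is only described in prose here, so the following is one possible reading rather than a real engine design. It assumes each hypothetical `Demon` owns one chunk of a single integer column and uses the chunk’s minimum and maximum (min/max pruning) to decide, without scanning, whether a range query can possibly concern it. A hypothetical `HigherDemon` collects whatever the relevant demons pass up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Demon:
    """Hypothetical worker that owns one partition of a single integer column."""
    values: List[int]
    lo: Optional[int] = field(init=False, default=None)
    hi: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # Remember the partition's bounds so the demon can dismiss queries cheaply.
        if self.values:
            self.lo, self.hi = min(self.values), max(self.values)

    def answer(self, low: int, high: int) -> Optional[List[int]]:
        """Return values in [low, high], or None if this partition cannot hold any."""
        if self.lo is None or self.hi is None or high < self.lo or low > self.hi:
            return None  # ignore the query without touching the data
        return [v for v in self.values if low <= v <= high]


@dataclass
class HigherDemon:
    """Hypothetical coordinator that gathers what the lower demons pass up."""
    children: List[Demon]

    def query(self, low: int, high: int) -> List[int]:
        results: List[int] = []
        for child in self.children:
            found = child.answer(low, high)
            if found is not None:
                results.extend(found)
        return sorted(results)


if __name__ == "__main__":
    top = HigherDemon([Demon([1, 4, 9]), Demon([20, 25, 31]), Demon([50, 57])])
    print(top.query(5, 30))  # -> [9, 20, 25]
```

Here the third demon never looks at its rows for the query `5..30`, because its bounds (50 to 57) rule it out. In a real machine each demon would be its own processor, so the ignore/respond decisions would happen simultaneously rather than in a Python loop.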
Q5: What about end users?
A5: There are more of them, too. Do you use a phone book to look up a telephone number, or do you Google it? How many electronic newspaper and magazine editions do you read every day? How do you shop now?
Your preferences and history are part of any serious commercial shopping site. Every one of us has gotten an email from Amazon.com telling us that people who bought a copy of some book also liked another book. It is the same with Netflix and every online company.
This kind of volume will have another hardware effect. Hard drives are going to die and be replaced by solid state devices. Right now I have a 512 MB memory stick I use to hold my PowerPoint slides. The stick was a give-away at a trade show, not a personal purchase.
If I have tens or hundreds of thousands of end users trying to get information from my database at pretty much the same time, then physically moving a read/write head across magnetic storage media is not going to work. I have to get access at the speed of electricity or light, not at electric motor speeds. If I cannot get a fast enough response, then the customer is not going to stay around and wait. In the old days, when we were using telephones with modems to process credit cards, the end users would hang up and re-dial if they did not get some signal within 3 seconds. One solution was to have a red light or tone that went on in 1-2 seconds; it had nothing to do with the connection, but it made the end user feel good.
We are already hitting the limits today; that is why sites are mirrored. But at some point, you cannot keep throwing hardware at the problem. You need to throw a different kind of hardware at it. Solid state devices also have another advantage over hard drives: no moving parts to break. We have gotten very good at making reliable chips. Right now, solid state disk replacements are relatively small and expensive. Moore’s Law will take care of the size problem, and the need for fast response times will make any extra cost worth it.
Q6: You’re sometimes perceived as a bit of an ANSI SQL zealot and as being a bit acerbic toward people on newsgroups who don’t agree with you. Do you think this is a fair perception? Would you change it if you could? What do you have to say to your critics about it?
A6: It is a fair perception and it is deliberate. My wife is a Soto Zen Monk who could beat you with a stick, so I am a pussy cat. I do it to get the attention of the poster.
Look at most of the posters in the newsgroups. They do not post DDL — often because they have no idea what it is — and invent their own pseudo-code. They have no idea what a spec is, but post requests for ways to implement an approach they have locked into — and it is usually a non-relational approach at that.
What we have are some really lazy programmers who want to use the newsgroups to do their job or homework for them. Even worse, they want to get an instant college education, which is not possible in a short reply. The questions they ask can most often be answered by (1) RTFM, or rather, read BOL (Books Online); (2) “Try it and see”; or (3) a quick Google search of the newsgroup to which they are posting.
What they get instead from most replies is a kludge to get rid of them. If this were a woodworking newsgroup and someone posted “What is the best kind of rock to pound screws into fine furniture?”, are you really helping them when you say “Granite! Use big hunks of granite!”? I am the guy who replies with “Your question is bad. Don’t you know about screwdrivers?” And I like to remind them that it takes six years to become a Journeyman Union Carpenter in New York State. Not Master, Journeyman.
I began my programming career in the Cold War, doing defense and medical systems. These are fields where bad programming kills the wrong people and you do not get a do-over. I did research on Software Engineering and wrote columns in the trade press on it for years before I became “the SQL Guy” for a living.
I am amazed at the amount of “cowboy coding” I see today. I have even had one guy email me about my “ANSI Standards hang-up,” telling me that I was just like those Structured Programming bastards who got rid of the GOTO statement. He was arguing for the return of spaghetti code!
I just made a few bucks inspecting a system that violated the principle of a tiered architecture. Their consultant decided it was faster to format some of the data in the back end instead of the front end (leading zeros and UK versus US dates). The result was a very inconsistent database, since his stored procedures might not quite match anyone else’s, but he might have saved 15 minutes a year in computing time. Wow.