A contracted developer recently had me roll a piece of code into production that contained the following nugget:
if (orderRec.image_size >= vpMin_size or vpMin_size is null or vpMin_size = 0))
Nothing too outrageous, then: check image_size, and if it’s greater than or equal to some number supplied when the procedure is invoked, the record qualifies.
What I hadn’t noticed (black marks all round, then, because no-one else had spotted it either!) is that IMAGE_SIZE is a VARCHAR2 column. Yes, really. We’ll come to the reasons why in just a moment!
Meanwhile, we all soon found out about this when the website broke with the infamous ‘ORA-06502 character to number conversion’ error: a record with an image_size value of ‘4.7mb’ had fallen foul of the troubles that ensue when you try to perform mathematical comparisons on the phrase ‘mb’!
I suggested to the developer that we convert the column to a NUMBER data type. Can’t do that, he said, because the field had been deliberately created as free text, because (get this!) they needed to be able to see a “nice” rendering of the image size (that is, one qualified with ‘MB’ or ‘KB’ descriptions) on printed invoices. Turns out, indeed, that this field contains such size gems as “small”, “medium” and “large”!!
Instead, the developer put into production (without telling me first) this fix:
if (nvl(trim(regexp_replace(lower(orderRec.image_size), '[^[:digit:]]', '')), 1 >= vpMin_size or vpMin_size is null or vpMin_size = 0))
Now, I am not a fan of regular expressions at the best of times. They are difficult to read, harder to interpret. They make code dense and obscure -especially when (as in this case) not one word of commentary is made in code comments to explain what it’s trying to do. Moreover, they are a fairly new feature in Oracle, and I don’t like using new features unless they are compelling -providing something you couldn’t do any other way, for example. That’s mainly because Oracle has a habit of not implementing new features very well (think ‘ANSI-compliant join syntax’ in 9i, as a case in point!). I only glanced at it, but typing “REGEXP_REPLACE” into Metalink’s bug database got me 18 matches this morning -at which point, I’d just as sooner not use the things, really!
Anyway, that’s a side issue… because what really got me about this developer’s particular regular expression was that it wouldn’t work! Ostensibly, it seeks to strip out any non-numeric character from the field and turn NULLs into ‘1′ -and in those particular endeavours, it works well. The only slight fly in the ointment is that a full stop (period) counts as a non-numeric character, so that an entry of “4.7mb” gets turned into one of “47″ -inflating the size by a factor of 10 and thus rendering the comparison to ‘vpMin_size’ rather redundant!
When I pointed this out to the developer, I was sent this:
The regular expression is doing exactly what I want it to do… I cant believe we have wasted so much time discussing it.
Well, faced with such supreme confidence in his own coding abilities, what could I do, but repeat my objections to a piece of code which was (a) trying merely to work around a fundamental problem in the data model whereby free-form text was suddenly being used to perform mathematical comparisons and (b) getting it wrong by a factor of 10 in the process.
Happily, persistent paid off in this case:
After reviewing the code again, you are right in that it wasn’t working exactly as I wanted it too
Unhappily, this particular developer is reluctant to change his work habits and instead of implementing a proper numeric field to perform mathematical comparisons with, he came up with this suggestion:
select image_size, (case when instr(lower(image_size),'kb') > 0 then to_number(nvl(trim(regexp_replace(image_size, '[^[:digit:].]', '')), 0))/1024 when regexp_instr(lower(image_size),'mb|megabytes|meg') > 0 then to_number(nvl(trim(regexp_replace(image_size, '[^[:digit:].]', '')), 0)) when isnumber(image_size) = 1 then to_number(nvl(image_size,0)) else 0 end) as image_size2 ...
So, we get two calls to REGEXP_REPLACE to replace the original one, with an extra REGEXP_INSTR to keep things jolly. We still have the ‘strip out all non-numerics’, but by cunningly spotting whether the text being stripped out mentions ‘kays’ or ‘megs’, we can divide appropriately and come out winners. Of course, there is no mention of what to do if the user has entered “0.46Gb” instead of “460Mb”, but why quibble. As the developer puts it:
This will take care of kb (I pulled out TB because no-one would use it.)
Yup, we can just ignore the ‘TB’ (and presumably the ‘GB’ problem) because “no-one would use it”. Never mind that everyone could use it because free-text entry means exactly that: FREE. You can’t know what someone would or wouldn’t use and your code had better be able to deal with all eventualities, or it will break! Let’s just hope some user doesn’t ever a size of “4.7 m. bytes”, shall we?
My developer, however, remains sublimely confident:
This is a temporary but better workaround.
Er, no it isn’t. It’s more complex than ever, and thus no better than before! It seeks to fix the earlier error by ignoring possible conditions, rather than tackling them and thus is, again, no better than before. In fact, it makes assumptions about what things users might type into fields, which makes it much worse than before. It still doesn’t use a number field to do mathematical computations, which is the fundamental problem here.
So, to sum up.
- A developer performs a mathematical computation on a field, not noticing it’s text;
- When the inevitable data type conversion error pops up, his code workaround inflates image sizes by a factor of 10;
- When I point this out, I am told I’m wasting his time and that the code does ‘exactly what I want it to do’;
- When I re-point it out, I finally get acknowledgment that the logic flaw was not, after all, ‘exactly what I wanted’.
- But instead of fixing the data model (”very, very expensive”!) we simply get more complex code than before -which still fails to deal with the logical fundamentals at all.
The DBA’s job can seem ever thus: to annoy and frustrate developers who think that their latest piece of whizz-bangery is ‘exactly what I want’ by pointing out that it probably isn’t (and definitely isn’t what the client who’s paying for it all wants!)
I must say, the old saw about DBAs and Developers being at each others’ throats is just that: a lame old joke with little truth in it. I have always got on fine with all the developers I’ve worked with, though I have seen notes of frustration in their eyes when I check their code for things I might not like instead of just rolling it ‘on trust’! My job is to slow them down, to some extent; or at least to act as a sort of ‘chamber of review’, a House of Lords (or Senate) to the developer’s House of Commons (or Representatives) -and until very recently, I would have said that all the developers it’s ever been my privilege to work with have recognised that, so there have been no major dramas.
But I do now see that there is at least one developer who will (apparently) always see the DBA as a problem, rather than a part of the solution.
Just another little cross for the DBA to bear, therefore…
Update: I’ve reluctantly agreed to another developer-proposed workaround. We’ll change the web page that is used to access this particular field. Users will now select entries from a list of purely numeric values displayed in a combo-box. The letters ‘MB’ will be hard-coded on the web form so users know what their number selection ‘means’. Only numeric data will therefore get sent to the database (though it will still be stored in a VARCHAR2 column). This effectively means a free-form text field is no longer free-form; it also means we’re pretending that a text field is actually a numeric one. But we are also not changing the structure of the table and thus potentially breaking the 1001 reports based on it -at least, not until the business makes an informed decision later on whether or not to go down the route of fixing the fundamentals. I can’t say I’m happy about it, especially because ‘making a decision later’ usually turns out to mean ‘forgetting all about it’… but at least we get rid of some distinctly iffy-looking regular expressions in the meantime.