Interdum stultus opportuna loquitur...

Friday, December 28, 2007

AdminRant: USRant Note...

Note - from June 24th 2009, this blog has migrated from Blogger to a self-hosted version. Click here to go straight there.

The more astute amongst the rant family will have noticed that for some time I have been rather remiss as regards the U.S. Post-Market Report - a.k.a. USRant. This is because I have developed an almost pathological aversion to all things Microsoft, and as such have removed Excel from my hard drive. As a result, all of the wizardry that creates the 'generated' bits of USRant - which I had written as Excel macros - was rendered defunct. (We faced - and overcame - the same problem with the OzRant.)

I have promised myself that I would do the 'port' of the USRant Excel logic across to PHP, the better to automate (via 'cron') the process by which the automatic stuff is generated. In other words, I want to do the same migration as I have already done with OzRant, which now works perfectly.

That said, I also want to add some 'fuzziness' to the sentence structures so that the generated bits of both Rants don't always read exactly the same. It's two days' work (assuming that my fat-fingered typing style generates large numbers of syntax errors which defy debugging... if I get it right first time it will take me three hours). Anyway, it will hold up the porting across of USRant for a little while yet.

You might wonder why this matters, so I will explain in a little more detail (because I just know you're interested...)...

Consider the following bit of the OzRant (I've coloured in 'calculated' variables so that you can see how it works currently):

At the other end of the market-cap spectrum lie the denizens of the ASX Small Ordinaries (XSO) - the place where non-mania excess returns lie. The small end of the market had a significantly worse day than its large-cap counterpart. The Small Ords slid modestly, falling 15.9 points (0.41%), closing out the session at 3840.6 points.

The green bits of text are just numbers - no fuzziness there. They are just data or manipulations of data.

The other coloured bits rely on levels or ratios of outcomes. They vary depending on the magnitude of the non-fuzzy stuff.

So if the Small Ords performance for the session only just exceeds, or just falls short of, the ASX20, then the bit of blue text will say something like 'performed pretty much in line with'. If the Small Ords performs significantly worse (better) than the ASX20, the language becomes what you see in the blockquote. And if the out (under) performance is striking, there's different language again.

Likewise the second coloured bit: a fall of under 0.2% will make the purple text say 'retreated mildly', 0.2%-0.5% is 'slid modestly', 0.5%-1.5% is 'fell rather sharply', and anything above 1.5% is 'was hammered to the tune of'... or something like that. 'Falling' can become 'dipping', 'sliding', 'losing', 'plummeting' and so forth.
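For the curious, here is a minimal sketch of that band lookup. It is written in Python purely for illustration (the real thing lives in the OzRant PHP code, which is not shown here), and the names `FALL_BOUNDS`, `FALL_PHRASES` and `fall_phrase` are made up for the example. Which band a fall of exactly 0.2%, 0.5% or 1.5% lands in is also an assumption: here a boundary value goes into the upper band.

```python
from bisect import bisect_right

# Upper edges (in percent) of the first three bands described above.
# Assumption: a fall exactly on an edge is placed in the higher band.
FALL_BOUNDS = [0.2, 0.5, 1.5]

# One phrase per band, in the same order as the bands.
FALL_PHRASES = [
    "retreated mildly",             # under 0.2%
    "slid modestly",                # 0.2% - 0.5%
    "fell rather sharply",          # 0.5% - 1.5%
    "was hammered to the tune of",  # above 1.5%
]


def fall_phrase(pct_fall: float) -> str:
    """Return the phrase for a fall of pct_fall percent (given as a positive number)."""
    band = bisect_right(FALL_BOUNDS, pct_fall)
    return FALL_PHRASES[band]


print(fall_phrase(0.41))
```

With the 0.41% fall from the example paragraph, the lookup lands in the second band, which is where 'slid modestly' comes from.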

You get the picture.

But here's what I would prefer: before any of the underlying calculations are performed, I want the actual sentence structure to be determined by the selection of several random numbers: these would determine the order of sentence phrases. 

The other thing I would change is this: at present there is also only one outcome for each 'band' - a single adjective for a given percentage range, for example (e.g., the orange text choices). I would prefer to have a set of, say, three for each 'band' of outcomes... one of which would be chosen at random once you know which band you're in.
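A rough sketch of that idea, again in Python for illustration only. The first phrase in each band is taken from the existing Rant wording; the other two in each list are invented placeholders, and `BOUNDS`, `ALTERNATIVES` and `pick_phrase` are hypothetical names.

```python
import random
from bisect import bisect_right

# Band edges in percent, as in the existing single-phrase scheme.
BOUNDS = [0.2, 0.5, 1.5]

# Three candidate phrases per band. Only the first entry in each list is
# existing Rant wording; the rest are placeholders made up for this sketch.
ALTERNATIVES = [
    ["retreated mildly", "eased a touch", "drifted lower"],
    ["slid modestly", "slipped a little", "gave up some ground"],
    ["fell rather sharply", "dropped hard", "took a solid hit"],
    ["was hammered", "was pummelled", "got clobbered"],
]


def pick_phrase(pct_fall, rng=random):
    """Work out the band first, then choose one of its phrases at random."""
    band = bisect_right(BOUNDS, pct_fall)
    return rng.choice(ALTERNATIVES[band])


# Same data, several button pushes: the band is fixed, the wording is not.
for _ in range(3):
    print(pick_phrase(0.41))
```

The band is still decided entirely by the data; only the choice within the band is left to chance, so the report can never say something the numbers don't support.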

The reason for this is that I would like the RantScreed to change every time the auto-generated bits were generated. At present, if I push the button marked 'Create RantScreed', it will create an IDENTICAL RantScreed for a given set of data. That is, if the data is for December 28th 2007, every piece of the generated stuff will be the same no matter how often you push the button.

But imagine if it was set up so that every time you hit the button, there were subtle changes in phrasing, word order, adjective selection and so on. 

That would, in my view, be neat.

So if I can get it to work properly, you could wind up with this -

A much harder time of it was had by the little stocks that make up the ASX Small Ordinaries (XSO). Although historically this has been a good place to find underpriced value, today saw significant underperformance compared to the big end of the market-cap spectrum. The drop was hardly earth-shattering - under half a percent - with Small Ords registering a closing print of 3840.6 points, representing a decline of 15.9 points (0.41%) for the day.

or this -

The ASX Small Ordinaries (XSO) index slid 15.9 points (0.41%) to 3840.6 points. That's not 'off a cliff' bad, but it is still significantly worse than both the ASX20 and ASX100: today the big end of town got the nod from the voters.

I've highlighted in red the pieces of text that would vary (selected at random from the list) once the structure had been determined (by selecting at random from some small number of alternatives). Note that in the two examples, the sentence structure itself has changed relative to the first (blue-highlighted) original. I think I will have four alternatives for each identifiable paragraph in the Rants.

Overall, the exact same data would provide an almost infinite number of reports with the same broad structure, but which would read very differently - the difference would be caused by a stochastic vector variable, with elements which would determine overall structure, as well as selecting from a range of alternatives for each key adjective or phrase. And as I thought up new ways to express the same sort of thing, the range of alternatives would increase.
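To make the 'stochastic vector' idea concrete, here is one way it might be sketched (Python again, purely as an illustration rather than the eventual PHP). Everything named here is hypothetical: the two templates are loosely modelled on the examples above, the phrase lists are placeholders, and `draw_vector` and `render` are invented for the example. One element of the vector picks the sentence structure; the others pick a phrase for each slot.

```python
import random

# Hypothetical sentence structures; the real Rants would have more per paragraph.
TEMPLATES = [
    "The Small Ords {verb}, {gerund} {points} points ({pct}%), "
    "closing out the session at {close} points.",
    "Closing at {close} points, the Small Ords {verb}, "
    "{gerund} {points} points ({pct}%) on the day.",
]

# Placeholder alternatives for a single band (a fall of 0.2% - 0.5%).
VERBS = ["slid modestly", "slipped a little", "gave up some ground"]
GERUNDS = ["falling", "dipping", "losing"]


def draw_vector(rng):
    """Draw the random vector: one element for structure, one per phrase slot."""
    return {
        "template": rng.randrange(len(TEMPLATES)),
        "verb": rng.randrange(len(VERBS)),
        "gerund": rng.randrange(len(GERUNDS)),
    }


def render(data, vec):
    """Fill the chosen template with the chosen phrases and the (fixed) data."""
    return TEMPLATES[vec["template"]].format(
        verb=VERBS[vec["verb"]],
        gerund=GERUNDS[vec["gerund"]],
        **data,
    )


data = {"points": "15.9", "pct": "0.41", "close": "3840.6"}
print(render(data, draw_vector(random.Random())))
```

Even this toy version gives 2 x 3 x 3 = 18 different readings of one sentence from the same data, and the count multiplies across every sentence in a report. Passing a seeded generator (say `random.Random(7)`) would reproduce a particular version if one ever needed regenerating.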

OK, so maybe you're not interested. But I think it's going to be pretty neat.