This piece is aimed at researchers, primarily in high energy physics, who use the ROOT analysis software and find it lacking in oh, so many ways. I'm going to whinge about the things that annoy me in ROOT and then suggest a few of the things that I do to minimise the pain. Suggestions and comments are very welcome. Apologies in advance for the fact that this article may be implicitly HEP-political, but I genuinely believe that ROOT's poor design is a very dangerous thing for particle physics.
Before beginning, I should point out that these are simply my own views and that I hold no animosity against the developers; their design simply doesn't work for me. Presumably there are many people "out there" who think ROOT an excellent piece of software. In complete honesty, though, I have yet to meet any of them. In fact, I've never had any complaints that this article misrepresents ROOT, and I've had a fair bit of "fan mail", not to mention discussions with well-respected developers and physicists who hold precisely the same views :-)
You may also be interested in my articles on dealing with some of ROOT's flaws and a wishlist for things to be fixed.
Update (03/08/2006): I thought I should update this page after some substantial discussions on the ROOT mailing list about this page and, more particularly, the ROOT page on Wikipedia, to which I added a criticism section. Here are some Web links to the mailing list archive:
The discussion isn't yet over, since Rene has yet to respond to the technical points as promised. I'm sending another mail today to "prod" the ROOT team into responding :-)
The ROOT Wikipedia talk page also contains some very informed discussion (and also some not so informed!). But maybe I just like it because it includes phrases like this: "In my experience, many people who use ROOT at least have vague feelings that it is making their life more difficult than it rightly should. Nearly everyone I know that writes code that other people use feels even more strongly that ROOT's poor design leads to productivity losses. I grant that it is less frequent that someone levels the criticism as succinctly and accurately as Andy." :-)
Update (14/02/2006): I'm at the CHEP06 conference in Mumbai, India and between getting annoyed about the level of ignorance with respect to ROOT's flaws, I've decided to split and re-name these articles. Hence, this one is now just a guide to ROOT's problems. Suggestions of personal solutions to selected problems (as opposed to system solutions of the deeper issues) can now be found in my "basic ROOT" article. There is also now an article comprising a wishlist of things to be fixed in ROOT, for it to become an analysis system worthy of the field in which it is used.
Update (20/01/2006): I've just re-written and expanded a lot on this article, spurred on by a contact from Philippe Canal from the ROOT development team. Personally, I've moved to a new HEP job where I don't have to use ROOT and I have to say it's been very refreshing! But ROOT is still an important phenomenon in HEP, with the LHC fast approaching, and I think the points I've already made (and the new ones I've just added) are still entirely valid. As ever, comments on this article and my views are more than welcome.
Update (28/04/2004): I've added a couple more hideousnesses that sprang to mind, namely global objects, global state and those disgusting string arguments that get passed all the time. I should also comment that I've heard reports that the ROOT developers are now suggesting that compiled ROOT scripts are a better idea than CINT and that maybe this newfangled STL thing is worth supporting.
ROOT can be an awkward piece of software, but unless you want to use the defunct PAW program for your data analysis, there's not really anything else around that handles histograms and ntuples in the way that particle physicists have come to expect.
Some of the ideas in ROOT are good: a set of robust (in principle at least) libraries which provide common HEP objects like histograms, data trees ("ntuples") and statistical/discrimination/fitting algorithms is an excellent idea. However, ROOT has failed to meet its promise for several reasons:
std::string and containers, which are not
properly supported in ROOT (see the next point), data formats and interfaces like
AIDA, FITS and HDF5, and code documentation with Doxygen (ROOT's own C++
documentation class is a travesty by comparison with Doxygen's syntax and flexibility).
int foo(3); rather than
int foo = 3;). As CINT is the work of Masaharu Goto, perhaps we should
refer to its shortcomings as "Goto considered harmful" in best comp-sci in-joke
tradition :-).
I will now consider several of these points in more detail:
std::string function arguments can transparently
handle old-style C strings (const char*) and are much safer
and more powerful.
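To make that point concrete, here is a minimal sketch in plain C++ (no ROOT involved). The function name makeLabel is hypothetical and chosen only for illustration; the point is simply that a const std::string& parameter accepts string literals, char arrays and std::string objects alike.

```cpp
#include <string>

// Hypothetical helper, not part of ROOT: it takes its argument as a
// const std::string&, so callers may pass a C string or a std::string.
std::string makeLabel(const std::string& name) {
    // Concatenation is bounds-safe; no manual buffer management is needed.
    return "hist_" + name;
}
```

Calling makeLabel("pt") with a literal, makeLabel(buffer) with a char array, or makeLabel(someStdString) all compile unchanged, because a C string converts implicitly to a temporary std::string.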
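#include <cstdlib>    // rand, RAND_MAX, EXIT_SUCCESS
#include "TH1.h"
#include "TH1F.h"
#include "THStack.h"

// Stack one of the two histograms, chosen at random, then discard the stack.
void test(TH1* histo1, TH1* histo2) {
  THStack* hs = new THStack();
  // rand() returns an int in [0, RAND_MAX], so compare against the midpoint
  if (rand() < RAND_MAX / 2) {
    hs->Add(histo1);
  } else {
    hs->Add(histo2);
  }
  delete hs;  // destroying the stack is where the trouble starts
}

int main() {
  // The caller creates the histograms and expects to delete them itself
  TH1* histo1 = new TH1F(/* ... */);
  TH1* histo2 = new TH1F(/* ... */);
  test(histo1, histo2);
  delete histo1;
  delete histo2;
  return EXIT_SUCCESS;
}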
The code will core dump on either "delete histo1" or "delete histo2",
because the THStack destructor deletes the contained elements, even though it doesn't own them.
To use code like this, the test function has to copy the passed histos, a needless
waste of processor power. Gah.
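For contrast, here is a rough sketch of the ownership convention I would expect instead, written in plain C++ rather than against ROOT. Every name in it (Histo, HistoStack, liveAfterStackDestroyed) is hypothetical: the container only observes the histograms, and whoever created them remains responsible for destroying them.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a histogram; it counts live instances so that
// the effect of destroying the container can be checked.
struct Histo {
    static int live;
    Histo() { ++live; }
    ~Histo() { --live; }
    Histo(const Histo&) = delete;
    Histo& operator=(const Histo&) = delete;
};
int Histo::live = 0;

// Hypothetical non-owning container: it stores observer pointers and its
// (implicit) destructor never deletes what they point to.
class HistoStack {
public:
    void add(const Histo* h) { histos_.push_back(h); }
    std::size_t size() const { return histos_.size(); }
private:
    std::vector<const Histo*> histos_;
};

// Build and destroy a stack, then report how many histograms are still alive.
int liveAfterStackDestroyed() {
    std::unique_ptr<Histo> h1(new Histo());  // the caller owns the histograms
    std::unique_ptr<Histo> h2(new Histo());
    {
        HistoStack stack;
        stack.add(h1.get());
        stack.add(h2.get());
    }  // the stack is destroyed here; the histograms are untouched
    return Histo::live;  // both owners are still in scope at this point
}
```

With this convention nothing needs to be copied: the stack can come and go, and the histograms are released exactly once, by their owners.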
g++ from within CINT and compile your ROOT macros. You'd think that
that might involve taking your single file with a bunch of user macros and building
a binary library file from them, i.e. adding the standard C++ and ROOT header #includes
and so-on behind the scenes so that any macro that will run in CINT can be compiled
in ACLiC. But that isn't the case: ACLiC needs the full set of header
declarations that a full C++ program needs to already be in the file to be compiled.
And it can't handle the splitting of user classes into header and implementation files,
which seems to be necessary. In addition, if ACLiC fails to compile your macros file
(probably for one of the above reasons, e.g. missing #includes), then debugging the
failure point in ACLiC is very hard, specifically because it uses lots of temporary
files but doesn't map the C++ compiler errors back to the CINT macro file, so the
reported error won't be easily reconcilable with any of your input files. Aaargh.
In short, ACLiC requires you to have written your macros as if they're C++ programs
to be compiled (with full C++ syntax strictness: none of the sloppiness encouraged by
CINT will work), but actually makes things harder for you than if you ran the C++ compiler
explicitly because it obfuscates the compiler output. Nice one, ACLiC.
_hs->Draw("HIST,E,9,NOSTACK");
What the hell sort of argument is that? For starters I don't get to
specify which TCanvas to draw it on to; instead I have to do some
sort of hideous gROOT->cd("mydirectoryname"); crap first. And second,
that string is playing the role that a set of class enums (although variadic methods
are fairly minging in their own right) or, better, a config object should be playing.
Add to this that the string parsing is apparently quite forgiving and you're
in for a nightmare experience. Why would you do something as horrible as this? Step
up CINT and interactive use. Lovely.
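As a rough illustration of the "config object" alternative, here is a sketch in plain C++. The names Style, DrawOptions and toOptionString are all hypothetical, and the mapping to an option string is only there to show that a typed interface could still feed a legacy string-based call underneath.

```cpp
#include <string>

// Hypothetical typed options; a misspelt field or enum value fails at
// compile time instead of being silently ignored by a forgiving parser.
enum class Style { Histogram, Points };

struct DrawOptions {
    Style style = Style::Histogram;
    bool errorBars = false;
    bool stacked = true;
};

// Translate the typed options into a comma-separated, legacy-style string.
// The particular tokens used here are an assumption made for illustration.
std::string toOptionString(const DrawOptions& opts) {
    std::string s = (opts.style == Style::Histogram) ? "HIST" : "P";
    if (opts.errorBars) s += ",E";
    if (!opts.stacked) s += ",NOSTACK";
    return s;
}
```

A caller would set opts.errorBars = true and opts.stacked = false rather than remembering the spelling and ordering of string tokens, and the compiler would catch any typo.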
void*, and that's simply unacceptable in a C++ system. Surely
there are other C++ persistence interfaces that don't have this problem (using
RTTI or similar)?
In short, ROOT sucks more than a warehouse full of hoovers, and isn't likely to change its ways any time soon, to the detriment of the entire field.
The best thing to do, in my opinion, would be to take what there is of ROOT and to split it into a kernel and a set of modules and for the whole thing to take the form of a C++ library rather than an executable. The executable is really secondary to the class structure. In addition, the class structure needs overhauled, STL compliance needs to be introduced, standard I/O formats and interfaces need to be developed, external solutions need to be dropped into place in many cases, and so-on. It's a big job and I can't see it happening :-(
Next-best, or possibly best given the infeasibility of the above and the existence
of better systems anyway, is to move your analysis to a multi-stage one which ignores
ROOT as much as possible [see footnote], uses Hippodraw, JAS or the BaBar
StatPatternRecognition code to do the statistical analysis, and uses something like
PyX or jFig to produce the publication-quality plots, again using a standard data
file format (or even just columned ASCII files) for communication in the final step.
Although these programs don't (currently) support 3D plots, I don't believe that these
often give information that can't be expressed more clearly in several 2D plots. The
exception is rendering of actual 3D systems like detector structure, which admittedly
can be useful in event reconstruction analyses.
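To show how little is needed for the "columned ASCII" hand-off between stages, here is a minimal sketch in plain C++. The Bin struct, the function names and the three-column layout (low edge, high edge, content) are assumptions made for illustration, not an established format.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal histogram bin: its edges plus its content.
struct Bin { double low; double high; double content; };

// Write one bin per line as whitespace-separated columns: low high content.
// This uses the default stream precision (6 significant digits); raise it
// for real data.
std::string toColumnedAscii(const std::vector<Bin>& bins) {
    std::ostringstream out;
    out << "# low high content\n";
    for (const Bin& b : bins) {
        out << b.low << ' ' << b.high << ' ' << b.content << '\n';
    }
    return out.str();
}

// Read the same layout back, skipping blank lines and '#' comment lines.
std::vector<Bin> fromColumnedAscii(const std::string& text) {
    std::vector<Bin> bins;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        Bin b;
        if (fields >> b.low >> b.high >> b.content) bins.push_back(b);
    }
    return bins;
}
```

Any plotting tool that can read whitespace-separated columns can consume the result, which is the whole attraction of the multi-stage approach.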
Actually, I like this "modular" statistical analysis and presentation idea most
of all: I've only put "rehacking ROOT" as the most desirable solution due to its
large, established user base, since personally I'm more than happy to leave ROOT alone
entirely. You might find my list of HEP software to be useful if you are similarly
minded. I see definite parallels here with the Unix "small tools, each of which does
its job well" philosophy: it's peculiar that high-energy physics has set its heart so
firmly on monolithic systems given a) its traditional centring around Unix computing
and b) the obvious success of the Unix philosophy. But maybe not that surprising, given
that many physicists treat computing methods with contempt, as something that gets in
the way of producing good work. Hmph (rant over!).
As a next-to-next-best approach, if you really aren't allowed to use anything other than ROOT (maybe you depend on a bunch of ROOT analysis macros written by someone else), we can try to use the good bits of ROOT and minimise the interaction with the lame bits. For me, this is luckily no longer the case. Anyway, this primarily involves ignoring CINT entirely and using ROOT as a library set. Note that you will still have to deal with the world's worst class structure! Hence, in addition I try to write STL wrapper classes of my own when possible. This tends to occur on an ad-hoc basis. Note that if ROOT had been done right in the first place, no-one would ever have to do any of these things. You can find some workarounds described in my article on basic ROOT usage, which in fact consists entirely of workarounds since any attempt to do robust statistical analysis in ROOT is made hideously complicated by its flaws! If I haven't convinced you of that by now, I never will :).
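To give a flavour of the kind of ad-hoc STL wrapper meant here, the following is a sketch in plain C++. LegacyArray is a hypothetical stand-in written for this example, not a ROOT class: it merely mimics an untyped, index-based collection. The toVector helper is likewise hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a legacy, untyped collection with index-based
// access. It is NOT a ROOT class; it only mimics the void*-returning style.
class LegacyArray {
public:
    void Add(void* item) { items_.push_back(item); }
    int GetEntries() const { return static_cast<int>(items_.size()); }
    void* At(int i) const { return items_[static_cast<std::size_t>(i)]; }
private:
    std::vector<void*> items_;
};

// Ad-hoc wrapper: copy the pointers once into a typed std::vector so that
// range-for loops and STL algorithms work. The caller must know the real
// element type T; the cast is unchecked, exactly like the legacy interface.
template <typename T>
std::vector<T*> toVector(const LegacyArray& legacy) {
    std::vector<T*> typed;
    typed.reserve(static_cast<std::size_t>(legacy.GetEntries()));
    for (int i = 0; i < legacy.GetEntries(); ++i) {
        typed.push_back(static_cast<T*>(legacy.At(i)));
    }
    return typed;
}
```

The cast is confined to one place, so the rest of the analysis code can be written against std::vector and never see a void*.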
Thanks for reading and please feed back your thoughts to me. Hopefully someone will listen and ROOT can be made into a well-designed, robust data analysis system for the LHC.