
Tuesday, January 27, 2009

Optimizing XQueries

Can we imagine programming in the SOA world without knowing XML technologies? If we are working with ALSB and ALDSP, knowledge of XPath and XQuery is of prime importance. Here I discuss good practices for writing XQueries.

Explanations of each of the following tips can be found at the end of the article.

DON'TS
Here are a few things that we must seek to avoid:
  1. Don't use eval()
  2. Don't evaluate expressions several times over, and avoid redundant expressions.
  3. Don't use //
  4. Don't query constructed document fragments

DOS
Here are some recommendations for optimization:
  1. Minimize the execution of queries based on a given search expression. Try instead to use navigation paths based on the parent, children and siblings of a node which has already been retrieved.
  2. Make appropriate use of indexes adapted to your search criteria.
  3. Code Quality (TODO)
    1. Put $Id$ inside a comment at the top
    2. Internal documentation of HTTP parameters
    3. Document in XQuery the argument types and the return type
    4. Use meaningful names for variables and functions, without abbreviations, and avoid ambiguous terms
    5. Use Javadoc-style tags as in XQDOC ( http://www.xqdoc.org/ ): @param, @return
    6. Keep data retrieval separate from result construction
_______________________________________________________________________________________________

EXPLANATIONS

Don't use eval()

The snag is that the arguments to the eval() function can't be cached. Beyond that, using eval() leads to a style of programming that is hard to read and to debug. And eval() can always be replaced by a standard expression.
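
As a rough sketch of that last point: one common reason to reach for eval() is that part of a path, such as an element name, is only known at run time. Assuming that is the situation, a name test against a variable does the same job with a standard expression. The helper name local:children-named and the sample data are made up for illustration, and the eval-style call in the comment assumes an engine extension along the lines of eXist-db's util:eval.

```xquery
xquery version "1.0";

(: Hypothetical helper: select the child elements whose name is only known at run time.
   With an eval-style extension the same thing might be built as a string, something like
   util:eval(concat("$node/", $name)), shown here for contrast only. :)
declare function local:children-named($node as element(), $name as xs:string) as element()* {
  $node/*[local-name() = $name]
};

(: Inline sample data, used only to keep the sketch self-contained;
   in practice the node would come from doc() or collection() :)
let $project := <project><title>SOA</title><owner>me</owner></project>
return local:children-named($project, "title")
```

The standard expression stays visible to the engine as an ordinary path, and it is also easier to read than a query assembled from strings.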

Don't evaluate expressions several times over and avoid redundant expressions

Don't count on the XQuery engine to analyze or optimize queries the way a Java compiler does: it may not factor out repeatedly evaluated expressions, eliminate code that will never be executed, and so on. Pay particular attention to repeatedly evaluated expressions: they should be evaluated only once and the result placed into a variable, which also makes for more readable code.
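
The advice above can be sketched as follows. The element names (orders, order, price, qty) are invented for the example; the point is only that the path is evaluated once, bound with let, and then reused.

```xquery
xquery version "1.0";

(: Inline sample data, used only to keep the sketch self-contained;
   in practice the nodes would come from doc() or collection() :)
let $orders :=
  <orders>
    <order><price>10</price><qty>2</qty></order>
    <order><price>5</price><qty>4</qty></order>
  </orders>

(: Evaluate the path expression once and bind the result :)
let $items := $orders/order
let $total := sum(for $item in $items return xs:decimal($item/price) * xs:decimal($item/qty))

(: Reuse the bound variable instead of repeating $orders/order :)
return <summary count="{count($items)}" total="{$total}"/>
```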

Don't use //

$a//b causes a complete traversal of all nodes below $a in search of an element b. In most cases the location of b is known fairly precisely, so it is better to specify the path explicitly.
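
A minimal sketch of the difference, assuming a made-up document in which b is known to sit under a section child of $a. Both expressions select the same element here; the explicit path simply tells the engine where to look.

```xquery
xquery version "1.0";

(: Inline sample data, used only to keep the sketch self-contained :)
let $a :=
  <a>
    <section><b>wanted</b></section>
    <appendix><c><d>many other nodes</d></c></appendix>
  </a>

(: Explicit path: only the section children of $a need to be inspected :)
let $explicit := $a/section/b

(: Descendant search: every node below $a may have to be visited :)
let $broad := $a//b

(: Compare the two results to show that they select the same content :)
return deep-equal($explicit, $broad)
```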

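Don't query constructed document fragments

A typical example (to avoid):

    let $e := <a><b>content</b></a> (: $e is a constructed document fragment :)
    let $result := $e/b/text()
    return $result

Constructed fragments exist only in memory, so an engine such as eXist-db typically cannot use its indexes on them.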

Minimize the execution of queries based on a given search expression

A query like

    let $res := collection("/db/projects")/a/b[id = $val]

causes a complete scan of an entire collection. Admittedly, queries like this are at the heart of an XQuery (and account for most of its execution time). But once the result $res has been retrieved, it can be used efficiently as a starting point for navigation to its parent, siblings and children:

    let $a := $res/parent::a
    let $next-sibling := $a/following-sibling::a[1]

Make appropriate use of indexes adapted to your search criteria

XQuery itself does not define indexes; they are a feature of the engine. The indexes described here are those of eXist-db, which currently offers three user-configurable types. All require prior indexing, either of the base collection or of specified node-sets in sub-collections.
  1. The fulltext index, which indexes lexical tokens ("words" in Western scripts). Indexing can be configured to include or exclude nodes specified using a limited subset of XPath.
  2. Typed indexes over nodes specified by a limited subset of XPath (called "range indexes" because they permit queries referring to a range of numerical values).
  3. Indexes by tag name ("QName index"): http://wiki.exist-db.org/space/jmvanel/New+index+by+QName

Index 2 is slower than index 3, but has two advantages: the query code doesn't have to be changed in order to use the index, and there is no danger of getting wrong results if the indexing hasn't been done.

Index 3 lacks these advantages, but is almost as fast as a relational database. Such an index cannot be constrained by an XPath, but only by a tag name. Both index 2 and index 3 are typed (integers or strings), and allow matching by criteria of equality or inequality (comparison).

Document in XQuery the argument and return types

Don't write:

    declare function local:add($n, $m) {
      <result>{$n + $m}</result>
    };

The following version is more explicit and self-documenting, and for the same price you get run-time argument checking. If you know for sure the types you manipulate, declare them!

    declare function local:add($n as xs:integer, $m as xs:integer)
    as element(result) {
      <result>{$n + $m}</result>
    };
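
The code quality items on typed signatures, meaningful names and xqDoc tags can be combined in one declaration. This is only a sketch: the parameter names and the wording of the comment are invented, and the comment uses the (:~ ... :) form with @param and @return that xqDoc reads.

```xquery
xquery version "1.0";

(:~
 : Adds two integers and wraps the sum in a result element.
 :
 : @param $first-operand the left-hand integer
 : @param $second-operand the right-hand integer
 : @return a result element containing the sum of the two operands
 :)
declare function local:add($first-operand as xs:integer, $second-operand as xs:integer) as element(result) {
  <result>{$first-operand + $second-operand}</result>
};

local:add(2, 3)
```

With the types declared, a call such as local:add("2", 3) is rejected with a type error instead of silently producing something unexpected.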

Keep data retrieval separate from result construction

It is good practice to bind the child element nodes used in result construction to variables, rather than retrieving them from the root every time they are needed.
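
One way to lay this out, as a sketch with invented element names (project, name, members, member): all navigation happens in the let clauses, and the return clause only assembles the result from the bound variables.

```xquery
xquery version "1.0";

(: Hypothetical input; in practice this would come from doc() or collection() :)
let $project :=
  <project>
    <name>Billing</name>
    <members><member>Ann</member><member>Raj</member></members>
  </project>

(: Retrieval: navigate from the root once and bind the child nodes :)
let $name := $project/name
let $members := $project/members/member

(: Construction: build the result from the bound variables only :)
return
  <report title="{string($name)}" size="{count($members)}">
  {
    for $member in $members
    return <person>{string($member)}</person>
  }
  </report>
```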

Monday, January 12, 2009

Open Source SOA Implementation

There is hardly anybody who hasn't heard of, or been part of, the SOA hype over the last couple of years. It seems as if an entire industry is converging on this concept of sharing, interoperability and pluggability. For a long time, software architects in medium and large businesses have had to go through a nightmare when designing heterogeneous systems. This post, however, is not meant to present the pros and cons of SOA or the ideas revolving around its concepts.

Nevertheless, a host of vendors have leaped to embrace SOA and have created entire product suites for implementing it. For instance, there is BEA with its WebLogic and AquaLogic suite, IBM with its WebSphere Development Studio, Tibco, WebMethods and a host of other proprietary vendors.

I have, however, put together a presentation on how to implement SOA using open source tools and technologies. Here is a slide that evaluates the various options available for an open source implementation of SOA.

Find it here.