
Showing posts with label Remove Duplicates from XML Elements. Show all posts

Monday, May 30, 2011

Eliminating Duplicate Element set using XSLT and XPath

In my previous blog post I described how to remove duplicate element sets from an XML collection using XQuery and XPath constructs.

If you need to use XSLT instead, the same result can be achieved much more easily.

Here is the XSLT that removes all duplicate elements from an XML collection for the same example.

<xsl:stylesheet version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="xsl xsd">
  <xsl:template match="/">
    <FaultLogCollection>
      <!-- Keep a FaultLog only if no later faultCode in the document has the same value,
           so the last occurrence of each faultCode survives. -->
      <xsl:for-each select="/FaultLogCollection/FaultLog[not(faultCode=following::faultCode)]">
        <xsl:sort select="./faultCode" order="ascending"/>
        <FaultLog>
          <xsl:copy-of select="./faultCode"/>
          <xsl:copy-of select="./faultText"/>
          <xsl:copy-of select="./faultSeverity"/>
          <xsl:copy-of select="./faultingServiceName"/>
          <xsl:copy-of select="./faultLogId"/>
        </FaultLog>
      </xsl:for-each>
    </FaultLogCollection>
  </xsl:template>
</xsl:stylesheet>
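
To make the predicate not(faultCode=following::faultCode) easier to reason about, here is a minimal Python sketch of the same selection logic. It is not an XSLT engine, just an illustration built on the standard library's xml.etree.ElementTree. The function name dedupe_by_fault_code and the trimmed-down sample data are my own assumptions for illustration, and the sketch assumes faultCode only ever appears as a direct child of FaultLog.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample: only faultCode and faultLogId are kept.
SAMPLE = """
<FaultLogCollection>
  <FaultLog><faultCode>002</faultCode><faultLogId>1</faultLogId></FaultLog>
  <FaultLog><faultCode>001</faultCode><faultLogId>2</faultLogId></FaultLog>
  <FaultLog><faultCode>002</faultCode><faultLogId>3</faultLogId></FaultLog>
</FaultLogCollection>
"""


def dedupe_by_fault_code(xml_text):
    """Keep the last FaultLog for each faultCode, sorted by faultCode."""
    root = ET.fromstring(xml_text)
    logs = root.findall("FaultLog")
    kept = []
    for i, log in enumerate(logs):
        code = log.findtext("faultCode")
        # faultCode values of every FaultLog that comes later in the document
        later_codes = {
            later.findtext("faultCode")
            for later in logs[i + 1:]
            if later.findtext("faultCode") is not None
        }
        # Mirrors not(faultCode = following::faultCode):
        # drop this entry when a later entry carries the same code.
        if code not in later_codes:
            kept.append(log)
    # Mirrors <xsl:sort select="./faultCode" order="ascending"/> (text sort)
    kept.sort(key=lambda log: log.findtext("faultCode") or "")
    result = ET.Element("FaultLogCollection")
    result.extend(kept)
    return result


out = dedupe_by_fault_code(SAMPLE)
print([(log.findtext("faultCode"), log.findtext("faultLogId")) for log in out])
# prints [('001', '2'), ('002', '3')]
```

Note that the entry with faultLogId 1 is dropped in favour of the later one with faultLogId 3: the comparison looks only at faultCode, and the last occurrence wins.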

You can create an XSLT resource in Eclipse and copy and paste the above to see this example running.

Use the same input XML as in the previous example:

<FaultLogCollection>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>002</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>002</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
</FaultLogCollection>

Running the XSL against this input produces a FaultLogCollection with one FaultLog per distinct faultCode (001 and 002).

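Pretty easy. For this input it does the same job as the XQuery in the previous example. Note that the XSLT compares only faultCode, whereas the XQuery compares whole FaultLog elements.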

Tuesday, April 26, 2011

Eliminating Duplicate Element set using XQuery and XPath

I am not too sure how often one would encounter a scenario where we have to select only the distinct elements from an unbounded element that contains duplicate values.

However, if we ever encounter this scenario while using BPEL or OSB, we can use a small XQuery construct to eliminate the duplicate elements.

Consider a collection of faults arriving as:

<FaultLogCollection>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
</FaultLogCollection>

This can easily happen when we are reading from a DB or file that contains duplicate values. Assume that faultCode is supposed to uniquely identify the FaultLog element.
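
Since everything here rests on faultCode uniquely identifying a FaultLog, it can be worth checking that assumption against real data first. The following is a rough Python sketch using only the standard library. The function name conflicting_fault_codes and the "Timeout" value in the sample are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: code 002 appears with two different faultText values.
SAMPLE = """<FaultLogCollection>
<FaultLog><faultCode>001</faultCode><faultText>SystemFault</faultText></FaultLog>
<FaultLog><faultCode>001</faultCode><faultText>SystemFault</faultText></FaultLog>
<FaultLog><faultCode>002</faultCode><faultText>SystemFault</faultText></FaultLog>
<FaultLog><faultCode>002</faultCode><faultText>Timeout</faultText></FaultLog>
</FaultLogCollection>"""


def conflicting_fault_codes(xml_text):
    """Return the faultCodes that occur with more than one distinct FaultLog content."""
    root = ET.fromstring(xml_text)
    variants = defaultdict(set)
    for log in root.findall("FaultLog"):
        code = log.findtext("faultCode")
        if code is None:
            continue  # faultCode is optional in the schema; skip entries without one
        # Fingerprint of the entry: (child tag, trimmed text) pairs in document order
        fingerprint = tuple((child.tag, (child.text or "").strip()) for child in log)
        variants[code].add(fingerprint)
    return sorted(code for code, seen in variants.items() if len(seen) > 1)


print(conflicting_fault_codes(SAMPLE))  # prints ['002']
```

An empty list means every faultCode maps to exactly one FaultLog content, so removing duplicates by faultCode alone or by full element comparison would give the same set of entries.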

I will use an .xq resource in Eclipse to show how this can be achieved.

The schema used to create the .xq resource can be copied from below:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="FaultLogCollection" type="FaultLogCollection"/>
<xs:complexType name="FaultLogCollection">
<xs:sequence>
<xs:element name="FaultLog" type="FaultLog" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FaultLog">
<xs:sequence>
<xs:element name="faultCode" type="xs:string" minOccurs="0"/>
<xs:element name="faultText" type="xs:string" minOccurs="0"/>
<xs:element name="faultSeverity" type="xs:string" minOccurs="0"/>
<xs:element name="faultingServiceName" type="xs:string" minOccurs="0"/>
<xs:element name="faultLogId" type="xs:decimal" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
  • Open Eclipse with OEPE and create an OSB Configuration Project.
  • Also create an OSB project (named XQProject in this example).
  • Right-click the XQProject and create two new folders under it.
  • Name them xsd and xq respectively.
  • Right-click the xsd folder and click New to create a new XML schema.
  • Name it FaultCollection.
  • Copy the inline XSD content above into FaultCollection.
  • Right-click the xq folder and click New to create an XQuery Transformation resource.
  • Click the Source view.
  • Copy and paste the following code snippet into the source editor:
xquery version "1.0" encoding "Cp1252";
(:: pragma bea:global-element-parameter parameter="$faultLogCollection" element="FaultLogCollection" location="../xsd/FaultCollection.xsd" ::)
(:: pragma bea:global-element-return element="FaultLogCollection" location="../xsd/FaultCollection.xsd" ::)
declare namespace xf = "http://tempuri.org/XQProject/xq/DistinctFaultCollection/";
declare namespace custxf = "http://tempuri.org/XQProject/CustomXq/DistinctFaultCollection";

(: Returns $nodes with deep-equal duplicates removed; the first occurrence of each node is kept. :)
declare function custxf:distinct-deep ( $nodes as node()* )  as node()*
{
for $seq in (1 to count($nodes))
return $nodes[$seq][not(custxf:is-node-in-sequence-deep-equal(.,$nodes[position() < $seq]))]
} ;
(: True if any node in $seq is deep-equal to $node. :)
declare function custxf:is-node-in-sequence-deep-equal($node as node()? ,$seq as node()* )
as xs:boolean {
some $nodeInSeq in $seq satisfies deep-equal($nodeInSeq,$node)
} ;

declare function xf:DistinctFaultCollection($faultLogCollection as element(FaultLogCollection))
as element(FaultLogCollection) {
<FaultLogCollection>
{
let $distinctFaultLogCol := custxf:distinct-deep($faultLogCollection/FaultLog)
for $FaultLog in $distinctFaultLogCol
return
<FaultLog>
<faultCode>{data($FaultLog/faultCode)}</faultCode>
<faultText>{data($FaultLog/faultText)}</faultText>
<faultSeverity>{data($FaultLog/faultSeverity)}</faultSeverity>
<faultingServiceName>{data($FaultLog/faultingServiceName)}</faultingServiceName>
<faultLogId>{data($FaultLog/faultLogId)}</faultLogId>
</FaultLog>
}
</FaultLogCollection>
};
declare variable $faultLogCollection as element(FaultLogCollection) external;
xf:DistinctFaultCollection($faultLogCollection)
  • With this done, we are all set to test the XQuery function in Eclipse.
  • Click on the Test tab.
  • Use the sample FaultLogCollection XML posted above to test the function, and confirm that the output collection no longer contains the duplicate FaultLog.
  • Try another request and check the result again.
  • Here is the XML that can be used:
<FaultLogCollection>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>001</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>002</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
<FaultLog>
<faultCode>002</faultCode>
<faultText>SystemFault</faultText>
<faultSeverity>High</faultSeverity>
<faultingServiceName>OrderProcessingPipeline</faultingServiceName>
<faultLogId>23</faultLogId>
</FaultLog>
</FaultLogCollection>
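
For readers who want to follow the logic of custxf:distinct-deep outside OSB, here is a small Python sketch of the same idea: a node is kept only if none of the nodes before it is deep-equal to it. The deep_equal helper is my own rough approximation of XQuery's deep-equal for simple elements like these (it trims text and ignores trailing whitespace between elements), not a faithful implementation. The names and the shortened sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the four-entry request above.
SAMPLE = """<FaultLogCollection>
<FaultLog><faultCode>001</faultCode><faultLogId>23</faultLogId></FaultLog>
<FaultLog><faultCode>001</faultCode><faultLogId>23</faultLogId></FaultLog>
<FaultLog><faultCode>002</faultCode><faultLogId>23</faultLogId></FaultLog>
<FaultLog><faultCode>002</faultCode><faultLogId>23</faultLogId></FaultLog>
</FaultLogCollection>"""


def deep_equal(a, b):
    """Approximate deep equality: same tag, attributes, trimmed text and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(deep_equal(x, y) for x, y in zip(a, b))


def distinct_deep(nodes):
    """Keep nodes[i] only if no earlier node in the sequence is deep-equal to it."""
    return [
        node
        for i, node in enumerate(nodes)
        if not any(deep_equal(node, earlier) for earlier in nodes[:i])
    ]


logs = ET.fromstring(SAMPLE).findall("FaultLog")
distinct = distinct_deep(logs)
print([log.findtext("faultCode") for log in distinct])  # prints ['001', '002']
```

Unlike a comparison on faultCode alone, two entries that share a faultCode but differ in any other child would both be kept here, and the first occurrence of each distinct entry is the one that survives.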

The OSB Eclipse XQ project used in the example can be downloaded from here.

Unzip to get the OSB jar.