Grouping according to structure
The first step of the grouping means that only one representative from a set of items with equal manifestations with regard to the grouping attribte is taken over into the result. Similar to the notation in SQL where it is absolutely necessary that all attributes used for the grouping must stand in the SELECT clause, in XQuery the variables used for the grouping are initialised in a for/let clause. In case the grouping direction is given by the hierarchy, only one aggregation function is to be used via the "child elements" of the target node of the grouping – in contrast to the duplicate elimination for a grouping via equal values. As a consequence, the representative of a group is represented by the above node.
As an example serves the following XML fragment in which the medical personnel is devided into doctors and nursing staff:
<MedicalPersonnel>
<Doctors>
<Doctor> <Name>Naumann</Name> <Age>32</Age> </Doctor>
<Doctor> <Name>Shore</Name> <Age>27</Age> </Doctor>
<Doctor> <Name>Meier</Name> <Age>25</Age> </Doctor>
</Doctors>
<NursingStaff>
<Nurse> <Name>Guldenstern</Name> <Age>41</Age> </Nurse>
<Nurse> <Name>Murawitz</Name> <Age>65</Age> </Nurse>
</NursingStaff>
</MedicalPersonnel>
On this XML fragment, the average age per occupational group shall be determined as the target of the grouping and the aggregation. As group members the individual Doctor or Nurse elements apply; as representatives the top elements already existing in the structure – Doctors or NursingStafff – are used, so that the following output document shall result from this:
<MedicalPersonnel>
<Doctors> <Age>28</Age> </Doctors>
<NursingStaff> <Age>53</Age> </NursingStaff>
</MedicalPersonnel>
In this example, the query formulation is orientated towards the OQL syntax by passing on the set of elements to be aggregated as parameters of the fn:avg() function.
<MedicalPersonnel>
{
for $p in fn:doc("...")//MedicalPersonnel/*
let $x := $p/*/Age
return
element { fn:node-name($p) }
{ <Age>{ fn:avg($x) }</Age> }
}
</MedicalPersonnel>
The aggregation following the group formation is done via functions. In general, each aggregation function (fn:sum(),fn:avg(), fn:count(), fn:min() and fn:max()) expects a sequence of items with the same data type. Therefore, with regard to the ongoing example of the calculation of the average age per occupational group, for each occupational group the sequence with the different indications of age must be passed on to the fn:avg() function.
For clarification, an auxiliary variable ($x) is used containing all items within a group, i.e. for each individual occupational group the age indication in the form of a sequence. In this case of the grouping in terms of an ancestor node, the variable $x includes all grandchild elements of the current occupational group. In the first iteration the $x variable has the value (32,27,25), resulting from the average age of 28 years for the occupational group of the doctors. In the second iteration $x has the value (41,65), wherewith the average value of 53 years for the nursing staff is determined.
Source: "XQuery – Grundlagen und fortgeschrittene Methoden", dpunkt-Verlag, Heidelberg (2004)
<< back | next >> |