Archetype relational mapping - a practical openEHR persistence solution

Bert Verhees bert.verhees at
Sat Feb 13 12:51:22 EST 2016

On 13-02-16 18:13, Birger Haarbrandt wrote:
> Even as a research associate or independent developer, its sometimes 
> good to take a break from work :) I hope I didn't miss a point in your 
> e-mail and hope that I understood corretly: when you store 
> openEHR-data using XML technologies, you will have to use XML 
> technologies to query that data. To my understanding, also Oracle will 
> need xpath/xquery when you store XML data using the XML datatyped 
> column. If you want to combine this data with other data that isn't 
> based on XML, you will have to convert the external data to XML or mix 
> query languages in the mentioned data bases.

I don't understand the point you make. The leafnodes in an archetype are 
always constraints on primitives, and XML can represent all types of 

Everything that is not a leafnode is a structure, the structure is 
object oriented and easy to convert to XML, where the tag-names are the 
attribute-names, and the type-names are the typenames from the RM-XSD. 
The structure which is represented in XML is also easy to bring back to 
an object oriented structure.

It is just a notation, nothing else.

There are even standard java-classes (I thought in  JAXB)  which support 
this kind of things.

AQL is easy to convert to XQuery, but maybe I would prefer XQuery, and 
it is, for a non technical person possible to use a drag and drop 
interface which represents the archetypepaths.

I am missing your point, or better sad, we are not on the same track of 

Best regards

> In short: It makes sense to not use the openEHR CDR for such queries 
> but export it to something more suited for such purpose. This does not 
> need to be a classical Data Warehouse (it has only limited use for the 
> questions clinicians have) but certainly XML would not be my first 
> choice for the analytics layer.
> Birger
> Von meinem iPad gesendet
> Am 13.02.2016 um 17:56 schrieb Bert Verhees <bert.verhees at 
> <mailto:bert.verhees at>>:
>> Thanks,
>> I always forget it is weekend, as independent developer, there ain't 
>> no such thing as a day of.
>> I agree that mixing query-languages is ugly, and I don't see the 
>> necessity.
>> But maybe, you will explain that next week
>> Bert
>> On 13-02-16 17:46, Birger Haarbrandt wrote:
>>> Hi Bert,
>>> I will post some more thoughts on these things after the weekend :) 
>>> To just give a quick answer: imo it's important to have a flexible 
>>> data format like the one I2B2 uses (roughly said EAV) to mix openEHR 
>>> data with non-openEHR data. Making analysis on XML 
>>> documents/databaes might prevent integration of other sources (and 
>>> even if its possible like in SQL Server or Oracle, queries that mix 
>>> XQuery and SQL might become pretty ugly. And I see no good way to 
>>> provide a drag & drop interface to researchers/physicians to make 
>>> queries).
>>> Cheers,
>>> Birger
>>> Von meinem iPad gesendet
>>> Am 13.02.2016 um 16:24 schrieb Bert Verhees <bert.verhees at>:
>>>> No comments, on the other hand it is Saturday
>>>> I had left out some necessary technical details.
>>>> I will possible build it and then have possible the fastest 
>>>> two-level-modeling engine in the world, which will, of course, also 
>>>> support OpenEHR.
>>>> So it is not really bad that this happens.
>>>> When it is ready, I will make an announcement.
>>>> Best regards and enjoy the weekend.
>>>> Bert Verhees
>>>> On 12-02-16 20:49, Bert Verhees wrote:
>>>>> On 12-02-16 18:26, Erik Sundvall wrote:
>>>>>> if you are experimenting with open source native XML DBs for 
>>>>>> openEHR, it preformed well for "clinical" patient-specific 
>>>>>> querying even though all xml databases we tested were not 
>>>>>> suitable for ad hoc epidemiological population queries (without 
>>>>>> query specific indexing)
>>>>> A very interesting paper. I have some first opinions on that. But 
>>>>> first I need to explain what I think about the matter.
>>>>> I have not prepared the story below, so there may be things which 
>>>>> I write to fast. See it as provisional view, not as a hard opinion.
>>>>> ---------
>>>>> There are relational database-configurations for OLAP and for 
>>>>> OLTP. The combination is hard to find. There are reasons. This is 
>>>>> a classic problem.
>>>>> You need specific indexes for data-mining (OLAP), and for every 
>>>>> extra data-mining query you need more indexes, especially if you 
>>>>> don't have time to wait a night for the result. Those extra 
>>>>> indexes stand in the way for transactional processing (OLTP) 
>>>>> because they need to be updated, and that is unnecessary burden 
>>>>> for the OLTP-processes, as longer as the database exist, the 
>>>>> burden becomes heavier.
>>>>> That is why OLAP and OLTP are not often combined in one configuration.
>>>>> So many professional databases have extra features for OLAP, I 
>>>>> worked, years ago with Oracle for a big Dutch television company, 
>>>>> and my main job was to create indexes for marketing purposes.
>>>>> We ran those unusual queries during the night and stored the 
>>>>> result in special tables, Oracle called them "materialized views".
>>>>> The day after, those views were processed in analyzing software, 
>>>>> like SPSS, and after that, thrown away.
>>>>> It was a database with 900,000 persons in it, and every person had 
>>>>> a lot of history of web-history, personal interests, etc.
>>>>> "How much interest does a person have for opera, and is he able to 
>>>>> pay for opera, is it worth to call him for a ticket-offer, we 
>>>>> cannot call 900,000 persons"
>>>>> These were complex queries based on things the customer bought, 
>>>>> television programs he was interested in, web-activities.
>>>>> That was the kind of thing they did with the database.
>>>>> So this could well be compared with a medical database, regarding 
>>>>> to size and complexity.
>>>>> The same difficulties count for XML databases. That is why XML 
>>>>> databases have also features for creating extra indexes.
>>>>> Oracle, by the way, if it knows the structure of XML (via XSD), it 
>>>>> breaks, underwater, XML into relational data, and store it in a 
>>>>> relational database. It also converts XQuery to SQL.
>>>>> In this way, it has the weakness and advantages of a relational 
>>>>> database, and it needs the extra indexes for unusual queries, but 
>>>>> on the developer view it is XML.
>>>>> -------
>>>>> Comparing XML and relational, for OpenEHR, I favor XML, because it 
>>>>> can easily reflect the structures which need to be stored. It 
>>>>> makes the data-processing layer less complex. There is a lot of 
>>>>> tooling around XML, XML-schema to make the database-structure 
>>>>> known to Oracle, Schematron to validate against archetypes. This 
>>>>> is very matured software, and therefor the complexity is solved 
>>>>> years ago, and well tested. It is hidden complexity, and matured 
>>>>> hidden complexity is no problem.
>>>>> --------
>>>>> And if you want to do data-mining, like epidemiological research, 
>>>>> and you have the time to plan the research, then the classical 
>>>>> database, XML or RDB, is OK.
>>>>> In my opinion, there is not often a real need for adhoc 
>>>>> data-mining (epidemiological research) queries, with result in a 
>>>>> few minutes. They are always planned, and creating the indexes and 
>>>>> storing the result in "materialized views" are part of the work 
>>>>> one has to do for data-mining research on data.
>>>>> So, I don't think there is a real need for this.
>>>>> --------
>>>>> Regarding to XML databases, Oracle has a solution, which can 
>>>>> perform well if it is professionally maintained.
>>>>> This is often a point, because professional Oracle maintaining 
>>>>> regarding to advanced use is very expensive.
>>>>> Another company is MarkLogic. It is said that MarkLogic is better, 
>>>>> but I don't know that from own experience.
>>>>> Both are free to use for developers.
>>>>> You must think of numbers to 35,000 Euro a year for licenses, 
>>>>> which is not very much for a big hospital, but very much for a 
>>>>> small health service
>>>>> The open source XML ExistDB database is not very good for 
>>>>> data-mining, is my personal experience.
>>>>> So, we must ask ourselves, are we solving a problem that no one 
>>>>> experiences?
>>>>> --------
>>>>> There are a few advantages to OpenEHR. Data are immutable, never 
>>>>> changed, never deleted. This makes a few difficult steps unnecessary.
>>>>> The Dewey concepts look very attractive, although it is also 
>>>>> created with deleting and changing data in mind.
>>>>> This is very important for normal company use.
>>>>> But, as said, we don't have that in OpenEHR. Medical data always 
>>>>> grow, it are always new data. An event that has passed will never 
>>>>> change.
>>>>> The only things that change (in OpenEHR they are versioned, but 
>>>>> from the user perspective, they change), that are demographic 
>>>>> data. And one can live with that, create extra provisions for that 
>>>>> demographic database-section, which is only a small part of the 
>>>>> complete database.
>>>>> Often, the demographic database is external anyway.
>>>>> So, my thoughts, maybe Dewey is too good.
>>>>> Path-values (to leaf-nodes) storage is enough
>>>>> The paths, combined with ID's are keys, and are much alike XPath, 
>>>>> so it is easy to store XML in a path-value database.
>>>>> And querying is also easy, because all queries are path based.
>>>>> I think, for OpenEHR this is the fastest solution. But maybe I 
>>>>> overlook something.
>>>>> Maybe I say something stupid. I am not offended if you say so 
>>>>> (maybe in other words ;-).
>>>>> We all want and need to learn.
>>>>> What we still need to do to build this solution is handle the 
>>>>> XQuery grammar and let it run on path-based-database.
>>>>> This is not very easy, but also not very hard. Maybe the 
>>>>> algorithms are already to find.
>>>>> Like the Dewey algorithm this can run on any database, also on 
>>>>> free open source databases.
>>>>> I think you get an excellent performance on Postgres.
>>>>> A path-value database is easy to index, it only needs a few. The 
>>>>> inserting will stay very fast, always.
>>>>> Lets do some calculations, for fun:
>>>>> How many path-value-combinations do you have to store for an 
>>>>> average composition?
>>>>> Maybe 30? How many compositions are for an average patient? 10,000?
>>>>> So every patient needs 300,000 path-values. So you can store 
>>>>> 10,000 patients in 3 billion records.
>>>>> This is not much for Postgres, and the simple indexes needed.
>>>>> When you need to store 900,000 patients you need 90 separate tables.
>>>>> Very cheap, very fast, also for adhoc queries, and easy to 
>>>>> accomplish, I think.
>>>>> I am very interested in opinions on this.
>>>>> Thanks
>>>>> Bert
>>>> _______________________________________________
>>>> openEHR-technical mailing list
>>>> openEHR-technical at 
>>>> <mailto:openEHR-technical at>
>>> _______________________________________________
>>> openEHR-technical mailing list
>>> openEHR-technical at
>> _______________________________________________
>> openEHR-technical mailing list
>> openEHR-technical at 
>> <mailto:openEHR-technical at>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the openEHR-technical mailing list