Archetype relational mapping - a practical openEHR persistence solution

Bert Verhees bert.verhees at
Sat Feb 13 11:56:30 EST 2016


I always forget it is weekend, as independent developer, there ain't no 
such thing as a day of.

I agree that mixing query-languages is ugly, and I don't see the necessity.

But maybe, you will explain that next week


On 13-02-16 17:46, Birger Haarbrandt wrote:
> Hi Bert,
> I will post some more thoughts on these things after the weekend :) To 
> just give a quick answer: imo it's important to have a flexible data 
> format like the one I2B2 uses (roughly said EAV) to mix openEHR data 
> with non-openEHR data. Making analysis on XML documents/databaes might 
> prevent integration of other sources (and even if its possible like in 
> SQL Server or Oracle, queries that mix XQuery and SQL might become 
> pretty ugly. And I see no good way to provide a drag & drop interface 
> to researchers/physicians to make queries).
> Cheers,
> Birger
> Von meinem iPad gesendet
> Am 13.02.2016 um 16:24 schrieb Bert Verhees <bert.verhees at 
> <mailto:bert.verhees at>>:
>> No comments, on the other hand it is Saturday
>> I had left out some necessary technical details.
>> I will possible build it and then have possible the fastest 
>> two-level-modeling engine in the world, which will, of course, also 
>> support OpenEHR.
>> So it is not really bad that this happens.
>> When it is ready, I will make an announcement.
>> Best regards and enjoy the weekend.
>> Bert Verhees
>> On 12-02-16 20:49, Bert Verhees wrote:
>>> On 12-02-16 18:26, Erik Sundvall wrote:
>>>> if you are experimenting with open source native XML DBs for 
>>>> openEHR, it preformed well for "clinical" patient-specific querying 
>>>> even though all xml databases we tested were not suitable for ad 
>>>> hoc epidemiological population queries (without query specific 
>>>> indexing)
>>> A very interesting paper. I have some first opinions on that. But 
>>> first I need to explain what I think about the matter.
>>> I have not prepared the story below, so there may be things which I 
>>> write to fast. See it as provisional view, not as a hard opinion.
>>> ---------
>>> There are relational database-configurations for OLAP and for OLTP. 
>>> The combination is hard to find. There are reasons. This is a 
>>> classic problem.
>>> You need specific indexes for data-mining (OLAP), and for every 
>>> extra data-mining query you need more indexes, especially if you 
>>> don't have time to wait a night for the result. Those extra indexes 
>>> stand in the way for transactional processing (OLTP) because they 
>>> need to be updated, and that is unnecessary burden for the 
>>> OLTP-processes, as longer as the database exist, the burden becomes 
>>> heavier.
>>> That is why OLAP and OLTP are not often combined in one configuration.
>>> So many professional databases have extra features for OLAP, I 
>>> worked, years ago with Oracle for a big Dutch television company, 
>>> and my main job was to create indexes for marketing purposes.
>>> We ran those unusual queries during the night and stored the result 
>>> in special tables, Oracle called them "materialized views".
>>> The day after, those views were processed in analyzing software, 
>>> like SPSS, and after that, thrown away.
>>> It was a database with 900,000 persons in it, and every person had a 
>>> lot of history of web-history, personal interests, etc.
>>> "How much interest does a person have for opera, and is he able to 
>>> pay for opera, is it worth to call him for a ticket-offer, we cannot 
>>> call 900,000 persons"
>>> These were complex queries based on things the customer bought, 
>>> television programs he was interested in, web-activities.
>>> That was the kind of thing they did with the database.
>>> So this could well be compared with a medical database, regarding to 
>>> size and complexity.
>>> The same difficulties count for XML databases. That is why XML 
>>> databases have also features for creating extra indexes.
>>> Oracle, by the way, if it knows the structure of XML (via XSD), it 
>>> breaks, underwater, XML into relational data, and store it in a 
>>> relational database. It also converts XQuery to SQL.
>>> In this way, it has the weakness and advantages of a relational 
>>> database, and it needs the extra indexes for unusual queries, but on 
>>> the developer view it is XML.
>>> -------
>>> Comparing XML and relational, for OpenEHR, I favor XML, because it 
>>> can easily reflect the structures which need to be stored. It makes 
>>> the data-processing layer less complex. There is a lot of tooling 
>>> around XML, XML-schema to make the database-structure known to 
>>> Oracle, Schematron to validate against archetypes. This is very 
>>> matured software, and therefor the complexity is solved years ago, 
>>> and well tested. It is hidden complexity, and matured hidden 
>>> complexity is no problem.
>>> --------
>>> And if you want to do data-mining, like epidemiological research, 
>>> and you have the time to plan the research, then the classical 
>>> database, XML or RDB, is OK.
>>> In my opinion, there is not often a real need for adhoc data-mining 
>>> (epidemiological research) queries, with result in a few minutes. 
>>> They are always planned, and creating the indexes and storing the 
>>> result in  "materialized views" are part of the work one has to do 
>>> for data-mining research on data.
>>> So, I don't think there is a real need for this.
>>> --------
>>> Regarding to XML databases, Oracle has a solution, which can perform 
>>> well if it is professionally maintained.
>>> This is often a point, because professional Oracle maintaining 
>>> regarding to advanced use is very expensive.
>>> Another company is MarkLogic. It is said that MarkLogic is better, 
>>> but I don't know that from own experience.
>>> Both are free to use for developers.
>>> You must think of numbers to 35,000 Euro a year for licenses, which 
>>> is not very much for a big hospital, but very much for a small 
>>> health service
>>> The open source XML ExistDB database is not very good for 
>>> data-mining, is my personal experience.
>>> So, we must ask ourselves, are we solving a problem that no one 
>>> experiences?
>>> --------
>>> There are a few advantages to OpenEHR. Data are immutable, never 
>>> changed, never deleted. This makes a few difficult steps unnecessary.
>>> The Dewey concepts look very attractive, although it is also created 
>>> with deleting and changing data in mind.
>>> This is very important for normal company use.
>>> But, as said, we don't have that in OpenEHR. Medical data always 
>>> grow, it are always new data. An event that has passed will never 
>>> change.
>>> The only things that change (in OpenEHR they are versioned, but from 
>>> the user perspective, they change), that are demographic data. And 
>>> one can live with that, create extra provisions for that demographic 
>>> database-section, which is only a small part of the complete database.
>>> Often, the demographic database is external anyway.
>>> So, my thoughts, maybe Dewey is too good.
>>> Path-values (to leaf-nodes) storage is enough
>>> The paths, combined with ID's are keys, and are much alike XPath, so 
>>> it is easy to store XML in a path-value database.
>>> And querying is also easy, because all queries are path based.
>>> I think, for OpenEHR this is the fastest solution. But maybe I 
>>> overlook something.
>>> Maybe I say something stupid. I am not offended if you say so (maybe 
>>> in other words ;-).
>>> We all want and need to learn.
>>> What we still need to do to build this solution is handle the XQuery 
>>> grammar and let it run on path-based-database.
>>> This is not very easy, but also not very hard. Maybe the algorithms 
>>> are already to find.
>>> Like the Dewey algorithm this can run on any database, also on free 
>>> open source databases.
>>> I think you get an excellent performance on Postgres.
>>> A path-value database is easy to index, it only needs a few. The 
>>> inserting will stay very fast, always.
>>> Lets do some calculations, for fun:
>>> How many path-value-combinations do you have to store for an average 
>>> composition?
>>> Maybe 30? How many compositions are for an average patient? 10,000?
>>> So every patient needs 300,000 path-values. So you can store 10,000 
>>> patients in 3 billion records.
>>> This is not much for Postgres, and the simple indexes needed.
>>> When you need to store 900,000 patients you need 90 separate tables.
>>> Very cheap, very fast, also for adhoc queries, and easy to 
>>> accomplish, I think.
>>> I am very interested in opinions on this.
>>> Thanks
>>> Bert
>> _______________________________________________
>> openEHR-technical mailing list
>> openEHR-technical at 
>> <mailto:openEHR-technical at>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the openEHR-technical mailing list