Archetype relational mapping - a practical openEHR persistence solution

Bert Verhees bert.verhees at rosa.nl
Sat Feb 13 10:24:26 EST 2016


No comments yet; on the other hand, it is Saturday.

I had left out some necessary technical details.
I will possibly build it myself, and then perhaps have the fastest 
two-level-modelling engine in the world, which will, of course, also 
support openEHR.
So it is not really bad that this happened.

When it is ready, I will make an announcement.

Best regards and enjoy the weekend.
Bert Verhees


On 12-02-16 20:49, Bert Verhees wrote:
> On 12-02-16 18:26, Erik Sundvall wrote:
>> if you are experimenting with open source native XML DBs for openEHR: 
>> it performed well for "clinical" patient-specific querying, even 
>> though none of the XML databases we tested were suitable for ad hoc 
>> epidemiological population queries (without query-specific indexing)
>
>
> A very interesting paper. I have some first opinions on it, but 
> first I need to explain what I think about the matter.
> I have not prepared the story below, so there may be things which I 
> write too fast. See it as a provisional view, not as a hard opinion.
> ---------
> There are relational database configurations for OLAP and for OLTP. 
> The combination is hard to find, and for good reasons. This is a 
> classic problem.
>
> You need specific indexes for data mining (OLAP), and for every extra 
> data-mining query you need more indexes, especially if you don't have 
> time to wait a night for the result. Those extra indexes stand in the 
> way of transactional processing (OLTP) because they need to be updated 
> on every write, which is an unnecessary burden on the OLTP processes; 
> the longer the database exists, the heavier the burden becomes.
> That is why OLAP and OLTP are not often combined in one configuration.
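>
> A toy illustration of that write penalty, sketched in Python with 
> SQLite purely for illustration (any RDBMS shows the same effect; all 
> names here are made up):
>
>     import sqlite3, time
>
>     # Insert the same rows with a varying number of indexes to
>     # maintain; every extra index slows every insert, forever.
>     def insert_time(n_indexes, rows=50_000):
>         db = sqlite3.connect(":memory:")
>         db.execute("CREATE TABLE t (a INT, b INT, c INT, d INT)")
>         for i in range(n_indexes):       # at most 4 columns here
>             db.execute(f"CREATE INDEX ix{i} ON t ({'abcd'[i]})")
>         start = time.perf_counter()
>         db.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
>                        ((i, i, i, i) for i in range(rows)))
>         db.commit()
>         return time.perf_counter() - start
>
>     print(insert_time(0))   # baseline insert time
>     print(insert_time(4))   # slower: four indexes to maintain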
>
> So many professional databases have extra features for OLAP. Years 
> ago I worked with Oracle for a big Dutch television company, and my 
> main job was to create indexes for marketing purposes.
> We ran those unusual queries during the night and stored the results 
> in special tables; Oracle calls them "materialized views".
> The day after, those views were processed in analysis software, like 
> SPSS, and after that thrown away.
>
> It was a database with 900,000 persons in it, and every person had a 
> lot of history: web activity, personal interests, etc.
> "How much interest does this person have in opera, and is he able to 
> pay for opera? Is it worth calling him with a ticket offer? We cannot 
> call 900,000 persons."
> These were complex queries based on things the customer bought, the 
> television programs he was interested in, and his web activities.
> That was the kind of thing they did with the database.
>
> So this compares well with a medical database in terms of size and 
> complexity.
>
> The same difficulties apply to XML databases. That is why XML 
> databases also have features for creating extra indexes.
> Oracle, by the way, if it knows the structure of the XML (via an 
> XSD), shreds the XML behind the scenes into relational data and 
> stores it in a relational database. It also converts XQuery to SQL.
> In this way it has the weaknesses and advantages of a relational 
> database, and it needs the extra indexes for unusual queries, but 
> from the developer's view it is XML.
> -------
> Comparing XML and relational for openEHR, I favor XML, because it can 
> easily reflect the structures which need to be stored. It makes the 
> data-processing layer less complex. There is a lot of tooling around 
> XML: XML Schema to make the database structure known to Oracle, 
> Schematron to validate against archetypes. This is very mature 
> software; the complexity was solved years ago and is well tested. It 
> is hidden complexity, and mature hidden complexity is no problem.
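>
> Schema validation, for instance, is a one-liner in mature libraries. 
> A minimal sketch, assuming the lxml package and hypothetical file 
> names for an RM schema and an instance document:
>
>     from lxml import etree  # third-party: pip install lxml
>
>     # Load a schema (e.g. an openEHR RM XSD) and an instance
>     # document; both file names are placeholders.
>     schema = etree.XMLSchema(etree.parse("Composition.xsd"))
>     doc = etree.parse("composition_instance.xml")
>
>     if schema.validate(doc):
>         print("instance is schema-valid")
>     else:
>         for error in schema.error_log:
>             print(error.line, error.message)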
> --------
> And if you want to do data mining, like epidemiological research, and 
> you have the time to plan the research, then a classical database, 
> XML or RDB, is OK.
>
> In my opinion, there is not often a real need for ad hoc data-mining 
> (epidemiological research) queries with results in a few minutes. 
> They are always planned, and creating the indexes and storing the 
> results in "materialized views" is part of the work one has to do for 
> data-mining research on data.
> So, I don't think there is a real need for this.
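>
> The nightly pattern from the Oracle story looks like this on 
> Postgres, as a sketch only (it assumes a reachable server and the 
> psycopg2 driver; the connection string, tables, and view name are all 
> made up):
>
>     import psycopg2  # third-party: pip install psycopg2-binary
>
>     conn = psycopg2.connect("dbname=marketing user=etl")
>     with conn, conn.cursor() as cur:
>         # Precompute the expensive population query once...
>         cur.execute("""
>             CREATE MATERIALIZED VIEW opera_prospects AS
>             SELECT p.person_id, p.phone
>             FROM persons p
>             JOIN interests i ON i.person_id = p.person_id
>             WHERE i.topic = 'opera' AND p.income_class >= 3
>         """)
>     # ...then re-run it on a nightly schedule, not per request.
>     with conn, conn.cursor() as cur:
>         cur.execute("REFRESH MATERIALIZED VIEW opera_prospects")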
> --------
> Regarding XML databases, Oracle has a solution which can perform 
> well if it is professionally maintained.
> This is often a sticking point, because professional Oracle 
> maintenance for advanced use is very expensive.
> Another company is MarkLogic. It is said that MarkLogic is better, but 
> I don't know that from my own experience.
> Both are free to use for developers.
> You must think of figures up to 35,000 euro a year for licenses, which 
> is not very much for a big hospital, but very much for a small health 
> service.
> The open source XML database eXist-db is not very good for data 
> mining, in my personal experience.
>
> So, we must ask ourselves, are we solving a problem that no one 
> experiences?
> --------
> There are a few advantages to openEHR. Data are immutable: never 
> changed, never deleted. This makes a few difficult steps unnecessary.
> The Dewey concepts look very attractive, although they were also 
> created with deleting and changing data in mind.
> That is very important for normal business use.
>
> But, as said, we don't have that in openEHR. Medical data always 
> grow; it is always new data. An event that has passed will never 
> change.
> The only things that change are demographic data (in openEHR they are 
> versioned, but from the user's perspective they change). And one can 
> live with that: create extra provisions for that demographic section 
> of the database, which is only a small part of the complete database.
> Often, the demographic database is external anyway.
>
> So, my thought: maybe Dewey is too good.
> Path-value storage (paths to leaf nodes) is enough.
> The paths, combined with IDs, are the keys, and they look much like 
> XPath, so it is easy to store XML in a path-value database, as in the 
> sketch below.
> And querying is also easy, because all queries are path based.
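>
> A minimal sketch of that flattening, in Python with SQLite purely for 
> illustration (attributes and repeated siblings are ignored for 
> brevity; real openEHR paths would carry archetype node ids):
>
>     import sqlite3
>     import xml.etree.ElementTree as ET
>
>     def flatten(element, prefix=""):
>         """Yield (path, value) pairs for leaf nodes, XPath-style."""
>         path = f"{prefix}/{element.tag}"
>         children = list(element)
>         if not children:
>             yield (path, (element.text or "").strip())
>         for child in children:
>             yield from flatten(child, path)
>
>     db = sqlite3.connect(":memory:")
>     db.execute("CREATE TABLE path_value ("
>                " composition_id TEXT, path TEXT, value TEXT)")
>
>     doc = ET.fromstring(
>         "<composition><observation><systolic>120</systolic>"
>         "<diastolic>80</diastolic></observation></composition>")
>     db.executemany(
>         "INSERT INTO path_value VALUES ('comp-001', ?, ?)",
>         flatten(doc))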
>
> I think, for openEHR, this is the fastest solution. But maybe I am 
> overlooking something.
> Maybe I am saying something stupid. I am not offended if you say so 
> (maybe in other words ;-).
> We all want and need to learn.
>
> What we still need to do to build this solution is handle the XQuery 
> grammar and let it run on the path-based database.
> This is not very easy, but also not very hard. Maybe the algorithms 
> can already be found.
> Like the Dewey algorithm, this can run on any database, including 
> free open source databases.
> I think you would get excellent performance on Postgres. A simple 
> path query could look like the sketch below.
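>
> A standalone sketch, again with SQLite for illustration; the table 
> mirrors the one from the flattening sketch, and all names and values 
> are made up:
>
>     import sqlite3
>
>     db = sqlite3.connect(":memory:")
>     db.execute("CREATE TABLE path_value ("
>                " composition_id TEXT, path TEXT, value TEXT)")
>     db.execute("CREATE INDEX idx_path ON path_value (path)")
>     db.executemany("INSERT INTO path_value VALUES (?, ?, ?)", [
>         ("comp-001", "/composition/observation/systolic", "120"),
>         ("comp-002", "/composition/observation/systolic", "185"),
>     ])
>
>     # "Which compositions have a systolic value above 140?" becomes
>     # an equality match on the path column plus a value filter.
>     rows = db.execute(
>         "SELECT composition_id, value FROM path_value"
>         " WHERE path = '/composition/observation/systolic'"
>         " AND CAST(value AS INTEGER) > 140").fetchall()
>     print(rows)  # [('comp-002', '185')]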
>
> A path-value database is easy to index; it only needs a few indexes. 
> Inserting will always stay very fast.
>
> Let's do some calculations, for fun:
> How many path-value combinations do you have to store for an average 
> composition? Maybe 30? How many compositions does an average patient 
> have? 10,000?
> So every patient needs 300,000 path-values, and you can store 10,000 
> patients in 3 billion records.
> This is not much for Postgres and the few simple indexes needed.
> When you need to store 900,000 patients, you need 90 separate tables.
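>
> Spelled out (these inputs are the rough estimates above, not 
> measurements):
>
>     paths_per_composition = 30          # rough estimate
>     compositions_per_patient = 10_000   # rough estimate
>     rows_per_patient = paths_per_composition * compositions_per_patient
>     print(rows_per_patient)               # 300,000
>     print(rows_per_patient * 10_000)      # 3,000,000,000 rows/table
>     print(900_000 // 10_000)              # 90 tables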
>
> Very cheap, very fast, also for ad hoc queries, and easy to 
> accomplish, I think.
>
> I am very interested in opinions on this.
>
> Thanks
> Bert
>
