Archetype relational mapping - a practical openEHR persistence solution

Bert Verhees bert.verhees at rosa.nl
Sun Feb 14 06:01:55 EST 2016


On 14-02-16 00:04, Birger Haarbrandt wrote:
>
> Hi Bert,
>
> I'm not arguing that you can represent most data in XML. I'm just 
> concerned that mangling high volume or specialized data like for 
> example sensor data, genomic data and geo-spatial data into a document 
> format might not work too well. Also, when the ER-diagram of 
> non-openEHR data is fairly complex, producing a meaningful XSD and XML 
> documents might not be that quick and easy (at least I don't know of an 
> industry-strength tool that can help with this task. However, I may be 
> wrong about this and I'd be happy to learn).
>

I agree, long series of data are not well represented in XML; it has too 
much overhead. (There are other solutions for that which are easy to 
integrate with XML, but that aside.)

So treat XML as an intermediate representation that is easy for software 
to handle. It represents objects very well, so it fits the 
object-oriented paradigm, which openEHR also follows.
XML has good support for validation, it is widely understood, and almost 
every development environment has standard support for it.

There are two related kinds of mature industry support I am looking 
for: a good, well-defined query language and, as an extension of that, 
a validation environment.
XQuery and Schematron are excellent technologies which fit the two-level 
modeling (openEHR) paradigm very well, because they are path-based.

JSON is also very good, and it is leaner. Especially when sender and 
receiver have deep knowledge of the data (which is the case in 
openEHR), JSON is better. But industry support for JSON is, as far as 
I know, not as good as it is for XML. On the other hand, it is easy to 
migrate from XML to JSON and vice versa, even without loss of data or 
structure; see for example
http://www.utilities-online.info/xmltojson/
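
A minimal sketch of such a round trip, using only the Python standard 
library. The element names and the {tag, attrs, text, children} JSON 
layout are assumptions made for illustration; they are not what the tool 
linked above produces. Comments, mixed content and namespaces are not 
handled here.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map an XML element to a JSON-friendly dict (illustrative layout)."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def dict_to_element(node):
    """Rebuild an XML element from the dict produced above."""
    elem = ET.Element(node["tag"], node["attrs"])
    if node["text"]:
        elem.text = node["text"]
    for child in node["children"]:
        elem.append(dict_to_element(child))
    return elem


# Hypothetical sample document, not a real openEHR composition.
xml_in = (
    '<observation code="bp">'
    '<systolic unit="mmHg">120</systolic>'
    '<diastolic unit="mmHg">80</diastolic>'
    "</observation>"
)

as_json = json.dumps(element_to_dict(ET.fromstring(xml_in)))
xml_out = ET.tostring(dict_to_element(json.loads(as_json)), encoding="unicode")
```

Keeping the tag, attributes, text and child order explicit in the JSON 
is what makes the conversion reversible for this simple case.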

I don't believe that XML databases actually store XML. Oracle, for 
example, breaks it up into a relational structure, but I don't know the 
internals of the others well. The worst solution for storing XML, 
however, would be to really store it as XML.
In the solution I presented in my email, it is not XML in which I want 
to store data but path-value combinations. (In detail it differs 
somewhat; this is the basic idea. The elaborated idea is 10 times as 
efficient.)
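
The basic path-value idea can be sketched as follows. This is only an 
illustration of flattening a nested structure into (path, value) pairs; 
it is not the elaborated design mentioned above, and the keys used in 
the sample data are made up rather than real archetype paths.

```python
def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # A leaf: the accumulated prefix is its full path.
        yield prefix, node


# Hypothetical sample data; names are assumptions for illustration.
composition = {
    "blood_pressure": {
        "time": "2016-02-14T06:01:55",
        "systolic": {"magnitude": 120, "units": "mm[Hg]"},
        "diastolic": {"magnitude": 80, "units": "mm[Hg]"},
    }
}

rows = list(flatten(composition))
```

Each leaf becomes one row, so a query for a given path only has to look 
at the rows carrying that path.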

Because, regarding storage, there are other criteria than for 
validating and communicating data. In storage, speed and efficiency are 
very important, and so is a very good and fast implementation of AQL (or 
XQuery).
And when data are retrieved, they can be represented in JSON or XML, or 
whatever one likes; even support for native American smoke signals is 
possible. These are, again, representations.


> Regarding performance, we did some tests on SQL Server 2012 last year. 
> As I have only experience with this particular database, it might well 
> be that my critique does not apply to Oracle or Marklogic!
>

I am not very impressed by these database tests; there are so many 
side factors which are not taken into account:
the JDBC drivers, for example, the communication protocols used, the 
indexes, the code of the supporting software layers, the quality of the 
query engine, the operating system, the file system, the network card 
driver, etc.
You are testing completely different stacks of technologies.
It is like testing a chain and then concluding that the last link is 
no good because the chain breaks somewhere in the middle.

But there is indeed a problem with the old database technologies, and 
that is that they are built for data manipulation. There are good 
reasons for that: a bank does not want to process your complete history 
every day, but wants to know your current savings and mortgage 
position. So they modify your current data constantly. Codd 
normalization is also designed for efficiency and integrity in the 
context of data manipulation.

When you use a database out of the box then you will see features which 
are needed for constant manipulation.
But you don't need them, because medical data are immutable. This is 
very important.

> Just a minute ago I compared a simple SQL Query with an XQuery on our 
> data repository. I simply wanted to get all validated blood pressure 
> values and their corresponding datetimes of a pediatric icu. Using the 
> plain relational representation of the data (we automatically map data 
> from compositions to tables), it takes under 1 second to get all 
> 329.273 rows. Having a full index on the blood pressure fragment of 
> the composition (this is needed to get the internal tabular 
> representation of the data) and a secondary index on the paths, 
> querying of the same rows still takes 30 seconds (without, it would be 
> 2 minutes. No surprise). Additionally, the size of the data increases 
> from 10MB to 270MB.
>
I can assure you that my database storage requires only a few indexes, 
and very fast ones, because data are immutable.
The disadvantage of my solution is that it is not out of the box.
The most important job is to let the query engine work with the 
data storage, but there are now new ways to work with grammars, and I 
don't think this is very difficult.
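
To illustrate the point about immutable data needing only a few 
indexes, here is a minimal sketch using SQLite from the Python standard 
library. The table layout, the single index and the function names are 
assumptions made for this example; the actual storage design discussed 
in this thread is not shown and will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Append-only table: rows are inserted once and never updated or deleted.
conn.execute(
    "CREATE TABLE path_value ("
    " composition_id INTEGER NOT NULL,"
    " path TEXT NOT NULL,"
    " value TEXT NOT NULL)"
)
# One index on the path is enough for path-based lookups in this sketch.
conn.execute("CREATE INDEX idx_path ON path_value (path, composition_id)")


def store(composition_id, rows):
    """Append the (path, value) pairs of one composition."""
    conn.executemany(
        "INSERT INTO path_value VALUES (?, ?, ?)",
        [(composition_id, path, str(value)) for path, value in rows],
    )
    conn.commit()


def values_at(path):
    """Return (composition_id, value) for every row stored at a path."""
    cur = conn.execute(
        "SELECT composition_id, value FROM path_value"
        " WHERE path = ? ORDER BY composition_id",
        (path,),
    )
    return cur.fetchall()


# Hypothetical paths, not real archetype paths.
store(1, [("/blood_pressure/systolic/magnitude", 120),
          ("/blood_pressure/diastolic/magnitude", 80)])
store(2, [("/blood_pressure/systolic/magnitude", 135)])

systolic = values_at("/blood_pressure/systolic/magnitude")
```

Because nothing is ever updated, the index is only ever appended to, 
which is the property the argument above relies on. A real query engine 
for AQL or XQuery would translate path expressions into lookups like 
`values_at`.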

The W3C has a lot of information on XQuery grammars:
https://www.w3.org/TR/xquery-xpath-parsing/
https://www.w3.org/TR/xquery-30/

When this is done, a database configuration designed for speed, on any 
RDB engine, can be used to implement this data-processing method.

But I see that we are indeed approaching the problem along different 
tracks. You test out-of-the-box solutions; many people do.
And I think that out of the box, nothing is good enough, because the 
vendors were not thinking of openEHR but of a million other 
customer requirements when designing their databases.
And however good, well designed, professional and well maintained these 
products are, they will not remove those characteristics which stand in 
your way.

> This is the reality we face in our system, therefore, I 
> consider XQuery and XML not an option for us to do analysis in this 
> database layer. As said, this might not apply to a better 
> implementation of XML by other vendors but I'd love to see some 
> real-world numbers.
>
> Just some thoughts and experiences, I'm not a dedicated database 
> expert, therefore, I would not be sad if I'm proven wrong :)
>

Embrace the good news ;-)

Bert