Archetype relational mapping - a practical openEHR persistence solution
bert.verhees at rosa.nl
Sun Feb 14 06:01:55 EST 2016
On 14-02-16 00:04, Birger Haarbrandt wrote:
> Hi Bert,
> I'm not arguing that you can represent most data in XML. I'm just
> concerned that mangling high volume or specialized data like for
> example sensor data, genomic data and geo-spatial data into a document
> format might not work too well. Also, when the ER-diagram of
> non-openEHR data is fairly complex, producing a meaningful XSD and XML
> documents might not be that quick and easy (at least I don't know of an
> industry-strength tool that can help with this task. However, I may be
> wrong about this and I'd be happy to learn).
I agree, long series of data are not well represented in XML; it has too
much overhead. (There are other solutions for that which are easy to
integrate with XML, but that aside.)
So treat XML as an intermediate representation that is good for software
to handle. It can represent objects very well, so it fits an
object-oriented paradigm well. openEHR also works along this paradigm.
XML is a format with good support for validation, and it can
represent objects very well. It is also widely understood, and almost
every development environment has standard support for XML.
There are two kinds of mature, related industry support I am looking
for: a good, well-defined query language and, as an extension of
this, a validation environment.
XQuery and Schematron are excellent technologies which fit the
two-level modeling (openEHR) paradigm very well, because they are path-based.
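
To illustrate what "path-based" means here, a minimal sketch in Python. It
uses the limited XPath subset of the standard library's ElementTree, not
XQuery or Schematron themselves, and the XML fragment and element names are
invented for the example and far simpler than real openEHR XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment; not real openEHR XML.
DOC = """
<composition>
  <observation archetype_node_id="blood_pressure">
    <systolic units="mm[Hg]">120</systolic>
    <diastolic units="mm[Hg]">80</diastolic>
  </observation>
  <observation archetype_node_id="pulse">
    <rate units="/min">72</rate>
  </observation>
</composition>
"""


def values_at(root, path):
    """Return the text of every element matched by a path expression."""
    return [element.text for element in root.findall(path)]


root = ET.fromstring(DOC)

# The path alone identifies the data point, independent of the document layout.
systolic = values_at(
    root, "./observation[@archetype_node_id='blood_pressure']/systolic"
)
print(systolic)
```

The same path can serve as a query target and as the subject of a
validation rule, which is why path-based tools fit this kind of modeling.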
JSON is also very good, and it is leaner. Especially if sender and
receiver have deep knowledge about the data (which is the case in
openEHR), JSON is better. But the industry support for JSON is, as
far as I know, not as good as it is for XML. On the other hand, it
is easy to migrate from XML to JSON and vice versa, even without
loss of data or structure, see for example
I don't believe that XML databases actually store XML. Oracle, for
example, breaks it up into a relational structure. But I don't know the
internals of the others well. The worst solution for storing XML,
however, would be really storing XML.
In the solution I presented in my email, it is not XML in which I want
to store data but path-value combinations (in detail it differs
somewhat, but this is the base idea. The elaborated idea is 10 times
Because, with regard to storage, there are other criteria than for
validating and communicating data. In storage, speed and efficiency are
very important, and also a very good and fast implementation of AQL (or
And when data are retrieved, they can be represented in JSON or XML, or
whatever one likes; even support for Native American smoke signals is
possible. These are, again, representations.
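
The base idea of path-value storage with a freely chosen output
representation can be sketched as follows. This is only an illustration
under simple assumptions (nested dictionaries, no lists, no "/" inside
keys); the function names and the sample record are hypothetical, and the
elaborated design differs in detail.

```python
import json


def flatten(node, prefix=""):
    """Turn nested dictionaries into a flat list of (path, value) pairs."""
    if isinstance(node, dict):
        pairs = []
        for key, child in node.items():
            pairs.extend(flatten(child, f"{prefix}/{key}"))
        return pairs
    return [(prefix, node)]


def unflatten(pairs):
    """Rebuild the nested structure from (path, value) pairs."""
    root = {}
    for path, value in pairs:
        keys = path.strip("/").split("/")
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return root


# Hypothetical sample record, not a real openEHR composition.
record = {
    "blood_pressure": {
        "systolic": {"magnitude": 120, "units": "mm[Hg]"},
        "diastolic": {"magnitude": 80, "units": "mm[Hg]"},
    }
}

pairs = flatten(record)             # what would be stored
as_json = json.dumps(unflatten(pairs))  # one possible representation on retrieval
print(pairs[0])
print(as_json)
```

The stored form is only the list of pairs; JSON here is just one rendering
of it, and an XML serializer could be put in the same place.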
> Regarding performance, we did some tests on SQL Server 2012 last year.
> As I have only experience with this particular database, it might well
> be that my critique does not apply to Oracle or Marklogic!
I am not very impressed by these database tests; there are so many
side factors which are not taken into account:
the JDBC drivers, for example, the communication protocols used, the
indexes, the code of the supporting software layers, the quality of the
query engine, the operating system, the file system, the
network card driver, etc.
You are testing completely different stacks of technologies.
It is like testing a chain and then concluding that the last link is
no good because the chain breaks somewhere in the middle.
But there is indeed a problem with the old database technologies:
they are built for data manipulation. There are good
reasons for that. A bank does not want to process your complete
history every day, but wants to know your current savings and mortgage
position, so it modifies your current data constantly. Codd
normalization is also designed for efficiency and integrity in the
context of data manipulation.
When you use a database out of the box then you will see features which
are needed for constant manipulation.
But you don't need them, because medical data are immutable. This is
> Just a minute ago I compared a simple SQL Query with an XQuery on our
> data repository. I simply wanted to get all validated blood pressure
> values and their corresponding datetimes of a pediatric icu. Using the
> plain relational representation of the data (we automatically map data
> from compositions to tables), it takes under 1 second to get all
> 329.273 rows. Having a full index on the blood pressure fragment of
> the composition (this is needed to get the internal tabular
> representation of the data) and a secondary index on the paths,
> querying of the same rows still takes 30 seconds (without, it would be
> 2 minutes. No surprise). Additionally, the size of the data increases
> from 10MB to 270MB.
I can assure you that my database storage requires only a few indexes,
and also very fast indexes, because data are immutable.
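
As a rough sketch of why immutable data needs few indexes, assume a single
append-only path-value table in SQLite with one index on the path. The
table, trigger, and function names are hypothetical and this is not the
actual design, only the base idea in its simplest form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE path_value (
        composition_id INTEGER NOT NULL,
        path           TEXT    NOT NULL,
        value          TEXT    NOT NULL
    );
    -- The only index: rows never change, so it is only ever appended to.
    CREATE INDEX idx_path ON path_value (path);
    -- Assumption for this sketch: reject changes so rows stay immutable.
    CREATE TRIGGER no_update BEFORE UPDATE ON path_value
    BEGIN SELECT RAISE(ABORT, 'path_value is append-only'); END;
    CREATE TRIGGER no_delete BEFORE DELETE ON path_value
    BEGIN SELECT RAISE(ABORT, 'path_value is append-only'); END;
""")

# Hypothetical sample rows.
rows = [
    (1, "/blood_pressure/systolic", "120"),
    (1, "/blood_pressure/diastolic", "80"),
    (2, "/blood_pressure/systolic", "135"),
]
conn.executemany("INSERT INTO path_value VALUES (?, ?, ?)", rows)


def values_for(path):
    """Return (composition_id, value) for every row stored under a path."""
    cursor = conn.execute(
        "SELECT composition_id, value FROM path_value "
        "WHERE path = ? ORDER BY composition_id",
        (path,),
    )
    return cursor.fetchall()


systolic_rows = values_for("/blood_pressure/systolic")
print(systolic_rows)
```

Because nothing is updated or deleted, there is no need for the locking and
index maintenance that a constantly modified table requires.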
The disadvantage of my solution is that it is not out of the box.
The most important job is to let the query engine work with the
data storage, but there are now new ways to work with grammars, and I
don't think this is very difficult.
The W3C has a lot of information on XQuery grammars.
When this is done, a database configuration designed for speed can be
used on any RDB engine to create this data-processing method.
But I see that we are indeed approaching the problem along different
tracks. You test out-of-the-box solutions; many people do.
And I think that out of the box, nothing is good enough, because the
vendors were not thinking of openEHR but of a million other
customer requirements when designing their database.
And however good, well designed, professional and well
maintained they are, they will not remove those characteristics which stand in
> This is the reality we face in our system, therefore, I
> consider XQuery and XML not an option for us to do analysis in this
> database layer. As said, this might not apply to a better
> implementation of XML by other vendors but I'd love to see some
> real-world numbers.
> Just some thoughts and experiences, I'm not a dedicated database
> expert, therefore, I would not be sad if I'm proven wrong :)
Embrace the good news ;-)