In this blog post, we examine how the data security technologies being developed by two EU H2020 projects, HEIR and CyberKit4SME, can be combined to provide a non-repudiable log of access to medical FHIR data. The log itself is secured by a fine-grained access policy.
Overview of data protection in HEIR
HEIR (www.heir2020.eu) is a 36-month H2020 project that started on 01/09/2020 and concentrates on the security of healthcare environments. As part of this work, we have developed a Privacy-Aware Framework (PAF) that provides fine-grained, policy-driven access to data from disparate sources. In particular, HEIR targets the protection of FHIR resources. HL7 FHIR (Fast Healthcare Interoperability Resources) defines a protocol and data model for the exchange of healthcare information across storage systems.
Using the Fybrik open source framework (www.fybrik.io), we have created an environment where all requests to the hospital FHIR server must pass through the HEIR PAF and must be accompanied by a JSON Web Token (JWT) that authenticates the requester’s access group. Access policies, typically configured by the hospital’s Data Protection Officer, restrict the data returned by FHIR queries at a fine-grained level: not only can whole resources be blocked for an access group, but individual attributes within FHIR resources can also be redacted.
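To make the two levels of restriction concrete, here is a minimal sketch of blocking a whole resource versus redacting individual attributes for an access group. It is not the HEIR PAF or Fybrik implementation: the policy format, the group names, the `apply_policy` helper, the redaction marker and the default-deny behaviour are all assumptions made for illustration.

```python
import copy

# Hypothetical policy: access group -> FHIR resource type -> rule.
# The structure and group names are illustrative, not HEIR's policy language.
POLICY = {
    "researcher": {
        "Patient": {"action": "redact", "attributes": ["name", "address", "telecom"]},
        "Observation": {"action": "allow"},
    },
    "billing": {
        "Patient": {"action": "allow"},
        "Observation": {"action": "block"},
    },
}

REDACTED = "XXXXX"  # assumed placeholder for redacted values


def apply_policy(resource, access_group, policy=POLICY):
    """Return the resource as the access group may see it, or None if blocked."""
    rule = policy.get(access_group, {}).get(resource.get("resourceType"))
    if rule is None or rule["action"] == "block":
        return None  # assumption: anything without an explicit rule is denied
    filtered = copy.deepcopy(resource)  # never modify the stored resource
    if rule["action"] == "redact":
        for attribute in rule.get("attributes", []):
            if attribute in filtered:
                filtered[attribute] = REDACTED
    return filtered


patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
}
print(apply_policy(patient, "researcher"))
```

In this sketch the access group would come from the verified JWT that accompanies the request; token verification is deliberately left out.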
Comprehensive information about every data transaction is written to a Kafka message queue, which is read by a blockchain-based logging component and stored in a blockchain ledger. A more in-depth description of the Privacy-Aware Framework can be found at https://medium.com/fybrik/using-fybrik-to-create-a-privacy-aware-framework-to-access-fhir-data-245aa1a4a6a4.
Overview of Parquet Modular Encryption in CyberKit4SME
CyberKit4SME (https://cyberkit4sme.eu/) is a 36-month H2020 project that started on 01/06/2020. It aims to democratize a kit of cyber security tools and methods enabling SMEs/MEs to: increase awareness of cybersecurity risks; monitor and forecast risks; manage risks using various security measures; and collaborate and share information in a collective security and data protection effort. As part of this work, IBM Research initiated and led joint work with the Apache Arrow community to expose the high-level API of Parquet Modular Encryption (PME) in PyArrow. This enables users to protect the confidentiality and integrity of sensitive data without degrading the performance of analytic systems written in Python.
Apache Parquet is the industry-leading standard for the formatting, storage and efficient processing of big data. Parquet Modular Encryption, also initiated and led by IBM Research as joint work with the Apache Parquet community as part of the H2020 ProTego project (https://protego-project.eu/), encrypts Parquet files module by module: the footer, page headers, column indexes, offset indexes, pages, etc. Thus, it not only enables granular control of the data based on access to per-column encryption keys, it also preserves all the benefits of efficient analytics on Parquet. These include column projection and predicate push-down, where entire file parts can be skipped if the metadata indicates that the part has no matching values. Both Apache Spark and Python applications using PyArrow can read and write Parquet files encrypted with PME; see https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#columnar-encryption and https://arrow.apache.org/docs/python/parquet.html#parquet-modular-encryption-columnar-encryption respectively.
Utilizing CyberKit4SME in HEIR
The HEIR transaction logs written to the blockchain, however, contain metadata that may be considered sensitive and should be released only on a need-to-know basis to authorized personnel. We can envision scenarios where different roles in the organization should be allowed to see different portions of the logged metadata. We can therefore take advantage of the Parquet Modular Encryption support being developed in CyberKit4SME to provide non-repudiable storage of, and policy-driven access to, this data.
For example, consider a transaction log with a number of attributes, and different roles that should be given access to different combinations of the fields in the transaction logs. Using PME, the transaction log can be saved in Parquet format and each attribute (column) can be independently encrypted with its own key. The transaction logs themselves can be stored in any storage, including public storage.
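As a rough illustration of per-column keys, the sketch below declares one master key per log column and passes that mapping to PyArrow's PME API. The column names, key identifiers and the `build_column_keys` and `write_encrypted_log` helpers are hypothetical and do not reflect the actual HEIR log schema. The `CryptoFactory` argument is assumed to have been built around a `KmsClient` for the deployment's KMS, which is not shown. PyArrow is imported inside the function so that the mapping logic can be read and run on its own.

```python
from datetime import timedelta

# Hypothetical transaction-log columns, each mapped to its own master key ID.
COLUMN_TO_KEY = {
    "timestamp": "key_timestamp",
    "requester": "key_requester",
    "query": "key_query",
    "outcome": "key_outcome",
}


def build_column_keys(column_to_key):
    """Invert column -> key ID into the key ID -> [columns] shape PyArrow expects."""
    column_keys = {}
    for column, key_id in column_to_key.items():
        column_keys.setdefault(key_id, []).append(column)
    return column_keys


def write_encrypted_log(table, path, crypto_factory, kms_connection_config,
                        footer_key="key_footer"):
    """Write a pyarrow.Table as a Parquet file with one encryption key per column.

    crypto_factory: a pyarrow.parquet.encryption.CryptoFactory for the target KMS.
    kms_connection_config: a pyarrow.parquet.encryption.KmsConnectionConfig.
    """
    import pyarrow.parquet as pq
    import pyarrow.parquet.encryption as pe

    encryption_config = pe.EncryptionConfiguration(
        footer_key=footer_key,
        column_keys=build_column_keys(COLUMN_TO_KEY),
        encryption_algorithm="AES_GCM_V1",
        cache_lifetime=timedelta(minutes=5),
    )
    encryption_properties = crypto_factory.file_encryption_properties(
        kms_connection_config, encryption_config
    )
    with pq.ParquetWriter(
        path, table.schema, encryption_properties=encryption_properties
    ) as writer:
        writer.write_table(table)


print(build_column_keys(COLUMN_TO_KEY))
```

Because each column is encrypted under a different master key, a reader who can unwrap only some of the keys can still project and decrypt just those columns.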
A key management system (KMS), such as HashiCorp Vault, can be used to manage the encryption keys and control access to them, with Keycloak as the OpenID Connect (OIDC) authentication backend. When a user authenticates with HEIR using Keycloak, the user receives an access token and is automatically assigned their role. This token is then used when reading the transaction log using PME. Vault will authorize the user to decrypt fields with the encryption keys assigned to their role. However, any attempt to decrypt a field using keys outside the role definition will fail.
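The role-based authorization described above can be modelled in a few lines. The sketch below is a stand-in for the KMS policy check only, not Vault's or Keycloak's API: the role names, key identifiers, column names and the `authorize_key` and `readable_columns` helpers are all hypothetical, and in a real deployment the role would be taken from the verified access token.

```python
# Hypothetical mapping of log columns to master key IDs.
COLUMN_TO_KEY = {
    "timestamp": "key_timestamp",
    "requester": "key_requester",
    "query": "key_query",
    "outcome": "key_outcome",
}

# Hypothetical role definitions: the key IDs each role is allowed to use.
ROLE_KEYS = {
    "auditor": {"key_timestamp", "key_requester", "key_query", "key_outcome"},
    "security_analyst": {"key_timestamp", "key_outcome"},
}


class KeyAccessDenied(PermissionError):
    """Raised when a role requests a key outside its definition."""


def authorize_key(role, key_id, role_keys=ROLE_KEYS):
    """Mimic the KMS decision: permit the key only if the role is entitled to it."""
    if key_id not in role_keys.get(role, set()):
        raise KeyAccessDenied(f"role {role!r} may not use key {key_id!r}")
    return key_id


def readable_columns(role, column_to_key=COLUMN_TO_KEY, role_keys=ROLE_KEYS):
    """List the log columns a role can decrypt, in schema order."""
    allowed = role_keys.get(role, set())
    return [column for column, key_id in column_to_key.items() if key_id in allowed]


print(readable_columns("security_analyst"))
try:
    authorize_key("security_analyst", "key_query")
except KeyAccessDenied as error:
    print("denied:", error)
```

A reader in the `security_analyst` role would therefore request only the columns returned by `readable_columns`; asking for any other column leads to a failed key request rather than to plaintext.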