Data Privacy in SQL Server based on Hippocratic Database Principles

Editor's Note: The following MVP Monday post is by SQL Server MVP Jasmin Azemović, Ph.D. and is part of our special series on SQL Server 2012.

 

Case study: Relational modeling part of eUniversity system

Abstract

Ensuring privacy in modern information systems is of primary importance to their users, whose trust in and use of a system depend on the degree of privacy it provides. A solution can be found in applying the "Hippocratic Database" (HDB) concept. The idea is to apply the basic principles of the Hippocratic Oath to databases in order to provide data privacy and confidentiality.

Introduction

The idea of Hippocratic databases (HDB) was inspired by the basic principles of the Hippocratic Oath, with the purpose of preserving privacy and secrecy in modern information systems. The initial research was presented by R. Agrawal, J. Kiernan, R. Srikant and Y. Xu in "Hippocratic databases" [3]. Privacy and secrecy are among the most important issues in the process of analyzing, designing and engineering a system. There is research on using this concept in business intelligence systems (Bhatti et al., 2008) and medical information systems, but not in eLearning systems. Our premise is that HDB principles can be implemented for eLearning systems and all their specific components by following the ten principles of HDB and the recommended standards.

An approach often taken is to enforce privacy policies at the application level. First, the application issues the query to the database and retrieves the result. Then, the application scans the resulting records and filters prohibited information (for example, by setting it to null). However, this approach leads to privacy leaks when applied at the cell level.

Take, for example, a student, Denis, who has decided to hide his name in the list of mathematics exam results, and suppose the system executes the following hypothetical query:

SELECT FirstName, LastName, Score
FROM Results
WHERE Subject = 'Math';

 

In this case the student's name will be hidden, but not his score, because the score is not covered by the privacy policy. It could very quickly be worked out that the unnamed row belongs to Denis, even though his name is not on the list. Policies for partial disclosure therefore need to be defined in a way that is more flexible to user needs.
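The leak can be reproduced with a minimal sketch. The HideName flag, the surnames and the sample rows are assumptions made purely for illustration; they are not part of the original system.

```sql
-- Hypothetical schema: HideName is an assumed per-student privacy flag.
CREATE TABLE Results (
    StudentID INT          NOT NULL,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL,
    Subject   NVARCHAR(50) NOT NULL,
    Score     INT          NOT NULL,
    HideName  BIT          NOT NULL DEFAULT 0
);

-- Invented sample data.
INSERT INTO Results (StudentID, FirstName, LastName, Subject, Score, HideName)
VALUES (1, N'Denis', N'Example', N'Math', 91, 1),
       (2, N'Amra',  N'Sample',  N'Math', 78, 0);

-- Cell-level masking: the name cells become NULL,
-- but the row itself and its score remain in the result.
SELECT CASE WHEN HideName = 1 THEN NULL ELSE FirstName END AS FirstName,
       CASE WHEN HideName = 1 THEN NULL ELSE LastName  END AS LastName,
       Score
FROM Results
WHERE Subject = N'Math';
```

The result still contains one row per student who sat the exam, so anyone who knows the class list can attribute the single unnamed row to Denis.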

This brings us to privacy itself: why is it so important to users?

Privacy is the right of every individual to decide when, how and how much of their information is available for storage and exchange between systems. Examples of privacy violations are not rare; on the contrary, the number of intentional and unintentional violations is increasing.

Hippocratic database in eUniversity environment

There is a growing trend in the use of all kinds of eLearning systems (LMS, CMS, eLearning, eUniversity). These systems collect a great amount of data, and a large part of it is very sensitive because it concerns private matters of students and can be misused. New trends require building three security elements into all phases of developing eLearning information systems. Elements like electronic student files, grades, exams and study mobility are very complex to build and maintain. That is because we need to protect content, services and personal data from outside intruders, while these systems also carry a risk of privacy violation by inside staff (administrators and educational staff). Solving this problem in eLearning environments will provide similar solutions in other areas such as eGovernment, eHealth and eCommerce.

One solution is to apply research from the Hippocratic database (HDB) area. The whole idea is inspired by the basic principles of the Hippocratic Oath, applied to database systems.

HDB is based on ten principles which, if they are used and implemented properly, can provide and guarantee privacy of data.

1.   Purpose Specification

2.   Consent

3.   Limited Collection

4.   Limited Use

5.   Limited Disclosure

6.   Limited Retention

7.   Accuracy

8.   Safety

9.   Openness

10. Compliance

 

Purpose specification

Modeling the first principle means that every record in the database should have a very precisely defined purpose. In eLearning environments that can be reporting, statistical analysis, entering grades, attendance, etc. In our model, this principle is implemented through the student_purpose table. It sits between the students and purpose tables and creates a many-to-many relation, which is necessary when one record needs to have many purposes (Figure 1).
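A many-to-many relation of this kind could be sketched as follows. Only the table names come from the model; every column name is an assumption, since Figure 1 is the authoritative definition.

```sql
-- Column names are hypothetical; only the table names come from the model.
CREATE TABLE students (
    StudentID INT          NOT NULL PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL
);

CREATE TABLE purpose (
    PurposeID   INT           NOT NULL PRIMARY KEY,
    PurposeName NVARCHAR(100) NOT NULL UNIQUE  -- e.g. 'reporting', 'attendance'
);

-- Junction table: one row per (student record, purpose) pair,
-- so a single record can carry many purposes.
CREATE TABLE student_purpose (
    StudentID INT NOT NULL REFERENCES students (StudentID),
    PurposeID INT NOT NULL REFERENCES purpose (PurposeID),
    PRIMARY KEY (StudentID, PurposeID)
);

-- List every purpose attached to one student's record.
SELECT p.PurposeName
FROM student_purpose AS sp
JOIN purpose AS p ON p.PurposeID = sp.PurposeID
WHERE sp.StudentID = 1;
```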

Consent

The information provider, in our case the student, has the full right to give or deny consent to the use of personal data for specific attributes that are not essential for system functionality. Essential attributes include ID attributes, first name and last name. On the other hand, protected data can be grades, phone number, email, credit card number, etc. The proposed model solves this problem using the object and attribute_consent tables. The model also makes it possible to give or deny consent for each attribute in any table (Figure 1).
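Per-attribute consent could be recorded roughly as shown here. The table names object and attribute_consent come from the model, while the columns and the lookup query are assumptions made for illustration.

```sql
-- Hypothetical columns. [object] is bracketed to avoid any keyword clash.
CREATE TABLE [object] (
    ObjectID   INT           NOT NULL PRIMARY KEY,
    ObjectName NVARCHAR(128) NOT NULL UNIQUE  -- a table in the model, e.g. 'students'
);

CREATE TABLE attribute_consent (
    StudentID     INT           NOT NULL,  -- would reference the students table in the full model
    ObjectID      INT           NOT NULL REFERENCES [object] (ObjectID),
    AttributeName NVARCHAR(128) NOT NULL,  -- column the consent applies to
    Consent       BIT           NOT NULL DEFAULT 0,  -- 0 = denied, 1 = given
    PRIMARY KEY (StudentID, ObjectID, AttributeName)
);

-- Has student 1 consented to the use of the Email attribute of the students table?
SELECT ac.Consent
FROM attribute_consent AS ac
JOIN [object] AS o ON o.ObjectID = ac.ObjectID
WHERE ac.StudentID = 1
  AND o.ObjectName = N'students'
  AND ac.AttributeName = N'Email';
```

In this sketch a missing row returns no result, which a caller would most safely treat as consent denied; that default is a design assumption, not something the model prescribes.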

Limited collection

The minimum amount of information about students that is necessary for business processes is defined by state laws and/or University policies. This principle can't have a precise technical implementation without a universal set of rules about the collection of data. For the sake of the HDB principle, we propose that the storing of data should be minimal. For example, information about places of birth or parents' names is irrelevant from the perspective of an eLearning system.

Figure 1. eUniversity HDB relational model

 

Limited use

Each query and/or stored procedure (or other object) in the data access layer is defined and tagged with its purpose in a corresponding table. In the model, this is implemented with the dataAccess_purpose object (Figure 1).
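One way to picture this tagging is a lookup table consulted before a data-access object runs. The table name follows the model; the columns, the procedure name usp_GetExamResults and the check itself are hypothetical.

```sql
-- Hypothetical columns; only the table name comes from the model.
CREATE TABLE dataAccess_purpose (
    DataAccessName NVARCHAR(128) NOT NULL,  -- name of a query or stored procedure
    PurposeID      INT           NOT NULL,  -- would reference the purpose table
    PRIMARY KEY (DataAccessName, PurposeID)
);

-- Tag an assumed stored procedure with purpose 1.
INSERT INTO dataAccess_purpose (DataAccessName, PurposeID)
VALUES (N'usp_GetExamResults', 1);

-- Check whether the procedure is tagged for the purpose at hand.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM dataAccess_purpose
           WHERE DataAccessName = N'usp_GetExamResults'
             AND PurposeID = 1
       ) THEN 1 ELSE 0 END AS IsAllowed;
```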

Limited disclosure

Generally, it is very hard to define what counts as outside access in eLearning environments. Mostly, they are closed systems within University boundaries, so there is no frequent need for outside access to students' data. However, there are Universities that are connected and open to the exchange of students, knowledge and educational staff.

Limited retention

The biggest problem in implementing HDB principles is the information retention period in eLearning systems. In a nutshell, this principle states that records should be erased from the database after fulfilling their purpose. But in eLearning and eUniversity systems, a limited retention period is not defined. For example, imagine that a student graduates or leaves and the university administration staff delete all electronic traces of the education process. That action would rule out all of the following operations: retroactive analysis, statistical analysis, issuing a diploma or diploma supplement, and any other operation that involves the student's data.

The second contribution of this research is a model that resolves this paradox (Figure 2): it keeps the data and satisfies the limited retention principle at the same time. We suggest that all of a student's data should be de-normalized from the relational model into a data warehouse (DW). That process should be executed after the student graduates or leaves the university. The next step is to use public key cryptography to protect the data in the DW. One copy of the key should be held by the student and the other by the University, so the data can be decrypted at the personal request of the student or with their permission.

The important thing is that the proposed model allows statistical analysis in the DW on protected data without revealing student identity or any other personal data. This is possible because non-critical data that does not reveal personal identity, such as StudentID, year of enrollment and grades in specific subjects, is not encrypted. From that information it is not possible to learn the name, address, phone number or any other sensitive personal data (Figure 3).
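The split between encrypted and unencrypted columns might look like the sketch below, which uses SQL Server's built-in asymmetric key functions. The key name, table, columns, password placeholder and sample values are all assumptions. The sketch also does not model how key copies are shared between the student and the University; here the password protecting the private key stands in for that step.

```sql
-- Hypothetical key; the private key is protected by a password placeholder.
CREATE ASYMMETRIC KEY StudentArchiveKey
    WITH ALGORITHM = RSA_2048
    ENCRYPTION BY PASSWORD = 'Replace-With-Strong-Passphrase-1!';

-- De-normalized DW table: identity is encrypted, analytical columns are not.
CREATE TABLE dw_student_archive (
    StudentID      INT            NOT NULL,
    EnrollmentYear SMALLINT       NOT NULL,
    Subject        NVARCHAR(50)   NOT NULL,
    Grade          TINYINT        NOT NULL,
    FullNameEnc    VARBINARY(256) NOT NULL,  -- RSA_2048 ciphertext is 256 bytes
    PRIMARY KEY (StudentID, Subject)
);

-- Invented sample row; the name is encrypted with the public key.
INSERT INTO dw_student_archive (StudentID, EnrollmentYear, Subject, Grade, FullNameEnc)
VALUES (1, 2008, N'Math', 9,
        ENCRYPTBYASYMKEY(ASYMKEY_ID('StudentArchiveKey'), N'Denis Example'));

-- Statistical analysis works without touching any encrypted column.
SELECT Subject, AVG(CAST(Grade AS DECIMAL(4, 2))) AS AvgGrade
FROM dw_student_archive
GROUP BY Subject;

-- Decryption requires the password that protects the private key.
SELECT CONVERT(NVARCHAR(100),
               DECRYPTBYASYMKEY(ASYMKEY_ID('StudentArchiveKey'), FullNameEnc,
                                N'Replace-With-Strong-Passphrase-1!')) AS FullName
FROM dw_student_archive
WHERE StudentID = 1;
```

Asymmetric encryption is comparatively slow and limits the plaintext size, so a production design would more likely encrypt the data with a symmetric key and protect that key with the asymmetric one.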

Figure 2. eUniversity retention period

 

 

Figure 3. Encrypted data in the DW

Accuracy

Accuracy of data, and prevention of tampering with database objects and the data itself, can be provided using the existing model for tamper detection. That model is a result of previous research (Jasmin Azemović, Denis Mušić, Efficient model for detection data and data scheme tempering with purpose of valid forensic analysis) [6].

Safety

Safety of personal data is provided by using a role-based access model. For example, financial data should be accessed only by the accounting department, and grades from a mathematics course are of no relevance to users from software engineering. Access to data that a user has no need to know can create preconceived opinions about an individual or lead to misuse. Object authorization and roles implement this HDB principle (Figure 1).
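Role-based access of this kind can be sketched with database roles and object permissions. The table and role names are hypothetical and are not taken from Figure 1.

```sql
-- Hypothetical tables holding financial data and grades.
CREATE TABLE student_payments (
    PaymentID INT            NOT NULL PRIMARY KEY,
    StudentID INT            NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL
);

CREATE TABLE student_grades (
    StudentID INT          NOT NULL,
    Subject   NVARCHAR(50) NOT NULL,
    Grade     TINYINT      NOT NULL,
    PRIMARY KEY (StudentID, Subject)
);

-- One role per group of users with a shared need to know.
CREATE ROLE accounting_staff;
CREATE ROLE teaching_staff;

-- Each role receives only the permissions it needs; members of
-- accounting_staff get no access to grades through this role, and vice versa.
GRANT SELECT ON student_payments TO accounting_staff;
GRANT SELECT, INSERT, UPDATE ON student_grades TO teaching_staff;

-- Users are then added to roles, for example (the user name is a placeholder):
-- ALTER ROLE accounting_staff ADD MEMBER [some_database_user];
```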

Openness

A student should have access, at any time, to all private, personal and other data collected during their studies. This principle should be implemented in the user interface.

Compliance

Whoever provides the information should have insight into its usage and be able to review the access history of personal or any other data. With this principle we can provide transparency for students about how the other HDB principles are applied. The model supports this with the student_access_log object, which is connected to other tables and collects the relevant data (Figure 1).
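An access log along these lines could look like the following sketch. The table name comes from the model; the columns and the reporting query are assumptions.

```sql
-- Hypothetical columns; only the table name comes from the model.
CREATE TABLE student_access_log (
    LogID      INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    StudentID  INT           NOT NULL,                        -- whose data was accessed
    AccessedBy NVARCHAR(128) NOT NULL DEFAULT SUSER_SNAME(),  -- login that accessed it
    ObjectName NVARCHAR(128) NOT NULL,                        -- table or procedure touched
    PurposeID  INT           NOT NULL,                        -- would reference the purpose table
    AccessedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Record one access; login and timestamp are filled in by the defaults.
INSERT INTO student_access_log (StudentID, ObjectName, PurposeID)
VALUES (1, N'student_grades', 1);

-- Access history that a student could be shown for their own data.
SELECT AccessedAt, AccessedBy, ObjectName, PurposeID
FROM student_access_log
WHERE StudentID = 1
ORDER BY AccessedAt DESC;
```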

Conclusion

Privacy, security and access control are elements that are implemented at the object and row level of a database. Current trends and solutions put privacy issues low on the priority list and leave them to company security policy. Examples of privacy violations show how that can end. Access control and security mechanisms should be part of the technology itself in order to provide and preserve the privacy of data.

This article shows how the ten HDB principles can be implemented in an eUniversity system.

References

[1]. Alan Westin, Professor Emeritus of Public Law and Government, Columbia University

[2]. P. Ashley and D. Moore. Enforcing privacy within an enterprise using IBM Tivoli Privacy Manager for e-business

[3]. R. Agrawal, J. Kiernan, R. Srikant and Y. Xu. Hippocratic databases. In The 28th International Conference on Very Large Databases (VLDB)

[4]. D. Mušić, J. Azemović, Mohamed El-Zayat, „Component of the efficient eUniversity system“, The 2nd IEEE International Conference on Computer Science and Information Technology, 2009.

[5]. V. Bevanda, J. Azemović, D. Mušić, „Privacy preserving in eLearning environment (Case of modelling Hippocratic database structure)“, 4th Balkan Conference in Informatics, Thessaloniki, Greece, 2009.

[6]. J. Azemović, D. Mušić, „Efficient model for detection data and data scheme tempering with purpose of valid forensic analysis“, ICCEA 2009, Manila, Philippines.

[7]. D. Mušić, J. Azemović, „Applying Case-based reasoning for mobile support in diagnosing infective diseases“, ICCDA 2009, Singapore.

[8]. D. Mušić, J. Azemović, E. Čatrnja, „Influence of learning communities and collaborative learning on students’ success“, ICSTE 2009, Chennai, India.

[9]. Sabah S. Al-Fedaghi: Beyond Purpose-Based Privacy Access Control, Eighteenth Australasian Database Conference (ADC 2007), Ballarat, Australia. CRPIT, 63. Bailey, J. and Fekete, A., Eds. ACS. 23-32.

[10]. Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt, Limiting disclosure in Hippocratic databases, Proceedings of the Thirtieth International Conference on Very Large Data Bases, p. 108-119, August 31-September 03, 2004, Toronto, Canada

[11]. Norjihan Abdul Ghani, Zailani Mohd Sidek: Hippocratic Database: A Privacy-Aware Database, Proceedings of World Academy of Science, Engineering and Technology, Volume 32, August 2008, ISSN: 2070-3740

[12]. Jae-Gil Lee, Kyu-Young Whang, Wook-Shin Han, Il-Yeol Song: Hippocratic XML Databases: A Model and an Access Control Mechanism, Journal of Computer Systems Science and Engineering, Vol. 21, No. 6, pp. 395-404, Nov. 2006

 

About the author

Jasmin hails from Bosnia and Herzegovina. Working with database systems has been his profession since 1996, and he has enjoyed working with SQL Server since SQL Server 2000. His biggest experience with SQL Server was designing, engineering and implementing the data layer for an eUniversity system. The project started in 2002 and over 10 years has evolved through new technologies and special SQL Server features (mirroring, filestream, encryption etc.).

He is also on the Faculty of Information Technology, where he teaches database and security subjects and courses.

Jasmin has a master's degree in forensic analysis and tamper detection in databases and a PhD in creating a model for a privacy-aware eUniversity system. He is also the leader of the SQL/Dev User Group (F5). Additionally, he talks more about SQL in his SQL Tales on his blog and in YouTube videos.

About MVP Mondays

The MVP Monday Series is created by Melissa Travers. In this series we work to provide readers with a guest post from an MVP every Monday. Melissa is a Community Program Manager for Dynamics, Excel, Office 365, Platforms and SharePoint in the United States. She has been working with MVPs since her early days as a Microsoft Exchange Support Engineer, when MVPs would answer all the questions in the old newsgroups before she could get to them.
