
Group 7 - Data privacy in Software Engineering

Mentor and Leader:


  • Jonathan Bell
  • Éva Csernusné Ádámkó
  • Qi Dawei
  • Csaba Nagy
  • Swapneel Sheth



Different areas of software engineering are rife with interesting and important problems. In particular, software testing is concerned with finding bugs in programs, and program comprehension addresses the cognitive issues that software engineers experience when maintaining existing code. Problems in these and other areas of software engineering are often studied in isolation from problems in other fields, data privacy in particular, because software engineering problems are viewed as orthogonal to data privacy problems. However, software engineering and data privacy are strongly connected: protecting data affects solutions in several areas of software engineering. Researchers are beginning to address problems that lie at the intersection of data privacy and software engineering, and studying these problems promises a big impact on how software is designed, built, and deployed.


The goal of this working group is to investigate (i) how different data privacy approaches interfere with software testing and software maintenance in general, (ii) how research at the intersection of software engineering and data privacy could balance the goals of software practitioners and data loss prevention experts, and (iii) the main open challenges in achieving data privacy without destroying the utility of data for software engineering tasks.


The study will be conducted by reading papers published in these areas and by interviewing data privacy experts from both academia and industry during the ESEC-FSE conference, in order to collect information that addresses our goal.

To this end, the working group should identify a set of questions to be asked, for example:

1) Investigate implicit dependencies between attributes in database-centric applications (DCAs). For example, if access to the values of some attributes depends on the values of other attributes, can this information lead to better anonymization decisions?

2) How can anonymization techniques be guided in suppressing and generalizing database values so that previously uncovered paths can be triggered with anonymized data?

3) Is it possible to drastically improve both anonymization and test outsourcing processes by pinpointing the database attributes that should be anonymized, based on their effect on corresponding DCAs?

4) How can stakeholders be enabled to balance the conflicting goals of finding a minimum-cost anonymization solution and preserving the test coverage of DCAs?
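To make the tension behind questions 2 and 4 concrete, here is a minimal sketch of generalization and suppression in the spirit of k-anonymity [7]. Everything in it is an assumption made for illustration: the records, the choice of age and ZIP code as quasi-identifiers, the function names, and the toy `branches_covered` function that stands in for a DCA. It is not the method of any of the papers listed below.

```python
from collections import Counter

# Hypothetical records: (age, zip_code, diagnosis). Age and ZIP code act as
# quasi-identifiers; the diagnosis is the sensitive attribute.
RECORDS = [
    (34, "60607", "flu"),
    (36, "60608", "asthma"),
    (38, "60607", "flu"),
    (71, "60615", "diabetes"),
]


def generalize(record):
    """Coarsen the quasi-identifiers.

    The age is replaced by the lower bound of its decade (30 stands for the
    range 30-39) and the ZIP code keeps only its first three digits.
    """
    age, zip_code, diagnosis = record
    return (age // 10 * 10, zip_code[:3] + "**", diagnosis)


def group_sizes(records):
    """Count how many records share each (age, zip_code) combination."""
    return Counter((age, zip_code) for age, zip_code, _ in records)


def anonymize(records, k):
    """Generalize every record, then suppress groups with fewer than k members."""
    generalized = [generalize(r) for r in records]
    sizes = group_sizes(generalized)
    return [r for r in generalized if sizes[(r[0], r[1])] >= k]


def is_k_anonymous(records, k):
    """Check that every quasi-identifier combination occurs at least k times."""
    return all(size >= k for size in group_sizes(records).values())


def branches_covered(ages):
    """Toy stand-in for a DCA: report which branches a set of ages triggers."""
    covered = set()
    for age in ages:
        covered.add("senior" if age >= 65 else "adult")
    return covered


if __name__ == "__main__":
    anonymized = anonymize(RECORDS, k=2)
    print("2-anonymous:", is_k_anonymous(anonymized, 2))
    print("coverage, original data:  ", branches_covered(r[0] for r in RECORDS))
    print("coverage, anonymized data:", branches_covered(r[0] for r in anonymized))
```

With these assumed inputs, the single record in the 70-79 age group is suppressed to reach k = 2, so the anonymized data no longer triggers the "senior" branch that the original data covered. This is the kind of coverage loss that a coverage-aware anonymization strategy would try to avoid.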

[1] M. Grechanik, C. Csallner, C. Fu, and Q. Xie. Is data privacy
always good for software testing? In ISSRE, pages 368–377,

[2] M. Castro, M. Costa, and J.-P. Martin. Better bug reporting
with better privacy. In Proc. 13th ASPLOS, pages 319–328.
ACM, Mar. 2008.

[3] J. Brickell and V. Shmatikov. The cost of privacy: destruction
of data-mining utility in anonymized data publishing. In
KDD ’08, pages 70–78, New York, NY, USA, 2008. ACM.

[4] A. Budi, D. Lo, L. Jiang, and Lucia. kb-anonymity: a model
for anonymized behaviour-preserving test and debugging
data. In PLDI, pages 447–457, 2011.

[5] J. A. Clause and A. Orso. Camouflage: automated
anonymization of field data. In ICSE, pages 21–30, 2011.

[6] K. Taneja, M. Grechanik, R. Gahi, and T. Xie. Testing software in age of data privacy: a balancing act. In ESEC/FSE, 2011.

[7] L. Sweeney. k-anonymity: A model for protecting privacy.
International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 10(5):557–570, 2002.