An Empirical Comparison of Widely Adopted Hash Functions in Digital Forensics: Does the Programming Language and Operating System Make a Difference?

Satyendra Gurjar, Ibrahim Baggili, Frank Breitinger, Alice Fischer

Abstract


Hash functions are widespread in computer science and have a wide range of applications, such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables), and identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation, which is especially important in digital forensics because of the large amount of data that must be processed when working on cases. In this paper, we relate the runtime efficiency of common hashing algorithms (MD5 and the SHA family) to their implementations. Our empirical comparison covers C with OpenSSL, Python, Ruby, and Java on Windows and Linux, and C with the WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing processes. In each programming language, we compute the MD5, SHA-1, SHA-256, and SHA-512 digests of datasets ranging from 2 MB to 1 GB. For each combination of language, algorithm, and dataset, we perform multiple runs and compute the average elapsed time. In our experiments, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all hashing algorithms and data sizes on Windows and Linux. However, on Windows, the performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512.
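The measurement procedure described above (hash a dataset several times per algorithm, then average the elapsed time) can be sketched in Python roughly as follows. This is only an illustrative sketch, not the authors' benchmark harness: the function name `average_hash_time`, the run count, the chunk size, and the zero-filled 2 MB buffer standing in for a dataset are all assumptions made for the example.

```python
import hashlib
import time

# The four algorithms named in the abstract, as hashlib identifiers.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def average_hash_time(algorithm, data, runs=5, chunk_size=1 << 20):
    """Hash `data` `runs` times; return (hex digest, mean elapsed seconds).

    Hypothetical helper for illustration; `runs` and `chunk_size` are
    arbitrary choices, not values taken from the paper.
    """
    view = memoryview(data)  # slice without copying the buffer
    total = 0.0
    digest = ""
    for _ in range(runs):
        hasher = hashlib.new(algorithm)
        start = time.perf_counter()
        # Feed the data in fixed-size chunks, as a file-hashing tool would.
        for offset in range(0, len(view), chunk_size):
            hasher.update(view[offset:offset + chunk_size])
        digest = hasher.hexdigest()
        total += time.perf_counter() - start
    return digest, total / runs


if __name__ == "__main__":
    # Assumption: a zero-filled 2 MB buffer stands in for a real dataset.
    data = bytes(2 * 1024 * 1024)
    for name in ALGORITHMS:
        digest, seconds = average_hash_time(name, data)
        print(f"{name:>7}: {seconds * 1000:.3f} ms  {digest[:16]}...")
```

Timing only the hashing loop keeps buffer allocation out of the measurement. A fuller benchmark would also need to control for disk I/O, caching, and warm-up effects, which this sketch ignores.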

Keywords


Digital forensics, hashing, micro benchmarking, security, tool building






Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.


(c) 2006-2015 Association of Digital Forensics, Security and Law