Phishing Intelligence Using the Simple Set Comparison Tool

Jason Britt, Alan Sprague, Gary Warner

Abstract


Phishing websites (phish) attempt to deceive users into exposing their passwords, user IDs, and other sensitive information by imitating legitimate websites, such as those of banks, product vendors, and service providers. Phishing investigators need fast, automated tools to analyze the volume of phishing attacks seen today. In this paper, we present the Simple Set Comparison tool, a fast, automated tool that groups phish by imitated brand, allowing phishing investigators to quickly identify and focus on phish targeting a particular brand. The tool is evaluated against a traditional clustering algorithm over a month's worth of phishing data comprising 19,825 confirmed phish. The results show that it produces clusters of comparable quality more than 37 times faster than the traditional clustering algorithm.

Keywords


phishing, phish kits, phishing investigation, data mining, parallel processing




Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

(c) 2006-2015 Association of Digital Forensics, Security and Law