### HOT ZONE IDENTIFICATION: ANALYZING EFFECTS OF DATA SAMPLING ON SPAM CLUSTERING

#### Abstract

#### Keywords

#### Full Text:

PDF#### References

Arlia, D. & Coppola, M. (2001). Experiments in parallel clustering with dbscan. Euro-Par 2001 Parallel Processing. Lecture Notes in Computer Science, 2150. Springer Berlin Heidelberg, 326–331.

Birant, D. & Kut, A. (2007). ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 60(1), 208 – 221.

Caruana, G. & Li, M. (2008). A survey of emerging approaches to spam filtering. ACM Computing Surveys, 44(2), 9:1–9:27.

Dagon, D., Gu, G., Lee, C., & Lee, W. (2007). A taxonomy of botnet structures. Proceedings of the 23rd Annual Computer Security Applications Conference. ACSAC ’07, 325–339.

Ganti, V., Ramakrishnan, R., Gehrke, J., & Powell, A. (1999). Clustering large datasets in arbitrary metric spaces. Proceedings of the 15th International Conference on Data Engineering (ICDE ’99). IEEE Computer Society, Washington, DC, USA.

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17, December, 2-3, 107–145.

Hammersley, J. M., Handscomb, D. C., & Weiss, G. (1965). Monte Carlo methods. Physics Today, 18, 55.

Hartigan, J. A. & Wong, M. A. (1979). Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28(1), 100–108.

Jaccard, P. (1901). Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 241–272.

Kanungo, T., Mount, D., Netanyahu, N., Piatko, C., Silverman, R., & Wu, A. (2002). An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 881–892.

Knuth, D. E. (2006). The art of computer programming. 4, fascicle 4, 1. print.. Generating all trees. Addison-Wesley.

Koontz, W. L. G., Narendra, P. M., & Fukunaga, K. (1975). A Branch and Bound Clustering Algorithm. IEEE Transactions on Computers, 24(9), 908–915.

Kyriakopoulou, A. & Kalamboukis, T. (2008). Combining clustering with classification for spam detection in social bookmarking systems. Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Discovery Challenge, (ECML/PKDD RSDC ’08), 47–54.

Levchenko, K., Pitsillidis, A., Chachra, N., Enright, B., Halvorson, T., Kanich, C…Savage, S. (2011). Click trajectories: End-to-end analysis of the spam value chain. Proceedings of The IEEE Symposium on Security & Privacy, 431–446.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady. 10(8 Feb), 707–710.

Matsumoto, M. & Nishimura, T. (1998). Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special issue on uniform random number generation, 8(1 Jan), 3–30.

Moore, T., Clayton, R., & Anderson, R. (2009). The economics of online crime. The Journal of Economic Perspectives, 23(3), 3–20.

Nagwani, N. K. & Bhansali, A. (2010). An Email Clustering Model Using Weighted Similarities between Emails Attributes. International Journal of Research and Reviews in Computer Science (IJRRCS), 1, 2.

Nhung, N. P. & Phuong, T. M. (2007). An efficient method for filtering image-based spam e-mail. Proceedings of The 12th international conference on Computer analysis of images and patterns, (CAIP’07). Springer-Verlag, Berlin, Heidelberg, 945–953.

Ono, K., Kawaishi, I., & Kamon, T. (2007). Trend of Botnet Activities. Proceedings of the 41st Annual IEEE International Carnahan Conference on Security Technology, (ICCST) ’07, 243–249.

Ramachandran, A., Feamster, N., & Vempala, S. (2007). Filtering spam with behavioral blacklisting. Proceedings of the 14th ACM Conference on Computer and Communications Security, (CCS) ’07. ACM, New York, NY, USA, 342–351.

Sasaki, M. & Shinnou, H. (2005). Spam detection using text clustering. Proceedings of the International Conference on Cyberworlds, 4(4), p. 319.

Thomas, K., Grier, C., Ma , J., Paxson , V., & Song, D. (2011). Design and evaluation of a real-time url spam filtering service. Proceedings of the 2011 IEEE Symposium on Security and Privacy, (S&P ’11), IEEE, 447–462.

UAB-CIS. (2013). Department of CIS, University of Alabama at Birmingham, UAB Spam Data Mine. Retrieved from http://www.cis.uab.edu/UABSpamDataMine.

Vitter, J. S. (1985). Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1 Mar), 37–57.

Wei, C. (2010). Clustering Spam Domains and Hosts: Anti-Spam Forensics with Data Mining. Ph.D. thesis, University of Alabama at Birmingham.

Wei, C., Sprague, A., & Warner, G. (2009). Clustering malware-generated spam emails with a novel fuzzy string matching algorithm. Proceedings of the 2009 ACM symposium on Applied Computing, (SAC ’09), ACM, New York, NY, USA, 889–890.

Ying, W., Kai, Y., & Zhong, Jian Z. (2010). Using DBSCAN clustering algorithm in spam identifying. Proceedings of the 2nd International Conference on Education Technology and Computer. (ICETC) ’10, 1, 398–402.

### Refbacks

- There are currently no refbacks.

This work is licensed under a Creative Commons Attribution 4.0 International License.

(c) 2006-2015 Association of Digital Forensics, Security and Law