Sandeep V. Sabnani (2008) Computer Security: A Machine Learning Approach.
Full text access: Open
In this thesis, we apply machine learning to computer security, in particular to intrusion detection. We analyse two learning algorithms, NBTree and VFI, on the task of detecting intrusions and compare their performance. We then comment on the suitability of NBTree for intrusion detection, based on its high accuracy and high recall. Finally, we discuss the usefulness of machine learning to the field of computer security and comment on the security of machine learning itself.
This is a published version. This version's date is: 07/01/2008. This item is peer reviewed.
https://repository.royalholloway.ac.uk/items/eb400e6b-efbd-8729-78e9-ae1e787835c3/1/
Deposited on 24-Jun-2010 in Royal Holloway Research Online. Last modified on 15-Dec-2010.