Machine Learning Based Anomaly Detection System for IoT Devices
Palash Jain
Assistant Professor at GLA University, Mathura, E-mail: palash.jn5@gmail.com

Abstract: IoT will transform our lives by enabling billions of things (homes, cars, phones, wearables, etc.) to be connected anytime, anyplace, to anything and anyone. According to Middleton, Kjeldsen, & Tully (2013), the Internet of Things will include 26 billion installed units by 2020. Security remains a major issue in IoTivity. Recently, IoT bots caused havoc by taking KrebsOnSecurity down. Our tests on a Tizen 2.4 TV showed that the Samsung open-source IoT stack (iotivity.org) has no default security measures against application-level DoS and DDoS attacks. One traditional approach is to use a rules-based engine, which triggers alerts according to manually configured thresholds. Such systems lack data fusion and learning capabilities and therefore fail to cope with large amounts of complex, high-dimensional data. In this paper we describe a generic analytics engine that provides robust alerts on changes and anomalies in a sensory data stream in an IoT scenario.
Index Terms: Machine Learning, IoT devices, Anomaly Detection, K-Means, Clustering, Network Security, Tizen.
1	INTRODUCTION
The Internet of Things (IoT) is a smart network that connects things to one another and to the internet so that they can exchange information and any device can be accessed anytime, by anyone, from anywhere. In an IoT network, things or objects are wirelessly connected through tiny smart sensors, and IoT devices can interact with each other without human intervention. Many domains, such as healthcare, automotive, energy, industrial, retail, and smart buildings and homes, widely use IoT protocols. According to [1] (Middleton, Kjeldsen, & Tully, 2013), the Internet of Things will include 26 billion installed units by 2020, and IoT product and service suppliers will generate incremental revenue exceeding $300 billion, mostly in services, in 2020.
The huge volume of IoT data generated by devices is extremely dynamic, heterogeneous, imperfect and unprocessed [2]. Furthermore, it usually requires real-time analysis and decision making. Therefore, most existing implementations settle for basic analysis and statistics, or make many assumptions on the collected data, usually relevant to specific domains.
IoT deployments span different devices across domains, where each device has its own operating system or platform and its own connection protocol (wireless, wired, Zigbee, etc.). This makes it difficult to design an intrusion detection system that effectively serves device-specific, platform-specific, and operating-system-specific needs. Predefined rule-based intrusion detection engines such as Snort can detect anomalies whose signatures are already known, but they fail to identify new, unrecognized threats. Moreover, in IoT, hardware devices (such as bulbs and ovens) are controlled by other devices, and any overuse or misuse can cause physical, potentially life-threatening harm to the device and to the person using it [3,4,5,6].
2	IOT DATA COLLECTION STRATEGY AND APPROACHES
The Samsung IoT stack (iotivity.org) runs on Linux, Tizen, Android, and Windows platforms. It has two parts: a client and a server. In normal scenarios, a client device discovers server devices and sends REST API calls to get and set data elements. One strength of the IoT stack is its resource model: every hardware device is represented by a software resource, and each software resource exposes a standard REST API (GET, SET, POST, etc.) for communicating with other devices. We compiled the IoT stack on a Tizen 2.4 TV (Jazz M board) and ran it as a daemon. Every daemon contains the IoT client and server stacks for sending and receiving data, respectively. Before any data can be sent, the devices must be provisioned.
2.2	Data Collection Strategy
On the Tizen TV (Jazz M), we created generic resources that control their respective hardware values. We chose two types of resources that are widely used:
Dual-state resource: can have only two states (Power, Mute).
Multi-state resource: can have multiple states (Contrast, Volume).
We created and exported Volume, Power, Channels, Brightness, Contrast, Connectivity, USB mode, PIP mode, Mute, and Standby mode as resources on each TV. Here, exporting means making a resource discoverable by other authorized TV devices running the IoT stack, so that any other authorized TV can control this hardware.
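The two resource types can be illustrated with a minimal Python sketch. This is not the IoTivity API: the class names `DualStateResource` and `MultiStateResource`, the `handle` method, and the 0 to 100 value range are hypothetical and serve only to show how a software resource might validate and apply GET and SET requests.

```python
from dataclasses import dataclass


@dataclass
class DualStateResource:
    """A resource with exactly two states, e.g. Power or Mute."""
    name: str
    value: bool = False

    def handle(self, operation, value=None):
        # GET returns the cached state; SET validates and replaces it.
        if operation == "GET":
            return self.value
        if operation == "SET":
            if not isinstance(value, bool):
                raise ValueError(f"{self.name} accepts only True/False")
            self.value = value
            return self.value
        raise ValueError(f"unsupported operation: {operation}")


@dataclass
class MultiStateResource:
    """A resource with a range of states, e.g. Volume or Contrast."""
    name: str
    value: int = 0
    minimum: int = 0    # assumed range, for illustration only
    maximum: int = 100

    def handle(self, operation, value=None):
        if operation == "GET":
            return self.value
        if operation == "SET":
            # bool is a subclass of int in Python, so reject it explicitly.
            valid = (isinstance(value, int) and not isinstance(value, bool)
                     and self.minimum <= value <= self.maximum)
            if not valid:
                raise ValueError(
                    f"{self.name} must be an integer in "
                    f"[{self.minimum}, {self.maximum}]")
            self.value = value
            return self.value
        raise ValueError(f"unsupported operation: {operation}")


power = DualStateResource("Power")
volume = MultiStateResource("Volume")
power.handle("SET", True)
volume.handle("SET", 12)
print(power.handle("GET"), volume.handle("GET"))
```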
• Palash Jain is currently an Assistant Professor in the Department of Computer Science Engineering at GLA University, Mathura, India. Ph: 9406984346. E-mail: palash.jn5@gmail.com
2.2.1	Collecting Normal Data
We then performed normal operations on the TV, i.e., sending SET and GET calls to another TV with a resource name (Brightness, etc.) and its value. This normal data is logged with a time stamp and other attributes.
2.2.2	Collecting Attack Data
We tried various tools such as Metasploit, BeEF, Burp Suite, and Skipfish on Kali Linux to simulate attacks, but most of these tools target the device itself (essentially at the network level) or browser-based web applications, and they could not generate attacks on an IoT application running on the device. To simulate application-level DoS and DDoS attacks, we created dedicated applications that access resources by issuing GET/SET operations at a very high frequency.
We developed IoT client applications to simulate application-level DoS attacks on the IoT server running on the Tizen TV. We sent a large number of GET calls from the IoT client application (Linux PC) to the IoT server TV to read the current resource values, and observed that the TV keeps the current resource values in memory and does not retrieve them from the hardware. Because the hardware was neither contacted nor affected, and each GET call was satisfied by a memory read, we did not collect samples for GET calls. We collected samples for SET, CREATE, DELETE, and UPDATE calls that rapidly set hardware resource values for the resources mentioned above.
For example, we alternated the state of dual-state resources, such as POWER ON / POWER OFF and USB READ / WRITE.
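To make the shape of this traffic concrete, the following sketch builds the sequence of SET requests for a dual-state resource as plain tuples; it sends nothing over the network. The function name `alternating_set_calls`, the tuple layout, and the 50 ms interval are assumptions chosen for illustration and are not taken from the experiment.

```python
from itertools import cycle, islice


def alternating_set_calls(resource, states, count, interval_s, start_s=0.0):
    """Return `count` (timestamp, operation, resource, value) tuples that
    cycle through `states` at a fixed interval."""
    values = islice(cycle(states), count)
    return [
        (start_s + i * interval_s, "SET", resource, value)
        for i, value in enumerate(values)
    ]


# Ten SET calls on Power, 50 ms apart, alternating ON and OFF.
calls = alternating_set_calls("Power", ["ON", "OFF"], count=10, interval_s=0.05)
for call in calls[:4]:
    print(call)
```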
We collected data inside the entity handler of every resource, in the SET and GET call paths. We recorded only a few parameters, although this generic framework can easily be extended to others:
Source IP: client application IP address
Source port: client application port number
Resource name: Power, Mute, Brightness, Contrast
Operation type: SET, GET
Absolute value of resource: current value of the resource
Time stamp: current time
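One possible representation of such a log entry is sketched below. The `RequestRecord` class, its field names, the CSV layout, and the sample address and port are hypothetical; the paper does not specify a log format.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class RequestRecord:
    source_ip: str
    source_port: int
    resource: str
    operation: str
    value: int
    timestamp: float


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields(RequestRecord)])
    writer.writerows([astuple(r) for r in records])
    return buffer.getvalue()


# Sample values are made up; 192.0.2.0/24 is reserved for documentation.
records = [
    RequestRecord("192.0.2.10", 5683, "Volume", "SET", 10, 0.0),
    RequestRecord("192.0.2.10", 5683, "Volume", "SET", 12, 0.05),
]
print(to_csv(records))
```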
Whenever a SET call is fired from the client application on the client TV for any resource on the server TV, we collect data inside the entity handler of that resource, for all the resources mentioned. This gives us all resource values at different time stamps. We then check whether multiple SET calls arrive for the same resource within a short time period. Such a group of calls is treated as one pattern, and we check whether this pattern repeats again and again. For example, SET VOLUME arrives for volume levels 10, 12, 14, 15, 30, and then the same sequence is repeated.
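The check described above can be sketched as follows, under the assumption that a "pattern" is a burst of SET calls for one resource whose consecutive calls are at most `max_gap_s` seconds apart. The function names, the gap threshold, and the sample timings are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def bursts(records, max_gap_s):
    """Group (timestamp, resource, value) SET records per resource into
    bursts: runs of calls separated by at most `max_gap_s` seconds."""
    per_resource = defaultdict(list)
    for timestamp, resource, value in sorted(records):
        runs = per_resource[resource]
        if runs and timestamp - runs[-1][-1][0] <= max_gap_s:
            runs[-1].append((timestamp, value))
        else:
            runs.append([(timestamp, value)])
    return {
        resource: [tuple(v for _, v in run) for run in runs]
        for resource, runs in per_resource.items()
    }


def repeated_patterns(patterns, min_repeats=2):
    """Return the value sequences that occur at least `min_repeats` times."""
    counts = defaultdict(int)
    for pattern in patterns:
        counts[pattern] += 1
    return {p: n for p, n in counts.items() if n >= min_repeats}


# The volume sequence from the text, sent three times with a pause between.
levels = [10, 12, 14, 15, 30]
records = [
    (rep * 2.0 + i * 0.1, "Volume", level)
    for rep in range(3)
    for i, level in enumerate(levels)
]
found = bursts(records, max_gap_s=0.5)
print(repeated_patterns(found["Volume"]))
```

With these inputs the three bursts are identical, so the sequence (10, 12, 14, 15, 30) is reported with a count of 3.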
3	APPROACH & ANALYSIS
We first chose the CM-SPAM algorithm. Logs collected at 15-minute intervals were fed to it. It could identify attack patterns, but it requires heavy processing and takes a long time to analyze. Because of this shortcoming, we could not use it on constrained embedded devices such as TVs and ovens, where memory and processing power are limited and we cannot afford to use CPU and memory continuously for pattern mining. To overcome this, we employed an unsupervised learning technique that is fast and effective. First, we preprocessed the data in terms of the variation between each value and the next in the sequence. Then we clustered the data using a windowing mechanism and the K-Means clustering algorithm. After clustering, we classified as attacking those clusters in which changes were very frequent and repeated very frequently in the same manner. Labeling was therefore done automatically by the machine learning approach, and the algorithms used for fitting the model and for prediction were implemented mainly in Python [7,8,9].
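The preprocessing and windowing steps might look like the sketch below. The choice of features (number of value changes and mean absolute change per window) is an assumption; the paper states only that the data was preprocessed in terms of variation and then windowed. Samples are assumed to be sorted by timestamp, and changes across a window boundary are not counted.

```python
def deltas(values):
    """Change between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def window_features(samples, window_s):
    """Split (timestamp, value) samples into fixed windows and return, per
    window index, (number of value changes, mean absolute change)."""
    windows = {}
    for timestamp, value in samples:
        windows.setdefault(int(timestamp // window_s), []).append(value)
    features = {}
    for index, values in sorted(windows.items()):
        changes = [abs(d) for d in deltas(values) if d != 0]
        mean_change = sum(changes) / len(changes) if changes else 0.0
        features[index] = (len(changes), mean_change)
    return features


# Made-up volume samples: a quiet first window, a busy second one.
samples = [(0, 10), (60, 12), (400, 12), (410, 30), (420, 10), (430, 30)]
features = window_features(samples, window_s=300)
for index, feature in features.items():
    print(index, feature)
```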
3.1	CLUSTERING APPROACH
Because sequential pattern matching consumes CPU cycles and is an ongoing process, we took a clustering approach. We collected all the data and checked for patterns within a 5-minute window only. We used the K-Means implementation from the scikit-learn library [10].
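The clustering and labeling step can be illustrated with a small example. The paper uses the K-Means implementation from scikit-learn; the sketch below swaps in a dependency-free version of the same idea (plain Lloyd's algorithm) so that it runs without extra packages. The feature values, the choice of two clusters, the initial centroids, and the rule "the cluster with the higher change count is the attack cluster" are all assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2
                                  for p, c in zip(point, centroids[k])))
            for point in points
        ]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return labels, centroids


# One (change count, mean absolute change) point per 5-minute window.
points = [(1, 2.0), (2, 1.0), (0, 0.0), (40, 18.0), (38, 20.0), (1, 3.0)]

# Deterministic start: the quietest and the busiest window.
labels, centroids = kmeans(points, [min(points), max(points)])

# Label the cluster whose centroid has the higher change count as attacking.
attack_cluster = max(range(len(centroids)), key=lambda k: centroids[k][0])
flagged = [i for i, label in enumerate(labels) if label == attack_cluster]
print(labels, flagged)
```

The features are not scaled here; with features of very different magnitude, scaling before clustering would usually be advisable.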
4	RESULTS
We trained the model with data collected over about 48 hours, tested its accuracy with some seed data, and achieved an accuracy of about 95% in identifying repeated patterns. With this accuracy, the model can predict whether the next data pattern is attacking or not.
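For reference, accuracy here means the fraction of test windows whose predicted label matches the expected label. The short sketch below shows the arithmetic; the labels are made up for illustration and are not the data behind the reported figure.

```python
def accuracy(predicted, expected):
    """Fraction of windows whose predicted label matches the expected one."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Made-up labels: 4 of the 5 predictions match.
predicted = ["attack", "normal", "normal", "attack", "normal"]
expected = ["attack", "normal", "attack", "attack", "normal"]
print(accuracy(predicted, expected))
```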
5	CONCLUSION
We used the K-Means approach to form clusters and then labeled them. We ran this model and achieved about 95% accuracy in identifying DoS attacks. This is a generic model and can easily be extended to other parameters.
ACKNOWLEDGMENT
I want to give my warm regards to my parents and friends, who supported me throughout my journey of life, always helped me keep my morale high, and taught me patience.
REFERENCES
[1]	http://w...content-available-to-author-only...g.com/denial-of-service-attacks/iot-ddos-attack-code-released-/d/d-id/1327086
[2]	Harjinder Kaur, A Review of Machine Learning based Anomaly Detection Techniques, https://w...content-available-to-author-only...e.net/publication/253239339_A_Review_of_Machine_Learning_based_Anomaly_Detection_Techniques
[3]	Colin Gilmore, Anomaly Detection and Machine Learning Methods for Network Intrusion Detection, http://w...content-available-to-author-only...s.com/proc/p2016/SAM9741.pdf?
[4]	Le, Anh, et al. "Fast kernel-based method for anomaly detection." Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016. 
[5]	Kumar, Sanjay, Ari Viinikainen, and Timo Hamalainen. "Machine learning classification model for Network based Intrusion Detection System." Internet Technology and Secured Transactions (ICITST), 2016 11th International Conference for. IEEE, 2016.
[6]	Jain, Raj, and Hitesh Shah. "An anomaly detection in smart cities modeled as wireless sensor network." Signal and Information Processing (IConSIP), International Conference on. IEEE, 2016.
[7]	Zhang, Like, and Gregory B. White. "Anomaly detection for application level network attacks using payload keywords." Computational Intelligence in Security and Defense Applications, 2007. CISDA 2007. IEEE Symposium on. IEEE, 2007.
[8]	Ashfaq, Rana Aamir Raza, et al. "Fuzziness based semi-supervised learning approach for intrusion detection system." Information Sciences 378 (2017): 484-497.
[9]	H. Goto, Y. Hasegawa, and M. Tanaka, “Efficient Scheduling Focusing on the Duality of MPL Representation,” Proc. IEEE Symp. Computational Intelligence in Scheduling (SCIS ’07), pp. 57-64, Apr. 2007, doi:10.1109/SCIS.2007.367670.
[10]	Lane, Ben, et al. "Using Machine Learning for Advanced Anomaly Detection and Classification." Advanced Maui Optical and Space Surveillance Technologies Conference. 2016.