Things used in this project

Hardware components:
Arduino 101 & Genuino 101
×1
Intel NUC
×1
Raspberry Pi 3 Model B
×1
Amazon Tap
×1
Honeywell Smart Thermostat
×1
Software apps and online services:
Amazon Alexa Skills Kit
Honeywell Developer/ Lyric
IBM Watson
Las Vegas Open Data Portal
Intel Arduino Firmata and Alexa SDK
Hand tools and fabrication machines:
Windows Laptops

Code

Watson Data Science Notebook (Python)
Introduction
This Notebook will help you identify anomalies in your historical time series data (IoT data) in a few simple steps, and derive a threshold value for that data. The threshold value can then be used to set rules in the Watson IoT Platform so that you get an alert when your IoT device reports an abnormal reading in the future.
Accepted file format
Note that this Notebook accepts a CSV file in one of the following formats:
2 column format: <Date and time in DD/MM/YYYY or MM/DD/YYYY format, Numeric value>
1 column format: <Numeric value>
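For instance, a file in the 2-column format can be read directly with pandas. The snippet below is a minimal sketch with made-up sample rows that mirror the data used later in this Notebook:

```python
import io
import pandas as pd

# A tiny two-column sample in the accepted <timestamp, value> format
sample_csv = """Start Date Time,Usage
10/14/2014 0:00,0.647
10/14/2014 0:15,0.647
10/14/2014 0:30,0.646
"""

df = pd.read_csv(io.StringIO(sample_csv))
print(df.shape)  # (3, 2): three rows, a timestamp column and a value column
```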

1. Sample data
If you don't have a file of your own, download the sample file from this link. The sample file contains temperature data recorded every 15 minutes, and it includes spikes and dips to demonstrate anomalous situations.
2. Load data
You can add the CSV file that you downloaded to this notebook by dragging the file to the Data Source pane in the notebook palette. The data file is listed on the Data Source pane and is saved in the Object Storage instance associated with your Analytics for Apache Spark service. The progress bar below the file in the Data Source pane indicates the status of the load process. In the next step, you will learn how to access files in Object Storage.
3. Access data
pandas is a useful Python package for managing structured data. The two primary data structures of pandas are Series (1-dimensional) and DataFrame (2-dimensional).
In the following steps, you will load the CSV file into a DataFrame in memory.
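As a quick illustration of these two structures (a minimal sketch, not part of the original Notebook; the variable names are invented for the example):

```python
import pandas as pd

# A Series holds 1-dimensional labeled data
usage = pd.Series([0.647, 0.647, 0.646], name="Usage")

# A DataFrame is a 2-dimensional table; each of its columns is a Series
frame = pd.DataFrame({"Usage": usage})

print(usage.ndim, frame.ndim)  # 1 2
```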
To run code cells in a notebook, click Run Cell in the notebook toolbar. While the code in the cell is running, a [*] appears next to the cell. After the code has run, the [*] is replaced by a number indicating that the code cell is the Nth cell to run in the notebook.
Import packages to the notebook
Import the requests, StringIO, pandas, json, and re packages to use in the notebook. The pandas package is traditionally imported as pd:
In [4]:
import requests, StringIO, pandas as pd, json, re
Access Object Storage
Because the csv file is located in Object Storage, you need to define a helper function to access the data file that you loaded. Run the following cell to define the method get_file_content():
In [5]:
def get_file_content(credentials):
    """For given credentials, this functions returns a StringIO object containing the file content."""

    url1 = ''.join([credentials['auth_url'], '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': credentials['username'],'domain': {'id': credentials['domain_id']},
            'password': credentials['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    for e1 in resp1_body['token']['catalog']:
        if(e1['type']=='object-store'):
            for e2 in e1['endpoints']:
                if(e2['interface'] == 'public' and e2['region'] == credentials['region']):
                    url2 = ''.join([e2['url'],'/', credentials['container'], '/', credentials['filename']])
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    return StringIO.StringIO(resp2.content)
Insert data source credentials
To add the credentials for accessing the CSV file stored in Object Storage, click the next code cell and select the Insert to code function below the data file in the Data Source pane in the notebook palette.
The credentials for accessing the csv file are added to the cell as a Python dictionary. With these credentials, you can use the helper function to load the data file into a pandas.DataFrame.
Note: When you select the Insert to code function, a code cell with a dictionary is created for you. Adjust the credentials in the Python dictionary to correspond with the credentials inserted by the Insert to code function, and run the dictionary code cell. The access credentials to the Object Storage instance in the dictionary are provided for convenience for later usage.
In [6]:
# @hidden_cell
credentials_1 = {
  'auth_url':'https://identity.open.softlayer.com',
  'container':'vegaslights',
  'tenantId':'undefined',
  'filename':'1Vegas_Street_Light_Energy_Usage.csv'
}
Load data into pandas DataFrame
Run the next cell to load the data into a pandas DataFrame:
In [7]:
content_string = get_file_content(credentials_1)
pandaDF = pd.read_csv(content_string)
4. Explore data
Now that your data is in memory, you can begin to explore and manipulate it.
Show the first five and the last five rows of the data by using the head() and tail() methods. Run each code cell:
In [8]:
pandaDF.head()
Out[8]:
Start Date Time	Usage
0	10/14/2014 0:00	0.647
1	10/14/2014 0:15	0.647
2	10/14/2014 0:30	0.646
3	10/14/2014 0:45	0.647
4	10/14/2014 1:00	0.647
In [9]:
pandaDF.tail()
Out[9]:
Start Date Time	Usage
69969	10/11/2016 22:45	0.659
69970	10/11/2016 23:00	0.660
69971	10/11/2016 23:15	0.659
69972	10/11/2016 23:30	0.659
69973	10/11/2016 23:45	0.660
Similarly, to count the number of rows in the DataFrame, run the following code:
In [10]:
pandaDF.count()
Out[10]:
Start Date Time    69974
Usage               69974
dtype: int64
Enter the following commands in the next cell to set the timestamp as the index, if it is present, and click Run:
In [11]:
# set the index to the timestamp column if it is present
header_list = pandaDF.columns.values
valueHeaderName = 'value'
timeHeaderName = 'null'

if (len(header_list) == 2):
    timeHeaderName = header_list[0]
    valueHeaderName = header_list[1]
else:
    valueHeaderName = header_list[0]
 
# Use the timestamp as the index, then drop the now-redundant timestamp column
if (len(header_list) == 2):
    pandaDF[timeHeaderName] = pd.to_datetime(pandaDF[timeHeaderName])
    pandaDF.index = pandaDF[timeHeaderName]
    pandaDF = pandaDF.drop([timeHeaderName], axis=1)
    # Also, sort the index with the timestamp
    pandaDF.sort_index(inplace=True)
 
pandaDF.head(n=5)
Out[11]:
Usage
Start Date Time	
2014-10-14 00:00:00	0.647
2014-10-14 00:15:00	0.647
2014-10-14 00:30:00	0.646
2014-10-14 00:45:00	0.647
2014-10-14 01:00:00	0.647
5. Calculate z-score
We detect the anomaly events using the z-score, also known as the standard score, which indicates how many standard deviations an element is from the mean.
Enter the following commands to calculate the z-score for each value and add it as a new column in the same DataFrame:
In [12]:
# calculate z-score and populate a new column
pandaDF['zscore'] = (pandaDF[valueHeaderName] - pandaDF[valueHeaderName].mean())/pandaDF[valueHeaderName].std(ddof=0)
pandaDF.head(n=5)
Out[12]:
Usage	zscore
Start Date Time		
2014-10-14 00:00:00	0.647	0.547456
2014-10-14 00:15:00	0.647	0.547456
2014-10-14 00:30:00	0.646	0.545170
2014-10-14 00:45:00	0.647	0.547456
2014-10-14 01:00:00	0.647	0.547456
6. Plot Anomalies
When we work in notebooks, we can decide how to present our analysis results and derived information. So far, we have used normal print functions, which are informative. However, we can also show the results visually by using the popular matplotlib package to create plots.
Enter the following snippet of code in the next cell to view the anomaly events in your data and click Run. Observe that the values whose z-score is above 3 or below -3 are marked as abnormal events in the graph shown below:
In [13]:
# ignore warnings if any
import warnings
warnings.filterwarnings('ignore')

# render the results as inline charts:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

'''
This function detects spikes and dips by returning a non-zero value
when the z-score is above 3 (spike) or below -3 (dip). In case you
want to capture smaller spikes and dips, lower the z-score cutoff from
3 to 2 in this function.
'''
def spike(row):
    if(row['zscore'] >=3 or row['zscore'] <=-3):
        return row[valueHeaderName]
    else:
        return 0
    
pandaDF['spike'] = pandaDF.apply(spike, axis=1)
# select rows that are required for plotting
plotDF = pandaDF[[valueHeaderName,'spike']]
#calculate the y minimum value
y_min = (pandaDF[valueHeaderName].max() - pandaDF[valueHeaderName].min()) / 10
fig, ax = plt.subplots(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')
ax.set_ylim(plotDF[valueHeaderName].min() - y_min, plotDF[valueHeaderName].max() + y_min)
x_filt = plotDF.index[plotDF.spike != 0]
plotDF['xyvaluexy'] = plotDF[valueHeaderName]
y_filt = plotDF.xyvaluexy[plotDF.spike != 0]
#Plot the raw data in blue colour
line1 = ax.plot(plotDF.index, plotDF[valueHeaderName], '-', color='blue', animated = True, linewidth=1)
#plot the anomalies in red circle
line2 = ax.plot(x_filt, y_filt, 'ro', color='red', linewidth=2, animated = True)
#Fill the raw area
ax.fill_between(plotDF.index, (pandaDF[valueHeaderName].min() - y_min), plotDF[valueHeaderName], interpolate=True, color='blue',alpha=0.6)

# Label the axis
ax.set_xlabel("Sequence",fontsize=20)
ax.set_ylabel(valueHeaderName,fontsize=20)

plt.tight_layout()
plt.legend()
plt.show()

As shown, the red marks are the unexpected spikes and dips whose z-score value is greater than 3 or less than -3. In case you want to detect smaller spikes and dips, lower the cutoff to 2 or even less and run again. Similarly, if you want to detect only the larger spikes, try increasing the z-score cutoff from 3 to 4 and beyond.
7. Derive thresholds
Enter the following command into the next cell to derive the spike threshold value corresponding to the z-score value 3, and click Run:
In [14]:
# calculate the value that is corresponding to z-score 3
(pandaDF[valueHeaderName].std(ddof=0) * 3) + pandaDF[valueHeaderName].mean()
Out[14]:
1.719845702482371
Similarly, enter the following command into the next cell to derive the dip threshold value corresponding to the z-score value -3:
In [15]:
# calculate the value that is corresponding to z-score -3
(pandaDF[valueHeaderName].std(ddof=0) * -3) + pandaDF[valueHeaderName].mean()
Out[15]:
-0.9048056876197077
This threshold value can be used to create a rule in the Watson IoT Platform to detect anomalies in current IoT device events in real time: an alert is created when the current sensor reading crosses the threshold value.
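For illustration, the rule logic amounts to a simple comparison against the two derived thresholds. The helper below is a hypothetical sketch (the function name and hard-coded default thresholds are for this example only, not part of the Watson IoT Platform API):

```python
# Hypothetical helper: name and default thresholds are for illustration only,
# using the spike/dip values derived from the sample data above
def is_anomalous(reading, spike_threshold=1.7198, dip_threshold=-0.9048):
    """Flag a reading that crosses either derived threshold (z-score beyond +/-3)."""
    return reading > spike_threshold or reading < dip_threshold

print(is_anomalous(1.8))    # True: above the spike threshold
print(is_anomalous(0.647))  # False: within the normal band
```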
8. Summary
This Notebook showed how to use the z-score to detect anomalies in historical time series data in a few simple steps, and how to derive threshold values for the given historical data to create corresponding rules in the IBM Watson IoT Platform. Developers can use this Notebook as a template for detecting anomalies in their historical IoT data and modify the Python code depending on the use case.

Credits

Thanks to Steve Mylroie.
