Friday, April 27, 2018

What is "The Internet of Things"

And How Do I Use It?

The "Internet of Things", or "IoT" is becoming a word or concept we hear about every day.  There is a fast growing industry of companies that sell IoT components, companies that research how to implement IoT, and people that want to jump in with their own ideas and products to mine a share of the future promise.  To be honest though, I still don't know exactly what IoT is, and how my customers can or can't use it to achieve their monitoring goals. This blog is my attempt to explain IoT to myself and determine how it is different or similar to what we currently do.

In a way the "Internet of Things" has been around since we started making network connections over the Internet between devices.

History

WYSIWYG

Eyasco started delivering data to clients using the Internet in 2003, and the principals of the company were dabbling in the concept even before then.  Back then we just wanted data - in either graphical or tabular format - to be available in a browser.  Our scheme was to use a database to collect and store the data, and to generate graphics and .csv files that could simply be referenced in a web page.  There was no database interaction - just refreshed graphics or tables each time we collected data.  We still use this concept today for publicly accessible pages where there is no authentication and no need for direct interaction with the database.  It is efficient and uses very little bandwidth - but it is non-interactive.  What you see is what you get.

Basic web-based "infopage" without database connection


M2M

At that time the SCADA industry was already beginning to offer Human-Machine Interfaces (HMIs) via web servers built into their existing HMI software.  Initially the data was brought back to a central controller and then an online HMI was delivered via a web server.  Then the manufacturers began putting web servers in the controllers and using cellular modems to provide access to the web HMI.  This is what we call a wireless "Machine-to-Machine" application, where two devices - a PLC and a computer - are talking to each other over the cellular network.  From the user's perspective, they are seeing real-time conditions and are able to manage and change settings from a remote location.


Real-time web-based HMI display of canal check and flow


The addition of the "Internet" as a communications pathway allows SCADA network to grow beyond what is connectable via a hardwired LAN or within "line-of-sight" in a radio network. Many utilities are managed by systems such as this, but there are serious security considerations when implementing this type of solution.  We'll have a blog about that at a later time.


IoT

This is really the extension of what was first realized in SCADA networks - the convenience of using not only the Internet, but also other wireless networks, to create connections between devices.  Each wireless network extends the number of connections and the number of devices behind routers and firewalls (hopefully).  For example:
  • Wireless LANs to connect our computers and computer-related equipment
  • Bluetooth networks to connect computers and printers, or car audio systems and cell phones
  • ZigBee networks for low-power, extensible sensor networks
The IoT engineer is potentially dealing with many layers of communication on different networks - each with its own characteristics and communication protocols - to create a web of connected devices.



Manufacturers of Internet-connected devices have found data collection easy and "transparent", so they can improve performance and supposedly tailor future products to customer needs.  The number, type and amount of data collected by manufacturers is probably unknown, but I read an article in the Financial Times that said the manufacturers of robotic floor vacuums are collecting data about floor plans (and might sell it - which is a whole other subject).  If robotic vacuum cleaners are collecting data about floor plans, you can imagine that the scope of what else is being collected is quite large.  The application of IoT quickly moves into the realm of "Big Data" as more and more sensors are connected, and a bigger and bigger part of the job becomes managing and interpreting the large amounts of data.

The "Smart" Revolution

The ubiquitous presence of wireless networks has led to an explosion of companies making devices that can take measurements and communicate with cloud servers with little or no programming or setup by the user.  Increased competition has driven sensor prices lower and lower, with the focus on easy data collection and delivery.  The quality of a measurement is secondary to the ease with which the data is collected - and it has really started a revolution in monitoring for the so-called purpose of making us all smarter.  More and more companies are specializing in the end use - developing custom applications and algorithms for specific applications such as agriculture, parking, and the quality of our cities' air.

Range of Smart Monitoring Systems by Libelium in Zaragoza, Spain

We are seeing "into" our environment through networks of sensors and becoming "smarter" about the world around us.  The potential bad side of this explosion of sensors is that the ability to manage the quality of the information we are gathering is fast slipping out of our financial and intellectual capabilities.  More and more sensors require more and more calibration and maintenance which we are not prepared for.  Big data sets and analysis algorithms are supposed to overcome the occasional bad sensor.  The pace of technological change also creates hardware that is not designed for anything other than a short lifetime.  This makes sense in a fast-changing technology environment.  Remember the whole IoT revolution is based on sensors being embedded in every Internet connected product.  The "long-term" means until the next iteration of technological species arrives on the shore.


Friday, December 23, 2016

Monitoring System Topology

I recently had a discussion with a client in Taiwan regarding the methods used to collect data from remote monitoring stations.  My client does a lot of landslide monitoring, and their technique is to install a cellular modem in every station and collect data from each.  This is a "Multi-Point Collection" network, where data is collected separately from each station.  Because the terrain is not only remote but also rugged and steep, they are always fighting cell signal issues.  I have been trying to convince my client to use radios at sites with poor cellular signal and route the data to a single station with good signal strength.  For purposes of discussion we'll call this a "Single-Point" collection scheme - a host station is responsible for polling and collecting data from remote stations over a local wireless network, and data is collected from only the Host station(s).  This is the technique we have incorporated into our wireless EmbankNet™ dam monitoring system, and we have found it very efficient and cost-effective for remote sites.  This blog will describe each of these approaches and attempt to point out their relative benefits.

SCADA vs. Data Logging

Before we discuss the different forms of data collection, I should say that our discussion centers on a methodology of monitoring called "data logging", as opposed to SCADA systems that use Programmable Logic Controllers (PLCs).  Data loggers are typically low-power devices designed to be connected to sensors and deployed in remote locations for extended periods, usually with a battery and a solar panel as the power source.  PLCs are found on the factory floor, where power is abundant and where they are dedicated to measuring and controlling things in real time.

Due to technological advances, the line between these two concepts is definitely blurring.  But for purposes of this discussion, a SCADA system is dedicated to controlling processes in real time without humans collecting and analyzing the data, while data logging is dedicated to collecting sensor data for analysis, modeling and reporting.  It's important to point out that data may be collected in a SCADA system, and real-time control may be implemented in a data logging system - but the hardware and software in each has evolved from a different core purpose.

So our discussion is about how we can collect data from data loggers over different network topologies, using modems and/or radios.

Multi-Point Collection

This type of data collection using telemetry is the simplest to implement, as it basically collects data from each monitoring station without reference to the others.  Each station must have its own telemetry that connects directly to the collection point at the home or office.  With this topology we periodically connect to each station directly with a server (or have each station connect to the server) and retrieve the stored data.

Multi-Point Collection

In the diagram above each station is connected to the Server through the Internet.  We tend to use this technique when our stations are spread over a wide area and there is no "line-of-sight" between them.  The benefit of this approach is ease of deployment.  Each station stands alone and only needs some form of communication to access it.  It used to be a phone line, but now it's usually either a satellite or cellular modem.  The major limitation of this approach is that Internet connectivity is not always available at remote locations.  There is also a significant management and cost factor over the long term, as service plans must be procured and maintained for each modem.  This is not a one-time occurrence or cost, as technology and data plans are changing constantly, and service providers may not share your sense of the importance of your data.  Service providers are creating machine-to-machine (M2M) data plans, which make management and provisioning easier.  But with over 15 years of experience using this approach, I can tell you that change is the constant in this industry.

Single-Point Collection

This form of data collection uses radios and modems in tandem to adapt to local conditions, and consolidates data into a smaller number of data collection points - called Host stations.

Single-Point Collection


This technique requires more up-front programming to configure, as remote stations must send data to the Host either on a pre-determined schedule or in response to a request from the Host.  This means that radio configuration, clock synchronization and connection-failure handling have to be added to the standard data logger programming.  Many people think that radio communications are problematic, but we have not found this to be the case.  In fact, properly programmed and configured, radio communications can be extremely robust and reliable.  And the brand of radio you use does matter.  Adding radios is a technological challenge that has to be overcome, but it's a one-time cost in labor, and once you develop a system you can reuse it in other systems you build.
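To give a feel for what that extra programming involves, here is a rough sketch of the Host's duties in Python.  The radio interface, station addresses and commands are hypothetical stand-ins - real data loggers are programmed in vendor-specific languages - so this is only meant to illustrate the polling, retry and clock-sync logic the topology requires.

```python
# Illustrative sketch of a "Single-Point" host data logger's extra duties:
# poll each remote over the radio link, retry on failure, and keep remote
# clocks in sync. The radio interface, station addresses, and behavior are
# hypothetical stand-ins, not any vendor's actual API.
import time

REMOTE_STATIONS = [101, 102, 103]   # hypothetical radio addresses of remotes
MAX_RETRIES = 3


class RadioLink:
    """Stand-in for a radio driver; replace with the real interface."""
    def request_records(self, station_id, timeout_s=30):
        return []                    # pretend the remote had nothing new

    def set_clock(self, station_id, epoch_seconds):
        pass


def poll_station(radio, station_id):
    """Ask one remote for its stored records, backing off on failure."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return radio.request_records(station_id)
        except TimeoutError:
            time.sleep(5 * attempt)  # wait longer after each failed attempt
    return []                        # give up this cycle; try again next time


def collection_cycle(radio, store):
    for station_id in REMOTE_STATIONS:
        records = poll_station(radio, station_id)
        store.setdefault(station_id, []).extend(records)
        radio.set_clock(station_id, time.time())   # keep timestamps aligned
    # Consolidated data is then retrieved from this single host over
    # its one cellular or satellite modem.


if __name__ == "__main__":
    collection_cycle(RadioLink(), store={})
```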

The advantages of this data collection method are more flexibility in design and layout, lower long-term cost, and ownership - it's your network, so you are less dependent on a third-party provider.  The disadvantages are increased complexity in programming, and the need for at least two communication ports (one for the modem and one for the radio) on at least the Host data logger.

Hybrid Systems

Hybrid systems use a combination of radio and cellular networks to adapt to local conditions and extend a monitoring network over a large area.  Where Internet connections are not available, or where we have a number of stations over a relatively small areal extent, we use a radio network to connect remote stations to a Host.

Hybrid System - Point and Multi-Point


Host stations are generally located where we have higher-quality cellular service.  In this manner information can be consolidated for data collection and also shared from one Host station to another throughout the extended monitoring network.  Enhanced data visualization techniques, like web-based HMIs, can also be used at a Host station to provide real-time access to data from almost anywhere.  This type of hybrid data logging system, with data being freely shared between separate monitoring systems, starts to resemble a factory-floor SCADA system - but with a lower rate of data throughput.  Measurements are taken at a frequency appropriate for the purpose of recording the data, but the data is shared to enhance operations and improve awareness.  This also tends to improve the quality of the collected data, as more stakeholders take an interest in data integrity.


Tuesday, November 22, 2016

Rating Curve Management and Display Tool

SiteHawk Rating Curve Tool
This blog is about a specific software tool that Eyasco has developed to aid customers that have to measure river and stream flow.  This is particularly important for maintaining minimum flows where fish migrate up freshwater streams to spawn.

What Is a Rating Curve?
A rating curve is a relationship between stage (water level) and flow or discharge at a cross section of a river.  Stage is the height of the water surface above an established point, and flow (or discharge) is the volume of water moving down a stream or river per unit of time.  Flow values can be obtained by applying a rating curve formula to stage measurements.




In the example above the discharge is 40 cubic feet per second (cfs) when the stage equals 3.3 feet.  The dots on the curve represent concurrent measurements of stage and discharge used to develop the curve.

To develop a rating curve, flow measurements are taken manually at different stage levels.  This is necessary because the slope or curve of a river bank can change enough with water depth that the relationship between water height and flow is non-linear.  This process is called "rating" the stream.  Historically, log relationships were used to create rating curves because they resulted in a near-straight line on a log plot.  In more recent years, however, polynomial equations have gained favor due to their ability to better handle low-flow conditions.
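As an illustration (not the actual SiteHawk code), a polynomial rating curve is just a function applied to each stage reading.  The coefficients below are made up for the example:

```python
# Illustration only: applying a polynomial rating curve to stage readings.
# The coefficients are invented for the example, not from a real station.
def rated_flow(stage_ft, coefficients):
    """Evaluate Q = c0 + c1*h + c2*h**2 + ... for a stage h in feet."""
    return sum(c * stage_ft ** power for power, c in enumerate(coefficients))

# Hypothetical curve: Q = 1.2 + 3.5*h + 2.8*h**2  (cfs)
curve = [1.2, 3.5, 2.8]

for stage in (0.5, 1.0, 2.0, 3.3):
    print(f"stage {stage:.1f} ft -> flow {rated_flow(stage, curve):.1f} cfs")
```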

If the channel geometry changes enough, then more than one rating curve may be necessary at a specific location.  For example, the picture below illustrates how there can be different volumes of water depending on the shape of the river bank, so there may be one rating curve for water heights less than 2 feet and another for water levels greater than 2 feet.


Change in channel geometry also creates different stage-flow relationships
If the geometry of the river channel changes due to erosion or deposition of sediment at the stage measurement site, then a new rating curve has to be developed.  Over time, a specific rated location will accumulate many rating curves.  Deriving historic or real-time flow from these curves requires applying the correct rating based on the stage values and/or the date and time.

The Rating Curve Tool

The Rating Curve tool included with SiteHawk allows users to enter and manage rating curves and to derive plots of flow based on application of these curves to stage measurements. 




Each rating curve entry allows a user to define a date range for application.
 


And a minimum and maximum stage level



The application applies rating curves based on their applicability to a specific level measurement, and generates continuous, historically accurate flow data.  The tool can chart both the stage and flow values.  Changes to the polynomials are reflected on the chart the next time it is generated.  Graphs can be generated for viewing or export from the user interface.
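The kind of lookup involved can be sketched as follows.  The data structures, field names and sample curve are assumptions made for this illustration, not SiteHawk's internals: each curve carries a date range, a stage range and polynomial coefficients, and the applicable curve is chosen per measurement.

```python
# Sketch of curve selection: pick the curve whose date range and stage range
# cover each measurement, then evaluate its polynomial. All names and values
# here are hypothetical.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RatingCurve:
    start: datetime
    end: datetime
    min_stage: float
    max_stage: float
    coefficients: list          # c0, c1, c2, ... as in Q = sum(ci * h**i)

    def applies_to(self, when, stage):
        return (self.start <= when < self.end
                and self.min_stage <= stage <= self.max_stage)

    def flow(self, stage):
        return sum(c * stage ** i for i, c in enumerate(self.coefficients))


def flow_series(curves, stage_records):
    """Turn (timestamp, stage) records into (timestamp, flow) records."""
    out = []
    for when, stage in stage_records:
        curve = next((c for c in curves if c.applies_to(when, stage)), None)
        out.append((when, curve.flow(stage) if curve else None))
    return out


# Example: one curve valid for 1994, stages 0-2 ft (values invented).
curves = [RatingCurve(datetime(1994, 1, 1), datetime(1995, 1, 1),
                      0.0, 2.0, [1.2, 3.5, 2.8])]
print(flow_series(curves, [(datetime(1994, 7, 20, 12, 0), 1.1)]))
```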


Applying the Rating Curve In Real Time
Once a user has entered the rating curves for a specific rated stream location, they can generate plots of flow manually, or use the curves to generate flow data in real time - as if the flows were actually being calculated or measured in the field!  The benefit of this approach is that flows can be displayed in real time on charts (Infopages) and maps (SiteHawk).


The Rating Curve tool is a powerful application that allows users to manage and organize their rating curves, generate flow calculations, visualize their calculations with graphs, and automate flow calculations for real-time monitoring.

Tuesday, September 27, 2016

Big Springs Ranch Salmon Restoration Project

In 2009 The Nature Conservancy (TNC) began an effort to restore coho salmon populations in the Klamath basin with the Shasta Big Springs Ranch Project.  The Klamath River was once a major salmon-producing river.  Changes to Klamath River flows and habitat loss due to human activities and development over the last 150 years have caused declines in the salmon runs.  The Shasta River is a tributary of the Klamath River and was historically a major salmon-producing stream.  Shasta Big Springs Ranch, located in the upper portion of the watershed, is a critical area where springs maintain both flow and cool temperatures.  TNC acquired the ranch and adjacent lands several years ago with the aim of improving habitat for coho and Chinook salmon and steelhead trout.
Shasta Big Springs Ranch Study Site (Reference Note 1)
Land and water use changes through time have led to water temperatures in the Shasta River basin that do not support all life stages of these cool-water fishes.  Critical to salmon and steelhead survival are appropriate water temperatures for summer rearing and springtime juvenile migration.

Big Springs Creek’s water source provides water in the 10-12°C temperature range year-round, with flow rates that seasonally range from 40 to 80 cfs.  Therefore, to create and maintain suitable habitat for salmon along Big Springs Creek and downstream, maintaining the flow and the lower water temperatures provided by the creek is critical.

Elevated water temperatures in Big Springs Creek were caused by low water levels, lack of shade due to loss of vegetation, and inflows of irrigation return water.  In 2009 livestock were prevented from entering Big Springs Creek, which led to an increase in aquatic vegetation, including extensive emergent vegetation.  This added vegetation increased water depth (via flow resistance) and shade, both of which helped reduce heating of the water in Big Springs Creek.  The added vegetation also provided cover for juvenile salmon and formed the basis of the food web (primary production and invertebrate populations) that supports young salmon.
Year-on-year temperature measurements from Big Springs Creek and nearby irrigation canal

TNC was targeting a living-landscape approach that would support instream flow and habitat while also providing a means to sustain agricultural practices.  In 2010 Eyasco was contracted to install a network of wireless, solar-powered monitoring stations that would collect temperature and flow data and display it in real time on a web site shared with ranch managers and staff.

Automated temperature and flow monitoring stations on Shasta Big Springs Creek
The concept was that ranch managers could observe water temperatures at multiple locations on the creek and in diversion canals and return-flow facilities, and use this information to operate in a manner that would minimize temperature impacts from irrigation return flow.  Real-time temperature differences between irrigation water in off-stream canals and the water in Big Springs Creek were accessible to managers, allowing them to release water back into the creek when the differences were within acceptable limits.
Daily temperature swings in creek and canal
 
Before the Shasta Big Springs Ranch project, only 30-60 feet of Shasta Big Springs Ranch had suitable habitat for salmon.  After the introduction of better management practices, including real-time temperature monitoring, suitable habitat increased to 10 miles, with an overall 7.2°C drop in summer water temperature.  Eyasco’s real-time, low-power monitoring system continues to provide critical year-round information for ranch managers.

Monday, June 6, 2016

Satellite M2M Communications - An Expensive Lesson


Satellite modem traffic (in Mbytes) at 6 sites over a 4 month period

Many of our monitoring systems use either cellular or satellite technology to transmit data from remote sites to a computer tasked with managing the data.  This use of the technology is called "machine-to-machine" or M2M, and there are many companies now offering service plans tailored for this application.  Cellular bandwidth fees are pretty inexpensive - and our data requirements are pretty low compared to the average consumer's.  But satellite bandwidth fees are quite a bit more expensive and more restrictive, in the sense that maximum usage is capped at levels much lower than cellular 'limits'.  For example, a typical cellular plan might be something like $40 for 5GB of data a month, where a satellite plan might be $44 for 2MB.  This simple fact taught us a valuable lesson about something happening on the Internet that most of us never see.  It's amazing really, and it has me thinking seriously about the wisdom of connecting our infrastructure through the Internet.

The graph above shows monthly data usage at 6 sites using Galaxy Communications BGAN/M2M service.  The same amount of data was collected from each site every month, yet the usage in the first two months is 3-10 times greater than in the last two months.  What gives?

One word - FIREWALL.  During the first two months shown on the chart, no firewall was enabled, which allowed any IP address to access the modems.  There was no real security vulnerability to the connected devices - the attached measurement controllers were not connected to any other infrastructure and had no control capabilities built into them.  What was really surprising was analyzing the packets to see what other IPs were accessing, or trying to access, the modems.  The traffic came from, among other places:
  • Egypt
  • Philippines
  • China
  • Hungary
  • Japan
  • Greece
  • Russia
It was only through the diligence and persistence of Eyasco employees that this was even discovered.  It took many hours over several months poring through packet reports to determine the cause of the extra usage beyond what was anticipated for data collection.  Approximately 85% of the bandwidth usage, without the firewall restricting traffic to a single IP, came from "non-native" IPs.  Good for the satellite company, as this resulted in "Out-of-Bundle" usage fees of over $1000.

It bears repeating that while this level of extra-curricular traffic is huge and costly for the satellite modems, it would probably not even be noticed on a cellular modem.  The satellite modems above have monthly plans of 2 Mbytes each.  We have a cellular plan that includes 250 Mbytes for any number of modems, and we rarely go over.  It takes some serious IP camera viewing or web HMI viewing to jack the costs over the limit.  Even then the penalty is on the order of $50 rather than $1000.

And the conclusion seems to be that a significant amount of effort is being expended worldwide to hack into any public-facing, unprotected access point!




Thursday, May 19, 2016

Multi-Site Management with Merlin Enterprise


Eyasco started as a business building monitoring systems for the geotechnical and drinking water industries (another blog, perhaps, on "what is geotechnical monitoring?").  Our goal was to build monitoring systems that included data management and display.  From day one we wanted a true end-to-end solution, the kind of thing you connect sensors to and view data on a web browser or a smartphone.  It sounds commonplace today - but we started in 2003.

The innovative concept my partner and I came up with was to embed the data being collected with enough information that our data collection software would know exactly what to do with it.  In other words, instead of collecting the data in spreadsheets or in a database and then cutting and pasting (in the case of the spreadsheet) or programming (in the case of the database), our software would collect the data and it would be ready for display because of the bits of information (metadata) we embedded in the data stream.  Our thinking was that everyone who makes one of these monitoring systems has to program them – so that’s a given.  If we could eliminate the so-called “middle-tier” programming, then we would create a fast track to presenting data on the Internet.  We built the monitoring systems in my garage, and were serving data on the Internet with a server in my bedroom.
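To make the idea concrete, here is an invented illustration of a metadata-rich record and how a collector might use it.  The field names and payload format are assumptions for this sketch, not Eyasco's actual data stream - the point is simply that the record carries enough context to be stored and displayed without middle-tier programming.

```python
# Made-up illustration of the idea: each record arrives with enough metadata
# (station, location, sensor, units, display hints) that the collection
# software can store and present it without any custom middle-tier code.
# Field names and values are invented for this sketch.
import json

incoming_record = json.dumps({
    "station": "SpringSite-04",
    "location": {"lat": 41.56, "lon": -122.45},
    "sensor": {"name": "turbidity", "units": "NTU", "chart": "line"},
    "timestamp": "2016-05-19T14:30:00Z",
    "value": 1.8,
})

def ingest(raw, catalog):
    """Store the value and register the sensor for display, driven entirely
    by the metadata embedded in the record itself."""
    rec = json.loads(raw)
    key = (rec["station"], rec["sensor"]["name"])
    catalog.setdefault(key, {"meta": rec["sensor"],
                             "location": rec["location"],
                             "readings": []})
    catalog[key]["readings"].append((rec["timestamp"], rec["value"]))

catalog = {}
ingest(incoming_record, catalog)
# The web tier can now build maps, tables, and charts from `catalog`
# without knowing anything about this station in advance.
```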
 
First we monitored things like flow, water level, turbidity (water clarity), pH and conductivity at "mountain" spring sites.  After we got really good at these low-power systems, we started doing other types of monitoring - not only adding other sensors, but also other kinds of monitoring for control and security.  We love the challenge of designing new systems, adding new technology and sensors, and integrating new types of telemetry.  But we rarely have to work on our software – unless a client requests something new.

The web display component has credentials and role assignment built in, so access to data and web parts is controlled through a credential manager.



So what makes our approach so good?  Imagine your business is shipping water treatment systems all over the world.  You want to monitor the health of all of those systems, and you want to give each end user the ability to monitor their own system.  Our approach would be to connect the sensor outputs to one of our QuB monitoring systems - which include Campbell Scientific measurement and control units (MCUs).  We would program the MCU for the number and types of sensors.  Once the unit was deployed and connected to telemetry (cellular modem, iPhone, whatever), we would see it and download not only the data, but its location.  It would show up on a map, and all the data from the sensors would be visible in tabular and graphical form.  All of this without any programming on the data collection side.  The only configuration necessary is for the admin user to log in and define who gets to see what, which is all done by creating and assigning roles through the web interface.
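As a hypothetical sketch of the role idea (the role names, users and stations below are invented, not our actual credential manager): an admin assigns roles, and the web tier filters what each user can see.

```python
# Hypothetical sketch of role-based visibility: roles map to the stations a
# user may view; an admin-style role sees everything. Names are invented.
ROLE_STATIONS = {
    "ranch_manager": {"SpringSite-04", "CanalReturn-02"},
    "end_user_acme": {"AcmeTreatmentPlant-01"},
    "admin": None,                      # None means "all stations"
}

def visible_stations(user_roles, all_stations):
    """Return the set of stations a user may view, given their roles."""
    allowed = set()
    for role in user_roles:
        stations = ROLE_STATIONS.get(role)
        if stations is None:            # admin-style role sees everything
            return set(all_stations)
        allowed |= stations
    return allowed & set(all_stations)

print(visible_stations(["end_user_acme"],
                       ["AcmeTreatmentPlant-01", "SpringSite-04"]))
```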

 

This works very well for a small company like us.  Programming and configuration are largely confined to the controller - which we cannot avoid anyway.  But once that is complete, the display pretty much comes with deployment.  When our field crew finishes an install, the data is available on a password-protected web page before they get in the truck to leave.  All our clients know is that they get their data.

Wednesday, August 26, 2015

The Business Model and Database Design

What is a "relational database"? You can look it up on Wikipedia:

A relational database is a digital database whose organization is based on the relational model of data, as proposed by E.F. Codd in 1970.[1] This model organizes data into one or more tables (or "relations") of rows and columns, with a unique key for each row. Generally, each entity type described in a database has its own table, the rows representing instances of that type of entity and the columns representing values attributed to that instance. Because each row in a table has its own unique key, rows in a table can be linked to rows in other tables by storing the unique key of the row to which it should be linked (where such unique key is known as a "foreign key"). Codd showed that data relationships of arbitrary complexity can be represented using this simple set of concepts.

The definition goes on to explain the differences from hierarchical data structures, and so on.  It is perhaps technically correct, but it doesn't tell the whole story.  To me, a relational database is used to define how logical subsets of data are related, and how they defend the integrity of the business model the database supports.

Data Tables

In general, the tables in a database should be as small as possible, consisting of the fewest columns needed to define a unique record.  The links to other tables define fundamental relationships between the data - like "parent-child", for example.  When constructed, the entire database defines not only these relationships, but how data flows through the business process it supports.  A well-designed database captures the entire business model and can accommodate changes and additions with minor modifications.  This happens when the designer spends enough time with his or her feet on the ground to understand the business process, and creates a data structure that is granular - almost molecular - in its composition.  This takes the most time to create, but it also produces the most flexible and long-lasting structure.  There are many other considerations, but there is no substitute for the really hard work of defining the business model with the database.

A relational database is not a spreadsheet - or a collection of spreadsheets.  A spreadsheet makes sense for a 2-dimensional representation of data and is used primarily to inform the human eye.  It works great for the eye because we can quickly relate to the two dimensions and peer down into the individual pieces of data.  But it's not efficient for a computing engine - something that is designed to find and extract pieces of information as quickly as possible.  The example below shows how the eye can quickly find a measurement by triangulating between the dates in the rows and the instrument in the columns.  Suppose you were asked to find OW-12 on 7/20/94:


While the eye can do this in an instant, it's a very inefficient form of data storage.  One way to understand why is to look at the column headers.  They are unique for each instrument, so they essentially require a custom data structure.  If you add a new instrument, you change the table structure.  And every time you search, you don't know which column the result will be in.  Compare that structure with the following.


The table above is what would be used in a relational database to store the data shown in the spreadsheet.  The data has been 'normalized' to minimize redundancy and provide an efficient search path.  It consists of only three columns, no matter how many instruments you have.  The first two columns - the date and the sensor name - define a unique record.  To find the record we found in the spreadsheet, we work from left to right: first the date, then the sensor, and then the value.  Not as easy for the eye perhaps, but much easier for a database.
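For instance, the spreadsheet lookup ("OW-12 on 7/20/94") against the normalized layout becomes a simple query.  SQLite is used here purely for illustration, and the table name, column names and values are examples only:

```python
# The normalized three-column table makes the lookup a simple query.
# Table/column names and the inserted values are examples only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        reading_date TEXT NOT NULL,
        sensor       TEXT NOT NULL,
        value        REAL,
        PRIMARY KEY (reading_date, sensor)   -- the unique record
    )
""")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("1994-07-20", "OW-12", 123.45),     # example values only
        ("1994-07-20", "OW-13", 118.02),
        ("1994-07-21", "OW-12", 123.51),
    ],
)

row = conn.execute(
    "SELECT value FROM readings WHERE reading_date = ? AND sensor = ?",
    ("1994-07-20", "OW-12"),
).fetchone()
print(row)   # adding a new instrument adds rows, never columns
```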

So how is this structure used to define a business model?  By first defining what constitutes a unique record in each table (the Primary Key), and then creating relationships between tables, you define how data will be used to support your business model.

Relationships

Example - Customers, Contracts and Plants

Business Model 1

Let's assume your database is defining a customer, the contracts you have with that customer, and the plant where the work will be done.  Maybe you first consider a simple business model like:

" Plants and Contracts belong to a Customer.  Multiple Projects can be grouped under Contracts"

This can be represented with a simple organization chart as follows:



Business Model 2

What if another customer then presented you with another business model scenario, like:
 
"Project Numbers are specific to Plants, with the possibility of multiple projects under a single contract"
 
You need to be able to define something like the structure shown in the figure below where Plants are related to Projects:
 
 
 

Business Model 3

Then another customer presents you with another business model:
 
"Two separate customers with their own contracts and projects are using the same plant."
 
 
 
 
Unless you want to spend all your time programming, you want your database design to be able to represent and enforce the integrity of ALL the business models you have to support. The figure below shows an actual database design that supports the above scenarios.
 
 
 
 
It's not as complicated as it looks - in fact it looks almost like the data structure for Business Model 1.  The key is that Plants is not connected directly to Customers.  It is linked both to Projects (Contract_Projects) and to Customers.  The link to Customers is not direct either: it goes through a "relationship table" where the relationships between plants and customers are defined.  All the relationships defined by business models 1 through 3 are supported by this structure - without data redundancy.
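A schematic version of that structure is sketched below as SQLite DDL.  The table and column names are chosen for illustration and the production design differs in detail; the point is the join tables - Plants link to Contracts through Contract_Projects and to Customers through a relationship table - so all three business models can be represented without redundant data.

```python
# Schematic version of the structure described above, expressed as SQLite DDL.
# Table and column names are illustrative, not the actual design.
import sqlite3

ddl = """
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Contracts (contract_id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES Customers);
CREATE TABLE Plants    (plant_id    INTEGER PRIMARY KEY, name TEXT);

-- Projects belong to a contract and are carried out at a plant.
CREATE TABLE Contract_Projects (
    project_id  INTEGER PRIMARY KEY,
    contract_id INTEGER REFERENCES Contracts,
    plant_id    INTEGER REFERENCES Plants
);

-- Relationship table: which customers are associated with which plants,
-- allowing two customers to share one plant (Business Model 3).
CREATE TABLE Customer_Plants (
    customer_id INTEGER REFERENCES Customers,
    plant_id    INTEGER REFERENCES Plants,
    PRIMARY KEY (customer_id, plant_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```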