CC401 :: Self healing and redundancy in virtual appliances

This is the dissertation from my MSc in Embedded Systems and Robotics at the University of Essex. The original report is available for download in PDF format. This online version has had some sections omitted for space considerations (these include UPnP interface descriptions, Java class descriptions and the appendices).
virtual-appliances-self-healing.pdf (1,766.74 Kb)

I. Abstract

Virtual Appliances is the de-composition and re-composition of modern-day appliances in exciting new user-led ways. The goal is to extend the functionality of existing appliances and even build new ‘virtual’ appliances from a set of distributed services. Previous work demonstrated the concept of Virtual Appliances and laid the technological groundwork. In this project the previous work is extended to a number of areas that had not previously been examined. In particular this project focuses on how these Virtual Appliances can self-heal in the case of some sort of failure and how the configuration data that defines the Virtual Appliance can be distributed to allow redundancy but without resulting in a single point of failure.

Universal Plug and Play (UPnP) is at the core of the vision of Virtual Appliances, providing service discovery on ad-hoc networks. Some of the limitations of UPnP are examined in this project, along with possible extensions and modifications to the protocol to enable quick and reliable failure detection and self-healing.

This project successfully implements a failure detection and self-healing mechanism that can recover from device failures in a timely manner. A replacement device is chosen such that it meets given criteria and does not interfere with any other virtual appliances on the network. Configuration data is automatically restored onto the replacement device, allowing the virtual appliance to continue to function.

1. Introduction

1.1. Overview

In the future our homes will be filled with computers. These computers will not be the kind that sit on our desks but microprocessors built into all of our everyday appliances. Everything from the toaster in the kitchen to the television and DVD recorder in the lounge will be augmented with microprocessors for the sole purpose of improving our quality of life. Mark Weiser [31] described this vision fourteen years ago when he coined the term "ubiquitous computing" and told how computers would be integrated into the living spaces of the future, empowering users and aiding them in their everyday lives.

All of the appliances in our homes today already contain embedded processors of some description in order to control their operation, but what they don’t currently have is the ability to share their resources and services with other appliances in the home. Imagine if all of the appliances in your home were decomposed into their core components and these components were able to communicate with the components of other appliances over some kind of network (Ethernet, WiFi, Bluetooth, Zigbee, etc). You would then have the ability to extend the functionality of existing appliances and even build new "virtual" appliances from all of these components and services.

This is the vision of Virtual Appliances, and it is a novel idea. It has the potential to massively empower users by allowing them to completely re-configure appliances and build new ones from a wide selection of "building blocks". In essence it allows appliances to become more than the sum of their parts, and could even be considered a paradigm shift in pervasive and distributed computing.

The process of creating a virtual appliance involves creating simple event rules to associate devices. For example, to build a simple light switch two components are needed: a light and a switch. A rule is then needed to associate these two components to create the light switch, such as IF switch.state==1 THEN light.state==1 ELSE light.state==0. Obviously, the more complicated the virtual appliance, the more rules are needed. This interaction between components relies on an underlying distributed event system such as those provided by service discovery protocols (UPnP[18], Jini[8], Salutation[12], etc.).
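
As a rough illustration, the light switch rule could be modelled as follows. This is only a minimal Python sketch and not the rule engine used in the project; the `Rule` class, its `apply` method and the dictionary of device states are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """IF <source>.state == <trigger> THEN <target>.state = <then> ELSE <otherwise>."""
    source: str
    trigger: int
    target: str
    then: int
    otherwise: int

    def apply(self, states: dict) -> None:
        # Test the condition on the source and write the result to the target.
        matched = states[self.source] == self.trigger
        states[self.target] = self.then if matched else self.otherwise


# Hypothetical device states, keyed by component name.
states = {"switch": 0, "light": 0}
light_switch = Rule(source="switch", trigger=1, target="light", then=1, otherwise=0)

states["switch"] = 1        # the switch is turned on
light_switch.apply(states)  # the light follows the switch
light_on = states["light"]

states["switch"] = 0        # the switch is turned off
light_switch.apply(states)
light_off = states["light"]
```

In a real system the rule would be triggered by an event notification from the switch rather than by a direct call.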

Due to the nature of pervasive computing, available services will be distributed throughout the home by many appliances. As such it makes sense that the rules that make up the virtual appliances be distributed throughout the appliances on the network. This gives rise to a completely decentralised system. One of the main problems with pervasive and distributed systems is that they are highly volatile in nature. The failures that occur in distributed systems are partial failures caused by some of the components failing while others continue to function. These partial failures can be a result of actual device failure, the user unplugging or moving devices, or failure in network links (in particular wireless networks, where devices could go out of range or interference could cause devices to become disconnected).

1.2. Scenario

Let's assume that we want to extend the functionality of our television so that if the phone rings or the door bell is rung the TV is "paused". In order to achieve this the following services are required:

  • TV Tuner
  • Video Display
  • Digital Storage
  • Telephone
  • Door Bell

Event rules are then automatically created and distributed amongst the devices so that if the telephone or door bell’s state changes to ringing then the digital storage service starts recording the video stream from the TV tuner service. The video display then switches so that it displays the video file now being dumped into the digital storage service. When the telephone and/or door bell’s state changes back to not ringing the video display service changes mode to play. From then on, instead of watching the live TV feed, you watch the video feed from the digital storage service.

The event rules that make up the virtual appliance are stored on the actuator devices, or the devices that need to respond to an event signal from another device (e.g. the digital storage service). To allow for redundancy, a backup copy of the event rules is stored on the device that generates the event notification (e.g. the telephone).

The leasing and subscription mechanism implemented by the service discovery protocol allows all the devices to detect when any of the services listed in the event rules fail. When this occurs the device begins searching the network for a replacement service. If one can be found then the backup copy of the rules stored on the device is sent to the replacement service. For example, if the digital storage service fails, then the telephone service will realise that the digital storage service did not renew its event subscription. The telephone service will therefore make the assumption that the digital storage service has failed and begin searching the network for a replacement digital storage service. Once one has been found the telephone service uploads its copy of the rules that were stored on the original digital storage service to the new digital storage service. Provided that the lease periods are short enough and that a replacement service can be found, the user should not notice that anything has failed.
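
The recovery step in this scenario can be sketched in a few lines of Python. This is an illustrative model only, assuming a backup is held as a (service type, rules) pair; the `Service` class, the `heal` function, the service names and the rule string are all hypothetical and are not taken from the project's code.

```python
from typing import Optional


class Service:
    """A hypothetical service that acts on rules and holds backups for others."""

    def __init__(self, name: str, service_type: str):
        self.name = name
        self.service_type = service_type
        self.rules = []      # event rules this service acts on
        self.backups = {}    # actuator name -> (service type, backup copy of its rules)


def heal(watcher: Service, failed_name: str, network: list) -> Optional[Service]:
    """Restore the failed service's rules onto a replacement of the same type."""
    service_type, rules = watcher.backups[failed_name]
    for candidate in network:
        if candidate.service_type == service_type and candidate.name != failed_name:
            candidate.rules = list(rules)  # upload the backup copy
            # The watcher now holds the backup on behalf of the replacement.
            watcher.backups[candidate.name] = watcher.backups.pop(failed_name)
            return candidate
    return None  # no replacement found; the virtual appliance stays broken


rule = "IF telephone.state==ringing THEN storage.record"  # illustrative rule text
telephone = Service("telephone", "Telephone")
telephone.backups["storage-1"] = ("DigitalStorage", [rule])

# storage-1 has failed, so only these services remain on the network.
network = [
    telephone,
    Service("display", "VideoDisplay"),
    Service("storage-2", "DigitalStorage"),
]
replacement = heal(telephone, "storage-1", network)
```

Here the telephone plays the role of the event source that notices the missed subscription renewal; how the failure is detected is left out of the sketch.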

1.3. Project Goals

Aim: My undergraduate degree project [22] laid the technological groundwork and successfully demonstrated the concept of creating, configuring and managing virtual appliances. The Virtual Appliances vision encompasses a wide variety of problem domains, and as such not all of these issues could be addressed in the time available. This project aims to focus on the distributed nature of virtual appliances and in particular how they can be made to self-heal in the event of failure and how the configuration details that define the virtual appliances can be distributed throughout the network in such a way that it is resilient to failure and data loss.

Objectives:

  1. Complete research into suitable strategies for self-healing and rule redundancy.
  2. Design a mechanism to quickly detect device failure to allow for quick recovery.
  3. Add self-healing mechanism to existing system so that suitable replacement services can be located.
  4. Add rule redundancy mechanism so that event rules get stored on multiple devices.
  5. Add rule restoration mechanism so that event rules get restored onto replacement devices in the self-healing process.
  6. Test and evaluate system.

Further Objectives:

  1. Create a new user interface allowing the user to manage the virtual appliances.
  2. Create some real UPnP devices.
  3. Create a "proxy" device to allow non-UPnP devices, or UPnP devices that do not support rule storage services, to take part in virtual appliances.

2. Background

2.1. Universal Plug and Play

Virtual Appliances require a service discovery protocol to enable the automatic discovery of other devices on the network and to provide a standard interface for obtaining state information and controlling the devices. UPnP was chosen for the original implementation of the virtual appliances system due to its industry support and it being an open standard. Unlike some other service discovery protocols UPnP also provides a form of RPC allowing devices to invoke remote actions and subscribe to events offered by remote services. This section gives a brief overview of the UPnP protocol.

UPnP stands for Universal Plug and Play and is an open standard backed by the likes of Microsoft and Intel along with 760 other vendors, including industry leaders in consumer electronics, computing, home automation, home security, appliances, printing, photography, computer networking, and mobile products [18].

UPnP uses TCP/IP and employs existing Internet technologies such as HTTP, XML, SOAP and GENA allowing discovery and control of devices regardless of the operating system or programming language.

Figure 1: UPnP devices in the digital home

UPnP devices are logical containers with specified device types. Each UPnP device contains a number of related services, each with its own unique service type. Each of these services contains a set of state variables, reflecting the various states of the service, as well as actions (similar to remote procedure calls) allowing control points to control certain aspects of the device [27]. To aid interoperability the UPnP Forum has drawn up a number of device and service definitions giving specific types of device a standard interface.

Figure 2: UPnP Control Points, Devices and Services

UPnP devices are passive entities that can be discovered and controlled by UPnP control points. Devices periodically announce their presence to the network and also respond to search requests from control points. Once a device is discovered, control points can invoke actions on the device as well as retrieve state information by querying the service’s state variables. A service’s state variables can also be evented, allowing control points to subscribe to the service and receive notifications whenever the state of the variables changes [27].

Figure 3: Steps to UPnP networking

There are six steps to UPnP networking as shown in Figure 3. The first step is for a device to obtain an IP address on the network either via DHCP or Link-Local addressing. Once the device has configured itself with an IP address the next step is to announce its presence on the network, allowing control points to discover it. Device discovery employs SSDP (Simple Service Discovery Protocol), which sends HTTP style messages over UDP Multicast (this is known as HTTPMU). Included in the device's announcement is a lease time. Control points can assume that if the device has not re-advertised itself before the lease time in the advertisement has expired then the device has been removed from the network. As such, active devices will periodically announce their presence on the network. If a device wishes to leave the network then it should cancel its advertisement, allowing control points to be informed immediately that the device is leaving.

The discovery step also deals with control points searching for devices. Control points can search for all devices, devices of a specific type, or a specific device. The search query is handled by SSDP and sent using HTTPMU. All devices to whom the search query applies will then respond to the control point using HTTPU (unicast direct to the control point). Typically a control point will only need to search for all devices when it first starts up in order to receive a list of devices currently on the network. After that the control point will then receive device advertisements and advertisement cancellations as devices come and go.
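
To make the discovery step more concrete, the sketch below builds an SSDP search request and extracts the lease time from an advertisement. The multicast address, port and header names follow the UPnP Device Architecture; the function names `build_msearch` and `parse_max_age` are my own and purely illustrative, and nothing is actually sent on the network.

```python
import re
from typing import Optional

SSDP_ADDR = "239.255.255.250"  # SSDP multicast group
SSDP_PORT = 1900


def build_msearch(search_target: str = "ssdp:all", mx: int = 3) -> bytes:
    """Build an HTTPMU search request for the given search target (ST)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",             # maximum seconds a device may wait before replying
        f"ST: {search_target}",  # all devices, a device type, or one device
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_max_age(headers: str) -> Optional[int]:
    """Return the lease time in seconds from a CACHE-CONTROL header, if present."""
    match = re.search(r"max-age\s*=\s*(\d+)", headers, re.IGNORECASE)
    return int(match.group(1)) if match else None


search_all = build_msearch()
search_root = build_msearch("upnp:rootdevice")
lease = parse_max_age("NOTIFY * HTTP/1.1\r\nCACHE-CONTROL: max-age=1800\r\n")
```

A control point would send the request with a UDP socket, for example `sock.sendto(search_all, (SSDP_ADDR, SSDP_PORT))`, and then read the unicast replies from the same socket.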

Once a control point has discovered a device it is interested in, it can query the device for a detailed description and a list of services that it provides. The device’s advertisement contains a URL to an XML description document which the control point can then request using a regular HTTP request. Once the control point has the description of the device it can invoke actions on the services offered by the device. Invoking an action is similar to a remote procedure call and may take a number of arguments as well as return a number of values. Action requests are handled by SOAP (Simple Object Access Protocol), which is an XML based protocol normally used by web services in a decentralised, distributed environment.

Some state variables offered by a service may be evented. In this case a control point can subscribe to a service hosted by a device. Once subscribed, whenever the value of any of the service’s state variables is updated a notification is sent to all subscribed control points. The event notification mechanism utilises an extension to HTTP known as GENA (General Event Notification Architecture). When a control point subscribes to a service it includes a lease time in its subscription request. The service will then continue to send event notifications to the control point until the subscription expires. If the control point wishes to continue receiving event notifications then it must renew its subscription before it expires. Additionally, if a control point no longer wishes to receive event notifications then it may cancel its subscription.

The final step to UPnP networking is presentation. This is simply a web page hosted on the device that can be viewed in a normal web browser; it reflects the state of the device and may even allow the device to be controlled. However, the UPnP Forum places no requirements on the presentation page other than that it be written in HTML and delivered over normal HTTP.

2.2. Failure Detection in Distributed Systems

A common mechanism used to detect failure in distributed systems is the use of leases, whereby a service grants a client access to its resources for a limited period. If the client requires access beyond the original lease period then the client must renew its lease. The service assumes client failure and terminates the lease if the client fails to renew it before it expires.

As described in the previous section, UPnP uses leases in two ways. Firstly, when a UPnP device is started it announces its presence on the network, but this advertisement is only valid for a specified duration. If the advertisement is not renewed before it expires then control points on the network can assume the device has failed. If the device wishes to leave the network then it must revoke its advertisement. The second way in which leases are used is when a control point wishes to subscribe to the state variables on another device. The control point makes a request for a subscription and includes a lease time in its request. The device then grants the subscription and has the option of changing the granted lease time, though typically this does not happen. During the lease period the device will send event messages to the control point. If the control point wishes to receive event messages beyond the period of the lease then it must renew its subscription, otherwise the device will assume the control point has failed and terminate the subscription [16].
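
Both uses of leases reduce to the same bookkeeping: record when each lease expires and treat any holder that has not renewed in time as failed. The following is a minimal sketch of that idea, assuming time is passed in explicitly as seconds; the `LeaseTable` class and its method names are hypothetical.

```python
class LeaseTable:
    """Tracks lease expiry times and reports holders that failed to renew."""

    def __init__(self):
        self._expiry = {}  # holder id -> absolute expiry time in seconds

    def grant(self, holder: str, duration: float, now: float) -> None:
        self._expiry[holder] = now + duration

    def renew(self, holder: str, duration: float, now: float) -> None:
        # Renewing simply pushes the expiry time forward.
        self.grant(holder, duration, now)

    def expired(self, now: float) -> list:
        """Remove and return (sorted) the holders whose lease has lapsed."""
        failed = sorted(h for h, t in self._expiry.items() if t <= now)
        for holder in failed:
            del self._expiry[holder]
        return failed


leases = LeaseTable()
leases.grant("telephone", 60, now=0)
leases.grant("doorbell", 60, now=0)
leases.renew("telephone", 60, now=50)  # only the telephone renews in time

failed_at_70 = leases.expired(now=70)
failed_at_120 = leases.expired(now=120)
```

Note that the failure is only noticed when the lease runs out, so the worst-case detection delay is roughly one full lease period.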

Obviously there is a trade-off between bandwidth utilisation and system responsiveness when deciding the duration of the lease. If the lease period is too short then the client will be constantly renewing its lease. If there are many clients then the network will become flooded with lease renewals. If the lease period is too long then it could be a considerable length of time before the service realises that a client has failed resulting in degraded system responsiveness. Bowers et al.[21] describe two solutions to this problem by using self-regulating algorithms to adapt the lease period to achieve the best possible response time whilst respecting resource constraints. Although they studied the algorithms using the Jini service discovery protocol they expect that their algorithms will work with similar leasing systems such as that used in UPnP.

2.3. Distributed Database Systems

In the Virtual Appliance vision, rules are used to link individual components together and define the virtual appliance. Because the services themselves are distributed, the rules that make up the virtual appliance should also be distributed, with multiple copies to allow for redundancy. This section examines how similar data is distributed throughout other systems.

2.3.1. Domain Name System

The Domain Name System (DNS) is a distributed database of host information that is responsible for translating names into addresses. The domain name space is divided up into subdomains, for example, .uk, .ac.uk, essex.ac.uk. Decentralised administration of the DNS is achieved through delegation, whereby organisations are responsible for their domain, and those domains may be broken up further into subdomains which are managed by different departments within the organisation. DNS servers are the programs that store information about the domain name space. The DNS specification defines two types of servers: primary master and secondary master (also known as slaves). [20]

The primary master servers load their zone information from files on disk, whilst the secondary master servers load their information directly from the primary master servers. The secondary master servers are provided for redundancy, and in the event of failure the secondary servers will be queried. The primary master server’s zone files contain TTL fields which state how long secondary servers are allowed to cache host information for, and an incrementing serial number allows secondary servers to ensure they have the latest copy of the zone information. Zone information is updated on the primary master server and secondary master servers periodically update their zone information.

2.3.2. Distributed Shared Memory

Distributed Shared Memory (DSM) is an abstraction used for sharing data between computers that do not share physical memory. Processes access the DSM through reads and updates, and an underlying run-time system ensures that all processes observe the updates made by one another. The current popular DSM implementations include TSpaces[15] and JavaSpaces[9], which both use tuple space based coordination languages based around the original Linda system developed by David Gelernter and Nicholas Carriero at Yale University in the 1980s.

Using DSM to enable distributed agent communication has the advantage that the system is robust, as a single agent failing will not cause the whole system to fail. The replication and mirroring of persistent data enables communication regardless of system or partial network failure. Additionally, the use of persistent spaces allows processes to communicate even if the various processes are not running at the same time.

Transactions have been widely adopted by DSM systems, including TSpaces and JavaSpaces, to provide fault tolerance and ensure consistent data structures if a process fails. However, there is still a problem regarding the correctness of the information in the space after a failure. According to Rowstron [30] "In a DSM system, although it is possible for the underlying infrastructure to detect that an agent has stopped responding, it is not easy for the agents to decide that another agent has failed. How this can be detected varies on the particular attributes of the DSM implementation. For example, in most tuple space based DSMs this is difficult/impossible to detect". Rowstron describes a solution to this problem of fault tolerance to sudden agent failure in tuple space languages through the use of "agent wills". An agent will is essentially a piece of code that is passed to the run-time supporting the DSM that should be executed in the event that the system believes the agent has failed. This effectively allows agents to ensure the shared data structures are application consistent even after the agent has failed.
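
The idea of an agent will can be illustrated with a toy tuple space. This is not Rowstron's API or that of any real DSM; `Space`, `register_will` and `agent_failed` are hypothetical names, and the will here is just a Python callable that the space runs when it believes the agent has failed.

```python
class Space:
    """A toy tuple space that executes an agent's will when the agent fails."""

    def __init__(self):
        self.tuples = []
        self._wills = {}  # agent id -> callable taking the space

    def out(self, tup: tuple) -> None:
        self.tuples.append(tup)

    def register_will(self, agent_id: str, will) -> None:
        self._wills[agent_id] = will

    def agent_failed(self, agent_id: str) -> bool:
        """Called by the run-time when it believes the agent has failed."""
        will = self._wills.pop(agent_id, None)
        if will is None:
            return False
        will(self)  # execute the will on the agent's behalf
        return True


space = Space()
lock = ("lock", "printer", "agent-1")
space.out(lock)  # agent-1 takes a lock by placing a tuple in the space


def release_lock(s: Space) -> None:
    # The will tidies up so that other agents are not blocked forever.
    s.tuples.remove(lock)


space.register_will("agent-1", release_lock)
executed = space.agent_failed("agent-1")  # agent-1 stops responding
```

Without the will, the lock tuple would remain in the space after the failure and the shared data would no longer be application consistent.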

2.4. Self-Healing Systems

2.4.1. AutoHAN (University of Cambridge)

The Home Area Networking (HAN) Group was set up in September 1995 at the University of Cambridge Computer Laboratory. One of their main projects is AutoHAN [25] in which the group are trying to "solve the basic problems of home control where a multitude of devices must interact with each other and the residents in a sensible manner". The middleware at the core of the project uses standard internet technologies including XML, HTTP and GENA. This is similar to UPnP, however AutoHAN includes a form of access control which is handled by a registry service.

An AutoHAN consists of all AutoHAN compliant devices connected to a multicast group that spans the home. These devices can be dumb devices served by proxies, AutoHAN servers which provide services such as registration, encryption, video storage, and general purpose execution resources, and an AutoHAN active home server which is the master home server. Any AutoHAN server can act as a master home server, the main feature of which is that it is aware of resource allocation in all parts of the network and can allocate new requests to available resources.

The AutoHAN system therefore has more centralised control than that of virtual appliances, where everything is distributed. Self-healing in AutoHAN involves a master server being elected from all available AutoHAN servers. Unfortunately it is not clear if or how the resource allocation information that is held by the master server is communicated to other master servers in the event of failure.

2.4.2. Using JavaSpaces to Create Adaptive Distributed Systems (Agder University College, Norway)

Engelhardsten and Gagnes [24] describe the creation of an adaptive distributed system that uses the JavaSpaces implementation of a DSM for communication. Their space-based architecture utilises three specialised agents known as ActorAgents, ProtocolAgents and Role&RoutingAgents.

ActorAgents have the ability to play different roles (specific behaviours) within a domain and some may represent specific entities in the real world such as WAP servers or SMS servers, etc. Some ActorAgents may be able to host more general service logic and will exist in a pool waiting to be given roles. These agents are known as Generic ActorAgents and help the system to adapt to changes in demand. Role Repositories contain the execution code for various roles to be played out in a domain and these roles are assigned to Generic ActorAgents as required.

Role&RoutingAgents are present in each domain and are aware of capabilities of the ActorAgents within their domain, and therefore the capabilities of their domain. If the Role&RoutingAgents cannot find an agent to perform a required role in their own domain then they forward the request onto another Role&RoutingAgent. ProtocolAgents simply perform a mapping between protocols allowing domains to communicate over different protocols (e.g. SOAP, Bluetooth, etc).

Figure 4: Engelhardsten and Gagnes' space-based agent architecture

The Jini distributed transaction model ensures consistent execution of actions based on the use of the two-phase commit protocol. This allows transactions between agents to be fully completed or to be completely aborted, ensuring consistency in the system. If an agent were to fail during an operation the transactions will ensure consistency, but this may not be enough to reconstruct the application state. ActorAgents are able to establish sessions across JavaSpaces but an issue arises in how to recover when an ActorAgent fails and a session is lost. This is a general problem associated with distributed computing and not one specific to space-based communication. As previously described, it is difficult, though not necessarily impossible, to detect the failure of agents in a DSM system, and the approach taken by Engelhardsten and Gagnes is to use the notion of agent wills as described in section 2.3.2. Then if a space detects that an agent has failed, the agent’s will is executed. Engelhardsten and Gagnes suggest that wills could be implemented in their system by roles with a special will behaviour, so that if an ActorAgent fails its will role is executed by another Generic ActorAgent.

2.5. Supporting Heterogeneous Networks

A truly pervasive and distributed environment will consist of numerous devices of differing types, size, processing power, amount of memory, etc. The Virtual Appliances architecture employs UPnP for automatic service discovery and control, however not all devices, especially small devices such as switches and sensors, will have the processing power or memory to run a UPnP stack, and if the processing power could be added it would simply be uneconomical. Additionally, there are many existing devices that do not even use TCP/IP or Ethernet for network communication, e.g. X10[19] and Lonworks[10]. Finally, standard UPnP devices will not be able to support the distributed event rule system that comprises a Virtual Appliance and therefore another device will have to store and handle the rules for that device.

2.5.1. iDorm (University of Essex)

The iDorm at the University of Essex[29] is an intelligent dormitory modelled on typical student accommodation at the university. The iDorm may look like a normal student room at first, but it is in fact filled with a variety of sensors, and various devices in the room such as the telephone and the window blinds have been augmented with microprocessors to monitor and control them. Although the focus of their work is on creating intelligent environments, the focus here is on their network architecture.

The iDorm consists of three networks supporting different devices: 1-Wire, Lonworks, and IP [26]. The 1-Wire network (Figure 5b) consists of a large number of temperature sensors and a door security lock device. All of these devices are connected to a TINI board (Tiny INternet Interface) which runs a Java virtual machine and can communicate with an Ethernet network using TCP/IP. The TINI board can then run a program to proxy control and status messages from the IP network to the 1-Wire network and vice-versa. The Lonworks network (Figure 5c) consists of a wider variety of devices which can communicate with the IP network via the iLon Server.

Figure 5: Different networks and devices in the iDorm

Figure 5a shows how the 1-Wire and Lonworks networks are connected to the IP network. The IP network also contains the multimedia PC in the dormitory, some embedded agents and a web interface onto the devices in the iDorm. The iDorm Gateway provides a standard interface onto the various subnetworks in the iDorm and communicates via standard XML messages over HTTP.

3. Requirements

This section lists the functional and non-functional requirements for the system.

3.1. Functional Requirements

Outlined below is the functionality that the system should provide.

  1. Devices that are members of virtual appliances must be able to detect failure of other member devices.
  2. Upon detection of failure a suitable replacement device should be searched for.
  3. If a suitable replacement device matching specified criteria is found then the virtual appliance should use the new device and continue to operate as before.
  4. The system should provide the ability to specify a specific device as preferred and make best efforts to switch back to that device upon its return.
  5. The system should provide the ability to allow or prevent other virtual appliances from using devices currently in use by another virtual appliance.
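
Requirements 2 to 5 can be read together as a selection policy for replacement devices. The sketch below shows one possible reading of that policy, not the design adopted later in this report; the `Device` fields and the `choose_replacement` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    service_type: str
    in_use_by: Optional[str] = None  # virtual appliance currently using the device
    shareable: bool = True           # may other virtual appliances use it too?


def choose_replacement(devices, service_type, appliance, preferred=None):
    """Pick a usable device of the right type, favouring the preferred one."""
    candidates = [
        d for d in devices
        if d.service_type == service_type
        and (d.in_use_by in (None, appliance) or d.shareable)
    ]
    for device in candidates:
        if device.name == preferred:
            return device
    return candidates[0] if candidates else None


devices = [
    Device("storage-a", "DigitalStorage", in_use_by="alarm", shareable=False),
    Device("storage-b", "DigitalStorage"),
    Device("storage-c", "DigitalStorage"),
]
first_free = choose_replacement(devices, "DigitalStorage", "tv-pause")
preferred = choose_replacement(devices, "DigitalStorage", "tv-pause", preferred="storage-c")
locked_out = choose_replacement(devices, "DigitalStorage", "tv-pause", preferred="storage-a")
missing = choose_replacement(devices, "VideoDisplay", "tv-pause")
```

Switching back to a preferred device when it returns would amount to running the same selection again once its advertisement is seen.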

3.2. Non-Functional Requirements

The constraints on the design of the system are outlined below.

  1. The program code should be platform independent.
  2. The system should be designed such that it does not require any network infrastructure to be in place (e.g. DNS, DHCP, routers, gateways, etc).
  3. The system should be as decentralised as possible.

4. System Design

4.1. Lease Periods in UPnP

Tests were conducted to determine the size and number of messages that a single UPnP device and control point send in certain situations. The UPnP device used in these tests contained three services. A packet sniffer was used to see the size and contents of the messages being sent. It is important to note that two announcements are sent for the device itself plus one announcement for each of the services that the device offers. Additionally, when the device starts up it sends advertisement cancellation messages to ensure control points have no reference to an old instance of the device. As with the announcement messages, two cancellation messages are sent for the device itself plus one for each of the services that the device offers. Table 1 shows the size of each of the messages, and the contents of the messages are available in Appendix A.

Table 1 – UPnP message sizes
Message Type                Message Size
ssdp:byebye (rootdevice)    195 bytes
ssdp:byebye (device)        247 bytes
ssdp:byebye (service)       249 bytes
ssdp:alive (rootdevice)     287 bytes
ssdp:alive (device)         339 bytes
ssdp:alive (service)        341 bytes
Subscribe                   416 bytes
Renew                       387 bytes
Notify                      531 bytes

From the information in Table 1 it can be seen that when a device starts up (assuming it contains three services) it will send 10 messages totalling roughly 2,838 bytes. Assuming the device’s lease time is 30 minutes, which is the value recommended by the UPnP Forum, then every half hour another 5 messages will be sent totalling approximately 1,649 bytes. Table 2 shows the theoretical maximum number of devices that a network with a specified bandwidth can support for varying lease periods.

Table 2 – Maximum number of devices that can be supported based on bandwidth and lease periods assuming each periodic advertisement totals 1,649 bytes
Bandwidth                      Lease Period    Max Devices
100 Mbit/s (Ethernet)          1800 seconds    13,644,633
100 Mbit/s (Ethernet)          600 seconds     4,548,211
100 Mbit/s (Ethernet)          60 seconds      454,821
11 Mbit/s (WiFi – 802.11b)     1800 seconds    1,500,910
11 Mbit/s (WiFi – 802.11b)     600 seconds     500,303
11 Mbit/s (WiFi – 802.11b)     60 seconds      50,030
1 Mbit/s (HomePNA)             1800 seconds    136,446
1 Mbit/s (HomePNA)             600 seconds     45,482
1 Mbit/s (HomePNA)             60 seconds      4,548
250 kbit/s (ZigBee)            1800 seconds    34,112
250 kbit/s (ZigBee)            600 seconds     11,371
250 kbit/s (ZigBee)            60 seconds      1,137

The figures in Table 2 only take into account device advertisements and do not include subscription renewals or event notifications. Obviously these extra messages will limit the maximum number of devices, but even with devices renewing every 10 minutes an 11 Mbit/s WiFi network can support approximately half a million devices! The choice of lease time is clearly a trade-off between system responsiveness and network utilisation. Also, the frequency with which devices are expected to fail will affect the choice of lease period. The minimum lease time recommended by the UPnP Forum is 30 minutes, however they do state that mobile devices may require shorter leases [17]. Another important consideration is the amount of other traffic on the network which, in the homes of the not-too-distant future, could well include high-bandwidth applications such as multiple radio streams, numerous video streams, online gaming, and general internet browsing.
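
The message totals and the entries in Table 2 can be reproduced with a short calculation. The formula below is inferred from the published figures rather than quoted from the original working: it assumes the whole bandwidth is available for advertisements, that 1 Mbit is 1,000,000 bits, and that results are rounded to the nearest whole device. The function names are illustrative.

```python
# Message sizes in bytes, taken from Table 1.
SIZES = {
    ("ssdp:byebye", "rootdevice"): 195,
    ("ssdp:byebye", "device"): 247,
    ("ssdp:byebye", "service"): 249,
    ("ssdp:alive", "rootdevice"): 287,
    ("ssdp:alive", "device"): 339,
    ("ssdp:alive", "service"): 341,
}


def advertisement_bytes(kind: str, services: int) -> int:
    """Two messages for the device itself plus one per service."""
    return (SIZES[(kind, "rootdevice")]
            + SIZES[(kind, "device")]
            + services * SIZES[(kind, "service")])


def max_devices(bandwidth_bits_per_s: float, lease_s: float, advert_bytes: int) -> int:
    """Devices whose periodic advertisements fit into one lease period."""
    bytes_per_lease = bandwidth_bits_per_s / 8 * lease_s
    return round(bytes_per_lease / advert_bytes)


renewal = advertisement_bytes("ssdp:alive", services=3)
startup = renewal + advertisement_bytes("ssdp:byebye", services=3)

ethernet_30min = max_devices(100_000_000, 1800, renewal)
wifi_10min = max_devices(11_000_000, 600, renewal)
zigbee_1min = max_devices(250_000, 60, renewal)
```

Because the calculation ignores all other traffic, protocol overhead and collisions, the results are best treated as upper bounds.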

4.2. Location Information

Rooms are a natural way of partitioning people’s behaviour and activity, and in order to make the user experience of creating and configuring virtual appliances as simple as possible, some services will need the ability to announce their location. The concern here is how to broadcast this information once it is known. Additionally, not all devices are static and some devices may be moved around. The location of certain devices and services is therefore likely to change, and the system will need to be informed of these changes.

The location of services is not necessarily important for all devices, however. For example, if the user wishes to create a motion triggered light that keeps a light on for a certain time after motion has been detected, then they will want to specify the location of the sensor and the light, but they probably do not care where the timer service is located. The location of these devices is also important to self-healing. If the motion sensor or light fails, then it is desirable for a replacement to be found in the same area. However, if the timer fails it does not matter where the replacement timer service is located as long as one can be found.

The UPnP specification makes no provision for a device or service to announce its location in its initial and periodic advertisements, and neither is there provision for this information to be included in the XML description documents. As such, location information would need to be implemented as a separate service offered by the device, although there is no standardised UPnP Location service yet. One major disadvantage of implementing location information as a UPnP service is that it is very difficult for a UPnP Control Point to quickly determine the location of a UPnP device. The control point would have to iterate over every device of a particular type that it is interested in and send a query to each of the location services. This method is time consuming and wasteful of network resources.

A solution to this problem is suggested by Kutter et al. [28]. They suggest extending the SSDP messages sent out by devices to advertise their presence by adding two extra headers, "Mobility" and "Locality", which advertise the mobility and locality of the device respectively. This method provides the UPnP Control Point with the device’s location information along with its advertisement, removing the need to query the device for it. A UPnP Location service can still be provided for backwards compatibility, allowing normal UPnP Control Points to obtain the device’s location information. In the case of mobile devices a new SSDP message can be broadcast whenever the device’s location changes and the location variable of the location service can be evented.

4.3. Rules

Rules are used to create associations between devices, and it is these associations that form virtual appliances. UPnP uses an event based notification scheme whereby subscribers are notified of changes in the state of a service. The rules used in virtual appliances are event based and are triggered when an event notification is received. As the services themselves are distributed, the rules should also be distributed amongst the devices that make up the virtual appliance. This avoids having a centralised system with a single point of failure. To allow for redundancy, multiple copies of the rules need to be stored so that in the event of a device failure a replacement device can be located and a backup copy of the rules restored onto it.

The first version of the virtual appliances system used simple IF…THEN event rules that were triggered when an event notification was received that matched the condition clause of the rule. If the condition evaluated to true then the corresponding action was fired. These rules were transferred between devices using a UPnP service and the various parts of the rule were passed as state variables. This decision was taken due to its simplicity and low computational requirements when compared to parsing XML documents or other languages that could be used to express event rules. However, it was discovered that low-power embedded boards such as TINI and SNAP boards [6] are not suitable for running a UPnP enabled device due to the limited processing power and memory.

This time the event-rules will be defined using XML. This will provide much more flexibility over the design and complexity of the rules. One of the major drawbacks of using state variables to pass the various sections of the event rules over UPnP actions is that the number of arguments is fixed, not dynamic, and only primitive data types can be sent (strings, ints, floats, booleans, etc). Additionally, UPnP already uses XML for the device and service description documents and so parsing XML event rules will not require any extra processing power.

As well as the condition and action parameters for the rule, additional information is required: the name of the virtual appliance the rule is part of, a unique ID for the rule within the virtual appliance’s rule base, and a version number for the rule. This allows the rules to be identified and updated. Extra features are also included to allow the user to specify in which locations devices should be used and to specify a preferred device to use. Additionally, some virtual appliances may require the use of a device that cannot be shared with other virtual appliances. This will typically include services such as timers, TV tuners, and possibly display devices. Devices such as motion sensors and cameras should be able to be shared by several virtual appliances. Where exclusive use is required, a flag will be set in the rule stating that this is the case. When the rule is submitted to the device, the device will check whether it can offer exclusive use to the rule depending on whether it is already in use by another virtual appliance.

The event rules will need to contain the following information:

  • Name of the virtual appliance this rule belongs to
  • The ID of the rule (unique within this virtual appliance)
  • The version number of the rule
  • Rule condition

    • device type
    • device location
    • ID of preferred device
    • can any other virtual appliances share this device?
    • service type
    • state variable name
    • state variable value
  • Rule action

    • device type
    • device location
    • ID of preferred device
    • can any other virtual appliances share this device?
    • service type
    • condition==true

      • action name
      • action arguments
    • condition==false

      • action name
      • action arguments
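The structure listed above could be modelled in Java roughly as follows. This is only a sketch of the data, not the XML schema used by the project: all type and field names are hypothetical, records require Java 16 or later, and the idea that an update must carry a higher version number is an assumption.

```java
import java.util.List;

// Sketch of the rule contents listed above; every name here is hypothetical.
class RuleModel {
    /** Device details common to the condition and action clauses. */
    record DeviceSpec(String deviceType, String location, String preferredDeviceId,
                      boolean shareable, String serviceType) {}

    /** The state variable and value that trigger the rule. */
    record Condition(DeviceSpec device, String stateVariable, String value) {}

    /** A UPnP action name with its arguments. */
    record Invocation(String actionName, List<String> arguments) {}

    /** What to invoke when the condition is true and when it is false. */
    record Action(DeviceSpec device, Invocation whenTrue, Invocation whenFalse) {}

    record EventRule(String vappName, String ruleId, int version,
                     Condition condition, Action action) {
        /** A rule is identified by its virtual appliance name plus its rule ID. */
        String key() {
            return vappName + "/" + ruleId;
        }

        /** Assumption: an update replaces the same rule only if its version is higher. */
        boolean supersedes(EventRule other) {
            return key().equals(other.key()) && version > other.version;
        }
    }
}
```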

4.4. Virtual Appliance Manager

A Virtual Appliance Manager is needed to allow the user to create virtual appliances and submit the rules to the devices. The Virtual Appliance Manager is therefore responsible for constructing the necessary rules to create the virtual appliance desired by the user and to then find suitable devices on the network to build the appliance from. It is important to note that although the Virtual Appliance Manager is required to create and manage the virtual appliance, its presence is not required for the successful operation of the virtual appliances or for the self-healing and redundancy mechanisms.

4.4.1. Designing a Virtual Appliance

The original Virtual Appliance Manager only allowed virtual appliances to be created by manually writing each of the rules required to create the virtual appliance. This is clearly not user-friendly and would completely baffle any user not familiar with the inner workings of the virtual appliance system. A more user-friendly approach would be to use virtual appliance "templates". These templates define the devices required to build the virtual appliance along with the necessary rules. All that the user is required to do is select the location of the devices that should be used and, if necessary, specify any parameters specific to the virtual appliance (e.g. timer durations, start and stop times, etc).

4.4.2. Creating the Virtual Appliance

Once the virtual appliance has been designed and the rules created, suitable devices need to be located on the network and the rules submitted to them. To make best use of UPnP’s subscription and notification abilities the rules will be submitted to the appropriate output/actuator devices (e.g. lights, timers, displays, etc) as specified in the action clause of the rule. These devices will then subscribe to the input devices specified in the condition clause of the rule. When subscribing to the device a copy of the rule will also be submitted for redundancy purposes. In this way there are at least two copies of every rule and each device only carries a copy of the rules specific to itself (because it is listed in either the condition or action clause of the rule).

When submitting rules to many devices it is important that all the devices successfully accept the rule. If any of the devices fail to accept the rule then a replacement must be found. If none can be found then the operation should be aborted. In effect an update is being made to a distributed database (the rule bases held by each of the devices). In distributed database systems a two-phase-commit protocol is used to ensure that data can be written successfully to all locations, and that all operations can be rolled back if one or more of the operations fails. Due to the event-based nature of virtual appliances there is a chain of events and actions that are triggered when the device at the beginning of the chain changes state. The Virtual Appliance Manager submits the rules to the devices according to their order in the chain as each device in the chain needs to know exactly which device it needs to subscribe to for input. The user guides this selection process by specifying the location of certain devices, but it is the Virtual Appliance Manager that chooses the specific devices based on availability. The rules are submitted to each of the devices on a temporary basis. Once all the rules have been successfully submitted the Virtual Appliance Manager then requests each device commit the rules. If any device fails to accept a rule and a suitable replacement device cannot be found then the Virtual Appliance Manager requests all devices to delete the temporary rules and the entire operation is aborted.

The possible reasons for failure to submit a rule include:

  • device type not present on the network
  • device type not present in the location specified by the user
  • rule requires exclusive use of a device and there are no spare devices in the required location that can offer exclusive use
  • invalid rule (caused by design error in template creation, all rules are validated against a schema to ensure correctness)
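The temporary-submit, commit and abort sequence described in this section can be sketched as below. This is a simplified illustration rather than the project's code: the RuleTarget interface and FakeDevice class are hypothetical, and the sketch ignores the step of rewriting each rule to name the specific device chosen earlier in the chain.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseSubmit {
    /** Hypothetical view of a device's rule submission interface. */
    interface RuleTarget {
        boolean tempSubmit(String ruleXml); // phase 1: validate and hold the rule
        void commit();                      // phase 2: activate the held rule
        void remove();                      // roll back a held rule
    }

    /** In-memory stand-in for a device, used only to exercise the sketch. */
    static class FakeDevice implements RuleTarget {
        final boolean accepts;
        String state = "empty";
        FakeDevice(boolean accepts) { this.accepts = accepts; }
        public boolean tempSubmit(String ruleXml) {
            if (accepts) state = "temporary";
            return accepts;
        }
        public void commit() { state = "committed"; }
        public void remove() { state = "empty"; }
    }

    /**
     * candidates.get(i) lists the devices that could accept rules.get(i), in chain order.
     * Returns the devices chosen, or an empty list if the operation was aborted.
     */
    static List<RuleTarget> submitAll(List<String> rules, List<List<RuleTarget>> candidates) {
        List<RuleTarget> accepted = new ArrayList<>();
        for (int i = 0; i < rules.size(); i++) {
            RuleTarget chosen = null;
            for (RuleTarget t : candidates.get(i)) {
                if (t.tempSubmit(rules.get(i))) { // first candidate to accept wins
                    chosen = t;
                    break;
                }
            }
            if (chosen == null) {                 // no replacement left: roll back and abort
                for (RuleTarget t : accepted) t.remove();
                return List.of();
            }
            accepted.add(chosen);
        }
        for (RuleTarget t : accepted) t.commit(); // every rule was accepted: commit them all
        return accepted;
    }
}
```

Nothing is committed until every rule has been accepted somewhere, so an aborted operation leaves no device holding a partial virtual appliance.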

4.4.3. Example – Motion Triggered Light

Let’s assume that the user wishes to create a motion triggered light. When motion is detected the user wishes the light to stay on for a specified amount of time before turning off. This virtual appliance requires three devices: a motion sensor, a timer, and a light. Two rules are required: one for the timer telling it to subscribe to the motion sensor, and another for the light telling it to subscribe to the timer. One rule is submitted to the timer, and the other to the light. In turn, the timer submits a backup copy of its rule to the motion sensor, and the light submits a backup copy of its rule to the timer. This behaviour is shown in Figure 6.

Figure 6: Virtual Appliance Manager submitting rules to devices

4.5. Association Service

All of the UPnP devices to be used in creating virtual appliances require an association service to enable them to accept and parse rules. The association service is responsible for processing these rules, subscribing to relevant devices, and firing corresponding actions for event notifications from input devices. The association service is also responsible for detecting the failure of a device to which it is subscribed and finding a suitable replacement. All devices, regardless of whether they are an input device or an output device, will have an association service; however, it is the association service on the output devices that will subscribe to the appropriate input devices and handle event notifications and rule firing. Also, some devices can be considered as both an input and an output device. For example, a timer can provide event notifications when it stops and starts, but can also be controlled by other devices requesting it to start and set the duration. Similarly a camera can provide event notifications when it has taken a picture, and it can also be asked to take a picture. These devices will therefore implement both association services.

4.5.1. Rule Submission

As previously described, rules are submitted to the output devices by the Virtual Appliance Manager. These rules are checked for validity by an XML Schema document and then further checks are performed to ensure that all the state variables and actions specified in the rule exist on the device. If the rule fails any of these validity tests then the rule is rejected. Additionally, if a rule requires exclusive use of a device then it will be granted if:

  • there are no other virtual appliances using this device that require exclusive access
  • there are other virtual appliances using this device that require exclusive access but do not list the device as their preferred device and this new rule does list the device as preferred

If the new rule requires exclusive use, lists the device as preferred, and there are other virtual appliances using the device that don’t list the device as preferred, then the rules for those other virtual appliances will be removed causing those virtual appliances to find a replacement device.
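The exclusive-use conditions above can be expressed as a small decision function. The sketch below covers only a new rule that requires exclusive use, since that is the case the text specifies. The Holder record and Decision enum are hypothetical names introduced for illustration.

```java
import java.util.List;

class ExclusiveUse {
    enum Decision { GRANT, GRANT_AND_EVICT, REJECT }

    /** An existing rule on this device that already holds exclusive use (hypothetical type). */
    record Holder(String vappName, boolean listsDeviceAsPreferred) {}

    /** Decides whether a new rule that requires exclusive use of this device can be accepted. */
    static Decision decide(boolean newRulePrefersThisDevice, List<Holder> exclusiveHolders) {
        if (exclusiveHolders.isEmpty()) {
            return Decision.GRANT;           // nobody else requires exclusive access
        }
        boolean anyHolderPrefersDevice =
                exclusiveHolders.stream().anyMatch(Holder::listsDeviceAsPreferred);
        if (newRulePrefersThisDevice && !anyHolderPrefersDevice) {
            return Decision.GRANT_AND_EVICT; // existing holders must find a replacement device
        }
        return Decision.REJECT;
    }
}
```

A GRANT_AND_EVICT result corresponds to removing the rules of the other virtual appliances, which in turn triggers their own search for a replacement device.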

Once the rule has been validated and accepted the association manager of the output device will then subscribe to the input device specified in the condition clause of the rule and submit a backup copy of the rule. The association manager on the input device will then validate the rule and again check whether exclusive use can be granted if required. If the association manager on the input device rejects the rule the output device will have to search for a replacement input device.

When rules are submitted to the output device on a temporary basis by the Virtual Appliance Manager, then if the input device rejects the rule the output device will also reject the rule. This allows the Virtual Appliance Manager to find replacement input and output devices, preventing the chain from being broken whilst all the rules are being submitted and the virtual appliance created.

4.5.2. Failure Detection

The association managers of both the input and output devices are responsible for detecting the failure of any device they are dependent upon for any of the virtual appliances that they are a part of. The methods used for failure detection are:

  • devices cancelling their advertisement
  • devices failing to renew their advertisement
  • control points cancelling their subscription
  • control points failing to renew their subscription
  • devices moving out of the location required by the virtual appliance

The responsiveness of the failure detection is dependent upon the lease periods as described in section 4.1.

Input Device Failure
If the association manager on the output device detects the failure of the input device it will immediately search for a replacement device of the same type and in the location specified by the rule. If a suitable replacement is found it will submit a copy of the rule to the device as it subscribes. If the rule is rejected then the output device will search for another suitable replacement. If none can be found then it will wait for a suitable replacement device to appear on the network.
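A minimal sketch of this search is given below. It assumes the association manager keeps a list of devices it has discovered, together with the type and locality from their advertisements. The Candidate record, and the convention that a null locality means "anywhere", are assumptions made for the example. The submitRule predicate stands in for the real rule submission and returns true when the candidate accepts the rule.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class ReplacementSearch {
    /** Minimal view of a discovered device (hypothetical type). */
    record Candidate(String id, String deviceType, String locality) {}

    /**
     * Returns the first suitable device that accepts the rule. An empty result means
     * the caller must wait for a suitable device to appear on the network.
     */
    static Optional<Candidate> findReplacement(List<Candidate> discovered, String failedId,
            String requiredType, String requiredLocality, Predicate<Candidate> submitRule) {
        for (Candidate c : discovered) {
            boolean suitable = !c.id().equals(failedId)
                    && c.deviceType().equals(requiredType)
                    && (requiredLocality == null || c.locality().equals(requiredLocality));
            if (suitable && submitRule.test(c)) { // rejected candidates are skipped
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```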

Figure 7: Output device searches for replacement in the event of the input device failing

If the output device is currently subscribed to a replacement input device (i.e. the current input device is not the preferred device), and the preferred device re-appears on the network then the output device will submit a copy of the rule to the preferred input device as it subscribes. If the rule submission was accepted then the output device cancels its subscription to the replacement input device and revokes the rule from it.

Figure 8: Preferred input device returns, output device subscribes to preferred device and cancels subscription to replacement device

Output Device Failure
If the association manager on the input device detects the failure of the output device it will immediately search for a replacement device of the same type and in the location specified by the rule. If a suitable replacement is found it will submit a copy of the rule to the device. If the rule is rejected then the input device will search for another suitable replacement, otherwise the output device will then subscribe to the input device. If no replacement device can be found then it will wait for a suitable replacement device to appear on the network.

Figure 9: Input device searches for replacement in the event of the output device failing

If the input device is currently subscribed to a replacement output device (i.e. the current output device is not the preferred device), and the preferred device re-appears on the network then the input device will submit a copy of the rule to the preferred output device. If the preferred output device accepts the rule it will subscribe to the input device and in doing so will cause the input device to revoke the rule from the replacement output device.

Figure 10: Preferred output device returns, input device submits rule to preferred device and revokes rule from replacement device

Keeping The Chain Intact
As previously described, virtual appliances effectively consist of a chain of event rules being fired. This poses a problem, not immediately clear from the simple examples above, in how devices should find replacement devices in the event of a failure. Figure 11 shows a more typical virtual appliance in which the problem related to self-healing becomes apparent. If "Timer 1" were to fail it is important that both "Motion Sensor 1" and "Light Bulb 1" find the same replacement timer device, otherwise the chain is broken.

Figure 11: If "Timer 1" fails, how do "Motion Sensor 1" and "Light Bulb 1" find the same replacement timer device?

One or other of the remaining devices ("Motion Sensor 1" and "Light Bulb 1") will detect the failure of "Timer 1" first and begin searching for a replacement device as previously described. Once a suitable replacement has been found and the rule successfully submitted, the device then sends a multicast message stating the name of the virtual appliance it belongs to, the ID of the relevant rule, the ID of the failed device and the ID of the replacement device. All devices on the network receive this message and, if they are a member of that virtual appliance and are currently using the device that is said to have failed, they will switch to the replacement device stated in the message.

In Figure 12 "Motion Sensor 1" has detected the failure of "Timer 1" first. It searches for a suitable replacement and successfully submits a rule to "Timer 3". "Motion Sensor 1" then sends a multicast message instructing any other devices that are part of the same virtual appliance and currently using "Timer 1" to switch to "Timer 3". "Light Bulb 1" receives this message and switches to "Timer 3".

Figure 12: "Motion Sensor 1" announces it has found a replacement, and then both "Motion Sensor 1" and "Light Bulb 1" switch to the specified replacement timer

The multicast healing messages are always sent by a device when it switches to a replacement device, or back to the preferred device. This ensures that a chain of devices is never broken.
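The receiving side of a healing message can be sketched as follows. The message fields are the four items listed in the text, but the HealingMessage record, the Binding class and the wire format are all hypothetical. A real device would also submit its rule to, or subscribe to, the replacement, whereas this sketch only updates its bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;

class HealingListener {
    /** The four fields the text says a healing message carries (hypothetical type). */
    record HealingMessage(String vappName, String ruleId,
                          String failedDeviceId, String replacementDeviceId) {}

    /** One rule held by this device and the partner device it currently uses. */
    static final class Binding {
        final String vappName;
        final String ruleId;
        String partnerDeviceId;

        Binding(String vappName, String ruleId, String partnerDeviceId) {
            this.vappName = vappName;
            this.ruleId = ruleId;
            this.partnerDeviceId = partnerDeviceId;
        }
    }

    final List<Binding> bindings = new ArrayList<>();

    /** Switches every binding in the named appliance that uses the failed device. */
    int onHealingMessage(HealingMessage m) {
        int switched = 0;
        for (Binding b : bindings) {
            if (b.vappName.equals(m.vappName())
                    && b.partnerDeviceId.equals(m.failedDeviceId())) {
                b.partnerDeviceId = m.replacementDeviceId();
                switched++;
            }
        }
        return switched;
    }
}
```

Matching is done on the virtual appliance name and the failed device rather than on the rule ID, because in the Figure 12 example "Light Bulb 1" uses "Timer 1" through a different rule from the one "Motion Sensor 1" announces.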

4.6. Supporting Incompatible Devices

Not all devices will support the rule submission and location services required by devices to participate in a virtual appliance. Some small devices may not even be capable of running UPnP. In order to incorporate these devices and other already existing UPnP devices into a virtual appliance a "proxy" device will be needed to give dumb devices (e.g. iButtons [5]) a UPnP presence and to store and handle rules for devices that do have a UPnP presence but can’t store or handle rules themselves.

Figure 13: "Proxy" device allowing dumb devices to appear as UPnP devices and take part in a virtual appliance

An embedded board powerful enough to run Java and UPnP could be used to support dumb devices and this would run the rule management and association code. However, for other already existing UPnP devices that are unable to handle rules, the proxy service could be provided by a number of virtual appliance UPnP devices. In these situations the rule handling is slightly centralised but this cannot be avoided and there are still backup copies of all the rules involved.

5. Implementation

5.1. Introduction

This section covers the implementation of the self-healing and rule redundancy mechanisms for the Virtual Appliance system. This includes the implementation of a number of simulated UPnP devices, the implementation of the rule submission and association services, and the Virtual Appliance Manager. As previously stated the UPnP service discovery protocol will be used to enable the automatic discovery and control of devices on the network. The Virtual Appliance Manager will be written in two parts: a Java Servlet to manage the appliances and return XML, and the user interface which will style the XML from the servlet to create a graphical interface for the user.

All the program code will be written in Java for platform independence. The UPnP stack that is to be used is the Cyberlink UPnP Stack for Java v0.7. All coding is carried out using the freely available and open source Eclipse project and all the code is self-authored (i.e. no auto-generated or wizard-generated code has been used).

5.2. Devices

Details of the UPnP interfaces and descriptions of Java classes have been omitted from this version because of space considerations, however they are available in the original PDF version of this report.

5.2.1. Binary Light

Figure 14: UPnP Binary Light

The UPnP Binary Light is a representation of a light bulb that can be turned on and off. This device conforms to the standardised Binary Light device description document [1] and implements the standardised Switch Power [14] service as published by the UPnP Forum.

5.2.2. Switch

Figure 15: UPnP Switch

The UPnP Switch is a representation of a toggle switch having two states. When the switch in the GUI is clicked the state of the switch is toggled.

5.2.3. Timer

Figure 16: UPnP Timer

The UPnP Timer represents a simple timing device that counts down in seconds. The duration of the timer can be set and the timer stopped and started. Event messages are sent whenever the timer stops and starts. This device is both an input device and an output device as it can deliver event notifications when its state changes, and because it can also be controlled by other devices.

5.2.4. Motion Sensor

Figure 17: UPnP Motion Sensor

The UPnP Motion Sensor simulates a real life motion sensor. The sensor detects the mouse moving across the window and sets a UPnP state variable when motion is detected. After a few seconds the state variable is reset.

5.2.5. Digital Security Camera

Figure 18: UPnP Digital Security Camera

The Digital Security Camera is the first "real" UPnP device that has been created. This device uses a webcam attached to a computer and conforms to the standard Digital Security Camera specification from the UPnP Forum [3]. The device specification states that the device must implement the Digital Security Camera Settings service [4] that would typically allow things such as the camera’s position, rotation, white balance, colour saturation, etc. to be adjusted. The device may then optionally implement a motion image service and/or a still image service.

This particular implementation utilises a USB webcam accessed using the Java Media Framework. As such the camera settings service will be present for conformity but will not allow any settings to be adjusted. The still image service will also be implemented to allow display devices to show still images captured from the camera.

5.2.6. Image Display

Figure 19: UPnP Display

The UPnP Image Display device is a basic representation of a network enabled display device that is capable of displaying images, videos and even sound. This implementation is a simple simulation that displays images in a window on a computer screen when the device is given a URL to an image; real-life examples of such a device would include TV screens, computer screens, digital photo frames, etc. Virtual appliances that could incorporate this device include security appliances (consisting of motion sensors and cameras) and entertainment appliances (consisting of tuner devices, DVD players, video storage devices, etc). The display could also be used in an appliance that notifies the user of important information (house security status, weather information, breaking news headlines, emails, etc).

5.2.7. Vibration Sensor

Figure 20a: Vibration sensor

Figure 20a: Vibration sensor and receiver


Figure 20b: Vibration receiver

The UPnP vibration sensor is the second "real" UPnP device that has been created. The vibration sensors that have been used were developed at BT by the Sensor Networks research group. These sensors send a radio message when they are triggered which can then be received by a computer with a receiver board connected to the serial port. These sensors have been UPnP enabled by decoding the radio messages and providing event notifications to subscribed control points whenever the sensor is triggered. One instance of the UPnP Vibration Sensor device is required for each vibration sensor.

5.2.8. Ambient Interface

Figure 21a: The "Orb" ambient interface

Figure 21b: The "Orb" ambient interface

The final UPnP device that has been implemented is an ambient interface. Ambient devices are a way of making information pervasive and ever-present. Instead of having to go to a computer and check share prices or check Teletext for the latest weather forecast the information can be communicated unobtrusively to the user without them having to explicitly request it.

The ambient interface used here, known as "The Orb", is a device I built over a year ago. It consists of a number of high-brightness red, green and blue LEDs, a PIC microprocessor and a radio receiver for communication with a computer. The Orb can be made to change to any colour by sending it an RGB colour code along with a number stating how quickly to change from the current colour to the new one. By sending the Orb two RGB colour codes and a duration the Orb will constantly fade between the two colours until another command is received. Commands are queued up and once the duration in the command is up the next command is executed. Commands can be made to jump to the front of the queue by setting a flag in the message.
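The queueing behaviour described above can be modelled in a few lines of Java. This is only an illustration of the behaviour, not the Orb's PIC firmware or its radio protocol. The Command record and its fields are hypothetical, and the units of the duration are left unspecified because the text does not give them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OrbQueue {
    /** A colour change request; jumpQueue models the "front of the queue" flag. */
    record Command(int red, int green, int blue, int duration, boolean jumpQueue) {}

    private final Deque<Command> pending = new ArrayDeque<>();

    /** Flagged commands go to the front, all others to the back. */
    void enqueue(Command c) {
        if (c.jumpQueue()) {
            pending.addFirst(c);
        } else {
            pending.addLast(c);
        }
    }

    /** Called when the current command's duration is up; returns null if nothing is queued. */
    Command next() {
        return pending.pollFirst();
    }
}
```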

As with the vibration sensors, this device has been UPnP enabled by creating a UPnP interface that allows control points to specify a colour. This request is then sent to the Orb via a radio transmitter connected to the computer’s serial port.

5.3. Services

All of the UPnP devices described in the previous section have implemented a Location and an Association service. These services are required for devices to take part in a Virtual Appliance. The implementation and operation of these services will now be described.

5.3.1. Location Service

As described in Section 4.2 a location service is required to determine the location of devices within the environment. All of the devices here implement this location service although it is not required. For example, the location of devices such as timers which do not require user interaction is not really important. To enable control points to quickly determine the location of devices, the suggestion by Kutter et al. [28] to add two extra headers to the SSDP device announcement message has been implemented, as well as a regular UPnP service allowing standard control points to query the device for its location and mobility status.

Listing 1 shows a typical SSDP device announcement, and Listing 2 shows the modified version for the same device. Although a device only needs to re-advertise its presence just before its current lease is about to expire, the CyberLink UPnP Stack has also been modified to broadcast an announcement whenever the device changes location. This allows control points to receive a notification as soon as the device changes location without having to subscribe to the standard location service.

NOTIFY * HTTP/1.1
Server: Linux/2.6.10-callisto UPnP/1.0 CyberLink/1.7
Cache-Control: max-age=60
Location: http://10.0.0.20/description.xml
NTS: ssdp:alive
NT: urn:schemas-upnp-org:device:BinaryLight:1
USN: uuid:40cf2b43-511c-43eb-9e16-eee791b1ff33::urn:schemas-upnp-org:device:BinaryLight:1
HOST: 239.255.255.250:1900
Listing 1: Normal SSDP message

NOTIFY * HTTP/1.1
Server: Linux/2.6.10-callisto UPnP/1.0 CyberLink/1.7 (Virtual Appliances - uk_dave)
Cache-Control: max-age=60
Location: http://10.0.0.20/description.xml
NTS: ssdp:alive
Locality: Living Room
Mobility: true
NT: urn:schemas-upnp-org:device:BinaryLight:1
USN: uuid:40cf2b43-511c-43eb-9e16-eee791b1ff33::urn:schemas-upnp-org:device:BinaryLight:1
HOST: 239.255.255.250:1900
Listing 2: Modified SSDP message including location and mobility information
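A control point could read the two extra headers from Listing 2 with a simple header parser such as the sketch below. This is not the modified CyberLink code: the class and method names are hypothetical, and falling back to "DEFAULT" and "not mobile" when a header is absent (as in Listing 1) is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SsdpLocation {
    /** Parses the header lines of an SSDP NOTIFY message into a map keyed by lower-cased name. */
    static Map<String, String> parseHeaders(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = message.split("\r?\n");
        for (int i = 1; i < lines.length; i++) {   // line 0 is the "NOTIFY * HTTP/1.1" request line
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue;                          // skip blank or malformed lines
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, lines[i].substring(colon + 1).trim());
        }
        return headers;
    }

    /** Assumption: a device that sends no Locality header is treated as being in "DEFAULT". */
    static String locality(Map<String, String> headers) {
        return headers.getOrDefault("locality", "DEFAULT");
    }

    /** Assumption: a device that sends no Mobility header is treated as static. */
    static boolean isMobile(Map<String, String> headers) {
        return Boolean.parseBoolean(headers.get("mobility"));
    }
}
```

Only the first colon on each line is treated as the separator, so values that themselves contain colons, such as the Location URL and the HOST address, are preserved intact.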

Using room names for location purposes is more meaningful to the user than a set of coordinates. There are a couple of drawbacks with this solution though. Firstly, the user or system installer will have to configure the system so that it knows where each room is. This issue is not dealt with in this project. Secondly, different people may give different names to the same room. For example, different people may call the same room the "living room", the "lounge", or the "front room". A solution to this is to give the user a list of pre-defined names and then allow them to add their own if necessary. The UPnP location service defines the following list of location names as a minimum that must be recognised by the system.

DEFAULT, Living Room, Dining Room, Bedroom 1, Bedroom 2,
Bedroom 3, Bedroom 4, Bedroom 5, Kitchen, Utility Room,
Toilet 1, Toilet 2, Bathroom 1, Bathroom 2, Hall,
Landing, Porch, Conservatory, Front Garden, Back Garden

5.3.2. Rule Submission Service

All UPnP devices that are part of a Virtual Appliance will have rules submitted to them, so a service is required to accept and handle those rules. As described in the design section, the input and output devices handle the rules differently; however, both device types still need to accept the rules and parse them. As the rules are described using XML and are typically 2,500 to 3,000 bytes in size, submitting the rules using a UPnP action is not the most appropriate method of submission. However, all UPnP devices implement a basic HTTP server through which they serve their XML device and service description documents as well as handling the HTTPMU device announcements and GENA event notifications. Therefore a better approach to the event rule submission is to extend the HTTP server to handle HTTP POST requests.

Using the CyberLink UPnP stack this is simply a matter of overriding the httpRequestRecieved(HTTPRequest httpReq) method of the Device class and then checking the requested URL. If the URL matches the rule submission URL then the HTTP request is passed to the com.ukdave.vapp.rules.XMLRuleSubmissionHandler class for handling; otherwise it is passed back to the original httpRequestRecieved(HTTPRequest httpReq) method of the Device class. The default URL for rule submission is http://[deviceIP]:[port]/service/association/ruleSub. Extra parameters must be appended to this URL to indicate the required action:

  • ?action=submit
    Submit a rule to the device. The rule must be submitted using HTTP POST with the parameter
    name "rule".
  • ?action=tempsubmit
    Used by the Virtual Appliance Manager to submit a rule on a temporary basis. The rule
    must be submitted using HTTP POST with the parameter name "rule". The rule is validated
    but no further action is taken until it is committed or removed.
  • ?action=commitrule&vappname=[VAPP NAME]&ruleid=[RULE ID]
    Commits the temporary rule belonging to the specified virtual appliance and with the specified
    rule ID.
  • ?action=remove&vappname=[VAPP NAME]&ruleid=[RULE ID]
    Removes the rule belonging to the specified virtual appliance and with the specified rule ID.
  • ?action=getrules
    Returns an XML document containing all the rules held by the device.
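
The dispatch on the action parameter can be sketched as follows. This is only an illustration under stated assumptions: the RuleSubmissionRouter class, its Action enum and its methods are hypothetical and are not part of the XMLRuleSubmissionHandler class, and parameter values are left undecoded for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of routing a rule-submission request by its query string. */
final class RuleSubmissionRouter {
    enum Action { SUBMIT, TEMPSUBMIT, COMMITRULE, REMOVE, GETRULES, UNKNOWN }

    /** Splits "a=1&b=2" into a map; values are left undecoded for brevity. */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Maps the "action" parameter onto one of the five documented actions. */
    static Action actionOf(String query) {
        String action = parseQuery(query).get("action");
        if (action == null) {
            return Action.UNKNOWN;
        }
        switch (action) {
            case "submit":     return Action.SUBMIT;
            case "tempsubmit": return Action.TEMPSUBMIT;
            case "commitrule": return Action.COMMITRULE;
            case "remove":     return Action.REMOVE;
            case "getrules":   return Action.GETRULES;
            default:           return Action.UNKNOWN;
        }
    }

    /** commitrule and remove are the two actions that name a specific rule. */
    static boolean needsRuleIdentity(Action action) {
        return action == Action.COMMITRULE || action == Action.REMOVE;
    }
}
```

A request whose action is UNKNOWN would presumably be answered with an HTTP error, although the behaviour in that case is not described here.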

5.3.3. Output Device Association Manager

Rule Submission
Rules are initially submitted to output/actuator devices (e.g. lights, displays, timers, etc). Once accepted by the com.ukdave.vapp.rules.XMLRuleSubmissionHandler class the rule is then passed to the com.ukdave.vapp.associationmanager.OutputDeviceAssociationManager class for handling. The first step taken by the OutputDeviceAssociationManager is to check that the action clause of the rule applies to the device. This includes checking that the device type and service type in the action clause match those of the current device, and that all of the actions and arguments exist. If the rule is not valid then it is rejected with an error. If the rule is valid, the second step checks whether the device already holds a rule for the same virtual appliance and with the same rule ID. If so, the version number is checked and:

  • if the newly submitted rule is older than the one currently held by the device then the current copy of the rule is sent back to the device that just submitted the rule,
  • if the newly submitted rule is the same version as the one currently held by the device then no further action is taken,
  • if the newly submitted rule is newer than the one currently held by the device then the subscriptions in use by the current rule are cancelled, rules from the input devices revoked, and the new rule added.
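
The three outcomes of the version check can be summarised in a few lines. This is a sketch only: the RuleVersionCheck class is hypothetical, and it assumes that version numbers are integers that increase with each revision of a rule.

```java
/** Hypothetical sketch of the three-way version comparison described above. */
final class RuleVersionCheck {
    enum Outcome { SEND_CURRENT_BACK, IGNORE, REPLACE_CURRENT }

    /**
     * Compares a submitted rule's version with the version already held.
     * Assumption: versions are integers that increase with each revision.
     */
    static Outcome compare(int submittedVersion, int heldVersion) {
        if (submittedVersion < heldVersion) {
            // The submitter is out of date: return the held copy to it.
            return Outcome.SEND_CURRENT_BACK;
        }
        if (submittedVersion == heldVersion) {
            // Same version: nothing to do.
            return Outcome.IGNORE;
        }
        // Newer rule: cancel subscriptions, revoke from input devices, then add.
        return Outcome.REPLACE_CURRENT;
    }
}
```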

After rule validation and version checks, one last check is performed to see if the rule requires exclusive use of the device and whether or not this can be honoured, or whether another virtual appliance is currently using this device and requires exclusive use of it. The following tests are performed at this stage:

  • if another rule currently on the device requires exclusive access to the device, this device is the preferred device for that rule, and the newly submitted rule does not belong to the same virtual appliance, then the rule is rejected,
  • if there are other rules currently on the device that list this device as preferred, then regardless of whether they require exclusive access, the newly submitted rule is rejected if it requires exclusive access,
  • if there are other rules currently on the device using it as a replacement device, and the newly submitted rule also wishes to use the device as a replacement but requires exclusive access, then the rule is rejected,
  • finally, if the newly submitted rule lists this device as preferred and requires exclusive access, and the other rules on the device that belong to other virtual appliances are using this device as a backup, then the input devices listed in the current rules are requested to find replacement devices and the new rule is accepted.
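
The four tests can be read as a single decision function, sketched below. The ExclusiveUseCheck and RuleInfo classes are hypothetical and only model the three facts the tests depend on. Two assumptions are made that the text does not state explicitly: "other rules" is taken to mean rules belonging to other virtual appliances in all four tests, and any combination not covered by the tests results in the rule being accepted.

```java
import java.util.List;

/** Hypothetical sketch of the exclusive-use tests; not the project's actual classes. */
final class ExclusiveUseCheck {
    enum Decision { REJECT, ACCEPT, ACCEPT_AND_DISPLACE_BACKUPS }

    /** The three facts about a rule that the tests depend on. */
    static final class RuleInfo {
        final String vappName;
        final boolean exclusive;     // rule requires exclusive use of the device
        final boolean preferredHere; // false means the device is a replacement/backup

        RuleInfo(String vappName, boolean exclusive, boolean preferredHere) {
            this.vappName = vappName;
            this.exclusive = exclusive;
            this.preferredHere = preferredHere;
        }
    }

    static Decision decide(RuleInfo submitted, List<RuleInfo> held) {
        boolean othersPresent = false;
        for (RuleInfo other : held) {
            // Assumption: rules of the same virtual appliance never conflict.
            if (other.vappName.equals(submitted.vappName)) {
                continue;
            }
            othersPresent = true;
            // Test 1: an existing exclusive rule that prefers this device wins.
            if (other.exclusive && other.preferredHere) {
                return Decision.REJECT;
            }
            // Test 2: exclusivity cannot be granted over another preferred user.
            if (other.preferredHere && submitted.exclusive) {
                return Decision.REJECT;
            }
            // Test 3: two backups cannot coexist if the new one wants exclusivity.
            if (!other.preferredHere && !submitted.preferredHere && submitted.exclusive) {
                return Decision.REJECT;
            }
        }
        // Test 4: a preferred, exclusive rule displaces rules that are only backups.
        if (othersPresent && submitted.preferredHere && submitted.exclusive) {
            return Decision.ACCEPT_AND_DISPLACE_BACKUPS;
        }
        return Decision.ACCEPT;
    }
}
```

In the displacement case the caller would then ask the input devices of the displaced rules to find replacement devices before adding the new rule.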

By this stage the newly submitted rule has either been rejected or accepted. If accepted then the rule is added to the rule base and a search is performed for the preferred input device. If the preferred input device is present on the network then a copy of the rule is submitted and a subscription request is made. If either the preferred input device cannot be found or the rule submission or subscription request fail, then a search is performed for a replacement device of the same type in the location specified in the rule. Rule copies and subscription requests are submitted to each suitable input device until one accepts the submissions. Once subscribed the UUID of the input device is noted in the rule. If no suitable input device can be found then the system waits for a device announcement from a suitable device.

As the Output Device Association Manager implements a UPnP Control Point it receives notifications of devices that are added, removed, expired, mobile devices that change location, and event notifications from devices to which it is subscribed.

Event Notification
When an event notification is received the eventNotifyReceived(String sid, long seq, String name, String value) method is called. The only information as to which device and service sent the notification is the sid (subscription ID). This must first be matched to a device and service by iterating through all the devices and services known by the control point. Once the sending device and service have been identified they can then be matched against the condition clauses of the rules in the rulebase. Once a match is found the specific state variable and value must be matched against the rule. An instance of the RuleFiringThread class is created for each rule that matches. This class is responsible for invoking the actions specified in the matching rules on the local device. It is this behaviour that marks the major difference between the input device and output device association managers.

Device Addition
When a UPnP device is added to the network the deviceAdded(Device device) method is called. The first task is to scan through the rule base for all rules that currently lack a subscription to a suitable input device. If this newly added device is suitable then a copy of the rule and a subscription request are submitted to the device. The second task is to check for rules that are currently subscribed but not to their preferred device. If this new device is preferred then a copy of the rule and a subscription request are submitted to the device and, if successful, the rule is revoked and the subscription cancelled from the original device.

Device Removal and Expiration
When a UPnP device removes itself from the network or a device’s lease expires without being renewed, the deviceRemoved(Device device) method is called. This method searches through the rule base for any rules that were subscribed to this device and then calls the findAndSubscribeToDevice(Rule rule, String deviceUDN) method to search for a suitable replacement. In turn, this method calls the submitRuleAndSubscribe(Rule rule, Device device, Service service) method in order to submit a copy of the rule and a subscription request. If unsuccessful, findAndSubscribeToDevice continues to search for another suitable replacement. If, however, the subscription request was successful a multicast healing announcement is made using the com.ukdave.vapp.associationmanager.MulticastHealingAnnouncer class. This notifies all devices on the network that are part of the same virtual appliance, and that were using the device that just failed, to switch to the replacement device that has just been found.
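
The search for a replacement can be pictured as a loop that tries each suitable candidate until one accepts. The sketch below is an assumption-laden illustration rather than the body of findAndSubscribeToDevice: the ReplacementSearch and Candidate classes are hypothetical, suitability is reduced to matching device type and locality, and the submitRuleAndSubscribe step is represented by a predicate supplied by the caller.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of searching for a replacement device after a failure. */
final class ReplacementSearch {
    /** Minimal description of a device known to the control point. */
    static final class Candidate {
        final String udn;
        final String deviceType;
        final String locality;

        Candidate(String udn, String deviceType, String locality) {
            this.udn = udn;
            this.deviceType = deviceType;
            this.locality = locality;
        }
    }

    /**
     * Returns the UDN of the first suitable device that accepts the rule and
     * the subscription, or null if none does. A null requiredLocality means
     * the rule does not constrain the location.
     */
    static String findReplacement(List<Candidate> known, String failedUdn,
                                  String requiredType, String requiredLocality,
                                  Predicate<Candidate> submitRuleAndSubscribe) {
        for (Candidate c : known) {
            if (c.udn.equals(failedUdn)) {
                continue; // never pick the device that just failed
            }
            if (!c.deviceType.equals(requiredType)) {
                continue;
            }
            if (requiredLocality != null && !requiredLocality.equals(c.locality)) {
                continue;
            }
            if (submitRuleAndSubscribe.test(c)) {
                return c.udn; // caller would now send the healing announcement
            }
        }
        return null; // caller waits for a suitable device announcement
    }
}
```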

Device Location Change
Mobile devices are required to re-advertise themselves when their locality changes. When such an advertisement is received the deviceLocalityChanged(Device device) method is called. This method checks for rules that are currently using that device and require it to be in a specific location. If the new location is unsuitable then, as with the device removal process, a search begins for a replacement device. The method also checks for rules that currently lack a subscription to a suitable input device and if this device is now in a suitable location a copy of the rule and a subscription request are submitted to the device.

Subscription Renewal Failure
The org.cybergarage.upnp.control.SubscriptionRenewalTask class (a modification to the CyberLink UPnP Stack, see Section 5.5) automatically renews subscriptions before they are due to expire. If the renewal request fails then an attempt is made to request a new subscription. If this new request also fails then the subscriptionRenewalFailed(Device device, Service service) method of the Output Device Association Manager is called. If the subscription request is failing then it is almost certain that the device has failed and that its lease hasn’t yet expired, as there is no reason for a device to reject a subscription (it may reject a rule submission, in which case a subscription request is not made, but a subscription request shouldn’t be rejected). In this case the deviceRemoved(Device device) method is called and a search begins for a suitable replacement.

Multicast Healing Announcement Received
Multicast healing messages are sent whenever a device detects that a device has failed or is no longer suitable (i.e. changed location) and has switched to a replacement device. As described in Section 4.5.2 all devices using the failed/unsuitable device must find the same replacement device in order to keep the chain of events intact. The com.ukdave.vapp.associationmanager.MulticastHealingReceiver class listens for multicast messages and upon receiving a healing announcement calls the multicastHealingMsgReceived(String previousUDN, String replacementUDN) method of the Output Device Association Manager. This method checks for rules currently using a device identified by previousUDN and then submits a copy of the rule and a subscription request to the device identified by replacementUDN. The method will also attempt to revoke the rule from the previous device in the case that the device is still present but no longer deemed suitable for the virtual appliance.

Exclusive Use Rejection Announcement Received
As previously described, when a rule is added to a device that lists that device as preferred and requires exclusive access, and no other rule on that device lists it as preferred and requires exclusive access, requests are sent to the appropriate input devices asking them to find a replacement device. When one of these messages is received the multicastRejectionMsgReceived(String senderUDN) method is called. This method then revokes the rule from the input device and searches for a replacement device by calling the findAndSubscribeToDevice(Rule rule, String deviceUDN) method.

5.3.4. Input Device Association Manager

Rule Submission
Input devices are devices that cannot be controlled by other devices (e.g. switches, sensors) and simply provide event notifications. Some devices (e.g. timers) are both an input and an output device as they provide event notifications and can also be controlled. Input devices implement the com.ukdave.vapp.associationmanager.InputDeviceAssociationManager class and receive rule submissions from output devices. As with the output device association manager, the com.ukdave.vapp.rules.XMLRuleSubmissionHandler handles the actual submission and if the rule is marked as a backup it is passed to the addRule(Rule newRule) method of the InputDeviceAssociationManager class. This method performs the same checks as the output device association manager. It first checks the condition clause of the rule matches the local device (i.e. device type, service type, and state variables match). A version check is then performed on the rule and if the rule is completely new or a newer version of an existing one then a check is performed for exclusive use. These checks are the same as those described in the previous section. Once the rule has passed all the checks it is added to the backup rule base and the manager then waits to receive a subscription request from the output device.

Subscription Request
When a subscription request is received the subscriptionAdded(Subscriber sub) method is called. This method searches the rule base for rules that do not yet have a subscription from an output device and matches the new subscription to the appropriate rule. A problem was discovered here involving the information passed by UPnP in a subscription request. A typical subscription request is shown in Listing 3.

SUBSCRIBE /service/switch/eventSub HTTP/1.1
HOST: 10.0.0.20:4004
CALLBACK: <http://10.0.0.202:8058/eventSub>
NT: upnp:event
TIMEOUT: Second-120
Connection: close
Listing 3: Normal UPnP subscription request

The problem is that the subscription request contains no information identifying the control point other than its IP address making it impossible to match a subscription request to a rule in the rulebase. Therefore a modification was made to the CyberLink UPnP stack to include the UDN of the device making the request as well as the name of the virtual appliance and rule ID that the subscription request relates to. This modification simply adds two new headers to the request and the modifications still work with regular UPnP devices and control points. The modified subscription request is shown in Listing 4.

SUBSCRIBE /service/switch/eventSub HTTP/1.1
HOST: 10.0.0.20:4004
CALLBACK: <http://10.0.0.202:8058/eventSub>
NT: upnp:event
TIMEOUT: Second-120
USN: uuid:dcd1c340-cc99-46cb-b19b-fd00fa25c460::urn:schemas-upnp-org:device:BinaryLight:1
VAPP_RULE_USN: Light Switch::1
Connection: close
Listing 4: Modified UPnP subscription request

We can now see that, in this particular example, the subscription request came from a "BinaryLight" device with a UDN of dcd1c340-cc99-46cb-b19b-fd00fa25c460 and that the rule this subscription request relates to is rule 1 of the "Light Switch" virtual appliance.
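
Extracting the subscriber's identity from the two extra headers amounts to splitting each value at its "::" separator. The following is a minimal sketch, assuming the header formats shown in Listing 4; the SubscriberIdentity class is hypothetical and is not the code added to the CyberLink stack.

```java
/** Hypothetical sketch: identity carried by the USN and VAPP_RULE_USN headers. */
final class SubscriberIdentity {
    final String udn;        // e.g. "uuid:dcd1c340-..." (prefix kept as received)
    final String deviceType; // e.g. "urn:schemas-upnp-org:device:BinaryLight:1"
    final String vappName;   // e.g. "Light Switch"
    final String ruleId;     // kept as text; a numeric ID is not assumed

    private SubscriberIdentity(String udn, String deviceType,
                               String vappName, String ruleId) {
        this.udn = udn;
        this.deviceType = deviceType;
        this.vappName = vappName;
        this.ruleId = ruleId;
    }

    /**
     * Returns null when either header is missing or malformed, which is the
     * case for a subscription from an unmodified control point.
     */
    static SubscriberIdentity fromHeaders(String usn, String vappRuleUsn) {
        if (usn == null || vappRuleUsn == null) {
            return null;
        }
        int u = usn.indexOf("::");
        // Use the last "::" so that the appliance name itself may contain one.
        int v = vappRuleUsn.lastIndexOf("::");
        if (u < 0 || v < 0) {
            return null;
        }
        return new SubscriberIdentity(
                usn.substring(0, u), usn.substring(u + 2),
                vappRuleUsn.substring(0, v), vappRuleUsn.substring(v + 2));
    }
}
```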

Once the subscriber has been identified the method then checks whether the rule already has a subscription or not. If it doesn’t then the subscription is simply added. If there is already a subscription, then if the current subscription is to a replacement device and the subscription request is from the rule’s preferred device then the subscription is granted and the rule revoked from the previous device. Conversely, if the rule already has a subscription from a preferred device and a subscription is received from another device then the subscription is rejected and the rule is revoked from the replacement device.

Device Addition
When a UPnP device is added to the network the deviceAdded(Device device) method is called. This method scans through the rule base for any rules that list this new device as preferred but are currently subscribed to a replacement device. In this case the rule is submitted back to the preferred output device and revoked from the replacement. This method also checks for rules that currently have no active subscription and submits the rule to the device if it is suitable (i.e. correct type and location, etc).

Device Removal and Expiration
When a UPnP device removes itself from the network or a device’s lease expires without being renewed, the deviceRemoved(Device device) method is called. This method searches through the rulebase for any rules that had a subscription from that device and then calls findAndSubmitRuleToDevice(Rule rule, String deviceUDN) to find a suitable replacement. Once found this method in turn calls submitRule(Rule rule, Device device) to submit the rule to the potential replacement device. If unsuccessful then the findAndSubmitRuleToDevice method continues to search for another suitable replacement. If, however, the rule submission was successful the operation completes and the manager then waits for a subscription request from the device.

Device Location Change
As with the output device association manager, when a device changes location the deviceLocalityChanged(Device device) method is called. This method then checks all rules that have a subscription from this device and require it to be in a specific location. If the device is no longer suitable then the rule is revoked from the device and findAndSubmitRuleToDevice(Rule rule, String deviceUDN) is called to find a suitable replacement. This method also checks for rules that do not currently have an active subscription and if this device is suitable the rule is submitted to the device.

Subscription Cancellation and Timeout
On the output device, the org.cybergarage.upnp.control.SubscriptionRenewalTask class (a modification to the CyberLink UPnP Stack, see Section 5.5) automatically renews subscriptions before they are due to expire. If, however, the device has failed then its lease will expire and the subscriptionTimeout(Subscriber sub) method will be called. Similarly, if a device cancels its subscription the subscriptioncancelled(Subscriber sub) method is called. In both cases the deviceRemoved(Device device) method is called to find a suitable replacement device.

Multicast Healing Announcement Received
As previously described, whenever a device switches to use another device it broadcasts a multicast message stating the original device ID, the new device ID and which virtual appliance rule this relates to. Upon receiving one of these messages the multicastHealingMsgReceived(String previousUDN, String replacementUDN) method is called. This method checks for rules currently using a device identified by previousUDN and then submits a copy of the rule to the device identified by replacementUDN. The method will also attempt to revoke the rule from the previous device in the case that the device is still present but no longer deemed suitable for the virtual appliance.

Exclusive Use Rejection Announcement Received
The multicastRejectionMsgReceived(String senderUDN) method is called when an output device, from which one of the rules currently has an active subscription, receives a rule that requires exclusive access. This method then revokes the rule from the output device and calls the findAndSubscribeToDevice(Rule rule, String deviceUDN) method to find a suitable replacement device.

5.4. Virtual Appliance Manager

The Virtual Appliance Manager is the interface through which the user manages the virtual appliances. To manage the virtual appliances the Manager incorporates a UPnP control point that maintains a list of all active devices on the network and the rules stored on those devices. To allow for flexibility and future customisations the Virtual Appliance Manager is split into two halves: the manager itself, which communicates with the devices, and a web interface onto the manager. The manager is implemented as a Java servlet, and the web interface communicates with the manager through XML. This allows the user interface to be customised, or even new ones to be built, without having to modify the manager code.

Figure 22: Architecture of the Virtual Appliance Manager

5.4.1. Manager Servlet

The manager servlet is implemented in the Manager class. When the servlet starts up it scans the templates folder and loads all the virtual appliance templates, which are written in XML. Once loaded, the control point, implemented in the VappControlPoint class, begins searching for all UPnP devices currently on the network. As devices are found and as new devices announce their presence the deviceAdded(Device device) method is called. This method simply subscribes to the Association service of each device. Upon subscribing an initial event notification is received which is handled by the eventNotifyReceived(String sid, long seq, String name, String value) method. The only events sent by the Association service are when the number of rules in a device’s rule base changes. If the event notification is from an Association service then the sender device is determined and the downloadRulebase(Device device) method is called to retrieve a copy of all the rules stored on the device. Whenever rules are added to or removed from a device the manager servlet receives a notification, allowing it to maintain an accurate copy of all the rules stored on all the devices on the network.

As mentioned in Section 4.4 the Virtual Appliance Manager is not required for the operation of the virtual appliances; it is simply there to allow the user to manage them. The previous section described how the UPnP subscription request message had been modified to add two new headers in order to identify the subscriber so that the subscription request could be matched to a rule. In the case of the Virtual Appliance Manager, which is simply a standard UPnP control point, the subscription request headers are unmodified. The UPnP devices will still accept the subscription, but if the Virtual Appliance Manager cancels its subscription or fails to renew it in time no action will be taken.

The manager does however read the two extra headers for locality and mobility added to the SSDP advertisement messages. This means that every time a device re-advertises itself the location information is automatically updated so the manager does not have to subscribe to the Location service on each of the devices. The Cyberlink UPnP Stack automatically maintains a list of active devices on the network and updates this list as devices are added and removed or fail to renew their leases, etc.

The job of the VappControlPoint class, besides maintaining a list of active devices and rules, is to submit new rules to devices in order to create virtual appliances. As previously described, appliance templates are used to allow the user to create new virtual appliances as easily as possible. These templates contain a list of required devices and the event rules that make up the appliance. All the user is required to do is select which type of appliance they wish to create and then select the location of the individual devices if required. When a request to create a new virtual appliance is received the pageCreateAppliance(PrintWriter out, VirtualApplianceTemplate template, String newApplianceName, Hashtable deviceLocalities) method of the Manager class is called. This method fills in the appliance template with the device locations and virtual appliance name as specified by the user. The rules from the appliance are then passed to the submitRuleBase(Vector ruleBase) method of the VappControlPoint class which is responsible for actually submitting the rules to the devices.

The submitRuleBase(Vector ruleBase) method submits the event rules to the devices in the order of the chain of events, starting with the first device. This is necessary as the next device in the chain needs to know exactly which device comes before it in the chain so it can subscribe to the correct device. The findAndSubmitTempRuleToOutputDevice(Rule rule) method is called to find a suitable device as specified in the action clause of the event rule. The rule is then submitted to the device on a temporary basis, meaning that all the usual checks are performed on the rule by the device but no subscriptions are made. If the rule is rejected then the method searches for another suitable device. This method is called for each rule listed in the virtual appliance template. Once all the rules have been submitted successfully the commitTempRule(Rule rule) method is called for each rule to commit it into the device’s rule base and make the device subscribe to the specified input device. If any of the rule submissions failed or a required device was not present on the network, then the revokeTempRule(Rule rule) method is called on each successfully submitted rule to remove it from the device. If the entire operation was successful then the submitRuleBase method returns true, otherwise false is returned and an error given to the user.
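
The submit, commit and revoke steps form a simple two-phase pattern, sketched below. The TwoPhaseSubmission class and its RuleTransport interface are hypothetical stand-ins: submitTemp represents the whole of findAndSubmitTempRuleToOutputDevice (including the search for an alternative device), while commit and revoke represent commitTempRule and revokeTempRule. Stopping at the first failure, rather than attempting the remaining rules, is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the temporary-submit then commit-or-revoke sequence. */
final class TwoPhaseSubmission {
    /** Stand-in for the device-facing calls named in the text. */
    interface RuleTransport<R> {
        boolean submitTemp(R rule); // true if some suitable device accepted it
        void commit(R rule);
        void revoke(R rule);
    }

    /** Returns true only if every rule was accepted and then committed. */
    static <R> boolean submitRuleBase(List<R> rules, RuleTransport<R> transport) {
        List<R> accepted = new ArrayList<>();
        // Phase 1: submit in chain order; nothing is active yet.
        for (R rule : rules) {
            if (transport.submitTemp(rule)) {
                accepted.add(rule);
            } else {
                // Undo the temporary submissions made so far.
                for (R done : accepted) {
                    transport.revoke(done);
                }
                return false;
            }
        }
        // Phase 2: every rule was accepted, so make them all live.
        for (R rule : accepted) {
            transport.commit(rule);
        }
        return true;
    }
}
```

The benefit of this arrangement is that a virtual appliance is never left half-created: either every device holds its committed rule or none does.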

As previously mentioned, the manager servlet receives requests from the web interface, described in the next section, and returns XML documents to be styled and presented to the user. The following URLs are supported by the servlet:

  • ?page=devicelist
    Returns a list of all the UPnP devices on the network.
  • ?page=devicelist&locality=[LOCATION]
    Returns a list of all the UPnP devices in the specified location.
  • ?page=devicelist&devicetype=[DEVICE TYPE]
    Returns a list of all the UPnP devices of the specified type.
  • ?page=deviceinfo&udn=[DEVICE UDN]
    Returns detailed information on the UPnP device with the specified UDN.
  • ?page=vapplist
    Returns a list of all the virtual appliances on the network.
  • ?page=vappinfo&vappname=[VAPP NAME]
    Returns detailed information about the specified virtual appliance including rules and devices
    in use.
  • ?page=templatelist
    Returns a list of virtual appliance templates.
  • ?page=templateinfo&vapptype=[VAPP TYPE]
    Returns detailed information about the specified virtual appliance template including the
    required devices and the rules that make up the appliance.
  • ?page=createappliance&vapptype=[VAPP TYPE]&newvappname=[VAPP NAME]&[DEVICE ID]=[LOCATION]
    Creates a new virtual appliance of the specified type with the given name. Locations for devices
    that the user may specify must be given in the URL.

Examples of the XML documents returned by the servlet are available in Appendix C.
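
As an illustration of the createappliance request, the sketch below assembles the query string from a template type, an appliance name and a set of device locations. The ManagerUrls class is hypothetical, as is the device ID used in the usage note; the parameter names are those listed above. Values are URL-encoded because appliance names and locations may contain spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper for building the createappliance query string. */
final class ManagerUrls {
    static String createAppliance(String vappType, String newVappName,
                                  Map<String, String> deviceLocalities) {
        StringBuilder sb = new StringBuilder("?page=createappliance");
        sb.append("&vapptype=").append(enc(vappType));
        sb.append("&newvappname=").append(enc(newVappName));
        // One [DEVICE ID]=[LOCATION] pair per device the user has placed.
        for (Map.Entry<String, String> e : deviceLocalities.entrySet()) {
            sb.append('&').append(enc(e.getKey()))
              .append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, calling createAppliance with a type of "LightSwitch", a name of "Hall Light" and a single entry mapping the made-up device ID "switch1" to "Living Room" yields a query string in which the spaces are encoded as "+".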

5.4.2. Web Interface

The web interface has been generated using OpenLaszlo [11], an open source platform for building web applications with rich user interfaces consisting of standard GUI widgets including windows, buttons, checkboxes, drop-down lists, etc. OpenLaszlo applications are created by writing an XML file defining the layout of the various windows and components that make up the application. Events such as button clicks are handled by writing JavaScript methods. OpenLaszlo also includes support for XML datasets and running XPath queries on them. The XML data can be read from a static file or from a URL. Form data can also be collected and sent back to a script running on the web server. OpenLaszlo works by generating Macromedia Flash files from the application XML file. The application then runs completely in the user’s web browser.

Figure 23: Screenshot of the Virtual Appliance Manager web interface

Figure 23 shows a screenshot of the web interface onto the Virtual Appliance Manager. Each of the windows can be independently moved and resized within the browser window and each window is individually refreshed at regular intervals without the entire page having to be re-loaded. This provides a very rich and intuitive interface for the user to manage the virtual appliances. Some PDAs are capable of running Flash in their web browsers and, although not done here, it would be very easy to build a new OpenLaszlo application designed to fit on a PDA screen without having to touch the manager servlet code. In fact, OpenLaszlo doesn’t have to be used at all: for mobile phones a completely WAP based interface could be generated from the XML documents returned by the servlet.

5.5. Modifications Made to UPnP Stack

The UPnP stack used during the development of this project was the open source CyberLink Java UPnP Stack v0.7 [2]. During development it was found that a number of features required for this project had not been implemented in the stack. These updates will be submitted back to the project to further its development. Additionally some other modifications were made to add extra headers to some of the UPnP messages as described earlier in this section. This section describes the modifications made to the stack, and a complete change log is available in Appendix D.

Lease Renewal Frequency
The CyberLink stack implements support for the Intel Networked Media Product Requirements (NMPR). Intel’s NMPR is "a set of guidelines that references existing standards and guidelines and defines additional capabilities that may enhance the consumer experience when a PC, set top box, or other computing device is available on the home network" [7]. It is not entirely clear what modifications NMPR makes to UPnP as only members are able to view detailed information on the specifications; however, one of the modifications is reduced lease times for UPnP devices. Although the CyberLink stack provides a method to enable and disable NMPR mode, it was discovered during leasing experiments that, regardless of whether NMPR mode was enabled, device advertisements were being renewed at 25%-50% of the lease time. Therefore if a device was programmed to have a lease time of 30 minutes it would actually renew its advertisements after anywhere between 7.5 and 15 minutes. This was fixed by modifying the org.cybergarage.upnp.device.Advertiser class to renew advertisements at 90% of the periodic notification cycle when NMPR mode was not enabled.
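
The arithmetic behind these figures is shown in the short sketch below. The RenewalInterval class is hypothetical and simply scales a lease by a fraction; it is not the code of the Advertiser class.

```java
/** Hypothetical helper illustrating the renewal intervals discussed above. */
final class RenewalInterval {
    /** Returns the delay before re-advertising, as a fraction of the lease. */
    static long renewAfterSeconds(long leaseSeconds, double fraction) {
        return Math.round(leaseSeconds * fraction);
    }
}
```

For a 30 minute (1,800 second) lease, fractions of 0.25 and 0.5 give 450 and 900 seconds (7.5 and 15 minutes), whereas the modified fraction of 0.9 gives 1,620 seconds (27 minutes), so far fewer announcements are sent for the same lease.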

Lease Expiry Checks and Subscription Renewals
Another problem that was discovered when experimenting with UPnP leases was that control points didn’t always remove a device from their list of active devices until quite some time after the device’s advertisement had expired without being renewed. Upon investigation it was discovered that the CyberLink stack uses a single thread to check the expiry times on all devices at regular intervals. This means that, depending on the interval between expiry checks, a device’s advertisement could have been expired for some time before a check is made and the device removed.

A further problem was discovered relating to subscription requests. The subscribe(Service service) method of the org.cybergarage.upnp.ControlPoint class generated subscription requests with an infinite lease time. This means that subscriptions will never expire and must be cancelled by the control point. Although the subscription will be automatically cancelled when a device is removed (either through expiry or cancelling its advertisement) it is important that all failures are detected as quickly as possible to allow for recovery of a virtual appliance. To solve this, the subscribe(Service service, long timeout) method is provided, allowing a subscription length to be specified. Unfortunately it was discovered that the CyberLink stack does not automatically renew subscriptions. When no renewal is received the device automatically cancels the subscription, and the control point therefore no longer receives event notifications.

Later, another problem was discovered relating to the checking of subscription expiry. The CyberLink stack only checks for expired subscriptions when it sends an event notification. Obviously if the interval between event notifications is long then expired subscriptions will not be detected soon enough to allow for a timely recovery.

The solution to all three of these problems was to create a task scheduler class. This class accepts scheduled task objects and executes them at the appropriate time. The org.cybergarage.util.TaskScheduler class stores all ScheduledTask objects in a Vector. The task scheduler allows scheduled tasks to be added, re-scheduled, removed, and retrieved. All ScheduledTask objects contain a time at which their code should be run. As scheduled tasks are added they are stored in the Vector in order of their execution time. The main code for the task scheduler runs as a separate thread which sleeps until it is time to execute the scheduled task at the front of the list. The contents of the ScheduledTask’s run() method are executed in a separate thread to prevent a long scheduled task delaying others in the list.
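The scheduler design can be illustrated with a short sketch. This is not the org.cybergarage.util.TaskScheduler source: the TaskSchedulerSketch class, the ScheduledTask.at factory and the method names are hypothetical, an ArrayList with synchronized methods stands in for the Vector, and the re-schedule operation is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** A unit of work together with the time (ms since the epoch) at which it should run. */
interface ScheduledTask extends Runnable {
    long getExecutionTime();

    /** Hypothetical convenience factory used only in this sketch. */
    static ScheduledTask at(long executionTime, Runnable body) {
        return new ScheduledTask() {
            @Override public long getExecutionTime() { return executionTime; }
            @Override public void run() { body.run(); }
        };
    }
}

/** Sketch of a scheduler that keeps its tasks ordered by execution time. */
final class TaskSchedulerSketch implements Runnable {
    private final List<ScheduledTask> tasks = new ArrayList<>();
    private boolean running = true;

    /** Inserts the task so that the list stays sorted, earliest first. */
    synchronized void addTask(ScheduledTask task) {
        int i = 0;
        while (i < tasks.size()
                && tasks.get(i).getExecutionTime() <= task.getExecutionTime()) {
            i++;
        }
        tasks.add(i, task);
        notifyAll(); // the new task may be due before the one being waited on
    }

    synchronized boolean removeTask(ScheduledTask task) {
        return tasks.remove(task);
    }

    /** Returns a snapshot of the tasks that have not yet been started. */
    synchronized List<ScheduledTask> pendingTasks() {
        return new ArrayList<>(tasks);
    }

    synchronized void stop() {
        running = false;
        notifyAll();
    }

    @Override
    public void run() {
        while (true) {
            ScheduledTask due;
            synchronized (this) {
                // Sleep until the task at the front of the list is due.
                while (running && (tasks.isEmpty()
                        || tasks.get(0).getExecutionTime() > System.currentTimeMillis())) {
                    try {
                        if (tasks.isEmpty()) {
                            wait();
                        } else {
                            long delay = tasks.get(0).getExecutionTime()
                                    - System.currentTimeMillis();
                            wait(Math.max(1, delay)); // wait(0) would block forever
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                if (!running) {
                    return;
                }
                due = tasks.remove(0);
            }
            // Each task gets its own thread so a slow task cannot delay the others.
            new Thread(due).start();
        }
    }
}
```

Keeping the list sorted means the scheduler thread only ever needs to look at the first element to know how long it may sleep.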

The ScheduledTask is actually just an interface which is implemented by the DeviceExpiryTask, SubscriptionExpiryTask, and SubscriptionRenewalTask classes. A DeviceExpiryTask is added whenever a device advertises or re-advertises itself. The task is scheduled to run shortly after the device’s advertisement is due to expire. When run, the task adds the length of the advertisement to the time the advertisement was last received, and if the result is no later than the current time the device is removed. If in the meantime the device has re-advertised itself, its advertisement time will have been updated and so the original task will not remove the device.
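The expiry check can be sketched as follows. The AdvertisedDevice class and its method names are hypothetical and only illustrate the idea that a device is treated as expired once the time of its last advertisement plus the lease length is no later than the current time, so that a task left over from an earlier advertisement does nothing if the device has since re-advertised.

```java
/** Hypothetical record of when a device last advertised and for how long. */
final class AdvertisedDevice {
    private long lastAdvertisedMillis;
    private final int leaseSeconds;

    AdvertisedDevice(long advertisedAtMillis, int leaseSeconds) {
        this.lastAdvertisedMillis = advertisedAtMillis;
        this.leaseSeconds = leaseSeconds;
    }

    /** Called whenever a fresh advertisement arrives from the device. */
    void readvertised(long nowMillis) {
        lastAdvertisedMillis = nowMillis;
    }

    /** Time at which the current advertisement runs out. */
    long expiryMillis() {
        return lastAdvertisedMillis + leaseSeconds * 1000L;
    }

    /** The check an expiry task performs when it fires. */
    boolean isExpired(long nowMillis) {
        return expiryMillis() <= nowMillis;
    }
}
```

For a device advertised at time 0 with a 120 second lease, a task firing at 121 seconds removes the device, unless the device re-advertised at 108 seconds, in which case the lease now runs until 228 seconds and the stale task leaves the device alone.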

The SubscriptionExpiryTask works in a very similar manner to the DeviceExpiryTask and is created whenever a new subscription is requested or renewed. The task is scheduled to run shortly after the expiry time for the subscription. When run, the task adds the length of the subscription lease to the time the subscription was last requested or renewed, and if the result is no later than the current time the subscription is cancelled. As before, if the subscription has been renewed in the meantime the original task won’t cancel the subscription, as the subscription request/renewal time will have been updated. An org.cybergarage.upnp.event.SubscriptionListener interface was also added so that the device could be notified when a subscription had not been renewed. This is used to indicate failure of the control point and cause the virtual appliance to initiate recovery.

Lastly, the SubscriptionRenewalTask is added when the control point successfully requests a subscription. The task is scheduled to run shortly before the subscription is due to expire and the run method simply requests the subscription to be renewed. If the renewal request fails a new subscription is requested. If these requests are successful then a new SubscriptionRenewalTask is added. If both requests fail then the subscriptionRenewalFailed method of the OutputDeviceAssociationManager is called.
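The renew-then-resubscribe fallback can be sketched as a single decision step. The SubscriptionClient interface, the RenewalOutcome enum and the RenewalStep class are hypothetical names introduced for illustration; in the real system the caller would schedule a new SubscriptionRenewalTask on success and call subscriptionRenewalFailed on the OutputDeviceAssociationManager on failure.

```java
/** Hypothetical view of the control point operations the renewal step needs. */
interface SubscriptionClient {
    /** Attempts to renew the subscription with the given SID. */
    boolean renew(String sid);

    /** Attempts a fresh subscription; returns the new SID or null on failure. */
    String subscribe();
}

enum RenewalOutcome { RENEWED, RESUBSCRIBED, FAILED }

final class RenewalStep {
    private RenewalStep() {}

    /** Tries a renewal first and falls back to a new subscription request. */
    static RenewalOutcome attempt(SubscriptionClient client, String sid) {
        if (client.renew(sid)) {
            return RenewalOutcome.RENEWED;
        }
        if (client.subscribe() != null) {
            return RenewalOutcome.RESUBSCRIBED;
        }
        return RenewalOutcome.FAILED; // both requests failed: start recovery
    }
}
```

Only the FAILED outcome indicates that the publishing device is unreachable and that a replacement should be sought.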

Location Information
As previously described, the SSDP messages sent by devices when they announce their presence were modified to include two new headers to advertise the device’s location and mobility status. This modification involved making changes to the org.cybergarage.upnp.Device class to provide get and set methods for the locality and mobility variables. Modifications also had to be made to the org.cybergarage.upnp.ssdp.SSDPPacket, org.cybergarage.upnp.ssdp.SSDPRequest, org.cybergarage.upnp.ssdp.SSDPResponse and org.cybergarage.http.HTTP classes to add support for these two new headers. Additionally, an org.cybergarage.upnp.device.DeviceChangeListener interface was added that is used when the location of a device changes and allows the control point to be informed.

Identifying a Subscriber
Two new headers were also added to the GENA subscription request message to allow a device to identify the device and virtual appliance rule that a subscription request was related to. This modification involved making changes to the org.cybergarage.upnp.ssdp.event.SubscriptionRequest class to add get and set methods for the UDN of the device making the subscription and for the identity of the related virtual appliance rule. Changes also had to be made to the org.cybergarage.upnp.event.Subscriber and org.cybergarage.upnp.Device classes to handle the new headers. An org.cybergarage.upnp.event.SubscriptionListener interface was also added to allow devices to be notified when subscriptions were added, cancelled and expired.

HTTP Server
One final problem, though only a minor one, was found involving the HTTP server built into the CyberLink UPnP stack. The description for a UPnP device allows an icon to be specified so that control points may show an image to represent the UPnP device. This is used by some generic UPnP "browsers" and by the web interface for the Virtual Appliance Manager. Unfortunately, the HTTP server in the CyberLink stack would simply return an empty HTML file when a request was received for the icon. This was fixed by modifying the httpGetRequestReceived() method of the org.cybergarage.upnp.Device class to check if the name of the requested file ended in .jpg, .gif, or .png and then send the contents of the requested file back to the client.
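The file name check might look something like the sketch below. The IconRequests class is hypothetical and is not the code added to httpGetRequestReceived(); the case-insensitive comparison and the mapping to a content type are assumptions that go slightly beyond what is described above.

```java
import java.util.Locale;

/** Hypothetical helper for recognising icon requests; not part of CyberLink. */
final class IconRequests {
    private IconRequests() {}

    /** Returns true if the requested file name ends in .jpg, .gif or .png. */
    static boolean isImageRequest(String uri) {
        return contentTypeFor(uri) != null;
    }

    /** Returns the content type for a supported image, or null otherwise. */
    static String contentTypeFor(String uri) {
        String lower = uri.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".jpg")) {
            return "image/jpeg";
        }
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        return null;
    }
}
```

Requests that do not match would continue to be handled by the existing code path for device and service descriptions.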

6. Testing

6.1. Introduction

The system comprises a large number of individual UPnP devices, some self-healing and distribution code, a Virtual Appliance Manager servlet, and a web interface. Each of these components will be tested in isolation using pre-defined test cases. Once each component has been tested the system as a whole will then be tested using integration testing. Black box testing will be used during testing of the individual components and the overall system.

6.2. Component Testing

As described above, each of the individual components will be tested separately. First, the UPnP devices will be tested to ensure they function correctly (i.e. all of the state variables contain the correct data, and the actions perform as specified). Next, the failure detection routines will be tested to ensure the UPnP devices correctly detect device failure and do so in a timely fashion. Next, the self-healing mechanism will be tested in a variety of scenarios with a number of different virtual appliance configurations. Finally, the Virtual Appliance Manager and web interface will be tested to determine whether they can detect the UPnP devices and existing virtual appliances correctly.

6.2.1. Individual UPnP Device Testing

For these tests each UPnP device will be started up in isolation and a generic UPnP browser / control point used to test the functionality of the devices and services. Each of the device’s services will be tested to ensure the state variables contain the correct data and the actions behave correctly. Note that the location and association services will be tested later.

6.2.2. Device Failure Detection

All the UPnP devices utilise the same UPnP stack code, which is the CyberLink Java UPnP v0.7 stack with the modifications described in Section 5.5. The tests here are to confirm that the devices detect device failure both when a device announces that it is leaving the network and when a device is deliberately unplugged or forcefully killed so that it does not have a chance to notify the network that it is leaving. Subscription renewals are also checked.

These tests will be conducted by running the UPnP light bulb and the UPnP switch. A rule will be manually submitted to the light causing it to subscribe to the switch. No other UPnP devices will be present on the network during the tests. The devices will be left running on separate PCs (one running Windows and the other running Linux) connected via a 100 Mbit/s desktop switch. The UPnP lease durations are set to 120 seconds.

Test: Advertisement and Subscription Renewals
Description: A packet sniffer is left running for ten minutes to ensure that device advertisements and subscription renewals are sent approximately every 108 seconds (90% of the specified lease period). If any message is sent sooner than 100 seconds or later than 120 seconds after the previous one, the test fails.
Pass/Fail:
 
Test: Device Location Change
Description: This test is similar to the previous test. In this test both devices will be set to mobile so that their locations can be changed whilst they are running. When the location is changed the device will broadcast a new advertisement with its new location. The other device must detect the location change to pass the test. The locations of both devices will be changed several times during the test.
Pass/Fail:
Both devices successfully received all device advertisements containing the new location of the devices.
 
Test: Subscription Duration Test
Description: This test involves leaving the light bulb and switch subscribed for as long as possible. The switch will be "clicked" periodically to ensure the subscription is still active. Additionally, if a subscription renewal or device advertisement is missed then the test fails, regardless of whether the devices re-detect and subscribe to each other again.
Pass/Fail:
The devices were left subscribed and were still working with no missed advertisements or subscription renewals after 24 hours.
 
Test: Device Removal Test
Description: In this test the light bulb and switch will be running and subscribed to each other as before. Each of the devices (only one at a time) will be stopped and after a short time restarted. The devices will be stopped in such a way that they can announce their removal to the network. The remaining device should detect the removal of the other device and re-detect it when it returns. The important point here is that the devices detect the removal and return of each other; whether or not the subscriptions are restarted is not important in this test. If any device removal or initial advertisement messages are missed then the test will fail. This test will be conducted several times.
Pass/Fail:
The devices immediately detected the removal of the other device and detected immediately when the device restarted. No messages were missed.
 
Test: Device Removal Test 2
Description: This test is similar to the previous test, except that instead of stopping the devices a network lead will be removed. The devices will then be separated and they should detect that each other has failed when the advertisements are not renewed. The failure must be detected within 10 seconds of the lease expiring. Once both devices have detected the failure, the network cable will be reconnected. Both UPnP devices will still be running and sending their advertisements, so both devices should re-detect each other at the next advertisement interval. Lease durations are set at 120 seconds and so both devices must re-detect each other within 2 minutes of the network cable being reconnected. If these times are not met or the devices fail to detect the lease expiry or re-appearance of each other then the test fails. This test will be conducted several times.
Pass/Fail:
Both devices detected the lease expiration within 5 seconds of the lease actually expiring. Once the network cable was reconnected both devices redetected each other at the next announcement approximately 2 minutes later.

6.2.3. Self-Healing (Simple Virtual Appliance)

In this test the UPnP light bulb and switch will be used again with the same rule as before that causes the light to toggle when the switch is toggled. This virtual appliance consists of a single rule that is stored on the light bulb and submitted to the switch as a backup. The light bulb subscribes to the switch to receive event notifications when the switch changes state. This time there will be additional UPnP light bulbs and switches present on the network so that in the event of a failure the virtual appliance can self-heal. The virtual appliance has been configured to require a light bulb and a switch to be located in the living room. The diagram in Figure 24 shows the set up for this test.

Figure 24: Configuration for testing the self-healing ability of the simple virtual appliance

Test: Stop "Switch 1"
Description: In this test "Switch 1" will be stopped so that it can announce its removal to the network. "Light 1" must receive this notification, find "Switch 2" as a suitable replacement, submit the rule to this device and subscribe. The light switch virtual appliance must continue to function using "Switch 2" and "Light 1" and the change must start as soon as "Switch 1" is removed.
Pass/Fail:
"Light 1" detected immediately the removal of "Switch 1", successfully switched over to "Switch 2" in a matter of seconds, and the virtual appliance continued to function.
 
Test: "Switch 1" Recovers
Description: "Switch 1" will now be restarted. As the rule states that "Switch 1" is the preferred device, "Light 1" must detect the return of "Switch 1". "Light 1" must then submit a copy of the rule to "Switch 1" and re-subscribe. Upon successful subscription, "Light 1" must then revoke the rule and unsubscribe from "Switch 2".
Pass/Fail:
"Light 1" immediately detected the return of "Switch 1" and switched over from "Switch 2". Rules were correctly submitted to "Switch 1" and revoked from "Switch 2".
 
Test: Stop "Light 1"
Description: In this test "Light 1" will be stopped so that it can announce its removal to the network. "Switch 1" must receive this notification, find "Light 2" as a suitable replacement, submit the rule to this device and then receive a subscription request from "Light 2". The light switch virtual appliance must continue to function using "Switch 1" and "Light 2" and the change must start as soon as "Light 1" is removed.
Pass/Fail:
"Switch 1" detected immediately the removal of "Light 1", successfully switched over to "Light 2" in a matter of seconds, and the virtual appliance continued to function.
 
Test: "Light 1" Recovers
Description: "Light 1" will now be restarted. As the rule states that "Light 1" is the preferred device, "Switch 1" must detect the return of "Light 1". "Switch 1" must then submit a copy of the rule to "Light 1". Upon receiving a subscription request from "Light 1", "Switch 1" must then revoke the rule from "Light 2".
Pass/Fail:
"Switch 1" immediately detected the return of "Light 1" and switched over from "Light 2". Rules were correctly submitted to "Light 1" and revoked from "Light 2".

The previous four tests will now be repeated, however instead of removing the devices so that they can announce their removal to the network, they will simply be unplugged and then later reconnected. It has already been proved that devices can successfully detect lease timeouts, but now the self-healing mechanism will be tested with lease timeouts.

Test: Unplug "Switch 1"
Description: In this test "Switch 1" will be unplugged from the network. "Light 1" must detect the lease expiry for "Switch 1", find "Switch 2" as a suitable replacement, submit the rule to this device and subscribe. The light switch virtual appliance must continue to function using "Switch 2" and "Light 1" and the change over must start as soon as the lease for "Switch 1" expires.
Pass/Fail:
"Light 1" successfully detected the expiry of "Switch 1" and switched over to "Switch 2" leaving the virtual appliance to continue functioning.
 
Test: "Switch 1" Reconnected
Description: "Switch 1" will now be reconnected to the network. As the rule states that "Switch 1" is the preferred device, "Light 1" must detect that "Switch 1" has returned when it sends its next advertisement. "Light 1" must then submit a copy of the rule to "Switch 1" and re-subscribe. Upon successful subscription, "Light 1" must then revoke the rule and unsubscribe from "Switch 2".
Pass/Fail:
"Light 1" successfully detected the return of "Switch 1" and switched over from "Switch 2". Rules were correctly submitted to "Switch 1" and revoked from "Switch 2".
 
Test: Unplug "Light 1"
Description: In this test "Light 1" will be unplugged from the network. "Switch 1" must detect the lease expiry for "Light 1", find "Light 2" as a suitable replacement, submit the rule to this device and then receive a subscription request from "Light 2". The light switch virtual appliance must continue to function using "Switch 1" and "Light 2" and the change over must start as soon as the lease for "Light 1" expires.
Pass/Fail:
"Switch 1" successfully detected the expiry of "Light 1", and switched over to "Light 2" leaving the virtual appliance to continue functioning.
 
Test: "Light 1" Reconnected
Description: "Light 1" will now be reconnected to the network. As the rule states that "Light 1" is the preferred device, "Switch 1" must detect that "Light 1" has returned when it sends its next advertisement. "Switch 1" must then submit a copy of the rule to "Light 1". Upon receiving a subscription request from "Light 1", "Switch 1" must then revoke the rule from "Light 2".
Pass/Fail:
"Switch 1" successfully detected the return of "Light 1" and switched over from "Light 2". Rules were correctly submitted to "Light 1" and revoked from "Light 2".

The last set of tests will involve changing the location of the devices. The rule for the virtual appliance states that both devices must be located in the living room. All four devices are set to mobile so that their location can be changed whilst the devices are running. The locations of the devices shall then be changed. The other device should then detect the location change and immediately switch over to an alternative device.
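The choice of a replacement in these tests can be thought of as a filter on device type and location, with the preferred device taking priority whenever it is suitable. The sketch below illustrates that idea only; the Candidate and ReplacementChooser classes, their fields and the string values used for types and locations are hypothetical and do not reflect the rule format used by the implementation.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical description of an active device seen on the network. */
final class Candidate {
    final String name;
    final String type;
    final String location;

    Candidate(String name, String type, String location) {
        this.name = name;
        this.type = type;
        this.location = location;
    }
}

final class ReplacementChooser {
    private ReplacementChooser() {}

    /**
     * Picks a device of the required type in the required location,
     * returning the preferred device if it qualifies and otherwise the
     * first suitable alternative.
     */
    static Optional<Candidate> choose(List<Candidate> active, String type,
                                      String location, String preferredName) {
        Candidate fallback = null;
        for (Candidate c : active) {
            if (!c.type.equals(type) || !c.location.equals(location)) {
                continue; // wrong kind of device or wrong room
            }
            if (c.name.equals(preferredName)) {
                return Optional.of(c);
            }
            if (fallback == null) {
                fallback = c;
            }
        }
        return Optional.ofNullable(fallback);
    }
}
```

When "Switch 1" is moved out of the living room the filter no longer accepts it, so "Switch 2" is chosen; once "Switch 1" is moved back it qualifies again and, being preferred, is selected ahead of "Switch 2".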

Test: Change location of "Switch 1"
Description: The location of "Switch 1" will be changed so that it is no longer located in the living room. "Light 1" must receive a notification immediately, find "Switch 2" as a suitable replacement, submit the rule to this device and subscribe. The light switch virtual appliance must continue to function using "Switch 2" and "Light 1" and the change must start as soon as the location of "Switch 1" is changed.
Pass/Fail:
"Light 1" immediately detected the change of location of "Switch 1", successfully switched over to "Switch 2" in a matter of seconds, and the virtual appliance continued to function.
 
Test: Move "Switch 1" back to the Living Room
Description: "Switch 1" will now have its location set back to the living room. As the rule states that "Switch 1" is the preferred device, "Light 1" must detect the location change of "Switch 1" and realise that it is now suitable again. "Light 1" must then submit a copy of the rule to "Switch 1" and re-subscribe. Upon successful subscription, "Light 1" must then revoke the rule and unsubscribe from "Switch 2".
Pass/Fail:
"Light 1" immediately detected the change of location of "Switch 1" and changed over from "Switch 2". Rules were correctly submitted to "Switch 1" and revoked from "Switch 2".
 
Test: Change location of "Light 1"
Description: The location of "Light 1" will be changed so that it is no longer located in the living room. "Switch 1" must receive a notification immediately, find "Light 2" as a suitable replacement, submit the rule to this device and then receive a subscription request from "Light 2". The light switch virtual appliance must continue to function using "Switch 1" and "Light 2" and the change must start as soon as the location of "Light 1" is changed.
Pass/Fail:
"Switch 1" immediately detected the change of location of "Light 1", successfully switched over to "Light 2" in a matter of seconds, and the virtual appliance continued to function.
 
Test: Move "Light 1" back to the Living Room
Description: The location of "Light 1" will now be set back to the living room. As "Light 1" is preferred, "Switch 1" must detect the location change of "Light 1". "Switch 1" must then submit a copy of the rule to "Light 1". Upon receiving a subscription request from "Light 1", "Switch 1" must then revoke the rule from "Light 2".
Pass/Fail:
"Switch 1" immediately detected the change of location of "Light 1" and switched over from "Light 2". Rules were correctly submitted to "Light 1" and revoked from "Light 2".

6.2.4. Self-Healing (Complex Virtual Appliance)

So far the self-healing mechanism has been proven to work with a very simple virtual appliance. In this section the same set of tests will be carried out on a slightly more complex virtual appliance. For these tests a motion activated light appliance shall be used. This appliance consists of a motion sensor, timer, and a light bulb. For these tests two other timer devices will also be running on the network and it is the timers that will be removed and have their locations changed. The rules used for this virtual appliance state that a timer device must be located in the kitchen. The aim of these tests is to make sure the motion sensor and light bulb devices both find the same timer device.

Figure 25: Configuration for testing the self-healing ability of the complex virtual appliance

Test: Stop "Timer 1"
Description: In this test "Timer 1" will be stopped so that it can announce its removal to the network. In theory both "Motion Sensor 1" and "Light 1" will detect the removal immediately, however one of the devices will find an alternative and send a multicast message causing both devices to subscribe to the same timer device. To pass this test both devices must switch to the same timer device immediately upon removal of the original timer device.
Pass/Fail:
Both devices detected immediately the removal of "Timer 1" and successfully switched over to "Timer 2" in a matter of seconds leaving the virtual appliance continuing to function.
 
Test: "Timer 1" Recovers
Description: "Timer 1" will now be restarted. As the rule states that "Timer 1" is the preferred device, both "Motion Sensor 1" and "Light 1" should submit copies of their rules to "Timer 1". Upon successful subscription, both devices must revoke their rules and subscriptions from "Timer 2". To pass this test both devices must switch back to "Timer 1" and revoke their rules and subscriptions from "Timer 2" as soon as "Timer 1" reappears.
Pass/Fail:
Both "Motion Sensor 1" and "Light 1" immediately detected the return of "Timer 1" and changed over from "Timer 2" revoking their rules and subscriptions leaving the virtual appliance to function correctly as before.
 
Test: Unplug "Timer 1"
Description: In this test "Timer 1" will be unplugged from the network. "Motion Sensor 1" or "Light 1" will detect the device or subscription expiry first. This device must then find either "Timer 2" or "Timer 3" and, upon successful rule submission and subscription, must then send a multicast message instructing the other device to change over as well. Lease durations are set at 120 seconds, therefore the change over must occur within 2 minutes of "Timer 1" being unplugged and both devices must connect to the same timer for this test to succeed.
Pass/Fail:
"Light 1" detected the lease expiry first and successfully subscribed to "Timer 3". It also broadcast a message causing "Motion Sensor 1" to send a copy of its rule to "Timer 3" which in turn caused "Timer 3" to subscribe to "Motion Sensor 1" and the virtual appliance continued to function normally.
 
Test: "Timer 1" Reconnected
Description: "Timer 1" will now be reconnected to the network. "Timer 1" is the preferred device and both "Motion Sensor 1" and "Light 1" should detect the return of "Timer 1" causing them to submit their rules to it and upon successful subscription then revoke their rules and subscriptions from "Timer 3". As the lease durations are set at 120 seconds the change over must occur within 2 minutes of "Timer 1" being reconnected. All rules and subscriptions must be revoked from "Timer 3" for this test to succeed.
Pass/Fail:
Both "Motion Sensor 1" and "Light 1" detected the return of "Timer 1" and changed over from "Timer 3" revoking their rules and subscriptions leaving the virtual appliance to function correctly as before.
 
Test: Change location of "Timer 1"
Description: The location of "Timer 1" will then be changed so that it is no longer located in the kitchen. As with the device removal, in theory both "Motion Sensor 1" and "Light 1" will detect the location change immediately, however one of the devices will find an alternative and send a multicast message causing both devices to subscribe to the same timer device. To pass this test both devices must switch to the same timer device immediately upon removal of the original timer device.
Pass/Fail:
As before, both devices detected immediately the location change of "Timer 1" and successfully switched over to "Timer 2" in a matter of seconds leaving the virtual appliance continuing to function.
 
Test: Move "Timer 1" back to the Kitchen
Description: "Timer 1" will now have its location set back to the kitchen. As the rule states that "Timer 1" is the preferred device, both "Motion Sensor 1" and "Light 1" must detect the location change of "Timer 1". Both devices must then submit a copy of their rules to "Timer 1". Upon successful submission and subscription both devices must revoke their rules and subscriptions from "Timer 2".
Pass/Fail:
Both devices immediately detected the change of location of "Timer 1" and switched over from "Timer 2". Rules were correctly submitted to "Timer 1" and revoked from "Timer 2".

6.2.5. Exclusive Use

Exclusive use is used to prevent virtual appliances from interfering with each other during the self-healing process by attempting to use a device that is currently being used by another virtual appliance and that cannot be shared. For these tests both the light switch and motion activated light virtual appliances shall be running. Both these appliances require a light however the rules state that exclusive use of the light device is required. Figure 26 shows the devices that will be used for these tests.
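The acceptance decision a device makes when a rule is submitted can be sketched as follows. This is an illustration under stated assumptions, not the implemented mechanism: the ExclusiveUseDevice class, the use of an appliance identifier as a key and the boolean exclusive flag are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical device-side bookkeeping for exclusive use. */
final class ExclusiveUseDevice {
    // Appliance identifier -> whether that appliance's rule demands exclusive use.
    private final Map<String, Boolean> rules = new HashMap<>();

    /**
     * Accepts the rule unless it would clash with another appliance:
     * either an existing rule is exclusive, or the new rule asks for
     * exclusive use of a device that is already in use.
     */
    boolean submitRule(String applianceId, boolean exclusive) {
        for (Map.Entry<String, Boolean> held : rules.entrySet()) {
            if (held.getKey().equals(applianceId)) {
                continue; // re-submission by the same appliance is allowed
            }
            if (held.getValue() || exclusive) {
                return false;
            }
        }
        rules.put(applianceId, exclusive);
        return true;
    }

    /** Removes the rule so the device becomes available again. */
    void revokeRule(String applianceId) {
        rules.remove(applianceId);
    }
}
```

A rejected submission leaves the device untouched, so the submitting device can simply move on to the next candidate.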

Figure 26: Configuration for testing the exclusive use

Test: Remove "Light 1"
Description: "Light 1", which is in use by the motion activated light appliance, will be removed so that it announces its removal to the network. "Timer 1" is then responsible for finding a replacement light. If "Timer 1" tries to submit the rule to "Light 2" the rule must be rejected on the grounds of exclusive use required by the light switch appliance. "Timer 1" must then use either "Light 3" or "Light 4". This test will fail if "Light 2" accepts the rule thus causing interference between the appliances, or if "Timer 1" fails to use either "Light 3" or "Light 4".
Pass/Fail:
"Timer 1" immediately detected the removal of "Light 1" and found "Light 2" as a replacement. The rule submission was rejected by "Light 2" causing "Timer 1" to search for another light. This time it found "Light 3" which accepted the rule causing it to subscribe to "Timer 1" and allowing the motion activated light appliance to continue functioning without causing interference to the other virtual appliance.
 
Test: Remove "Light 2"
Description: "Light 2", which is in use by the light switch appliance, will now be removed so that it announces its removal to the network. "Switch 1" is then responsible for finding a replacement. Only "Light 3" and "Light 4" are now present on the network. If "Switch 1" tries to submit its rule to "Light 3" the rule must be rejected on the grounds of exclusive use required by the motion activated light appliance. "Switch 1" must then use "Light 4" as a replacement. This test will fail if "Light 3" accepts the rule thus causing interference between the appliances, or if "Switch 1" fails to use "Light 4".
Pass/Fail:
"Switch 1" did try and submit its rule to "Light 3" but it was rejected causing it to then try "Light 4" which accepted the rule and subscribed to "Switch 1". Both appliances continue to function correctly with no interference.

6.3. Integration Testing

The individual UPnP devices, failure detection, rule submission, and self-healing mechanisms have all been tested and shown to work correctly. The whole system will now be integrated and the Virtual Appliance Manager and Web Interface will be used to list the existing UPnP devices and virtual appliances on the network.

6.3.1. Test 1

The Virtual Appliance Manager and Web Interface are started first, followed by a UPnP switch and light. These two devices will use the rule for the light switch virtual appliance as used in the previous section.

Figure 27: Configuration of devices for first integration test

Figures 28a and 28b show all the devices currently on the network, and the details of the light switch virtual appliance.

Figure 28a: Web interface showing the active UPnP devices and the details of the Light Switch virtual appliance

Figure 28b: Web interface showing the active UPnP devices and the details of the Light Switch virtual appliance

The Virtual Appliance Manager and Web Interface successfully updated to show the list of active devices as well as successfully detecting the virtual appliance and showing its details.

6.3.2. Test 2

The two UPnP devices from the previous test will be shut down and the devices required for the motion triggered light virtual appliance started. The rules will still be stored on disk from the tests in the previous section.

Figure 29: Configuration of devices for second integration test

The Virtual Appliance Manager and Web Interface must update to show the current device list as well as detect the new virtual appliance. The two devices for the light switch virtual appliance have been shut down so the web interface should no longer show them.

Figure 30a: Web interface showing the active UPnP devices and the details of the Motion Triggered Light virtual appliance

Figure 30b: Web interface showing the active UPnP devices and the details of the Motion Triggered Light virtual appliance

The Virtual Appliance Manager and Web Interface successfully updated to show the list of active devices as well as successfully detecting the new virtual appliance and showing its details. Although the Virtual Appliance Manager servlet detected the removal of the old devices and the appearance of the new devices and virtual appliance immediately, the web interface only refreshed every 60 seconds and so there was a slight delay in the interface being updated.

6.3.3. Test 3 – Creating a Virtual Appliance

The final test is to use the Virtual Appliance Manager and Web Interface to create a virtual appliance. A selection of UPnP devices will be started and all pre-existing rules will be deleted from all devices.

Figure 31a: The window on the left shows the list of active UPnP devices. The window on the right shows the available virtual appliance templates.

Figure 31b: The window on the left shows the list of active UPnP devices. The window on the right shows the available virtual appliance templates.

Figure 31a shows the list of active UPnP devices on the network. In order to create a virtual appliance the user must first select which type of appliance they wish to create. A list of available templates is shown in the window in Figure 31b. For this test an "Uber Security System" will be created.

Figure 32: The second stage of creating a virtual appliance. The user must give the virtual appliance a name and select the desired locations of the devices.

The second stage in creating a virtual appliance is to give the appliance a name and select the desired locations for the devices. Note that it is not possible to select a location for the timer as the template states that the location does not matter. The user can select a specific location for a device or leave it set to "ANY".

Figure 33a: The top two windows show the virtual appliance being created. The bottom window shows the updated list of active virtual appliances which now contains the newly created virtual appliance.

Figure 33b: The top two windows show the virtual appliance being created. The bottom window shows the updated list of active virtual appliances which now contains the newly created virtual appliance.

Figure 33c: The top two windows show the virtual appliance being created. The bottom window shows the updated list of active virtual appliances which now contains the newly created virtual appliance.

After the user has selected the locations for the devices and clicked the "Next" button, the Web Interface shows a spinning animation, as in Figure 33a, while the Virtual Appliance Manager servlet submits the rules to the various devices. After the rules have been submitted, the interface is updated to show whether the creation was successful or not. In this case Figure 33b shows a smiley face indicating success. The window listing the current virtual appliances, Figure 33c, then updates to show the virtual appliance that was just created.

Figure 34: This window shows detailed information for the selected virtual appliance.

Figure 34 shows detailed information on the virtual appliance, including the device types in use along with their required locations and a list of the rules that comprise the virtual appliance.

In this test the creation of a virtual appliance was successful. If the required devices had not been present, had not been available in the specified location, or had been unavailable because of exclusive use by another virtual appliance, then the creation stage would have failed and an error would have been reported to the user.

7. Evaluation

For this project a number of both simulated and real UPnP devices have been created. UPnP, as with other discovery protocols, uses leases to control access to resources and to control the lifetime of device advertisements. These leases are used to detect the failure of devices and services and must therefore be renewed at regular intervals to prevent the device from being timed out. However, the more frequently the leases are renewed and the more devices that are present on the network, the less network bandwidth is available for other communications. Table 1 in Section 4.1 showed the typical packet sizes for lease renewals and the theoretical maximum number of devices that could be supported on a given network with a specified lease period. For example, on a 100 Mbit/s network with 10 minute lease periods the network can support several million devices. In reality there are never going to be that many UPnP devices in a single home. Running network cables to every device in the home is not very realistic, however, and it is much more likely that wireless networks will be used to connect these devices. Wireless network speeds are increasing all the time, but even on an 11 Mbit/s wireless network with lease periods of 10 minutes, half a million devices can be supported.
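
The capacity estimate above can be illustrated with a short calculation. The sketch below is not taken from the project code: the class and method names are hypothetical, and the 1,500 bytes per renewal is only a placeholder, not a value from Table 1. It assumes that the whole link is available for renewals and that every device renews exactly once per lease period, so the result is an upper bound.

```java
// Sketch: upper bound on the number of devices whose lease renewals
// fit into the capacity of a link. All names here are hypothetical.
final class LeaseCapacity {
    private LeaseCapacity() {}

    /**
     * @param linkBitsPerSecond raw link capacity in bits per second
     * @param leaseSeconds      lease period; each device renews once per period
     * @param renewalBytes      bytes one device sends per renewal (assumed value)
     * @return the largest device count whose renewals fit in one lease period
     */
    static long maxDevices(long linkBitsPerSecond, long leaseSeconds, long renewalBytes) {
        if (linkBitsPerSecond <= 0 || leaseSeconds <= 0 || renewalBytes <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Total bits the link can carry during one lease period.
        long bitsPerLeasePeriod = Math.multiplyExact(linkBitsPerSecond, leaseSeconds);
        // Each device consumes renewalBytes * 8 bits of that budget.
        return bitsPerLeasePeriod / Math.multiplyExact(renewalBytes, 8L);
    }

    public static void main(String[] args) {
        long renewalBytes = 1_500L; // placeholder, NOT the measured value from Table 1
        long tenMinutes = 600L;
        System.out.println("11 Mbit/s:  " + maxDevices(11_000_000L, tenMinutes, renewalBytes));
        System.out.println("100 Mbit/s: " + maxDevices(100_000_000L, tenMinutes, renewalBytes));
    }
}
```

With this placeholder packet size the bound is of the same order as the figures quoted above; halving the lease period halves the bound, which is the trade-off between fast failure detection and network load.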

Once it has been detected that a device which is part of a virtual appliance has failed, the virtual appliance needs to self-heal in order to continue functioning. A requirement of the system was that it be as decentralised as possible, and this has been achieved by storing the rules that make up the virtual appliance on the individual devices. A rule refers to two devices (the input device and the output device) and a copy of the rule is stored on both. When a device in the virtual appliance fails, a suitable replacement can be found and a copy of the required rule submitted to the replacement device automatically. The tests in the previous section show that this solution works well and that virtual appliances successfully self-heal when devices are removed cleanly from the network, are unplugged, or change location. Additionally, the exclusive use condition prevents virtual appliances from interfering with each other when they self-heal. The ability to specify a preferred device allows the virtual appliance to make a best effort to use that device and to switch back to it when it re-appears on the network.
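
The selection criteria described above (same device type, matching location or "ANY", exclusive use, and an optional preferred device) can be sketched as follows. This is an illustrative sketch only, not the project's implementation: `CandidateDevice`, `ReplacementSelector` and all field names are hypothetical, and the real system works on UPnP device descriptions rather than on plain Java objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified view of an active device on the network.
final class CandidateDevice {
    final String id;
    final String type;
    final String location;
    final String exclusiveOwner; // appliance holding exclusive use, or null if free

    CandidateDevice(String id, String type, String location, String exclusiveOwner) {
        this.id = id;
        this.type = type;
        this.location = location;
        this.exclusiveOwner = exclusiveOwner;
    }
}

final class ReplacementSelector {
    private ReplacementSelector() {}

    /** Picks a replacement for a failed device, favouring preferredId when it qualifies. */
    static Optional<CandidateDevice> choose(List<CandidateDevice> active, String requiredType,
            String requiredLocation, String applianceName, String preferredId) {
        CandidateDevice firstMatch = null;
        for (CandidateDevice d : active) {
            boolean typeOk = d.type.equals(requiredType);
            boolean locationOk = "ANY".equals(requiredLocation) || d.location.equals(requiredLocation);
            // A device held exclusively by another appliance must not be taken over.
            boolean free = d.exclusiveOwner == null || d.exclusiveOwner.equals(applianceName);
            if (!(typeOk && locationOk && free)) {
                continue;
            }
            if (d.id.equals(preferredId)) {
                return Optional.of(d); // the preferred device wins as soon as it is usable
            }
            if (firstMatch == null) {
                firstMatch = d;
            }
        }
        return Optional.ofNullable(firstMatch);
    }

    public static void main(String[] args) {
        List<CandidateDevice> active = Arrays.asList(
                new CandidateDevice("light1", "BinaryLight", "lounge", "Other Appliance"),
                new CandidateDevice("light2", "BinaryLight", "lounge", null),
                new CandidateDevice("light3", "BinaryLight", "kitchen", null));
        Optional<CandidateDevice> chosen =
                choose(active, "BinaryLight", "lounge", "Motion Triggered Light", null);
        System.out.println(chosen.map(d -> d.id).orElse("no suitable replacement"));
    }
}
```

Running the same selection again whenever a new device is advertised gives the "switch back" behaviour: once the preferred device re-appears and qualifies, it is returned ahead of the current stand-in.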

The Virtual Appliance Manager and Web Interface allow the user to easily see which devices are present on the network and filter them by location and type. A list of active virtual appliances is also displayed to give the user a quick view of what virtual appliances are currently active on the network. The use of virtual appliance templates allows the user to easily create a new virtual appliance with the minimum of fuss without having to worry about designing rules.

7.1. UPnP Stack Reliability

In the past the only UPnP stack available for Java was one produced by Siemens [13]. The Siemens Java UPnP stack had been used in previous unrelated personal projects, and during the development and testing of these projects numerous sporadic errors and anomalies related to the UPnP devices kept recurring. It was noticed, both by myself and others using the stack, that occasionally UPnP devices would "disappear" from control points despite the device still running, and that subscriptions would randomly time out. Other problems included control points not always detecting all active devices when performing searches and embedded UPnP devices not always advertising themselves. Most of these anomalies disappeared after version 1.01 of the Siemens Java UPnP stack was released; however, errors still occurred occasionally.

As the Siemens stack is distributed as binary only, it is not possible to check the source code to determine the cause of the errors or even attempt to fix them. At the beginning of this project it was noticed that a new open source UPnP stack called CyberLink [2] was being developed, and this is what has been used for this project. The stack is still under active development and, as it is open source, users are able to submit patches and new features. With the exception of the modifications made for this project, the CyberLink stack has been found to be extremely reliable. Not once during the entire development of this project have devices been seen to randomly "disappear", nor have control points failed to detect all active devices. In fact many virtual appliances have been left running for days without any drop-outs and continued to function correctly.

Additionally, because the CyberLink stack is open source, it would be possible to port the stack to run on some of the available Java embedded devices (e.g. SNAP boards [6]) that require code to be J2ME compliant, opening up the possibility of small embedded UPnP-enabled devices.

7.2. Project Management

The project aim, to extend the previous work on virtual appliances so that they can self-heal and distribute their configuration data with redundancy, was successfully met. All of the primary objectives were met and completed on time. This includes conducting the research into the problem domain, experimenting with UPnP leases, designing and then implementing the self-healing and redundancy mechanisms, and finally testing and evaluating the system.

The creation of the UPnP devices was completed quickly, along with the tests on UPnP lease durations. Coming up with a suitable design for the self-healing and redundancy mechanisms took slightly longer than anticipated, partly due to the lack of similar projects on which to base ideas. The implementation of the self-healing and rule redundancy mechanisms also took longer than anticipated. This was partly due to slightly underestimating the time required and also due to some of the modifications that needed to be made to the CyberLink UPnP stack (in particular the automatic lease renewals).

Despite these delays, there was still a little time left to complete some of the further objectives outlined at the beginning of the project. These included creating some "real" UPnP devices: the security camera, the vibration sensor, and the orb ambient interface. A web interface for the Virtual Appliance Manager was also implemented, which took slightly longer than anticipated due to unfamiliarity with OpenLaszlo [11], which was used to create the interface. Unfortunately there was no time left to implement the Proxy device that was designed to allow dumb devices as well as pre-existing UPnP devices to participate in a virtual appliance. There was also not enough time to implement the editing of virtual appliances through the web interface; however, all the virtual appliance rules contain version numbers and the association managers will submit updated rules to devices as they are received.
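
As an illustration of how version numbers on rules could support later editing, the sketch below shows a device-side store that accepts a submitted rule only when it is newer than the copy it already holds. This is an assumption about how the version numbers might be used, not a description of the project's association manager; the class `VersionedRuleStore` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical device-side store that keeps only the newest version of each rule.
final class VersionedRuleStore {
    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, String> bodies = new HashMap<>();

    /**
     * Stores the rule if it is unknown or newer than the stored copy.
     *
     * @return true if the rule was stored, false if it was stale or a duplicate
     */
    boolean submit(String ruleId, int version, String body) {
        Integer current = versions.get(ruleId);
        if (current != null && current >= version) {
            return false; // an equal or newer copy is already held; ignore this one
        }
        versions.put(ruleId, version);
        bodies.put(ruleId, body);
        return true;
    }

    /** Returns the stored body for the rule, or null if the rule is unknown. */
    String bodyOf(String ruleId) {
        return bodies.get(ruleId);
    }

    public static void main(String[] args) {
        VersionedRuleStore store = new VersionedRuleStore();
        System.out.println(store.submit("rule-1", 1, "motion -> light on"));
        System.out.println(store.submit("rule-1", 2, "motion -> light on for 60s"));
        System.out.println(store.submit("rule-1", 1, "motion -> light on")); // stale copy
        System.out.println(store.bodyOf("rule-1"));
    }
}
```

Because a rule is held on both its input and output device, a check of this kind would let an edited rule replace older copies wherever they are stored, while a late-arriving old copy is ignored.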

8. Conclusion

8.1. Summary of Main Achievements

The ability of virtual appliances to self-heal in the event of failure and to store rule configuration data in a decentralised manner with redundancy has been examined in this project. Research into the problem domain proved difficult as there are not many similar projects. In fact the idea of decomposing appliances and recombining them seems to be unique, especially the storing of the configuration information (the rules) in a completely decentralised manner. Most projects that focus on intelligent environments utilise a central agent to manage the devices in the environment, or a number of centralised agents to manage separate parts of the environment.

Some research was found relating to self-healing within service discovery systems, but this simply revolved around discussions of leasing systems to detect device failure. One paper described an algorithm to perform adaptive leasing in Jini [21], which led to the measurements of UPnP lease packet sizes and network utilisation in Section 4.1.

The modifications described by Kutter et al. [28] to add mobility and location information to UPnP advertisement messages, in order to enable easy searching by UPnP control points, were implemented. Further limitations in the data passed in some of the UPnP messages were also discovered, including the lack of identification of the subscriber in a subscription request message. Although UPnP control points can be completely separate entities in their own right, in the virtual appliances system all the devices incorporate a control point, and it is the identity of the device hosting the control point that needed to be known in the subscription requests.

Another deliberate design decision and achievement is the design of the Virtual Appliance Manager and Web Interface. The servlet and interface were designed separately and communicate via XML in order to allow different interfaces to be created at a later date without having to modify the underlying servlet. Interfaces could be generated for specific devices such as PDAs and mobile phones, or for people with disabilities, e.g. visual impairment.

The use of virtual appliance "templates" in aiding the user to create new virtual appliances is seen as an achievement and unique feature as no other examples of templates to manage distributed devices in this context could be found.

8.2. Extensions and Further Work

The area of intelligent environments and digital homes is relatively new and therefore there are many issues that need to be examined and addressed. Due to time constraints this project had to be constrained so as to complete it on time. As such there are a number of areas in which further work could be conducted.

One limitation of the self-healing system developed here is that the system can only self-heal by finding a replacement device of the same type as the one that failed. Restrictions are placed on the selection process by constraining the location of the device and by using exclusivity to prevent virtual appliances from interfering with one another. However, there is no way to describe the specifications for a required service so that the most suitable one can be chosen from a selection of possible services with differing specifications. For example, a virtual appliance may require a motion sensor device when none is available, yet an alternative device may be able to provide similar functionality, allowing the virtual appliance to continue to function with slightly degraded performance.

An article in a recent IEEE Pervasive Computing journal [23] describes this problem stating that "Devices won't work together unless you explicitly program each one to talk to every other type of device it might encounter". The example given is that with UPnP "a device might be programmed to communicate with peers that implement the UPnP MediaRenderer profile. These same devices, however, would have to be reprogrammed to communicate with UPnP printers or with new types of devices that appear in the future".

The problem of how to describe device capabilities, the capabilities required by an appliance, and the extent to which the performance of the appliance can be degraded while still being deemed acceptable, and of doing all this in such a way that it applies not only to existing devices but also to devices that haven't yet been designed, is not a trivial one. Indeed, the DTI Pervasive Home Environment Networks (PHEN) project is looking at just this kind of problem amongst other things.

One other issue that is a particular challenge in a distributed and pervasive environment is that of security and access control. Security was not addressed in this project but is of critical importance in such environments. It is needed to prevent rogue devices from damaging a network and to prevent hackers from disabling certain appliances in order to break in. Security is also important in protecting privacy; for example, if messages are not encrypted then someone could monitor your behaviour by seeing which sensors are triggered at what times.

Fixed wired networks provide basic security in that physical access is required; however, with the widespread adoption of wireless solutions a would-be hacker simply needs to be nearby in order to receive the messages. WEP and WPA attempt to address some of these security issues by encrypting all data sent out over the wireless network. At a higher level the UPnP Forum is working on implementing security in the UPnP protocol, and there are already standardised descriptions for implementing encryption and access control on UPnP devices and services.

References

[1]  BinaryLight:1 Device Template Version 1.01.
       http://www.upnp.org/standardizeddcps/documents/BinaryLight1.0cc.pdf
       Last accessed: Wednesday 14th September 2005.

[2]  CyberLink UPnP for Java.
       http://www.cybergarage.org/net/upnp/java/
       Last accessed: Wednesday 14th September 2005.

[3]  DigitalSecurityCamera:1 Device Template Version 1.01.
       http://www.upnp.org/standardizeddcps/documents/Digital Camera1.0 000.pdf
       Last accessed: Wednesday 14th September 2005.

[4]  DigitalSecurityCameraSettings:1 Service Template Version 1.01.
       http://www.upnp.org/standardizeddcps/documents/Security Camera Settings1.0 000.pdf
       Last accessed: Wednesday 14th September 2005.

[5]  iButton.
       http://www.maxim-ic.com/products/ibutton/
       Last accessed: Wednesday 14th September 2005.

[6]  Imsys – embedded java platforms.
       http://www.imsys.se/
       Last accessed: Wednesday 14th September 2005.

[7]  Intel developer network for the digital home.
       http://www.intel.com/technology/dhdevnet/
       Last accessed: Wednesday 14th September 2005.

[8]  Jini Network Technology.
       http://www.sun.com/software/jini/
       Last accessed: Wednesday 14th September 2005.

[9]  JS – JavaSpaces(TM) Service Specification.
       http://java.sun.com/products/jini/2.0/doc/specs/html/js-spec.html
       Last accessed: Wednesday 14th September 2005.

[10]  Lonworks.
       http://www.echelon.com/products/lonworks/default.htm/
       Last accessed: Wednesday 14th September 2005.

[11]  OpenLaszlo.
         http://www.openlaszlo.org
         Last accessed: Wednesday 14th September 2005.

[12]  Salutation Consortium.
         http://www.salutation.org/
         Last accessed: Wednesday 14th September 2005.

[13]  Siemens AG – Plug and Play Technologies.
         http://www.plug-n-play-technologies.com/
         Last accessed: Wednesday 14th September 2005.

[14]  SwitchPower:1 Service Template Version 1.01.
         http://www.upnp.org/standardizeddcps/documents/SwitchPower1.0cc.pdf
         Last accessed: Wednesday 14th September 2005.

[15]  TSpaces.
         http://www.almaden.ibm.com/cs/TSpaces/
         Last accessed: Wednesday 14th September 2005.

[16]  Understanding Universal Plug and Play.
         http://www.upnp.org/download/UPNP UnderstandingUPNP.doc
         Last accessed: Wednesday 14th September 2005.

[17]  UPnP Device Architecture 1.0.
         http://www.upnp.org/resources/documents/CleanUPnPDA101-20031202s.pdf
         Last accessed: Wednesday 14th September 2005.

[18]  UPnP Forum.
         http://www.upnp.org
         Last accessed: Wednesday 14th September 2005.

[19]  X10 – official website.
         http://www.x10.com/
         Last accessed: Wednesday 14th September 2005.

[20]  Paul Albitz and Cricket Liu. DNS and BIND. O'Reilly, 4th edition, 2001.

[21]  Kevin Bowers, Kevin Mills, and Scott Rose. Self-adaptive leasing for Jini.
         First IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), 2003.

[22]  D. Bull. Towards virtual appliances in the future digital home.
         Undergraduate degree project, Department of Computer Science, University of Essex, 2004.

[23] W. Keith Edwards, Mark W. Newman, Jana Z. Sedivy, and Trevor F. Smith. Bringing network effects to pervasive spaces.
         IEEE Pervasive Computing, 4(3):15-17, July-September 2005.

[24] Fritjof Boger Engelhardtsen and Tommy Gagnes. Using JavaSpaces to create adaptive distributed systems.
         Technical report, Agder University College, Norway, 2002.

[25] DJ. Greaves, A. Blackwell, DL. Gordon, U. Saif, A. McNeil, and SB. Suh. Autohan core services white paper one.
         White paper, Home Area Networks Group, Computer Laboratory, University of Cambridge, 2000.

[26] Arran Holmes, Hakan Duman, and Anthony Pounds-Cornish. The iDorm: Gateway to heterogeneous networking environments.
         International ITEA Workshop on Virtual Home Environments, Paderborn, Germany, February 2002.

[27] Michael Jeronimo and Jack Weast. Universal Plug and Play – the foundation of the digital home.
         Technology@Intel Magazine, June 2003.

[28] Oliver Kutter, Jens Neumann, and Thomas Schmitz. Extending Universal Plug and Play to support self-organising device ensembles.
         3rd International Conference on Pervasive Computing (PERVASIVE 2005), 2005.

[29] Anthony Pounds-Cornish and Arran Holmes. The iDorm – a practical deployment of grid technology.
       2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid2002), 2002.

[30] Antony Rowstron. Using agent wills to provide fault-tolerance in distributed shared memory systems.
         8th EUROMICRO Workshop on Parallel and Distributed Processing, pages 317-324, January 2000.

[31] M. Weiser. The computer for the 21st century. Scientific American, 1991.

Appendices

The appendices are omitted from this version because of space considerations; they are available in the original PDF version of this report.