Learning network management from the bees
ANKUR GUPTA and GOPAL SUMANTH debate an alternative to traditional
network management solutions. The inspiration for the new model? Ants and bees
TRADITIONAL network management software uses a one-dimensional approach to
discover network topology and perform network management. The view of the network
is based on the centrality of the network management station that is responsible
for performing the discovery and layout operations as well as performing tasks
like fault management and event correlation. A network is, in essence, a distributed
entity, and hence a distributed approach to network management is better suited
to provide a multi-dimensional view of the network. This allows management of
the network by delegation. Also, existing network management software does not
extensively cover the five functional areas of network management as specified
by OSI, namely:
- Fault Management
- Configuration/Change Management
- Accounting/Asset Management
- Performance Management
- Security Management
The aim of this article is to formulate a distributed network management approach
that not only provides the basis for a comprehensive network management solution
(covering all the functional areas specified above), but also aims to overcome
some of the shortcomings of traditional network management software like single
point of failure, lack of scalability for large networks, and related issues
like accuracy and performance. Moreover, the amount of management information
processed threatens to exceed the cognitive capabilities of human managers.
The focus is therefore on a swarm intelligence-based approach to enable specialised
monitoring and management by delegation.
Swarm intelligence for agents
This article proposes use of the concept of societies of cooperating agents
(akin to swarms of worker bees or a colony of ants) in network management. The
model presented is based on general purpose (network monitoring, fault management)
vs special purpose (accounting, security, specific device-related management)
societies of agents and their collaborations, which accomplish the goal of managing
the network. The benefits provided by the use of swarm intelligence-based agents
in network management are many.
- Swarm intelligence as a concept lends itself well to distributed problem
solving, with many simple agents interacting with their environment and networking
with other agents to solve complex global problems. Essentially, the complexity
gets delegated and is managed better than in the centralised management approach.
- Swarm intelligence-based solutions are robust and flexible, whereas centralised
management solutions are prone to denial of service attacks and are slow to
react to changing network conditions.
- Swarm intelligence leads to better scalability and management of large
networks, a major problem with existing network management solutions, which
cannot cope very well with the growing scale of networks.
- A swarm intelligence-based approach should also score over traditional approaches
on related issues like performance and accuracy. It reduces the amount
of network traffic generated and provides current data, because the data
is collected by agents in the locality of the network entity being monitored
(or even residing on the entity), thereby avoiding delays caused by network
congestion. Moreover, the agents can process and consolidate the data they collect,
so that processing resources at the management station are freed for other
purposes.
Swarm intelligence currently finds applications in a wide range of areas, from
optimisation (travelling salesman problem, shortest route problem) to network
management to telecommunications (adaptive routing, load balancing, guaranteeing
QoS).
Architecture
The framework is inspired by the structure and organisation of a beehive, which
provides examples of management by delegation and division of labour. Individual
worker bees, drones and the queen bee with their singularity of purpose ensure
management of the beehive as a whole. The framework therefore abstracts the
network to be managed as a collection of cells, with many agents assigned specific
roles within the cell. Each cell is managed by a manager agent whose job is
to oversee the worker agents and collect data from them. This data is stored
by the manager agent in a cell-wide repository and can be transferred to the
management station on demand or can be exported to a global repository for reporting,
accounting and other purposes. The partitioning of the network into conceptual
cells is done on the basis of configurable parameters like the number of network
devices/entities to be managed in each cell. The number of agents deployed in
each cell depends on the defined scope of management of each worker agent, i.e.
the number of devices to be managed by each worker agent. However, the number
of worker agents to be deployed will be a trade-off between the desired response
time and the available bandwidth. The framework provides for the following societies
of mobile agents:
- Worker Agents: These are installable on any network entity
and are able to access management information on that entity. Also, these should
be able to generate events to signal abnormal conditions on the host entity.
These agents will enable interfacing with the traditional network management
methodologies (SNMP, CMIP agents). These are useful for fault management or
configuration management.
- Goal-oriented Agents: These are agents which perform specific tasks and
monitor management information based on goals defined in their preamble. They
could find application in threshold monitoring for network or application
parameters, ensuring service-level objectives, guaranteeing predefined quality
of service parameters, monitoring security breaches in the network, performing
critical route analysis, etc. These are useful for performance management/security
management.
- Cruising Agents: These agents will be useful for data collection and reporting.
Their main aim is to roam the network collecting predefined statistics and
gathering information about the overall health of the network, or use the
data for accounting/asset management.
- Manager Agents: These are high-level agents that will be responsible for
managing the worker agents, including their lifecycle management, deployment
strategies and processing of data reported by lower-level agents. For example,
the event correlation logic could be built into the manager agents. This would
allow only the correlated events of interest to reach the management station,
which is freed from having to perform the correlation itself.
- Discovery Agents: These agents are responsible for discovering the network
topology, which is later used for agent deployment.
- Special Purpose Agents: These agents are used for customised monitoring,
reporting or for triggering actions based on the occurrence of special conditions.
For example, agents to perform critical route analysis for a network entity,
or for generating an alert when the CPU usage of a device exceeds a certain
threshold, or for performing accounting for network usage.
At the heart of the architecture is the agent factory (Fig. 1), which is
responsible for producing and deploying agents as per requirements. It generates
all the agent types described earlier. The main task of the agent factory
is to define the preamble of the agents, which includes setting their objectives
and defining their roles and responsibilities. Some of the parameters involved
in defining the DNA of the agents are:
- Role (worker, manager, cruiser, etc.)
- Number of devices/entities to be managed (number of agents to be managed
for manager agents)
- Lifespan (till completion of assigned task, discretion of manager agent,
predefined time, etc.)
- Migration strategy (based on completion of assigned task, predefined path
to be traversed, discretion of manager agent, static agent, etc.)
- Job scope (collect statistics, monitor specific devices/parameters, ensure
adherence to service level agreements, perform computations/accounting on
collected data, calculate critical route between entities, fault reporting,
watchdog for security breaches, etc.)
- Deployment strategy (which cell a particular agent belongs to, which manager
agent a worker agent reports to, redeployment of agents based on changes
in the cell, etc.).
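The partitioning rule and the preamble parameters listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code: the names `Preamble`, `partition_into_cells` and `build_preambles`, the field names and the string values used for lifespan, migration and job scope are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional


@dataclass
class Preamble:
    """Hypothetical agent preamble; fields mirror the parameters listed above."""
    role: str                         # "worker", "manager", "cruiser", ...
    managed_count: int                # devices (or agents, for a manager) managed
    lifespan: str                     # e.g. "predefined_time", "manager_discretion"
    migration: str                    # e.g. "static", "predefined_path"
    job_scope: List[str]              # e.g. ["monitor_devices", "fault_reporting"]
    cell_id: int                      # deployment: cell the agent belongs to
    reports_to: Optional[str] = None  # deployment: manager agent id, if any


def partition_into_cells(devices: List[str], max_per_cell: int) -> List[List[str]]:
    """Split the device list into cells of at most max_per_cell devices."""
    if max_per_cell < 1:
        raise ValueError("max_per_cell must be at least 1")
    return [devices[i:i + max_per_cell]
            for i in range(0, len(devices), max_per_cell)]


def build_preambles(devices: List[str], max_per_cell: int,
                    devices_per_worker: int) -> List[Preamble]:
    """Produce one manager preamble plus the worker preambles for each cell."""
    if devices_per_worker < 1:
        raise ValueError("devices_per_worker must be at least 1")
    preambles = []
    for cell_id, cell in enumerate(partition_into_cells(devices, max_per_cell)):
        manager_id = f"manager-{cell_id}"
        # Worker count follows from the scope assigned to each worker agent.
        n_workers = ceil(len(cell) / devices_per_worker)
        preambles.append(Preamble("manager", n_workers, "predefined_time",
                                  "static",
                                  ["manage_workers", "process_reports"],
                                  cell_id))
        for start in range(0, len(cell), devices_per_worker):
            scope = cell[start:start + devices_per_worker]
            preambles.append(Preamble("worker", len(scope),
                                      "manager_discretion", "static",
                                      ["monitor_devices", "fault_reporting"],
                                      cell_id, reports_to=manager_id))
    return preambles
```

With ten devices, at most four devices per cell and three devices per worker, this gives three cells and eight preambles (three managers and five workers). A smaller per-worker scope means more agents and faster response, at the cost of more bandwidth, which is the trade-off noted earlier.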
Once the agents are created and deployed hierarchically (Fig. 2), information
about them is stored in the agent repository. This information is used to locate
the agents and communicate with them or to change their preamble on the fly.
Manager agents in charge of different cells communicate periodically with manager
agents of adjacent cells. If the manager agent of an adjacent cell goes
down, a neighbouring manager agent can take over that cell temporarily and notify
the agent factory of this condition. The agent factory can then deploy another
manager agent for the affected cell.
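One way to picture the takeover of a failed neighbour is a simple heartbeat timeout. The sketch below is an assumption about how this could work, not a description of the prototype: the class `CellManager`, its methods, the timeout rule and the `notify_factory` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class CellManager:
    """Hypothetical manager agent that watches the managers of adjacent cells."""
    cell_id: int
    neighbours: List[int]
    started_at: float = 0.0
    last_seen: Dict[int, float] = field(default_factory=dict)
    covering: Set[int] = field(default_factory=set)  # cells taken over for now

    def record_heartbeat(self, neighbour_id: int, now: float) -> None:
        """Note that a neighbouring manager (or its replacement) is alive."""
        self.last_seen[neighbour_id] = now
        self.covering.discard(neighbour_id)  # hand the cell back

    def check_neighbours(self, now: float, timeout: float,
                         notify_factory: Callable[[int, int], None]) -> List[int]:
        """Take over any silent neighbour and tell the agent factory once."""
        newly_covered = []
        for n in self.neighbours:
            last = self.last_seen.get(n, self.started_at)
            if now - last > timeout and n not in self.covering:
                self.covering.add(n)
                notify_factory(n, self.cell_id)  # factory can deploy a new manager
                newly_covered.append(n)
        return newly_covered
```

The timeout is itself a trade-off: a short value detects failures quickly but requires more frequent messages between managers, while a long value saves bandwidth at the cost of slower recovery.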
Since each cell has its own data repository, management stations can access
this data on demand. Abnormal conditions are communicated to the management
stations as they occur, and this in turn generates a pull request for data on
a particular cell to be transferred to the management station. Management stations
need not maintain the topology and other information for the whole network,
thus freeing their resources for reporting and presentation. For the complete
picture of the network, the cell-wide data can be exported to a global repository
periodically.
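The event-then-pull interaction described above might look like the following sketch. It is a simplified illustration under assumptions: the classes `CellManagerAgent` and `ManagementStation`, the single CPU-usage metric and the threshold rule are hypothetical, and a real deployment would carry far richer management data.

```python
from typing import Callable, Dict, List, Tuple


class CellManagerAgent:
    """Hypothetical manager agent that keeps a cell-wide repository."""

    def __init__(self, cell_id: int, cpu_threshold: float,
                 notify: Callable[[int, str], None]) -> None:
        self.cell_id = cell_id
        self.cpu_threshold = cpu_threshold
        self.notify = notify                    # called on abnormal conditions
        self.repository: Dict[str, float] = {}  # device -> latest CPU usage (%)

    def report(self, device: str, cpu_percent: float) -> None:
        """Store a worker agent's reading; raise an event if it is abnormal."""
        self.repository[device] = cpu_percent
        if cpu_percent > self.cpu_threshold:
            self.notify(self.cell_id, device)

    def export(self) -> Dict[str, float]:
        """Return a copy of the cell-wide data for a station or global store."""
        return dict(self.repository)


class ManagementStation:
    """Hypothetical station that holds data only for cells it has pulled."""

    def __init__(self) -> None:
        self.managers: Dict[int, CellManagerAgent] = {}
        self.views: Dict[int, Dict[str, float]] = {}
        self.alerts: List[Tuple[int, str]] = []

    def register(self, manager: CellManagerAgent) -> None:
        self.managers[manager.cell_id] = manager

    def on_abnormal(self, cell_id: int, device: str) -> None:
        """An event arrived: record it and pull that cell's data on demand."""
        self.alerts.append((cell_id, device))
        self.views[cell_id] = self.managers[cell_id].export()
```

In this sketch the station never polls: it holds no data for a cell until that cell's manager reports an abnormal condition, which is what frees station resources for reporting and presentation.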
Next
We have presented a swarm intelligence-based framework for distributed network
management, which we believe will reduce the shortcomings of traditional network
management solutions. Currently, a prototype is under implementation. Future
work will involve:
- Implementing the complete framework and carrying out comparative tests
with traditional network management software (like performance analysis and
scalability testing).
- Extending the framework for managing different types of networks (ATM networks,
proprietary networks), and exploring the application of this framework to
management of resources in different domains.
The authors are with Hewlett Packard ESG-India at Bangalore.
They may be contacted at ankur_gupta@hp.com and sumanth.gopal@hp.com