Jemma - Jason's Enhanced Machine Maintenance Architecture


------------------------------------------------------------------------------
SourceForge Submission Text
------------------------------------------------------------------------------
Jemma is a collection of tools to assist with administration of a
heterogeneous network of computers. In particular it is focused on
providing a consistent set of tools for maintaining a wide range of
Unix flavors.

This is accomplished by separating as many of its operations as possible
from the host operating system. In particular it uses its own version
of other third party Open Source tools to remove its dependency on the
host tools.

For example it will use its own package manager (eg RPM) to manage
its own component software, a standalone database system (eg sqlite)
for persistent data storage and perl for most coding.

This system is of particular use in Facilities Management environments
where the monitoring and management tools need to be kept separate from
the client's systems and tools. It will be perfectly normal to have both
a client version of perl and a management version of perl simultaneously
installed and at differing versions.
------------------------------------------------------------------------------
The Concepts

The main idea is to have a set of tools you use to maintain a set of
computers. These computers are traditionally of various Unix flavors.
The problem, as all System Administrators soon recognise, is that each
Vendor of a Unix operating system provides its useful functionality
through a different mechanism.

Very few commercial operating systems use anything but their own
custom package management system. Each vendor also has its own
performance monitoring tools. Many tools are available from the
Internet that work on a range of Unix flavors and these are used
extensively in Jemma.

The idea is to divorce the monitoring tools from what you are monitoring.
It is very difficult to use Email as your message passing system if the
email server you send your messages through is out of order. It is also
difficult to report that the version of perl on the host is out-of-date
if your tools rely on a later version of perl.

Many of the issues are exacerbated when the system being maintained is
not owned by the maintainer. This is the regular situation in Facilities
Management environments. You cannot force a client to upgrade the
version of perl on their host just because your monitoring tools
need a later version (and vice versa).

For this reason (and others) Jemma relies on its own collection of tools
which are as independent as possible from the operating system. There are
some practical limits to this (hard to divorce from the kernel and libc
libraries), though most tools for a fully functional monitoring system
are ready and available through the great work of Linux and FreeBSD.

The main difficulty is that the tools need to be set up to work under
a non-traditional directory structure. In Jemma all the tools are
set up to run under a single directory (obviously /jemma) but of course
they can be easily moved to any other location. We strongly recommend
that you DON'T use already used locations such as: /, /usr, /opt and
/usr/local to name a few.

Of course there is an increase of disk space required as many commonly
used tools are installed more than once - though this is a small price
to pay for increased flexibility and resilience.

------------------------------------------------------------------------------
The Major Components of Jemma

Jemma is broken into distinct sub-sections:
  1) jemdb - Jemma change control and configuration database
  2) jevenger - Jemma event messenger
  3) jemini - Jemma software distribution system

Jemdb is the change control and configuration database that controls
the whole system. The database is a schema-less and type-less database
that controls all aspects of the Jemma system. The database is also
replicated in part to the managed hosts so that hosts can operate
independently from the controller host (more on this later).

Jevenger is the event messenger sub-system. It is used to send messages
from one application to another, potentially on different hosts. Messages
mainly consist of alerts and problem notifications, but also include
statistical data collection and software maintenance.

Finally jemini is the software distribution system. Jemini is used
to update and maintain all the software components required for the
Jemma system to function and co-operate.

------------------------------------------------------------------------------
Initial Implementation

Initially Jemma will be implemented for the Linux and Solaris operating
systems (as they are what I have easy access to). In a commercial
setting I have had a prototype running on: Solaris, Linux, HP-UX,
OSF, Reliant, Dynix, IRIX, FreeBSD, SCO and Cygwin systems.

Core tools that are used:
  perl - most code written in perl
  rpm - for software management
  postgresql/sqlite - for database
  soap - for message passing
  apache - web server
  ??? - for GUI tools
  gnuplot - statistics plotting

------------------------------------------------------------------------------
Radical Random Ideas

1) jemdb is a schema-less and type-less database. In traditional
   Entity-Relationship terms it consists of a single table with a
   many-to-many relationship to itself.

   When you normalise this you end up with two tables.

   The first table (data) consists of five fields:
     oid - unique object identifier
     key - a key 'word'
     value - arbitrary text (possibly binary)
     start - a date-time when the object was created
     finish - a date-time when the object was deleted

   The second table (relate) is simply:
     parent - oid of object in 'data' table
     child - oid of object in 'data' table
     start - a date-time when the relationship was created
     finish - a date-time when the relationship was deleted

   In graph terminology each item in the data table is a node and
   each item in the relate table is a directed arc from one node
   to another. This gives you a network data model.

   It may not be obvious but extremely complex data structures can
   be represented in this schema and full time-travel is possible too.
   For example you can do queries on the database at any arbitrary
   point in time (eg list of all hosts we were managing on 10 September
   2002).
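
   The two tables and the time-travel query described above can be
   sketched as follows. This is only an illustration using Python's
   bundled sqlite3 module: the column types, the NULL-means-still-current
   convention for 'finish', the children_at helper and the sample hosts
   are all assumptions, not part of Jemma.

```python
import sqlite3

# Hypothetical DDL for the 'data' and 'relate' tables. Column types are
# nominal only, since jemdb is described as type-less.
SCHEMA = """
CREATE TABLE data (
    oid    INTEGER PRIMARY KEY,
    "key"  TEXT NOT NULL,
    value  BLOB,
    start  TEXT NOT NULL,
    finish TEXT              -- assumption: NULL while the object exists
);
CREATE TABLE relate (
    parent INTEGER NOT NULL REFERENCES data(oid),
    child  INTEGER NOT NULL REFERENCES data(oid),
    start  TEXT NOT NULL,
    finish TEXT              -- assumption: NULL while the arc exists
);
"""

def children_at(conn, parent_oid, when):
    """Return (key, value) of every child of parent_oid that existed at 'when'."""
    sql = """
        SELECT d."key", d.value
          FROM relate r
          JOIN data d ON d.oid = r.child
         WHERE r.parent = ?
           AND r.start <= ? AND (r.finish IS NULL OR r.finish > ?)
           AND d.start <= ? AND (d.finish IS NULL OR d.finish > ?)
         ORDER BY d.value
    """
    return conn.execute(sql, (parent_oid, when, when, when, when)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data: a ROOT node (oid=1) and three hosts.
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", [
    (1, "root", "ROOT",  "2002-01-01 00:00:00", None),
    (2, "host", "alpha", "2002-03-01 00:00:00", None),
    (3, "host", "beta",  "2002-06-01 00:00:00", "2002-09-01 00:00:00"),
    (4, "host", "gamma", "2002-10-01 00:00:00", None),
])
conn.executemany("INSERT INTO relate VALUES (?, ?, ?, ?)", [
    (1, 2, "2002-03-01 00:00:00", None),
    (1, 3, "2002-06-01 00:00:00", "2002-09-01 00:00:00"),
    (1, 4, "2002-10-01 00:00:00", None),
])

# Hosts managed on 10 September 2002 and, for comparison, on 1 July 2002.
hosts_sept = children_at(conn, 1, "2002-09-10 00:00:00")
hosts_july = children_at(conn, 1, "2002-07-01 00:00:00")
```

   Storing date-times as ISO-8601 text lets plain string comparison
   order them correctly, which keeps the point-in-time filter to a pair
   of comparisons per table.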

   There is a single ROOT node in the database - a special rule is needed
   to find this one (ie oid=1). All other information is related from
   this single node. Each node in the graph will generally be accessible
   from many different paths. This allows high-level concepts (eg a host)
   to be related by many different organisational structures. A host
   can be 'found' by:
     knowing its hostname
     knowing what its operating system is
     knowing what department and/or unit manages the host
     knowing what application it runs
     knowing what service it provides
     any combination of the above (and more)
   And conversely once you have the host's node you can find all the
   above information trivially as they are all connected nodes.
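
   As a rough sketch of finding a host along several paths, and of
   walking back from the host to its organisational nodes, assume the
   organisational nodes (operating system, department) are direct
   parents of the host node. The node keys, the sample data and the
   find_hosts / describe_host helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (oid INTEGER PRIMARY KEY, "key" TEXT, value TEXT,
                   start TEXT, finish TEXT);
CREATE TABLE relate (parent INTEGER, child INTEGER, start TEXT, finish TEXT);
""")

T0 = "2002-01-01 00:00:00"
# Made-up nodes and arcs; finish is left NULL, meaning "still current".
nodes = [(1, "root", "ROOT"), (2, "os", "linux"), (3, "os", "solaris"),
         (4, "department", "finance"),
         (5, "host", "alpha"), (6, "host", "beta")]
arcs = [(1, 2), (1, 3), (1, 4),   # ROOT -> organisational nodes
        (2, 5), (4, 5),           # alpha: linux, finance
        (3, 6), (4, 6)]           # beta: solaris, finance
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, NULL)",
                 [(o, k, v, T0) for o, k, v in nodes])
conn.executemany("INSERT INTO relate VALUES (?, ?, ?, NULL)",
                 [(p, c, T0) for p, c in arcs])

def find_hosts(conn, **criteria):
    """Names of hosts that are children of every (key, value) node given."""
    result = None
    for key, value in criteria.items():
        rows = conn.execute("""
            SELECT h.value
              FROM data p
              JOIN relate r ON r.parent = p.oid AND r.finish IS NULL
              JOIN data h ON h.oid = r.child AND h."key" = 'host'
             WHERE p."key" = ? AND p.value = ?
               AND p.finish IS NULL AND h.finish IS NULL
        """, (key, value)).fetchall()
        names = {row[0] for row in rows}
        # Combining criteria is a set intersection of the matches.
        result = names if result is None else result & names
    return sorted(result or [])

def describe_host(conn, hostname):
    """Follow the arcs backwards: every node that points at the host."""
    rows = conn.execute("""
        SELECT p."key", p.value
          FROM data h
          JOIN relate r ON r.child = h.oid AND r.finish IS NULL
          JOIN data p ON p.oid = r.parent
         WHERE h."key" = 'host' AND h.value = ?
    """, (hostname,)).fetchall()
    return dict(rows)

finance_hosts = find_hosts(conn, department="finance")
linux_finance = find_hosts(conn, os="linux", department="finance")
alpha_info = describe_host(conn, "alpha")
```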

   By using the organisational information around a node it is
   possible for jemini to target particular bits of software to only
   those hosts that require it. For example jemini will only send the
   oracle monitoring tools to hosts that run oracle. If a host does
   not run oracle then there is no need to install or run the tool.
   On a similar note it will use the host's operating system to ensure
   the correct pre-compiled binaries are distributed to the host.
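
   The targeting rule above could look something like the sketch below.
   Plain dictionaries stand in for what would be read from jemdb; the
   host records, the package catalogue, the package names and the
   packages_for helper are all invented for illustration.

```python
# Hypothetical host records, as they might be assembled from jemdb.
HOSTS = {
    "alpha": {"os": "linux", "applications": {"oracle", "apache"}},
    "beta": {"os": "solaris", "applications": {"apache"}},
}

# Hypothetical package catalogue: which application a tool is for
# (None means every host needs it) and which pre-compiled builds exist.
PACKAGES = {
    "jemma-core": {"requires": None, "builds": {"linux", "solaris"}},
    "oracle-monitor": {"requires": "oracle", "builds": {"linux", "solaris"}},
    "apache-monitor": {"requires": "apache", "builds": {"linux"}},
}

def packages_for(hostname):
    """Return the builds to distribute to one host, sorted by package name."""
    host = HOSTS[hostname]
    selected = []
    for name, pkg in sorted(PACKAGES.items()):
        needed = (pkg["requires"] is None
                  or pkg["requires"] in host["applications"])
        # Only send a package if a binary exists for the host's OS.
        if needed and host["os"] in pkg["builds"]:
            selected.append(f"{name}-{host['os']}")
    return selected

alpha_packages = packages_for("alpha")
beta_packages = packages_for("beta")
```

   Here beta runs apache but receives no apache monitor because no
   solaris build is listed. A real system would probably want to raise
   an event for that case rather than skip it silently.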

   Configuration of jemma software which runs on the remote host
   also uses the jemdb to dictate its operation (yet to determine
   how this is done - previously done through 'building' configuration
   files and distributing those - may be better to distribute sub-set
   of actual database instead).

2) jevenger breaks events down into a simple jemdb-style database.
   Each jevenger event only has one required attribute - its unique
   identifier. Extra data is then related to this event-id and based
   upon the data contents the event is routed to the relevant application.
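
   A minimal sketch of this idea: an event is just a unique identifier
   with optional attributes hung off it, and routing inspects those
   attributes. The attribute names, the routing table and the
   destination names are assumptions made for the example.

```python
import uuid

def new_event(**attributes):
    """Create an event; only the unique identifier is required."""
    return {"id": str(uuid.uuid4()), "attributes": dict(attributes)}

# Hypothetical routing table: the first rule whose attributes all match wins.
ROUTES = [
    ({"type": "alert", "severity": "critical"}, "pager"),
    ({"type": "alert"}, "operator-console"),
    ({"type": "statistic"}, "stats-collector"),
]
DEFAULT_ROUTE = "unrouted-log"

def route(event):
    """Pick a destination application from the event's data contents."""
    attrs = event["attributes"]
    for wanted, destination in ROUTES:
        if all(attrs.get(k) == v for k, v in wanted.items()):
            return destination
    return DEFAULT_ROUTE

critical = new_event(type="alert", severity="critical", host="alpha")
sample = new_event(type="statistic", metric="load", value="0.42")
bare = new_event()  # valid: an identifier and nothing else
```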

3) Stick all software in /jemma directory and divorce OS dependency
   as much as possible.

   Take most of the useful tools from Linux and by using perl CPAN
   modules you can abstract out other OS dependencies.

   For example use things like gzip, perl, text-utils, sh-utils and
   openssh from OpenSource systems - all trivial to compile into
   a different prefix. In most cases these tools are better and
   more functional than commercial vendor supplied tools (though
   not always!).

   Nigh impossible to remove dependency on kernel and libc, but with
   things like User-mode Linux kernels this may even be possible (though
   probably not practical).

4) For each 'operation' need to determine a super-set of parameters
   for each operating system that we support.

   Determine which of these parameters are essential, which are optional
   and which we will simply ignore (ie new functionality not required).

------------------------------------------------------------------------------
Some Case Studies

1) disk usage
   perl and statvfs calls to collect data.
   Set of rules to raise events based on level.
   Check block, i-node and fragmentation where possible

2) kernel parameters
   collect kernel params and tune as required
   In particular SYSV shared memory, semaphores and message queues.

3) boot scripts
   determine what services start at boot
   turn on/off general services

4) volume managers
   determine physical and logical volume layouts
   veritas, disk-suite, lvm, metadisk et al
   pay attention to redundancy levels and health

5) account management
   create/remove/modify users and groups
   support many naming services

6) patch/software installation
   install vendor supplied software and/or patches

7) performance statistics
   collect os stats, eg df, sar, vxstat

8) application monitoring and availability
   make sure apps work
   database servers, web servers
   smtp, ntp, inetd, nfs, etc

9) application configuration
   configure applications
   oracle, apache, sendmail, postfix, whatever

10) security checks
    checks to ensure exe's are right - md5sum
    find new suid etc

11) hardware inventory
    determine all hardware - in particular model numbers (if possible)
    allow manual override if unable to determine electronically
    disks, memory, cpu, cdrom, tapes
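
The disk usage case study (item 1 above) can be sketched with a statvfs
call and a small rule table. Python's os.statvfs is used here purely for
illustration; the thresholds, level names and event layout are
assumptions, and fragmentation is left out because statvfs does not
report it.

```python
import os

# Hypothetical rules: percent used at or above a threshold raises that level.
LEVELS = [(95.0, "critical"), (85.0, "warning")]

def percent_used(total, free):
    """Percentage of a resource in use; 0.0 if the filesystem reports none."""
    return 0.0 if total == 0 else 100.0 * (total - free) / total

def level_for(percent):
    """Return the first matching level name, or None if below all thresholds."""
    for threshold, level in LEVELS:
        if percent >= threshold:
            return level
    return None

def check_filesystem(path):
    """Collect block and i-node usage for path and return any events raised."""
    st = os.statvfs(path)  # available on Unix-like systems only
    usage = {
        "block": percent_used(st.f_blocks, st.f_bfree),
        "inode": percent_used(st.f_files, st.f_ffree),
    }
    events = []
    for resource, pct in usage.items():
        level = level_for(pct)
        if level is not None:
            events.append({"path": path, "resource": resource,
                           "level": level, "percent": round(pct, 1)})
    return events

root_events = check_filesystem("/")
```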