Notes on the distribution: pfsa.tar.gz -------------------------------------- This archive contains programs to create and manipulate Probabilistic Finite State Automata (PFSA). Each program comes with a built-in usage function which can be invoked by calling it with the -h option. What follows is a brief description of the programs and how to make them. Read the file UserDoc.ps for a more detailed description of the various programs. You should find the following files in this archive: Makefile README (this file) accept.c beams.c canon.c dfa.c grafit.c genstr.c graphpro.ps groupnodes ktail.c misc.c mml.c opt.c parser.y pfsa.h printgraph ptest.c randpfsa.c seegraph simba.c skstr.c strings.c tok.l test.strings (use this to test if the programs work ok) UserDoc.ps Once you have uncompressed and untarred it into a separate directory, type "make depend" and then "make" and it should produce the following targets: genstr : generate strings using a specified pfsa canon : build a canonical pfsa from a set of input strings opt : reduce the input pfsa using one of the following algorithms simba : opt using the simulated beam annealing algorithm (Raman,1997) skstr : opt using the sk-strings algorithm (Raman&Patrick, 1995) ktail : opt using the k-tails algorithm (Biermann&Feldman, 1972) beams : opt using the standard beam search algorithm dfa : Determinise the possibly non-deterministic input machine grafit : Output the pfsa in a form suitable for Graphplace. randpfsa : Generate random pfsa under specified limits Ignore compiler warnings from the files tok.c and parser.c; these are generated by yacc and lex. You will find that skstr, ktail, beams and simba are just symbolic links to opt. If symbolic links are not possible on your system, create four copies of opt called by each of the above names. This is the way that opt decides which pfsa minimisation algorithm to run on the canonical machine. The beams program has been superceded by the simba program (15/01/97); beam search is now a special case of simulated beam annealing with an initial annealing temperature of 0 degrees. Thus, you may use "simba -t0" to achieve the same effect as "beams". beams is however, still distributed for "historical reasons". PFSA encoding and decoding: The pfsa used by the various programs below are represented in ASCII using lines of the following form: where and are state numbers, is the label on the arc between them and is the number of times that arc is traversed. Each line may optionally be terminated by a semicolon and '#' begins a comment that extends to the end of the line. The following is an example pfsa specification: # Gaines' (1976) 4 state machine # source target symbol frequency 0 1 c 4 0 3 b 3 1 1 a 9 1 2 b 7 2 0 \n 7 3 1 b 3 Either or or both may be left out. defaults to the one on the previous line and to 1. Restrictions: The PFSA must have a state numbered 0 Symbol names cannot begin with a number Input and Output strings: A "string" in the context of these pfsa programs is a series of ":" separated labels ending with a newline. For example the following are valid strings: a:b:c:d:e: hello:world: Note that labels can more than one character in width, although it is recommended that you keep them as short as possible. The file test.strings in this distribution contains a list of sample strings that can be used to test various parts of this package. What the programs do: Use canon to build a canonical PFSA from a list of strings. The canonical pfsa is just a trie of the input with frequencies on each arc reflecting how many times the arc was traversed in creating that structure. Try this: ("$" below is the shell prompt) $ canon test.strings or $ canon -v test.strings or $ canon -v