	The Lexical Analyzer for the C Language Processor
	=================================================

	The lexical analyzer is build as a subroutine serving as an
	interface between the to be compiled source program and the
	language processor. It reads the source program one character
	at a time  and constructs tokens where a token represents a
	sequence of characters  as a single logical entity.
	Identifiers, keywords, constants operators and delimeters are
	recognized tokens. Whitespace like blanks, tabs and newline
	separates tokens but is not returned to the caller. Comment
	is syntactical verified but in addition not returned to the
	caller.

	The following tokens can be recognized:

		identifier
		keyword
		integer constant
		long integer constant
		hexadecimal constant
		long hexadecimal constant
		octal constant
		long octal constant
		floating point constant
		character constant
		string constant
		operator

	The following grammar describes the tokens which can be
	recognized by CC_lex:
	
	  ANYCHAR  := any allowed character and C escape sequences 
		    starting with a '\'
 	  LETTER   := 'A' to 'Z', 'a' to 'z' and '_'
	  DIGIT    := '0' to '9'
	  OCTAL    := '0' to '7'
	  HEX      := DIGIT, 'A' to 'F' and 'a' to 'f'

	  name     := LETTER [ LETTER | DIGIT ]...
	  integer  := DIGIT [ DIGIT ]...
	  longint  := integer { 'l' | 'L' }
	  hexint   := '0' { 'x' | 'X' } HEX [ HEX ]...
	  longhex  := hexint { 'l' | 'L' }
	  octalint := '0' [ OCTAL ]...
	  longoct  := octalint { 'l' | 'L' }
	  exponent := { 'e' | 'E' } [ '+' | '-' ] integer
	  float    := integer '.' [ exponent ] 
	              | '.' integer [ exponent ]
	              | integer exponent
	  string   := '"' [ ANYCHAR ]...  '"'
	  char     := "'" ANYCHAR "'"

	The lexical analyzer is constructed out of three major pieces:

		an input routine
		a state machine
		and several action routines

	The input function ccgetc
	-------------------------

	The input function returns a character at a time. If a
	newline character was detected by the lexical analyzer CC_lex
	the input function will read a new record and will increase
	the line counter. A character counter ( a relative character
	position from begin of the source file ) is also maintained.
	A file pointer pointing to an open stream file has to be
	passed to ccgetc as the global variable input_fileptr_G. This
	file gets read by the native C function call fgets.

	The following values are returned by ccgetc:

		as function return value
		   . next input character 
		as global variable
		   . curr_line_G - current line number
		as compile global vaiable
		   . chp - current character position in input buffer
		   . newline     - is set to false after a new line
		                   has been read

	The lexical analyzer CC_lex
	___________________________

	The lexical analyzer CC_lex is implemented as a finite state
	automata. A transition table is used. Each row of the
	transition table reflects the state of a token while each
	column represents an admissible input character. A character
	table is used to translate the input character into the
	column index of the transition table.

	Format of the character table

	The character table has 128 entries, one entry for each
	possible input character. Each entry is a short integer,
	where each byte has a special meaning. The low byte holds the
	index into the state table. The high byte presents four
	different uses.
	
	     1) It contains the token type if the incoming character
	        is a single character operator. A delimeter is
	        considered as an operator. 

	     2) If the incoming  character could lead to a possible
	        multi character operator like +=, ++ or >>= etc. , it
	        contains an index into a "multi character operator"
	        table. In this case the value of the high byte is less
	        than 12.
		
	     3) The high byte is zero for 
		alpha character and digits.

	     4) In case of a character which is not allowed in the
		C character set the high byte contains the value 127

	The format of the state table

	This table is a two dimensional array as mentioned above.
	Each field of the array is of type short and holds as value 
	the new state of the token. If the state is larger then 255
	it is and end state.In this case the high byte describes the
	action which has to be taken while the low byte holds a
	positon action. The following position action are
	implemented:

	position action =
		  0 - don't adjust position
		  1 - backup one character
		  2 - skip one character
		  3 - an lexical error was detected, backup one
		      character

	The following actions are implemented:

	action = 
		  1 - handle a name token
		  2 - convert an integer token
		  3 - convert a long integer token
		  4 - convert a floating point token
		  5 - convert a hex token
		  6 - convert a long hex token
		  7 - convert an octal token
		  8 - convert an long octal token
		  9 - handle an operator token

	Action routines

	The action routines are conversion function for integer, hex,
	octal and floating point constants. They return the token
	type and a token value which is the converted constant value.
	The name action routine checks each name for a possible
	reserved word. For this reason it calls the function
	lookup_reswd which returns an index into the reserved word
	table or zero if the name was not a keyword. In case of a
	keyword the type of the keyword is returned by the name
	action routine. In case of a name token the type name is 
	returned. 

	The action routine operator checks if it is a single
	character operator. In this case the high byte of the
	character table entry is larger than 11 and represents the
	type of the operator token which will be returned by the
	operator action routine. If it is a possible multi character
	operator the multi character operator table is used to
	determine which operator it is. The high byte of the
	character table entry is the index into the multi character
	operator table. The format of the multi character operator
	table is as follows:
	
		It is a two dimensional table where each field
		contains  the type of the operator each column
		presents the character with which the operator
		begins. The table has three rows. The first row holds
		the type for the single character operator. The
		single row holds the type for assignment operator
		like +=, !=, etc. . The third row holds the type for
		duplicate character operators like ++, == etc.

	The operator action routine determines which row to take by
	looking at the next character in the input buffer. If it is
	an equal sign the type of the second row will be returned. If
	the next character is the same as the current character the
	type of the third row will be returned otherwise it is a
	single character operator and the type of the first row will
	be returned.

	Input variables to  CC_lex
	
		input_fileptr_G - file pointer passed to ccgetc

	Output variables

		as function return value
			token type
		as global variables
			toktyp_G    - token type
			tokval_G    - constant value in case of token 
				      being a constant
			tokpos_G    - begin character position of token
				      in source file
			toklen_G    - length of token
			tokptr_G    - pointer to token character string
			curr_line_G - current line number


	The lookup_reswd function

	This function searches in a keyword table if the current
	token is a keyword. The first character of the token is used
	to index into the keyword table to keywords starting with the
	same character. Keywords with the same starting character are
	linked to each other. Therefore only the linked keywords need
	to be compared for a match.
                                                                                                                                                                                                                                                                              