Commit 768fd8d0 authored by Ronald Charles Moore

added new directory interpret++

parent b41e1b3a
# This Makefile made available to his students by
# Prof. Ronald Moore
# https://fbi.h-da.de/personen/ronald-moore/
# mailto:ronald.moore@h-da.de
# with no warranties whatsoever
PROGS := interpret++
SOURCES := main.cpp lexer.cpp parser.cpp
OBJS = $(SOURCES:.cpp=.o)
# Uncomment only one of the next two lines (choose your c++ compiler)
# CC=g++
CC := clang++
## Add your own CFLAGS if you find them necessary... such as -O3 or so...
## -g for debugging
## -std=<whatever> to select the right C++ Version
## -fmessage-length=0 disallows line wrapping in error messages
## (helps some IDEs (still?))
CPPFLAGS := -g -std=c++17 -Wall -fmessage-length=0
## More preliminaries
# See https://www.gnu.org/software/make/manual/html_node/Special-Targets.html
# In this makefile, we want to keep going even if we find errors
.IGNORE :
# Tell make that the following "targets" are "phony"
# Cf. https://www.gnu.org/software/make/manual/html_node/Phony-Targets.html#Phony-Targets
.PHONY : all clean tests
# This absolutely needs to be the first target (so that it is the default target)
all: $(PROGS)
# Some of the "Automatic Variables" that can be used in Makefiles.
# Cf. https://www.gnu.org/software/make/manual/ - particularly
# https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html#Automatic-Variables
# $@ = The filename representing the target.
# $< = The filename of the first prerequisite.
# $(*F) = The stem of the filename of the target (i.e. without .o, .cpp...)
# $^ = The names of all the prerequisites, with spaces between them.
## Following magic is used to figure out which dot cpp files depend
# on which headers (dot h files) -- automatically (so that we recompile
# only the necessary dot cpp files when a header is modified).
# Magic taken from
# http://make.mad-scientist.net/papers/advanced-auto-dependency-generation/
# ... and then fixed, and fixed, and fixed some more.
DEPDIR := .deps
# Broken: DEPFLAGS = -MT $@ -MMD -MP -MF $(DEPDIR)/$*.d
DEPFLAGS = -MMD -MF $(DEPDIR)/$*.d
# include dep files, if they exist (minus says don't complain if they don't)
DEPS := $(OBJS:%.o=$(DEPDIR)/%.d)
-include $(DEPS)
%.o : %.cpp %.d
	$(CC) -c $(CPPFLAGS) $(DEPFLAGS) -o $@ $<
# Make depdir if it doesn't exist...
$(DEPDIR): ; @mkdir -p $@
## Now, the targets -- the things that will get made!
$(PROGS): $(OBJS)
	$(CC) $(CPPFLAGS) $(OBJS) $(LIBS) -o $@
clean:
	$(RM) -v *~ *.o $(PROGS) tmp.txt
	$(RM) -fvr $(DEPDIR)
# starting recursive clean...
	cd tests && $(MAKE) clean
tests: $(PROGS)
# Going to the tests directory for testing
	cd tests && $(MAKE) tests
Overview
========
Everything here is taken from the slides for the "Compiler Construction"
course, i.e. it is directly from Prof. Ronald C. Moore.
Or at least, after all this time, I really don't remember having stolen
this code from somewhere else, but if you find an older copy that looks
similar, please let me know, so I can give credit where credit is due --
or disavow knowledge of that source (as the case may be).
You are in the following subdirectory:
**`interpret++`**
This code takes mathematical expressions and evaluates them,
i.e. it outputs numbers. It is the same as its sister folder `interpreter`
as far as recursive descent parsing goes, but uses more C++ features and
is set up so as to support growing up to be a larger project.
See **Chapter 1 Front End Construction**, Slides 21 and 22 (and please let me know when the inevitable day comes that these slide numbers are no longer correct).
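To make the idea of recursive descent evaluation concrete, here is a minimal sketch for the expression grammar used in this course code (E → T E´, T → F T´, F → ( E ) | num). It is an illustration under assumptions, not the code in this directory: the names `Cursor`, `parseE`, `parseT`, `parseF` and `evaluate` are hypothetical, the input is a string rather than a stream, the tail-recursive E´ and T´ rules are written as loops, and error reporting is left out entirely.

```cpp
#include <cassert>
#include <cctype>   // for isspace
#include <cstdlib>  // for strtod
#include <string>

// Hypothetical helper (not part of this repository):
// a cursor over a NUL-terminated expression string.
struct Cursor {
    const char *pos;
    // Skip white space and return the next visible char ('\0' at the end).
    char peek() {
        while ( std::isspace( static_cast<unsigned char>( *pos ) ) ) ++pos;
        return *pos;
    }
};

double parseE( Cursor &c ); // forward declaration, F refers back to E

// F -> ( E ) | num
double parseF( Cursor &c ) {
    if ( '(' == c.peek() ) {
        ++c.pos;                          // eat (
        double inner = parseE( c );
        if ( ')' == c.peek() ) ++c.pos;   // eat ) (a missing one is tolerated)
        return inner;
    }
    char *end = nullptr;
    double value = std::strtod( c.pos, &end );
    assert( nullptr != end );  // strtod always sets end
    c.pos = end;               // unchanged if no number was found
    return value;              // zero if no number was found
}

// T -> F T'   with   T' -> * F T' | / F T' | epsilon   (as a loop)
double parseT( Cursor &c ) {
    double acc = parseF( c );
    for (;;) {
        char op = c.peek();
        if ( '*' == op )      { ++c.pos; acc *= parseF( c ); }
        else if ( '/' == op ) { ++c.pos; acc /= parseF( c ); } // no zero check
        else return acc;
    }
}

// E -> T E'   with   E' -> + T E' | - T E' | epsilon   (as a loop)
double parseE( Cursor &c ) {
    double acc = parseT( c );
    for (;;) {
        char op = c.peek();
        if ( '+' == op )      { ++c.pos; acc += parseT( c ); }
        else if ( '-' == op ) { ++c.pos; acc -= parseT( c ); }
        else return acc;
    }
}

// Evaluate one expression held in a string.
double evaluate( const std::string &text ) {
    Cursor c{ text.c_str() };
    return parseE( c );
}
```

Calling `evaluate( "(1 + 2) * 3" )` walks E, T and F once for each level of nesting. Writing the primed rules as loops keeps subtraction and division left-associative without needing a separate accumulator parameter.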
Building and Running
====================
This version requires a C++17 compatible C++ compiler,
such as newer versions of `g++` or `clang++`.
Build the program by running `make`.
In case of doubt, use the voodoo command `make clean` and then repeat `make`.
To test the program, run `make tests`. This also illustrates how the interpreter is used.
Alternatively, just run the program with *no* parameters (i.e. simply `./interpret++`).
It will tell you how it wants to be run (Hint: there are two usages).
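One convention visible in `lexer.cpp` is that a file name consisting of a single dash (`-`) selects standard input instead of a file. Whether that corresponds to one of the two usages hinted at above is an assumption. The sketch below shows that selection in isolation. The function name `selectInput` is hypothetical, and unlike the real code it reports failure by returning `nullptr` rather than exiting.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (not part of this repository).
// Returns the stream to read from: std::cin for "-", otherwise the
// given file stream once opened, or nullptr if opening failed.
std::istream *selectInput( const std::string &name, std::ifstream &file ) {
    assert( ! name.empty() );   // caller should check that
    if ( "-" == name )
        return &std::cin;       // a dash means standard input
    file.open( name );
    return file.is_open() ? &file : nullptr;
}
```

A caller would keep the `std::ifstream` alive for as long as the returned pointer is in use, which is why the stream is passed in by reference rather than created inside the function.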
Contents (Manifest)
====================
You should find here:
* `lexer.cpp` and `lexer.h`
Source code for the lexer.
* `parser.cpp` and `parser.h`
Source code for the recursive descent parser.
* `main.cpp`
Source code for `main` -- the driver.
* `Makefile`
Used to run `make` (obviously?).
* `tests`
A directory full of test cases. Run `make tests`
(either in directory `tests` or the parent directory)
to run the tests. Not unit tests, rather acceptance tests,
but regression tests all the same.
* `README.md`
This file.
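As a small illustration of the token representation declared in `lexer.h` (a `std::pair` of a tag and a `std::variant` value), the following sketch builds tokens and inspects them. It is a sketch under assumptions: the namespace `sketch` and the functions `numberOf` and `describe` are hypothetical, and only a subset of the tags is repeated here.

```cpp
#include <cassert>
#include <string>
#include <utility>  // for std::pair
#include <variant>  // C++17

namespace sketch { // hypothetical namespace, to avoid clashing with lex::

typedef double numberType;
enum TokenTag { tok_number = 'n', tok_plus = '+', bad_tok = 'X' }; // subset
typedef std::variant< char, numberType > TokenValue;
typedef std::pair< TokenTag, TokenValue > Token;

// Extract the numeric value; only meaningful for number tokens.
numberType numberOf( const Token &tok ) {
    assert( tok_number == tok.first );
    return std::get< numberType >( tok.second );
}

// Render a token as text, depending on which alternative the variant holds.
std::string describe( const Token &tok ) {
    if ( std::holds_alternative< numberType >( tok.second ) )
        return "number " + std::to_string( numberOf( tok ) );
    return std::string( "char '" ) + std::get< char >( tok.second ) + "'";
}

} // end namespace sketch
```

For example, `sketch::Token( sketch::tok_plus, '+' )` stores the `char` alternative, while `sketch::Token( sketch::tok_number, 2.5 )` stores the `numberType` alternative. `std::get` throws `std::bad_variant_access` if the wrong alternative is requested, which is one way a variant is safer than a plain union.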
Ronald Moore
https://fbi.h-da.de/personen/ronald-moore/
ronald.moore@h-da.de
1 May 2020
// This code made available to his students by
// Prof. Ronald Moore
// https://fbi.h-da.de/personen/ronald-moore/
// mailto:ronald.moore@h-da.de
// with no warranties whatsoever!
#include <cassert>
#include <cctype> // for isspace
#include <cstdlib> // for strtod
#include <iostream>
#include <fstream>
#include <string>
// include <vector>
// ===================
// LEXICAL ANALYSIS
// The following are taken to be tokens:
// left and right parenthesis, the plus and minus characters,
// as well as asterisk and forward slash -- and numbers.
// In the script, subtraction and division are not supported,
// but it seems like time to add them.
// Preliminaries and Utilities
// ============================
// Utility Types
typedef double numberType; // feel free to change this to something else like int or float or bigint....
typedef enum Token {
tok_number = 'n',
tok_lparen = '(',
tok_rparen = ')',
tok_plus = '+',
tok_minus = '-',
tok_times = '*',
tok_div = '/',
tok_eof = 'E',
bad_tok = 'X'
} Token;
// global variables -- sue me if you don't like that!
static std::istream *input = &(std::cin); // until proven otherwise
static std::string currentLine( "" );
static int currentLineNumber = -1;
static int currentColumnNumber = 0;
static int currentTokenLength = 0;
static Token next_token; // again with the global variables...
static numberType currentNumber; // = zero....
static void printErrorMsg( const std::string Error )
{
std::cout << "ERROR on line " << currentLineNumber
<< ", column " << currentColumnNumber << " : "
<< Error << std::endl;
std::cout << currentLine << std::endl; // getline strips the newline, so add one before the marker line
for ( int col = 0; col < currentColumnNumber-1; col++ )
std::cout << '-';
std::cout << '^' << std::endl;
} // end printErrorMsg
// The Lexer
// ==========
static bool skippedWhiteSpace( ) { // return true if not at EOF, i.e. if skipped
while ( true ) {
int currentLineLength = currentLine.length();
while ( currentColumnNumber < currentLineLength )
if ( isspace( currentLine[ currentColumnNumber ] ) )
currentColumnNumber++;
else // if NOT isspace()
return true;
// if we're here, we're at the end of a line.
std::getline( *input, currentLine );
currentLineNumber++;
currentColumnNumber = 0;
if ( ! *input ) // EOF!!
return false;
// else, repeat!
// Which is the same as
// return skippedWhiteSpace() -- i.e. tail recursion.
};
};
static Token gettok( ) {
assert( input ); // we assume nullptr != input
if ( ! *input ) return bad_tok;
// else, we can read from input
// Skip white space, going to next line as necessary
if ( ! skippedWhiteSpace( ) ) return tok_eof;
// We have visible text in front of us.
char currentChar = currentLine[ currentColumnNumber ];
currentColumnNumber++; // usually, but see num...
switch ( currentChar ) {
case '(' : return tok_lparen;
case ')' : return tok_rparen;
case '+' : return tok_plus;
case '-' : return tok_minus;
case '*' : return tok_times;
case '/' : return tok_div;
default :
// either we have a number in front of us, or we don't
assert( 0 < currentColumnNumber );
char *alpha = &(currentLine[ currentColumnNumber-1 ]);
// minus one because we incremented it before the switch
char *omega = nullptr; // until we call strtod...
double tmpValue = strtod( alpha, &omega );
if ( alpha == omega ) {
return bad_tok; // !!!
};
// else if strtod found a real number (or at least a double)
currentNumber = tmpValue; // let C++ do the conversion
currentColumnNumber += (omega - alpha) -1;
// minus one because we incremented it before the switch
return tok_number;
}; // end switch
assert( false ); // we should never get here!
return bad_tok;
} // end gettok
// PARSING!!!
// ===========
//
// The grammar we are going to parse here is:
// Grammar:
// E → T E´
// E´ → + T E´ | - T E´ | ε
// T → F T´
// T´ → * F T´ | / F T´ | ε
// F → ( E ) | num
// Note that the recursive descent function for (e.g.) E´
// is named "E2ndHalf".
// Forward Declarations
static numberType E();
static numberType E2ndHalf();
static numberType T();
static numberType T2ndHalf();
static numberType F();
// E → T E´
numberType E() { return T() + E2ndHalf(); }
// T → F T´
numberType T() { return F() * T2ndHalf(); }
// E´ → + T E´ | - T E´ | ε
numberType E2ndHalf() {
switch ( next_token ) {
case tok_plus :
next_token = gettok(); // eat +
return T() + E2ndHalf();
case tok_minus :
next_token = gettok(); // eat -
return (-1.0 * T()) + E2ndHalf();
default :
return 0.0;
};
} // end E2ndHalf
// T´ → * F T´ | / F T´ | ε
numberType T2ndHalf() {
numberType tmp; // holds the divisor so we can check it for zero
switch ( next_token ) {
case tok_times :
next_token = gettok(); // eat *
return F() * T2ndHalf();
case tok_div :
next_token = gettok(); // eat /
tmp = F();
if ( 0.0 != tmp )
return (1.0/tmp) * T2ndHalf();
// else if F() returned zero
printErrorMsg( "Division by zero!" );
// fall through to default return one
default :
return 1.0;
};
} // end T2ndHalf
// F → ( E ) | num
numberType F() {
numberType result = 0;
switch ( next_token ) {
case tok_lparen :
next_token = gettok(); // eat lparen
result = E();
if ( tok_rparen == next_token ) {
next_token = gettok(); // eat rparen
return result;
};
// else if rparen not found
printErrorMsg( "Expected Right Parenthesis" );
return 0.0;
case tok_number :
result = currentNumber; // side-effect of last gettok()
next_token = gettok(); // eat number
return result;
default :
printErrorMsg( "Expected Left Parenthesis or number" );
return 0.0;
};
}
// main (!)
// =========
int main( int argc, char **argv ) {
if (2 != argc) {
std::cerr << "Usage: " << argv[0] << " <fileName>.\n"
<< "You provided " << argc-1 << " arguments, we take exactly one (only).\n";
return( -1 );
};
// else if 2 == argc ....
std::string fileName( argv[1] );
if ( "-" != fileName ) {
static std::ifstream ifs( fileName );
input = &ifs;
}
// Prime the pump!
next_token = gettok( );
// evaluate one expression after another until end of file...
while ( tok_eof != next_token ) {
std::cout << currentLineNumber << ":"
<< currentLine << std::endl;
numberType value = E( );
std::cout << "INTERPRETER: " << value << std::endl;
};
std::cout << "End Of File!" << std::endl;
return 0; // All clear!!!
}
// This code made available to his students by
// Prof. Ronald Moore
// https://fbi.h-da.de/personen/ronald-moore/
// mailto:ronald.moore@h-da.de
// with no warranties whatsoever!
#include "lexer.h"
#include <cassert>
#include <cctype> // for isspace
#include <cstdlib> // for strtod
#include <iostream>
#include <fstream>
namespace lex { // continue to define things in lex::
// instantiate here...
lex::Token next_token; // again with the global variables...
// global variables -- sue me if you don't like that!
static std::string inputSourceName( "standard input" );
static std::istream *input = &(std::cin); // until proven otherwise
static std::string currentLine( "" );
static int currentLineNumber = -1;
static int currentColumnNumber = 0;
// Namespace "member functions"
// ============================
void printInputLocation( ) {
std::cout << inputSourceName
<< " (" << currentLineNumber
<< ',' << currentColumnNumber
<< "):" << std::endl;
std::cout << currentLine << std::endl;
for ( int col = 0; col < currentColumnNumber; col++ )
std::cout << '-';
std::cout << '^' << std::endl;
} // end printInputLocation
void printErrorMsg( const std::string Error )
{
printInputLocation( );
std::cout << "ERROR : " << Error << std::endl;
advance_token( ); // Don't want to get stuck here.
} // end printErrorMsg
// Utility skippedWhiteSpace...
static bool skippedWhiteSpace( ) { // return true if not at EOF, i.e. if skipped
while ( true ) {
int currentLineLength = currentLine.length();
while ( currentColumnNumber < currentLineLength )
if ( isspace( currentLine[ currentColumnNumber ] ) )
currentColumnNumber++;
else // if NOT isspace()
return true;
// if we're here, we're at the end of a line.
std::getline( *input, currentLine );
currentLineNumber++;
currentColumnNumber = 0;
if ( ! *input ) // EOF!!
return false;
// else, repeat!
// Which is the same as
// return skippedWhiteSpace() -- i.e. tail recursion.
};
};
Token gettok( ) {
assert( input ); // we assume nullptr != input
Token result( bad_tok, '\0' ); // default
if ( ! *input ) return result; // i.e. bad_tok
// else, we can read from input
// Skip white space, going to next line as necessary
if ( ! skippedWhiteSpace( ) ) {
result.first = lex::tok_eof;
return result;
};
// else -- not eof, i.e. we have visible text in front of us.
char currentChar = currentLine[ currentColumnNumber ];
result.second = currentChar; // unless it's a number, etc.
currentColumnNumber++; // usually, but see num...
switch ( currentChar ) {
case '(' : result.first = tok_lparen;
break;
case ')' : result.first = tok_rparen;
break;
case '+' : result.first = tok_plus;
break;
case '-' : result.first = tok_minus;
break;
case '*' : result.first = tok_times;
break;
case '/' : result.first = tok_div;
break;
default :
// either we have a number in front of us, or we don't
assert( 0 < currentColumnNumber ); // remember, incremented!
char *alpha = &(currentLine[ currentColumnNumber-1 ]);
// minus one because we incremented it before the switch
char *omega = nullptr; // until we call strtod...
double tmpValue = strtod( alpha, &omega );
// strtod sets omega to the first char after the number
if ( alpha == omega ) {
result.second = *omega; // or *alpha, they're the same...
return result; // i.e. bad_tok !!!
};
// else if strtod found a real number (or at least a double)
result.first = tok_number;
result.second = tmpValue; // let C++ do any conversions
currentColumnNumber += (omega - alpha) -1;
// minus one because we incremented it before the switch
}; // end switch
return result;
} // end gettok
void openInputSource( std::string filename ) {
assert( ! filename.empty() ); // caller should check that
inputSourceName = filename;
if ( "-" == filename )
input = &(std::cin);
else { // if fileName is not "-" (a dash)
static std::ifstream ifs( filename, std::ifstream::in );
if ( ! ifs.good( ) ) {
std::cerr << "ERROR opening file name " << filename
<< " -- could not open.\n";
exit( -2 );
};
// else if ifs is good
input = &ifs;
};
} // end of openInputSource
} // end namespace lex
// This code made available to his students by
// Prof. Ronald Moore
// https://fbi.h-da.de/personen/ronald-moore/
// mailto:ronald.moore@h-da.de
// with no warranties whatsoever!
#pragma once
#include <cctype> // for isspace
#include <cstdlib> // for strtod
#include <string>
#include <utility> // for std::pair
#include <variant> // new C++17 feature! Like unions, only better!
// ===================
// LEXICAL ANALYSIS
// The following are taken to be tokens:
// left and right parenthesis, the plus and minus characters,
// as well as asterisk and forward slash -- and numbers.
// In the script, subtraction and division are not supported,
// but it seems like time to add them.
// Preliminaries and Utilities
// ============================
namespace lex {
// Utility Types
typedef double numberType; // feel free to change this to something else like int or float or bigint....
// Tokens -- are a pair of a tag and a value, where the value can be
// various things - a char or a numberType at present, but names and
// multicharacter operators could be added later
typedef enum {
tok_number = 'n',
tok_lparen = '(',
tok_rparen = ')',
tok_plus = '+',
tok_minus = '-',
tok_times = '*',
tok_div = '/',
tok_eof = 'E',
bad_tok = 'X'
} TokenTag;
typedef std::variant< char, numberType > TokenValue;
typedef std::pair< TokenTag, TokenValue > Token;
extern Token next_token; // again with the global variables...
// Functions (or methods, if you prefer)