summaryrefslogtreecommitdiff
path: root/comp/lucas-standen-NEA/writeup2/writeup.tex
diff options
context:
space:
mode:
Diffstat (limited to 'comp/lucas-standen-NEA/writeup2/writeup.tex')
-rw-r--r--comp/lucas-standen-NEA/writeup2/writeup.tex523
1 files changed, 523 insertions, 0 deletions
diff --git a/comp/lucas-standen-NEA/writeup2/writeup.tex b/comp/lucas-standen-NEA/writeup2/writeup.tex
new file mode 100644
index 0000000..7427570
--- /dev/null
+++ b/comp/lucas-standen-NEA/writeup2/writeup.tex
@@ -0,0 +1,523 @@
+\documentclass[a4paper,12pt]{article}
+
+\usepackage{geometry}
+\usepackage{titling}
+\usepackage{titlesec}
+\usepackage[english]{babel}
+\usepackage[hidelinks]{hyperref}
+\usepackage{listings}
+\usepackage{xcolor}
+\usepackage{graphicx}
+\usepackage{forest}
+\usepackage{tikz-qtree}
+
+\definecolor{codegreen}{rgb}{0,0.6,0}
+\definecolor{codegray}{rgb}{0.5,0.5,0.5}
+\definecolor{codepurple}{rgb}{0.58,0,0.82}
+\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
+
+\lstdefinestyle{mystyle}{
+ backgroundcolor=\color{backcolour},
+ commentstyle=\color{codegreen},
+ keywordstyle=\color{magenta},
+ numberstyle=\tiny\color{codegray},
+ stringstyle=\color{codepurple},
+ basicstyle=\ttfamily\footnotesize,
+ breakatwhitespace=false,
+ breaklines=true,
+ captionpos=b,
+ keepspaces=true,
+ numbers=left,
+ numbersep=5pt,
+ showspaces=false,
+ showstringspaces=false,
+ showtabs=false,
+ tabsize=8
+}
+
+\lstset{style=mystyle}
+
+\titleformat{\section}
+{\Huge}
+{}
+{0em}
+{}[\titlerule]
+\geometry{
+ a4paper,
+ total={170mm,257mm},
+ left=20mm,
+ right=20mm,
+}
+
+\author{Lucas Standen}
+\title{The solution to bad code}
+
+\begin{document}
+\maketitle
+\newpage
+\tableofcontents
+\newpage
+
+{\setlength{\parindent}{0cm}
+
+\section{A breif head note and introduction}
+This document has been written for the use within AQA computer science
+Alevel coursework. It is published under the MiT license, which I hope
+will make it of use to someone other than myself some day on the in's
+and out's of simpler compiler design and programming languages as a
+whole.
+
+This is the second version of this document, it was writen in GNU
+roff before, however I decided to move over to latex for the
+more moddern features (notably image support). The latex source for
+this document is also avalible along with all referenced code at
+\url{https://github.com/standenboy/school}
+
+Any questions relating to this document should be sent to
+\href{mailto:thing1@seacrossedlovers.xyz}{thing1@seacrossedlovers.xyz}
+
+\section{Analysis}
+\subsection{The current problem}
+For general small and simple projects, I write in C. However this leads to
+hours of debugging due to segfaults, and memory leaks. Due to the languages
+manual memory management the programmer is required to know so much
+information about the hardware they write for, and the second anything goes
+wrong, it is vague on how to fix things.
+
+\textbf{I need a language that stops me from shooting myself in the foot}
+
+C has been standard for many decades now and its age is showing, it lacks
+many modern features like OOP, or higher level functional abstractions,
+that have become common in modern years due to there helpfulness. This is
+not to fault C's achievements either, the language is my personal choice for
+most projects for a reason, it's fast and powerful; any solution I make
+should not cut that away.
+
+\subsection{A solution}
+
+\textbf{\textit{Zippy LANG}}
+
+A next generation language, for general use. Designed for keeping code simple,
+neat and readable. It will be similar to functional languages, known for
+there strict ability to keep code safe and practical. The language should
+be compiled like C/C++, Haskell and Rust for fast runtime speeds
+
+The goal of Zippy is to make codding easier, while not limiting projects; to
+achive this the compiler will most likely want to use some form of middle man
+language to achieve compatibility with more libraries.
+
+\subsection{What is a programming language}
+\subsubsection{A very simple explanation}
+At its lowest definition a PL is a set of specific words, that when given
+to a computer in the right order have a reproducible behaviour. A more human
+way of saying that, would be "It's how we control computers".
+\subsubsection{Why are there so many}
+When someone is looking at code it can often be seen as just that, however
+there are hundreds of languages that all take the idea of "code" in very
+different ways. Some are designed for specific hardware, some are designed
+for making general use programs while others are highly specialized.
+It is important to see "code", as more than just one overarching term and
+instead see where the code is being used, and evaluate it from that.
+
+\subsection{Researching and getting a scope of the project}
+Before I start to design a language i should first find examples of others
+and find what i want my language to be like. I'd like my language to feel modern
+so i should take inspiration from what other modern languages do, however on the
+backed i want my language to be stable and fast, for that i should look at
+older projects.
+
+\subsubsection{Examples of older similar projects}
+\begin{description}
+ \item[Python]
+ Python is a high level OOP language that was designed in
+ 1991. It was made to make programming easy while still being able
+ to use some of C's functions. Although it has become standard for
+ many use cases, it is slow and inefficient, and very bloated.
+
+ \url{https://www.python.org/}
+
+ Zippy should take pythons high level abstractions, as they make
+ programming very easy and it should try and take notes from its
+ libraries as they are mostly well written,
+ and well documented.
+ \item[Lisp]
+ Lisp is the second ever programming language, developed at MiT,
+ and it is the first functional language, creating many common features
+ like higher order functions, recursion, and garbage collection. It is
+ generally not used any more as it feels old compared to other functional
+ languages, like Ocaml or Haskell.
+
+ \url{https://lisp-lang.org/}
+
+ Zippy should try to take alot from the syntax of lisp, () make
+ it easy to see what parts of code will effect what, and make
+ things easy to tokenize.
+ \item[Perl]
+ Perl is scripting language designed for use in linux, when bash is
+ too slow, or not suited for the job. Perl is often described as the glue
+ of the universe (see xkcd \url{https://3d.xkcd.com/224/}. Its syntax is
+ quite strange however and it is slow. Making it poorly suited towards
+ general use.
+
+ \url{https://www.perl.org/}
+
+ Zippy should take from perls minimalism, it is a small language that is
+ of a similar size to bash or zsh, while feeling closer to python.
+ If Zippy can achieve a similar small size, while remaining
+ powerful I will be happy with this outcome.
+\end{description}
+\subsubsection{Examples of newer similar projects}
+\begin{description}
+ \item[Gleam]
+ Gleam is a modern language releasing in the past 5 years. It is
+ highly functional, with no mutable data, no traditional loops.
+ Instead recursion can be used to replace alot of these features.
+ Gleam compiles to erlang/Beam bytecode, much like java to the
+ jvm, and doing this has made Gleam a highly scalable language with
+ good library support out the box.
+
+ \url{https://gleam.run/}
+
+ Zippy should take from the functional elements of Gleam, as they keep
+ programs safer, however Zippy should not remove all procedural
+ elements, as for loops are very helpful
+ \item[Haskell]
+ Haskell is another modern functional language known for being
+ very complicated, however incredibly powerful. Its syntax feels very
+ mathematical, and incredibly terse.
+
+ \url{https://www.haskell.org/}
+
+ Perhaps Zippy could learn from Haskell, as it provides functional and
+ procedural elements, making it a well rounded language
+ \item[Hare]
+ Hare was designed to be a 100 year language, and thus stability is
+ its main goal, it is not set to
+ have a syntax change any time soon, and it has strong emphasis on
+ memory managment like C. It fits into the same part of the tech stack
+ as C, and thus it can be used for some very low level work.
+
+ \url{https://harelang.org/}
+
+ I think Zippy should have a strong emphasis on stability, much like Hare,
+ too many times have I segfaulted due to a tiny mistake. Zippy should
+ also look to Hare's small size, you can buy a copy of Hare on a
+
+ \textbf{SINGLE 3 1/2" FLOLPY}
+
+ This is something I too should try to achieve.
+\end{description}
+\subsubsection{What should be taken away from these languages}
+I was already leaning towards functional programming when I started this project however
+now I believe it's the only option for producing safe applications. Zippy will be
+a functional language.
+
+I also believe that I should take size of the compiler into account, as this is important
+for keeping the project manageable and maintanable.
+
+And finally I think that syntax should be inspired by Lisp, although Lisp itself can be
+a messy language, with the right changes I am confident that I can make a attractive
+language for the 21st century.
+
+\subsection{Clients}
+In a project of this nature, the Client is every programmer alive; which is a pretty
+large scope. To narrow this down as much as possible, I will interview a small handful
+of people throughout the project, of different skill levels to get a good picture of
+what people think of the project.
+\subsubsection{Client 1: Amy C}
+My first client is a friend of mine, Amy C, she is a confident programmer who has
+completed many complicated projects. I am choosing her as a client as she can give me
+technical feed back on my project and its function/utility.
+\subsubsection{Client 2: Rayn M}
+Another friend of mine, Rayn M, is a technical computer user, however he does not know
+how to program at a high level. He will be a good client as he can show me how my
+language looks to some one who doesn't understand the inside workings, helping me
+design the structure of the code to something useable for more people.
+\subsubsection{Client 3: Myself}
+I've wanted to take out a project like this for a long long time, and this is the
+perfect opportunity to do so, I will be assessing myself along the way of this,
+building the project to my personal specification.
+
+\subsection{Questionnaires}
+It is important to get feedback from end users, so I will take multiple questionnaires
+throughout the project. I will then use them to slightly edit the requirements of my
+project this should make the final outcome more helpful and what people want.
+\subsubsection{Amy C, initial ideas}
+\begin{description}
+ \item[What do you find the most important in a language?]
+ Speed, readability, debugging ease and disk space efficiency.
+ \item[What tools are important for a language to have?]
+ IDE integration (things like tab complete and debugging tools), a
+ package manager, and the ability to interact with the user through the
+ command line easily.
+ \item[What features do you like from other languages?]
+ The ability to pass the memory reference of an object or function and a
+ collection of built-in or standard functions like "print", "split",
+ or "sort".
+ \item[What do you want to program in this language?]
+ Lightweight command line tools and web back ends.
+ \item[Do you intend to use graphics in the programs you write?]
+ Yes.
+ \item[Would you prefer a language that focuses on ease of
+ use, or power of code?]
+ I like a good balance between the two.
+ \item[What were your last 3 projects?]
+ A website, a small command-line tool and a midi keyboard (program runs
+ on a Raspberry Pi Pico).
+ \item[How many languages would you use on a single project?]
+ I try to use as little languages in a single project as possible, so i
+ could likely not use it in an existing project.
+ \item[Do you care for low level control, or would you prefer
+ high level abstractions?]
+ I think low-level control is very important, but high-level abstractions
+ are convenient, so a good balance between the two is best.
+ \item[Would you be happy to develop libraries for things that aren't
+ already implemented?]
+ Potentially if it is simple enough to implement new things.
+\end{description}
+\subsubsection{Notes from questionnare 1}
+Some of the key things that I'm taking away from this first questionnaire, are my
+client/users initial needs and use cases. I think it's clear my language can be of
+assistance to my client, Zippy will be a good language for web back ends and
+small command line tools, which my client expressed interested in.
+
+I find the fact my client is worried by executable size interesting, however
+I doubt it will be an issue; a ballooning code-base is unlikely as only one
+person is writing the whole project.
+
+I am also taking on the fact that my client wants good command line tools,
+so a pkg-manager and bundler should be a priority, perhaps they could be
+written in Zippy after the compiler is done.
+
+\subsection{The first elements of the project}
+At this stage I can say that I'm confident in my project and its scope. I
+have a goal in mind for
+it.
+
+\textbf{The key things to take away from this section are:}
+
+\begin{itemize}
+ \item
+ Make a high level language with a useable set of features, to
+ replace C in many situations.
+
+ \item
+ Keep the language readable and easy, with powerful tools available.
+
+ \item
+ Ensure the language is well supported with tools like a pkg-manager.
+\end{itemize}
+
+\section{Modelling}
+In larger projects, when a programmer needs a data structure that the language
+they are writing in doesn't provide, they will need to make their own. This can pose a
+challenge to some, especially in low level languages which don't provide anything
+out of the box.
+
+Bellow are a few examples of these data structures that C doesn't already provide,
+that I may use in my project.
+\subsection{Linked lists}
+this is an alternative implementation of a list, where you store some data, and
+the memory address to the next node. Then you can move through the list by reading
+the data then reading the data of the next node, and then repeating until the
+'next' part of the node is empty.
+\\
+
+A diagram showing this can be seen here:
+\\
+
+\begin{tikzpicture}
+ \tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
+ \Tree
+ [.ll
+ [.data
+ ]
+ [.next
+ [.ll
+ [.data
+ ]
+ [.next
+ [.ll
+ [.data
+ ]
+ [.next
+ ]
+ ]
+ ]
+ ]
+ ]
+ ]
+\end{tikzpicture}
+
+In C this is easy to implement as you can find a memory address very easily with to
+find where a bit of data is stored in memory (address of). I will need to use a 'struct',
+which is a bit like a class in C (however you can't attach a function to it, nor use
+inheritance). A simple implementation looks like this:
+
+\begin{lstlisting}[language=C++, caption=Linked list example]
+typedef struct ll {
+ void *data; // the data of the node
+ ll *next; // the next node
+} ll;
+\end{lstlisting}
+
+The pro's of a linked list are the fact that they can have data appended to the start or
+end easily by changing the root node, or the next node.
+
+Linked lists have a few downsides, for example you can't move through them backwards,
+and unless you store it on its own, you cant find the length of it in a fast way.
+
+In my project I would like to use linked list in the AST (see later sections for info),
+and perhaps to store lists in the language.
+
+\subsection{Dictionaries}
+A dictionary is a simple data structure that just stores, a bit of data, and a number or
+string to identify it.
+A dictionary like a linked list can be implemented with a struct in c like so:
+\begin{lstlisting}[language=C++, caption=Dictionary example]
+typedef struct dict {
+ void *data;
+ int id;
+} dict;
+\end{lstlisting}
+
+In my project I could use this to hold variables and functions which need to be
+checked and looked up, which is very slow when comparing entire strings, but with this
+I can compare integer ID's which is much faster.
+
+\subsection{Prototyping harder features}
+\subsubsection{Abstract syntax trees (AST's) theory}
+In a programming language many abstract data types will be used to allow the code
+to compile and execute, however I think the hardest part of this is an abstract
+syntax tree. This is a data structure that holds the code in an ordered form that
+can be analysed and executed in a simple way. It is a tree structure, with the top
+node being a root and all lower nodes being things needed to calculate the root. It can
+be used to show mathematical expressions and function calls, but I thing easiest way to
+show it is via a mathematical example.
+
+Take the follow expression for example:
+\\
+
+{\Large{\(1 + (10 * (3 - (2 * 4)))\)}}
+\\
+
+We know that this is equal to \(-49\)
+
+However for a computer this is far harder to understand. This is because it has no
+understanding of order of operation.
+
+To solve this we use an AST.
+
+When you solve that expression you know to start with
+\((2 * 4)\), then \(3 -\)
+from that, following the rules of BIDMAS to solve.
+
+We can represent the steps as a tree like so:
+
+\begin{tikzpicture}
+ \tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
+ \Tree
+ [.+
+ [.1
+ ]
+ [.*
+ [.10
+ ]
+ [.-
+ [.3
+ ]
+ [.*
+ [.2
+ ]
+ [.4
+ ]
+ ]
+ ]
+ ]
+ ]
+\end{tikzpicture}
+\\
+This will evaluate \(1 + (10 * (3 - (2 * 4)))\)
+
+As you can see, you need to evaluate the expression in the most brackets
+first, then the next, and so on, working you way up.
+
+You can evaluate code in a similar way, treating each operation (such as +-*/)
+as functions, doing the most deeply nested function first, then working up.
+Each expression can be represented in this tree, then to show a whole program you
+can create a list of trees.
+
+\subsubsection{Abstract syntax trees (AST's) practical}
+As a prototype i will make a program that can take mathematical expressions and evaluate
+them, and allowing for functions (in the form f(x)). It will do this via AST's.
+
+This prototype takes 173 lines of code, it takes a string as a cmd line argument then
+converts it into an abstract syntax tree, and finally it executes it. This is just a
+simple prototype and thus it is small in scope. It can only do simple operators (+-*/)
+and requires literal values to be surrounded by [] so it knows its not another
+expression to evaluate.
+
+\lstinputlisting[language=C++]{../code/proto/AST/ast.c}
+\textit{The main loop for the ast code.}
+\\
+
+\lstinputlisting[language=C++]{../code/proto/AST/astg.c}
+\textit{The execution loop for the ast code.}
+\\
+
+\lstinputlisting[language=C++]{../code/proto/AST/astg.h}
+\textit{The definition of the ast, and function prototypes.}
+\\
+
+Above is the code for the AST, it stores an operation (which is just an integer), and
+it stores a real left and real right value, along side two other nodes. The real values
+are integers, this would be the 2 numbers in reference, in the expression. The 2 nodes are a
+recursive data structure, much like putting an object of a class inside the definition of that class
+itself. They are used to store values that may still be expressions, for example
+(+ [1] (+ [1] [1])) the second part of this expression would be in the "right"
+variable.
+\\
+
+When code is executed I can check if "left", or "right" are NULL and if
+they are I know that I am at the lowest expression that is only literal values.
+Then I can execute that node and work my way up the tree.
+\\
+
+The exec function will execute the operation, unless there is a deeper node, if there is
+a deeper node, then it executes it, and places the result in the right or left spot
+respectively.
+\\
+
+\textbf{Here is an example input and output:}
+
+ ./ast "(+ (- [3] [1]) (- [3] [1]))"
+
+4
+
+{\small Note the [ ] used to tell the program where the literal values are.}
+\\
+
+Overall this was a relatively successful prototype, however it isn't fully functional
+as a language but it has fit the design for a prototype.
+\\
+
+\textbf{The code for the AST can be found here:
+\url{https://github.com/standenboy/school/tree/master/comp/lucas-standen-NEA/code/proto/ast}}
+
+\subsection{Feedback}
+From my first Client (Amy C), she said that putting the numbers inside square brackets
+was inconvenient and annoying and it would be better if the numbers were separated
+by spaces instead of separate square bracket surrounded literals.
+
+As this is a prototype I won't fix this issue, however in the actual language this is
+a needed feature that I will be implementing.
+
+\subsection{Mixing linked lists and AST's}
+Mixing these 2 data structures together you can represents an entire program. A linked
+list of AST's is how Zippy will represent all code the user writes. To do this, use a
+linked list, and in the data element put a AST, then the next node can contain the same.
+This might be a help to zippy as the compiler can convert all code to an AST, then
+compile it.
+}
+\end{document}
+