diff options
Diffstat (limited to 'comp/lucas-standen-NEA/writeup2/writeup.tex')
-rw-r--r-- | comp/lucas-standen-NEA/writeup2/writeup.tex | 523 |
1 files changed, 523 insertions, 0 deletions
diff --git a/comp/lucas-standen-NEA/writeup2/writeup.tex b/comp/lucas-standen-NEA/writeup2/writeup.tex new file mode 100644 index 0000000..7427570 --- /dev/null +++ b/comp/lucas-standen-NEA/writeup2/writeup.tex @@ -0,0 +1,523 @@ +\documentclass[a4paper,12pt]{article} + +\usepackage{geometry} +\usepackage{titling} +\usepackage{titlesec} +\usepackage[english]{babel} +\usepackage[hidelinks]{hyperref} +\usepackage{listings} +\usepackage{xcolor} +\usepackage{graphicx} +\usepackage{forest} +\usepackage{tikz-qtree} + +\definecolor{codegreen}{rgb}{0,0.6,0} +\definecolor{codegray}{rgb}{0.5,0.5,0.5} +\definecolor{codepurple}{rgb}{0.58,0,0.82} +\definecolor{backcolour}{rgb}{0.95,0.95,0.92} + +\lstdefinestyle{mystyle}{ + backgroundcolor=\color{backcolour}, + commentstyle=\color{codegreen}, + keywordstyle=\color{magenta}, + numberstyle=\tiny\color{codegray}, + stringstyle=\color{codepurple}, + basicstyle=\ttfamily\footnotesize, + breakatwhitespace=false, + breaklines=true, + captionpos=b, + keepspaces=true, + numbers=left, + numbersep=5pt, + showspaces=false, + showstringspaces=false, + showtabs=false, + tabsize=8 +} + +\lstset{style=mystyle} + +\titleformat{\section} +{\Huge} +{} +{0em} +{}[\titlerule] +\geometry{ + a4paper, + total={170mm,257mm}, + left=20mm, + right=20mm, +} + +\author{Lucas Standen} +\title{The solution to bad code} + +\begin{document} +\maketitle +\newpage +\tableofcontents +\newpage + +{\setlength{\parindent}{0cm} + +\section{A breif head note and introduction} +This document has been written for the use within AQA computer science +Alevel coursework. It is published under the MiT license, which I hope +will make it of use to someone other than myself some day on the in's +and out's of simpler compiler design and programming languages as a +whole. + +This is the second version of this document, it was writen in GNU +roff before, however I decided to move over to latex for the +more moddern features (notably image support). The latex source for +this document is also avalible along with all referenced code at +\url{https://github.com/standenboy/school} + +Any questions relating to this document should be sent to +\href{mailto:thing1@seacrossedlovers.xyz}{thing1@seacrossedlovers.xyz} + +\section{Analysis} +\subsection{The current problem} +For general small and simple projects, I write in C. However this leads to +hours of debugging due to segfaults, and memory leaks. Due to the languages +manual memory management the programmer is required to know so much +information about the hardware they write for, and the second anything goes +wrong, it is vague on how to fix things. + +\textbf{I need a language that stops me from shooting myself in the foot} + +C has been standard for many decades now and its age is showing, it lacks +many modern features like OOP, or higher level functional abstractions, +that have become common in modern years due to there helpfulness. This is +not to fault C's achievements either, the language is my personal choice for +most projects for a reason, it's fast and powerful; any solution I make +should not cut that away. + +\subsection{A solution} + +\textbf{\textit{Zippy LANG}} + +A next generation language, for general use. Designed for keeping code simple, +neat and readable. It will be similar to functional languages, known for +there strict ability to keep code safe and practical. The language should +be compiled like C/C++, Haskell and Rust for fast runtime speeds + +The goal of Zippy is to make codding easier, while not limiting projects; to +achive this the compiler will most likely want to use some form of middle man +language to achieve compatibility with more libraries. + +\subsection{What is a programming language} +\subsubsection{A very simple explanation} +At its lowest definition a PL is a set of specific words, that when given +to a computer in the right order have a reproducible behaviour. A more human +way of saying that, would be "It's how we control computers". +\subsubsection{Why are there so many} +When someone is looking at code it can often be seen as just that, however +there are hundreds of languages that all take the idea of "code" in very +different ways. Some are designed for specific hardware, some are designed +for making general use programs while others are highly specialized. +It is important to see "code", as more than just one overarching term and +instead see where the code is being used, and evaluate it from that. + +\subsection{Researching and getting a scope of the project} +Before I start to design a language i should first find examples of others +and find what i want my language to be like. I'd like my language to feel modern +so i should take inspiration from what other modern languages do, however on the +backed i want my language to be stable and fast, for that i should look at +older projects. + +\subsubsection{Examples of older similar projects} +\begin{description} + \item[Python] + Python is a high level OOP language that was designed in + 1991. It was made to make programming easy while still being able + to use some of C's functions. Although it has become standard for + many use cases, it is slow and inefficient, and very bloated. + + \url{https://www.python.org/} + + Zippy should take pythons high level abstractions, as they make + programming very easy and it should try and take notes from its + libraries as they are mostly well written, + and well documented. + \item[Lisp] + Lisp is the second ever programming language, developed at MiT, + and it is the first functional language, creating many common features + like higher order functions, recursion, and garbage collection. It is + generally not used any more as it feels old compared to other functional + languages, like Ocaml or Haskell. + + \url{https://lisp-lang.org/} + + Zippy should try to take alot from the syntax of lisp, () make + it easy to see what parts of code will effect what, and make + things easy to tokenize. + \item[Perl] + Perl is scripting language designed for use in linux, when bash is + too slow, or not suited for the job. Perl is often described as the glue + of the universe (see xkcd \url{https://3d.xkcd.com/224/}. Its syntax is + quite strange however and it is slow. Making it poorly suited towards + general use. + + \url{https://www.perl.org/} + + Zippy should take from perls minimalism, it is a small language that is + of a similar size to bash or zsh, while feeling closer to python. + If Zippy can achieve a similar small size, while remaining + powerful I will be happy with this outcome. +\end{description} +\subsubsection{Examples of newer similar projects} +\begin{description} + \item[Gleam] + Gleam is a modern language releasing in the past 5 years. It is + highly functional, with no mutable data, no traditional loops. + Instead recursion can be used to replace alot of these features. + Gleam compiles to erlang/Beam bytecode, much like java to the + jvm, and doing this has made Gleam a highly scalable language with + good library support out the box. + + \url{https://gleam.run/} + + Zippy should take from the functional elements of Gleam, as they keep + programs safer, however Zippy should not remove all procedural + elements, as for loops are very helpful + \item[Haskell] + Haskell is another modern functional language known for being + very complicated, however incredibly powerful. Its syntax feels very + mathematical, and incredibly terse. + + \url{https://www.haskell.org/} + + Perhaps Zippy could learn from Haskell, as it provides functional and + procedural elements, making it a well rounded language + \item[Hare] + Hare was designed to be a 100 year language, and thus stability is + its main goal, it is not set to + have a syntax change any time soon, and it has strong emphasis on + memory managment like C. It fits into the same part of the tech stack + as C, and thus it can be used for some very low level work. + + \url{https://harelang.org/} + + I think Zippy should have a strong emphasis on stability, much like Hare, + too many times have I segfaulted due to a tiny mistake. Zippy should + also look to Hare's small size, you can buy a copy of Hare on a + + \textbf{SINGLE 3 1/2" FLOLPY} + + This is something I too should try to achieve. +\end{description} +\subsubsection{What should be taken away from these languages} +I was already leaning towards functional programming when I started this project however +now I believe it's the only option for producing safe applications. Zippy will be +a functional language. + +I also believe that I should take size of the compiler into account, as this is important +for keeping the project manageable and maintanable. + +And finally I think that syntax should be inspired by Lisp, although Lisp itself can be +a messy language, with the right changes I am confident that I can make a attractive +language for the 21st century. + +\subsection{Clients} +In a project of this nature, the Client is every programmer alive; which is a pretty +large scope. To narrow this down as much as possible, I will interview a small handful +of people throughout the project, of different skill levels to get a good picture of +what people think of the project. +\subsubsection{Client 1: Amy C} +My first client is a friend of mine, Amy C, she is a confident programmer who has +completed many complicated projects. I am choosing her as a client as she can give me +technical feed back on my project and its function/utility. +\subsubsection{Client 2: Rayn M} +Another friend of mine, Rayn M, is a technical computer user, however he does not know +how to program at a high level. He will be a good client as he can show me how my +language looks to some one who doesn't understand the inside workings, helping me +design the structure of the code to something useable for more people. +\subsubsection{Client 3: Myself} +I've wanted to take out a project like this for a long long time, and this is the +perfect opportunity to do so, I will be assessing myself along the way of this, +building the project to my personal specification. + +\subsection{Questionnaires} +It is important to get feedback from end users, so I will take multiple questionnaires +throughout the project. I will then use them to slightly edit the requirements of my +project this should make the final outcome more helpful and what people want. +\subsubsection{Amy C, initial ideas} +\begin{description} + \item[What do you find the most important in a language?] + Speed, readability, debugging ease and disk space efficiency. + \item[What tools are important for a language to have?] + IDE integration (things like tab complete and debugging tools), a + package manager, and the ability to interact with the user through the + command line easily. + \item[What features do you like from other languages?] + The ability to pass the memory reference of an object or function and a + collection of built-in or standard functions like "print", "split", + or "sort". + \item[What do you want to program in this language?] + Lightweight command line tools and web back ends. + \item[Do you intend to use graphics in the programs you write?] + Yes. + \item[Would you prefer a language that focuses on ease of + use, or power of code?] + I like a good balance between the two. + \item[What were your last 3 projects?] + A website, a small command-line tool and a midi keyboard (program runs + on a Raspberry Pi Pico). + \item[How many languages would you use on a single project?] + I try to use as little languages in a single project as possible, so i + could likely not use it in an existing project. + \item[Do you care for low level control, or would you prefer + high level abstractions?] + I think low-level control is very important, but high-level abstractions + are convenient, so a good balance between the two is best. + \item[Would you be happy to develop libraries for things that aren't + already implemented?] + Potentially if it is simple enough to implement new things. +\end{description} +\subsubsection{Notes from questionnare 1} +Some of the key things that I'm taking away from this first questionnaire, are my +client/users initial needs and use cases. I think it's clear my language can be of +assistance to my client, Zippy will be a good language for web back ends and +small command line tools, which my client expressed interested in. + +I find the fact my client is worried by executable size interesting, however +I doubt it will be an issue; a ballooning code-base is unlikely as only one +person is writing the whole project. + +I am also taking on the fact that my client wants good command line tools, +so a pkg-manager and bundler should be a priority, perhaps they could be +written in Zippy after the compiler is done. + +\subsection{The first elements of the project} +At this stage I can say that I'm confident in my project and its scope. I +have a goal in mind for +it. + +\textbf{The key things to take away from this section are:} + +\begin{itemize} + \item + Make a high level language with a useable set of features, to + replace C in many situations. + + \item + Keep the language readable and easy, with powerful tools available. + + \item + Ensure the language is well supported with tools like a pkg-manager. +\end{itemize} + +\section{Modelling} +In larger projects, when a programmer needs a data structure that the language +they are writing in doesn't provide, they will need to make their own. This can pose a +challenge to some, especially in low level languages which don't provide anything +out of the box. + +Bellow are a few examples of these data structures that C doesn't already provide, +that I may use in my project. +\subsection{Linked lists} +this is an alternative implementation of a list, where you store some data, and +the memory address to the next node. Then you can move through the list by reading +the data then reading the data of the next node, and then repeating until the +'next' part of the node is empty. +\\ + +A diagram showing this can be seen here: +\\ + +\begin{tikzpicture} + \tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}} + \Tree + [.ll + [.data + ] + [.next + [.ll + [.data + ] + [.next + [.ll + [.data + ] + [.next + ] + ] + ] + ] + ] + ] +\end{tikzpicture} + +In C this is easy to implement as you can find a memory address very easily with to +find where a bit of data is stored in memory (address of). I will need to use a 'struct', +which is a bit like a class in C (however you can't attach a function to it, nor use +inheritance). A simple implementation looks like this: + +\begin{lstlisting}[language=C++, caption=Linked list example] +typedef struct ll { + void *data; // the data of the node + ll *next; // the next node +} ll; +\end{lstlisting} + +The pro's of a linked list are the fact that they can have data appended to the start or +end easily by changing the root node, or the next node. + +Linked lists have a few downsides, for example you can't move through them backwards, +and unless you store it on its own, you cant find the length of it in a fast way. + +In my project I would like to use linked list in the AST (see later sections for info), +and perhaps to store lists in the language. + +\subsection{Dictionaries} +A dictionary is a simple data structure that just stores, a bit of data, and a number or +string to identify it. +A dictionary like a linked list can be implemented with a struct in c like so: +\begin{lstlisting}[language=C++, caption=Dictionary example] +typedef struct dict { + void *data; + int id; +} dict; +\end{lstlisting} + +In my project I could use this to hold variables and functions which need to be +checked and looked up, which is very slow when comparing entire strings, but with this +I can compare integer ID's which is much faster. + +\subsection{Prototyping harder features} +\subsubsection{Abstract syntax trees (AST's) theory} +In a programming language many abstract data types will be used to allow the code +to compile and execute, however I think the hardest part of this is an abstract +syntax tree. This is a data structure that holds the code in an ordered form that +can be analysed and executed in a simple way. It is a tree structure, with the top +node being a root and all lower nodes being things needed to calculate the root. It can +be used to show mathematical expressions and function calls, but I thing easiest way to +show it is via a mathematical example. + +Take the follow expression for example: +\\ + +{\Large{\(1 + (10 * (3 - (2 * 4)))\)}} +\\ + +We know that this is equal to \(-49\) + +However for a computer this is far harder to understand. This is because it has no +understanding of order of operation. + +To solve this we use an AST. + +When you solve that expression you know to start with +\((2 * 4)\), then \(3 -\) +from that, following the rules of BIDMAS to solve. + +We can represent the steps as a tree like so: + +\begin{tikzpicture} + \tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}} + \Tree + [.+ + [.1 + ] + [.* + [.10 + ] + [.- + [.3 + ] + [.* + [.2 + ] + [.4 + ] + ] + ] + ] + ] +\end{tikzpicture} +\\ +This will evaluate \(1 + (10 * (3 - (2 * 4)))\) + +As you can see, you need to evaluate the expression in the most brackets +first, then the next, and so on, working you way up. + +You can evaluate code in a similar way, treating each operation (such as +-*/) +as functions, doing the most deeply nested function first, then working up. +Each expression can be represented in this tree, then to show a whole program you +can create a list of trees. + +\subsubsection{Abstract syntax trees (AST's) practical} +As a prototype i will make a program that can take mathematical expressions and evaluate +them, and allowing for functions (in the form f(x)). It will do this via AST's. + +This prototype takes 173 lines of code, it takes a string as a cmd line argument then +converts it into an abstract syntax tree, and finally it executes it. This is just a +simple prototype and thus it is small in scope. It can only do simple operators (+-*/) +and requires literal values to be surrounded by [] so it knows its not another +expression to evaluate. + +\lstinputlisting[language=C++]{../code/proto/AST/ast.c} +\textit{The main loop for the ast code.} +\\ + +\lstinputlisting[language=C++]{../code/proto/AST/astg.c} +\textit{The execution loop for the ast code.} +\\ + +\lstinputlisting[language=C++]{../code/proto/AST/astg.h} +\textit{The definition of the ast, and function prototypes.} +\\ + +Above is the code for the AST, it stores an operation (which is just an integer), and +it stores a real left and real right value, along side two other nodes. The real values +are integers, this would be the 2 numbers in reference, in the expression. The 2 nodes are a +recursive data structure, much like putting an object of a class inside the definition of that class +itself. They are used to store values that may still be expressions, for example +(+ [1] (+ [1] [1])) the second part of this expression would be in the "right" +variable. +\\ + +When code is executed I can check if "left", or "right" are NULL and if +they are I know that I am at the lowest expression that is only literal values. +Then I can execute that node and work my way up the tree. +\\ + +The exec function will execute the operation, unless there is a deeper node, if there is +a deeper node, then it executes it, and places the result in the right or left spot +respectively. +\\ + +\textbf{Here is an example input and output:} + + ./ast "(+ (- [3] [1]) (- [3] [1]))" + +4 + +{\small Note the [ ] used to tell the program where the literal values are.} +\\ + +Overall this was a relatively successful prototype, however it isn't fully functional +as a language but it has fit the design for a prototype. +\\ + +\textbf{The code for the AST can be found here: +\url{https://github.com/standenboy/school/tree/master/comp/lucas-standen-NEA/code/proto/ast}} + +\subsection{Feedback} +From my first Client (Amy C), she said that putting the numbers inside square brackets +was inconvenient and annoying and it would be better if the numbers were separated +by spaces instead of separate square bracket surrounded literals. + +As this is a prototype I won't fix this issue, however in the actual language this is +a needed feature that I will be implementing. + +\subsection{Mixing linked lists and AST's} +Mixing these 2 data structures together you can represents an entire program. A linked +list of AST's is how Zippy will represent all code the user writes. To do this, use a +linked list, and in the data element put a AST, then the next node can contain the same. +This might be a help to zippy as the compiler can convert all code to an AST, then +compile it. +} +\end{document} + |