summaryrefslogtreecommitdiff
path: root/comp/lucas-standen-NEA/writeup2/writeup.tex
blob: 3eba42e4e25ff7a3ce9d8554262f567b2df630c4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
\documentclass[a4paper,12pt]{article}

\usepackage{geometry}
\usepackage{titling}
\usepackage{titlesec}
\usepackage[english]{babel}
\usepackage[hidelinks]{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{graphicx}
\usepackage{forest}
\usepackage{tikz-qtree}

\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}

\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},   
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    basicstyle=\ttfamily\footnotesize,
    breakatwhitespace=false,         
    breaklines=true,                 
    captionpos=b,                    
    keepspaces=true,                 
    numbers=left,                    
    numbersep=5pt,                  
    showspaces=false,                
    showstringspaces=false,
    showtabs=false,                  
    tabsize=8
}


\lstset{style=mystyle}

\titleformat{\section}
{\Huge}
{}
{0em}
{}[\titlerule]
\geometry{
	a4paper,
	total={170mm,257mm},
	left=20mm,
	right=20mm,
}

\author{Lucas Standen}
\title{The solution to bad code}

\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage

\setlength{\parskip}{1em}
{\setlength{\parindent}{0cm}

\section{A breif head note and introduction}
This document has been written for the use within AQA computer science 
Alevel coursework. It is published under the MiT license, which I hope
will make it of use to someone other than myself some day on the in's
and out's of simpler compiler design and programming languages as a
whole.

This is the second version of this document, it was writen in GNU
roff before, however I decided to move over to latex for the
more moddern features (notably image support). The latex source for
this document is also avalible along with all referenced code at 
\url{https://github.com/standenboy/school}

Any questions relating to this document should be sent to 
\href{mailto:thing1@seacrossedlovers.xyz}{thing1@seacrossedlovers.xyz}

\section{Analysis}
\subsection{The current problem}
For general small and simple projects, I write in C. However this leads to
hours of debugging due to segfaults, and memory leaks. Due to the languages 
manual memory management the programmer is required to know so much 
information about the hardware they write for, and the second anything goes
wrong, it is vague on how to fix things.

\textbf{I need a language that stops me from shooting myself in the foot}

C has been standard for many decades now and its age is showing, it lacks
many modern features like OOP, or higher level functional abstractions, 
that have become common in modern years due to there helpfulness. This is 
not to fault C's achievements either, the language is my personal choice for
most projects for a reason, it's fast and powerful; any solution I make
should not cut that away.

\subsection{A solution}

\textbf{\textit{Zippy LANG}}

A next generation language, for general use. Designed for keeping code simple,
neat and readable. It will be similar to functional languages, known for 
there strict ability to keep code safe and practical. The language should 
be compiled like C/C++, Haskell and Rust for fast runtime speeds

The goal of Zippy is to make codding easier, while not limiting projects; to 
achive this the compiler will most likely want to use some form of middle man
language to achieve compatibility with more libraries.

\subsection{What is a programming language}
\subsubsection{A very simple explanation}
At its lowest definition a PL is a set of specific words, that when given
to a computer in the right order have a reproducible behaviour. A more human 
way of saying that, would be "It's how we control computers".
\subsubsection{Why are there so many}
When someone is looking at code it can often be seen as just that, however
there are hundreds of languages that all take the idea of "code" in very 
different ways. Some are designed for specific hardware, some are designed 
for making general use programs while others are highly specialized.
It is important to see "code", as more than just one overarching term and
instead see where the code is being used, and evaluate it from that.

\subsection{Researching and getting a scope of the project}
Before I start to design a language i should first find examples of others
and find what i want my language to be like. I'd like my language to feel modern 
so i should take inspiration from what other modern languages do, however on the 
backed i want my language to be  stable and fast, for that i should look at 
older projects.

\subsubsection{Examples of older similar projects}
\begin{description}
	\item[Python] 
		Python is a high level OOP language that was designed in 
		1991. It was made to make programming easy while still being able 
		to use some of C's functions. Although it has become standard for 
		many use cases, it is slow and inefficient, and very bloated.

		\url{https://www.python.org/}

		Zippy should take pythons high level abstractions, as they make 
		programming very easy and it should try and take notes from its 
		libraries as they are mostly well written,
		and well documented.
	\item[Lisp] 
		Lisp is the second ever programming language, developed at MiT, 
		and it is the first functional language, creating many common features 
		like higher order functions, recursion, and garbage collection. It is
		generally not used any more as it feels old compared to other functional
		languages, like Ocaml or Haskell.

		\url{https://lisp-lang.org/}

		Zippy should try to take alot from the syntax of lisp, () make 
		it easy to see what parts of code will effect what, and make 
		things easy to tokenize.
	\item[Perl] 
		Perl is scripting language designed for use in linux, when bash is 
		too slow, or not suited for the job. Perl is often described as the glue 
		of the universe (see xkcd \url{https://3d.xkcd.com/224/}. Its syntax is 
		quite strange however and it is slow. Making it poorly suited towards 
		general use.

		\url{https://www.perl.org/}

		Zippy should take from perls minimalism, it is a small language that is 
		of a similar size to bash or zsh, while feeling closer to python. 
		If Zippy can achieve a similar small size, while remaining
		powerful I will be happy with this outcome.
\end{description}
\subsubsection{Examples of newer similar projects}
\begin{description}
	\item[Gleam] 
		Gleam is a modern language releasing in the past 5 years. It is 
		highly functional, with no mutable data, no traditional loops. 
		Instead recursion can be used to replace alot of these features.
		Gleam compiles to erlang/Beam bytecode, much like java to the
		jvm, and doing this has made Gleam a highly scalable language with 
		good library support out the box.
	
		\url{https://gleam.run/}

		Zippy should take from the functional elements of Gleam, as they keep 
		programs safer, however Zippy should not remove all procedural 
		elements, as for loops are very helpful
	\item[Haskell] 
		Haskell is another modern functional language known for being
		very complicated, however incredibly powerful. Its syntax feels very 
		mathematical, and incredibly terse.

		\url{https://www.haskell.org/}

		Perhaps Zippy could learn from Haskell, as it provides functional and
		procedural elements, making it a well rounded language
	\item[Hare] 
		Hare was designed to be a 100 year language, and thus stability is 
		its main goal, it is not set to
		have a syntax change any time soon, and it has strong emphasis on 
		memory managment like C. It fits into the same part of the tech stack 
		as C, and thus it can be used for some very low level work.

		\url{https://harelang.org/}

		I think Zippy should have a strong emphasis on stability, much like Hare,
		too many times have I segfaulted due to a tiny mistake. Zippy should 
		also look to Hare's small size, you can buy a copy of Hare on a

		\textbf{SINGLE 3 1/2" FLOLPY}

		This is something I too should try to achieve.
\end{description}
\subsubsection{What should be taken away from these languages}
I was already leaning towards functional programming when I started this project however 
now I believe it's the only option for producing safe applications. Zippy will be
a functional language.

I also believe that I should take size of the compiler into account, as this is important 
for keeping the project manageable and maintanable.

And finally I think that syntax should be inspired by Lisp, although Lisp itself can be 
a messy language, with the right changes I am confident that I can make a attractive
language for the 21st century.

\subsection{Clients}
In a project of this nature, the Client is every programmer alive; which is a pretty 
large scope. To narrow this down as much as possible, I will interview a small handful
of people throughout the project, of different skill levels to get a good picture of
what people think of the project.
\subsubsection{Client 1: Amy C}
My first client is a friend of mine, Amy C, she is a confident programmer who has 
completed many complicated projects. I am choosing her as a client as she can give me
technical feed back on my project and its function/utility.
\subsubsection{Client 2: Rayn M}
Another friend of mine, Rayn M, is a technical computer user, however he does not know 
how to program at a high level. He will be a good client as he can show me how my
language looks to some one who doesn't understand the inside workings, helping me 
design the structure of the code to something useable for more people.
\subsubsection{Client 3: Myself}
I've wanted to take out a project like this for a long long time, and this is the 
perfect opportunity to do so, I will be assessing myself along the way of  this, 
building the project to my personal specification.

\subsection{Questionnaires}
It is important to get feedback from end users, so I will take multiple questionnaires 
throughout the project. I will then use them to slightly edit the requirements of my
project this should make the final outcome more helpful and what people want.
\subsubsection{Amy C, initial ideas}
\begin{description}
	\item[What do you find the most important in a language?]
		Speed, readability, debugging ease and disk space efficiency.
	\item[What tools are important for a language to have?] 
		IDE integration (things like tab complete and debugging tools), a 
		package manager, and the ability to interact with the user through the 
		command line easily.
	\item[What features do you like from other languages?] 
		The ability to pass the memory reference of an object or function and a
		collection of built-in or standard functions like "print", "split", 
		or "sort".
	\item[What do you want to program in this language?]
		Lightweight command line tools and web back ends.
	\item[Do you intend to use graphics in the programs you write?]
		Yes.
	\item[Would you prefer a language that focuses on ease of 
		use, or power of code?]
		I like a good balance between the two.
	\item[What were your last 3 projects?]
		A website, a small command-line tool and a midi keyboard (program runs 
		on a Raspberry Pi Pico).
	\item[How many languages would you use on a single project?]
		I try to use as little languages in a single project as possible, so i 
		could likely not use it in an existing project.
	\item[Do you care for low level control, or would you prefer 
		high level abstractions?]
		I think low-level control is very important, but high-level abstractions
		are convenient, so a good balance between the two is best.
	\item[Would you be happy to develop libraries for things that aren't 
		already implemented?]
		Potentially if it is simple enough to implement new things.
\end{description}
\subsubsection{Notes from questionnare 1}
Some of the key things that I'm taking away from this first questionnaire, are my 
client/users initial needs and use cases. I think it's clear my language can be of
assistance to my client, Zippy will be a good language for web back ends and 
small command line tools, which my client expressed interested in.

I find the fact my client is worried by executable size interesting, however
I doubt it will be an issue; a ballooning code-base is unlikely as only one 
person is writing the whole project.

I am also taking on the fact that my client wants good command line tools,
so a pkg-manager and bundler should be a priority, perhaps they could be 
written in Zippy after the compiler is done.

\subsection{The first elements of the project}
At this stage I can say that I'm confident in my project and its scope. I
have a goal in mind for
it.

\textbf{The key things to take away from this section are:}

\begin{itemize}
	\item
		Make a high level language with a useable set of features, to
		replace C in many situations.

	\item
		Keep the language readable and easy, with powerful tools available.

	\item
		Ensure the language is well supported with tools like a pkg-manager.
\end{itemize}

\section{Modelling}
In larger projects, when a programmer needs a data structure that the language
they are writing in doesn't provide, they will need to make their own. This can pose a
challenge to some, especially in low level languages which don't provide anything 
out of the box. 

Bellow are a few examples of these data structures that C doesn't already provide,
that I may use in my project.
\subsection{Linked lists}
this is an alternative implementation of a list, where you store some data, and 
the memory address to the next node. Then you can move through the list by reading 
the data then reading the data of the next node, and then repeating until the 
'next' part of the node is empty.

A diagram showing this can be seen here:

\begin{tikzpicture}
	\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
	\Tree 
	[.ll
		[.data
		]
		[.next
			[.ll
				[.data
				]
				[.next
					[.ll
						[.data
						]
						[.next
						]
					]
				]
			]
		]
	]
\end{tikzpicture}

In C this is easy to implement as you can find a memory address very easily with to 
find where a bit of data is stored in memory (address of). I will need to use a 'struct', 
which is a bit like a class in C (however you can't attach a function to it, nor use 
inheritance). A simple implementation looks like this:

\begin{lstlisting}[language=C++, caption=Linked list example]
typedef struct ll {
        void *data; // the data of the node
        ll *next; // the next node
} ll;
\end{lstlisting}

The pro's of a linked list are the fact that they can have data appended to the start or 
end easily by changing the root node, or the next node.

Linked lists have a few downsides, for example you can't move through them backwards, 
and unless you store it on its own, you cant find the length of it in a fast way.

In my project I would like to use linked list in the AST (see later sections for info), 
and perhaps to store lists in the language.

\subsection{Dictionaries}
A dictionary is a simple data structure that just stores, a bit of data, and a number or 
string to identify it.
A dictionary like a linked list can be implemented with a struct in c like so:
\begin{lstlisting}[language=C++, caption=Dictionary example]
typedef struct dict {
        void *data;
        int id;
} dict;
\end{lstlisting}

In my project I could use this to hold variables and functions which need to be 
checked and looked up, which is very slow when comparing entire strings, but with this
I can compare integer ID's which is much faster.

\subsection{Prototyping harder features}
\subsubsection{Abstract syntax trees (AST's) theory}
In a programming language many abstract data types will be used to allow the code 
to compile and execute, however I think the hardest part of this is an abstract 
syntax tree. This is a data structure that holds the code in an ordered form that 
can be analysed and  executed in a simple way. It is a tree structure, with the top 
node being a root and all lower nodes being things needed to calculate the root. It can 
be used to show mathematical expressions and function calls, but I thing easiest way to 
show it is via a mathematical example.

Take the follow expression for example:

{\Large{\(1 + (10 * (3 - (2 * 4)))\)}}

We know that this is equal to \(-49\)

However for a computer this is far harder to understand. This is because it has no 
understanding of order of operation.

To solve this we use an AST.

When you solve that expression you know to start with 
\((2 * 4)\), then \(3 -\) 
from that, following the rules of BIDMAS to solve.

We can represent the steps as a tree like so:

\begin{tikzpicture}
	\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
	\Tree 
	[.+
		[.1
		]
		[.*
			[.10
			]
			[.-
				[.3
				]
				[.*
					[.2
					]
					[.4
					]
				]
			]
		]
	]
\end{tikzpicture}
\\
This will evaluate \(1 + (10 * (3 - (2 * 4)))\)

As you can see, you need to evaluate the expression in the most brackets
first, then the next, and so on, working you way up.

You can evaluate code in a similar way, treating each operation (such as +-*/)
as functions, doing the most deeply nested function first, then working up. 
Each expression can be represented in this tree, then to show a whole program you 
can create a list of trees.

\subsubsection{Abstract syntax trees (AST's) practical}
As a prototype i will make a program that can take mathematical expressions and evaluate 
them, and allowing for functions (in the form f(x)). It will do this via AST's.

This prototype takes 173 lines of code, it takes a string as a cmd line argument then 
converts it into an abstract syntax tree, and finally it executes it. This is just a
simple prototype and thus it is small in scope. It can only do simple operators (+-*/) 
and requires literal values to be surrounded by [] so it knows its not another 
expression to evaluate.

\lstinputlisting[language=C++]{../code/proto/AST/ast.c}
\textit{The main loop for the ast code.}

\lstinputlisting[language=C++]{../code/proto/AST/astg.c}
\textit{The execution loop for the ast code.}

\lstinputlisting[language=C++]{../code/proto/AST/astg.h}
\textit{The definition of the ast, and function prototypes.}

Above is the code for the AST, it stores an operation (which is just an integer), and 
it stores a real left and real right value, along side two other nodes. The real values
are integers, this would be the 2 numbers in reference, in the expression. The 2 nodes are a
recursive data structure, much like putting an object of a class inside the definition of that class
itself. They are used to store values that may still be expressions, for example 
(+ [1] (+ [1] [1])) the second part of this expression would be in the "right" 
variable. 

When code is executed I can check if "left", or "right" are NULL and if 
they are I know that I am at the lowest expression that is only literal values.
Then I can execute that node and work my way up the tree.

The exec function will execute the operation, unless there is a deeper node, if there is 
a deeper node, then it executes it, and places the result in the right or left spot 
respectively.

\textbf{Here is an example input and output:}

 ./ast "(+ (- [3] [1]) (- [3] [1]))"

4

{\small Note the [ ] used to tell the program where the literal values are.}

Overall this was a relatively successful prototype, however it isn't fully functional 
as a language but it has fit the design for a prototype.

\textbf{The code for the AST can be found here:
\url{https://github.com/standenboy/school/tree/master/comp/lucas-standen-NEA/code/proto/ast}}

\subsection{Feedback}
From my first Client (Amy C), she said that putting the numbers inside square brackets 
was inconvenient and annoying and it would be better if the numbers were separated
by spaces instead of separate square bracket surrounded literals.

As this is a prototype I won't fix this issue, however in the actual language this is 
a needed feature that I will be implementing.

\subsection{Mixing linked lists and AST's}
Mixing these 2 data structures together you can represents an entire program. A linked 
list of AST's is how Zippy will represent all code the user writes. To do this, use a 
linked list, and in the data element put a AST, then the next node can contain the same. 
This might be a help to zippy as the compiler can convert all code to an AST, then 
compile it.
}
\section{Objectives}
Zippy must support the following features, it needs them to be a usable language that has 
many uses.
\subsection{Core objectives}
\begin{description}
	\item[A compiler for the Zippy language]
	\item[AST's used to compile source code]
	\item[A lisp like syntax]
	\item[Functional paradigm language]
	\item[Recursion]
	\item[Higher order functions] \textit{(this means functions can be passed as arguments to 
		other functions)}
	\item[High performance language]
	\item[A package manager]
	\item[Ability to call C functions]
\end{description}
If possible I would like Zippy to also meet the following extra objectives
\subsection{Extra objectives}
\begin{description}
	\item[String parsing in the stdlib]
	\item[graphs in the stdlib]
	\item[networking in the stdlib]
	\item[graphics in the stdlib]
\end{description}

\section{Design}
\subsection{Language specification}
Like any other programming language Zippy needs to have a defined syntax, bellow
you can find a syntax for zippy that will be complaint with my objectives.

\end{document}