1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
|
\documentclass[a4paper,12pt]{article}
\usepackage{geometry}
\usepackage{titling}
\usepackage{titlesec}
\usepackage[english]{babel}
\usepackage[hidelinks]{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{graphicx}
\usepackage{forest}
\usepackage{tikz-qtree}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\ttfamily\footnotesize,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=8
}
\lstset{style=mystyle}
\titleformat{\section}
{\Huge}
{}
{0em}
{}[\titlerule]
\geometry{
a4paper,
total={170mm,257mm},
left=20mm,
right=20mm,
}
\author{Lucas Standen}
\title{The solution to bad code}
\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage
{\setlength{\parindent}{0cm}
\section{A breif head note and introduction}
This document has been written for the use within AQA computer science
Alevel coursework. It is published under the MiT license, which I hope
will make it of use to someone other than myself some day on the in's
and out's of simpler compiler design and programming languages as a
whole.
This is the second version of this document, it was writen in GNU
roff before, however I decided to move over to latex for the
more moddern features (notably image support). The latex source for
this document is also avalible along with all referenced code at
\url{https://github.com/standenboy/school}
Any questions relating to this document should be sent to
\href{mailto:thing1@seacrossedlovers.xyz}{thing1@seacrossedlovers.xyz}
\section{Analysis}
\subsection{The current problem}
For general small and simple projects, I write in C. However this leads to
hours of debugging due to segfaults, and memory leaks. Due to the languages
manual memory management the programmer is required to know so much
information about the hardware they write for, and the second anything goes
wrong, it is vague on how to fix things.
\textbf{I need a language that stops me from shooting myself in the foot}
C has been standard for many decades now and its age is showing, it lacks
many modern features like OOP, or higher level functional abstractions,
that have become common in modern years due to there helpfulness. This is
not to fault C's achievements either, the language is my personal choice for
most projects for a reason, it's fast and powerful; any solution I make
should not cut that away.
\subsection{A solution}
\textbf{\textit{Zippy LANG}}
A next generation language, for general use. Designed for keeping code simple,
neat and readable. It will be similar to functional languages, known for
there strict ability to keep code safe and practical. The language should
be compiled like C/C++, Haskell and Rust for fast runtime speeds
The goal of Zippy is to make codding easier, while not limiting projects; to
achive this the compiler will most likely want to use some form of middle man
language to achieve compatibility with more libraries.
\subsection{What is a programming language}
\subsubsection{A very simple explanation}
At its lowest definition a PL is a set of specific words, that when given
to a computer in the right order have a reproducible behaviour. A more human
way of saying that, would be "It's how we control computers".
\subsubsection{Why are there so many}
When someone is looking at code it can often be seen as just that, however
there are hundreds of languages that all take the idea of "code" in very
different ways. Some are designed for specific hardware, some are designed
for making general use programs while others are highly specialized.
It is important to see "code", as more than just one overarching term and
instead see where the code is being used, and evaluate it from that.
\subsection{Researching and getting a scope of the project}
Before I start to design a language i should first find examples of others
and find what i want my language to be like. I'd like my language to feel modern
so i should take inspiration from what other modern languages do, however on the
backed i want my language to be stable and fast, for that i should look at
older projects.
\subsubsection{Examples of older similar projects}
\begin{description}
\item[Python]
Python is a high level OOP language that was designed in
1991. It was made to make programming easy while still being able
to use some of C's functions. Although it has become standard for
many use cases, it is slow and inefficient, and very bloated.
\url{https://www.python.org/}
Zippy should take pythons high level abstractions, as they make
programming very easy and it should try and take notes from its
libraries as they are mostly well written,
and well documented.
\item[Lisp]
Lisp is the second ever programming language, developed at MiT,
and it is the first functional language, creating many common features
like higher order functions, recursion, and garbage collection. It is
generally not used any more as it feels old compared to other functional
languages, like Ocaml or Haskell.
\url{https://lisp-lang.org/}
Zippy should try to take alot from the syntax of lisp, () make
it easy to see what parts of code will effect what, and make
things easy to tokenize.
\item[Perl]
Perl is scripting language designed for use in linux, when bash is
too slow, or not suited for the job. Perl is often described as the glue
of the universe (see xkcd \url{https://3d.xkcd.com/224/}. Its syntax is
quite strange however and it is slow. Making it poorly suited towards
general use.
\url{https://www.perl.org/}
Zippy should take from perls minimalism, it is a small language that is
of a similar size to bash or zsh, while feeling closer to python.
If Zippy can achieve a similar small size, while remaining
powerful I will be happy with this outcome.
\end{description}
\subsubsection{Examples of newer similar projects}
\begin{description}
\item[Gleam]
Gleam is a modern language releasing in the past 5 years. It is
highly functional, with no mutable data, no traditional loops.
Instead recursion can be used to replace alot of these features.
Gleam compiles to erlang/Beam bytecode, much like java to the
jvm, and doing this has made Gleam a highly scalable language with
good library support out the box.
\url{https://gleam.run/}
Zippy should take from the functional elements of Gleam, as they keep
programs safer, however Zippy should not remove all procedural
elements, as for loops are very helpful
\item[Haskell]
Haskell is another modern functional language known for being
very complicated, however incredibly powerful. Its syntax feels very
mathematical, and incredibly terse.
\url{https://www.haskell.org/}
Perhaps Zippy could learn from Haskell, as it provides functional and
procedural elements, making it a well rounded language
\item[Hare]
Hare was designed to be a 100 year language, and thus stability is
its main goal, it is not set to
have a syntax change any time soon, and it has strong emphasis on
memory managment like C. It fits into the same part of the tech stack
as C, and thus it can be used for some very low level work.
\url{https://harelang.org/}
I think Zippy should have a strong emphasis on stability, much like Hare,
too many times have I segfaulted due to a tiny mistake. Zippy should
also look to Hare's small size, you can buy a copy of Hare on a
\textbf{SINGLE 3 1/2" FLOLPY}
This is something I too should try to achieve.
\end{description}
\subsubsection{What should be taken away from these languages}
I was already leaning towards functional programming when I started this project however
now I believe it's the only option for producing safe applications. Zippy will be
a functional language.
I also believe that I should take size of the compiler into account, as this is important
for keeping the project manageable and maintanable.
And finally I think that syntax should be inspired by Lisp, although Lisp itself can be
a messy language, with the right changes I am confident that I can make a attractive
language for the 21st century.
\subsection{Clients}
In a project of this nature, the Client is every programmer alive; which is a pretty
large scope. To narrow this down as much as possible, I will interview a small handful
of people throughout the project, of different skill levels to get a good picture of
what people think of the project.
\subsubsection{Client 1: Amy C}
My first client is a friend of mine, Amy C, she is a confident programmer who has
completed many complicated projects. I am choosing her as a client as she can give me
technical feed back on my project and its function/utility.
\subsubsection{Client 2: Rayn M}
Another friend of mine, Rayn M, is a technical computer user, however he does not know
how to program at a high level. He will be a good client as he can show me how my
language looks to some one who doesn't understand the inside workings, helping me
design the structure of the code to something useable for more people.
\subsubsection{Client 3: Myself}
I've wanted to take out a project like this for a long long time, and this is the
perfect opportunity to do so, I will be assessing myself along the way of this,
building the project to my personal specification.
\subsection{Questionnaires}
It is important to get feedback from end users, so I will take multiple questionnaires
throughout the project. I will then use them to slightly edit the requirements of my
project this should make the final outcome more helpful and what people want.
\subsubsection{Amy C, initial ideas}
\begin{description}
\item[What do you find the most important in a language?]
Speed, readability, debugging ease and disk space efficiency.
\item[What tools are important for a language to have?]
IDE integration (things like tab complete and debugging tools), a
package manager, and the ability to interact with the user through the
command line easily.
\item[What features do you like from other languages?]
The ability to pass the memory reference of an object or function and a
collection of built-in or standard functions like "print", "split",
or "sort".
\item[What do you want to program in this language?]
Lightweight command line tools and web back ends.
\item[Do you intend to use graphics in the programs you write?]
Yes.
\item[Would you prefer a language that focuses on ease of
use, or power of code?]
I like a good balance between the two.
\item[What were your last 3 projects?]
A website, a small command-line tool and a midi keyboard (program runs
on a Raspberry Pi Pico).
\item[How many languages would you use on a single project?]
I try to use as little languages in a single project as possible, so i
could likely not use it in an existing project.
\item[Do you care for low level control, or would you prefer
high level abstractions?]
I think low-level control is very important, but high-level abstractions
are convenient, so a good balance between the two is best.
\item[Would you be happy to develop libraries for things that aren't
already implemented?]
Potentially if it is simple enough to implement new things.
\end{description}
\subsubsection{Notes from questionnare 1}
Some of the key things that I'm taking away from this first questionnaire, are my
client/users initial needs and use cases. I think it's clear my language can be of
assistance to my client, Zippy will be a good language for web back ends and
small command line tools, which my client expressed interested in.
I find the fact my client is worried by executable size interesting, however
I doubt it will be an issue; a ballooning code-base is unlikely as only one
person is writing the whole project.
I am also taking on the fact that my client wants good command line tools,
so a pkg-manager and bundler should be a priority, perhaps they could be
written in Zippy after the compiler is done.
\subsection{The first elements of the project}
At this stage I can say that I'm confident in my project and its scope. I
have a goal in mind for
it.
\textbf{The key things to take away from this section are:}
\begin{itemize}
\item
Make a high level language with a useable set of features, to
replace C in many situations.
\item
Keep the language readable and easy, with powerful tools available.
\item
Ensure the language is well supported with tools like a pkg-manager.
\end{itemize}
\section{Modelling}
In larger projects, when a programmer needs a data structure that the language
they are writing in doesn't provide, they will need to make their own. This can pose a
challenge to some, especially in low level languages which don't provide anything
out of the box.
Bellow are a few examples of these data structures that C doesn't already provide,
that I may use in my project.
\subsection{Linked lists}
this is an alternative implementation of a list, where you store some data, and
the memory address to the next node. Then you can move through the list by reading
the data then reading the data of the next node, and then repeating until the
'next' part of the node is empty.
\\
A diagram showing this can be seen here:
\\
\begin{tikzpicture}
\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
\Tree
[.ll
[.data
]
[.next
[.ll
[.data
]
[.next
[.ll
[.data
]
[.next
]
]
]
]
]
]
\end{tikzpicture}
In C this is easy to implement as you can find a memory address very easily with to
find where a bit of data is stored in memory (address of). I will need to use a 'struct',
which is a bit like a class in C (however you can't attach a function to it, nor use
inheritance). A simple implementation looks like this:
\begin{lstlisting}[language=C++, caption=Linked list example]
typedef struct ll {
void *data; // the data of the node
ll *next; // the next node
} ll;
\end{lstlisting}
The pro's of a linked list are the fact that they can have data appended to the start or
end easily by changing the root node, or the next node.
Linked lists have a few downsides, for example you can't move through them backwards,
and unless you store it on its own, you cant find the length of it in a fast way.
In my project I would like to use linked list in the AST (see later sections for info),
and perhaps to store lists in the language.
\subsection{Dictionaries}
A dictionary is a simple data structure that just stores, a bit of data, and a number or
string to identify it.
A dictionary like a linked list can be implemented with a struct in c like so:
\begin{lstlisting}[language=C++, caption=Dictionary example]
typedef struct dict {
void *data;
int id;
} dict;
\end{lstlisting}
In my project I could use this to hold variables and functions which need to be
checked and looked up, which is very slow when comparing entire strings, but with this
I can compare integer ID's which is much faster.
\subsection{Prototyping harder features}
\subsubsection{Abstract syntax trees (AST's) theory}
In a programming language many abstract data types will be used to allow the code
to compile and execute, however I think the hardest part of this is an abstract
syntax tree. This is a data structure that holds the code in an ordered form that
can be analysed and executed in a simple way. It is a tree structure, with the top
node being a root and all lower nodes being things needed to calculate the root. It can
be used to show mathematical expressions and function calls, but I thing easiest way to
show it is via a mathematical example.
Take the follow expression for example:
\\
{\Large{\(1 + (10 * (3 - (2 * 4)))\)}}
\\
We know that this is equal to \(-49\)
However for a computer this is far harder to understand. This is because it has no
understanding of order of operation.
To solve this we use an AST.
When you solve that expression you know to start with
\((2 * 4)\), then \(3 -\)
from that, following the rules of BIDMAS to solve.
We can represent the steps as a tree like so:
\begin{tikzpicture}
\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
\Tree
[.+
[.1
]
[.*
[.10
]
[.-
[.3
]
[.*
[.2
]
[.4
]
]
]
]
]
\end{tikzpicture}
\\
This will evaluate \(1 + (10 * (3 - (2 * 4)))\)
As you can see, you need to evaluate the expression in the most brackets
first, then the next, and so on, working you way up.
You can evaluate code in a similar way, treating each operation (such as +-*/)
as functions, doing the most deeply nested function first, then working up.
Each expression can be represented in this tree, then to show a whole program you
can create a list of trees.
\subsubsection{Abstract syntax trees (AST's) practical}
As a prototype i will make a program that can take mathematical expressions and evaluate
them, and allowing for functions (in the form f(x)). It will do this via AST's.
This prototype takes 173 lines of code, it takes a string as a cmd line argument then
converts it into an abstract syntax tree, and finally it executes it. This is just a
simple prototype and thus it is small in scope. It can only do simple operators (+-*/)
and requires literal values to be surrounded by [] so it knows its not another
expression to evaluate.
\lstinputlisting[language=C++]{../code/proto/AST/ast.c}
\textit{The main loop for the ast code.}
\\
\lstinputlisting[language=C++]{../code/proto/AST/astg.c}
\textit{The execution loop for the ast code.}
\\
\lstinputlisting[language=C++]{../code/proto/AST/astg.h}
\textit{The definition of the ast, and function prototypes.}
\\
Above is the code for the AST, it stores an operation (which is just an integer), and
it stores a real left and real right value, along side two other nodes. The real values
are integers, this would be the 2 numbers in reference, in the expression. The 2 nodes are a
recursive data structure, much like putting an object of a class inside the definition of that class
itself. They are used to store values that may still be expressions, for example
(+ [1] (+ [1] [1])) the second part of this expression would be in the "right"
variable.
\\
When code is executed I can check if "left", or "right" are NULL and if
they are I know that I am at the lowest expression that is only literal values.
Then I can execute that node and work my way up the tree.
\\
The exec function will execute the operation, unless there is a deeper node, if there is
a deeper node, then it executes it, and places the result in the right or left spot
respectively.
\\
\textbf{Here is an example input and output:}
./ast "(+ (- [3] [1]) (- [3] [1]))"
4
{\small Note the [ ] used to tell the program where the literal values are.}
\\
Overall this was a relatively successful prototype, however it isn't fully functional
as a language but it has fit the design for a prototype.
\\
\textbf{The code for the AST can be found here:
\url{https://github.com/standenboy/school/tree/master/comp/lucas-standen-NEA/code/proto/ast}}
\subsection{Feedback}
From my first Client (Amy C), she said that putting the numbers inside square brackets
was inconvenient and annoying and it would be better if the numbers were separated
by spaces instead of separate square bracket surrounded literals.
As this is a prototype I won't fix this issue, however in the actual language this is
a needed feature that I will be implementing.
\subsection{Mixing linked lists and AST's}
Mixing these 2 data structures together you can represents an entire program. A linked
list of AST's is how Zippy will represent all code the user writes. To do this, use a
linked list, and in the data element put a AST, then the next node can contain the same.
This might be a help to zippy as the compiler can convert all code to an AST, then
compile it.
}
\end{document}
|