1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
|
\documentclass[a4paper,12pt]{article}
\usepackage{geometry}
\usepackage{titling}
\usepackage{titlesec}
\usepackage[english]{babel}
\usepackage[hidelinks]{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{graphicx}
\usepackage{forest}
\usepackage{tikz-qtree}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\ttfamily\footnotesize,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=8
}
\lstset{style=mystyle}
\titleformat{\section}
{\Huge}
{}
{0em}
{}[\titlerule]
\geometry{
a4paper,
total={170mm,257mm},
left=20mm,
right=20mm,
}
\author{Lucas Standen}
\title{The solution to bad code}
\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage
\setlength{\parskip}{1em}
{\setlength{\parindent}{0cm}
\section{A breif head note and introduction}
This document has been written for the use within AQA computer science
Alevel coursework. It is published under the MiT license, which I hope
will make it of use to someone other than myself some day on the in's
and out's of simpler compiler design and programming languages as a
whole.
This is the second version of this document, it was writen in GNU
roff before, however I decided to move over to latex for the
more moddern features (notably image support). The latex source for
this document is also avalible along with all referenced code at
\url{https://github.com/standenboy/school}
Any questions relating to this document should be sent to
\href{mailto:thing1@seacrossedlovers.xyz}{thing1@seacrossedlovers.xyz}
\section{Analysis}
\subsection{The current problem}
For general small and simple projects, I write in C. However this leads to
hours of debugging due to segfaults, and memory leaks. Due to the languages
manual memory management the programmer is required to know so much
information about the hardware they write for, and the second anything goes
wrong, it is vague on how to fix things.
\textbf{I need a language that stops me from shooting myself in the foot}
C has been standard for many decades now and its age is showing, it lacks
many modern features like OOP, or higher level functional abstractions,
that have become common in modern years due to there helpfulness. This is
not to fault C's achievements either, the language is my personal choice for
most projects for a reason, it's fast and powerful; any solution I make
should not cut that away.
\subsection{A solution}
\textbf{\textit{Zippy LANG}}
A next generation language, for general use. Designed for keeping code simple,
neat and readable. It will be similar to functional languages, known for
there strict ability to keep code safe and practical. The language should
be compiled like C/C++, Haskell and Rust for fast runtime speeds
The goal of Zippy is to make codding easier, while not limiting projects; to
achive this the compiler will most likely want to use some form of middle man
language to achieve compatibility with more libraries.
\subsection{What is a programming language}
\subsubsection{A very simple explanation}
At its lowest definition a PL is a set of specific words, that when given
to a computer in the right order have a reproducible behaviour. A more human
way of saying that, would be "It's how we control computers".
\subsubsection{Why are there so many}
When someone is looking at code it can often be seen as just that, however
there are hundreds of languages that all take the idea of "code" in very
different ways. Some are designed for specific hardware, some are designed
for making general use programs while others are highly specialized.
It is important to see "code", as more than just one overarching term and
instead see where the code is being used, and evaluate it from that.
\subsection{Researching and getting a scope of the project}
Before I start to design a language i should first find examples of others
and find what i want my language to be like. I'd like my language to feel modern
so i should take inspiration from what other modern languages do, however on the
backed i want my language to be stable and fast, for that i should look at
older projects.
\subsubsection{Examples of older similar projects}
\begin{description}
\item[Python]
Python is a high level OOP language that was designed in
1991. It was made to make programming easy while still being able
to use some of C's functions. Although it has become standard for
many use cases, it is slow and inefficient, and very bloated.
\url{https://www.python.org/}
Zippy should take pythons high level abstractions, as they make
programming very easy and it should try and take notes from its
libraries as they are mostly well written,
and well documented.
\item[Lisp]
Lisp is the second ever programming language, developed at MiT,
and it is the first functional language, creating many common features
like higher order functions, recursion, and garbage collection. It is
generally not used any more as it feels old compared to other functional
languages, like Ocaml or Haskell.
\url{https://lisp-lang.org/}
Zippy should try to take alot from the syntax of lisp, () make
it easy to see what parts of code will effect what, and make
things easy to tokenize.
\item[Perl]
Perl is scripting language designed for use in linux, when bash is
too slow, or not suited for the job. Perl is often described as the glue
of the universe (see xkcd \url{https://3d.xkcd.com/224/}. Its syntax is
quite strange however and it is slow. Making it poorly suited towards
general use.
\url{https://www.perl.org/}
Zippy should take from perls minimalism, it is a small language that is
of a similar size to bash or zsh, while feeling closer to python.
If Zippy can achieve a similar small size, while remaining
powerful I will be happy with this outcome.
\end{description}
\subsubsection{Examples of newer similar projects}
\begin{description}
\item[Gleam]
Gleam is a modern language releasing in the past 5 years. It is
highly functional, with no mutable data, no traditional loops.
Instead recursion can be used to replace alot of these features.
Gleam compiles to erlang/Beam bytecode, much like java to the
jvm, and doing this has made Gleam a highly scalable language with
good library support out the box.
\url{https://gleam.run/}
Zippy should take from the functional elements of Gleam, as they keep
programs safer, however Zippy should not remove all procedural
elements, as for loops are very helpful
\item[Haskell]
Haskell is another modern functional language known for being
very complicated, however incredibly powerful. Its syntax feels very
mathematical, and incredibly terse.
\url{https://www.haskell.org/}
Perhaps Zippy could learn from Haskell, as it provides functional and
procedural elements, making it a well rounded language
\item[Hare]
Hare was designed to be a 100 year language, and thus stability is
its main goal, it is not set to
have a syntax change any time soon, and it has strong emphasis on
memory managment like C. It fits into the same part of the tech stack
as C, and thus it can be used for some very low level work.
\url{https://harelang.org/}
I think Zippy should have a strong emphasis on stability, much like Hare,
too many times have I segfaulted due to a tiny mistake. Zippy should
also look to Hare's small size, you can buy a copy of Hare on a
\textbf{SINGLE 3 1/2" FLOLPY}
This is something I too should try to achieve.
\end{description}
\subsubsection{What should be taken away from these languages}
I was already leaning towards functional programming when I started this project however
now I believe it's the only option for producing safe applications. Zippy will be
a functional language.
I also believe that I should take size of the compiler into account, as this is important
for keeping the project manageable and maintanable.
And finally I think that syntax should be inspired by Lisp, although Lisp itself can be
a messy language, with the right changes I am confident that I can make a attractive
language for the 21st century.
\subsection{Clients}
In a project of this nature, the Client is every programmer alive; which is a pretty
large scope. To narrow this down as much as possible, I will interview a small handful
of people throughout the project, of different skill levels to get a good picture of
what people think of the project.
\subsubsection{Client 1: Amy C}
My first client is a friend of mine, Amy C, she is a confident programmer who has
completed many complicated projects. I am choosing her as a client as she can give me
technical feed back on my project and its function/utility.
\subsubsection{Client 2: Rayn M}
Another friend of mine, Rayn M, is a technical computer user, however he does not know
how to program at a high level. He will be a good client as he can show me how my
language looks to some one who doesn't understand the inside workings, helping me
design the structure of the code to something useable for more people.
\subsubsection{Client 3: Myself}
I've wanted to take out a project like this for a long long time, and this is the
perfect opportunity to do so, I will be assessing myself along the way of this,
building the project to my personal specification.
\subsection{Questionnaires}
It is important to get feedback from end users, so I will take multiple questionnaires
throughout the project. I will then use them to slightly edit the requirements of my
project this should make the final outcome more helpful and what people want.
\subsubsection{Amy C, initial ideas}
\begin{description}
\item[What do you find the most important in a language?]
Speed, readability, debugging ease and disk space efficiency.
\item[What tools are important for a language to have?]
IDE integration (things like tab complete and debugging tools), a
package manager, and the ability to interact with the user through the
command line easily.
\item[What features do you like from other languages?]
The ability to pass the memory reference of an object or function and a
collection of built-in or standard functions like "print", "split",
or "sort".
\item[What do you want to program in this language?]
Lightweight command line tools and web back ends.
\item[Do you intend to use graphics in the programs you write?]
Yes.
\item[Would you prefer a language that focuses on ease of
use, or power of code?]
I like a good balance between the two.
\item[What were your last 3 projects?]
A website, a small command-line tool and a midi keyboard (program runs
on a Raspberry Pi Pico).
\item[How many languages would you use on a single project?]
I try to use as little languages in a single project as possible, so i
could likely not use it in an existing project.
\item[Do you care for low level control, or would you prefer
high level abstractions?]
I think low-level control is very important, but high-level abstractions
are convenient, so a good balance between the two is best.
\item[Would you be happy to develop libraries for things that aren't
already implemented?]
Potentially if it is simple enough to implement new things.
\end{description}
\subsubsection{Notes from questionnare 1}
Some of the key things that I'm taking away from this first questionnaire, are my
client/users initial needs and use cases. I think it's clear my language can be of
assistance to my client, Zippy will be a good language for web back ends and
small command line tools, which my client expressed interested in.
I find the fact my client is worried by executable size interesting, however
I doubt it will be an issue; a ballooning code-base is unlikely as only one
person is writing the whole project.
I am also taking on the fact that my client wants good command line tools,
so a pkg-manager and bundler should be a priority, perhaps they could be
written in Zippy after the compiler is done.
\subsection{The first elements of the project}
At this stage I can say that I'm confident in my project and its scope. I
have a goal in mind for
it.
\textbf{The key things to take away from this section are:}
\begin{itemize}
\item
Make a high level language with a useable set of features, to
replace C in many situations.
\item
Keep the language readable and easy, with powerful tools available.
\item
Ensure the language is well supported with tools like a pkg-manager.
\end{itemize}
\section{Modelling}
In larger projects, when a programmer needs a data structure that the language
they are writing in doesn't provide, they will need to make their own. This can pose a
challenge to some, especially in low level languages which don't provide anything
out of the box.
Bellow are a few examples of these data structures that C doesn't already provide,
that I may use in my project.
\subsection{Linked lists}
this is an alternative implementation of a list, where you store some data, and
the memory address to the next node. Then you can move through the list by reading
the data then reading the data of the next node, and then repeating until the
'next' part of the node is empty.
A diagram showing this can be seen here:
\begin{tikzpicture}
\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
\Tree
[.ll
[.data
]
[.next
[.ll
[.data
]
[.next
[.ll
[.data
]
[.next
]
]
]
]
]
]
\end{tikzpicture}
In C this is easy to implement as you can find a memory address very easily with to
find where a bit of data is stored in memory (address of). I will need to use a 'struct',
which is a bit like a class in C (however you can't attach a function to it, nor use
inheritance). A simple implementation looks like this:
\begin{lstlisting}[language=C++, caption=Linked list example]
typedef struct ll {
void *data; // the data of the node
ll *next; // the next node
} ll;
\end{lstlisting}
The pro's of a linked list are the fact that they can have data appended to the start or
end easily by changing the root node, or the next node.
Linked lists have a few downsides, for example you can't move through them backwards,
and unless you store it on its own, you cant find the length of it in a fast way.
In my project I would like to use linked list in the AST (see later sections for info),
and perhaps to store lists in the language.
\subsection{Dictionaries}
A dictionary is a simple data structure that just stores, a bit of data, and a number or
string to identify it.
A dictionary like a linked list can be implemented with a struct in c like so:
\begin{lstlisting}[language=C++, caption=Dictionary example]
typedef struct dict {
void *data;
int id;
} dict;
\end{lstlisting}
In my project I could use this to hold variables and functions which need to be
checked and looked up, which is very slow when comparing entire strings, but with this
I can compare integer ID's which is much faster.
\subsection{Prototyping harder features}
\subsubsection{Abstract syntax trees (AST's) theory}
In a programming language many abstract data types will be used to allow the code
to compile and execute, however I think the hardest part of this is an abstract
syntax tree. This is a data structure that holds the code in an ordered form that
can be analysed and executed in a simple way. It is a tree structure, with the top
node being a root and all lower nodes being things needed to calculate the root. It can
be used to show mathematical expressions and function calls, but I thing easiest way to
show it is via a mathematical example.
Take the follow expression for example:
{\Large{\(1 + (10 * (3 - (2 * 4)))\)}}
We know that this is equal to \(-49\)
However for a computer this is far harder to understand. This is because it has no
understanding of order of operation.
To solve this we use an AST.
When you solve that expression you know to start with
\((2 * 4)\), then \(3 -\)
from that, following the rules of BIDMAS to solve.
We can represent the steps as a tree like so:
\begin{tikzpicture}
\tikzset{edge from parent/.style={draw,edge from parent path={(\tikzparentnode.south)-- +(0,-8pt)-| (\tikzchildnode)}}}
\Tree
[.+
[.1
]
[.*
[.10
]
[.-
[.3
]
[.*
[.2
]
[.4
]
]
]
]
]
\end{tikzpicture}
\\
This will evaluate \(1 + (10 * (3 - (2 * 4)))\)
As you can see, you need to evaluate the expression in the most brackets
first, then the next, and so on, working you way up.
You can evaluate code in a similar way, treating each operation (such as +-*/)
as functions, doing the most deeply nested function first, then working up.
Each expression can be represented in this tree, then to show a whole program you
can create a list of trees.
\subsubsection{Abstract syntax trees (AST's) practical}
As a prototype i will make a program that can take mathematical expressions and evaluate
them, and allowing for functions (in the form f(x)). It will do this via AST's.
This prototype takes 173 lines of code, it takes a string as a cmd line argument then
converts it into an abstract syntax tree, and finally it executes it. This is just a
simple prototype and thus it is small in scope. It can only do simple operators (+-*/)
and requires literal values to be surrounded by [] so it knows its not another
expression to evaluate.
\lstinputlisting[language=C++]{../code/proto/AST/ast.c}
\textit{The main loop for the ast code.}
\lstinputlisting[language=C++]{../code/proto/AST/astg.c}
\textit{The execution loop for the ast code.}
\lstinputlisting[language=C++]{../code/proto/AST/astg.h}
\textit{The definition of the ast, and function prototypes.}
Above is the code for the AST, it stores an operation (which is just an integer), and
it stores a real left and real right value, along side two other nodes. The real values
are integers, this would be the 2 numbers in reference, in the expression. The 2 nodes are a
recursive data structure, much like putting an object of a class inside the definition of that class
itself. They are used to store values that may still be expressions, for example
(+ [1] (+ [1] [1])) the second part of this expression would be in the "right"
variable.
When code is executed I can check if "left", or "right" are NULL and if
they are I know that I am at the lowest expression that is only literal values.
Then I can execute that node and work my way up the tree.
The exec function will execute the operation, unless there is a deeper node, if there is
a deeper node, then it executes it, and places the result in the right or left spot
respectively.
\textbf{Here is an example input and output:}
./ast "(+ (- [3] [1]) (- [3] [1]))"
4
{\small Note the [ ] used to tell the program where the literal values are.}
Overall this was a relatively successful prototype, however it isn't fully functional
as a language but it has fit the design for a prototype.
\textbf{The code for the AST can be found here:
\url{https://github.com/standenboy/school/tree/master/comp/lucas-standen-NEA/code/proto/ast}}
\subsection{Feedback}
From my first Client (Amy C), she said that putting the numbers inside square brackets
was inconvenient and annoying and it would be better if the numbers were separated
by spaces instead of separate square bracket surrounded literals.
As this is a prototype I won't fix this issue, however in the actual language this is
a needed feature that I will be implementing.
\subsection{Mixing linked lists and AST's}
Mixing these 2 data structures together you can represents an entire program. A linked
list of AST's is how Zippy will represent all code the user writes. To do this, use a
linked list, and in the data element put a AST, then the next node can contain the same.
This might be a help to zippy as the compiler can convert all code to an AST, then
compile it.
\section{Objectives}
Zippy must support the following features, it needs them to be a usable language that has
many uses.
\subsection{Core objectives}
\begin{description}
\item[A compiler for the Zippy language]
\item[AST's used to compile source code]
\item[A lisp like syntax]
\item[Functional paradigm language]
\item[Recursion]
\item[Higher order functions] \textit{(this means functions can be passed as arguments to
other functions)}
\item[High performance language]
\item[A package manager]
\item[Ability to call C functions]
\end{description}
If possible I would like Zippy to also meet the following extra objectives
\subsection{Extra objectives}
\begin{description}
\item[String parsing in the stdlib]
\item[graphs in the stdlib]
\item[networking in the stdlib]
\item[graphics in the stdlib]
\end{description}
I think with these objectives in mind I will make a well rounded language that achieves my
goal of being used in the same way and places that C is. If all goes to plan I will
create a high level, compiled, functional programming language.
\section{Design}
\subsection{Language specification}
Like any other programming language Zippy needs to have a defined syntax, bellow
you can find a syntax for zippy that will be complaint with my objectives.
\subsection{Keywords}
\begin{description}
\item[defun] starts the definition of a function, will take in a name, return type, and arguments with types
\item[endfun] ends the definition of a function
\item[let] define a variable with a initial value, takes a variable name and a value
\item[set] change a pre defined variable, takes a variable name and a new value
\item[def] define a variable which doesn't have a initial value, takes a variable name
\item[defunptr] defines a ptr to a function same as defun, but used to pass functions as arguments
\item[if] starts an if block, takes a condition
\item[elif] starts an elif block, must come after and if block, takes a condition
\item[else] starts an else block, must come after an elif or if block
\item[endif] must be at the end of an if/elif/else block
\item[for] defines a for loop, takes a variable name to use as an iterator, a starting value for that variable, a condition, and a difference to change the iterator by each loop
\item[endfor] ends a for block
\item[symbol] defines a function for semantic purposes from an external executable, take a name and arguments, the symbol must be in a linked binary
\item[+] addition, takes 2 arguments
\item[-] subtraction, takes 2 arguments
\item[*] multiplication, takes 2 arguments
\item[/] division, takes 2 arguments
\item[=] equality comparison operation, takes 2 arguments
\item[!=] negative equality operation, takes 2 arguments
\item[\(<\)] less than equality operation, takes 2 arguments
\item[\(>\)] greater than equality operation, takes 2 arguments
\item[\(<=\)] less than or equal operation, takes 2 arguments
\item[\(>=\)] greater than or equal operation, takes 2 arguments
\item[exit] exits the program, takes a value to exit on
\item[return] returns from a function, and takes a value to return
\item[alloc] allocates a block of memory, takes a size in bytes
\item[struct] defines a struct, takes a name as an argument
\item[endstruct] ends the definition of a struct
\item[sizeof] returns the size of a given type, takes the type name as an argument
\end{description}
\subsection{Other code elements}
As one can see from the list of keywords, all blocks such as functions are defined and ended. This leads to a nice
form of code that isn't reliant on indentation, where blocks can be easily seen. I have made sure to keep the
language uniform, so function, struts, if statements and for loops all use this syntax.
Comments can be written in zippy by prefacing a line with a '/'.
All code must have a main function, if it doesn't the linker will fail. This is the entry point to the program, so
it is where code will start executing from.
Reverse Polish notation is used for mathematical and comparison operations, so \((2 + (4 + 3))\) would be written
as \((+\; 2\; (+\; 4\; 3))\).
\subsection{Memory management}
In zippy memory is semi manual to manage, like in C the programmer needs to allocate blocks of a large enough size,
however they do not need to free the memory, as all allocs will be freed just before a function returns. This means
that memory has a life time of the function it was defined in. One should note that this means that if an allocated
block needs to be returned, the caller function needs to allocate the block, or else the memory returned will be
freed as soon as the function has finished.
\subsection{The steps in compiling a zippy program}
All the programmer should need to do is call the "zpy" command, and then the code will be converted from zippy to
binary. However on the back end of the project, more has to happen. First the zippy code gets converted to C, this
is how zippy will be cross compatible with C, then the C code is compiled to ASM, then the ASM, is converted to
binary, and finally the linker will convert the binary into an executable, and add any libraries that are used.
For my project I will only need to do the first step, and slightly edit the second step, as C compilers, assemblers
and linkers already exist. So I don't need to reinvent the wheel.
\subsubsection{Converting zippy to C}
The process of converting zippy to C is yet again another long one. It starts with reading in the program,
then converting that into individual expressions (in zippy one expression takes up one line), then the expression
is converted to a data structure that holds a function name, and its arguments in individual variables and finally
it converts that data structure to C code via substituting names, functions, and literal values into template C code.
\subsection{Actually using zippy}
For the programmer to use zippy they must be able to call it as a command, on a text file containing their code and
they must give an output file name, to place the binary. To call the command, it can be simply done using the command
line with the "zpy" command, which the user will need to install, this is the compiler.
The compiler will have the following options to tweak the output.
All options are prefixed with a '-', to denote that they are not the input file.
\begin{description}
\item{o} - changes the output file location
\item{i} - tells the C compiler to include a library that is given after this argument
\item{c} - tells the zippy compiler to return C code instead of a compiled binary,
this is helpful when using zippy to make small functions in a C/C++ code
base
\item{f} - pass a flag to the C compiler, the next option should be an argument
accepted by gcc.
\end{description}
The programmer will also have access to "zpypkg" which will automatically setup their project with compile commands
and a template executable. It will be a very simple tool, that copies needed files around to the correct folders,
it will be written using Bash, (the main scripting language used by unix, along with sh and perl). It will have the
following arguments to allow the programmer to quickly write zippy code.
\begin{description}
\item[init] this will initialize the package manager, creating the needed files.
\item[advinit] this is an advanced form of init, it will set things up to have interoperability with C.
\item[build] this command can be used from within a zippy project directory, and will build the project into an executable.
\item[run] this will build the project, and run the newly built executable.
\item[clean] this will remove all temp files and binaries from the project folder, this is helpful when sharing code.
\item[remove] this will remove zpypkg from the current directory (think of it like a de-init).
\end{description}
\section{Implementation}
As has been previously mentioned, zippy will have its compiler written in C and its package manager writen in bash.
The code will be displayed bellow, it has been commented to use a tool I wrote called autodoc, which can function
like a doc-string in python, after each file I will explain in detail what it does and why it is in its own file.
PUT A LIST OF FILES HERE
\lstinputlisting[language=C++]{../code2/zpy.c}
\textit{zpy.c}
This is the main executable, containing my main function, which is the program entry point. This file handles the
opening of input and output files, calling other functions, and processing the command line arguments.
\lstinputlisting[language=C++]{../code2/fileread.c}
\textit{fileread.c}
This file is responsible for reading in the file contents, into and array list of strings (an array list is an array
with a length attached to it, to help with memory managemnt). This file also handles the stripping of tabs and other
forms of white space.
\lstinputlisting[language=C++]{../code2/tokenizer.c}
\textit{tokenizer.c}
This file has the job of converting a single line of code into An abstract syntax tree, it splits each expression into
its function name, as a string, and each of its arguments (max 8) as a string, and if there are nested function calls,
it will create more AST nodes for them, in the recursive data structure. The main tokenizer function is using
a lesser known feature of the C programming language, \textit{goto}, it uses this to achieve recursion, without needing
to define a second function, as goto lets the user define a label in their code, just like in ASM.
\lstinputlisting[language=C++]{../code2/appendsnprintf.c}
\textit{appendsnprintf.c}
This is a smaller file, that defines a helper function used heavily in the comp.c file, this function allows the user
to concatenate an unknown amount of strings together. It uses another lesser known feature of C, \textit{vardic functions},
which allow the program to parse an unknown amount of arguments to a function. This function is needed, as C does not
provide an effective way to concatenate strings multiple times, the function that this is built off \textit{snprintf} can only be used
once per variable, which didn't fit my use case.
\lstinputlisting[language=C++]{../code2/comp.c}
\textit{comp.c}
This file is by far the largest and most important in the program, it converts a zpy AST to C code, and writes it to an output
file. It also handles syntax errors in the users code. The first function from this file that is called, is the CompilerInit,
this sets up an interrupt which is used to jump the program to a different point, if there is a syntax error.
The next function from this file that is called, is the main Compile function, this is actually a wrapper around the compile function.
This function is used to set the current line, which is used if there is an error, call the main compile function, write to
the output file and add newlines, if needed. And finally it handles the automatic free function of zippy.
The main function that this provides is the compile function, this starts by processing child arguments, A child argument can be created
by the tokenizer when a nest function is used in zpy, for example (let a:int (+ 2 2)), in this case the child function is (+ 2 2). It
will compile the children function first, using the processChildren function, assigning their outputs to the argument section of the AST.
It then uses a large if block, to determine which function it is needing to generate. Each branch, use the appendsnprintf function to
combine the arguments and templated C code. The checkNULL function is used a lot in this function. This is because if the user does
not provide enough arguments to a function call, which is the most common syntax error after a missing bracket, the argument slot, will
be null, thus I can check if it is NULL, to tell if the user made an error. If an error is made a signal is sent to the process, which
will cause the interrupt handler function to be triggered, telling the user the line number they made the error on with a brief description
of the issue.
Also in this file are a handful of helper functions in the conversion process, these convert zippy's reverse polish notation to traditional
infix definition, and convert zippy's type annotations to C's type annotations.
Finally, as previously mention there is the errorhandle function. This is triggered by a signal being sent to the program, and will cause
it to stop instantly, and will trigger an error print.
\lstinputlisting[language=C++]{../code2/util.c}
\textit{util.c}
This file defines a smaller helper function, which is used in the main executable to gracefully exit if there is an error.
\subsection{Header files}
\lstinputlisting[language=C++]{../code2/fileread.h}
\textit{fileread.h}
\lstinputlisting[language=C++]{../code2/tokenizer.h}
\textit{tokenizer.h}
\lstinputlisting[language=C++]{../code2/comp.h}
\textit{comp.h}
\lstinputlisting[language=C++]{../code2/appendsnprintf.h}
\textit{appendsnprintf.h}
\lstinputlisting[language=C++]{../code2/util.h}
\textit{util.h}
}
\end{document}
|