A branch of mathematics that attempts to prove stuff about computers.
Unfortunately, all software engineers already know the answer to the useful theorems though (except perhaps notably for cryptography), e.g. all programmers obviously know that either P != NP, or that this is unprovable, or some other "for all practical purposes P != NP" result, even though they don't have a proof.
And 99% of their time, software engineers are not dealing with mathematically formulatable problems anyways, which is sad.
The only useful "computer science" subset every programmer ever needs to know is:
- for arrays: dynamic array vs linked list
- for associative array: binary search tree vs hash table. See also Heap vs Binary Search Tree (BST). No need to understand the algorithmic details of the hash function, the NSA has already done that for you.
- don't use Bubble sort for sorting
- you can't parse HTML with regular expressions because of formal language theory: stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Funnily, due to the formalization of mathematics, mathematics can be seen as a branch of computer science, just like computer science can be seen as a branch of Mathematics!
The dominating model of a computer.
The model is extremely simple, but has been proven to be able to solve all the problems that any reasonable computer model can solve, thus its adoption as the "default model".
The smallest known Turing machine that cannot be proven to halt or not as of 2019 has 7,918 states: www.scottaaronson.com/blog/?p=2725. Shtetl-Optimized by Scott Aaronson is just the best website.
A bunch of non-reasonable-looking computers have also been proven to be Turing complete for fun, e.g. Magic: The Gathering.
A Turing machine that simulates another Turing machine/input pair that has been encoded as a string.
In other words: an emulator!
The concept is fundamental to state several key results in computer science, notably the halting problem.
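To make the "emulator" view concrete, here is a minimal sketch in C of a program that takes another machine encoded as a string and runs it. It is an ordinary C interpreter rather than an actual universal Turing machine, and the encoding is an assumption modelled on the compact text format seen around The Busy Beaver Challenge (e.g. 1RB1LB_1LA1RZ): per state, two transitions for reading 0 and 1, each written as symbol to write, direction, next state. The tape size, the step cutoff and the rule that an out-of-range state letter means "halt" are all assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

#define TAPE_LEN 4096 /* assumption: fixed finite tape, head starts in the middle */

typedef struct {
    uint64_t steps; /* transitions executed, including the halting one */
    uint64_t ones;  /* number of 1s left on the tape */
    int halted;     /* 1 if the machine halted, 0 if it hit the cutoff or the tape edge */
} RunResult;

/* Run a 2-symbol machine given in the assumed "1RB1LB_1LA1RZ" format for at most
 * max_steps steps. State i's transitions start at offset 7 * i (6 chars + '_').
 * A next-state letter outside the machine's states (e.g. 'Z' or 'H') or a '-'
 * entry is treated as halting. */
RunResult run_tm(const char *encoded, uint64_t max_steps) {
    unsigned char tape[TAPE_LEN];
    RunResult r = {0, 0, 0};
    int num_states = (int)((strlen(encoded) + 1) / 7);
    size_t head = TAPE_LEN / 2;
    int state = 0;

    memset(tape, 0, sizeof tape);
    while (r.steps < max_steps) {
        const char *t = encoded + 7 * (size_t)state + 3 * (size_t)tape[head];
        if (t[0] == '-') { /* undefined transition: treat as halting */
            r.halted = 1;
            break;
        }
        int next = t[2] - 'A';
        tape[head] = (unsigned char)(t[0] - '0');
        r.steps++;
        if (next < 0 || next >= num_states) { /* halting transition */
            r.halted = 1;
            break; /* the final head move cannot change the count of 1s */
        }
        if (t[1] == 'R') {
            if (head + 1 >= TAPE_LEN) break; /* ran off our finite tape: give up */
            head++;
        } else {
            if (head == 0) break;
            head--;
        }
        state = next;
    }
    for (size_t i = 0; i < TAPE_LEN; i++) r.ones += tape[i];
    return r;
}
```

For example, run_tm("1RB1LB_1LA1RZ", 1000) runs the 2-state busy beaver champion, which halts after 6 steps leaving 4 ones. The max_steps cutoff is unavoidable: by the halting problem, there is no general way to know in advance whether the emulated machine will ever stop.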
A computer model that is as powerful as the most powerful computer model we have: Turing machine!
There is a Turing machine that halts for every member of the language with the answer yes, but does not necessarily halt for non-members.
Subset of recursively enumerable language as explained at: difference between recursive language and recursively enumerable language.
Set of all decision problems solvable by a Turing machine, i.e. that decide if a string belongs to a recursive language.
A decision problem of determining if something belongs to a non-recursive language.
Or in other words: there is no Turing machine that always halts for every input with the yes/no output.
Every undecidable problem must obviously have an infinite number of "possibilities of stuff you can try": if there is only a finite number, then you can brute-force it.
Some undecidable problems are of recursively enumerable language, e.g. the halting problem.
Lists of undecidable problems.
Coolest ones besides the obvious boring halting problem:
- mortal matrix problem
- Diophantine equation existence of solutions: undecidable Diophantine equation problems
If there are only finitely many inputs, we can always construct a (potentially exponentially huge) Turing machine that hardcodes the outcome for every possible input, so the problem is never undecidable.
The problem is of course deciding and proving the outcome for each possible input, notably as it is possible that calculation for some of the inputs may be independent from ZFC.
One of the most simple to state undecidable problems.
The reason that it is undecidable is that you can repeat each matrix any number of times, so there isn't a finite number of possibilities to check.
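Since products can be arbitrarily long, a program can only search them by increasing length: the search stops if some product is the zero matrix, but may run forever otherwise, which is the recursively enumerable behaviour. A minimal C sketch of that search, assuming a hypothetical length cutoff max_len so that it terminates, and using 2x2 integer matrices only to keep it short (the classic undecidability result is about 3x3 integer matrices). There is no overflow handling.

```c
#include <string.h>

#define DIM 2 /* assumption: 2x2 matrices keep the sketch short */

typedef struct { long v[DIM][DIM]; } Mat;

static Mat mat_mul(Mat a, Mat b) {
    Mat r;
    memset(&r, 0, sizeof r);
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                r.v[i][j] += a.v[i][k] * b.v[k][j];
    return r;
}

static int mat_is_zero(const Mat *a) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            if (a->v[i][j] != 0) return 0;
    return 1;
}

/* Is prod times some product of exactly `depth` matrices from set the zero matrix? */
static int zero_at_depth(const Mat *set, int m, Mat prod, int depth) {
    if (depth == 0) return mat_is_zero(&prod);
    for (int i = 0; i < m; i++)
        if (zero_at_depth(set, m, mat_mul(prod, set[i]), depth - 1)) return 1;
    return 0;
}

/* Length of the shortest product of matrices from set (repetition allowed) equal to
 * the zero matrix, or 0 if there is none of length <= max_len (hypothetical cutoff). */
int mortal_search(const Mat *set, int m, int max_len) {
    Mat id;
    memset(&id, 0, sizeof id);
    for (int i = 0; i < DIM; i++) id.v[i][i] = 1;
    for (int len = 1; len <= max_len; len++)
        if (zero_at_depth(set, m, id, len)) return len;
    return 0;
}
```

Iterative deepening returns the shortest zero product, and the work grows like m^max_len for m matrices. Raising the cutoff never turns this into a decider: a 0 result only means "not found yet".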
A:
- decidable problem is to a decision problem
- like a computable problem is to a function problem
The prototypical example is the Busy beaver function, which is the easiest example to reach from the halting problem.
There are only boring examples of taking an uncomputable language and converting it into a number?
Same as recursive language but in the context of the integers.
This is the classic result of formal language theory, but there is too much slack between context free and context sensitive, which is PSPACE (larger than NP!).
By Noam Chomsky.
TODO had seen a good table on Wikipedia with an expanded hierarchy, but lost it!
Computational problem where the solution is either yes or no.
When there are more than two possible answers, it is called a function problem.
Decision problems come up often in computer science because many important problems are often stated in terms of "decide if a given string belongs to given formal language".
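As a hedged illustration of such a decision problem, here is a C decider for the language of balanced parentheses over the alphabet ( and ), chosen here only as an example. This language is context-free but not regular, and the decider always halts with a yes/no answer after a single pass over the string.

```c
#include <stdbool.h>

/* Decides membership in the language of balanced parentheses.
 * Any character other than '(' or ')' makes the string a non-member. */
bool is_balanced(const char *s) {
    long depth = 0; /* number of currently unmatched '(' */
    for (; *s; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (--depth < 0) return false; /* a ')' with nothing to close */
        } else {
            return false;
        }
    }
    return depth == 0;
}
```

The unbounded depth counter is the part a finite automaton, and therefore a regular expression, cannot emulate.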
The canonical undecidable problem.
A Turing machine decider is a program that decides if one or more Turing machines halt or not.
Of course, because of what we know about the halting problem, there cannot exist a single decider that decides all Turing machines.
But there are deciders that can decide large classes of Turing machines.
E.g. The Busy Beaver Challenge has a set of deciders clearly published, which decide a large part of BB(5). Their proposed deciders are listed at: discuss.bbchallenge.org/c/deciders/5 and actually applied ones at: bbchallenge.org.
Many (all/most?) deciders are based on simulation of machines with arbitrary cutoff hyperparameters, e.g. the cutoff space/time of a Turing machine cycler decider.
The simplest and most obvious example is the Turing machine cycler decider
Turing machine regex tape notation is Ciro Santilli's made up name for the notation used e.g. at:
Most of it is just regular regular expression notation, with a few differences:
- denotes the right or left edge of the (zero initialized) tape. It is often omitted as we always just assume it is always present on both sides of every regex
- A, B, C, D and E denote the current machine state. This is especially common notation in the context of the BB(5) problem
- < and > next to the state indicate if the head is on top of the left or right element. E.g.:
  11 (01)^n <A 00 (0011)^{n+2}
  indicates that the head, in state A, is on top of the last 1 of the sequence of n 01s to the left of the state letter.
This notation is very useful, as it helps compress long repeated sequences of Turing machine tape and extract higher level patterns from them, which is how you go about understanding a Turing machine in order to apply Turing machine acceleration.
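To make the reading of the example 11 (01)^n <A 00 (0011)^{n+2} concrete, here is a hypothetical C helper that expands it for a given n into a plain tape string and returns the index of the cell under the head. It assumes n >= 1 and leaves out the implicit zero-filled edges of the tape.

```c
#include <string.h>

/* Hypothetical helper: expand 11 (01)^n <A 00 (0011)^{n+2} into `out`, which needs room
 * for 6n + 12 characters plus the terminating NUL. Returns the index of the cell under
 * the head: "<A" puts it on the cell just left of the state letter, i.e. the last 1 of
 * the (01)^n block. Assumes n >= 1. */
size_t expand_example(unsigned n, char *out) {
    size_t head;
    strcpy(out, "11");
    for (unsigned i = 0; i < n; i++) strcat(out, "01");
    head = strlen(out) - 1; /* last character written so far is the cell under the head */
    strcat(out, "00");
    for (unsigned i = 0; i < n + 2; i++) strcat(out, "0011");
    return head;
}
```

For n = 1 this gives 110100001100110011 with the head on index 3, and the string grows by 6 cells per unit of n, which is the kind of growth the compact notation hides.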
Bibliography: discuss.bbchallenge.org/t/decider-cyclers/33
Example: bbchallenge.org/279081.
These are very simple, they just check for exact state repetitions, which obviously imply that they will run forever.
Unfortunately, cyclers may need to run through an initial setup phase before reaching the initial cycle point, which is not very elegant.
Also, we have no way of knowing the initial setup length or the actual cycle length, so we just need an arbitrary cutoff value.
And unfortunately, this can lead to misses, e.g. Skelet machine #1, a 5 state machine, has a (translated) cycle that starts at around 50-200M steps, and takes 8 trillion steps to repeat.
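A minimal sketch of a cycler decider in C, not the bbchallenge implementation: it simulates a 2-symbol machine on a small finite tape, records every full configuration (state, head position, tape) and reports a loop as soon as one repeats, which is enough because the machine is deterministic. The tape size TAPE and the step cutoff MAX_STEPS are hypothetical values standing in for the arbitrary cutoffs mentioned above.

```c
#include <string.h>

#define TAPE 64       /* hypothetical space cutoff: tape cells, head starts in the middle */
#define MAX_STEPS 256 /* hypothetical time cutoff */

typedef enum { CYCLER_HALTS, CYCLER_LOOPS, CYCLER_UNKNOWN } Verdict;

/* 2-symbol machine with up to 8 states: in state s reading bit b, write write[s][b],
 * move by move[s][b] (+1 right, -1 left) and go to next[s][b] (-1 means halt). */
typedef struct {
    int write[8][2], move[8][2], next[8][2];
} Machine;

typedef struct {
    int state, head;
    unsigned char tape[TAPE];
} Config;

static int same_config(const Config *x, const Config *y) {
    return x->state == y->state && x->head == y->head &&
           memcmp(x->tape, y->tape, TAPE) == 0;
}

Verdict cycler_decide(const Machine *m) {
    Config seen[MAX_STEPS]; /* every configuration visited so far */
    Config c;
    memset(&c, 0, sizeof c);
    c.head = TAPE / 2;
    for (int t = 0; t < MAX_STEPS; t++) {
        /* O(t^2) comparisons overall: fine for small cutoffs only. */
        for (int i = 0; i < t; i++)
            if (same_config(&seen[i], &c)) return CYCLER_LOOPS;
        seen[t] = c;
        int s = c.state, b = c.tape[c.head];
        c.tape[c.head] = (unsigned char)m->write[s][b];
        if (m->next[s][b] < 0) return CYCLER_HALTS;
        c.head += m->move[s][b];
        if (c.head < 0 || c.head >= TAPE) return CYCLER_UNKNOWN; /* left our finite tape */
        c.state = m->next[s][b];
    }
    return CYCLER_UNKNOWN; /* cutoff reached: no verdict */
}
```

A machine that just bounces between two blank cells is caught after 2 steps, while a machine that moves right forever over blank tape comes out as CYCLER_UNKNOWN because no configuration ever repeats exactly: that is the translated cycler case.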
Bibliography: discuss.bbchallenge.org/t/decider-translated-cyclers/34
Like a cycler, but the cycle starts at an offset.
To see infinity, we check that if the machine only goes left N squares until reaching the repetition, then repetition must only be N squares long.
Described at: www.sligocki.com/2022/06/10/ctl.html
The busy beaver game consists in finding, for a given n, the Turing machine with n states that writes the largest possible number of 1's on a tape initially filled with 0's. In other words, computing the busy beaver function for a given n.
There are only finitely many Turing machines with n states, so we are certain that there exists such a maximum. Computing the Busy beaver function for a given n then comes down to solving the halting problem for every single machine with n states.
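To get a feel for "finitely many", a C sketch of a naive count of n-state 2-symbol machines: each of the 2n transitions chooses a symbol to write (2), a direction (2) and a next state (n states or halt), giving (4(n+1))^(2n) machines. This is one common counting convention with no symmetry reduction, so it overcounts machines that behave identically; the uint64_t result only fits up to n = 6.

```c
#include <stdint.h>

/* Naive count of n-state, 2-symbol Turing machines: (4 * (n + 1))^(2n).
 * Overflows uint64_t for n >= 7. */
uint64_t count_machines(unsigned n) {
    uint64_t per_transition = 4ULL * (n + 1); /* write symbol x direction x next state */
    uint64_t total = 1;
    for (unsigned i = 0; i < 2 * n; i++) total *= per_transition;
    return total;
}
```

For n = 5 this already gives 63,403,380,965,376 machines, each of which has to be shown to halt or run forever.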
Some variant definitions define it as the number of time steps taken by the machine instead. Wikipedia talks about their relationship, but no patience right now.
The Busy Beaver problem is cool because it puts the halting problem in a more precise numerical light, e.g.:
- the Busy beaver function is the most obvious uncomputable function one can come up with starting from the halting problem
- the Busy beaver scale allows us to gauge the difficulty of proving certain (yet unproven!) mathematical conjectures
Bibliography:
The step busy beaver is a variant of the busy beaver game that counts the number of steps before halting, instead of the number of 1's written to the tape.
As of 2023, it appears that BB(5) the same machine, , will win both for 5 states. But this is not always necessarily the case.
BB(n) is the largest number of 1's written by a halting n-state Turing machine on a tape initially filled with 0's.
The following things come to mind when you look into research in this area, especially the search for BB(5) which was hard but doable:
- it is largely recreational mathematics, i.e. done by non-professionals, a bit like the aperiodic tiling. Humbly, they tend to call their results lemmas
- complex structure emerges from simple rules, leading to a complex classification with a few edge cases, much like the classification of finite simple groups
Bibliography:
Turing machine acceleration refers to using high level understanding of specific properties of specific Turing machines to be able to simulate them much faster than naively running the simulation as usual.
Acceleration allows one to use simulation to find infinite loops that might be very long, and would not be otherwise spotted without acceleration.
This is for example the case of the proof of Skelet machine #1 at: www.sligocki.com/2023/03/13/skelet-1-infinite.html
Project trying to compute BB(5) once and for all. Notably it has better presentation and organization than any other previous effort, and appears to have grouped everyone who cares about the topic as of the early 2020s.
Very cool initiative!
By 2023, they had basically decided every machine: discuss.bbchallenge.org/t/the-30-to-34-ctl-holdouts-from-bb-5/141
In June 2024 they felt that they had verified the result after a full Coq proof was published:
So now onto BB(6) I guess.
The last value we will likely ever know for the busy beaver function! BB(6) is likely completely out of reach forever.
By 2023, it had basically been decided by The Busy Beaver Challenge as mentioned at: discuss.bbchallenge.org/t/the-30-to-34-ctl-holdouts-from-bb-5/141, pending only further verification. It is going to be one of those highly computational proofs that need to be formally verified for people to finally settle on the result.
As that project beautifully puts it, as of 2023 prior to full resolution, this can be considered the "simplest open problem in mathematics" on the Busy beaver scale.
Best busy beaver machine known since 1989 as of 2023, before a full proof of all 5 state machines had been carried out.
Paper extracted to HTML by Heiner Marxen: turbotm.de/~heiner/BB/mabu90.html
Non formal proof with a program March 2023: www.sligocki.com/2023/03/13/skelet-1-infinite.html Awesome article that describes the proof procedure.
The proof uses Turing machine acceleration to show that Skelet machine #1 is a Translated cycler Turing machine with humongous cycle parameters:
- start between 50-200 M steps, not calculated precisely on the original post
- period: ~8 billion steps
wiki.bbchallenge.org/wiki/Antihydra:
- news.ycombinator.com/item?id=40864949 BB(6), The 6th Busy Beaver Number, is harder than a Collatz-like math problem
- www.reddit.com/r/math/comments/1dubva0/finding_the_6th_busy_beaver_number_%CF%836_aka_bb6_is/ "Finding the 6th busy beaver number (Σ(6), AKA BB(6)) is at least as hard as a hard Collatz-like math problem called Antihydra":
- www.reddit.com/r/compsci/comments/1duc62e/finding_the_6th_busy_beaver_number_%CF%836_aka_bb6_is/
Also posted at:
gmp/antihydra.c
/* Tested on GMP 6.3.0, Ubuntu 24.04. */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <time.h>
#include <gmp.h>
static uint64_t get_milis(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    /* Widen before multiplying so that a 32-bit time_t cannot overflow. */
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}
int main(int argc, char **argv) {
    const char *as, *bs;
    mpz_t a, aq, ar, b;
    uint64_t i, last_time, newtime;
    /* CLI and init: a defaults to 8 and b to 0. */
    if (argc > 1) {
        as = argv[1];
    } else {
        as = "8";
    }
    if (argc > 2) {
        bs = argv[2];
    } else {
        bs = "0";
    }
    mpz_init_set_str(a, as, 10);
    mpz_init_set_str(b, bs, 10);
    mpz_init(aq);
    mpz_init(ar);
    i = 0;
    last_time = get_milis();
    /* Run. */
    while (1) {
        /* aq = a / 2
         * ar = a % 2 */
        mpz_fdiv_qr_ui(aq, ar, a, 2);
        if (
            /* odd */
            mpz_cmp_ui(ar, 0)
        ) {
            /* Halt when a is odd and b == 0. */
            if (!mpz_cmp_ui(b, 0)) break;
            /* a = aq * 3 + 1, i.e. (3a - 1) / 2 */
            mpz_mul_ui(a, aq, 3);
            mpz_add_ui(a, a, 1);
            /* b -= 1 */
            mpz_sub_ui(b, b, 1);
        } else {
            /* a = aq * 3, i.e. 3a / 2 */
            mpz_mul_ui(a, aq, 3);
            /* b += 2 */
            mpz_add_ui(b, b, 2);
        }
        i++;
        if (i % 100000 == 0) {
            newtime = get_milis();
            /* mpz_sizeinbase(x, 10) is the decimal digit count (possibly 1 too big),
             * cast to uintmax_t to match %ju. */
            gmp_printf("%" PRIu64 " ms=%" PRIu64 " log10(a)=%ju log10(b)=%ju\n",
                i/100000, newtime - last_time,
                (uintmax_t)mpz_sizeinbase(a, 10), (uintmax_t)mpz_sizeinbase(b, 10));
            last_time = newtime;
        }
    }
    /* Cleanup if we ever reach it. */
    mpz_clear(a);
    mpz_clear(aq);
    mpz_clear(ar);
    mpz_clear(b);
    return 0;
}
The Busy beaver scale allows us to gauge the difficulty of proving certain (yet unproven!) mathematical conjectures!
To do this, people have reduced certain mathematical problems to deciding the halting problem of a specific Turing machine.
A good example is perhaps the Goldbach's conjecture. We just make a Turing machine that successively checks for each even number if it is a sum of two primes by naively looping down and trying every possible pair. Let the machine halt if the check fails. So this machine halts iff the Goldbach's conjecture is false! See also Conjecture reduction to a halting problem.
Therefore, if we were able to compute BB(n), we would be able to prove those conjectures automatically, by letting the n-state machine run up to BB(n) steps (using the step busy beaver variant), and if it hadn't halted by then, we would know that it would never halt.
Of course, in practice, BB(n) is generally uncomputable, so we will never know it. And furthermore, even if it were computable, it would take a lot longer than the age of the universe to compute any of it, so it would be useless.
However, philosophically speaking at least, the number of states of the equivalent Turing machine gives us an idea of the complexity of the problem.
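A C sketch of the Goldbach machine described above, written as a normal program rather than as a Turing machine. The real reduction has no upper bound and halts only on a counterexample; the limit parameter here is a hypothetical cutoff so that the sketch terminates, which is the role a known busy beaver value would play.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_prime(uint64_t n) {
    if (n < 2) return false;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

/* True iff the even number n >= 4 is a sum of two primes, trying every pair naively. */
bool goldbach_ok(uint64_t n) {
    for (uint64_t p = 2; p <= n / 2; p++)
        if (is_prime(p) && is_prime(n - p)) return true;
    return false;
}

/* Walk through even numbers and stop at the first counterexample.
 * Returns that counterexample, or 0 if none was found up to limit. */
uint64_t goldbach_search(uint64_t limit) {
    for (uint64_t n = 4; n <= limit; n += 2)
        if (!goldbach_ok(n)) return n;
    return 0;
}
```

Dropping the limit gives a program that halts iff Goldbach's conjecture is false, which is exactly the kind of program the Busy beaver scale measures.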
The busy beaver scale is likely mostly useless, since we are able to prove that many non-trivial Turing machines do halt, often by reducing problems to simpler known cases. But still, it is cute.
But maybe, just maybe, reduction to Turing machine form could be useful. E.g. The Busy Beaver Challenge and other attempts to solve BB(5) have come up with a large number of automated (usually parametrized up to a certain threshold) Turing machine decider programs that automatically determine if certain (often large numbers of) Turing machines run forever.
So it is not impossible that after some reduction to a standard Turing machine form, some conjecture just gets automatically brute-forced by one of the deciders. This is a path to automated theorem proving.
If you can reduce a mathematical problem to the Halting problem of a specific turing machine, as in the case of a few machines of the Busy beaver scale, then using Turing machine deciders could serve as a method of automated theorem proving.
That feels like it could be an elegant proof method, as you reduce your problem to one of the most well studied representations that exists: a Turing machine.
However it also appears that certain problems cannot be reduced to a halting problem... OMG life sucks (or is awesome?): Section "Turing machine that halts if and only if Collatz conjecture is false".
bbchallenge.org/story#what-is-known-about-bb lists some (all?) cool examples:
- BB(15): Erdős' conjecture on powers of 2, which has some relation to Collatz conjecture
- BB(27): Goldbach's conjecture
- BB(744): Riemann hypothesis
- BB(748): independent from the Zermelo-Fraenkel axioms
- BB(7910): independent from the ZFC
wiki.bbchallenge.org/wiki/Cryptids contains a larger list. In June 2024 it was discovered that BB(6) is hard.
mathoverflow.net/questions/309044/is-there-a-known-turing-machine-which-halts-if-and-only-if-the-collatz-conjectur suggests one does not exist. Amazing.
Intuitively we see that the situation is fundamentally different from the Turing machine that halts if and only if the Goldbach conjecture is false because for Collatz the counter example must go off into infinity, while in Goldbach conjecture we can finitely check any failures.
Amazing.
A problem that has more than two possible outputs, i.e. not just yes/no.
It is therefore a generalization of a decision problem.
Complexity: NP-intermediate as of 2020:
- expected not to be NP-complete because it would imply NP = co-NP: cstheory.stackexchange.com/questions/167/what-are-the-consequences-of-factoring-being-np-complete#comment104849_169
- expected not to be in P because "could we be that dumb that we haven't found a solution after having tried for that long?"
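For intuition on why factoring is considered hard, a naive C sketch of trial division. It needs up to about sqrt(n) iterations, which is polynomial in the value of n but exponential in its number of bits, and the bit length is the input size that matters for RSA. Known classical algorithms do much better than this, but still not in polynomial time.

```c
#include <stdint.h>

/* Smallest prime factor of n >= 2 by trial division (returns n itself if prime).
 * The d <= n / d test avoids overflowing d * d. */
uint64_t smallest_factor(uint64_t n) {
    for (uint64_t d = 2; d <= n / d; d++)
        if (n % d == 0) return d;
    return n;
}
```

For example, smallest_factor(3233) finds 53, recovering the toy RSA modulus 3233 = 53 * 61, while a 2048-bit modulus would need on the order of 2^1024 iterations.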
It is the basis of RSA. But it is not proved NP-complete, which leads to:
This is natural question because both integer factorization and discrete logarithm are the basis for the most popular public-key cryptography systems as of 2020 (RSA and Diffie-Hellman key exchange respectively), and both are NP-intermediate. Why not use something more provenly hard?
- cs.stackexchange.com/questions/356/why-hasnt-there-been-an-encryption-algorithm-that-is-based-on-the-known-np-hard "Why hasn't there been an encryption algorithm that is based on the known NP-Hard problems?"
NP-intermediate as of 2020 for similar reasons as integer factorization.
An important case is the discrete logarithm of the cyclic group, i.e. when the group is cyclic.
This is the discrete logarithm problem where the group is a cyclic group.
In this case, the problem becomes equivalent to reversing modular exponentiation.
This computational problem forms the basis for Diffie-Hellman key exchange, because modular exponentiation can be efficiently computed, but no known way exists to efficiently compute the reverse function.
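A C sketch of that asymmetry: mod_pow uses square-and-multiply, so it needs only O(log e) multiplications, while discrete_log is a hypothetical brute-force inverse that tries every exponent in turn. It assumes the modulus is below 2^32 so that products fit in 64 bits; real Diffie-Hellman uses moduli of thousands of bits and a big integer library.

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation: base^e mod m, assuming m < 2^32. */
uint64_t mod_pow(uint64_t base, uint64_t e, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (e > 0) {
        if (e & 1) result = result * base % m; /* use this bit of the exponent */
        base = base * base % m;                /* base^(2^k) for the next bit */
        e >>= 1;
    }
    return result;
}

/* Brute-force inverse: smallest x in [0, m) with g^x = h (mod m), or -1 if none.
 * Up to m iterations, i.e. exponential in the bit length of m. */
int64_t discrete_log(uint64_t g, uint64_t h, uint64_t m) {
    uint64_t acc = 1 % m;
    h %= m;
    for (uint64_t x = 0; x < m; x++) {
        if (acc == h) return (int64_t)x;
        acc = acc * g % m;
    }
    return -1;
}
```

For example, with the toy parameters p = 23 and g = 5, private exponents 4 and 3 give public values mod_pow(5, 4, 23) = 4 and mod_pow(5, 3, 23) = 10, and both sides derive the same shared secret mod_pow(10, 4, 23) = mod_pow(4, 3, 23) = 18.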
A solution to a computational problem!
Draft by Ciro Santilli with cross language input/output test cases: github.com/cirosantilli/algorithm-cheat
By others:
More commonly known as a map or dictionary.
Like Binary search tree, but each node can have multiple objects and more than two children.
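This is a family of notations related to the big O notation. A good mnemonic summary of all notations would be:
- f ∈ O(g): f ≤ g
- f ∈ Ω(g): f ≥ g
- f ∈ Θ(g): f = g
- f ∈ o(g): f < g
- f ∈ ω(g): f > g
where the comparisons are meant loosely: asymptotically, ignoring constant factors.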
E.g.:
- . For , is enough. Otherwise, any will do, the bottom line will always catch up to the top one eventually.
Stronger version of the big O notation: f ∈ o(g) basically means that the ratio f(n)/g(n) goes to zero. In big O notation, the ratio only needs to stay bounded, it does not need to go to zero.
E.g.:
- K does not tend to zero
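For reference, the usual formal definitions behind the informal "ratio" description, stated for functions g that are eventually positive, together with a worked pair of examples:

```latex
\begin{aligned}
f \in O(g) &\iff \exists c > 0\ \exists n_0\ \forall n \ge n_0:\ |f(n)| \le c\, g(n) \\
f \in o(g) &\iff \forall c > 0\ \exists n_0\ \forall n \ge n_0:\ |f(n)| \le c\, g(n)
            \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0 \\
n \in o(n^2) &\quad\text{since}\quad \frac{n}{n^2} = \frac{1}{n} \to 0 \\
3n^2 \in O(n^2) \setminus o(n^2) &\quad\text{since}\quad \frac{3n^2}{n^2} = 3 \not\to 0
\end{aligned}
```

The only difference is the quantifier on c: big O needs one constant that works, little o needs every constant to work eventually.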
In intuitive terms it consists of all integer functions, possibly with multiple input arguments, that can be written only with a sequence of:
- variable assignments
- addition and subtraction
- integer comparisons and if/else
- for loops
and such that in for loops of the form:
for (i = 0; i < n; i++)
n does not change inside the loop body, i.e. no while loops with arbitrary conditions. n does not have to be a constant, it may come from previous calculations. But it must not change inside the loop body.
Primitive recursive functions basically include every integer function that comes up in practice. Primitive recursive functions can have huge complexity, and the class strictly contains EXPTIME. As such, they mostly only come up in foundation of mathematics contexts.
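A C sketch of the restricted style described above: exponentiation built from multiplication, and multiplication from addition, using only assignments, + 1 and for loops whose bound is fixed before the loop starts. It is only an illustration in C, not the formal definition via composition and the primitive recursion scheme.

```c
#include <stdint.h>

/* a + b using only "+ 1" inside a loop bounded by b. */
uint64_t pr_add(uint64_t a, uint64_t b) {
    uint64_t r = a;
    for (uint64_t i = 0; i < b; i++) r = r + 1;
    return r;
}

/* a * b as b repeated additions; b does not change inside the loop. */
uint64_t pr_mul(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (uint64_t i = 0; i < b; i++) r = pr_add(r, a);
    return r;
}

/* a^b as b repeated multiplications. */
uint64_t pr_pow(uint64_t a, uint64_t b) {
    uint64_t r = 1;
    for (uint64_t i = 0; i < b; i++) r = pr_mul(r, a);
    return r;
}
```

Every loop bound is known on entry, so termination is obvious; the cost is huge (pr_pow performs on the order of a^b increments), which hints at how fast primitive recursive functions can grow.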
The cool thing about primitive recursive functions is that the number of iterations is always bound, so we are certain that they terminate and are therefore computable.
This also means that there are necessarily functions which are not primitive recursive, as we know that there must exist uncomputable functions, e.g. the busy beaver function.
Adding unbounded while loops of course enables us to simulate arbitrary Turing machines, and therefore increases the complexity class.
More finely, there are non-primitive total recursive functions, e.g. most famously the Ackermann function.
To get an intuition for it, see the sample computation at: en.wikipedia.org/w/index.php?title=Ackermann_function&oldid=1170238965#TRS,_based_on_2-ary_function where in this context. From this, we immediately get the intuition that these functions are recursive somehow.
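By contrast, a C sketch of the Ackermann function taken directly from its standard recursive definition. It always terminates, but it grows faster than every primitive recursive function, so no fixed nest of bounded for loops can compute it. Only tiny arguments are feasible: A(4, 2) = 2^65536 - 3 already has 19,729 decimal digits.

```c
#include <stdint.h>

/* A(0, n) = n + 1
 * A(m, 0) = A(m - 1, 1)
 * A(m, n) = A(m - 1, A(m, n - 1)) */
uint64_t ackermann(uint64_t m, uint64_t n) {
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}
```

Small rows have closed forms, e.g. A(1, n) = n + 2, A(2, n) = 2n + 3 and A(3, n) = 2^(n+3) - 3, and each row is a kind of iteration of the previous one.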
Strictly speaking, only defined for decision problems: cs.stackexchange.com/questions/9664/is-it-necessary-for-np-problems-to-be-decision-problems/128702#128702
Interesting because of the Cook-Levin theorem: if only a single NP-complete problem were in P, then all NP-complete problems would also be in P!
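The Cook-Levin theorem shows that boolean satisfiability (SAT) is NP-complete. A C sketch of the asymmetry that puts SAT in NP: checking a proposed assignment is linear in the formula size, while the naive solver tries all 2^n assignments. The flat, DIMACS-like formula encoding and the 63-variable limit are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNF formula as a flat array: literal k means variable k is true, -k means it is
 * false, 0 ends a clause. Variables are 1..num_vars (num_vars <= 63), and bit k-1
 * of `assignment` holds the value of variable k. */

/* Certificate check: one pass over the formula. */
bool sat_verify(const int *cnf, int len, uint64_t assignment) {
    bool clause_ok = false;
    for (int i = 0; i < len; i++) {
        int lit = cnf[i];
        if (lit == 0) { /* end of clause: it needs at least one true literal */
            if (!clause_ok) return false;
            clause_ok = false;
            continue;
        }
        int var = lit > 0 ? lit : -lit;
        bool value = (assignment >> (var - 1)) & 1;
        if ((lit > 0) == value) clause_ok = true;
    }
    return true;
}

/* Naive search over all 2^num_vars assignments: exponential time. */
bool sat_brute_force(const int *cnf, int len, int num_vars, uint64_t *found) {
    for (uint64_t a = 0; a < (UINT64_C(1) << num_vars); a++)
        if (sat_verify(cnf, len, a)) {
            *found = a;
            return true;
        }
    return false;
}
```

For example, (x1 or x2) and (not x1 or x2) and (x1 or not x2) is encoded as {1, 2, 0, -1, 2, 0, 1, -2, 0} and is only satisfied by x1 = x2 = true.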
We all know the answer for this: either false or independent.
A problem such that all NP problems can be reduced in polynomial time to it.
This is the most interesting class of problems for BQP as we haven't proven that they are either:
- P: would be boring on quantum computer
- NP-complete: would likely be impossible on a quantum computer
P for quantum computing!
Heck, we know almost nothing yet about how this class relates to non-quantum classes!
- conjectured not to intersect with NP-complete, because if it did, all NP-complete problems could be solved efficiently on quantum computers, and no such algorithm has been found so far as of 2020.
- conjectured to be larger than P, but we don't have a single algorithm provenly there:
- it is believed that the NP complete ones can't be solved
- if they were neither NP-complete nor P, it would imply P != NP
- we just don't know if it is even contained inside NP!
- math.stackexchange.com/questions/361422/why-isnt-np-conp "Why isn't NP = coNP?"
- stackoverflow.com/questions/17046440/whats-the-difference-between-np-and-co-np
- cs.stackexchange.com/questions/9795/is-the-open-question-np-co-np-the-same-as-p-np
- mathoverflow.net/questions/31821/problems-known-to-be-in-both-np-and-conp-but-not-known-to-be-in-p
The exact same problem appears over and over, e.g.:
- transportation: the last mile of the trip when everyone leaves the train and goes to their different respective offices is the most expensive
- telecommunications: the last mile of wire linking local hubs to actual homes is the most expensive
- electrical grid: same as telecommunications
Ciro Santilli also identified knowledge version of this problem: the missing link between basic and advanced.
The function being maximized in an optimization problem.
It is cool how even for such a "simple looking" problem, we were still unable to prove optimality as of 2020.
Applications:
- hash map which is an O(1) amortized implementation of a map
- creating unbreakable chains of data, e.g. for Git commits or Bitcoin.
- storing passwords on a server in a way that if the password database is stolen, attackers can't reuse them on other websites where the user used the same password: security.blogoverflow.com/2013/09/about-secure-password-hashing/
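A minimal C sketch of the first application: a string-keyed hash map using the 64-bit FNV-1a hash with open addressing and linear probing. The fixed capacity CAP, the lack of resizing and deletion, and storing key pointers without copying them are simplifications; a real map resizes to keep operations O(1) amortized. FNV-1a is not a cryptographic hash, so it is not suitable for the Git, Bitcoin or password use cases, which need cryptographic hash functions (and, for passwords, deliberately slow ones).

```c
#include <stdint.h>
#include <string.h>

#define CAP 64 /* hypothetical fixed capacity, no resizing */

/* 64-bit FNV-1a: xor in each byte, then multiply by the FNV prime. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

typedef struct { const char *key; int value; int used; } Slot;
typedef struct { Slot slots[CAP]; } Map; /* zero-initialize before use */

/* Insert or overwrite. Returns 0 if the table is full. */
int map_put(Map *m, const char *key, int value) {
    size_t i = fnv1a(key) % CAP;
    for (size_t n = 0; n < CAP; n++, i = (i + 1) % CAP) {
        Slot *s = &m->slots[i];
        if (!s->used || strcmp(s->key, key) == 0) {
            s->key = key; /* pointer is stored, not copied */
            s->value = value;
            s->used = 1;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 and writes *out if key is present. Probing stops at the first empty slot. */
int map_get(const Map *m, const char *key, int *out) {
    size_t i = fnv1a(key) % CAP;
    for (size_t n = 0; n < CAP && m->slots[i].used; n++, i = (i + 1) % CAP) {
        if (strcmp(m->slots[i].key, key) == 0) {
            *out = m->slots[i].value;
            return 1;
        }
    }
    return 0;
}
```

With a good hash and a table kept mostly empty, probes stay short on average, which is where the O(1) expected cost comes from.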