Algorithms In Context #6: Cycle Detection

Can Bayar
9 min readJan 31, 2021

--

Gather around kids, it’s time for a story. And this story is about a breathtaking race between the tortoise and the hare. Spoiler alert: At the end, tortoise catches up to the hare. Well, kind of…

Okay, so I haven’t been writing for a while because… Well, I didn’t want to. Then yesterday I came across this question on HackerRank, which happens to be the last question of the C++ section and found it quite interesting. But before directly diving into the question, let’s start with a more simple problem and progressively build this thing up.

Detecting Cycles In Linked Lists

This is a classic problem asked in programming interviews and I am sure that many of you are already familiar with it. How do you determine if there is a cycle in a singly linked list? You can spot the cycle in the figure, right?

There are multiple approaches to this problem. Let’s examine them one by one:

  1. We can slightly alter the data structure by adding a boolean flag into the nodes. Then all we need is to iterate through the linked list and mark the visited nodes. If we come across a marked node at any point, it means that we found a cycle.
struct Node {
int data;
Node* next;
bool visited = false;
}
bool containsCycle(Node* head) { while (head != NULL) { if (head->visited) {
return true;
}
head->visited = true;
head = head->next;
}
return false;
}

2. Above algorithm runs perfectly fine and takes O(n) to complete. However, you may not always be able to change the existing data structure. So you need an alternative marking method. We can introduce an auxiliary set to store the visited nodes, instead of marking them.

unordered_set<Node*> lookup;bool containsCycle(Node* head) {    while (head != NULL) {        if (lookup.count(head) > 0) {
return true;
}
lookup.insert(head);
head = head->next;
}
return false;
}

3. Using a lookup also works in O(n) but we used extra space to store the visited nodes (plus, we spent extra time for the hashing function). Obviously, there is a better and smarter way to solve this problem, which brings us to the actual subject of this post: Floyd’s and Brent’s cycle detection algorithms

Floyd’s Algorithm

In this method, we will use two pointers instead of one, one of them iterates faster than the other. We will call the fast one the hare, and the slow one the tortoise — hence the story.

At the start, we initialize both pointers to the head node. Then at each step, we move tortoise by one and hare by two. If the two pointers meet at any point, it means that we have a cycle in our linked list.

bool floyd(Node* head) {    Node* tortoise = head;
Node* hare = head;
while (hare != NULL && hare->next != NULL) { tortoise = tortoise->next;
hare = hare->next->next;
if (tortoise == hare) {
return true;
}
}
return false;
}

It is that simple. But in order to really understand this strategy works, get a pen and paper, draw different cases and count the steps. Go ahead, do it now! I believe this strategy works for different moving speeds for the pointers.

So, you want to find the starting node of the cycle? Easy… Just keep the hare where both pointers have met and initialize the tortoise back to the head. Then move both pointers by one step until they met again. Your cycle starts where they have met.

// Keep the hare at the meeting point
tortoise = head;
while (tortoise != hare) {
tortoise = tortoise->next;
hare = hare->next;
}
Node* cycleStart = tortoise;

Crazy, right? How is this even working? Let’s do a simple example by using the drawing. Suppose that the tortoise and the hare met at the meeting point after hare makes one complete round on the cycle.

The distance traveled by the tortoise is x + y and the distance traveled by the hare is x + y + z + y. The hare travels at twice the speed of the tortoise’s and so, it travels twice the distance. From there;

Baaam! In reality, this is just a single case and for full proof, you need to consider multiple rounds for both the hare and the tortoise. And in case you want to move them at different speeds, you can try that using different constant multipliers. Still, it is good enough to illustrate the logic of the algorithm.

Let’s see how Brent’s algorithm work.

Brent’s Algorithm

Brent’s algorithm works similar to Floyd’s algorithm but it uses a different strategy to move the pointers. This time around the hare moves by the powers of two, each time doubling its speed. The ambitious tortoise spent his time in its lab and now it can travel at the speed of light. So, it just waits for the hare to reach its next destination and it just teleports there. Then hare doubles its speed again and this continues until the hare’s speed is bigger than the length of the cycle. Next time they meet, the hare makes a whole tour around the cycle and it is back again at the tortoise’s location.

bool brent(Node* head) {    Node* tortoise = head;
Node* hare = tortoise->next;
unsigned int power = 1;
unsigned int length = 1;
while (hare != NULL) { if (hare == tortoise) {
return true;
}
if (length == power) {
tortoise = hare;
length = 0;
power *= 2;
}

hare = hare->next;
length++;
}
return false;
}

As you can see, the hare moves one step each iteration until it reaches the current power of two. The number of steps taken by the hare stored in the length variable and the length is reset and the tortoise is teleported each time the hare reaches the current power of two. If you are having problems understanding, just draw this and follow along with the list and it shall be clear.

One advantage of this method over Floyd’s is that it also finds the length of the cycle because the length is actually the length of the cycle. You need an extra step to find the length of the cycle in Floyd’s method.

If you want to find the starting point of the cycle, what you need to do is simple. Initialize both pointers back to the head. Then move one of them by the length of the cycle. Then move both pointers until they meet.

tortoise = head;
hare = head;
for (unsigned int i = 0; i < length; i++) {
hare = hare->next;
}

while (tortoise != hare) {
tortoise = tortoise->next;
hare = hare->next;
}
Node* cycleStart = tortoise;

Up to this point, we have seen different methods to detect cycles on linked lists. It’s been years since I learned this but it wasn’t until recently that I’ve seen these strategies are actually designed to solve a more generic problem, a certain kind of function essentially. Now let’s finally discuss the aforementioned HackerRank question.

Cycle Detection on Functions

In the question, we are given a function. Given the positive integers N, S, P, Q; it asks us how many distinct integers will be generated by the given function.

f(0) = S mod 2^31
for i = 1 to N
f(i) = (f(i-1) * P + Q) mod 2^31

Now you can store each generated integer in a set and just look if the next one already exists in the set. But that would be too simple… The first thing you need to notice is this: Once you acquire an integer that is already generated before, the rest of the integers will keep repeating. We have a cycle, people!

The function given in the question is a function that maps to itself and in fact, for any function that maps a finite set of integers to itself in the form of

we can Floyd’s and Brent’s algorithms. Notice that this kind of a sequence must continue periodically in the same cycle, once an already calculated value is reached. You can also visualize this as a directed graph, where each vertex only has a single outgoing edge and all the vertices are reachable. Linked list is a special kind of graph anyway.

I am not going to share the algorithms that solve the question and leave it as an exercise - and also I am an a**hole and find this rather entertaining. Instead, I will share test results for three different test cases:

  • First Case: 100 million iterations, 31 distinct integers
  • Second Case: 10 million iterations, all distinct integers
  • Third Case: 100 million iterations, all distinct integers
clock_t begin = clock();
int count = count(100000000, 569099406, 1607140150, 823906344);
clock_t end = clock();

cout << "TEST CASE 1" << endl;
cout << "Distinct Integer Count: " << count << endl;
cout << "Algorithm Runtime: " << float(end - begin) / CLOCKS_PER_SEC << "s" << endl;

begin = clock();
count = count(10000000, 658061970, 695098531, 1430548937);
end = clock();

cout << endl;
cout << "TEST CASE 2" << endl;
cout << "Distinct Integer Count: " << count << endl;
cout << "Algorithm Runtime: " << float(end - begin) / CLOCKS_PER_SEC << "s" << endl;

begin = clock();
count = count(100000000, 923092883, 976061291, 1205350525);
end = clock();

cout << endl;
cout << "TEST CASE 3" << endl;
cout << "Distinct Integer Count: " << count << endl;
cout << "Algorithm Runtime: " << float(end - begin) / CLOCKS_PER_SEC << "s" << endl;

I made three different implementations. The first one uses a lookup structure just as we did at the beginning of the post. At first, I used a hash set but then decided to use a boolean vector because the overhead of the hashing function was too much and it was failing some test cases. Here is the output:

TEST CASE 1
Distinct Integer Count: 31
Algorithm Runtime: 0.115112s
TEST CASE 2
Distinct Integer Count: 10000000
Algorithm Runtime: 1.54522s
TEST CASE 3
Distinct Integer Count: 100000000
Algorithm Runtime: 14.1766s

It passed most of the test cases but 14 seconds at the last test case was too big. Now let’s see what happens when I implemented Floyd’s algorithm on the second one:

TEST CASE 1
Distinct Integer Count: 31
Algorithm Runtime: 0s
TEST CASE 2
Distinct Integer Count: 10000000
Algorithm Runtime: 0.19692s
TEST CASE 3
Distinct Integer Count: 100000000
Algorithm Runtime: 1.76177s

10 times better! It turns out using an auxiliary data structure that big is not a very good idea and it causes a lot of cache misses. The third one is Brent’s algorithm:

TEST CASE 1
Distinct Integer Count: 31
Algorithm Runtime: 0s
TEST CASE 2
Distinct Integer Count: 10000000
Algorithm Runtime: 0.11419s
TEST CASE 3
Distinct Integer Count: 100000000
Algorithm Runtime: 0.909387s

It performs nearly twice faster than Floyd’s algorithm. That’s because Floyd’s algorithm has that extra step to calculate the length of the cycle.

Again, if at any point you didn’t understand something, just go ahead and draw an example and it shall be clear as day. Otherwise, you may want to try to solve the question yourself in order to soak in how Floyd’s and Brent’s algorithms work. You can also try the lookup option, but it’s not going to be good enough.

My friends, this has been a long run, heh hee… The reason I am writing these posts is most of the study materials around only focus on acing the programming interviews but they never seem to talk about these concepts in context. I believe understanding how something came out and thinking around it is a much better strategy for life. It also contains some extent of excitement even though it is hard at times. You are not a race horse; be like the tortoise, slow and steady. I promise you will get what you deserver, sooner or later. Take care…

--

--

Can Bayar
Can Bayar

Written by Can Bayar

Senior Software Engineer, Rock Vocalist & Guitarist

No responses yet