Building a Polymarket Copy-Trading Bot, Part 3: The Bot That Ate Its Own Memory

This is the third entry in a development log about building a bot that copies profitable traders on Polymarket, the on-chain prediction market. The first two were about things that refused to work: orders that were rejected before they could exist, and a logging bug that hid every other bug behind it. This one is different. This time nothing was broken. The bot ran, placed orders, reconciled results, and did everything I asked of it — while quietly eating more and more memory until I happened to look.

As always, I am writing in English because Polymarket is not available in every jurisdiction, and this series is for builders in markets where it is. And as always, I am being deliberately vague about the part that makes money — which wallets I follow, how positions are sized, the exact entry filters. Everything in this post is infrastructure: how a long-running Node process measures its own behavior, and how that measurement turned into the single most expensive thing the bot did.

A number that should not have been climbing

I run the bot under pm2 on a small VPS, and most days I do not think about it. On the day this story starts I ran pm2 list for an unrelated reason and noticed the resident memory sitting near 290 MB after about twenty hours of uptime. That is not a catastrophe — the box has room — but it was the trend that bothered me. A copy-trading bot does almost nothing most of the time. It waits, it watches a handful of wallets, it occasionally fires an order. A process like that should reach a flat, boring memory profile within minutes and stay there. Mine was a slow, steady climb, and a slow steady climb only ends one way: a restart, at the worst possible moment, in the middle of a fill.

So I went looking, expecting a classic leak — a forgotten event listener, a closure holding a reference it should have released. What I found was more embarrassing and more interesting than that.

Three ways to leak memory without leaking memory

There were three culprits, and not one of them was a leak in the textbook sense. Every line of code was doing exactly what it had been told to do. The problem was that what it had been told to do got more expensive every hour the bot stayed alive.

The first two were unbounded sets. The bot deduplicates work it has already seen — transaction hashes it has already processed, signal IDs it has already acted on — by dropping them into an in-memory Set and checking membership before doing anything. That is the right instinct. The mistake was that nothing ever left those sets. seenHashes in the transaction monitor, and a small family of dedup sets in the main loop, grew monotonically from the moment the process started until the moment it was restarted. On a quiet bot this is slow. On a busy day it is not. There was even a dedupeWindowMs value sitting in the config, written long ago, never wired up — a TTL that existed in spirit but not in code.
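A minimal sketch of what a bounded dedup set could look like, assuming string keys and a TTL in the spirit of dedupeWindowMs. The TtlSet class, its addIfNew method, and the injectable clock are hypothetical names for illustration, not the bot's actual code:

```typescript
// Hypothetical helper: a dedup set whose entries expire after ttlMs.
// It relies on Map preserving insertion order, so the oldest entry is first.
class TtlSet {
  private readonly seenAt = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  /** Returns true if the key is new (and records it), false if it is a live duplicate. */
  addIfNew(key: string): boolean {
    this.evictExpired();
    if (this.seenAt.has(key)) return false;
    this.seenAt.set(key, this.now());
    return true;
  }

  get size(): number {
    return this.seenAt.size;
  }

  // Entries are inserted in time order and never refreshed, so eviction can
  // stop at the first entry that is still inside the TTL.
  private evictExpired(): void {
    const cutoff = this.now() - this.ttlMs;
    for (const [key, at] of this.seenAt) {
      if (at > cutoff) break;
      this.seenAt.delete(key);
    }
  }
}
```

With something like this, memory is bounded by the number of distinct keys seen in one TTL window rather than by uptime. The trade-off is that a key seen again after the TTL is treated as new, so the TTL has to be longer than any realistic replay delay.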

Those were real, but they were not the headline. The headline was the third one, and it was hiding inside the very thing I had built to keep the bot honest.

The real culprit: measuring the past by loading all of it

The bot writes one JSON line per meaningful event to an append-only log — every signal, every decision, every submitted order, every fill, every reconciliation. Once that log existed, I wrote a small performance reporter on top of it. Every five minutes it computes a rolling one-hour summary: how many signals fired, what fraction became orders, the fill rate, the win rate, the expected value per signal. It is the dashboard that tells me whether the strategy is alive or dead. I was proud of it.
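To make the shape of that summary concrete, here is a rough sketch of the counting part. The event shape is an assumption: the field name type and its values 'signal', 'order', and 'fill' are hypothetical, and win rate and expected value are left out because they depend on details this series keeps private.

```typescript
// Hypothetical event shape: the post does not show the log's real field names.
interface LogEvent {
  timestamp: number; // epoch milliseconds
  type: string;      // assumed values: 'signal', 'order', 'fill'
}

interface WindowSummary {
  signals: number;
  orders: number;
  fills: number;
  orderRate: number | null; // orders per signal; null when there were no signals
  fillRate: number | null;  // fills per order; null when there were no orders
}

function summarizeWindow(events: LogEvent[], windowStart: number): WindowSummary {
  let signals = 0;
  let orders = 0;
  let fills = 0;
  for (const e of events) {
    if (e.timestamp < windowStart) continue; // outside the window
    if (e.type === 'signal') signals++;
    else if (e.type === 'order') orders++;
    else if (e.type === 'fill') fills++;
  }
  return {
    signals,
    orders,
    fills,
    orderRate: signals > 0 ? orders / signals : null,
    fillRate: orders > 0 ? fills / orders : null,
  };
}
```

The summary itself is a single cheap pass over whatever events it is handed. The cost of the reporter is decided entirely by how those events are fetched.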

It was also implemented in the laziest way imaginable. To produce a one-hour window, it did this:

const raw = fs.readFileSync(logPath, 'utf8');   // load the ENTIRE log
const events = raw.split('\n')
  .filter(Boolean)
  .map(line => JSON.parse(line))
  .filter(e => e.timestamp >= Date.now() - windowMs);

Read the whole file into a string. Split it. Parse every line. Then throw away everything older than one hour. Every five minutes, forever.

When the log was a few megabytes, nobody noticed. But an append-only log only grows, and by the time I looked, the file on disk was 848 MB. So twelve times an hour, the bot was loading 848 MB of text into memory, parsing millions of JSON objects, and discarding all but the last sixty minutes of them. The cost of answering “what happened in the last hour” had quietly become proportional to everything that had ever happened. And because every run made the process heavier and gave the garbage collector more to clean up, the reporter was degrading the very process it was supposed to be measuring.
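A back-of-the-envelope estimate of that cost, using only figures from this post: the 848 MB file, the five-minute interval, and the later triage step in which 10,000 lines came to about 3.1 MB. The function name is hypothetical, and treating MB as 10^6 bytes and recent lines as typical of the whole file are both assumptions.

```typescript
// Inputs are the post's own figures; treating MB as 10^6 bytes is an assumption.
function estimateReporterCost(
  logBytes: number,
  sampleBytes: number,
  sampleLines: number,
  intervalMinutes: number,
) {
  const bytesPerLine = sampleBytes / sampleLines; // average size of one event
  const runsPerHour = 60 / intervalMinutes;
  return {
    bytesPerLine,
    runsPerHour,
    linesParsedPerRun: Math.round(logBytes / bytesPerLine),
    gigabytesReadPerHour: (logBytes * runsPerHour) / 1e9,
  };
}

const cost = estimateReporterCost(848e6, 3.1e6, 10_000, 5);
console.log(cost);
```

On those assumptions each event is about 310 bytes, so every run parsed roughly 2.7 million lines, and twelve runs an hour add up to about 10 GB of file reads per hour to answer a question about the last sixty minutes.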

That is the part I keep coming back to. It was not a bug in the trading logic. It was the observability layer — the thing whose entire job is to be a cheap, passive observer — turning into the single most expensive operation in the system.

Triage first, then the actual fix

The fastest way to stop the bleeding was not to touch code at all. Stop the process, keep only the recent tail of the log, restart:

pm2 stop bot
tail -n 10000 logs/app.jsonl > /tmp/recent.jsonl
mv /tmp/recent.jsonl logs/app.jsonl
pm2 start bot

The log went from 848 MB to 3.1 MB in four short commands, and memory dropped immediately. But truncating by hand is not a fix; it is a chore I would forget to do until the next slow climb. The real fix had to make the reporter’s cost depend on the size of the window, not the size of the file.

So I replaced “read the whole file” with “read backward from the end, newest line first, and stop the instant you cross the one-hour boundary.” Because the log is append-only and chronological, once you reach an event older than the window, every event before it is older too, so you can stop reading. In practice that means reading a few megabytes from the tail of the file instead of hundreds of megabytes:

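import * as fs from 'fs';

const CHUNK = 4 * 1024 * 1024; // read 4 MB at a time, from the tail

// Returns the events inside the window, newest first.
function readTailJsonl(logPath: string, windowMs: number): any[] {
  const windowStart = Date.now() - windowMs;
  const fd = fs.openSync(logPath, 'r');
  try {
    let position = fs.fstatSync(fd).size;
    const results: any[] = [];
    let carry: Buffer = Buffer.alloc(0); // bytes of a line split across chunks
    let done = false;

    // Parse one line; returns false once we are past the window.
    const consume = (raw: string): boolean => {
      const line = raw.trim();
      if (!line) return true;
      let e: any;
      try { e = JSON.parse(line); } catch { return true; } // skip a torn or corrupt line
      if (e.timestamp < windowStart) return false;         // past the window: stop
      results.push(e);
      return true;
    };

    while (position > 0 && !done) {
      const size = Math.min(CHUNK, position);
      position -= size;
      const buf = Buffer.alloc(size);
      fs.readSync(fd, buf, 0, size, position);

      // Join as bytes and cut at '\n' before decoding, so a multi-byte UTF-8
      // character split by a chunk boundary is never decoded in halves.
      const data = Buffer.concat([buf, carry]);
      const firstNewline = data.indexOf(0x0a);
      if (firstNewline === -1) { carry = data; continue; } // no complete line yet
      carry = data.subarray(0, firstNewline);              // may be cut off; defer it

      const lines = data.subarray(firstNewline + 1).toString('utf8').split('\n');
      for (let i = lines.length - 1; i >= 0; i--) {
        if (!consume(lines[i])) { done = true; break; }
      }
    }
    // At the start of the file the deferred line is complete, so process it too.
    if (!done && carry.length > 0) consume(carry.toString('utf8'));
    return results;
  } finally {
    fs.closeSync(fd);
  }
}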

Then, so the file can never balloon to most of a gigabyte again, the reporter rotates it before each run — if the log is over 50 MB, keep the last 20,000 lines and drop the rest:

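import { statSync } from 'fs';
import { execSync } from 'child_process';

// Keep only the newest 20,000 lines once the log passes 50 MB.
// logPath is interpolated into a shell command, so it must come from trusted config.
// Note: mv swaps in a new file; a writer that keeps the old file open would
// keep appending to the old one, so the logger should reopen after rotation.
function rotateIfNeeded(logPath: string): void {
  try {
    if (statSync(logPath).size <= 50 * 1024 * 1024) return;
    execSync(`tail -n 20000 "${logPath}" > "${logPath}.tmp" && mv "${logPath}.tmp" "${logPath}"`,
             { stdio: 'ignore' });
  } catch {
    // rotation failing must never take the bot down
  }
}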
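For anyone who would rather not shell out, the same rotation can be sketched in plain Node. This is an alternative under stated assumptions, not what the bot runs: rotateKeepLastLines and its parameters are hypothetical names, and it assumes lines end in '\n' and that nothing appends to the file while it is being rewritten.

```typescript
import * as fs from 'node:fs';

// Hypothetical shell-free rotation: keep the last `keepLines` lines of the
// log once it exceeds `maxBytes`. Returns true if the file was rewritten.
function rotateKeepLastLines(logPath: string, maxBytes: number, keepLines: number): boolean {
  const size = fs.statSync(logPath).size;
  if (size <= maxBytes || keepLines <= 0) return false;

  const tmpPath = `${logPath}.tmp`;
  const chunk = Buffer.alloc(64 * 1024);
  const fd = fs.openSync(logPath, 'r');
  try {
    // Walk backward counting newlines to find where the kept tail begins.
    let start = 0; // stays 0 if the file has fewer than keepLines lines
    let newlines = 0;
    let position = size;
    scan: while (position > 0) {
      const len = Math.min(chunk.length, position);
      position -= len;
      fs.readSync(fd, chunk, 0, len, position);
      for (let i = len - 1; i >= 0; i--) {
        if (chunk[i] !== 0x0a) continue;
        if (position + i === size - 1) continue; // trailing newline ends the last line
        if (++newlines === keepLines) {
          start = position + i + 1;
          break scan;
        }
      }
    }

    // Copy bytes [start, size) into the temp file.
    const out = fs.openSync(tmpPath, 'w');
    try {
      let pos = start;
      while (pos < size) {
        const n = fs.readSync(fd, chunk, 0, Math.min(chunk.length, size - pos), pos);
        if (n === 0) break;
        fs.writeSync(out, chunk, 0, n);
        pos += n;
      }
    } finally {
      fs.closeSync(out);
    }
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmpPath, logPath); // replaces the log in one step on the same filesystem
  return true;
}
```

A call such as rotateKeepLastLines(logPath, 50 * 1024 * 1024, 20000) would mirror the thresholds above. It only ever holds one 64 KB buffer in memory, and it avoids passing the path through a shell.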

The result

The change touched one file and the type-checker had nothing to say about it. After a restart:

                    Before                After
Process memory      292 MB                21.4 MB
Log file on disk    848 MB                3.1 MB
Restarts            climbing toward one   0

A fourteen-fold drop in memory, and a profile that is finally flat. The unbounded sets are still on my list — the right fix there is to wire up that forgotten dedupeWindowMs and evict entries past a TTL, because deduping against transactions from hours ago serves no purpose. But the reporter was the fire, and the fire is out.

What I actually learned

Two things I will carry into the next long-running process I write.

First: filtering a time window by loading all of history is O(history), and the fix is almost always to make it O(window). The lazy version is fine in development, where the file is small and the feedback is instant. It fails in production precisely because production is where the file grows. Any “give me the last N minutes” query against an append-only store should read from the end and stop early, not scan from the beginning and discard.

Second, and the one I did not expect: your observability is part of your system, and it has a cost budget. I had treated the performance reporter as free — it only reads, it never trades, what harm could it do? But a monitor that costs more than the thing it monitors is not a monitor; it is a second workload wearing a monitor’s clothes. The lesson is not “don’t measure.” It is “measure as if the measurement itself could be the bug,” because one day it will be.
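One way to give a monitor an explicit budget is to wrap it in a small timer. This is a sketch under assumptions rather than the bot's code: measureCost, CostSample, and the budget parameter are hypothetical names, and the heap delta is only a rough signal because a garbage collection in the middle of the task can make it negative.

```typescript
import { performance } from 'node:perf_hooks';

interface CostSample {
  name: string;
  durationMs: number;
  heapDeltaBytes: number; // rough: GC during the task can make this negative
  overBudget: boolean;
}

// Hypothetical wrapper: runs a synchronous task and reports what it cost.
function measureCost<T>(
  name: string,
  budgetMs: number,
  task: () => T,
): { result: T; cost: CostSample } {
  const heapBefore = process.memoryUsage().heapUsed;
  const startedAt = performance.now();
  const result = task();
  const durationMs = performance.now() - startedAt;
  const heapDeltaBytes = process.memoryUsage().heapUsed - heapBefore;
  return {
    result,
    cost: { name, durationMs, heapDeltaBytes, overBudget: durationMs > budgetMs },
  };
}

// Usage: a stand-in task in place of the real reporter.
const { result: total, cost: reportCost } = measureCost('report', 250, () => {
  let sum = 0;
  for (let i = 0; i < 1000; i++) sum += i;
  return sum;
});
if (reportCost.overBudget) console.warn(`${reportCost.name} took ${reportCost.durationMs.toFixed(1)} ms`);
```

Logging the cost sample alongside the report would have turned the slow climb into a visible trend: a reporter whose duration grows week over week is announcing that its cost depends on something other than its window.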

The bot is lighter now, and a little more honest about what it costs to watch itself. Next time: the harder question of whether copying other people’s trades is a good idea at all — and what the data eventually told me.
