Academy's road map for Observability

Ben Clifford, May 2026

Introduction

This post carries on from a presentation I gave at Academy's fortnightly community call in February 2026. I put the slides for that presentation online.

We've learned a few hard lessons about observability over the years working on execution systems including Parsl and Globus Compute.

One is that developers of task execution systems try to hide the guts of their system from users, because the task execution system should be doing all the hard work. That leads to trouble when those users graduate from "hello world" users to power users who are both capable of understanding complex systems and actively keen to understand those systems as they debug what they have built.

I've encountered that in source code (the attitude that "this source code isn't for you to read") and in observability, by which I mean the broad space of wanting to know what is happening inside a running system. That space overlaps with other keywords such as monitoring, provenance, log files, distributed tracing, performance metrics, and outage detection.

I've recently merged some pull requests in this area, and I want to describe my approach here.

Other observability projects

There are many projects that want to do observability-related stuff. I don't want to trivialize that work or claim to be able to easily reinvent the work of an entire research project/product. Instead I want Academy users to be able to make use of both the good stuff Academy brings and the good stuff other projects bring.

A couple of interesting examples that I've personally engaged with:

Diaspora, which looks like it could be useful for gathering log-style information across a distributed "academy" of agents and clients (providing a single place for further work on analysis of that data)

Flowcept, a distributed workflow provenance system, with which we have already prototyped an integration here: https://github.com/academy-agents/academy-flowcept

As well as these, I expect other observability-related projects and ideas to ebb and flow, and I expect some projects which don't regard themselves as observability projects to also want to engage. For example, in the Parsl world I have occasionally encountered wrapping workflow managers which were interested in fine-grained information.

I'm also very aware of competing and differing priorities: someone might want to be building something very research-oriented for a PhD, while someone else might be building a more boring but more stable system. Both ends of this spectrum are legitimate and I don't want to exclude either. But they are different, and there's no universal winner here.

Changes to Academy core

I wanted to implement a fairly minimal set of hooks, based around the ideas that:

the core Academy code knows what it has done/is about to do (think: where we put log statements in the source code)

the core Academy code does not (and should not) know who/what will do something with that information

I specifically did not want to implement an overall user-facing observability system.

This led me to think about a fairly conventional plugin hook system, and then the realisation that Python already has such a system: the logging framework!

You can read the Python side of the story in Logging Facility for Python in the Python Standard Library documentation.

From the Academy side of things I care about two pieces of logging, mirroring the bullet points above: creating log events with structured information, and plugging in handlers to do interesting stuff with those log events.

Creating machine readable log events

Lots of people have encountered Python logging before, and it usually looks something like this:

import logging

logger = logging.getLogger(__name__)

logger.info("Starting")
logger.debug("x = %s", x)

Get a logger object, which knows how to inject log events into the logging system, and which has a hierarchical name usually based on the Python module structure.

Then make two log entries, at different levels: one at INFO level and one at DEBUG.

The debug entry happens to include a variable value - this is incorporated into the human readable log string, which is great if your log entries are for a human to look at manually. But that leads inevitably to users performing furtive, guilty regular expressions on log files to extract variables like this. This is an incredibly fragile way of doing things, and kinda sad - you can see that the variable x already exists as a separate value in the logging call.

It turns out Python already has a mechanism for attaching key/value pairs to log events. You can write something like:

logger.debug("x = %s", x, extra={"x": x})

and that extra dictionary will flow into the logging system, although in many cases it will then be ignored.
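To make that concrete, here is a minimal sketch using only the standard library. The logger name extra_demo and the CaptureHandler class are made up for this illustration. It shows that each key in extra becomes an attribute on the LogRecord, while an ordinary format string carries on as if the key wasn't there.

```python
import logging


class CaptureHandler(logging.Handler):
    """Keep every LogRecord in a list so it can be inspected afterwards."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("extra_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep this demo away from any root handlers
capture = CaptureHandler()
logger.addHandler(capture)

x = 42
logger.debug("x = %s", x, extra={"x": x})

record = capture.records[0]
formatted = logging.Formatter("%(levelname)s %(message)s").format(record)
print(formatted)  # DEBUG x = 42  (the extra key plays no part here)
print(record.x)   # 42            (still available as a separate integer)
```

One caveat from the standard library: keys in extra must not clash with names that LogRecord already uses (such as message or name), or the logging call raises KeyError.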

Interesting log handlers

Sending a log event into the logging system doesn't write it out or store it anywhere. That's the job of an orthogonal piece of the logging library: log handlers.

A handler is an object that gets to know whenever a log event is logged by one of those logger.info or logger.debug calls I mentioned above.

It is deliberately undefined what it should do with that event: more traditional handlers do things like write to the console or to a log file (and Academy has supported those output modes for a long time), but I also added another handler which writes out each event as a JSON object including the extra keys - which makes things much easier for machine processing. And the Flowcept plugin also interfaces as a log handler, looking for events with relevant extra info.
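As an illustration of the idea, and not Academy's actual implementation, a handler that writes each event as one JSON object might look like the sketch below. The class name JSONLinesHandler, the choice of fields and the logger name json_demo are all assumptions made for this example. The only trick is working out which attributes of the record came from extra: the sketch treats anything that a freshly made, empty LogRecord doesn't have as an extra key.

```python
import io
import json
import logging

# Attribute names that every LogRecord has; anything else came from `extra`.
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JSONLinesHandler(logging.Handler):
    """Hypothetical handler: write one JSON object per log event to a stream."""

    def __init__(self, stream) -> None:
        super().__init__()
        self.stream = stream

    def emit(self, record: logging.LogRecord) -> None:
        event = {
            "name": record.name,
            "levelname": record.levelname,
            "created": record.created,
            "process": record.process,
            "message": record.getMessage(),
        }
        # Copy across whatever the caller passed in `extra`.
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                event[key] = value
        # default=str stops the handler failing on values JSON can't encode.
        self.stream.write(json.dumps(event, default=str) + "\n")


buffer = io.StringIO()
logger = logging.getLogger("json_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(JSONLinesHandler(buffer))

logger.debug("x = %s", 42, extra={"x": 42})
event = json.loads(buffer.getvalue())
print(event["message"], event["x"])
```

The human readable message and the machine readable x both survive, so nothing downstream needs to parse the message string.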

Configuring log handlers

In plain Python, log handlers are configured per-process. It's very easy to configure them for your submit side process, where you are providing the main module and can initialize anything you want. But in Academy, some of your code might be running elsewhere and inside some other framework: for example an Academy agent might be running via Globus Compute inside a Parsl worker process running on an HPC node, and that Parsl worker process might also have run things before and will run even more things after.

So the next piece I implemented was a configuration mechanism to help users configure the same log handler across multiple Python processes, and for a relevant time period: for example, the lifetime of an agent inside a worker.

This is driven through an Academy-specific LogConfig abstract class, which is designed to carry the configuration for a particular kind of Python log handler around between different processes.
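To give a feel for the shape of that idea, here is a rough sketch. It is not Academy's LogConfig API: the names SketchLogConfig, FileLogConfig, make_handler and installed are all invented for this example. The assumptions are that a config object holds only plain data (so it can be shipped between processes), knows how to build its handler wherever it lands, and can attach that handler for a bounded period.

```python
import abc
import contextlib
import dataclasses
import json
import logging
import os
import tempfile
from typing import Iterator


class SketchLogConfig(abc.ABC):
    """Hypothetical stand-in: a shippable description of a log handler."""

    @abc.abstractmethod
    def make_handler(self) -> logging.Handler:
        """Build the handler in whichever process this is called from."""

    @contextlib.contextmanager
    def installed(self) -> Iterator[logging.Handler]:
        """Attach the handler to the root logger for a limited period."""
        handler = self.make_handler()
        root = logging.getLogger()
        root.addHandler(handler)
        try:
            yield handler
        finally:
            # Detach again so later work in this process is unaffected.
            root.removeHandler(handler)
            handler.close()


@dataclasses.dataclass
class FileLogConfig(SketchLogConfig):
    path: str  # plain data only, so the config serialises cleanly

    def make_handler(self) -> logging.Handler:
        return logging.FileHandler(self.path)


config = FileLogConfig(path=os.path.join(tempfile.mkdtemp(), "agent.log"))

# Stand in for "send the config to another process" with a JSON round trip.
wire = json.dumps(dataclasses.asdict(config))
remote_config = FileLogConfig(**json.loads(wire))

log = logging.getLogger("sketch")  # hypothetical logger name
with remote_config.installed():
    log.warning("inside the agent's lifetime")
log.warning("after the agent has finished")  # not written to the file

with open(config.path) as f:
    contents = f.read()
print(contents)
```

The context manager is the piece that matters for long-lived worker processes: the handler exists for the lifetime of one agent and is removed before the worker moves on to something else.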

Adding interesting `extra` info

Everywhere something interesting happens in the Academy core code, something that might be interesting to an observability handler, the codebase needs to have a log call, ideally with lots of juicy information in the extra dictionary. We've already added some of these: for example, agent identities and, when actions are invoked, an identifier to help tie together information about the action on the submit side, in the exchange and on the agent side. I expect that information to get richer as we discover more things we want to know (and you should open an issue or a pull request if you see something interesting to record).
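The tying-together part can be sketched in a few lines. In the sketch below, the key name academy.action_tag is borrowed from the jq query later in this post; everything else (the TagIndexHandler class, the logger names, the messages) is invented for illustration and is not Academy code. Two loggers stand in for the submit side and the agent side, which in real use would live in different processes.

```python
import logging
import uuid
from collections import defaultdict

TAG_KEY = "academy.action_tag"  # key name as it appears in the jq query below


class TagIndexHandler(logging.Handler):
    """Hypothetical handler: group messages by tag and skip untagged events."""

    def __init__(self) -> None:
        super().__init__()
        self.by_tag: dict[str, list[str]] = defaultdict(list)

    def emit(self, record: logging.LogRecord) -> None:
        # Dotted keys can't be read as record.attribute, so go via __dict__.
        tag = record.__dict__.get(TAG_KEY)
        if tag is not None:
            self.by_tag[tag].append(f"{record.name}: {record.getMessage()}")


index = TagIndexHandler()
submit_log = logging.getLogger("demo.submit")  # stands in for the submit side
agent_log = logging.getLogger("demo.agent")    # stands in for the agent side
for log in (submit_log, agent_log):
    log.setLevel(logging.INFO)
    log.propagate = False
    log.addHandler(index)

tag = str(uuid.uuid4())
submit_log.info("Invoking action %s", "step", extra={TAG_KEY: tag})
agent_log.info("Running action %s", "step", extra={TAG_KEY: tag})
submit_log.info("Unrelated housekeeping")  # no tag, so the handler ignores it

print(index.by_tag[tag])
```

A handler like this only cares that the key is present, which is roughly the pattern a plugin follows when it looks for events with relevant extra info and lets everything else pass by.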

A JSON based example

Here's something that is easier to do now that was hard to do before: look at the timings and message flows of agent interactions on both the submit side and agent side.

I'll only sketch new features in the code here, not give full executable examples.

First run an agentic workflow (a submit side and some agents) with logs flowing into files in JSON format:

async with await Manager.from_exchange_factory(
    ...,
    log_config=JSONPoolLogging(),
) as manager:
    # run your agents and talk to them

This workflow will run with the JSON pool handler configured both in the submit process and also wherever your agents run. JSON pool logging will put all of your log files into a directory under ~/local/share/academy/logs. Each separate process will get its own file. If your agent ran somewhere with a different home file system, you can copy the JSON log files into one place using your favourite file transfer tool.

With all the events in JSON format, you can process them in different ways. I often load them into a Python process, and I've also tried putting them into an SQLite database, but here I'll use jq from the command line:

$ cd ~/local/share/academy/logs/da7d98d6-5196-4aec-adf1-91d4c1878393/
$ ls
2a158e17-b35f-49ec-9b30-5101924390a4.jsonlog
44d9d9da-0f64-4b92-946b-9258f80f3f8d.jsonlog

Here's an example of querying which Python modules have logged anything: there are academy modules, but also other modules that use Python's logging system - support libraries such as asyncio and parsl, and agent code which in my test case lives in a module called fibiterate7lib.

$ jq -s '[.[].name] | unique' *.jsonlog
[
  "__main__",
  "academy.exchange.client",
  "academy.exchange.cloud.client",
  "academy.handle",
  "academy.logging.configs.jsonpool",
  "academy.manager",
  "academy.runtime",
  "asyncio",
  "fibiterate7lib",
  "parsl.dataflow.dflow",
  "parsl.dataflow.memoization",
  "parsl.executors.base",
  "parsl.executors.high_throughput.executor",
  "parsl.executors.high_throughput.zmq_pipes",
  "parsl.executors.status_handling",
  "parsl.executors.threads",
  "parsl.jobs.job_status_poller",
  "parsl.jobs.strategy",
  "parsl.monitoring.monitoring",
  "parsl.multiprocessing",
  "parsl.process_loggers",
  "parsl.providers.local.local",
  "parsl.usage_tracking.usage",
  "parsl.utils"
]

Here's an example tracking one particular message flow across both log files using a message ID that I extracted by hand, sorted by log timestamp (which Python calls created), showing the process (4472 is my submit side, 4521 is where the agent was running) and the human readable message.

$ jq -s 'sort_by(.created) |
  .[] |
  select(."academy.action_tag"=="8e28bb12-e5cc-46c7-8298-7d466eff4098") |
  {process,created,message}' \
  *.jsonlog
{
  "process": "4472",
  "created": "1778498011.6303687",
  "message": "Invoking action __anext__ with tag
             id 8e28bb12-e5cc-46c7-8298-7d466eff4098"
}
{
  "process": "4472",
  "created": "1778498011.6309419",
  "message": "Sending action request from UserId
             to AgentId<7cc2f0a6> (action='__anext__')"
}
{
  "process": "4472",
  "created": "1778498011.635122",
  "message": "Waiting for result of action __anext__ with
             tag id 8e28bb12-e5cc-46c7-8298-7d466eff4098"
}
{
  "process": "4521",
  "created": "1778498011.6369267",
  "message": "Invoking action __anext__ with invocation id
             8e28bb12-e5cc-46c7-8298-7d466eff4098"
}
{
  "process": "4521",
  "created": "1778498012.1386456",
  "message": "Completed action __anext__ with invocation id
             8e28bb12-e5cc-46c7-8298-7d466eff4098"
}
{
  "process": "4472",
  "created": "1778498012.1435366",
  "message": "Successfully completed action __anext__ with
             tag id 8e28bb12-e5cc-46c7-8298-7d466eff4098"
}
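The same kind of query can be sketched in Python for anyone who would rather load the events into a Python process. This is a minimal sketch, not part of Academy: the function names load_events and action_timeline are made up, and it assumes each .jsonlog file holds one JSON object per line. It also converts created to a float, in case the timestamps are stored as strings.

```python
import json
from pathlib import Path


def load_events(log_dir: Path) -> list[dict]:
    """Read every *.jsonlog file in log_dir and return events in time order."""
    events = []
    for path in sorted(log_dir.glob("*.jsonlog")):
        with path.open() as f:
            for line in f:
                if line.strip():  # assumption: one JSON object per line
                    events.append(json.loads(line))
    # `created` may be a number or a string, so normalise before sorting.
    events.sort(key=lambda e: float(e["created"]))
    return events


def action_timeline(events: list[dict], tag: str) -> list[tuple[float, str, str]]:
    """Return (seconds since first event, process, message) for one action tag."""
    matching = [e for e in events if e.get("academy.action_tag") == tag]
    if not matching:
        return []
    start = float(matching[0]["created"])
    return [
        (float(e["created"]) - start, str(e["process"]), e["message"])
        for e in matching
    ]
```

Calling action_timeline(load_events(log_dir), tag) with the run directory and an action tag gives the same six events as the jq output above, but with each timestamp expressed as an offset from the first event, which makes it easier to see where the time went.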

What you can do

Right now you can choose how your logs are delivered: to the console, to traditional log files, or in JSON format. I hope the latter encourages you away from the path of regexping-your-logs.

If you're interested in building and experimenting, a couple of interesting paths forward for me are: more interesting ways to move and store logging events (for example, Diaspora and Flowcept), and more interesting ways to present that information (are you interested in performance visualization? in analysing how results came to be?).