ARTIQ Real-Time I/O concepts

The ARTIQ Real-Time I/O design employs several concepts to achieve its goals of high timing resolution on the nanosecond scale and low latency on the microsecond scale, without sacrificing a readable and extensible language.

In a typical environment two very different classes of hardware need to be controlled. One class is the vast arsenal of diverse laboratory hardware that interfaces with and is controlled from a typical PC. The other is specialized real-time hardware that requires tight coupling and a low-latency interface to a CPU. The ARTIQ code that describes a given experiment is composed of two types of “programs”: regular Python code that is executed on the host and ARTIQ kernels that are executed on a core device. The CPU that executes the ARTIQ kernels has direct access to specialized programmable I/O timing logic (part of the gateware). The two types of code can invoke each other and transitions between them are seamless.

The ARTIQ kernels do not interface with the real-time gateware directly, as that would lead to imprecise, indeterminate, and generally unpredictable timing. Instead, the CPU operates at one end of a bank of FIFO (first-in-first-out) buffers, while the real-time gateware at the other end guarantees all-or-nothing timing precision: an event is either executed with precise timing or not executed at all.

A FIFO for an output channel holds timestamps and event data describing when and what is to be executed. The CPU feeds events into this FIFO. A FIFO for an input channel contains timestamps and event data for events that have been recorded by the real-time gateware and are waiting to be read out by the CPU on the other end.

Timeline and terminology

The set of all input and output events on all channels constitutes the timeline. A high-resolution wall clock (rtio_counter_mu) counts clock cycles and manages the precise timing of the events. Output events are executed when their timestamp matches the current clock value. Input events are recorded when they reach the gateware and stamped with the current clock value accordingly.

The kernel runtime environment maintains a timeline cursor (called now_mu) used as the timestamp when output events are submitted to the FIFOs. Both now_mu and rtio_counter_mu are counted in integer machine units, or mu, rather than SI units. The machine unit represents the maximum resolution of RTIO timing in an ARTIQ system. The duration of a machine unit is the reference period of the system, and may be changed by the user, but normally corresponds to a duration of one nanosecond.

The timeline cursor now_mu can be moved forward or backward on the timeline using artiq.language.core.delay() and artiq.language.core.delay_mu() (for delays given in SI units or machine units, respectively). The absolute value of now_mu on the timeline can be retrieved using artiq.language.core.now_mu() and set using artiq.language.core.at_mu(). The difference between the cursor and the wall clock (now_mu minus rtio_counter_mu) is referred to as slack. The system has positive slack when the cursor is ahead of the wall clock, i.e. in the future, and negative slack when the cursor is behind the wall clock, i.e. in the past.
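The cursor and slack bookkeeping can be sketched in plain Python. This is a toy model, not ARTIQ code: TimelineModel and its methods are hypothetical names, the wall clock is advanced by hand rather than by the gateware, and a 1 ns machine unit with round-to-nearest conversion in delay() is assumed.

```python
# Toy model of the timeline cursor and the wall clock (not ARTIQ code).
REF_PERIOD = 1e-9  # assumed machine unit: 1 ns


class TimelineModel:
    def __init__(self):
        self.rtio_counter_mu = 0  # wall clock, advanced by hand in this model
        self.now_mu = 0           # timeline cursor

    def delay_mu(self, dt_mu):
        # Move the cursor forward (or backward, if dt_mu is negative)
        self.now_mu += dt_mu

    def delay(self, dt_s):
        # SI-unit variant; rounding to the nearest mu is an assumption here
        self.delay_mu(round(dt_s / REF_PERIOD))

    def at_mu(self, t_mu):
        # Set the absolute cursor position
        self.now_mu = t_mu

    def slack_mu(self):
        # > 0: cursor in the future; < 0: cursor in the past
        return self.now_mu - self.rtio_counter_mu


tl = TimelineModel()
tl.delay(2e-6)             # cursor moves 2000 mu ahead
tl.rtio_counter_mu = 500   # meanwhile the wall clock has advanced
print(tl.slack_mu())       # 1500 (positive slack)
tl.at_mu(100)
print(tl.slack_mu())       # -400 (negative slack)
```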

RTIO timestamps, the timeline cursor, and the rtio_counter_mu wall clock are all counted relative to the core device startup/boot time. The wall clock keeps running across experiments.

Absolute timestamps can be large numbers. They are represented internally as 64-bit integers. With a typical one-nanosecond machine unit, this covers a range of hundreds of years. Conversions between such a large integer number and a floating point representation can cause loss of precision through cancellation. When computing the difference of absolute timestamps, use self.core.mu_to_seconds(t2-t1), not self.core.mu_to_seconds(t2)-self.core.mu_to_seconds(t1) (see mu_to_seconds()). When accumulating time, do it in machine units and not in SI units, so that rounding errors do not accumulate.
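The cancellation problem is easy to reproduce with ordinary double-precision floats. In this sketch, mu_to_seconds is a local stand-in that simply multiplies by an assumed 1 ns reference period; it is not the Core method itself.

```python
REF_PERIOD = 1e-9  # assumed machine unit of 1 ns


def mu_to_seconds(mu):
    # Local stand-in for Core.mu_to_seconds(): a plain multiplication
    return mu * REF_PERIOD


t1 = 10**18   # absolute timestamp about 31.7 years after boot, in mu
t2 = t1 + 1   # one machine unit later

good = mu_to_seconds(t2 - t1)                # subtract exact integers first
bad = mu_to_seconds(t2) - mu_to_seconds(t1)  # both round to the same float

print(good)  # 1e-09
print(bad)   # 0.0

# Accumulate durations as integers in mu; convert to seconds only at the end
total_mu = sum([333] * 3000)  # exact integer arithmetic, no rounding drift
```

At around 10**18 mu, adjacent double-precision values are 128 mu apart, so a one-nanosecond difference vanishes once both timestamps are converted to floats.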

Note

Absolute timestamps are also referred to as RTIO fine timestamps, because they have a significantly finer resolution than the timestamps of the so-called coarse RTIO clock, the actual clock signal provided to or generated by the core device. The frequency of the coarse RTIO clock is set by the core device clocking settings but is most commonly 125 MHz, which corresponds to eight one-nanosecond machine units per coarse RTIO cycle.

The coarse timestamp of an event is its timestamp at the lower resolution of the coarse clock; in practice, it is a truncated version of the fine timestamp. In general, ARTIQ offers precision at the fine level but operates at the coarse level. This is rarely relevant to the user, but understanding it may clarify the behavior of some RTIO errors (e.g. sequence errors).
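The relationship between fine and coarse timestamps amounts to integer division. The sketch below assumes eight machine units per coarse cycle (the 125 MHz case above); on a real system this ratio is available as core.ref_multiplier, but here it is a hard-coded constant, and the helper names are hypothetical.

```python
REF_MULTIPLIER = 8  # assumed machine units per coarse cycle (125 MHz, 1 ns mu)


def coarse_timestamp(fine_mu):
    # Truncate a fine timestamp to the coarse cycle it falls in
    return fine_mu // REF_MULTIPLIER


def fine_offset(fine_mu):
    # Position of the event within its coarse cycle (0 .. REF_MULTIPLIER - 1)
    return fine_mu % REF_MULTIPLIER


for t in (1000, 1007, 1008):
    print(t, coarse_timestamp(t), fine_offset(t))
# 1000 125 0
# 1007 125 7
# 1008 126 0
```

Events at 1000 mu and 1007 mu differ at the fine level but share a coarse timestamp, which is what matters for lane selection and collisions.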

The following basic example shows how to place output events on the timeline. It emits a precisely timed 2 µs pulse:

ttl.on()
delay(2*us)
ttl.off()

The device ttl represents a single digital output channel (artiq.coredevice.ttl.TTLOut). The artiq.coredevice.ttl.TTLOut.on() method places a rising edge on the timeline at the current cursor position (now_mu). Then the cursor is moved forward 2 µs and a falling edge is placed at the new cursor position. Later, when the wall clock reaches the respective timestamps, the RTIO gateware executes the two events.

The following diagram shows what is going on at the different levels of the software and gateware stack (assuming one machine unit of time is 1 ns):

This sequence is exactly equivalent to:

ttl.pulse(2*us)

The method artiq.coredevice.ttl.TTLOut.pulse() advances the timeline cursor (using delay() internally) by exactly the amount given. Other methods, such as on(), off() and set(), do not modify the timeline cursor. The latter are called zero-duration methods.
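The difference between pulse() and the zero-duration methods can be illustrated with a toy channel that records (timestamp, level) pairs. ModelTTL and the shared timeline dictionary are hypothetical; the real driver submits events to the gateware instead of appending to a list, and a 1 ns machine unit is assumed.

```python
class ModelTTL:
    """Toy TTL output that records (timestamp_mu, level) events."""

    def __init__(self, timeline):
        self.timeline = timeline  # shared cursor, e.g. {"now_mu": 0}
        self.events = []

    def _set(self, level):
        # Zero-duration: event at the cursor, cursor left unchanged
        self.events.append((self.timeline["now_mu"], level))

    def on(self):
        self._set(1)

    def off(self):
        self._set(0)

    def pulse_mu(self, duration_mu):
        # Same as on(); delay_mu(duration_mu); off()
        self.on()
        self.timeline["now_mu"] += duration_mu
        self.off()


timeline = {"now_mu": 0}
ttl = ModelTTL(timeline)
ttl.pulse_mu(2000)           # 2 us with 1 ns machine units
print(ttl.events)            # [(0, 1), (2000, 0)]
print(timeline["now_mu"])    # 2000: pulse advanced the cursor
ttl.on()
print(timeline["now_mu"])    # still 2000: on() is zero-duration
```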

Output errors and exceptions

Underflows

An RTIO output event must always be programmed with a timestamp in the future. In other words, the timeline cursor now_mu must be ahead of the current wall clock rtio_counter_mu: the past cannot be altered. The following example tries to place a rising edge event on the timeline. If the current cursor is in the past, an artiq.coredevice.exceptions.RTIOUnderflow exception is raised. The experiment attempts to handle the exception by moving the cursor forward and repeating the programming of the rising edge:

from artiq.coredevice.exceptions import RTIOUnderflow

try:
    ttl.on()
except RTIOUnderflow:
    # try again at the next mains cycle (1/60 Hz ≈ 16.6667 ms)
    delay(16.6667*ms)
    ttl.on()

Once the timeline cursor has overtaken the wall clock, the exception does not reoccur and the event can be scheduled successfully. This can also be thought of as adding positive slack to the system.
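The retry pattern can be checked against a toy output channel with a frozen wall clock. RTIOUnderflowModel and OutputModel are hypothetical stand-ins, not the ARTIQ classes; the exact underflow condition in gateware may include latency margins that this simplified "timestamp in the past" check omits, and a 1 ns machine unit is assumed.

```python
class RTIOUnderflowModel(Exception):
    """Stand-in for artiq.coredevice.exceptions.RTIOUnderflow."""


class OutputModel:
    def __init__(self, rtio_counter_mu, now_mu):
        self.rtio_counter_mu = rtio_counter_mu  # wall clock (frozen here)
        self.now_mu = now_mu                    # timeline cursor
        self.events = []

    def on(self):
        # Simplified check: reject any timestamp already in the past
        if self.now_mu < self.rtio_counter_mu:
            raise RTIOUnderflowModel(
                f"cursor {self.now_mu} behind wall clock {self.rtio_counter_mu}")
        self.events.append((self.now_mu, 1))


MAINS_CYCLE_MU = 16_666_700  # 16.6667 ms in 1 ns machine units

out = OutputModel(rtio_counter_mu=5_000_000, now_mu=1_000_000)  # 4 ms negative slack
try:
    out.on()
except RTIOUnderflowModel:
    out.now_mu += MAINS_CYCLE_MU  # move the cursor one mains cycle later
    out.on()
print(out.events)  # [(17666700, 1)]
```

A single retry only succeeds if the added delay exceeds the negative slack at the time of the retry; on hardware the wall clock keeps running while the handler executes, which this frozen-clock model ignores.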

To track down RTIOUnderflow exceptions in an experiment there are a few approaches:

Exception backtraces show where underflow has occurred while executing the code.

The integrated logic analyzer shows the timeline context that led to the exception. The analyzer is always active and supports plotting of RTIO slack. This makes it possible to visually find where and how an experiment has ‘run out’ of positive slack.

Sequence errors

A sequence error occurs when a sequence of coarse timestamps cannot be transferred to the gateware. Internally, the gateware stores output events in an array of FIFO buffers (the ‘lanes’). Within each particular lane, the coarse timestamps of events must be strictly increasing.

If an event is submitted whose coarse timestamp is less than or equal to that of the previous event, or, with event spreading enabled, if the current lane is nearly full, the scalable event dispatcher (SED) selects the next lane, wrapping around once the final lane is reached. If this lane also contains an event with a coarse timestamp equal to or greater than that of the event being submitted, the placement fails and a sequence error occurs.

Note

For performance reasons, unlike RTIOUnderflow, most gateware errors do not halt execution of the kernel, because the kernel cannot wait for potential error reports before continuing. As a result, sequence errors are not raised as exceptions and cannot be caught. Instead, the offending event – in this case, the event that could not be queued – is discarded, the experiment continues, and the error is reported in the core log. To check the core log, use the command artiq_coremgmt log.

By default, the ARTIQ SED has eight lanes, which normally suffices to avoid sequence errors, but problems may still occur if many (>8) events are issued to the gateware with interleaving timestamps. Due to the strict timing limitations imposed on RTIO gateware, it is not possible for the SED to rearrange events in a lane once submitted, nor to anticipate future events when making lane choices. This makes sequence errors fairly ‘unintelligent’, but also generally fairly easy to eliminate by manually rearranging the generation of events (not the timing of the events themselves, which rarely needs to change).

It is also possible to increase the number of SED lanes in the gateware, which will reduce the frequency of sequencing issues, but will correspondingly put more stress on FPGA resources and timing.
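The lane-selection rule can be explored with a deliberately simplified model that tracks only the last coarse timestamp in each lane and ignores lane fullness, latency and event spreading. SEDModel and sequence_errors are hypothetical; they encode the rules stated above, not the actual gateware. The last part compares two generation orders for the same set of events, and the lane count parameter shows the effect of adding lanes.

```python
class SEDModel:
    """Simplified SED lane selection over coarse timestamps (fullness ignored)."""

    def __init__(self, n_lanes=8):
        self.last = [None] * n_lanes  # last coarse timestamp queued in each lane
        self.lane = 0                 # currently selected lane
        self.errors = []              # events discarded with a sequence error

    def submit(self, coarse_ts):
        current = self.last[self.lane]
        if current is not None and coarse_ts <= current:
            # Not strictly increasing in this lane: try the next one (wrapping)
            self.lane = (self.lane + 1) % len(self.last)
            nxt = self.last[self.lane]
            if nxt is not None and nxt >= coarse_ts:
                self.errors.append(coarse_ts)  # discarded, reported in core log
                return False
        self.last[self.lane] = coarse_ts
        return True


def sequence_errors(coarse_timestamps, n_lanes=8):
    sed = SEDModel(n_lanes)
    for ts in coarse_timestamps:
        sed.submit(ts)
    return sed.errors


# Nine events in the same coarse cycle: the ninth finds no usable lane
print(sequence_errors([0] * 9))  # [0]

# Nine channels, three events each, all at distinct coarse timestamps
events = [100 * i + 10 * ch for ch in range(9) for i in range(3)]
print(sequence_errors(events))               # channel by channel: [80, 180]
print(sequence_errors(sorted(events)))       # in time order: []
print(sequence_errors(events, n_lanes=16))   # more lanes: []
```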

Other notes:

Strictly increasing (coarse) timestamps never cause sequence errors.
Strictly increasing fine timestamps within the same coarse cycle may still cause sequence errors.
The number of lanes is a hard limit on the number of RTIO output events that may be emitted within one coarse cycle.
Zero-duration methods (such as artiq.coredevice.ttl.TTLOut.on()) do not advance the timeline and so will always consume additional lanes if they are scheduled simultaneously. Adding a delay of at least one coarse RTIO cycle will prevent this (e.g. delay_mu(np.int64(self.core.ref_multiplier))).
Whether a particular sequence of timestamps causes a sequence error or not is fully deterministic (starting from a known RTIO state, e.g. after a reset). Adding a constant offset to the sequence will not affect the result.
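The hard limit in the list above suggests a cheap static check: count how many events fall into each coarse cycle. The helper below is a hypothetical sketch assuming 8 machine units per coarse cycle and the default 8 lanes. Exceeding the lane count guarantees a sequence error, but staying under it does not rule one out, since interleaved timestamps across cycles can still exhaust the lanes.

```python
from collections import Counter

REF_MULTIPLIER = 8  # assumed mu per coarse cycle (self.core.ref_multiplier on hardware)
N_LANES = 8         # assumed default number of SED lanes


def max_events_per_coarse_cycle(timestamps_mu):
    """Largest number of output events sharing one coarse cycle."""
    per_cycle = Counter(t // REF_MULTIPLIER for t in timestamps_mu)
    return max(per_cycle.values(), default=0)


# Ten zero-duration calls on ten different channels, with no delay in between
simultaneous = [1000] * 10
# Eight events with distinct fine timestamps inside one coarse cycle (1000..1007)
same_cycle = list(range(1000, 1008))
# Ten calls, each followed by delay_mu(REF_MULTIPLIER)
spread = [1000 + k * REF_MULTIPLIER for k in range(10)]

for name, ts in [("simultaneous", simultaneous), ("same_cycle", same_cycle), ("spread", spread)]:
    n = max_events_per_coarse_cycle(ts)
    print(name, n, "exceeds lane limit" if n > N_LANES else "within lane limit")
```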

Note

To change the number of SED lanes, it is necessary to recompile the gateware and reflash your core device. Use the sed_lanes field in your system description file to set the value, then follow the instructions in Building and developing ARTIQ. Alternatively, if you have an active firmware subscription with M-Labs, contact helpdesk@ for edited binaries.

Collisions

A collision occurs when events are submitted to a given RTIO output channel at a resolution the channel is not equipped to handle. Some channels implement ‘replacement behavior’, meaning that RTIO events submitted to the same timestamp will override each other (for example, if a ttl.off() and ttl.on() are scheduled to the same timestamp, the latter automatically overrides the former and only ttl.on() will be submitted to the channel). On the other hand, if replacement behavior is absent or disabled, or if the two events have the same coarse timestamp with differing fine timestamps, a collision error will be reported.

Like sequence errors, collisions originate in gateware and do not stop the execution of the kernel. The offending event is discarded and the problem is reported asynchronously via the core log.
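The replacement and collision rules can be summarized in a toy channel model. ChannelModel is hypothetical and heavily simplified: it assumes 8 machine units per coarse cycle and treats any two events in the same coarse cycle as conflicting unless replacement applies, which is one reading of the rules above rather than a description of any particular gateware.

```python
REF_MULTIPLIER = 8  # assumed machine units per coarse cycle


class ChannelModel:
    """Toy output channel: one pending event per coarse cycle."""

    def __init__(self, replace=True):
        self.replace = replace  # whether replacement behavior is enabled
        self.pending = {}       # coarse timestamp -> (fine timestamp, value)
        self.collisions = []    # events discarded with a collision error

    def submit(self, t_mu, value):
        coarse = t_mu // REF_MULTIPLIER
        if coarse not in self.pending:
            self.pending[coarse] = (t_mu, value)
            return "queued"
        old_t, _ = self.pending[coarse]
        if self.replace and old_t == t_mu:
            self.pending[coarse] = (t_mu, value)  # later event wins
            return "replaced"
        self.collisions.append((t_mu, value))     # discarded, logged
        return "collision"


ch = ChannelModel(replace=True)
print(ch.submit(1000, 0))   # queued    (ttl.off())
print(ch.submit(1000, 1))   # replaced  (ttl.on() at the same timestamp)
print(ch.submit(1003, 0))   # collision (same coarse cycle, different fine timestamp)

strict = ChannelModel(replace=False)
strict.submit(1000, 0)
print(strict.submit(1000, 1))  # collision (no replacement behavior)
```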

Busy errors

A busy error occurs when at least one output event could not be executed because the output channel was already busy executing an event. This differs from a collision error in that a collision is triggered when a sequence of events overwhelms communication with a channel, and a busy error is triggered when execution is overwhelmed. Busy errors are only possible in the context of single events with execution times longer than a coarse RTIO clock cycle; the exact parameters will depend on the nature of the output channel (e.g. the specific peripheral device).

Offending event(s) are discarded and the problem is reported asynchronously via the core log.
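A busy error can be pictured with a channel that needs several coarse cycles to execute each event. The execute() helper and the execution time used below (24 mu, i.e. three coarse cycles of 8 mu) are assumptions for illustration only; real values depend on the peripheral.

```python
def execute(timestamps_mu, exec_time_mu):
    """Toy channel: an event arriving while the previous one runs is dropped."""
    busy_until = None
    executed, busy_errors = [], []
    for t in sorted(timestamps_mu):
        if busy_until is not None and t < busy_until:
            busy_errors.append(t)           # channel still busy: event discarded
        else:
            executed.append(t)
            busy_until = t + exec_time_mu   # channel occupied until this time
    return executed, busy_errors


print(execute([0, 16, 40], exec_time_mu=24))  # ([0, 40], [16])
```

Note that the events at 0 and 16 mu fall in different coarse cycles, so there is no collision; the failure comes purely from execution time.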

Input channels and events

Input channels detect events, timestamp them, and place them in a buffer for the experiment to read out. The following example counts the rising edges occurring during a precisely timed 500 ns interval. If more than 20 rising edges are received, it outputs a pulse:

if input.count(input.gate_rising(500*ns)) > 20:
    delay(2*us)
    output.pulse(500*ns)

Note that many input methods will necessarily involve the wall clock catching up to the timeline cursor or advancing beyond it. This is to be expected: managing output events means planning the future, but managing input events means reacting to the past. For input channels, it is the past that is under discussion.

In this case, gate_rising() opens the input gate at the current cursor position, advances the cursor by the 500 ns interval (the gate window), closes the gate, and returns the timestamp of the end of the window; the gateware records an event for each rising edge detected while the gate is open. count() then waits until the wall clock has passed the end of the window, unloads the recorded events from the input buffer, and counts them. By the time it returns, the wall clock has necessarily advanced past the cursor (rtio_counter_mu > now_mu). Accordingly, before we place any further output events, a delay() is necessary to re-establish positive slack.

Similar situations arise with methods such as TTLInOut.sample_get and TTLInOut.watch_done.
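The slack accounting for the gated count can be traced with plain integers. count_rising is a hypothetical helper, not the driver method; the edge timestamps, the half-open gate window, the 300 mu readout time and the 1 ns machine unit are all illustrative assumptions.

```python
def count_rising(edges_mu, gate_start_mu, duration_mu):
    """Toy version of count(gate_rising(duration)) on recorded edge timestamps.

    Returns the number of edges in [gate_start_mu, gate_start_mu + duration_mu)
    and the gate end timestamp, which becomes the new cursor position.
    """
    gate_end_mu = gate_start_mu + duration_mu
    n = sum(1 for t in edges_mu if gate_start_mu <= t < gate_end_mu)
    return n, gate_end_mu


edges = range(10, 1000, 20)           # hypothetical rising edges every 20 ns
n, now_mu = count_rising(edges, 0, 500)
print(n, now_mu)                      # 25 500

# count() returns only after the wall clock has passed the gate end; assume
# readout took a further 300 mu (an arbitrary illustrative figure).
rtio_counter_mu = now_mu + 300
print(now_mu - rtio_counter_mu)       # -300: negative slack
now_mu += 2000                        # delay(2*us)
print(now_mu - rtio_counter_mu)       # 1700: positive slack restored
```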

Overflow exceptions

The RTIO input channels buffer input events received while an input gate is open, or when using the sampling API (TTLInOut.sample_input) at certain points in time. The events are kept in a FIFO until the CPU reads them out via e.g. count(), timestamp_mu() or sample_get(). The size of these FIFOs is finite and specified in gateware; in practice, it is limited by the resources available to the FPGA, and therefore differs depending on the specific core device being used. If a FIFO is full and another event comes in, this causes an overflow condition. The condition is converted into an RTIOOverflow exception that is raised on a subsequent invocation of one of the readout methods. Overflow exceptions are generally best dealt with simply by reading out from the input buffers more frequently. In unusual cases, users may consider modifying the length of individual buffers in gateware.
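The deferred reporting of overflows can be sketched with a bounded queue and a sticky flag. RTIOOverflowModel and InputFIFOModel are hypothetical; in particular, which event is lost on overflow is an assumption of this model (here, the incoming one), and the -1 return value only loosely mirrors timestamp_mu() returning -1 on timeout.

```python
from collections import deque


class RTIOOverflowModel(Exception):
    """Stand-in for artiq.coredevice.exceptions.RTIOOverflow."""


class InputFIFOModel:
    def __init__(self, depth):
        self.fifo = deque()
        self.depth = depth       # fixed size, as if set in gateware
        self.overflow = False    # sticky flag raised by the "gateware"

    def record(self, t_mu):
        # Gateware side: in this model an event arriving at a full FIFO is lost
        if len(self.fifo) >= self.depth:
            self.overflow = True
        else:
            self.fifo.append(t_mu)

    def read(self):
        # CPU side: the overflow is reported on the next readout call
        if self.overflow:
            self.overflow = False
            raise RTIOOverflowModel("input FIFO overflowed")
        return self.fifo.popleft() if self.fifo else -1


slow = InputFIFOModel(depth=4)
for t in range(6):          # six events arrive before any readout
    slow.record(t)
try:
    slow.read()
except RTIOOverflowModel as e:
    print("overflow:", e)

fast = InputFIFOModel(depth=4)
seen = []
for t in range(6):          # same events, read out as they arrive
    fast.record(t)
    seen.append(fast.read())
print(seen)                 # [0, 1, 2, 3, 4, 5]
```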

Note

It is not possible to provoke an RTIOOverflow on an RTIO output channel. While output buffers are also of finite size, and can be filled up, the CPU will simply stall the submission of further events until it is once again possible to buffer them. Among other things, this means that padding the timeline cursor with large amounts of positive slack is not always a valid strategy to avoid RTIOUnderflow exceptions when generating fast event sequences. In practice only a fixed number of events can be generated in advance, and the rest of the processing will be carried out when the wall clock is much closer to now_mu.

For larger numbers of events which run up against this restriction, the correct method is to use Direct Memory Access (DMA). In edge cases, enabling event spreading (see below) may fix the problem.
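The argument about padding with slack can be checked with a small discrete-time simulation. first_underflow is a hypothetical model: producing each event costs a fixed amount of wall-clock time, events are evenly spaced on the timeline, and when the output FIFO of fixed capacity is full the CPU stalls until the oldest queued event has been executed. All numbers are arbitrary illustrative values in mu.

```python
from collections import deque


def first_underflow(n_events, capacity, initial_slack_mu, spacing_mu, cpu_cost_mu):
    """Index of the first event that would raise RTIOUnderflow, or None.

    Toy model: producing each event costs cpu_cost_mu of wall-clock time,
    events are spaced spacing_mu apart on the timeline, and the output FIFO
    holds at most `capacity` events that have not yet been executed.
    """
    wall = 0
    cursor = initial_slack_mu
    queued = deque()  # timestamps submitted but not yet executed
    for i in range(n_events):
        wall += cpu_cost_mu                    # CPU prepares the next event
        while queued and queued[0] <= wall:    # events the gateware has executed
            queued.popleft()
        if len(queued) >= capacity:            # FIFO full: wait for the oldest event
            wall = queued.popleft()
        if cursor < wall:
            return i                           # timestamp already in the past
        queued.append(cursor)
        cursor += spacing_mu
    return None


# CPU needs 150 mu per event, but events are only 100 mu apart.
for slack in (10_000, 1_000_000):
    print(slack,
          first_underflow(1000, 4, slack, 100, 150),      # small FIFO
          first_underflow(1000, 10**9, slack, 100, 150))  # effectively unbounded
# 10000 13 198
# 1000000 13 None
```

With a small FIFO, the first stall pulls the wall clock up to the first queued timestamp, so a hundredfold increase in initial slack does not delay the underflow at all; only an effectively unbounded buffer (which is what DMA approximates) lets the slack be spent gradually.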

Event spreading

By default, the SED only ever switches lanes for timestamp sequence reasons, as described above in Sequence errors. If only output events of strictly increasing coarse timestamps are queued, the SED fills up a single lane and stalls when it is full, regardless of the state of other lanes. This is preserved to avoid nondeterminism in sequence errors and corresponding unpredictable failures (since the timing of ‘fullness’ depends on the timing of when events are queued, which can vary slightly based on CPU execution jitter).

For better utilization of resources and to maximize buffering capacity, event spreading may be enabled, which allows the SED to switch lanes immediately when they reach a certain high watermark of ‘fullness’, increasing the number of events that can be queued before stalls ensue. To enable event spreading, use the sed_spread_enable config key and set it to 1:

$ artiq_coremgmt config write -s sed_spread_enable 1

This will change where and when sequence errors occur in your kernels, and might cause them to vary from execution to execution of the same experiment. It will generally reduce or eliminate RTIOUnderflow exceptions caused by queueing stalls and significantly increase the threshold on sequence length before DMA becomes necessary.

Note that event spreading can be particularly helpful in DRTIO satellites, as it is the space remaining in the fullest FIFO that is used as a metric for when the satellite can receive more data from the master. The setting is not system-wide and can and must be set independently for each core device in a system. In other words, to enable or disable event spreading in satellites, flash the satellite core configuration directly; this will have no effect on any other satellites or the master.

Seamless handover

The timeline cursor persists across kernel invocations. This is demonstrated in the following example where a pulse is split across two kernels:

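from artiq.experiment import *


class Handover(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("ttl0")  # assumed TTL output in the device database

    def run(self):
        self.k1()
        self.k2()

    @kernel
    def k1(self):
        self.ttl0.on()
        delay(1*s)

    @kernel
    def k2(self):
        self.ttl0.off()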

Here, run() calls k1() which exits leaving the cursor one second after the rising edge and k2() then submits a falling edge at that position.
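The handover works because the cursor belongs to the core device state rather than to any one kernel. A toy model with a persistent state object makes this explicit; CoreState and the plain functions standing in for the kernels are hypothetical, and a 1 ns machine unit is assumed.

```python
ONE_SECOND_MU = 1_000_000_000  # assumed 1 ns machine unit


class CoreState:
    """Toy model of state that persists between kernel invocations."""

    def __init__(self, now_mu=0):
        self.now_mu = now_mu   # timeline cursor, survives kernel exit
        self.events = []       # (timestamp_mu, level) for a single TTL


def k1(core):
    core.events.append((core.now_mu, 1))  # ttl.on()
    core.now_mu += ONE_SECOND_MU          # delay(1*s)


def k2(core):
    core.events.append((core.now_mu, 0))  # ttl.off()


def run(core):
    k1(core)  # exits with the cursor 1 s after the rising edge
    k2(core)  # starts from the cursor k1 left behind


core = CoreState(now_mu=5000)
run(core)
print(core.events)  # [(5000, 1), (1000005000, 0)]
```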

Synchronization

The seamless handover of the timeline (cursor and events) across kernels and experiments implies that a kernel can exit long before the events it has submitted have been executed. Generally, this is preferable: it frees up resources for the next kernel and allows work to be carried on from kernel to kernel without interruptions.

However, as a result, no guarantees are made about the state of the system when a new kernel enters. Slack may be positive, negative, or zero; input channels may be filled to overflowing, or empty; output channels may contain events currently being executed, contain events scheduled for the far future, or contain no events at all. Unexpected negative slack can cause RTIOUnderflows. Unexpected large positive slack may cause a system to appear to ‘lock’, as all its events are scheduled for a distant future and the CPU must wait for the output buffers to empty to continue.

As a result, when beginning a new experiment in an uncertain context, we often want to clear the RTIO FIFOs and initialize the timeline cursor to a reasonable point in the near future. The method artiq.coredevice.core.Core.reset() (self.core.reset()) is provided for this purpose. The example idle kernel implements this mechanism.

On the other hand, if a kernel exits while some of its events are still waiting to be executed, there is no guarantee that those events will ever be executed (they may instead be flushed out by a subsequent core reset). If a kernel should wait until all its events have been executed, use the method wait_until_mu() with a timestamp at or after the last event.

In many cases, now_mu() will return an appropriate timestamp:

self.core.wait_until_mu(now_mu())
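Why waiting until now_mu() suffices can be seen in a toy model: every submitted output event has a timestamp at or before the cursor, so once the wall clock reaches the cursor, all of them have been executed. CoreModel is hypothetical; its wait_until_mu() simply jumps the wall clock forward instead of spinning until rtio_counter_mu reaches the target.

```python
class CoreModel:
    """Toy core device: wall clock plus output events awaiting execution."""

    def __init__(self):
        self.rtio_counter_mu = 0
        self.now_mu = 0
        self.pending = []   # submitted, not yet executed
        self.executed = []

    def submit(self):
        self.pending.append(self.now_mu)

    def wait_until_mu(self, t_mu):
        # Stand-in for Core.wait_until_mu(): wait until rtio_counter_mu >= t_mu.
        # Here the wall clock jumps forward and due events "execute".
        self.rtio_counter_mu = max(self.rtio_counter_mu, t_mu)
        due = [t for t in self.pending if t <= self.rtio_counter_mu]
        self.pending = [t for t in self.pending if t > self.rtio_counter_mu]
        self.executed.extend(sorted(due))


core = CoreModel()
core.now_mu = 1000
core.submit()              # e.g. ttl.on() at 1000
core.now_mu += 2000
core.submit()              # e.g. ttl.off() at 3000; cursor is at the last event
core.wait_until_mu(2000)
print(core.pending)        # [3000]: exiting here could lose the last event
core.wait_until_mu(core.now_mu)
print(core.pending, core.executed)  # [] [1000, 3000]
```

If a trailing delay has moved the cursor past the last event, waiting until now_mu() still works but waits somewhat longer than strictly necessary.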