AprilTags look eerily similar to QR codes. In fact, they could pass for greatly simplified QR codes, with big black squares.
These are used as fiducial markers in robotics and elsewhere.
A fiducial marker, or fiducial, is an object placed in the field of view of an imaging system that appears in the image produced, for use as a point of reference or a measure.
Though interesting, this raises the question: why are QR codes not sufficient, given that they can hold much more information? Find out in this post!
What are AprilTags?
First of all, APRIL is a reputable robotics lab at the University of Michigan that has even been awarded the Classic Papers: Articles That Have Stood The Test of Time label from Google Scholar for the robotics field. AprilTag is a library and algorithm that lets a system estimate the full 6 degrees of freedom (DOF) pose of a tag, i.e. its position and orientation relative to the camera, from a single image. The 6 DOF are 3 for translation and 3 for rotation.
AprilTags come in different formats, called families.
The circular family can be used to display a tag in a circular area, and the custom family leaves space in the middle to include yet another tag.
Each tag also has an ID. NVIDIA's Isaac engine, for example, returns each detected tag with an identifier of the form <tagFamily>_<tagID>, such as tag36h11_7, along with other information.
In the table above [5], we can see the differences between the tag families; more importantly, we see that the 16H5 family has only 30 tags.
The ID simply identifies a tag within its family. Since each family has a fixed number of tags, the tag images are already generated for you in this repo: you can download them, print them, and assign meanings to the IDs. Note that GitHub may not list all the images.
There is also a similar system called ArUco, which is easier to work with, but AprilTag is better suited to serious applications where speed and robustness are priorities.
These systems started with ARToolKit [2], which matched detected markers against patterns stored in a database, so the database, and with it the matching work, grew with the number of tags. Over time, better algorithms were developed.
Meaning of AprilTag names
In 36H11, 36 is the number of data bits and 11 is the minimum Hamming distance between codes. More bits means more tags to choose from, and a higher Hamming distance means a tag can still be identified correctly despite bit errors. The Hamming distance between two equal-length strings or vectors is the number of positions at which they differ, i.e. the number of substitutions needed to turn one into the other. AprilTag maintains a minimum Hamming distance between tags to make detection more reliable, but this also means that fewer tags are available. AprilTag also ensures that this minimum is maintained even when the tags are rotated.
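To make the Hamming distance and the rotation requirement concrete, here is a minimal Python sketch. It assumes a code is an n x n grid of bits packed row by row into an integer (bit r*n + c holds cell (r, c)); this packing is an illustration, not necessarily AprilTag's own bit order. As a rule of thumb, a family with minimum distance d can in principle correct up to ⌊(d - 1)/2⌋ flipped bits, e.g. 5 for 36H11.

```python
# Minimal sketch, assuming an n x n tag code packed row-major into an int
# (bit r * n + c is cell (r, c)). Not AprilTag's exact bit layout.

def hamming(a: int, b: int) -> int:
    """Number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

def rotate90(code: int, n: int) -> int:
    """Rotate an n x n bit grid by 90 degrees clockwise."""
    out = 0
    for r in range(n):
        for c in range(n):
            if code >> (r * n + c) & 1:
                # Clockwise rotation moves cell (r, c) to (c, n - 1 - r).
                out |= 1 << (c * n + (n - 1 - r))
    return out

def rotations(code: int, n: int) -> list[int]:
    """The code itself followed by its 90, 180 and 270 degree rotations."""
    rots = [code]
    for _ in range(3):
        rots.append(rotate90(rots[-1], n))
    return rots

def rotation_aware_distance(a: int, b: int, n: int) -> int:
    """Smallest Hamming distance between a and any rotation of b."""
    return min(hamming(a, rb) for rb in rotations(b, n))
```

Two codes that look different as integers can be the same tag seen at another angle, which is why the rotation-aware distance is the one a family has to keep large.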
The more bits a family has, the more pixels are needed to decode a tag. So, for tags printed at the same size, families with fewer bits can be read from farther away: 16H5, for example, can be read at longer distances than 52H13.
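A rough way to see this is the pinhole camera model: a tag of width S at distance Z spans about f·S/Z pixels, and that width is shared among all the cells across the tag. The sketch below uses hypothetical numbers (a 600-pixel focal length, a 0.2 m tag, at least 2 pixels per cell) and assumed grid widths of 8 and 10 cells; it ignores perspective tilt and blur.

```python
# Rough pinhole-camera estimate of image pixels per tag cell for a tag facing
# the camera. All numbers below are hypothetical: focal length in pixels,
# tag edge length in metres, and the number of cells across the full tag.

def pixels_per_cell(focal_px, tag_size_m, distance_m, cells_across):
    tag_width_px = focal_px * tag_size_m / distance_m  # projected tag width
    return tag_width_px / cells_across

def max_range(focal_px, tag_size_m, cells_across, min_px_per_cell=2.0):
    """Farthest distance at which every cell still spans min_px_per_cell pixels."""
    return focal_px * tag_size_m / (cells_across * min_px_per_cell)

# Same printed size, fewer cells across -> usable from farther away.
coarse = max_range(600, 0.2, cells_across=8)   # assumed width of a small family
fine = max_range(600, 0.2, cells_across=10)    # assumed width of a denser family
```

With these assumptions the coarser grid stays readable out to about 7.5 m and the denser one to about 6 m.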
The difference between AprilTags and QR codes
In contrast with QR codes, an AprilTag “is designed to be automatically detected and localized even when it is at very low resolution, unevenly lit, oddly rotated, or tucked away in the corner of an otherwise cluttered image” [1].
The localization markers of a QR code alone take up 268 pixels, whereas an entire AprilTag, payload included, takes only 49 to 100 pixels [1].
And since AprilTag is designed as a fiducial system rather than a 2D barcode, its detector is built to find multiple tags in a single image.
The evolution of AprilTag
So far, four AprilTag papers have been published. Here is a summary of each of them; note that they involve image processing algorithms.
AprilTag: A robust and flexible visual fiducial system (2011)
This paper introduces the AprilTag system and its improvements over existing systems. The system consists of two parts: the detector and the coding system.
First, the detector pre-processes the image. The main focus here is the third step in the figure below, where a graph-based method clusters pixels with similar gradient directions.
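The clustering idea can be sketched as follows: compute each pixel's gradient magnitude and direction, then merge neighbouring strong-gradient pixels whose directions are close. This is a simplified stand-in, assuming fixed thresholds (the hypothetical defaults `min_mag` and `max_angle`) where the paper uses an adaptive graph-based criterion; the initial blur is also omitted.

```python
import math

# Simplified sketch of AprilTag-1-style edge clustering. Assumption: fixed
# magnitude and angle thresholds replace the paper's adaptive criterion.
# The image is a list of rows of grayscale values.

def gradients(img):
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # central differences
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def cluster_edges(img, min_mag=10.0, max_angle=0.3):
    """Group neighbouring edge pixels whose gradient directions are similar."""
    h, w = len(img), len(img[0])
    mag, ang = gradients(img)
    parent = list(range(h * w))
    for y in range(h):
        for x in range(w):
            if mag[y][x] < min_mag:
                continue
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and mag[ny][nx] >= min_mag:
                    d = abs(ang[y][x] - ang[ny][nx])
                    d = min(d, 2 * math.pi - d)  # wrap-around angle difference
                    if d < max_angle:
                        a, b = find(parent, y * w + x), find(parent, ny * w + nx)
                        parent[a] = b
    clusters = {}
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= min_mag:
                clusters.setdefault(find(parent, y * w + x), []).append((x, y))
    return list(clusters.values())
```

Each resulting cluster is a candidate edge to which a line segment can then be fitted.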
Once it has an image made up of line segments (the fourth step above), it detects quads, i.e. segments that form a four-sided shape. It handles occlusions by accepting quads even when there are significant gaps along the edges.
Next, it estimates the position and orientation of the tag by computing a homography that maps tag coordinates to image coordinates, then recovering the pose from it using the camera's focal length and the physical size of the tag.
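One way to find such quads is a depth-limited search: start from any segment, then repeatedly extend the chain with a segment that begins close to where the previous one ended, and accept the chain if the fourth segment ends near the start of the first. The sketch below assumes each segment is a pair of endpoints with a consistent direction and uses a hypothetical 3-pixel gap tolerance; the winding-order check a real detector would add is omitted.

```python
import math

# Sketch of a depth-4 search for quads over line segments. Assumptions: each
# segment is ((x0, y0), (x1, y1)) with a consistent direction, "close" means
# within a fixed pixel threshold, and winding-order checks are omitted.

def close(p, q, thresh):
    return math.dist(p, q) <= thresh

def find_quads(segments, thresh=3.0):
    quads = []

    def search(path):
        last_end = segments[path[-1]][1]
        if len(path) == 4:
            # The fourth segment must end near where the first one started.
            if close(last_end, segments[path[0]][0], thresh):
                quads.append(tuple(path))
            return
        for j, (start, _) in enumerate(segments):
            # Requiring j > path[0] reports each cycle once (smallest index first).
            if j > path[0] and j not in path and close(last_end, start, thresh):
                search(path + [j])

    for i in range(len(segments)):
        search([i])
    return quads
```

The gap tolerance is what lets a quad survive small breaks in its edges, such as those caused by partial occlusion.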
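The homography itself can be estimated from the four detected corners with a direct linear transform. This sketch assumes tag coordinates with corners at (±1, ±1) and fixes the bottom-right entry of H to 1, which turns the problem into an 8x8 linear system; recovering the rotation and translation from H (using the focal length and tag size) is not shown.

```python
# Sketch: estimate the homography H mapping tag coordinates (x, y) to image
# pixels (u, v) from four corner correspondences. Assumption: H[2][2] = 1,
# giving an 8x8 linear system (a simple form of the DLT).

def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography(tag_pts, img_pts):
    a, b = [], []
    for (x, y), (u, v) in zip(tag_pts, img_pts):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def project(H, x, y):
    """Map a tag-coordinate point into the image."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

TAG_CORNERS = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
```

For example, `homography(TAG_CORNERS, [(50, 50), (150, 50), (150, 150), (50, 150)])` describes a tag seen head-on, scaled by 50 and centred at (100, 100).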
Then it uses this mapping to project each cell of the tag back into the image and thresholds the sampled pixels with a custom threshold model, turning them into black and white bits.
Finally, the payload is extracted and checked against the family's codes to determine whether it is a valid tag.
The coding system generates its codes as lexicodes, with an algorithm that ensures a minimum Hamming distance between codes even when the tags are rotated. It also ensures that the resulting patterns are not too simple, for example, a plain black square. This was an improvement over previous systems. Generating codes for bigger families can take days, but many useful code families are already included with the software.
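Reading the bits amounts to sampling the image at each cell centre and classifying it as black or white. In this sketch, `to_image` is a hypothetical callable that maps tag coordinates in [-1, 1] x [-1, 1] to pixels (for instance via the homography), and a single global threshold halfway between the darkest and brightest samples stands in for the paper's own threshold model.

```python
# Sketch of reading bits from a detected tag. Assumptions: `to_image` maps tag
# coordinates in [-1, 1] x [-1, 1] to pixel coordinates, `cells` is the number
# of cells across the tag, and one threshold halfway between the darkest and
# brightest samples replaces the paper's threshold model.

def read_bits(img, to_image, cells):
    samples = []
    for r in range(cells):
        row = []
        for c in range(cells):
            # Centre of cell (r, c) in tag coordinates.
            x = -1 + (2 * c + 1) / cells
            y = -1 + (2 * r + 1) / cells
            u, v = to_image(x, y)
            row.append(img[round(v)][round(u)])  # nearest-pixel sample
        samples.append(row)
    lo = min(min(row) for row in samples)
    hi = max(max(row) for row in samples)
    thresh = (lo + hi) / 2
    return [[1 if s > thresh else 0 for s in row] for row in samples]  # 1 = white
```

A global threshold is easy to follow but sensitive to uneven lighting across the tag, which is exactly what a spatially adaptive model is meant to handle.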
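A lexicode is built greedily: scan candidate codes in order and keep a candidate only if it is far enough from every code already kept. The sketch below adds the rotation requirement. It assumes plain numeric scanning order (a simplification) and filters out "too simple" patterns with a hypothetical minimum number of colour changes between adjacent cells, not the paper's own complexity test. It also requires each code to be far from its own rotations, so that the tag's orientation is unambiguous.

```python
# Sketch of greedy, rotation-aware lexicode generation for n x n tags.
# Assumptions: numeric scanning order and a transition-count simplicity filter
# stand in for the paper's exact choices.

def hamming(a, b):
    return bin(a ^ b).count("1")

def rotate90(code, n):
    out = 0
    for r in range(n):
        for c in range(n):
            if code >> (r * n + c) & 1:
                out |= 1 << (c * n + (n - 1 - r))
    return out

def rotations(code, n):
    rots = [code]
    for _ in range(3):
        rots.append(rotate90(rots[-1], n))
    return rots

def transitions(code, n):
    """Number of horizontally or vertically adjacent cells with different bits."""
    bit = lambda r, c: code >> (r * n + c) & 1
    t = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n and bit(r, c) != bit(r, c + 1):
                t += 1
            if r + 1 < n and bit(r, c) != bit(r + 1, c):
                t += 1
    return t

def generate_family(n, min_dist, min_transitions=2):
    family = []
    for code in range(1 << (n * n)):
        if transitions(code, n) < min_transitions:
            continue  # reject overly simple patterns, e.g. an all-black square
        rots = rotations(code, n)
        if any(hamming(code, r) < min_dist for r in rots[1:]):
            continue  # too similar to its own rotation
        if all(hamming(r, f) >= min_dist for f in family for r in rots):
            family.append(code)
    return family
```

Even for tiny 3 x 3 codes this scans all 512 candidates; for 36-bit codes the search space is 2^36, which is why generating large families takes so long.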
One of the biggest advantages of AprilTag was that, besides being robust, it was also open source. Many previous systems were closed source, which meant less scrutiny and probing. Being open source also helped AprilTag become widely adopted.
AprilTag 2: Efficient and robust fiducial detection (2016)
AprilTag 2 redesigned the tag detector to improve detection speed and sensitivity, at the cost of the ability to detect partially occluded tags, a capability the authors found was rarely needed in practice. Along with the improved detection algorithm, it introduced a new, efficient boundary segmentation method.
The paper first reworked the thresholding step, switching to an adaptive thresholding method (panel (b) above) that excludes regions with insufficient contrast, shown in gray. Skipping these regions saves computation time.
To identify quads, instead of tracing chains of dark pixels in the black-and-white image to form lines (a chain that a single white pixel can break), it segments the image into connected black and white regions and extracts the boundary between each adjacent black and white region. A pixel can therefore belong to more than one boundary, so a thin white strip that would have broken a line becomes part of both segments.
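The tile-based thresholding can be sketched as follows: compute the minimum and maximum intensity of each small tile, widen them over the neighbouring tiles, and threshold each pixel halfway between them, marking low-contrast pixels as unknown. The 4-pixel tiles, the 3 x 3 tile neighbourhood and the `min_contrast` value are assumptions for illustration, and 8-bit grayscale input is assumed.

```python
# Sketch of tile-based adaptive thresholding in the spirit of AprilTag 2.
# Assumptions: 8-bit grayscale input, 4x4-pixel tiles, extrema widened over a
# 3x3 tile neighbourhood, and a hypothetical minimum contrast of 20 levels.
# Output: 0 = black, 255 = white, 127 = unknown (insufficient contrast).

def adaptive_threshold(img, tile=4, min_contrast=20):
    h, w = len(img), len(img[0])
    th, tw = (h + tile - 1) // tile, (w + tile - 1) // tile
    # 1) Min and max intensity inside each tile.
    tmin = [[255] * tw for _ in range(th)]
    tmax = [[0] * tw for _ in range(th)]
    for y in range(h):
        for x in range(w):
            ty, tx = y // tile, x // tile
            tmin[ty][tx] = min(tmin[ty][tx], img[y][x])
            tmax[ty][tx] = max(tmax[ty][tx], img[y][x])
    # 2) Widen each tile's range over its 3x3 tile neighbourhood.
    nmin = [[0] * tw for _ in range(th)]
    nmax = [[0] * tw for _ in range(th)]
    for ty in range(th):
        for tx in range(tw):
            cells = [(j, i) for j in range(max(ty - 1, 0), min(ty + 2, th))
                     for i in range(max(tx - 1, 0), min(tx + 2, tw))]
            nmin[ty][tx] = min(tmin[j][i] for j, i in cells)
            nmax[ty][tx] = max(tmax[j][i] for j, i in cells)
    # 3) Threshold each pixel halfway between its tile's min and max.
    out = [[127] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = nmin[y // tile][x // tile], nmax[y // tile][x // tile]
            if hi - lo >= min_contrast:
                out[y][x] = 255 if img[y][x] > (lo + hi) / 2 else 0
    return out
```

Pixels left at 127 can be skipped by the later stages, which is where the time saving comes from.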
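The segmentation can be sketched in two passes: union neighbouring pixels of the same colour into components, then send every black/white neighbour pair to a cluster keyed by the pair of component IDs. This is a simplified illustration assuming 4-connectivity, a ternary input image (0 black, 255 white, 127 unknown), and boundary points stored as the midpoint between the two pixels.

```python
# Sketch of AprilTag-2-style boundary segmentation on a ternary image
# (0 = black, 255 = white, 127 = unknown). Assumptions: 4-connectivity and
# boundary points stored as the midpoint between the two neighbouring pixels.

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def boundary_clusters(img):
    h, w = len(img), len(img[0])
    parent = list(range(h * w))
    # 1) Union neighbouring pixels of the same known colour into components.
    for y in range(h):
        for x in range(w):
            if img[y][x] == 127:
                continue
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and img[ny][nx] == img[y][x]:
                    a, b = find(parent, y * w + x), find(parent, ny * w + nx)
                    parent[a] = b
    # 2) Each black/white neighbour pair adds a point to the cluster keyed by
    #    the two component ids, so one pixel can lie on several boundaries.
    clusters = {}
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and {img[y][x], img[ny][nx]} == {0, 255}:
                    key = tuple(sorted((find(parent, y * w + x), find(parent, ny * w + nx))))
                    clusters.setdefault(key, []).append(((x + nx) / 2, (y + ny) / 2))
    return clusters
```

In a one-pixel-wide white column between two black regions, the white pixels contribute to two separate clusters, one for each black neighbour.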
Then it tries to fit a quad to each cluster of points, which amounts to finding the corner points that partition the cluster into four edges. It first sorts the points in winding order around their centroid, so that neighboring boundary points are stored next to each other. It then precomputes cumulative first and second moments, so that the line-fit statistics for any contiguous range of points can be computed in constant time. Using these, it identifies candidate corners by fitting lines to windows of adjacent points, then tries combinations of candidate corners and keeps the four that give the smallest mean squared line-fit error.
Poor quads are then filtered out, and the remaining ones are decoded by comparing their contents with the known codewords.
Ideally, the detected code would be XORed with every code in the family and identified as the tag with the smallest Hamming distance. Instead, only matches with up to two bit errors are considered: the codes within that distance are enumerated and stored in advance to speed up decoding.
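The constant-time trick works because a line fit only needs the count and the sums of x, y, x², xy and y² of the points involved; with running totals, the sums for any range are a subtraction of two entries. This sketch assumes unweighted points and a total-least-squares fit (the paper's weighting is omitted), and it does not handle ranges that wrap around the end of the sorted list.

```python
import math

# Sketch of the constant-time line-fit trick. Assumptions: points are (x, y)
# tuples from one boundary cluster, the fit is unweighted total least squares,
# and wrap-around ranges are not handled.

def sort_by_angle(points):
    """Order points by angle around their centroid (winding order)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def prefix_moments(points):
    """Running sums of n, x, y, xx, xy, yy; entry k covers points[:k]."""
    acc = [(0, 0.0, 0.0, 0.0, 0.0, 0.0)]
    for x, y in points:
        n, sx, sy, sxx, sxy, syy = acc[-1]
        acc.append((n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y, syy + y * y))
    return acc

def line_fit_mse(acc, i, j):
    """Mean squared perpendicular distance of points[i:j] to their best line, in O(1)."""
    n, sx, sy, sxx, sxy, syy = (b - a for a, b in zip(acc[i], acc[j]))
    mx, my = sx / n, sy / n
    cxx = sxx / n - mx * mx
    cxy = sxy / n - mx * my
    cyy = syy / n - my * my
    # Smallest eigenvalue of the 2x2 covariance matrix.
    return max(0.0, (cxx + cyy) / 2 - math.sqrt(((cxx - cyy) / 2) ** 2 + cxy * cxy))
```

Sliding a window along the sorted points and evaluating `line_fit_mse` for each window then costs O(1) per window, which keeps corner search cheap; corners show up where a window straddles two edges and the fit error rises.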
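XORing two codes and counting the set bits gives their Hamming distance; the precomputed table turns decoding into a single lookup instead. This sketch assumes codes are `nbits`-bit integers, uses a hypothetical two-code family, and ignores rotations for brevity (a real decoder also tries the four rotations). For a 36-bit code and up to two errors, each codeword has 1 + 36 + 630 = 667 table entries.

```python
from itertools import combinations

# Sketch of fast decoding with a precomputed table. Assumptions: codes are
# `nbits`-bit integers, the family is hypothetical, and rotations are ignored.

def build_table(family, nbits, max_errors=2):
    """Map every codeword with up to `max_errors` flipped bits to (tag id, errors)."""
    table = {}
    for tag_id, code in enumerate(family):
        for k in range(max_errors + 1):
            for flips in combinations(range(nbits), k):
                variant = code
                for b in flips:
                    variant ^= 1 << b
                table.setdefault(variant, (tag_id, k))
    return table

def decode(table, observed):
    """Return (tag id, bit errors), or None if no codeword is close enough."""
    return table.get(observed)
```

The trade-off is memory for speed: the table grows with the family size and the number of tolerated errors, but each lookup is constant time.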
For better pose estimation, an edge refinement method was developed. Although the quad identification method described above is good at locating tags, the edges it finds are affected by shadows and glare. This method samples the intensity gradient along each edge to fit more robust edge lines, and then computes the quad corners from these lines.
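A simplified version of the idea: at several points along a coarse edge, search along the edge normal for the strongest intensity change, fit a line to those refined points, and take corners as intersections of adjacent refined lines. This sketch assumes nearest-pixel sampling, a hypothetical search range of 3 pixels, and an unweighted line fit; the paper's exact sampling and weighting differ.

```python
import math

# Simplified edge-refinement sketch. Assumptions: nearest-pixel sampling, a
# fixed search range along the edge normal, and an unweighted
# total-least-squares line fit.

def sample(img, x, y):
    yi = min(max(round(y), 0), len(img) - 1)
    xi = min(max(round(x), 0), len(img[0]) - 1)
    return img[yi][xi]

def refine_edge(img, p0, p1, samples=10, search=3):
    """Return a line (point, direction) fitted to the strongest gradient near p0-p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length, dx / length  # unit normal to the coarse edge
    pts = []
    for k in range(1, samples + 1):
        t = k / (samples + 1)
        bx, by = p0[0] + t * dx, p0[1] + t * dy
        best, best_off = -1.0, 0.0
        for o in range(-search, search):
            a = sample(img, bx + o * nx, by + o * ny)
            b = sample(img, bx + (o + 1) * nx, by + (o + 1) * ny)
            if abs(b - a) > best:  # strongest step between consecutive samples
                best, best_off = abs(b - a), o + 0.5
        pts.append((bx + best_off * nx, by + best_off * ny))
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - cx) ** 2 for p in pts)
    syy = sum((p[1] - cy) ** 2 for p in pts)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal direction
    return (cx, cy), (math.cos(theta), math.sin(theta))

def intersect(l1, l2):
    """Corner = intersection of two lines given as (point, direction)."""
    (x1, y1), (d1x, d1y) = l1
    (x2, y2), (d2x, d2y) = l2
    det = d1x * (-d2y) - d1y * (-d2x)
    t = ((x2 - x1) * (-d2y) - (y2 - y1) * (-d2x)) / det
    return x1 + t * d1x, y1 + t * d1y
```

For a dark square whose true top-left corner lies at (4.5, 4.5) in pixel coordinates, coarse edges that are off by 1.5 pixels refine back to that corner.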
Flexible Layouts for Fiducial Tags (2019)
Referred to as AprilTag 3, this paper introduces flexible tag layouts with thinner borders, including custom layouts like the one shown here and circular ones. Surprisingly, it also increases the detection speed and the range at which tags can be identified.
The paper represents tag layouts as strings of cells: white (w), black (b), data (d), and ignored (x).
Data bits are now also allowed in the border region, as in the tags shown, which gives each tag an additional 16 data bits. These extra bits allow a larger minimum Hamming distance and more tags than traditional layouts.
To make tags look different from natural image content, and therefore easier to detect reliably, tags are generated using a suitable complexity metric; the authors chose one based on the Ising model.
Independently of the flexible tag layouts, the detector received several improvements.
A decimation step reduces the image size using a box filter. Building on the earlier grouping of pixels into components, the connected-components step uses the union-find algorithm to find components more efficiently. The calculations for fitting quadrilaterals were also improved, while the overall method was retained. Tag decoding was improved through better perspective correction. To decode small tags, bilinear interpolation is used to estimate pixel values more smoothly and accurately: it computes a pixel's value as the weighted average of its 4 nearest pixels. The values are then sharpened with a Laplacian kernel to counteract blurring.
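A layout string can be turned into a grid and filled with a code's bits. The layout below is a hypothetical 6 x 6 example (white outer ring, black ring, 2 x 2 data area), not one of the layouts shipped with AprilTag, and the choices of row-by-row reading, most-significant-bit-first filling and 1 = white are assumptions of this sketch.

```python
# Sketch: interpret a tag layout as a square string of cells read row by row.
# Assumptions: data bits are filled MSB first, 1 = white, 0 = black, and
# ignored cells become None. The example layout is hypothetical.

def parse_layout(layout: str):
    n = int(round(len(layout) ** 0.5))
    if n * n != len(layout) or set(layout) - set("wbdx"):
        raise ValueError("layout must be a square string over w, b, d, x")
    return [list(layout[r * n:(r + 1) * n]) for r in range(n)]

def data_bits(layout: str) -> int:
    return layout.count("d")

def render(layout: str, code: int):
    """Fill data cells with the bits of `code`; fixed cells keep their colour."""
    grid = parse_layout(layout)
    nbits = data_bits(layout)
    k = 0
    for row in grid:
        for c, cell in enumerate(row):
            if cell == "d":
                row[c] = (code >> (nbits - 1 - k)) & 1
                k += 1
            else:
                row[c] = {"w": 1, "b": 0, "x": None}[cell]
    return grid

EXAMPLE = "wwwwww" "wbbbbw" "wbddbw" "wbddbw" "wbbbbw" "wwwwww"
```

Because the layout is just data, moving data cells outside the black ring or marking corner cells as ignored (for circular tags) needs no change to this code.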
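As an illustration of an Ising-style complexity score: map each cell to a spin of -1 or +1 and sum -s_i·s_j over neighbouring cells. Uniform patterns, which are common in natural images, get the lowest energy, while busy patterns get a high one. The unit coupling, 4-neighbour interaction and acceptance threshold here are assumptions; the paper's exact normalisation and threshold may differ.

```python
# Sketch of an Ising-model complexity score for a tag pattern. Assumptions:
# cells are 0/1 mapped to spins -1/+1, only horizontal and vertical neighbours
# interact with unit coupling, and the acceptance threshold is hypothetical.

def ising_energy(grid):
    """E = -sum(s_i * s_j) over adjacent cells; uniform patterns get the lowest E."""
    h, w = len(grid), len(grid[0])
    spin = lambda v: 1 if v else -1
    e = 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                e -= spin(grid[r][c]) * spin(grid[r][c + 1])
            if r + 1 < h:
                e -= spin(grid[r][c]) * spin(grid[r + 1][c])
    return e

def complex_enough(grid, min_energy):
    """Keep a candidate only if its pattern is busy enough."""
    return ising_energy(grid) >= min_energy
```

A 3 x 3 all-black pattern scores -12 (all 12 neighbour pairs agree), while a 3 x 3 checkerboard scores +12.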
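Three of these image operations are small enough to sketch directly. This assumes grayscale images stored as lists of rows, decimation by an integer factor, non-negative sampling coordinates for the bilinear interpolation, and a 4-neighbour Laplacian kernel with a hypothetical sharpening strength of 1.

```python
# Sketch of three image operations on grayscale images stored as lists of rows.
# Assumptions: integer decimation factor, non-negative sample coordinates, and
# a 4-neighbour Laplacian with a hypothetical sharpening strength of 1.

def decimate(img, f=2):
    """Average each f x f block into one pixel (box filter + subsampling)."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[y * f + j][x * f + i] for j in range(f) for i in range(f)) / (f * f)
             for x in range(w)] for y in range(h)]

def bilinear(img, x, y):
    """Weighted average of the 4 pixels around the fractional position (x, y)."""
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sharpen(values, strength=1.0):
    """Subtract a 4-neighbour Laplacian from each interior value of a 2D grid."""
    h, w = len(values), len(values[0])
    out = [row[:] for row in values]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (values[y - 1][x] + values[y + 1][x] + values[y][x - 1]
                   + values[y][x + 1] - 4 * values[y][x])
            out[y][x] = values[y][x] - strength * lap
    return out
```

Decimation trades resolution for speed during quad detection, while bilinear sampling and sharpening recover detail when reading the bits of small tags.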
AprilCal: Assisted and repeatable camera calibration (2013)
This paper shows that AprilCal, an AprilTag-aided, assisted camera calibration method, performs better than traditional camera calibration methods.
Conclusion
For me, this whole dive into AprilTag is a compelling testimony to the abilities of computer vision and to its usefulness as a branch of computer science. Reading closely through the evolution of AprilTag shows how a system can be designed from scratch and then improved incrementally through conceptual ideas. AprilTag built on ideas from existing systems, implemented a better system, and open-sourced it. Subsequent papers by students showed true scholarship: each one provided considerable improvements in both method and processing speed, with paradigm-shifting concepts. This shows that the APRIL lab has cultivated a true spirit of heavyweight research.
This newsletter will explore more of the robotics space, so stay tuned.
References
[1] AprilTag: A robust and flexible visual fiducial system
[2] AprilTag 2: Efficient and robust fiducial detection
[3] Flexible Layouts for Fiducial Tags
[4] AprilCal: Assisted and repeatable camera calibration