Tracking API
The Spectacular AI tracking API (a.k.a. “VIO” API) consists of the common output types relevant for real-time 6-degree-of-freedom pose tracking.
Coordinate systems
The SDK uses the following coordinate conventions:
- World coordinate system: right-handed, Z-is-up
- Camera coordinate system: OpenCV convention (X right, Y down, Z forward), which is also right-handed
These conventions differ from, e.g., the Intel RealSense SDK, ARCore, Unity and most OpenGL tutorials, most of which use a “Y-is-up” world coordinate system, often different camera coordinate systems, and sometimes different pixel (or “NDC”) coordinate conventions.
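For example, a point expressed in the SDK’s right-handed Z-is-up world frame can be re-expressed in a right-handed Y-is-up frame with a fixed change of basis. The mapping below is one common choice and only an illustrative sketch; the exact conversion depends on which Y-up convention the target system uses.

    import numpy as np

    # One possible right-handed Z-up -> right-handed Y-up change of basis:
    # X stays X, the old Z (up) becomes the new Y (up), the old Y becomes -Z.
    Z_UP_TO_Y_UP = np.array([
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, -1.0, 0.0],
    ])

    p_z_up = np.array([1.0, 2.0, 3.0])   # position in the SDK's world frame
    p_y_up = Z_UP_TO_Y_UP @ p_z_up       # same point in a Y-up world frame
    print(p_y_up)                        # [ 1.  3. -2.]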
By default, the spectacularAI.Pose object returned by the Spectacular AI SDK uses the left camera as the local reference frame. To get the pose of another camera, use spectacularAI.VioOutput.getCameraPose() (or, on OAK-D devices, spectacularAI.depthai.Session.getRgbCameraPose()).
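A minimal sketch of reading the primary camera’s pose, assuming output is a spectacularAI.VioOutput delivered by the SDK (session setup omitted):

    # Sketch: 'output' is a spectacularAI.VioOutput received from a VIO session.
    cam_pose = output.getCameraPose(0)                # 0 = primary, 1 = secondary camera
    cam_to_world = cam_pose.getCameraToWorldMatrix()  # 4x4 numpy array
    position = cam_pose.getPosition()                 # spectacularAI.Vector3d
    print(position.x, position.y, position.z)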
6-DoF pose output
- class spectacularAI.Camera(self: spectacularAI.Camera, arg0: List[List[float[3]][3]], arg1: int, arg2: int)
Represents the intrinsic parameters of a particular camera. If the input image is distorted, the camera and projection matrices correspond to the undistorted / rectified image.
Build a pinhole camera
- getIntrinsicMatrix(self: spectacularAI.Camera) → numpy.ndarray
3x3 intrinsic camera matrix (OpenCV convention, undistorted).
- getProjectionMatrixOpenGL(self: spectacularAI.Camera, arg0: float, arg1: float) → numpy.ndarray
4x4 projection matrix for OpenGL (undistorted)
- pixelToRay(self: spectacularAI.Camera, arg0: spectacularAI.PixelCoordinates) → object
Convert pixel coordinates to camera coordinates
- Parameters:
arg0 – PixelCoordinates: pixel coordinates.
- Returns:
Vector3d ray in camera coordinates on successful conversion, None otherwise.
- rayToPixel(self: spectacularAI.Camera, arg0: spectacularAI.Vector3d) → object
Convert camera coordinates to pixel coordinates
- Parameters:
arg0 – Vector3d: ray in camera coordinates.
- Returns:
PixelCoordinates pixel coordinates on successful conversion (note that the pixel can be outside image boundaries), None otherwise.
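A round-trip sketch using these two methods, assuming camera is a spectacularAI.Camera instance obtained from the SDK (how it is obtained is device-specific and not shown here):

    import spectacularAI

    pixel = spectacularAI.PixelCoordinates(320.0, 240.0)
    ray = camera.pixelToRay(pixel)         # Vector3d in camera coordinates, or None
    if ray is not None:
        back = camera.rayToPixel(ray)      # PixelCoordinates, or None
        if back is not None:
            print(back.x, back.y)          # approximately (320, 240)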
- class spectacularAI.CameraPose
Represents the pose (position & orientation) and other parameters of a particular camera.
- getCameraToWorldMatrix(self: spectacularAI.CameraPose) → numpy.ndarray
4x4 homogeneous camera-to-world matrix
- getPosition(self: spectacularAI.CameraPose) → spectacularAI.Vector3d
Vector3d position of the camera
- getWorldToCameraMatrix(self: spectacularAI.CameraPose) → numpy.ndarray
4x4 homogeneous world-to-camera matrix
- pixelToWorld(self: spectacularAI.CameraPose, arg0: spectacularAI.PixelCoordinates) → object
Convert pixel coordinates to rays in world coordinates
- Parameters:
arg0 – PixelCoordinates: pixel coordinates to convert.
- Returns:
A tuple (Vector3d origin of the ray in world coordinates, Vector3d direction of the ray from the origin) on successful conversion, None otherwise.
- worldToPixel(self: spectacularAI.CameraPose, arg0: spectacularAI.Vector3d) → object
Convert world coordinates to pixel coordinates
- Parameters:
arg0 – Vector3d: point in world coordinates.
- Returns:
PixelCoordinates pixel coordinates on successful conversion (note that the pixel can be outside image boundaries), None otherwise.
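A sketch of casting a pixel into the world and re-projecting a world point, assuming cam_pose is a spectacularAI.CameraPose:

    import spectacularAI

    pixel = spectacularAI.PixelCoordinates(320.0, 240.0)
    result = cam_pose.pixelToWorld(pixel)       # (origin, direction) tuple, or None
    if result is not None:
        origin, direction = result              # both spectacularAI.Vector3d
        # Re-project a point one unit along the ray back into the image.
        point = spectacularAI.Vector3d(
            origin.x + direction.x,
            origin.y + direction.y,
            origin.z + direction.z)
        back = cam_pose.worldToPixel(point)     # PixelCoordinates, or None
        if back is not None:
            print(back.x, back.y)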
- class spectacularAI.VioOutput
Main output structure
- asJson(self: spectacularAI.VioOutput) → str
a JSON representation of this object
- getCameraPose(self: spectacularAI.VioOutput, arg0: int) → spectacularAI.CameraPose
CameraPose corresponding to the camera whose index is given as the parameter. Index 0 corresponds to the primary camera and index 1 to the secondary camera.
- property globalPose
GnssVioOutput global pose; only returned if GNSS information is provided via Session.addGnss(…)
- property positionCovariance
position uncertainty, 3x3 covariance matrix
- property status
current TrackingStatus
- property tag
input tag from addTrigger. Set to 0 for other outputs.
- property velocityCovariance
velocity uncertainty, 3x3 covariance matrix
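A sketch of a typical output handler, assuming outputs are delivered by a VIO session callback (session setup omitted):

    import spectacularAI

    def on_output(output: spectacularAI.VioOutput):
        # Only trust poses while tracking is good.
        if output.status == spectacularAI.TrackingStatus.TRACKING:
            p = output.getCameraPose(0).getPosition()   # primary camera
            print(p.x, p.y, p.z)
        # The full output can also be serialized, e.g. for logging:
        # print(output.asJson())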
- class spectacularAI.GnssVioOutput
GNSS-VIO output
- property coordinates
current WgsCoordinates
- property enuPositionCovariance
ENU position uncertainty, 3x3 covariance matrix
- getEnuCameraPose(self: spectacularAI.GnssVioOutput, arg0: int, arg1: spectacularAI.WgsCoordinates) → spectacularAI.CameraPose
Get the global pose of a particular camera. The “world” coordinate system of the camera pose is an East-North-Up system, whose origin is at the given WGS84 coordinates.
- property orientation
current Quaternion
- property velocityCovariance
velocity uncertainty, 3x3 covariance matrix
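A sketch of reading the fused global pose once GNSS fixes have been fed in via Session.addGnss(…); here output is a spectacularAI.VioOutput:

    global_pose = output.globalPose          # spectacularAI.GnssVioOutput; None unless GNSS data was provided
    if global_pose is not None:
        wgs = global_pose.coordinates        # spectacularAI.WgsCoordinates
        print(wgs.latitude, wgs.longitude, wgs.altitude)
        # Primary camera pose in an East-North-Up frame anchored at the current fix.
        enu_pose = global_pose.getEnuCameraPose(0, wgs)
        print(enu_pose.getPosition().x)      # East offset from the ENU origin (meters)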
Defined in #include <spectacularAI/output.hpp>
-
namespace spectacularAI
Typedefs
-
using VioOutputPtr = std::shared_ptr<const VioOutput>
-
struct AutoExposureOutput
Auto exposure output structure. Each auto exposure output corresponds to a specific image with timestamp and (optional) tag. Contains exposure information that can be used to implement different auto exposure control algorithms, depending on the device.
Public Members
-
int deltaBrightness
Determines how much brightness should be changed (±0-255)
-
uint8_t brightness
Approx. image brightness (0-255)
-
uint32_t minExposureTime
Minimum exposure time for auto exposure (microseconds)
-
uint32_t maxExposureTime
Maximum exposure time for auto exposure (microseconds)
-
uint32_t minGain
Minimum auto exposure gain (ISO)
-
uint32_t maxGain
Maximum auto exposure gain (ISO)
-
double timestamp
Image timestamp (seconds)
-
int tag
Input frame tag (optional)
-
struct Camera
Public Functions
-
virtual bool pixelToRay(const PixelCoordinates &pixel, Vector3d &ray) const = 0
Convert pixel coordinates to camera coordinates
- Parameters:
pixel –
ray –
- Returns:
true if conversion succeeded
-
virtual bool rayToPixel(const Vector3d &ray, PixelCoordinates &pixel) const = 0
Convert camera coordinates to pixel coordinates
- Parameters:
ray –
pixel –
- Returns:
true if conversion succeeded
-
virtual Matrix3d getIntrinsicMatrix() const = 0
Defined as the rectified intrinsic matrix if there’s distortion.
- Returns:
3x3 intrinsic matrix in OpenCV convention:
fx  0  ppx
0   fy ppy
0   0  1
-
virtual Matrix4d getProjectionMatrixOpenGL(double nearClip, double farClip) const = 0
Project from camera coordinates to normalized device coordinates (NDC).
- Parameters:
nearClip –
farClip –
-
virtual ~Camera()
-
struct CameraPose
Pose of camera and conversions between world, camera, and pixel coordinates.
Public Functions
-
Matrix4d getWorldToCameraMatrix() const
Matrix that converts homogeneous world coordinates to homogeneous camera coordinates
-
Matrix4d getCameraToWorldMatrix() const
Matrix that converts homogeneous camera coordinates to homogeneous world coordinates
-
Vector3d getPosition() const
Position in world coordinates
-
bool pixelToWorld(const PixelCoordinates &pixel, Vector3d &origin, Vector3d &ray) const
Convert pixel coordinates to rays in world coordinates.
- Parameters:
pixel –
origin – current camera position in world coordinates
ray – direction of ray from the origin
- Returns:
true if conversion succeeded
-
bool worldToPixel(const Vector3d &point, PixelCoordinates &pixel) const
Convert world coordinates to pixel coordinates
- Parameters:
point –
pixel –
- Returns:
true if conversion succeeded
-
struct GnssVioOutput
Public Functions
-
virtual CameraPose getEnuCameraPose(int cameraId, WgsCoordinates enuOrigin) const = 0
Get the global pose of a particular camera. The “world” coordinate system of the camera pose is an East-North-Up system, whose origin is at the given WGS84 coordinates.
- Parameters:
cameraId – index of the camera whose pose is returned
enuOrigin – the origin of the ENU coordinate system
-
virtual ~GnssVioOutput()
Public Members
-
WgsCoordinates coordinates
Position in global coordinates in WGS84 (mean)
-
Quaternion orientation
Orientation of the device as a quaternion representing the camera-to-ENU transformation, i.e., a camera-to-world rotation where the world coordinates are defined in an East-North-Up coordinate system. Note that the origin of the ENU coordinate system does not matter in this context.
See the helper methods for obtaining the Euler angles.
-
Vector3d velocity
Velocity in ENU coordinates (m/s)
-
Vector3d angularVelocity
Angular velocity in ENU coordinates
-
Matrix3d enuPositionCovariance
Uncertainty of the estimate as a covariance in an East-North-Up (ENU) coordinate system. See the helper methods for obtaining a single vertical and horizontal number.
-
Matrix3d velocityCovariance
Uncertainty of velocity as a 3x3 covariance matrix (in ENU coordinates)
-
struct VioOutput
Main output structure
Public Functions
-
virtual CameraPose getCameraPose(int cameraId) const = 0
Current pose in camera coordinates
- Parameters:
cameraId – 0 for primary, 1 for secondary camera
-
virtual std::string asJson() const
Returns the output as a JSON string if supported, otherwise returns an empty string.
-
virtual ~VioOutput()
Public Members
-
TrackingStatus status
Current tracking status
-
Pose pose
The current pose, with the timestamp in the clock used for input sensor data and camera frames.
-
Vector3d velocity
Velocity vector (xyz) in m/s in the coordinate system used by pose.
-
Vector3d angularVelocity
Angular velocity vector in SI units (rad/s) and the coordinate system used by pose.
-
Vector3d acceleration
Linear acceleration in SI units (m/s^2).
-
Matrix3d positionCovariance
Uncertainty of the current position as a 3x3 covariance matrix
-
Matrix3d velocityCovariance
Uncertainty of velocity as a 3x3 covariance matrix
-
std::vector<Pose> poseTrail
List of poses, where the first element corresponds to the current pose and the following (zero or more) values are recent smoothed historical positions
-
std::vector<FeaturePoint> pointCloud
Point cloud (list of FeaturePoints) that correspond to features currently seen in the primary camera frame.
-
int tag
The input frame tag. This is the value given in addFrame… methods
-
std::shared_ptr<const GnssVioOutput> globalPose
GNSS-VIO output if available, otherwise {} (nullptr)
Common types
- class spectacularAI.Pose
Represents the pose (position & orientation) of a device at a given time. This typically corresponds to the pose of the IMU (configurable). See CameraPose for the exact poses of the cameras.
- asMatrix(self: spectacularAI.Pose) → numpy.ndarray
4x4 matrix that converts homogeneous local coordinates to homogeneous world coordinates
- fromMatrix(self: float, arg0: List[List[float[4]][4]]) → spectacularAI.Pose
Create a pose from a timestamp and a 4x4 local-to-world matrix
- property orientation
Quaternion orientation of the IMU / camera, local-to-world
- property time
float timestamp in seconds, synchronized with device monotonic time (not host)
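A sketch of constructing and inspecting a Pose; it assumes fromMatrix can be called as shown with a timestamp and a nested-list local-to-world matrix (here the identity, i.e. the local frame coincides with the world frame):

    import numpy as np
    import spectacularAI

    pose = spectacularAI.Pose.fromMatrix(1.23, np.eye(4).tolist())
    print(pose.time)             # 1.23
    q = pose.orientation         # spectacularAI.Quaternion, local-to-world
    print(q.w, q.x, q.y, q.z)    # identity rotation: w = 1, x = y = z = 0
    print(pose.asMatrix())       # back to the 4x4 local-to-world matrix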
- class spectacularAI.Vector3d(*args, **kwargs)
Vector in R^3. Can represent, e.g., velocity, position or angular velocity. Each property is a float.
Overloaded function.
__init__(self: spectacularAI.Vector3d) -> None
__init__(self: spectacularAI.Vector3d, arg0: float, arg1: float, arg2: float) -> None
- property x
- property y
- property z
- class spectacularAI.Vector3f(*args, **kwargs)
Vector in R^3. Single precision.
Overloaded function.
__init__(self: spectacularAI.Vector3f) -> None
__init__(self: spectacularAI.Vector3f, arg0: float, arg1: float, arg2: float) -> None
- property x
- property y
- property z
- class spectacularAI.Quaternion
Quaternion representation of a rotation. Hamilton convention. Each property is a float.
- property w
- property x
- property y
- property z
- class spectacularAI.TrackingStatus(self: spectacularAI.TrackingStatus, value: int)
Members:
INIT
TRACKING
LOST_TRACKING
- INIT = <TrackingStatus.INIT: 0>
- LOST_TRACKING = <TrackingStatus.LOST_TRACKING: 2>
- TRACKING = <TrackingStatus.TRACKING: 1>
- property name
- property value
- class spectacularAI.ColorFormat(self: spectacularAI.ColorFormat, value: int)
Members:
NONE
GRAY
RGB
RGBA
GRAY16
- GRAY = <ColorFormat.GRAY: 1>
- GRAY16 = <ColorFormat.GRAY16: 7>
- NONE = <ColorFormat.NONE: 0>
- RGB = <ColorFormat.RGB: 2>
- RGBA = <ColorFormat.RGBA: 3>
- property name
- property value
- class spectacularAI.Bitmap
Represents a grayscale or RGB bitmap
- getColorFormat(self: spectacularAI.Bitmap) → spectacularAI.ColorFormat
- getHeight(self: spectacularAI.Bitmap) → int
int bitmap height
- getWidth(self: spectacularAI.Bitmap) → int
int bitmap width
- toArray(self: spectacularAI.Bitmap) → numpy.ndarray
Returns array representation of the bitmap
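A sketch of reading image data from a Bitmap, assuming bitmap is a spectacularAI.Bitmap obtained from the SDK (e.g. an input camera frame):

    # Sketch: 'bitmap' is a spectacularAI.Bitmap provided by the SDK.
    w, h = bitmap.getWidth(), bitmap.getHeight()
    fmt = bitmap.getColorFormat()       # e.g. spectacularAI.ColorFormat.RGB
    pixels = bitmap.toArray()           # numpy array of the pixel data
    print(w, h, fmt, pixels.shape, pixels.dtype)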
- class spectacularAI.Frame
A camera frame with a pose
- property cameraPose
CameraPose corresponding to this camera
- property index
Camera index
- class spectacularAI.WgsCoordinates(self: spectacularAI.WgsCoordinates)
Represents global WGS-84 coordinates: latitude and longitude in degrees, altitude in meters.
- property altitude
- property latitude
- property longitude
- class spectacularAI.PixelCoordinates(*args, **kwargs)
Coordinates of an image pixel (x, y), subpixel accuracy (float).
Overloaded function.
__init__(self: spectacularAI.PixelCoordinates) -> None
__init__(self: spectacularAI.PixelCoordinates, arg0: float, arg1: float) -> None
- property x
- property y
- class spectacularAI.FeaturePoint(self: spectacularAI.FeaturePoint)
Sparse 3D feature point observed from a certain camera frame.
- property id
An int ID to identify the same points in different frames.
- property pixelCoordinates
PixelCoordinates of the observation in the camera frame.
Types in #include <spectacularAI/types.hpp>
Defines
-
SPECTACULAR_AI_API
Identifies functions and methods that are a part of the API
-
SPECTACULAR_AI_CORE_API
Identifies functions and methods that are a part of the core API
-
namespace cv
-
namespace spectacularAI
Typedefs
-
using Matrix3d = std::array<std::array<double, 3>, 3>
A 3x3 matrix, row major (accessed as m[row][col]). Also note that when the matrix is symmetric (like covariance matrices), there is no difference between row-major and column-major orderings.
-
using Matrix4d = std::array<std::array<double, 4>, 4>
A 4x4 matrix, row major (accessed as m[row][col]). Typically used with homogeneous coordinates.
Enums
-
enum class TrackingStatus
6-DoF pose tracking status
Values:
-
enumerator INIT
Initial status when tracking starts and is still initializing
-
enumerator TRACKING
Tracking is accurate (but not globally referenced)
-
enumerator LOST_TRACKING
Tracking has failed. Outputs are no longer produced until the system recovers, which will be reported as another tracking state
-
enum class ColorFormat
Specifies the pixel format of a bitmap
Values:
-
enumerator NONE
-
enumerator GRAY
-
enumerator RGB
-
enumerator RGBA
-
enumerator RGBA_EXTERNAL_OES
-
enumerator BGR
-
enumerator BGRA
-
enumerator GRAY16
-
enumerator FLOAT32
-
enumerator ENCODED_H264
-
struct Bitmap
Simple wrapper for a bitmap in RAM (not GPU RAM).
Public Functions
-
cv::Mat asOpenCV(bool flipColors = true)
Create an OpenCV matrix that may refer to this image as a shallow copy. Use .clone() to make a deep copy if needed. If flipColors = true (default), this automatically converts from RGB to BGR if necessary.
-
virtual ~Bitmap()
-
virtual int getWidth() const = 0
Image width in pixels
-
virtual int getHeight() const = 0
Image height in pixels
-
virtual ColorFormat getColorFormat() const = 0
Pixel color format / channel configuration
-
virtual const std::uint8_t *getDataReadOnly() const = 0
Data in row major order. Rows must be contiguous. 8-32 bits per pixel, as defined by ColorFormat.
-
virtual std::uint8_t *getDataReadWrite() = 0
Public Static Functions
-
static std::unique_ptr<Bitmap> create(int width, int height, ColorFormat colorFormat)
Create a bitmap with undefined contents
-
static std::unique_ptr<Bitmap> createReference(int width, int height, ColorFormat colorFormat, std::uint8_t *data, int rowStride = 0)
Create bitmap wrapper for plain data (not copied)
-
struct CameraRayTo3DMatch
Match between a 3D ray from the camera and a 3D point in world coordinates
-
struct CameraRayToWgsMatch
Match between a 3D ray from the camera and WGS-84 coordinates
-
struct FeaturePoint
Sparse 3D feature point observed from a certain camera frame.
Public Members
-
int64_t id
An integer ID to identify the same points in different revisions of the point cloud, e.g., the same point observed from a different camera frame.
-
PixelCoordinates pixelCoordinates
Pixel coordinates of the observation in the camera frame. If the pixel coordinates are not available, set to { -1, -1 }.
The coordinates are not available in case of trigger outputs that do not correspond to any camera frame.
-
Vector3d position
Global position of the feature point
-
int status = 0
Implementation-defined status/type
-
struct PixelCoordinates
Coordinates of an image pixel, subpixel accuracy
Public Members
-
float x
-
float y
-
struct Pose
Represents the pose (position & orientation) of a device at a given time
Public Functions
-
Matrix4d asMatrix() const
Matrix that converts homogeneous local coordinates to homogeneous world coordinates
Public Members
-
double time
Timestamp in seconds. Monotonically increasing
-
Vector3d position
3D position in a right-handed metric coordinate system where the z-axis points up
-
Quaternion orientation
Orientation quaternion in the same coordinate system as position
-
struct Quaternion
Quaternion representation of a rotation. Hamilton convention.
Public Members
-
double x
-
double y
-
double z
-
double w
-
struct Vector3d
Vector in R^3, can represent, e.g., velocity, position or angular velocity
Public Members
-
double x
-
double y
-
double z
-
struct Vector3f
Vector in R^3 (single precision)
Public Members
-
float x
-
float y
-
float z
-
struct WgsCoordinates
Global coordinates, WGS-84
Public Members
-
double latitude
Latitude in degrees
-
double longitude
Longitude in degrees
-
double altitude
Altitude in meters
Additional utilities in #include <spectacularAI/util.hpp>
-
namespace spectacularAI
Functions
-
double getHorizontalUncertainty(const Matrix3d &covariance)
Get a scalar horizontal uncertainty value in meters from a covariance matrix.
This is an RMS (“1-sigma”) value. In the general anisotropic case, where the covariance matrix is a rotated version of diag(sx^2, sy^2), it is defined as sqrt(sx^2 + sy^2). This is also equivalent to the Frobenius norm of the 2x2 horizontal part of the covariance matrix. If the horizontal uncertainty is isotropic, this is the same as the 1D standard deviation along the X or Y axis.
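The stated sqrt(sx^2 + sy^2) definition can also be evaluated by hand from a 3x3 position covariance, e.g. in Python with NumPy (an illustrative sketch of the formula, not an SDK call):

    import numpy as np

    def horizontal_uncertainty(covariance):
        """RMS ("1-sigma") horizontal uncertainty from a 3x3 covariance matrix,
        i.e. sqrt(sx^2 + sy^2): the square root of the trace of the 2x2 horizontal block."""
        return float(np.sqrt(np.trace(np.asarray(covariance)[:2, :2])))

    # Example: 0.5 m standard deviation along X and Y, 1.0 m vertically.
    cov = np.diag([0.25, 0.25, 1.0])
    print(horizontal_uncertainty(cov))   # sqrt(0.5) ≈ 0.71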
-
double getVerticalUncertainty(const Matrix3d &covariance)
Vertical uncertainty (standard deviation / p67) from a covariance matrix
-
Matrix3d buildCovarianceMatrix(double horizontalSigma, double verticalSigma)
Construct a covariance matrix from vertical and horizontal uncertainty components specified as 1-sigma values (i.e., 1D standard deviations). The resulting covariance is isotropic in the XY plane.
-
double getQuaternionHeading(const Quaternion &q)
Get the yaw/heading angle of a local-to-world quaternion in degrees
-
double getQuaternionPitch(const Quaternion &q)
Pitch angle of a local-to-world quaternion in degrees
-
double getQuaternionRoll(const Quaternion &q)
Roll angle of a local-to-world quaternion in degrees