image = read_training_image(60, downsampling_factor=2)
plt.imshow(image)
(Figure: a downsampled training image of the Stonehenge scene.)
The Stonehenge Dataset
We use a dataset from the Stanford NeRF Navigation project. It consists of 500 images, split into 200 images for training, and validation and test sets of 150 images each.
Loading Images
The code below reads an image into memory:
read_stonehenge_image
read_stonehenge_image (path:str, downsampling_factor:int=1)
*Read an image from the Stonehenge dataset, and return it as a PIL image.
Returns: PIL.Image: Image object representing the image at the specified path.*
read_training_image
read_training_image (index:int, downsampling_factor:int=1)
Read an image from the Stonehenge dataset, and return it as a PIL image.
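The implementations are not shown in this document. Below is a minimal sketch of how these two helpers might look, assuming the training images live in a local train/ folder and follow the NeRF-synthetic naming convention r_{index}.png (both assumptions, not confirmed by the docs above):

from PIL import Image

def read_stonehenge_image(path: str, downsampling_factor: int = 1):
    """Read an image from the Stonehenge dataset and return it as a PIL image."""
    image = Image.open(path)
    W, H = image.size
    if downsampling_factor > 1:
        image = image.resize((W // downsampling_factor, H // downsampling_factor))
    return image

def read_training_image(index: int, downsampling_factor: int = 1):
    """Read training image `index`, assuming a hypothetical train/r_{index}.png layout."""
    return read_stonehenge_image(f"train/r_{index}.png", downsampling_factor)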
Camera Matrices
A NeRF is trained with a set of rays. For a given image, every pixel corresponds to a ray, and the origin of the ray is exactly the optical center of the camera. To calculate the direction of the ray we need two pieces of information for each image:
- the intrinsic calibration of the camera (most importantly the focal length), which tells us how to convert pixel coordinates into a direction in the camera coordinate frame;
- the extrinsic calibration (the position and orientation with which the image was taken), which is needed to transform directions in the camera frame into the scene coordinate frame (see the sketch after this list).
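As a concrete sketch of these two steps, assume a pinhole camera with focal length f, principal point (cx, cy), and a camera-to-world rotation R (hypothetical names; exact sign conventions vary between datasets):

import numpy as np

def pixel_to_world_direction(u, v, f, cx, cy, R):
    # Intrinsics: convert pixel coordinates (u, v) to a direction in the camera frame.
    d_cam = np.array([(u - cx) / f, (v - cy) / f, 1.0])
    # Extrinsics: rotate the direction into the scene coordinate frame.
    d_world = R @ d_cam
    return d_world / np.linalg.norm(d_world)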
For the Stonehenge dataset, all this hard work has been done for us: the images have been simulated with exactly known camera parameters, both intrinsic and extrinsic. The dataset creators provide undistorted images accompanied by a \(3 \times 4\) camera matrix \(M\).
The Stonehenge dataset comes with its camera matrices: they are all written in a JSON file, which we can parse into a Python dictionary:
load_json
load_json (path)
*Load and parse a JSON file from a relative path.
Returns: dict: Parsed JSON data as a Python dictionary.*
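A sketch of load_json, assuming nothing beyond the standard library:

import json

def load_json(path):
    """Load and parse a JSON file from a relative path."""
    with open(path, "r") as f:
        return json.load(f)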
# Open the transforms JSON file and read its content
data = load_json("transforms_train.json")
test_eq(len(data), 4)
test_eq(list(data.keys()), ['Far', 'Near', 'camera_angle_x', 'frames'])
The camera matrix associated with a given training image can then be extracted by converting the corresponding JSON entries to numpy:
extract_camera_matrix
extract_camera_matrix (camera_data, index:int, image_size:tuple)
Read the 3x4 camera matrix associated with a training image.
extract_extrinsics
extract_extrinsics (camera_data:dict, index:int)
Extract the extrinsic matrix from the given camera_data.
calculate_intrinsics
calculate_intrinsics (image_size:tuple, camera_angle_x:float)
Calculate the intrinsic matrix given the image size and camera angle.
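The implementations are again not shown. For a NeRF-synthetic style transforms file, the intrinsics follow from the image size and the horizontal field of view camera_angle_x, and the extrinsics are stored per frame. A sketch, assuming the principal point sits at the image center and the per-frame key is named transform_matrix (both assumptions):

import numpy as np

def calculate_intrinsics(image_size: tuple, camera_angle_x: float):
    """Pinhole intrinsic matrix K from image size and horizontal field of view."""
    W, H = image_size
    f = 0.5 * W / np.tan(0.5 * camera_angle_x)  # focal length in pixels
    return np.array([[f, 0, W / 2],
                     [0, f, H / 2],
                     [0, 0, 1]])

def extract_extrinsics(camera_data: dict, index: int):
    """Extract the 4x4 pose of frame `index` (key name assumed)."""
    return np.array(camera_data["frames"][index]["transform_matrix"])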
# Testing:
M = extract_camera_matrix(data, 47, (800, 800))
test_eq(M.shape, (3, 4))
expected = np.array(
    [
        [1152, 17, 255, -1000],
        [-113, 385, 1111, -1000],
        [0, -1, 1, -2],
    ]
)
test_close(M, expected, eps=1.0)
Checking Stonehenge Dimensions
We can use a camera matrix to project arbitrary 3D points into an image. We will use this to project a circle, to ascertain the dimensions within which the Stonehenge scene fits.
# Make a function to load the camera matrix and project a circle into the image:
def project_circle(camera_data, index, image_size, circle):
    """Project the circle into the image."""
    M = extract_camera_matrix(camera_data, index, image_size)
    homogeneous = M @ circle
    return homogeneous[:2] / homogeneous[-1]
# Create a circle of given radius on the ground plane:
R = 1.5
theta = np.linspace(0, 2*np.pi, 100)
x = R * np.cos(theta)
y = R * np.sin(theta)
z = np.zeros_like(x)
circle = np.vstack((x, y, z, np.ones_like(x)))
# Check its shape, should be 4D homogeneous coordinates:
test_eq(circle.shape, (4, 100))
# Apply it to image 47 and check the shape:
circle_47 = project_circle(data, 47, (800, 800), circle)
test_eq(circle_47.shape, (2, 100))
# Show two images side by side and plot the circles on the image:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
i1, i2 = 70, 140
ax1.imshow(read_training_image(i1, downsampling_factor=2))
ax1.plot(*project_circle(data, i1, (400, 400), circle), color="red")
ax2.imshow(read_training_image(i2, downsampling_factor=2))
ax2.plot(*project_circle(data, i2, (400, 400), circle), color="red");
Calculating Rays
Recall that a 3D point \(P\) can be projected into an image via
\[ \tilde{p} = K R^T (P - t) \]
where \(\tilde{p}\) are homogeneous 2D image coordinates. We can re-write this as
\[ \tilde{p} = M\tilde{P} \]
where \(\tilde{P} = \begin{bmatrix}P \\1 \end{bmatrix}\) and the camera matrix \(M\) is given by
\[ M = [A|a] = [K R^T | - K R^T t] \]
That means that if we are given the camera matrix \(M\) we can always recover the ray origins as \[ t = -A^{-1} a \]
and a 3D point \(P\) on the ray corresponding to \(\tilde{p}\) as
\[ P = A^{-1}(\tilde{p} - a) \]
since \(\tilde{p} = AP + a\).
We can then calculate the ray origin and ray direction for any pixel:
A, a = M[:, :3], M[:, 3]
t = -np.linalg.inv(A) @ a
p_ = np.array([3.5, 2.5, 1])  # middle of a 7x5 image (width x height) in homogeneous coordinates.
P = np.linalg.inv(A) @ (p_ - a)
D = P / np.linalg.norm(P)
test_close(t, np.array([0.54, -1.85, 1.60]), eps=0.1)
test_close(D, np.array([0.14, -0.82, 0.55]), eps=0.1)
calculate_rays
calculate_rays (M:numpy.ndarray, image_size:tuple)
Calculate a batch of rays associated with every pixel in a given image. When given size (W, H), returns two tensors of shape (H, W, 3) and (H, W, 3).
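A sketch of how calculate_rays could implement the math above, vectorized over all pixels. The half-pixel centering and the sign of the direction are inferred from the manual calculation above and the checks below, not taken from the actual implementation:

def calculate_rays(M: np.ndarray, image_size: tuple):
    """Ray origins and directions for every pixel of a (W, H) image, as (H, W, 3) arrays."""
    W, H = image_size
    A, a = M[:, :3], M[:, 3]
    A_inv = np.linalg.inv(A)
    t = -A_inv @ a  # camera center, shared by all rays
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)  # pixel centers
    p = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3) homogeneous pixels
    P = (p - a) @ A_inv.T  # back-projected point for each pixel
    directions = -P / np.linalg.norm(P, axis=-1, keepdims=True)  # sign matches the checks below
    origins = np.broadcast_to(t, (H, W, 3)).copy()
    return origins, directions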
Let’s try this for a simple 7x5 image (where 7 is width and 5 is height), and check that the batch dimensions are as we expect (height, width, 3), and that for the center pixel they agree with our manual calculation:
origins, directions = calculate_rays(M, (7, 5))
# Check dimensions:
test_eq(origins.shape, (5, 7, 3))
test_eq(directions.shape, (5, 7, 3))
# Check values for middle pixel, on optical axis:
test_close(origins[2, 3, :], t)
test_close(directions[2, 3, :], -D)
Check that we get the same direction in the top-left corner for different resolutions (they agree only to about 1e-3, since the half-pixel centers sit at slightly different fractional positions at each resolution):
size = 400, 400
M = extract_camera_matrix(data, 10, size)
_, D400 = calculate_rays(M, size)
print(D400[0,0,:])

size = 800, 800
M = extract_camera_matrix(data, 10, size)
_, D800 = calculate_rays(M, size)
print(D800[0,0,:])

test_close(D400[0,0,:], D800[0,0,:], eps=1e-3)

size = 200, 200
M = extract_camera_matrix(data, 10, size)
_, D200 = calculate_rays(M, size)
print(D200[0,0,:])
test_close(D400[0,0,:], D200[0,0,:], eps=1e-3)
[-0.57143088 0.80036988 -0.18131408]
[-0.57134447 0.80046027 -0.18118731]
[-0.57160367 0.800189 -0.18156763]
Below we cycle through 20 of the 200 training images, showing a subsampled grid of rays (every 100th pixel in each direction) for each.
fig = go.Figure()
for i in range(0, 199, 10):
    M = extract_camera_matrix(data, i, image.size)
    origins, directions = calculate_rays(M, image.size)
    T = origins[::100, ::100, :].reshape(-1, 3)
    D = directions[::100, ::100, :].reshape(-1, 3)

    # Adding line segments for each ray
    for start, end in zip(T, T + D):
        fig.add_trace(go.Scatter3d(x=[start[0], end[0]],
                                   y=[start[1], end[1]],
                                   z=[start[2], end[2]],
                                   mode='lines',
                                   line=dict(color='red')))

fig.update_layout(showlegend=False)
fig.show()
(Interactive 3D Plotly figure of the ray bundles, omitted here.)
And here we show the samples along one randomly chosen ray from each training camera, with the rendering volume shown as well:
t_values = np.linspace(data["Near"], data["Far"], 64)

def sample_along_ray(origins, directions):
    return origins[..., None, :] + t_values[:, None] * directions[..., None, :]
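The broadcasting yields one row of 64 sample points per ray; for a single origin and direction:

# 64 points between Near and Far along one ray:
test_eq(sample_along_ray(np.zeros(3), np.array([0.0, 0.0, 1.0])).shape, (64, 3))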
fig = go.Figure()
W, H = image.size
for i in range(0, 199):
    M = extract_camera_matrix(data, i, image.size)
    origins, directions = calculate_rays(M, image.size)
    u = np.random.randint(0, W)  # random pixel column
    v = np.random.randint(0, H)  # random pixel row
    T = origins[v, u, :]
    D = directions[v, u, :]
    samples = sample_along_ray(T, D)

    # Add the samples along this ray as a line:
    fig.add_trace(
        go.Scatter3d(
            x=samples[:, 0],
            y=samples[:, 1],
            z=samples[:, 2],
            mode="lines",
            line=dict(color="red"),
        )
    )
    # Mark the ray origin:
    fig.add_trace(
        go.Scatter3d(
            x=[T[0].item()],
            y=[T[1].item()],
            z=[T[2].item()],
            mode="markers",
            marker=dict(color="red", size=2),
        )
    )

# add a cuboid from -1 to 1 in x and y, and -0.5 to 0.5 in z:
fig.add_trace(
    go.Mesh3d(
        x=[-1, 1, 1, -1, -1, 1, 1, -1],
        y=[-1, -1, 1, 1, -1, -1, 1, 1],
        z=[-0.5, -0.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5],
        i=[0, 0, 4, 4],
        j=[1, 2, 5, 6],
        k=[2, 3, 6, 7],
        opacity=0.5,
        color="lightblue",
    )
)

fig.update_layout(showlegend=False, margin=dict(l=0, r=0, b=0, t=0))
fig.show()
(Interactive 3D Plotly figure of the sampled rays and the rendering volume, omitted here.)
Ray-centric Dataset
The code below encodes all training images into rays:
create_rays
create_rays (camera_data, M=199, downsampling_factor=1)
Create rays for the training images.
Args: camera_data: Dictionary containing the camera data. M: Number of training images to use. downsampling_factor: Downsampling factor to apply to the images.
Returns: x_samples: Tensor of shape (M, H, W, 6) containing the origins and directions of the rays. y_samples: Tensor of shape (M, H, W, 3) containing the color data for each pixel.
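A sketch of how create_rays might be assembled from the pieces above (an assumption, not the actual implementation): stack the per-pixel origins and directions into a 6-channel input tensor, and the RGB values, scaled to [0, 1], into the target tensor.

def create_rays(camera_data, M=199, downsampling_factor=1):
    """Encode M training images as rays (x) and per-pixel colors (y)."""
    x_list, y_list = [], []
    for i in range(M):
        image = read_training_image(i, downsampling_factor)
        camera_matrix = extract_camera_matrix(camera_data, i, image.size)
        origins, directions = calculate_rays(camera_matrix, image.size)
        x_list.append(np.concatenate([origins, directions], axis=-1))
        y_list.append(np.asarray(image, dtype=np.float32)[..., :3] / 255.0)
    return (np.stack(x_list).astype(np.float32),
            np.stack(y_list).astype(np.float32))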
x_samples, y_samples = create_rays(data, 2, downsampling_factor=4)

# Check shapes:
test_eq(x_samples.shape, (2, 200, 200, 6))
test_eq(y_samples.shape, (2, 200, 200, 3))

# Check type:
test_eq(x_samples.dtype, np.float32)
test_eq(y_samples.dtype, np.float32)
# Uncomment the following line to save the file:
# np.savez_compressed('training_rays.npz', x=x_samples, y=y_samples)
# x_samples, y_samples = create_rays(data, 100, downsampling_factor=4)
# np.savez_compressed('training_rays-100-4.npz', x=x_samples, y=y_samples)
load_npz_from_url
load_npz_from_url (url)
*Loads a .npz file from the given URL and returns the contained arrays as a tuple.
Parameters: url (str): URL of the .npz file.
Returns: tuple: A tuple containing the loaded numpy arrays, or None if the request fails.*
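A sketch, assuming the requests library is available:

import io
import requests

def load_npz_from_url(url):
    """Load a .npz file from a URL; return its arrays as a tuple, or None on failure."""
    response = requests.get(url)
    if response.status_code != 200:
        return None
    data = np.load(io.BytesIO(response.content))
    return tuple(data[key] for key in data.files)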
# Example usage to download the full Zenodo version, takes 5m!
# url = 'https://zenodo.org/records/10443662/files/training_rays.npz'
# loaded_arrays = load_npz_from_url(url)
# if loaded_arrays is not None:
# xs, ys = loaded_arrays
# # check that they have the right shape:
# test_eq(xs.shape, (199, 800, 800, 6))
# test_eq(ys.shape, (199, 800, 800, 3))
# # check that they are float32:
# test_eq(xs.dtype, np.float32)
# test_eq(ys.dtype, np.float32)
# Make sure 199 images with down-sampling factor 4 are available, takes 18 seconds.
# url = 'https://zenodo.org/records/10765346/files/training_rays-199-4.npz'
# loaded_arrays = load_npz_from_url(url)
# if loaded_arrays is not None:
# xs, ys = loaded_arrays
# # check that they have the right shape:
# test_eq(xs.shape, (199, 200, 200, 6))
# test_eq(ys.shape, (199, 200, 200, 3))
# # check that they are float32:
# test_eq(xs.dtype, np.float32)
# test_eq(ys.dtype, np.float32)
download_rays
download_rays (M=100, downsampling_factor=4)
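Presumably this wraps load_npz_from_url around a fixed location for the pre-computed files; a sketch with a hypothetical URL template, following the training_rays-{M}-{downsampling_factor}.npz naming seen above:

def download_rays(M=100, downsampling_factor=4):
    """Fetch pre-computed rays (the URL below is hypothetical)."""
    url = f"https://example.com/training_rays-{M}-{downsampling_factor}.npz"
    return load_npz_from_url(url)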
# Example usage to download the down-sampled GitHub version, 10-12 seconds
xs, ys = download_rays()

# check that they have the right shape:
test_eq(xs.shape, (100, 200, 200, 6))
test_eq(ys.shape, (100, 200, 200, 3))

# check that they are float32:
test_eq(xs.dtype, np.float32)
test_eq(ys.dtype, np.float32)