Brendan Keesing   



How FOMO Takes Photos

May 9, 2021

Having spent far too long working on a photography game, I've done a great deal of thinking and research on different ways to detect a good photograph of a subject in a virtual setting. I'm not convinced there's a perfect way to do it, but I have managed to get satisfying results using a slew of different techniques.


The idea here is to judge the image against a few criteria, then add the results up into some sort of percentage score at the end.

Detecting Visibility And Size

Firstly, we want to detect where the subject has been rendered in the final image. Ideally, we want to get a percentage value of how many pixels the subject takes up compared to the entire screen. I've found there to be two options for this.

Stencil Buffer Detection (Slow and Accurate)

The first is the slow yet accurate way. Render the subject to the stencil buffer, use a compute shader to count how many pixels it touches, then divide that by the number of pixels in the texture (width x height). This can be hideously slow, especially if your photographs are high resolution. It may be worth rendering a separate depth-and-stencil pass at a lower resolution; the extra rendering pass may be slower, but it leaves less work for the compute shader. It also gives the opportunity to cull depth-writing objects that you don't want blocking the view of the subject. One downside of this approach is that you can only see what's on screen and have no idea how much of the subject is offscreen. There may be a solution for that, but it's getting pretty complicated, isn't it?
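
For what it's worth, here's a minimal sketch of the C# side of that counting step in Unity. The `CountPixels` kernel, the `subjectMask` texture and the buffer names are all assumptions: the kernel is presumed to do an InterlockedAdd into a one-element buffer for every pixel the subject touched.

// Assumes countShader's CountPixels kernel increments _Count[0] (via
// InterlockedAdd) for every pixel of _Mask that the subject was rendered to.
ComputeBuffer countBuffer = new ComputeBuffer(1, sizeof(uint));
countBuffer.SetData(new uint[] { 0 });

int kernel = countShader.FindKernel("CountPixels");
countShader.SetTexture(kernel, "_Mask", subjectMask);
countShader.SetBuffer(kernel, "_Count", countBuffer);
// assumes the mask dimensions are divisible by the 8x8 thread group size
countShader.Dispatch(kernel, subjectMask.width / 8, subjectMask.height / 8, 1);

uint[] result = new uint[1];
countBuffer.GetData(result); // stalls the GPU; AsyncGPUReadback would avoid the hitch
float coverage = result[0] / (float)(subjectMask.width * subjectMask.height);
countBuffer.Release();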

Special Points

There’s another much simpler solution that I am using in FOMO. The basic idea is to set up special points on the object that are then checked for visibility. Setting up these points is the tricky part, as different subjects may have different requirements.

For non-morphing meshes, you can automatically generate a special point for each vertex on the mesh.
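
In Unity, that could look something like this (a sketch; `meshFilter` here is assumed to reference the subject's mesh):

// one candidate point per vertex, converted to world space
List<Vector3> points = new List<Vector3>();
foreach (Vector3 vertex in meshFilter.sharedMesh.vertices)
    points.Add(meshFilter.transform.TransformPoint(vertex));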


This results in way too many points. More points means more accuracy, but it will kill performance, since every point needs to be checked. So we need to run an algorithm to simplify these points. This is the solution that I use:

void SimplifyPoints(List<Vector3> points, int maxPoints, float minDistance)
{
    // remove points that are closer together than minDistance
    float minDistanceSqr = minDistance * minDistance;
    for (int i = 0; i < points.Count; ++i)
    {
        for (int s = i + 1; s < points.Count; ++s)
        {
            if ((points[i] - points[s]).sqrMagnitude < minDistanceSqr)
            {
                points.RemoveAt(s);
                --s;
            }
        }
    }

    // merge the closest pair of points until there are no more than maxPoints
    while (points.Count > maxPoints)
    {
        int closestA = 0;
        int closestB = 1;
        float closestDist = Vector3.Distance(points[closestA], points[closestB]);
        for (int a = 0; a < points.Count; ++a)
        {
            for (int b = a + 1; b < points.Count; ++b)
            {
                float dist = Vector3.Distance(points[a], points[b]);
                if (dist >= closestDist)
                    continue;

                closestDist = dist;
                closestA = a;
                closestB = b;
            }
        }

        // replace the pair with its midpoint
        points[closestA] = (points[closestA] + points[closestB]) * 0.5f;
        points.RemoveAt(closestB);
    }
}

This lets you specify a minimum distance between points, and the maximum number of points there can be. With this tight level of control, you end up with a much more manageable spread of points.


Awesome! But what about skinned meshes, like a character, animal or some sort of pocket-sized monster?

The obvious (yet very slow) solution is to bake the skinned mesh to a static mesh, then do all the steps above. Unlike previously, we need this to happen at runtime (instead of dev time), so expect serious stutters for anything more complicated than a cube!
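
If you do go that route in Unity, the bake itself might look like this sketch (`skinnedRenderer` is assumed, and the baked pose is only valid for the frame it was captured on):

// bake the current pose to a static mesh, then reuse the per-vertex setup
Mesh baked = new Mesh();
skinnedRenderer.BakeMesh(baked);

List<Vector3> points = new List<Vector3>();
foreach (Vector3 vertex in baked.vertices)
    points.Add(skinnedRenderer.transform.TransformPoint(vertex));
SimplifyPoints(points, 16, 0.1f); // arbitrary limits, same idea as before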

A better solution I’ve found is to place a point at the root of each bone in the rig.
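
Since the rig's transforms are already there at runtime, the setup is tiny (again, `skinnedRenderer` is an assumption):

// one point per bone transform in the rig
List<Vector3> points = new List<Vector3>();
foreach (Transform bone in skinnedRenderer.bones)
    points.Add(bone.position);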


This is a very rough approximation. And here’s the kicker: from all my experiments, you really don’t need many points. For many of the animals in FOMO, I get by with just 3 to 6 points on each, and it works wonderfully. Remember that this is not something the player has a keen sense of; so long as it roughly does the job, it’ll still be quite fun.

Weighting Different Parts

Here's another cool thing I've realised: with a lot of subjects, you want a bias towards a particular part of the subject. For example, when taking a photo of a human, a photo of just the foot is not great (even though it's still technically a photo of a human), yet a photo of just the face is fine, because we love faces so much. So if we stick more points around the face and fewer around the feet, the player will score higher when the face is in view, and it won't matter so much whether the feet are.

What's even better is that if we place a point at each bone in the rig, we naturally end up with more points in the face, as most rigs have more bones there.


This can backfire, though, if the rig has lots of bones in the hands.

Detecting the Points

Alright, so we’ve generated all these beautiful points, but what do we do with them? This is the fun part, and also surprisingly easy.

Is the point within the camera frustum? Convert the 3D point to normalized screen coordinates and check that it falls within view. If you multiply the point by the view-projection matrix yourself, that's the -1 to +1 clip-space range; Unity's viewport helper below uses a 0 to 1 range instead. Checking the Z axis also discards points that are behind the camera.

Vector3 vp = camera.WorldToViewportPoint(pointPosition);
if (vp.x < 0 || vp.x > 1 || vp.y < 0 || vp.y > 1 || vp.z < camera.nearClipPlane)
{
    // point is offscreen
}

Is the point being obscured by something? Just raycast to it! This requires you to flag different objects as obscurable or not, and it can have a big impact on performance, which is why we reduced the number of points earlier. I also make the raycast ignore the subject being tested, so that it can't occlude its own points.
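
As a sketch, with the layer mask being an assumption (it should contain the obscurers but not the subject's own colliders):

// a point is visible if nothing on the obscurer layers sits between it and the camera
bool IsPointVisible(Camera camera, Vector3 point, LayerMask obscurers)
{
    Vector3 origin = camera.transform.position;
    Vector3 toPoint = point - origin;
    return !Physics.Raycast(origin, toPoint.normalized, toPoint.magnitude, obscurers);
}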

What percentage of the screen are the points taking up? This can be tricky. You could take the normalized screen-space coordinates of all the points, generate a convex hull, then calculate how much overlap it has with the viewport. My math isn't quite up for that.

Instead, I just calculate a box around all the points by getting the min and max values on the X and Y axes. It’s a very crude approximation, but it’s super fast and, realistically, players probably won’t notice a thing.
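
Something along these lines, assuming the points have already been converted to viewport coordinates:

// axis-aligned box around the points in viewport space (0..1 on each axis)
Vector2 min = new Vector2(1f, 1f);
Vector2 max = new Vector2(0f, 0f);
foreach (Vector3 vp in viewportPoints)
{
    min = Vector2.Min(min, new Vector2(vp.x, vp.y));
    max = Vector2.Max(max, new Vector2(vp.x, vp.y));
}

// clamp so offscreen points don't inflate the box, then take its area
min = Vector2.Max(min, Vector2.zero);
max = Vector2.Min(max, Vector2.one);
float screenCoverage = Mathf.Max(0f, max.x - min.x) * Mathf.Max(0f, max.y - min.y);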


Detecting Facing Direction

When we take a photo of a human head, we usually prefer a photo from the front rather than the back. This can be simply tested with a dot product.

// maps dot = -1 (subject facing the camera) to 1, and dot = +1 (facing away) to 0
float forwardAmount = Mathf.InverseLerp(1, -1, Vector3.Dot(cameraForward, subjectForward));

Detecting Focus

This can be quite tricky, and depends entirely on how your depth of field is being calculated. The basic idea is to calculate the center/average point of all your points, then get the distance between that point and the camera.

Vector3 averagePosition = new Vector3(0, 0, 0);
for (int i = 0; i < points.Count; ++i)
    averagePosition += points[i];
averagePosition /= points.Count;

float distanceToCamera = (averagePosition - camera.transform.position).magnitude;

The next part is very math heavy and beyond the scope of this tutorial, so I'll just dump the function here:

float CalculateFocus(float distance, float focusDistance, float focalLength, float aperture)
{
    // focal length is in millimeters; the distances are in meters
    float F = focalLength / 1000f;
    float A = focalLength / aperture;
    float maxCoC = (A * F) / (focusDistance - F);

    // signed circle of confusion: negative in front of the focus plane, positive behind it
    float coc = (1.0f - focusDistance / distance) * maxCoC;
    float nearCoC = Mathf.Clamp(coc, -1.0f, 0.0f);
    float farCoC = Mathf.Clamp01(coc);

    // blur on either side of the focus plane should pull the score down
    float focus = Mathf.Clamp01(1.0f + nearCoC - farCoC);
    return focus * focus;
}

This returns a value between 0 and 1, where 1 is fully focused, and 0 is fully unfocused.

Detecting Lighting/Exposure

Game developers have a long history of battling with lighting, and this will be no different. I've come up with many solutions. Here are some:

  • Avoid it altogether.
  • Have a trigger volume around lights. If the subject is inside one, it's adequately lit.
  • Raycast from each point toward each light source. The percentage of points that can reach a light is how lit the subject is.
  • Generate a histogram from the rendered image and see how close the average rendered pixel is to middle gray (see the sketch after this list). This has the benefit of being an objective truth for any lighting conditions. The downside is that we don't always want an objective truth: sometimes a 90% black image is aesthetically pleasing, and sometimes it isn't.

I ended up going with the histogram solution, as it required the least amount of setup and works in more dynamic lighting conditions.
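
Here's a sketch of the simplest form of that idea, using a plain average rather than a full histogram. `photo` is assumed to be a readable Texture2D; a real implementation would downscale first or do this on the GPU.

// score exposure by how close the mean luminance is to middle gray
float ScoreExposure(Texture2D photo)
{
    Color[] pixels = photo.GetPixels();
    float total = 0f;
    foreach (Color c in pixels)
        total += c.r * 0.2126f + c.g * 0.7152f + c.b * 0.0722f; // Rec. 709 luminance
    float average = total / pixels.Length;

    // 1 near middle gray (roughly 0.5 once gamma-encoded), 0 at pure black or white
    return 1f - Mathf.Abs(average - 0.5f) * 2f;
}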

Detecting Composition

So we've got all this data about the photograph. We know how visible it is, and whether it's centered or half offscreen. But how do we detect if it's actually good?

We could detect if the subject is along the rule-of-thirds lines, or the golden ratio. We could detect the direction of lights and hardcode what we think is a good light direction. We could generate more histograms to check for a balance of hues and tones.

Obviously, this can get very complicated, and even if we get it all working, it will be, at best, subjectively pleasing.

In FOMO, I avoided scoring based on composition. So long as the camera can actually see what it has taken a photograph of, it’s up to the player to decide which photo they want to stick in their album.

Adding Up The Score

Okay, we've got a whole bunch of scores (hopefully all between 0 and 1). So we just add them up and divide by the count, right?

Mostly, yeah. We can, however, weight the different scores to prefer different criteria. For example, I recommend underweighting focus, lighting and composition. These can be subjectively interpreted and may result in the player screaming “What the hell is wrong with that? It’s a masterpiece!”.

I also recommend being lenient with the scoring. No photograph will ever meet 100% of the criteria; most aesthetically pleasing photos will be lucky to hit 50%. In this case, try sticking a sqrt() over the final score, as in the sketch below. After all, photography is as much an art as it is a science.
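
As a rough sketch of that final tally (the weights here are invented for illustration):

// weighted blend of the criteria, then a sqrt to be more lenient
float score = visibility * 0.4f   // fraction of points onscreen and unobscured
            + coverage   * 0.25f  // screen-space bounding box area
            + facing     * 0.15f  // facing-direction dot product
            + focus      * 0.1f   // depth-of-field focus score
            + exposure   * 0.1f;  // histogram/exposure score
score = Mathf.Sqrt(Mathf.Clamp01(score));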


