There's a grammatical explanation, and it comes back to clauses and clause components.
What happens when we try to highlight this snippet in terms of clause components (subject, verb group, objects, modifiers)?
We have two clauses (because we have two verb groups, and every clause has a single verb group). So let's start by highlighting Scarlett's clause:
Bod heard [subject: Scarlett] [verb group: choking back] [object: a scream].
And Bod's clause:
[subject: Bod] [verb group: heard] Scarlett choking back a scream.
Except here we run into a problem! The verb 'to hear' needs an object—something that is being heard—but Scarlett's clause has already used up all of the other words in the sentence.
And here's where the hierarchy comes in. Scarlett's event is the thing that is being heard, which makes it the object of the main clause:
[subject: Bod] [verb group: heard] [object: Scarlett choking back a scream].
So, the supporting event is part of the main event. Like the leg of a table, holding the whole thing up.