Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions libc-top-half/musl/src/internal/atomic.h
Original file line number Diff line number Diff line change
Expand Up @@ -182,13 +182,26 @@ static inline void a_dec(volatile int *p)
#define a_store a_store
static inline void a_store(volatile int *p, int v)
{
#ifdef __wasilibc_unmodified_upstream

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better so not defined a_barrier in wasm32/atomic_arch.h if its not doing the right thing?

#ifdef a_barrier
a_barrier();
*p = v;
a_barrier();
#else
a_swap(p, v);
#endif
#else
/**
* The wasm opcodes generated by a_barrier mode are like below:
* atomic.fence
* local.get
* i32.const
* i32.store
* atomic.fence
* which are not atomic operations, so we use a_swap instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly the *p = v operation above is non-atomic since p and v are just normal C types (not atomic types), so that fact that the above sequence contains non-atomic operations should be be surprise right? We would expect that on all platforms.

I guess atomic.fence doesn't seem to be doing what the musl developers think that a_barrier is supposed to do? Do you know why?

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly the *p = v operation above is non-atomic since p and v are just normal C types (not atomic types), so that fact that the above sequence contains non-atomic operations should be be surprise right? We would expect that on all platforms.

maybe it's assumed to be compiled to an atomic-enough instruction. like mov on x86.

or maybe a_store doesn't require to be that atomic.

I guess atomic.fence doesn't seem to be doing what the musl developers think that a_barrier is supposed to do? Do you know why?

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

are you aware of any doc which explains what musl expects?
i'm not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is not easy to just use atomic.fence pairs to make the a_store an atomic operation, especially for interpreter, since interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling the opcodes, there will be many extra load/store/jump operations inserted among these operations, so it should be not atomic. For JIT/AOT mode, maybe the machine code generated can be atomic after some optimizations are applied, e.g. the code generated may be just like memory barrier + store instruction + memory barrier. Maybe it is why we found the hang issue only in interpreter mode, and haven't found the issue in AOT mode.

And yes, it is better to not define a_barrier, seems it is just converted into atomic.fence wasm opcode, and is not what must expects to do.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling the opcodes

instruction fetch doesn't matter because it's read-only.
operand stack doesn't matter as it's local to the thread.
i don't see any fundamental differences between the interpreter and jit/aot here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AOT code is generated dynamically according to the input bytecode, the machine codes generated may be like memory barrier + store instruction + memory barrier, which ensures the memory ordering of access.
The interpreter is to interpret wasm opcode one by one, note that the machine code compiled consists lots of code pieces of opcode handler, these code pieces are un-ordered, for example, there may be code pieces of I32.STORE and ATOMIC.FENCE:

handler_of_I32.STORE:
    ...
    store instruction
    fetch next opcode and jump to its handler
...
...
handler_of_ATOMIC.FENCE:
    memory barrier
    fetch next opcode and jump to its handler

Or:

handler_of_ATOMIC.FENCE:
    memory barrier
    fetch next opcode and jump to its handler
...
...
handler_of_I32.STORE:
    ...
    store instruction
    fetch next opcode and jump to its handler

Note there is only one memory barrier instruction generated and there may be many other instructions between store and memory barrier instruction. The memory order cannot be promised, interpreter may run memory barrier first, then jump forward/backward to handler of I32.store, and then jump forward/backward again to handler of ATOMIC.FENCE.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

Thanks @sbc100, I removed the a_barrier definition in wasm32/atomic_arch.h.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AOT code is generated dynamically according to the input bytecode, the machine codes generated may be like memory barrier + store instruction + memory barrier, which ensures the memory ordering of access. The interpreter is to interpret wasm opcode one by one, note that the machine code compiled consists lots of code pieces of opcode handler, these code pieces are un-ordered, for example, there may be code pieces of I32.STORE and ATOMIC.FENCE:

handler_of_I32.STORE:
    ...
    store instruction
    fetch next opcode and jump to its handler
...
...
handler_of_ATOMIC.FENCE:
    memory barrier
    fetch next opcode and jump to its handler

Or:

handler_of_ATOMIC.FENCE:
    memory barrier
    fetch next opcode and jump to its handler
...
...
handler_of_I32.STORE:
    ...
    store instruction
    fetch next opcode and jump to its handler

Note there is only one memory barrier instruction generated and there may be many other instructions between store and memory barrier instruction. The memory order cannot be promised, interpreter may run memory barrier first, then jump forward/backward to handler of I32.store, and then jump forward/backward again to handler of ATOMIC.FENCE.

i don't understand your concern.
as far as the interpreter executes handler_of_I32.STORE and handler_of_ATOMIC.FENCE in the order as in the original wasm bytecode, it should be fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, isn't the memory ordering related to the place of barrier inside the compiled code? And will it prevent the compiler from generation some re-ordered instructions?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory barrier is cpu instruction which takes effect when it's executed.

*/
a_swap(p, v);
#endif
}
#endif

Expand Down