A faster dot4... #16

@lassade

Just want to point out that dot4 can be improved with the dpps instruction (which I just discovered). It requires SSE4.1 (99.84% of CPUs in the Steam Hardware Survey, April 2025).

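// dpps with mask 0xff: the high four bits include all four lane products in the sum,
// the low four bits write that sum to all four lanes of the result.
pub fn dot4(v0: Vec, v1: Vec) Vec {
    return asm (
        \\dpps    $0xff, %xmm1, %xmm0
        : [ret] "={xmm0}" (-> Vec), // output
        : [v0] "{xmm0}" (v0), // inputs
          [v1] "{xmm1}" (v1),
    );
}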

I haven't tested how much faster it is...
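For checking the result and timing a baseline, a portable reference version may help. This is only a sketch: `dot4Reference` is a hypothetical name, `Vec` is assumed to be a 4-lane `f32` vector (as the signature above suggests), and the `@splat` form assumes Zig 0.11 or newer. It is not taken from the library itself.

```zig
const std = @import("std");

// Assumption: Vec is a 4-lane f32 vector; the issue does not show its definition.
const Vec = @Vector(4, f32);

// Hypothetical portable reference: multiply lane-wise, sum the four products,
// then broadcast the sum to every lane. With the 0xff mask, dpps is expected
// to produce the same layout (the dot product in all four lanes).
pub fn dot4Reference(v0: Vec, v1: Vec) Vec {
    const sum: f32 = @reduce(.Add, v0 * v1);
    return @splat(sum);
}
```

Comparing each lane of the inline-assembly `dot4` against this function on a few inputs would be a quick sanity check, and the same function can serve as the baseline in a benchmark. Floating-point summation order may differ between the two, so a small tolerance is safer than exact equality for arbitrary inputs.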
