Merge column-wise and row-wise macros #29

@piever

@nalimilan had a beautiful suggestion here: JuliaData/DataFrames.jl#1514 (comment).

There may actually be very little need for separate row-wise and column-wise macros. The row-wise macro could simply also accept columns (as regular vectors) through a different syntax.

For example, to keep the rows of the iris dataset where :SepalLength is greater than 5, we currently write:

@where iris :SepalLength > 5

Whereas if we need to compare with something that requires the whole column, we have to switch to @where_vec and add a . for broadcasting:

@where_vec iris :SepalLength .> mean(:SepalLength)
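For comparison, here is a rough plain-Julia equivalent of the two calls above. It is only a sketch, not what the macros actually expand to: it assumes a column table represented as a NamedTuple of vectors with made-up values (not the real iris data), and filterrows is a hypothetical helper.

```julia
using Statistics  # provides `mean`

# Toy column table (a NamedTuple of vectors) standing in for `iris`;
# the values are made up for illustration.
iris = (SepalLength = [5.1, 4.9, 6.3, 5.8, 7.1],
        Species = ["setosa", "setosa", "virginica", "virginica", "virginica"])

# Hypothetical helper: keep the rows of a column table selected by a Bool mask.
filterrows(t, keep) = map(col -> col[keep], t)

# Row-wise (`@where iris :SepalLength > 5`):
# the predicate sees one value at a time.
rowwise = filterrows(iris, [x > 5 for x in iris.SepalLength])

# Column-wise (`@where_vec iris :SepalLength .> mean(:SepalLength)`):
# the predicate sees the whole column, so the comparison is broadcast with `.>`.
colwise = filterrows(iris, iris.SepalLength .> mean(iris.SepalLength))
```

The only difference is whether the expression is evaluated once per row or once on whole vectors, which is why the column-wise form needs the extra dot.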

The idea would be to use only the row-wise macro, with a dedicated syntax for referring to whole columns (at macro expansion time, such a reference is replaced with the corresponding column):

@where iris :SepalLength > mean($SepalLength)

This would be mostly non-breaking but at the same time would make column-wise macros redundant.
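To make the proposal concrete, here is a minimal sketch of the expansion-time rewrite, assuming the same NamedTuple-of-vectors table as a stand-in. The macro @where_sketch and the helper rewrite are hypothetical names, not part of any package: inside the predicate, :name becomes the current row's element and $name becomes the whole column.

```julia
using Statistics  # provides `mean`

# Hypothetical helper: walk the user expression and rewrite column references.
#   :name  (a QuoteNode)      -> the current row's element, t.name[i]
#   $name  (an Expr with :$)  -> the whole column, t.name
function rewrite(ex, t::Symbol, i::Symbol)
    if ex isa QuoteNode && ex.value isa Symbol
        return :(getproperty($t, $ex)[$i])
    elseif ex isa Expr && ex.head === :$
        return :(getproperty($t, $(QuoteNode(ex.args[1]))))
    elseif ex isa Expr
        return Expr(ex.head, (rewrite(a, t, i) for a in ex.args)...)
    else
        return ex  # literals, function names, line numbers: unchanged
    end
end

# Hypothetical row-wise filter macro that also understands `$column`.
macro where_sketch(tbl, ex)
    t, i, keep = gensym(:tbl), gensym(:i), gensym(:keep)
    pred = rewrite(ex, t, i)
    # Escape everything so `mean` and the table resolve in the caller's scope;
    # the gensym'd names cannot clash with user variables.
    esc(quote
        let $t = $tbl
            $keep = [$pred for $i in eachindex(first($t))]
            map(col -> col[$keep], $t)
        end
    end)
end

# Toy column table with made-up values.
iris = (SepalLength = [5.1, 4.9, 6.3, 5.8, 7.1],
        Species = ["setosa", "setosa", "virginica", "virginica", "virginica"])

above5 = @where_sketch iris :SepalLength > 5
abovemean = @where_sketch iris :SepalLength > mean($SepalLength)
```

This naive version re-evaluates mean($SepalLength) for every row; an actual implementation would presumably hoist subexpressions that depend only on whole columns out of the row loop.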

I like the idea a lot but am unsure about the syntax. Currently, in row-wise macros, _ refers to the row, symbols refer to fields, and cols(c) tells the macro that c is a variable evaluating to a symbol, so it should be replaced with the corresponding field (consistent with DataFramesMeta and StatPlots). In column-wise macros, _ refers to the table, symbols refer to columns, and cols(c) plays the analogous role.

What extra syntax should row-wise macros use for this?

Candidates:

  • $SepalLength
  • col(:SepalLength), though it could be too confusing given cols
  • Some sort of dot overloading, like _I_.SepalLength, where _I_ would be replaced by a table-like object whose dot overloading extracts columns. It does look a bit ugly, though.

In the first two cases, I'm also unsure how one would refer to a column passed programmatically (say, c = :SepalLength).
