@nalimilan had a beautiful suggestion here: JuliaData/DataFrames.jl#1514 (comment).
There may actually be very little need to have separate row-wise and column-wise macros. The row-wise macro could simply also accept columns (as regular vectors) with a different syntax.
For example, now if we need to filter values for which :SepalLength is greater than 5 in the dataset iris we'd do:
@where iris :SepalLength > 5
Whereas if we need to compare with something that requires the whole column, we'd need to switch to @where_vec and add a . for broadcasting:
@where_vec iris :SepalLength .> mean(:SepalLength)
The idea would be to use only the row-wise macro and find a syntax to refer to whole columns inside it (at macro expansion time the symbol is replaced with the corresponding column):
@where iris :SepalLength > mean($SepalLength)
This would be mostly non-breaking but at the same time would make column-wise macros redundant.
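To make the proposal concrete, here is a minimal sketch of how such an expansion could work. It is not the package's implementation: the names `@where_sketch`, `rewrite_refs` and `filter_rows` are hypothetical, and the table is assumed to be a plain NamedTuple of equal-length vectors rather than a real table type. Quoted symbols are rewritten to fields of the current row, and `$name` is rewritten to the whole column.

```julia
using Statistics  # provides `mean`

# Hypothetical helper: walk the expression, turning `:name` into a field of
# the current row and `$name` into the whole column of the table.
# (A sketch only: it rewrites every quoted symbol, including ones that are
# not meant as field references.)
function rewrite_refs(ex, row, tbl)
    if ex isa QuoteNode && ex.value isa Symbol
        return :(Base.getproperty($row, $ex))
    elseif ex isa Expr && ex.head === Symbol("\$") && ex.args[1] isa Symbol
        return :(Base.getproperty($tbl, $(QuoteNode(ex.args[1]))))
    elseif ex isa Expr
        return Expr(ex.head, map(a -> rewrite_refs(a, row, tbl), ex.args)...)
    else
        return ex
    end
end

# Hypothetical runtime helper: keep the rows of a column table (assumed to be
# a NamedTuple of equal-length vectors) for which `pred(row, table)` is true.
function filter_rows(pred, table)
    n = length(first(table))
    keep = [pred(map(col -> col[i], table), table) for i in 1:n]
    return map(col -> col[keep], table)
end

# Build a predicate taking (row, table) and pass it to the runtime helper.
macro where_sketch(table, cond)
    row, tbl = gensym("row"), gensym("table")
    body = rewrite_refs(cond, row, tbl)
    return esc(:($filter_rows(($row, $tbl) -> $body, $table)))
end

iris = (SepalLength = [4.9, 5.1, 6.3, 5.8],
        Species = ["setosa", "setosa", "virginica", "virginica"])

@where_sketch iris :SepalLength > 5                   # row fields only
@where_sketch iris :SepalLength > mean($SepalLength)  # row field vs. whole column
```

With this approach no broadcasting dot is needed in the second call, since the comparison still runs once per row and only `mean` sees the full vector.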
I like the idea a lot but am unsure about the syntax. As of now, in row-wise macros _ refers to the row, symbols refer to fields, and cols(c) can be used to tell the macro that c is a variable that evaluates to a symbol and so should be replaced with the corresponding field (consistent with DataFramesMeta and StatPlots). In column-wise macros _ refers to the table, symbols correspond to columns, and cols(c) has the corresponding role.
What would be an extra syntax to use in row macros?
Candidates:
- $SepalLength
- col(:SepalLength), but this could be too confusing given cols
- Some sort of dot overloading, like _I_.SepalLength, where _I_ would be replaced by a table-like object with dot overloading to extract columns? It does look a bit ugly though.
In the first two cases I'm also not sure how one would handle a column that is passed programmatically (by, say, c=:SepalLength).