Channel: Get the value of the precedent day in a pivot table - Code Review Stack Exchange

Get the value of the precedent day in a pivot table

I have a pivot table of approximately 2 million rows, built from a dataframe with the same structure as the one below:

    import pandas as pd
    from datetime import datetime

    raw = pd.DataFrame([[123456, datetime(2020, 7, 1), 'A', 10],
                        [123456, datetime(2020, 7, 1), 'B', 25],
                        [123456, datetime(2020, 7, 1), 'C', 0],
                        [123456, datetime(2020, 7, 2), 'A', 17],
                        [123456, datetime(2020, 7, 2), 'B', 23],
                        [123456, datetime(2020, 7, 2), 'C', float('NaN')],
                        [789012, datetime(2020, 7, 2), 'A', 11],
                        [789012, datetime(2020, 7, 2), 'B', 19],
                        [789012, datetime(2020, 7, 3), 'A', 8],
                        [789012, datetime(2020, 7, 3), 'B', 21]],
                       columns=['GROUP_ID', 'DATE', 'NAME', 'VALUE'])

       GROUP_ID       DATE NAME  VALUE
    0    123456 2020-07-01    A   10.0
    1    123456 2020-07-01    B   25.0
    2    123456 2020-07-01    C    0.0
    3    123456 2020-07-02    A   17.0
    4    123456 2020-07-02    B   23.0
    5    123456 2020-07-02    C    NaN
    6    789012 2020-07-02    A   11.0
    7    789012 2020-07-02    B   19.0
    8    789012 2020-07-03    A    8.0
    9    789012 2020-07-03    B   21.0

As you can see, the VALUE column can be NaN. The pivot table is created like this:

    pt = raw.pivot_table(index=['GROUP_ID', 'DATE'], columns=['NAME'], values=['VALUE'])

                         VALUE
    NAME                     A     B    C
    GROUP_ID DATE
    123456   2020-07-01   10.0  25.0  0.0
             2020-07-02   17.0  23.0  NaN
    789012   2020-07-02   11.0  19.0  NaN
             2020-07-03    8.0  21.0  NaN

The idea is to create a level-0 column VALUE_PREV that holds the value of C for the previous day. My first attempt was the following, which took 10 seconds:

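    # Put the dates in the columns, shift every value one date to the right,
    # then move the dates back into the index
    dfA = pt.stack().unstack(level='DATE').shift(1, axis=1).stack(level='DATE')
    # Keep only the rows for NAME == 'C'
    dfA = dfA[dfA.index.get_level_values('NAME') == 'C']
    # Move NAME back to the columns and relabel the top level as VALUE_PREV
    dfA = dfA.unstack(level='NAME').rename(columns={'VALUE': 'VALUE_PREV'})
    # Attach the shifted column to the original pivot table
    ptA = pt.merge(dfA, how='outer', on=['GROUP_ID', 'DATE'])

                         VALUE             VALUE_PREV
    NAME                     A     B    C           C
    GROUP_ID DATE
    123456   2020-07-01   10.0  25.0  0.0         NaN
             2020-07-02   17.0  23.0  NaN         0.0
    789012   2020-07-02   11.0  19.0  NaN         NaN
             2020-07-03    8.0  21.0  NaN         NaN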

So I was wondering whether there is a quicker way to do this, or at least one that is less cumbersome to write and understand?

Edit: if VALUE C is NaN on day t, then VALUE_PREV C on day t+1 MUST be NaN, not 0.
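
For illustration, one possible shorter formulation is sketched below. It is not the approach from the question and has not been timed on the 2-million-row table. It assumes that "the day before" means the previous calendar day within the same GROUP_ID (the stack/shift version instead moves values to the next date column present anywhere in the data, which can differ when days are missing). The names `c`, `prev_keys` and `out` are made up for this sketch.

```python
from datetime import datetime

import pandas as pd

raw = pd.DataFrame([[123456, datetime(2020, 7, 1), 'A', 10],
                    [123456, datetime(2020, 7, 1), 'B', 25],
                    [123456, datetime(2020, 7, 1), 'C', 0],
                    [123456, datetime(2020, 7, 2), 'A', 17],
                    [123456, datetime(2020, 7, 2), 'B', 23],
                    [123456, datetime(2020, 7, 2), 'C', float('NaN')],
                    [789012, datetime(2020, 7, 2), 'A', 11],
                    [789012, datetime(2020, 7, 2), 'B', 19],
                    [789012, datetime(2020, 7, 3), 'A', 8],
                    [789012, datetime(2020, 7, 3), 'B', 21]],
                   columns=['GROUP_ID', 'DATE', 'NAME', 'VALUE'])

pt = raw.pivot_table(index=['GROUP_ID', 'DATE'], columns=['NAME'], values=['VALUE'])

# The C values as a Series indexed by (GROUP_ID, DATE)
c = pt[('VALUE', 'C')]

# For every row of pt, build the key (same GROUP_ID, DATE minus one day).
# Assumption: "previous day" is the previous calendar day.
prev_keys = pd.MultiIndex.from_arrays(
    [pt.index.get_level_values('GROUP_ID'),
     pt.index.get_level_values('DATE') - pd.Timedelta(days=1)],
    names=pt.index.names)

# Look up C at the previous-day keys; keys that do not exist, or whose
# C value is NaN, come back as NaN (never as 0).
out = pt.copy()
out[('VALUE_PREV', 'C')] = c.reindex(prev_keys).to_numpy()

print(out)
```

On the sample data, VALUE_PREV C comes out as NaN, 0.0, NaN, NaN, which matches the table above. Because the lookup is a single `reindex` on the existing index, it avoids the stack/unstack round trip and the merge, but whether it is actually faster on the full data would need to be measured.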

