the label [approx. one year ago] is not in the [index] #1953

Closed
QuantGuy01 opened this issue Sep 19, 2017 · 14 comments

Comments

@QuantGuy01

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • Operating System: (Windows Version or $ uname --all)
    Linux uvm01 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Python Version: $ python --version
    Python 3.4.5 :: Continuum Analytics, Inc.

  • Python Bitness: $ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
    64

  • How did you install Zipline: (pip, conda, or other (please explain))
    conda

  • Python packages: $ pip freeze or $ conda list
    alembic 0.7.7 py34_0 Quantopian
    bcolz 0.12.1 np111py34_0 Quantopian
    bottleneck 1.2.0 np111py34_0
    click 6.7 py34_0
    contextlib2 0.5.4 py34_0
    cyordereddict 0.2.2 py34_0 Quantopian
    cython 0.25.2 py34_0
    decorator 4.0.11 py34_0
    empyrical 0.2.2 py34_0 Quantopian
    hdf5 1.8.17 2
    intervaltree 2.1.0 py34_0 Quantopian
    libgfortran 3.0.0 1
    logbook 0.12.5 py34_0 Quantopian
    lru-dict 1.1.4 py34_0 Quantopian
    mako 1.0.6 py34_0
    markupsafe 1.0 py34_0
    mkl 2017.0.3 0
    multipledispatch 0.4.9 py34_0
    networkx 1.11 py34_0
    numexpr 2.6.1 np111py34_2
    numpy 1.11.3 py34_0
    openssl 1.0.2l 0
    pandas 0.18.1 np111py34_0
    pandas-datareader 0.2.1 py34_0
    patsy 0.4.1 py34_0
    pip 9.0.1 py34_1
    pytables 3.3.0 np111py34_0
    python 3.4.5 0
    python-dateutil 2.6.1 py34_0
    pytz 2017.2 py34_0
    readline 6.2 2
    requests 2.14.2 py34_0
    requests-file 1.4.1 py34_0
    scipy 0.18.1 np111py34_1
    setuptools 27.2.0 py34_0
    six 1.10.0 py34_0
    sortedcontainers 1.4.4 py34_0 Quantopian
    sqlalchemy 1.1.5 py34_0
    sqlite 3.13.0 0
    statsmodels 0.6.1 np111py34_1
    tk 8.5.18 0
    toolz 0.8.2 py34_0
    wheel 0.29.0 py34_0
    xz 5.2.3 0
    zipline 1.1.1 np111py34_0 Quantopian
    zlib 1.2.11 0

Now that you know a little about me, let me tell you about the issue I am
having:

Description of Issue

  • What did you expect to happen?
    I have a skeleton script with all excess code stripped out to demonstrate the issue. I expect the script to execute successfully with no output.

  • What happened instead?
    $ python ZiplineDebug.py

Traceback (most recent call last):
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1395, in _has_valid_type
    error()
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2016-09-19 00:00:00+00:00] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ZiplineDebug.py", line 29, in <module>
    environ=os.environ)
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/utils/run_algo.py", line 360, in run_algorithm
    environ=environ,
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/utils/run_algo.py", line 179, in _run
    overwrite_sim_params=False,
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/algorithm.py", line 709, in run
    for perf in self.get_generator():
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/gens/tradesimulation.py", line 230, in transform
    handle_benchmark(normalize_date(dt))
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/gens/tradesimulation.py", line 190, in handle_benchmark
    benchmark_source.get_value(date)
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/zipline/sources/benchmark_source.py", line 75, in get_value
    return self._precalculated_series.loc[dt]
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1403, in _has_valid_type
    error()
  File "/home/username/anaconda3/envs/zipline/lib/python3.4/site-packages/pandas/core/indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2016-09-19 00:00:00+00:00] is not in the [index]'

Here is how you can reproduce this issue on your machine:

Reproduction Steps

1. Run my script

Script:

ZiplineDebug.py.txt

from zipline.api import symbol
from zipline import run_algorithm


import pandas as pd
import os
from datetime import datetime
import pytz


def initialize(context):
#    context.asset = symbol('AAPL')
    pass

def handle_data(context, data):
    pass


tz = pytz.timezone("US/Mountain")
start = datetime(2016, 9, 17, tzinfo=tz)
end = datetime(2017, 9, 2, tzinfo=tz)

perfData = run_algorithm(start=start,
                         end=end,
                         bundle="quantopian-quandl",
                         initialize=initialize,
                         capital_base=40000.00,
                         handle_data=handle_data,
                         environ=os.environ)

What steps have you taken to resolve this already?

I tried various dates and various other means of importing data (quandl, csv, yahoo). I verified that the data has dates going back further than one year ago.

Anything else?

IMPORTANT: this is the key point.

The script does run successfully if I move the start date a little later (a few days is enough). It then runs with no output, which is what I expect. Try adding one month to the start date and it will run.

As days pass, the earliest start date that works also moves later. This makes me think that somehow I cannot backtest with data more than one year old.
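The moving cutoff described above can be modelled with the standard library alone. This is only a sketch: the dict stands in for the pandas Series that `benchmark_source.get_value` indexes with `.loc[dt]`, and the 365-day window and the helper names `trailing_window` and `get_value` are assumptions made for illustration, not Zipline's code.

```python
from datetime import date, timedelta


def trailing_window(latest, days=365):
    """Build a stand-in benchmark holding only the last `days` calendar days.

    Hypothetical helper: every day maps to a dummy return of 0.0.
    """
    return {latest - timedelta(days=n): 0.0 for n in range(days)}


def get_value(series, dt):
    """Look up one day; a missing key raises KeyError, like .loc[dt] would."""
    return series[dt]


# Benchmark as it might look on the day this issue was opened.
benchmark = trailing_window(date(2017, 9, 19))

# Earliest day still inside the window; it moves forward every day.
earliest = min(benchmark)

try:
    get_value(benchmark, date(2016, 9, 19))
except KeyError as exc:
    print("lookup failed for", exc)
```

If the benchmark really holds only a trailing window, any simulation day older than `earliest` fails in the same way, no matter how far back the ingested bundle goes.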

...

Sincerely,

QuantGuy01

@rohan-

rohan- commented Sep 19, 2017

Just to chime in, I have the same issue on both OS X and Ubuntu. Detailed output here:

File "test.py", line 21, in <module>
  bundle='quantopian-quandl',
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/utils/run_algo.py", line 360, in run_algorithm
  environ=environ,
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/utils/run_algo.py", line 179, in _run
  overwrite_sim_params=False,
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/algorithm.py", line 709, in run
  for perf in self.get_generator():
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/gens/tradesimulation.py", line 230, in transform
  handle_benchmark(normalize_date(dt))
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/gens/tradesimulation.py", line 190, in handle_benchmark
  benchmark_source.get_value(date)
File "/home/rohan/anaconda2/lib/python2.7/site-packages/zipline/sources/benchmark_source.py", line 75, in get_value
  return self._precalculated_series.loc[dt]
File "/home/rohan/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1296, in __getitem__
  return self._getitem_axis(key, axis=0)
File "/home/rohan/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1466, in _getitem_axis
  self._has_valid_type(key, axis)
File "/home/rohan/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1403, in _has_valid_type
  error()
File "/home/rohan/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1390, in error
  (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [2015-01-02 00:00:00+00:00] is not in the [index]'

@gehongyi

I have the same problem!

@rnehrboss

Same problem.
I ingested the quandl data.

I can't run the buyapple.py example if the start date is more than one year before today.

This works:


%%zipline --start 2016-9-21 --end 2017-6-1
from zipline.api import symbol, order, record

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)


This fails with error: "KeyError: 'the label [2016-09-20 00:00:00+00:00] is not in the [index]'"


%%zipline --start 2016-9-20 --end 2017-6-1
from zipline.api import symbol, order, record

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data[symbol('AAPL')].price)

@rnehrboss

It appears from issue #1947 that this might be caused by a problem with pandas-datareader and the Yahoo or Google API.

Am I wrong in thinking that my code should be using Quandl if I ingest it first?

Thanks!

@rnehrboss

PS: My ingested bundles look like this:
zipline bundles
quandl
quantopian-quandl 2017-09-19 16:55:07.011638
quantopian-quandl 2017-09-19 02:30:42.271764

@DavisOwen

So this has nothing to do with your data; it seems to be getting choked up on your benchmark data. This is a problem because quandl does not have SPY data, so you have to get it elsewhere, and yes, Yahoo's API does not work anymore. If you look in your ~/.zipline/data directory, you will see a SPY_benchmark.csv file, and that's the data you should be looking at.
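One way to look at that file is to print the first and last entries of its first column. The sketch below assumes the first column holds the dates, that rows are in date order, and that there is no header row; none of that is confirmed in this thread, so check the output against the file itself. `benchmark_date_range` is a hypothetical helper name.

```python
import csv
import os


def benchmark_date_range(csv_path):
    """Return the first-column values of the first and last non-empty rows.

    Assumes (not confirmed here) that column 0 is the date, rows are
    sorted by date, and the file has no header row.
    """
    with open(csv_path, newline="") as handle:
        first_column = [row[0] for row in csv.reader(handle) if row]
    if not first_column:
        raise ValueError("no rows found in %s" % csv_path)
    return first_column[0], first_column[-1]


# Location taken from the comment above; adjust if your data directory differs.
default_path = os.path.join(
    os.path.expanduser("~"), ".zipline", "data", "SPY_benchmark.csv"
)
if os.path.exists(default_path):
    print(benchmark_date_range(default_path))
```

If the first value printed is later than the backtest start date, the benchmark file rather than the ingested bundle is the likely cause of the KeyError.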

Not sure how zipline works with regards to this, but maybe if you specify a date that is on a non-trading day with no data (weekend, holiday, etc.) it will poop at you? Doubt it, but food for thought.
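The weekend part of that guess can be checked quickly with the standard library. The dates below are the ones quoted in the KeyError messages earlier in this thread; holidays are not checked here, so this is only a partial test of the idea, and `is_weekend` is a helper introduced for illustration.

```python
from datetime import date

# Dates copied from the KeyError messages reported in this thread.
failing_dates = [date(2016, 9, 19), date(2015, 1, 2), date(2016, 9, 20)]


def is_weekend(day):
    """True for Saturday or Sunday (weekday() returns 5 or 6)."""
    return day.weekday() >= 5


for day in failing_dates:
    print(day.isoformat(), "weekend" if is_weekend(day) else "weekday")
```

None of the three reported dates falls on a weekend, so a weekend start date alone would not explain these failures.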

@QuantGuy01
Author

Thanks for the pointers about benchmarks. I found the code doing this, and it looks like Google (not Yahoo) is returning just the last year's worth of data, no matter what dates you pass it. I see other people have since commented on the same.

The latest pandas-datareader version has the same behavior. I modified the benchmarks.py code to use Yahoo and print the data to STDOUT, fetched the data as a one-off, and saved it into SPY_benchmark.csv.

I tried leaving Yahoo in there permanently, but it comes back with errors, and I think it is rate limiting connections. So doing a one-off grab, saving it into the csv, and then changing the code back to Google worked for me.

Thanks for the help everyone.

@MBattagl

I edited benchmarks.py changing "google" to "yahoo". This worked for me.

I occasionally get an error saying something about Yahoo. If I re-run, it goes away and everything seems to work as expected.

@minimalgeek

If you are interested, I've made a working Docker image with everything needed to set up a zipline+pyfolio environment running in Jupyter Notebooks.

Check this out, and run the command below.

https://github.com/minimalgeek/DeepLearning/tree/master/Stock/zipline

docker-compose up --build

@DavisOwen

Now I'm confused... apparently getting data from yahoo works with pandas_datareader now? I thought the yahoo API was broken? Is it because datareader was edited to download the data as csv?

@rnehrboss

rnehrboss commented Sep 27, 2017 via email

@rnehrboss

The Docker Notebook container works great, Farago.

Do you know if a container exists for running plain Python with IDLE to test zipline and pyfolio, or how I would set one up? I currently use Anaconda, but have problems running both zipline and pyfolio in the same environment.

@minimalgeek

No idea about IDLE; I use PyCharm, which has a plugin for Docker environments. (If you want to debug your code, you need to upgrade to the pro version.)
Anyway, in the Dockerfile you can modify the last line:
CMD jupyter notebook --ip='*' --port=8888 --no-browser --allow-root
to
CMD python <algorithm_file>.py
or something like that :)

@freddiev4
Contributor

freddiev4 commented Oct 3, 2017

This happens because Google now limits users to about 251 days' worth of data per request, so you can't run backtests that span more than a year. A fix is currently being worked on.

There are duplicates of this issue so I'm just going to direct everyone to this issue: #1965. I'll comment there when there is a fix on master
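Until a fix lands, a rough pre-flight check can flag start dates that may fall outside that window. The sketch below counts Monday to Friday dates between the backtest start and today and compares the total with 251. Both the constant and the helper names are assumptions for illustration; because holidays are ignored, the count is an upper bound on trading sessions, so the check can flag start dates that would still work.

```python
from datetime import date, timedelta

# "About 251 days" per the comment above; treat this as approximate.
GOOGLE_WINDOW_DAYS = 251


def weekdays_between(start, end):
    """Count Monday-Friday dates from start to end, inclusive."""
    total = 0
    day = start
    while day <= end:
        if day.weekday() < 5:
            total += 1
        day += timedelta(days=1)
    return total


def may_exceed_window(start, today, window=GOOGLE_WINDOW_DAYS):
    """True when the weekday count alone is already larger than the window."""
    return weekdays_between(start, today) > window


# Start date from the original report, checked on the day the issue was opened.
print(may_exceed_window(date(2016, 9, 17), date(2017, 9, 19)))
```

When `may_exceed_window` returns False, the span contains at most 251 weekdays, so under these assumptions the benchmark request should cover it. A True result only means the span needs a closer look.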
