[GH-ISSUE #166] Pipeline error End-of-File #164

Closed
opened 2026-02-27 15:55:24 +03:00 by kerem · 8 comments
Owner

Originally created by @bakrowork on GitHub (Jun 19, 2018).
Original GitHub issue: https://github.com/RD17/ambar/issues/166

Hi

I am running into the following error in the logs:

[pipeline] [error] [0] error parsing //Docs/xxxxxxxx.pdf JVM exception occurred: Error: End-of-File, expected line

My docs are readable by everybody. I am running an Ubuntu 16.04 box with Docker.

Where can I find more logs about this error? Has anybody else run into it?


@bakrowork commented on GitHub (Jun 19, 2018):

It should be something to do with read rights.
I just created a simple asd.txt file and I get the following error:
GD_Docs/asd.txt JVM exception occurred: InputStream must have > 0 bytes
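
The message "InputStream must have > 0 bytes" suggests the parser was handed an empty stream. One cheap thing to rule out is that the source files really are empty on disk. Below is a minimal sketch of such a check; the function name `find_empty_files` and the idea of scanning the crawled folder are illustrative assumptions, not part of Ambar.

```python
import os


def find_empty_files(root):
    """Return a sorted list of zero-byte files found under `root`.

    Hypothetical helper for illustration only; it is not part of Ambar.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                if os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # Unreadable or vanished entries are skipped, not reported.
                continue
    return sorted(empty)
```

If this returns an empty list for the crawled folder, the files themselves have content, and the empty stream more likely comes from somewhere between the crawler and the pipeline.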


@bakrowork commented on GitHub (Jun 19, 2018):

If I am correct, the last error is generated by this line:
https://github.com/apache/tika/blob/master/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java#L122


@evilham commented on GitHub (Jun 27, 2018):

I am having the exact same issue :-D.


@evilham commented on GitHub (Jun 27, 2018):

Solved!

By checking docker-compose logs, I saw that there was a DNS lookup error; then I noticed that both @bakrowork and I had used something that is not a valid DNS name as the name for the crawler.

I guess this was a docker-rookie mistake. It should be fixed with a mention in the README not to use weird stuff as the crawler name :-).

@bakrowork should now:

  1. docker-compose down
  2. rm -rf $YOUR_DATA_DIR (take care to delete only Ambar's data dir, not your source data)
  3. change docker-compose.yml so that the crawler name contains only lowercase ASCII characters.
  4. docker-compose up -d

@bakrowork commented on GitHub (Jun 28, 2018):

Thank you @evilham


@AyKarsi commented on GitHub (Jul 19, 2018):

@evilham could you maybe mention in the installation guide that the crawler name should be lowercase?
It could save people some hours of debugging. (I'm glad I found this post eventually.)


@evilham commented on GitHub (Jul 20, 2018):

@AyKarsi I can't :-) but maybe @sochix can.


@sochix commented on GitHub (Jul 20, 2018):

@AyKarsi @evilham it's already in the installation guide, please read it carefully:

> Replace ${crawlerName} with desired name for your crawler (only lowercase latin letters and dashes are supported). Check that service block name and crawler name are the same
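
The naming rule quoted above can be checked before editing docker-compose.yml. The following is a minimal sketch, not Ambar's own validation: the pattern is a literal reading of "only lowercase latin letters and dashes", and the names `CRAWLER_NAME_RE` and `is_valid_crawler_name` are hypothetical. As an extra assumption, it also rejects leading, trailing, and doubled dashes and names longer than 63 characters, since the crawler name ends up being resolved via DNS and 63 is the DNS label length limit.

```python
import re

# Assumption: a literal reading of the installation guide's rule, tightened so
# that dashes may only appear between runs of letters. Not taken from Ambar.
CRAWLER_NAME_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")

# A single DNS label may be at most 63 characters long.
MAX_DNS_LABEL_LENGTH = 63


def is_valid_crawler_name(name):
    """Return True if `name` uses only lowercase latin letters and dashes."""
    if len(name) > MAX_DNS_LABEL_LENGTH:
        return False
    return CRAWLER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("GD_Docs", "gd-docs", "docs"):
        print(candidate, is_valid_crawler_name(candidate))
```

A name such as GD_Docs, which appears as the path prefix in the error reported earlier in this thread, fails this check because of the uppercase letters and the underscore; gd-docs passes.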
