29 changes: 25 additions & 4 deletions modules/restapi/src/main/resources/docspell-openapi.yml
@@ -7266,8 +7266,10 @@ components:
format: glob
language:
description: |
-The language used for text extraction and analysis when
-processing mails.
+The language for text extraction and analysis when
+processing mails. Use **ISO 639-3** (3-letter codes) such
+as `eng`, `deu`, `fra`; ISO 639-1 (2-letter) like `en`,
+`de` and language names like `english` are also accepted.
type: string
format: language
postHandleAll:
@@ -8242,8 +8244,18 @@ components:
type: string
format: language
description: |
-The `language` of the document may be specified, otherwise
-the one from settings is used.
+The language of the document for processing (OCR, text
+extraction, analysis). If not specified, the collective's
+default language is used.
+
+Use **ISO 639-3** (3-letter codes) such as `eng`, `deu`,
+`fra`, `spa`. ISO 639-1 (2-letter) codes like `en`, `de`
+and language names like `english`, `german` are also
+accepted. Supported languages include German, English,
+French, Italian, Spanish, Portuguese, Czech, Danish,
+Finnish, Norwegian, Swedish, Russian, Romanian, Dutch,
+Latvian, Japanese, Hebrew, Hungarian, Lithuanian, Polish,
+Estonian, Ukrainian, Khmer, Slovak.
attachmentsOnly:
type: boolean
default: false
@@ -8292,6 +8304,11 @@ components:
language:
type: string
format: language
+description: |
+Default document language for the collective. Use **ISO
+639-3** (3-letter codes) like `eng`, `deu`, `fra`; ISO
+639-1 (2-letter) like `en`, `de` and language names are
+also accepted.
integrationEnabled:
type: boolean
description: |
@@ -8401,6 +8418,10 @@ components:
language:
type: string
format: language
+description: |
+Document language for uploads via this source. Use **ISO
+639-3** (3-letter codes) like `eng`, `deu`, `fra`; ISO
+639-1 (2-letter) and language names are also accepted.
created:
description: DateTime
type: integer
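The descriptions above name three accepted input forms for `language`: ISO 639-3 codes, ISO 639-1 codes, and language names. As a rough illustration of how those forms relate, here is a minimal client-side sketch that maps each form to the 3-letter code. The function name `to_iso639_3` and the lookup tables are hypothetical and cover only the codes quoted in the descriptions; this is not Docspell's own resolution logic, which this diff does not show.

```python
# Hypothetical client-side helper; NOT Docspell's own resolution logic.
# Only the languages quoted as examples in the descriptions are listed.
ISO_639_3 = {"eng", "deu", "fra", "spa"}
ISO_639_1_TO_3 = {"en": "eng", "de": "deu", "fr": "fra", "es": "spa"}
NAME_TO_3 = {
    "english": "eng",
    "german": "deu",
    "french": "fra",
    "spanish": "spa",
}


def to_iso639_3(value):
    """Map a 3-letter code, 2-letter code, or language name to ISO 639-3."""
    key = value.strip().lower()
    if key in ISO_639_3:
        return key
    if key in ISO_639_1_TO_3:
        return ISO_639_1_TO_3[key]
    if key in NAME_TO_3:
        return NAME_TO_3[key]
    # Anything outside the small tables above is rejected, not guessed.
    raise ValueError(f"unknown language: {value!r}")


if __name__ == "__main__":
    for raw in ("eng", "de", "French"):
        print(raw, "->", to_iso639_3(raw))
```

Since the server is documented to accept all three forms, a helper like this would be optional; it might be useful where a client wants to store one canonical form.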
10 changes: 6 additions & 4 deletions website/site/content/docs/api/upload.md
@@ -32,7 +32,7 @@ For example, here is a curl command uploading two files with meta
data. Since `multiple` is `false`, both files are added to one item:

``` bash
-curl -XPOST -F meta='{"multiple":false, "direction": "outgoing", "tags": {"items":["Order"]}}' \
+curl -XPOST -F meta='{"multiple":false, "direction": "outgoing", "language": "eng", "tags": {"items":["Order"]}}' \
-F file=@letter-en.pdf \
-F file=@letter-de.pdf \
http://192.168.1.95:7880/api/v1/open/upload/item/3H7hvJcDJuk-NrAW4zxsdfj-K6TMPyb6BGP-xKptVxUdqWa
@@ -89,9 +89,11 @@ specified via a JSON structure in a part with name `meta`:
files or `*.html|*.pdf` for selecting html and pdf files. This only
applies to archive files, like zip or e-mails (where this is applied
to the attachments).
-- The `language` is used for processing the document(s) contained in
-the request. If not specified the collective's default language is
-used.
+- The `language` specifies the document language for processing (OCR,
+text extraction, analysis). If not specified, the collective's
+default language is used. Use **ISO 639-3** (3-letter codes) such as
+`eng`, `deu`, `fra`, `spa`. ISO 639-1 (2-letter) codes like `en`,
+`de` and language names like `english`, `german` are also accepted.
- The `attachmentsOnly` property only applies to e-mail files (usually
`*.eml`). If this is `true`, then the e-mail body is discarded and
only the attachments are imported. An e-mail without any attachments
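The `meta` part described in this file is a plain JSON string, so it can be built programmatically instead of hand-written in the shell. Below is a minimal sketch, assuming a hypothetical helper named `build_upload_meta`; only the field names `multiple`, `direction`, `language` and `tags.items` come from the curl example in this diff, everything else is illustrative. It leaves `language` out when none is given, so the collective's default would apply.

```python
import json


def build_upload_meta(language=None, multiple=False,
                      direction="outgoing", tags=None):
    """Return the JSON string for the `meta` multipart field.

    Hypothetical helper: field names follow the curl example in the
    upload docs; the function itself is not part of Docspell.
    """
    meta = {"multiple": multiple, "direction": direction}
    if language is not None:
        # Omitted when None so the collective's default language is used.
        meta["language"] = language
    if tags:
        meta["tags"] = {"items": list(tags)}
    return json.dumps(meta)


if __name__ == "__main__":
    print(build_upload_meta(language="deu", tags=["Order"]))
```

The resulting string could then be passed to curl as `-F meta="$META"` in place of the inline JSON literal.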
7 changes: 4 additions & 3 deletions website/site/content/docs/webapp/metadata.md
@@ -131,11 +131,12 @@ page](@/docs/webapp/customfields.md) for more information.

An important setting is the language of your documents. This helps OCR
and text analysis. You can select between various languages. The
-language can also specified with each [upload
-request](@/docs/api/upload.md).
+language can also be specified with each [upload
+request](@/docs/api/upload.md) using ISO 639-3 codes (e.g. `eng`,
+`deu`) or ISO 639-1 (e.g. `en`, `de`).

Go to the *Collective Settings* page and click *Document
-Language*. This will set the lanugage for all your documents.
+Language*. This will set the language for all your documents.

The language has effects in several areas: text extraction, fulltext
search and text analysis. When extracting text from images, tesseract