GET /api/flatpages/?format=api
HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "next": null,
    "previous": null,
    "results": [
        {
            "url": "/tipofday/",
            "title": "Tip of the Day",
            "content": "Remember: This is the staging site!"
        },
        {
            "url": "/help/api/",
            "title": "API Documentation",
            "content": "[Back to Help Menu](https://www.documentcloud.org/help)\r\n\r\n# The DocumentCloud API\r\n\r\nAll APIs besides the authentication endpoints are served from\r\n<https://api.www.documentcloud.org/api>. \r\n<br> If you develop in Python, check out [python-documentcloud](https://pypi.org/project/python-documentcloud/), our Python wrapper for the DocumentCloud API, and its [corresponding documentation](https://documentcloud.readthedocs.io/en/latest/).\r\n\r\n## Overview\r\n\r\nThe API endpoints are generally organized as `/api/<resource>/` representing\r\nthe entirety of the resource, and `/api/<resource>/<id>/` representing a single\r\nresource identified by its ID. Not all REST actions are available on every\r\nendpoint, and some resources may have additional endpoints, but the following\r\nare how HTTP verbs generally map to REST operations:\r\n\r\n`/api/<resource>/`\r\n\r\n| HTTP Verb | REST Operation        | Parameters                                                                   |\r\n| --------- | --------------------- | ---------------------------------------------------------------------------- |\r\n| GET       | List the resources    | May support parameters for filtering                                         |\r\n| POST      | Create a new resource | Must supply all `required` fields, and may supply all non-`read only` fields |\r\n\r\n`/api/<resource>/<id>/`\r\n\r\n| HTTP Verb | REST Operation                | Parameters                                                                                                                                                                                                     |\r\n| --------- | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\r\n| GET       | Display the resource          |       
                                                                                                                                                                                                         |\r\n| PUT       | Update the resource           | Same as for creating - all required fields must be present. For updating resources `PATCH` is usually preferred, as it allows you to only update the fields needed. `PUT` support is included for completeness |\r\n| PATCH     | Partially update the resource | Same as for creating, but all fields are optional                                                                                                                                                              |\r\n| DELETE    | Destroy the resources         |                                                                                                                                                                                                                |\r\n\r\nA select few of the resources support some bulk operations on the `/api/<resource>/` route:\r\n\r\n| HTTP Verb | REST Operation      | Parameters                                                                                                                    |\r\n| --------- | ------------------- | ----------------------------------------------------------------------------------------------------------------------------- |\r\n| POST      | Bulk create         | A list of objects, where each object is what you would `POST` for a single object                                             |\r\n| PUT       | Bulk update         | A list of objects, where each object is what you would `PUT` for a single object &mdash; except it must also include the ID   |\r\n| PATCH     | Bulk partial update | A list of objects, where each object is what you would `PATCH` for a single object &mdash; except it must also include the ID |\r\n| DELETE    | Bulk destroy        | Bulk destroys will have a filtering 
parameter, often required, to specify which resources to delete                           |\r\n\r\n### Responses\r\n\r\nList responses will be of the form\r\n\r\n```\r\n{\r\n    \"next\": <next url if applicable>,\r\n    \"previous\": <previous url if applicable>,\r\n    \"results\": <list of results>\r\n}\r\n```\r\n\r\nwith a 200 status code.  The document search route will also include a `count`\r\nkey, with a total count of all documents returned by the search.\r\n\r\nGetting a single resource, creating and updating will return just the object.\r\nCreate uses a 201 status code and get and update will return 200.\r\n\r\nDelete will have an empty response with a 204 status code.\r\n\r\nBatch updates will return a list of the updated objects with a 200 status code.\r\n\r\nSpecifying invalid parameters will generally return a 400 error code with a\r\nJSON object with a single `\"error\"` key, whose value will be an error message.\r\nSpecifying an ID that does not exist or that you do not have access to view\r\nwill return status 404. Trying to create or update a resource you do not have\r\npermission to modify will return status 403.\r\n\r\n### Pagination\r\n\r\nAll list views accept a `per_page` parameter, which specifies how many\r\nresources to list per page. It is `25` by default and may be set up to `100`\r\nfor authenticated users. For anonymous users it is restricted to `25`. You\r\nmay register for a free account at <https://accounts.muckrock.com/> to use the\r\n`100` limit. You may view subsequent pages by using the `next` URL.\r\n\r\n#### Cursor Based Pagination\r\n\r\nPage offset pagination does not scale well to a large number of pages.  For\r\nimproved performance, DocumentCloud uses a cursor based\r\npagination system.  Instead of a `page` parameter, there is a `cursor`\r\nparameter, which accepts an opaque `cursor` which specifies the last value\r\nseen.  
To use this system, you must use the `next` and `previous` links as\r\nreturned by the API, as random access is not available.  This system also\r\nrestricts arbitrary ordering of the results, except for the document search\r\nroute, which will still allow re-ordering with cursor based pagination.\r\n\r\nIf the cursor based pagination breaks your workflow, you may continue to use\r\nthe old page-offset based pagination system for now.  In the future, this will\r\nbe disabled completely, and you will be forced to use the cursor based\r\npagination.  To use the page-offset based pagination, which also has a top\r\nlevel `count` key with a total count of the objects returned for all list\r\nqueries, add a `version=1.0` query parameter to your API queries.  Be aware\r\nthat this will make your queries less performant, possibly to the point of them\r\nbeing unusable.  This should only be used as a stop-gap solution while you\r\nupdate your workflow to use the new cursor based pagination.  Please reach out\r\nto [info@documentcloud.org](mailto:info@documentcloud.org) if you need\r\nassistance moving to the new version.\r\n\r\n### Sub Resources\r\n\r\nSome resources also support sub resources, which are resources that belong to another resource. The general format is:\r\n\r\n`/api/<resource>/<id>/<subresource>/`\r\n\r\nor\r\n\r\n`/api/<resource>/<id>/<subresource>/<subresource_id>/`\r\n\r\nA sub resource generally works the same as a resource, except scoped to the parent resource.\r\n\r\nTODO: Examples\r\n\r\n### Filters\r\n\r\nFilters on list views which have choices generally allow you to specify\r\nmultiple values, and will filter on all resources that match at least one\r\nchoice. To specify multiple values you may either supply a comma\r\nseparated list of IDs &mdash; `?parameter=1,2` &mdash; or specify the\r\nparameter multiple times &mdash; `?parameter=1&parameter=2`.\r\n\r\n### Rate Limits\r\n\r\nThe DocumentCloud API is rate limited to 10 requests per second.  
It also\r\nallows bursts up to 20 requests.  This means if you exceed the 10 requests\r\nper second limit, it will serve you up to 20 requests more quickly, while\r\nkeeping track of your average rate.  After the 20 requests are served,\r\nadditional requests will be rejected with an HTTP status of 503 until you again\r\nfall under an average of 10 requests per second.  If you use the Python\r\nDocumentCloud library, it will automatically throttle your requests to 10 per\r\nsecond to avoid going over the rate limit.  If you are writing custom code,\r\nplease be mindful of the rate limits.\r\n\r\nThere is also a secondary limit of 500 requests per day for anonymous users.\r\nIf you exceed this limit, you will start receiving errors with an HTTP status\r\nof 429.  In order to avoid this, please register for a free account at\r\n<https://accounts.muckrock.com/>.  Currently, there are no daily limits for\r\nregistered accounts, although this may change in the future.\r\n\r\n## Authentication\r\n\r\nAuthentication happens at the MuckRock accounts server located at\r\n<https://accounts.muckrock.com/>. The API provided there will supply you with\r\na [JWT][1] access token and refresh token in exchange for your username and\r\npassword. The access token should be placed in the `Authorization` header\r\npreceded by `Bearer` - `{'Authorization': 'Bearer <access token>'}`. The\r\naccess token is valid for 5 minutes, after which you will receive a 403\r\nforbidden error if you continue trying to use it. At this point you may use\r\nthe refresh token to obtain a new access token and refresh token. 
The refresh\r\ntoken is valid for one day.\r\n\r\n### POST /api/token/\r\n\r\n| Param    | Type   | Description   |\r\n| -------- | ------ | ------------- |\r\n| username | String | Your username |\r\n| password | String | Your password |\r\n\r\n#### Response\r\n\r\n    {'access': <access token>, 'refresh': <refresh token>}\r\n\r\n### POST /api/refresh/\r\n\r\n| Param   | Type   | Description   |\r\n| ------- | ------ | ------------- |\r\n| refresh | String | Refresh token |\r\n\r\n#### Response\r\n\r\n    {'access': <access token>, 'refresh': <refresh token>}\r\n\r\n## Documents\r\n\r\nThe documents API allows you to upload, browse and edit documents. To add or\r\nremove documents from a project, please see [project\r\ndocuments](#project-documents).\r\n\r\n### Fields\r\n\r\n| Field                | Type         | Options            | Description                                                                                                                                                      |\r\n| -------------------- | ------------ | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |\r\n| ID                   | Integer      | Read Only          | The ID for the document                                                                                                                                          |\r\n| access               | String       | Default: `private` | The [access level](#access-levels) for the document                                                                                                              |\r\n| asset_url            | String       | Read Only          | The base URL to load this document's [static assets](#static-assets) from                                                                                        |\r\n| canonical_url        | URL          | Read Only          | The canonical 
URL to view this document                                                                                                                          |\r\n| created_at           | Date Time    | Read Only          | Time stamp when this document was created                                                                                                                        |\r\n| data                 | JSON         | Not Required       | [Custom metadata](#data)                                                                                                                                         |\r\n| description          | String       | Not Required       | A brief description of the document                                                                                                                              |\r\n| edit_access          | Bool         | Read Only          | Does the current user have edit access to this document                                                                                                          |\r\n| file_url             | URL          | Create Only        | A URL to a publicly accessible document for the [URL Upload Flow](#url-upload-flow)         \r\n| file_hash             | String       | Read Only          | A sha1 hash representation of the raw PDF data as a hexadecimal string.  \r\n| force_ocr            | Bool         | Create Only        | Force OCR even if the PDF contains embedded text - only include if `file_url` is set, otherwise should set `force_ocr` on the call to the processing endpoint.  This operation clears underlying metadata about the document like authorship, creation date, etc. If this is a concern, make sure to keep a copy of the original document.   
|\r\n| language             | String       | Default: `eng`     | The [language](#languages) the document is in                                                                                                                    |\r\n| ocr_engine | string | Not required | Specifies which OCR engine to use on documents. Use with force_ocr set to True. Accepted values: tess4 for tesseract and textract for Amazon textract (which requires AI credits). |\r\n| noindex              | Bool         | Not required       | Ask search engines and DocumentCloud search to not index this document                                                                                           |\r\n| organization         | Integer      | Read Only          | The ID for the [organization](#organizations) this document belongs to                                                                                           |\r\n| original_extension   | String       | Default: `pdf`     | The original file extension of the document you are seeking to upload. It must be a [supported file type](#supported-file-types)                                 |\r\n| page_count           | Integer      | Read Only          | The number of pages in this document                                                                                                                             |\r\n| page_spec            | Integer      | Read Only          | [The dimensions for all pages in the document](#page-spec)                                                                                                       |\r\n| pages                | JSON         | Write Only         | Allows you to set page text via the API.  See [set page text](#set-page-text) for more information.                                                              
|\r\n| presigned_url        | URL          | Read Only          | The pre-signed URL to [directly](#direct-file-upload-flow) `PUT` the PDF file to                                                                                 |\r\n| projects             | List:Integer | Create Only        | The IDs of the [projects](#projects) this document belongs to - this may be set on creation, but may not be updated. See [project documents](#project-documents) |\r\n| publish_at           | Date Time    | Not Required       | A timestamp when to automatically make this document public                                                                                                      |\r\n| published_url        | URL          | Not Required       | The URL where this document is embedded                                                                                                                          |\r\n| related_article      | URL          | Not Required       | The URL for the article about this document                                                                                                                      |\r\n| remaining            | JSON         | Read Only          | The number of pages left for text and image processing - only included if `remaining` is included as a `GET` parameter                                           |\r\n| slug                 | String       | Read Only          | The slug is a URL safe version of the title                                                                                                                      |\r\n| source               | String       | Not Required       | The source who produced the document                                                                                                                             |\r\n| status               | String       | Read Only          | The [status](#statuses) for the document                                                                              
                                           |\r\n| title                | String       | Required           | The document's title                                                                                                                                             |\r\n| updated_at           | Date Time    | Read Only          | Time stamp when the document was last updated                                                                                                                    |\r\n| user                 | Integer      | Read Only          | The ID for the [user](#users) this document belongs to                                                                                                           |\r\n\r\n[Expandable fields](#expandable-fields): user, organization, projects, sections, notes\r\n\r\n### Uploading a Document\r\n\r\nThere are two supported ways to upload documents &mdash; directly uploading the\r\nfile to our storage servers or by providing a URL to a publicly available\r\nPDF or other [supported file type](#supported-file-types). \r\nTo upload another supported file type you will need to include the original_extension field documented above. \r\n\r\n#### Direct File Upload Flow\r\n\r\n1. `POST /api/documents/` <br><br>\r\n   To initiate an upload, you will first create the document. You may specify all\r\n   writable document fields (besides `file_url`). The response will contain all\r\n   the fields for the document, with two being of note for this flow:\r\n   `presigned_url` and `id`. <br><br>\r\n   If you would like to upload files in bulk, you may `POST` a list of JSON\r\n   objects to `/api/documents/` instead of a single object. The response will\r\n   contain a list of document objects.\r\n\r\n2. `PUT <presigned_url>` <br><br>\r\n   Next, you will `PUT` the binary data for the file to the given\r\n   `presigned_url`. The presigned URL is valid for 5 minutes. 
You may obtain a\r\n   new URL by issuing a `GET` request to `/api/documents/<id>/`. <br><br>\r\n   If you are bulk uploading, you will still need to issue a single `PUT` to the\r\n   corresponding `presigned_url` for each file.\r\n\r\n3. `POST /api/documents/<id>/process/` <br><br>\r\n   Finally, you will begin processing of the document. Note that this endpoint\r\n   accepts only one optional parameter &mdash; `force_ocr` which, if set to true,\r\n   will OCR the document even if it contains embedded text. <br><br>\r\n   If you are uploading in bulk you can issue a single `POST` to\r\n   `/api/documents/process/` which will begin processing in bulk. You should pass\r\n   a list of objects containing the document IDs of the documents you would like\r\n   to begin processing. You may optionally specify `force_ocr` for each document.\r\n\r\n#### URL Upload Flow\r\n\r\n1. `POST /api/documents/`\r\n\r\nIf you set `file_url` to a URL pointing to a publicly accessible PDF, our\r\nservers will fetch the PDF and begin processing it automatically.\r\n\r\nYou may also send a list of document objects with `file_url` set to bulk upload\r\nfiles using this flow.\r\n\r\n### Endpoints\r\n\r\n- `GET /api/documents/` &mdash; List documents\r\n- `POST /api/documents/` &mdash; Create document\r\n- `PUT /api/documents/` &mdash; Bulk update documents\r\n- `PATCH /api/documents/` &mdash; Bulk partial update documents\r\n- `DELETE /api/documents/` &mdash; Bulk delete documents\r\n  - Bulk delete will not allow you to indiscriminately delete all of your\r\n    documents. 
You must specify which document IDs you want to delete using\r\n    the `id__in` filter.\r\n- `POST /api/documents/process/` &mdash; Bulk process documents\r\n  - This will allow you to process multiple documents with a single API call.\r\n    Expect parameters: `[{\"id\": 1, \"force_ocr\": true}, {\"id\": 2}]`\r\n    It expects a list of objects, where each object contains the ID of the\r\n    document to process, and an optional boolean, `force_ocr`, which will OCR\r\n    the document even if it contains embedded text if set to `true`\r\n- `GET /api/documents/search/` &mdash; [Search][6] documents\r\n- `GET /api/documents/<id>/` &mdash; Get document\r\n- `PUT /api/documents/<id>/` &mdash; Update document\r\n- `PATCH /api/documents/<id>/` &mdash; Partial update document\r\n- `DELETE /api/documents/<id>/` &mdash; Delete document\r\n- `POST /api/documents/<id>/process/` &mdash; Process document\r\n  - This will process a document. It is used after uploading the file in the\r\n    [direct file upload flow](#direct-file-upload-flow) or to reprocess a\r\n    document, which you may want to do in the case of an error. It accepts\r\n    one optional boolean parameter, `force_ocr`, which will OCR the document\r\n    even if it contains embedded text if it is set to `true`. Note that it\r\n    is an error to try to process a document that is already processing.\r\n- `DELETE /api/documents/<id>/process/` &mdash; Cancel processing document\r\n  - This will cancel the processing of a document. Note that it is an error\r\n    to try to cancel the processing if the document is not processing.\r\n- `GET /api/documents/<id>/search/` &mdash; [Search][6] within a document\r\n\r\n### Filters\r\n\r\n- `ordering` &mdash; Sort the results &mdash; valid options include: `created_at`,\r\n  `page_count`, `title`, and `source`. 
You may prefix any valid option with\r\n  `-` to sort it in reverse order.\r\n- `user` &mdash; Filter by the ID of the owner of the document.\r\n- `organization` &mdash; Filter by the ID of the organization of the document.\r\n- `project` &mdash; Filter by the ID of a project the document is in.\r\n- `access` &mdash; Filter by the [access level](#access-levels).\r\n- `status` &mdash; Filter by [status](#statuses).\r\n- `created_at__lt`, `created_at__gt` &mdash; Filter by documents created\r\n  either before or after a given date. You may specify both to find documents\r\n  created between two dates. This may be a date or date time, in the following\r\n  formats: `YYYY-MM-DD` or `YYYY-MM-DD+HH:MM:SS`.\r\n- `page_count`, `page_count__lt`, `page_count__gt` &mdash; Filter by documents\r\n  with a specified number of pages, or more or fewer pages than a given amount.\r\n- `id__in` &mdash; Filter by specific document IDs, passed in as comma\r\n  separated values.\r\n\r\n### Notes\r\n\r\nNotes can be left on documents for yourself, or to be shared with other users. 
They may contain HTML for formatting.\r\n\r\n#### Fields\r\n\r\n| Field        | Type      | Options            | Description                                                        |\r\n| ------------ | --------- | ------------------ | ------------------------------------------------------------------ |\r\n| ID           | Integer   | Read Only          | The ID for the note                                                |\r\n| access       | String    | Default: `private` | The [access level](#access-levels) for the note                    |\r\n| content      | String    | Not Required       | Content for the note, which may include HTML                       |\r\n| created_at   | Date Time | Read Only          | Time stamp when this note was created                              |\r\n| edit_access  | Bool      | Read Only          | Does the current user have edit access to this note                |\r\n| organization | Integer   | Read Only          | The ID for the [organization](#organizations) this note belongs to |\r\n| page_number  | Integer   | Required           | The page of the document this note appears on                      |\r\n| title        | String    | Required           | Title for the note                                                 |\r\n| updated_at   | Date Time | Read Only          | Time stamp when this note was last updated                         |\r\n| user         | ID        | Read Only          | The ID for the [user](#users) this note belongs to                 |\r\n| x1           | Float     | Not Required       | Left most coordinate of the note, as a percentage of page size     |\r\n| x2           | Float     | Not Required       | Right most coordinate of the note, as a percentage of page size    |\r\n| y1           | Float     | Not Required       | Top most coordinate of the note, as a percentage of page size      |\r\n| y2           | Float     | Not Required       | Bottom most coordinate of the note, as a percentage of 
page size   |\r\n\r\n[Expandable fields](#expandable-fields): user, organization\r\n\r\nThe coordinates must either all be present or absent &mdash; absent represents\r\na page level note which is displayed between pages.\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/documents/<document_id>/notes/` - List notes\r\n- `POST /api/documents/<document_id>/notes/` - Create note\r\n- `GET /api/documents/<document_id>/notes/<id>/` - Get note\r\n- `PUT /api/documents/<document_id>/notes/<id>/` - Update note\r\n- `PATCH /api/documents/<document_id>/notes/<id>/` - Partial update note\r\n- `DELETE /api/documents/<document_id>/notes/<id>/` - Delete note\r\n\r\n### Sections\r\n\r\nSections can mark certain pages of your document &mdash; the viewer will show\r\nan outline of the sections allowing for quick access to those pages.\r\n\r\n#### Fields\r\n\r\n| Field       | Type    | Options   | Description                                      |\r\n| ----------- | ------- | --------- | ------------------------------------------------ |\r\n| ID          | Integer | Read Only | The ID for the section                           |\r\n| page_number | Integer | Required  | The page of the document this section appears on |\r\n| title       | String  | Required  | Title for the section                            |\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/documents/<document_id>/sections/` - List sections\r\n- `POST /api/documents/<document_id>/sections/` - Create section\r\n- `GET /api/documents/<document_id>/sections/<id>/` - Get section\r\n- `PUT /api/documents/<document_id>/sections/<id>/` - Update section\r\n- `PATCH /api/documents/<document_id>/sections/<id>/` - Partial update section\r\n- `DELETE /api/documents/<document_id>/sections/<id>/` - Delete section\r\n\r\n### Errors\r\n\r\nSometimes errors happen &mdash; if you find one of your documents in an error\r\nstate, you may check the errors here to see a log of the latest, as well as\r\nall previous errors. 
If the message is cryptic, please contact us &mdash; we\r\nare happy to help figure out what went wrong.\r\n\r\n#### Fields\r\n\r\n| Field      | Type      | Options   | Description                            |\r\n| ---------- | --------- | --------- | -------------------------------------- |\r\n| ID         | Integer   | Read Only | The ID for the error                   |\r\n| created_at | Date Time | Read Only | Time stamp when this error was created |\r\n| message    | String    | Required  | The error message                      |\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/documents/<document_id>/errors/` - List errors\r\n\r\n### Data\r\n\r\nDocuments may contain user supplied metadata. You may assign multiple values\r\nto arbitrary keys. This is represented as a JSON object, where each key has a\r\nlist of strings as a value. The special key `_tag` is used by the front end to\r\nrepresent tags. These values are useful for searching and organizing documents.\r\nYou may directly set or update the data from the document endpoints, but these\r\nadditional endpoints are supplied to add or remove data on a per key basis.\r\n\r\n#### Fields\r\n\r\n| Field  | Type        | Options      | Description                      |\r\n| ------ | ----------- | ------------ | -------------------------------- |\r\n| values | List:String | Required     | The values associated with a key |\r\n| remove | List:String | Not Required | Values to be removed             |\r\n\r\n`remove` is only used for `PATCH`ing. `values` is not required when `PATCH`ing.\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/documents/<document_id>/data/` - List values for all keys\r\n  - The response for this is a JSON object with a property for each key,\r\n    which will always be a list of strings, corresponding to the values\r\n    associated with that key. 
Example:\r\n    ```\r\n    {\r\n      \"_tag\": [\"important\"],\r\n      \"location\": [\"boston\", \"new york\"]\r\n    }\r\n    ```\r\n- `GET /api/documents/<document_id>/data/<key>/` - Get values for the given key\r\n  - The response for this is a JSON list of strings. Example: `[\"one\", \"two\"]`\r\n- `PUT /api/documents/<document_id>/data/<key>/` - Set values for the given key\r\n  - This will override all values currently under key\r\n- `PATCH /api/documents/<document_id>/data/<key>/` - Add and/or remove values for the given key\r\n- `DELETE /api/documents/<document_id>/data/<key>/` - Delete all values for a given key\r\n\r\n### Redactions\r\n\r\nRedactions allow you to obscure parts of the document which are confidential\r\nbefore publishing them. The pages which are redacted will be fully flattened\r\nand reprocessed, so that the original content is not present in lower levels of\r\nthe image or as text data. Redactions are not reversible, and may only be\r\ncreated, not retrieved or edited.  Redacting a document strips available\r\nmetadata from a document about authorship, creation date, etc. If this is a concern\r\nto you, you may want to hold onto an original copy before redaction, as it is irreversible. 
\r\n\r\n#### Fields\r\n\r\n| Field       | Type    | Options  | Description                                                           |\r\n| ----------- | ------- | -------- | --------------------------------------------------------------------- |\r\n| page_number | Integer | Required | The page of the document this redaction appears on                    |\r\n| x1          | Float   | Required | Left most coordinate of the redaction, as a percentage of page size   |\r\n| x2          | Float   | Required | Right most coordinate of the redaction, as a percentage of page size  |\r\n| y1          | Float   | Required | Top most coordinate of the redaction, as a percentage of page size    |\r\n| y2          | Float    | Required | Bottom most coordinate of the redaction, as a percentage of page size |\r\n\r\n#### Endpoints\r\n\r\n- `POST /api/documents/<document_id>/redactions/` - Create redaction\r\n\r\n### Modifications\r\n\r\nModifications allow you to perform page modification operations on a document, including moving pages, rotating pages, copying pages, deleting pages, and inserting pages from other documents. Applying modifications effectively shuffles, removes, and copies pages, preserving and duplicating page information as needed (this includes page text and any annotations and sections attached to the page). No page text needs to be reprocessed or re-OCR'd. After successfully applying modifications, the document cannot be reverted.\r\n\r\n#### Modification Specification\r\n\r\nTo support a flexible host of potential modifications, you must pass in the modifications as a JSON array that lists the operations to take place. The modification specification defines the pages that should compose the document post-modification and any operations such as rotation to apply to the pages. 
Each element of the modification array can have the following fields (instructive examples will be listed after the official specification):\r\n\r\n| Field         | Description                                                                                                                                                                                                                                                                                                                                                                                           |\r\n| ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\r\n| page          | A comma-separated string of page ranges, which can include individual pages or hyphenated inclusive runs of pages. Page numbers are 0-based (the first page of the document is page `0`, and `0-9` refers to the first through the 10th page of the document). Valid examples of page ranges include `\"7\"`, `\"0-499\"`, `\"0-5,8,11-13\"`, and `0,0,0` (page numbers can be repeated to duplicate them). |\r\n| id            | If unspecified, pull pages from the current document. Otherwise, pull pages from the document with the specified id.                                                                                                                                                                                                                                                                                  |\r\n| modifications | An array of JSON objects defining modifications to take place. 
The only currently defined page modification operation is `rotate`, which rotates pages clockwise, counterclockwise, or halfway. Rotation is specified as `{\"type\": \"rotate\", \"angle\": <angle>}`, where `<angle>` is one of `cc`, `ccw`, or `hw` (corresponding to clockwise, counterclockwise, and halfway, respectively).            |\r\n\r\n#### Example Specifications\r\n\r\nThe following examples assume you are modifying the Mueller Report, a 448-page document.\r\n\r\n| Example                                                                                                                                                                                                                                                                                                                        | Description                                                                            |\r\n| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- |\r\n| <pre>[{<br>&nbsp;&nbsp;\"page\": \"0-447\"<br>}]</pre>                                                                                                                                                                                                                                                                             | Leave the Mueller Report unchanged                                                     |\r\n| <pre>[{<br>&nbsp;&nbsp;\"page\": \"0-23,423-447\"<br>}]</pre>                                                                                                                                                                                                                       
                                               | Remove the middle 400 pages of the Mueller Report                                      |\r\n| <pre>[{<br>&nbsp;&nbsp;\"page\": \"0-447,0-49\"<br>}]</pre>                                                                                                                                                                                                                                                                        | Duplicate the first 50 pages of the Mueller Report at the end of the document          |\r\n| <pre>[{<br>&nbsp;&nbsp;\"page\": \"0-447\",<br>&nbsp;&nbsp;\"modifications\": [{<br>&nbsp;&nbsp;&nbsp;&nbsp;\"type\": \"rotate\",<br>&nbsp;&nbsp;&nbsp;&nbsp;\"angle\": \"ccw\"<br>&nbsp;&nbsp;}]<br>}]</pre>                                                                                                                                | Rotate all the pages of the Mueller Report counterclockwise                            |\r\n| <pre>[<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;\"page\": \"0-49\",<br>&nbsp;&nbsp;&nbsp;&nbsp;\"modifications\": [{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"type\": \"rotate\",<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"angle\": \"hw\"<br>&nbsp;&nbsp;&nbsp;&nbsp;}]<br>&nbsp;&nbsp;},<br>&nbsp;&nbsp;{ \"page\": \"50-447\" }<br>]</pre> | Rotate just the first 50 pages of the Mueller Report 180 degrees                       |\r\n| <pre>[<br>&nbsp;&nbsp;{ \"page\": \"0-447\" },<br>&nbsp;&nbsp;{<br>&nbsp;&nbsp;&nbsp;&nbsp;\"page\": \"0-49\",<br>&nbsp;&nbsp;&nbsp;&nbsp;\"id\": \"2000000\"<br>&nbsp;&nbsp;}<br>]</pre>                                                                                                                                                  | Import 50 pages of another document with id `2000000` at the end of the Mueller Report |\r\n\r\n#### Endpoints\r\n\r\n- `POST /api/documents/<document_id>/modifications/` - Create modifications\r\n\r\n### Entities\r\n\r\nEntities can be 
extracted using Google Cloud's Natural Language API. Entity\r\nextraction must be initialized manually per document, and entities are read-only.\r\n\r\n#### Fields\r\n\r\nTop level fields\r\n\r\n| Field      | Type   | Description                                                                         |\r\n| ---------- | ------ | ----------------------------------------------------------------------------------- |\r\n| entity     | Object | Object containing information about this particular entity                          |\r\n| relevance  | Float  | An estimate as to how relevant this entity is to this document                      |\r\n| occurences | List   | A list of occurrence objects specifying where in the document this entity was found |\r\n\r\nFields for the entity object\r\n\r\n| Field         | Type   | Description                                                    |\r\n| ------------- | ------ | -------------------------------------------------------------- |\r\n| name          | String | The name of the entity                                         |\r\n| kind          | String | The [kind](#kind) of entity                                    |\r\n| description   | String | A short description of the entity                              |\r\n| mid           | String | The Knowledge Graph ID                                         |\r\n| wikipedia_url | URL    | The Wikipedia URL for this entity                              |\r\n| metadata      | Object | Additional metadata for the entity, based on its [kind](#kind) |\r\n\r\nFields for the occurrence objects\r\n\r\n| Field       | Type    | Description                                                                   |\r\n| ----------- | ------- | ----------------------------------------------------------------------------- |\r\n| page        | Integer | The page of the document this occurs on                                       |\r\n| offset      | Integer | The character offset into the document this occurs on   
                      |\r\n| content     | String  | The content of this occurrence (the occurrence may not match the entity name) |\r\n| page_offset | Integer | The character offset into the page this occurs on                             |\r\n| kind        | String  | `proper` for proper nouns, `common` for common nouns, or `unknown`            |\r\n\r\n##### Kind\r\n\r\nEntity kinds include:\r\n\r\n- `unknown`\r\n- `person`\r\n- `location`\r\n- `organization`\r\n- `event`\r\n- `work_of_art`\r\n- `consumer_good`\r\n- `other`\r\n- `phone_number` &mdash; metadata may include number, national_prefix, area_code and extension\r\n- `address` &mdash; metadata may include street_number, locality, street_name, postal_code, country, broad_region, narrow_region, and sublocality\r\n- `date` &mdash; metadata may include year, month and day\r\n- `price` &mdash; metadata may include value and currency\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/documents/<document_id>/entities/` - List entities for this document\r\n- `POST /api/documents/<document_id>/entities/` - Begin extracting entities for this document (POST body is empty)\r\n- `DELETE /api/documents/<document_id>/entities/` - Delete all entities for this document\r\n\r\n#### Filters\r\n\r\n- `kind` &mdash; Filter for entities with the given kind (may give multiple, comma-separated)\r\n- `occurences` &mdash; Filter for entities with the given occurrence kind (`proper` or `common`)\r\n- `relevance__gt` &mdash; Filter for entities with a relevance greater than the given value\r\n- `mid` &mdash; Boolean filter for entities which do or do not have a MID\r\n- `wikipedia_url` &mdash; Boolean filter for entities which do or do not have a Wikipedia URL\r\n\r\n## Projects\r\n\r\nProjects are collections of documents. They can be used for organizing groups\r\nof documents, or for collaborating with other users by sharing access to\r\nprivate documents.\r\n\r\n### Sharing Documents\r\n\r\nProjects may be used for sharing documents. 
When you add a collaborator to a\r\nproject, you may select one of three access levels:\r\n\r\n- `view` - This gives the collaborator permission to view your documents that\r\n  you have added to the project\r\n- `edit` - This gives the collaborator permission to view or edit your\r\n  documents you have added to the project\r\n- `admin` - This gives the collaborator both view and edit permissions, as well\r\n  as the ability to add their own documents and invite other collaborators to\r\n  the project\r\n\r\nAdditionally, you may add public documents to a project, for organizational\r\npurposes. No permissions are granted to you or your collaborators\r\nwhen you add documents you do not own to your project &mdash; this is tracked\r\nby the `edit_access` field on the [project membership](#project-documents).\r\nWhen you add documents you or your organization do own, they will be added with\r\n`edit_access` enabled by default. You may override this using the API if you\r\nwould like to add your documents to a project, but not extend permissions to\r\nany of your collaborators. Also note that documents shared with you for\r\nediting via another project may not be added to your own project with\r\n`edit_access` enabled. 
This means the original owner of a document may revoke\r\nany access they have granted to others via projects at any time.\r\n\r\n### Fields\r\n\r\n| Field             | Type      | Options          | Description                                                                       |\r\n| ----------------- | --------- | ---------------- | --------------------------------------------------------------------------------- |\r\n| ID                | Integer   | Read Only        | The ID for the project                                                            |\r\n| created_at        | Date Time | Read Only        | Time stamp when this project was created                                          |\r\n| description       | String    | Not Required     | A brief description of the project                                                |\r\n| edit_access       | Bool      | Read Only        | Does the current user have edit access to this project                            |\r\n| add_remove_access | Bool      | Read Only        | Does the current user have permission to add and remove documents to this project |\r\n| private           | Bool      | Default: `false` | Private projects may only be viewed by their collaborators                        |\r\n| slug              | String    | Read Only        | The slug is a URL safe version of the title                                       |\r\n| title             | String    | Required         | Title for the project                                                             |\r\n| updated_at        | Date Time | Read Only        | Time stamp when this project was last updated                                     |\r\n| user              | ID        | Read Only        | The ID for the [user](#users) who created this project                            |\r\n\r\n### Endpoints\r\n\r\n- `GET /api/projects/` - List projects\r\n- `POST /api/projects/` - Create project\r\n- `GET /api/projects/<id>/` - Get project\r\n- `PUT 
/api/projects/<id>/` - Update project\r\n- `PATCH /api/projects/<id>/` - Partial update project\r\n- `DELETE /api/projects/<id>/` - Delete project\r\n\r\n### Filters\r\n\r\n- `user` &mdash; Filter by projects where this user is a collaborator\r\n- `document` &mdash; Filter by projects which contain the given document\r\n- `private` &mdash; Filter by private or public projects. Specify either\r\n  `true` or `false`.\r\n- `slug` &mdash; Filter by projects with the given slug.\r\n- `title` &mdash; Filter by projects with the given title.\r\n\r\n### Project Documents\r\n\r\nThese endpoints allow you to browse, add and remove documents from a project\r\n\r\n#### Fields\r\n\r\n| Field       | Type    | Options                            | Description                                                                     |\r\n| ----------- | ------- | ---------------------------------- | ------------------------------------------------------------------------------- |\r\n| document    | Integer | Required                           | The ID for the [document](#document) in the project                             |\r\n| edit_access | Bool    | Default: `true` if you have access | If collaborators of this project should be granted edit access to this document |\r\n\r\n[Expandable fields](#expandable-fields): document\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/projects/<project_id>/documents/` - List documents in the project\r\n- `POST /api/projects/<project_id>/documents/` - Add a document to the project\r\n- `PUT /api/projects/<project_id>/documents/` - Bulk update documents in the project\r\n  - This will set the documents in the project to exactly match the list you\r\n    pass in. 
This means any documents currently in the project not in the\r\n    list will be removed, and any in the list not currently in the project\r\n    will be added.\r\n- `PATCH /api/projects/<project_id>/documents/` - Bulk partial update documents\r\n  in the project\r\n  - This endpoint will not create or delete any documents in the project. It\r\n    will simply update the metadata for each document passed in. It expects\r\n    every document in the list to already be included in the project.\r\n- `DELETE /api/projects/<project_id>/documents/` - Bulk remove documents from\r\n  the project\r\n  - You should specify which document IDs you want to delete using the\r\n    `document_id__in` filter. This endpoint _will_ allow you to remove all\r\n    documents in the project if you call it with no filter specified.\r\n- `GET /api/projects/<project_id>/documents/<document_id>/` - Get a document in the project\r\n- `PUT /api/projects/<project_id>/documents/<document_id>/` - Update document in the project\r\n- `PATCH /api/projects/<project_id>/documents/<document_id>/` - Partial update document in the project\r\n- `DELETE /api/projects/<project_id>/documents/<document_id>/` - Remove document from the project\r\n\r\n#### Filters\r\n\r\n- `document_id__in` &mdash; Filter by specific document IDs, passed in as comma\r\n  separated values.\r\n\r\n### Collaborators\r\n\r\nOther users who you would like to share this project with. 
See [Sharing\r\nDocuments](#sharing-documents).\r\n\r\n#### Fields\r\n\r\n| Field  | Type    | Options         | Description                                                        |\r\n| ------ | ------- | --------------- | ------------------------------------------------------------------ |\r\n| access | String  | Default: `view` | The [access level](#sharing-documents) for this collaborator       |\r\n| email  | Email   | Create Only     | Email address of user to add as a collaborator to this project     |\r\n| user   | Integer | Read Only       | The ID for the [user](#users) who is collaborating on this project |\r\n\r\n[Expandable fields](#expandable-fields): user\r\n\r\n#### Endpoints\r\n\r\n- `GET /api/projects/<project_id>/users/` - List collaborators on the project\r\n- `POST /api/projects/<project_id>/users/` - Add a collaborator to the project\r\n  &mdash; you must know the email address of a user with a DocumentCloud\r\n  account in order to add them as a collaborator on your project\r\n- `GET /api/projects/<project_id>/users/<user_id>/` - Get a collaborator in the project\r\n- `PUT /api/projects/<project_id>/users/<user_id>/` - Update collaborator in the project\r\n- `PATCH /api/projects/<project_id>/users/<user_id>/` - Partial update collaborator in the project\r\n- `DELETE /api/projects/<project_id>/users/<user_id>/` - Remove collaborator from the project\r\n\r\n## Organizations\r\n\r\nOrganizations represent a group of users. They may share a paid plan and\r\nresources with each other. Organizations can be managed and edited from the\r\n[MuckRock accounts site][3]. 
You may only view organizations through the\r\nDocumentCloud API.\r\n\r\n### Fields\r\n\r\n| Field      | Type    | Options   | Description                                                                                             |\r\n| ---------- | ------- | --------- | ------------------------------------------------------------------------------------------------------- |\r\n| ID         | Integer | Read Only | The ID for the organization                                                                             |\r\n| avatar_url | URL     | Read Only | A URL pointing to an avatar for the organization &mdash; normally a logo for the company                |\r\n| individual | Bool    | Read Only | Is this organization for the sole use of an individual                                                  |\r\n| name       | String  | Read Only | The name of the organization                                                                            |\r\n| slug       | String  | Read Only | The slug is a URL safe version of the name                                                              |\r\n| uuid       | UUID    | Read Only | UUID which links this organization to the corresponding organization on the [MuckRock Accounts Site][3] |\r\n\r\n### Endpoints\r\n\r\n- `GET /api/organizations/` - List organizations\r\n- `GET /api/organizations/<id>/` - Get an organization\r\n\r\n## Users\r\n\r\nUsers can be managed and edited from the [MuckRock accounts site][3]. 
You may\r\nview users and change your own [active organization](#active-organization) from\r\nthe DocumentCloud API.\r\n\r\n### Fields\r\n\r\n| Field         | Type         | Options   | Description                                                                             |\r\n| ------------- | ------------ | --------- | --------------------------------------------------------------------------------------- |\r\n| ID            | Integer      | Read Only | The ID for the user                                                                     |\r\n| avatar_url    | URL          | Read Only | A URL pointing to an avatar for the user                                                |\r\n| name          | String       | Read Only | The user's full name                                                                    |\r\n| organization  | Integer      | Required  | The user's [active organization](#active-organization)                                  |\r\n| organizations | List:Integer | Read Only | A list of the IDs of the organizations this user belongs to                             |\r\n| username      | String       | Read Only | The user's username                                                                     |\r\n| uuid          | UUID         | Read Only | UUID which links this user to the corresponding user on the [MuckRock Accounts Site][3] |\r\n\r\n[Expandable fields](#expandable-fields): organization\r\n\r\n### Endpoints\r\n\r\n- `GET /api/users/` - List users\r\n- `GET /api/users/<id>/` - Get a user\r\n- `PUT /api/users/<id>/` - Update a user\r\n- `PATCH /api/users/<id>/` - Partial update a user\r\n\r\n## Add-Ons\r\n\r\nAdd-Ons allow you to easily add custom features to DocumentCloud.  [Learn more\r\nabout Add-Ons][7].  Add-Ons are added by installing the [GitHub App][8] in the\r\nrepository you would like to use as an add-on.  
The API allows you to view,\r\nedit and run your add-ons.\r\n\r\n### Fields\r\n\r\n| Field         | Type         | Options          | Description                                                                         |\r\n| ------------- | ------------ | ---------------- | ----------------------------------------------------------------------------------- |\r\n| ID            | Integer      | Read Only        | The ID for the add-on                                                               |\r\n| access        | String       | Read Only        | The [access level](#access-levels) for the add-on (will be settable in the future)  |\r\n| active        | Bool         | Default: `false` | Whether this add-on is active for you                                               |\r\n| created_at    | Date Time    | Read Only        | Time stamp when this add-on was created                                             |\r\n| name          | String       | Read Only        | The name of the add-on (set in the configuration)                                   |\r\n| organization  | Integer      | Not Required     | The ID for the [organization](#organizations) this add-on belongs to                |\r\n| parameters    | JSON         | Read Only        | The contents of the config.yaml file from the repository, converted to JSON         |\r\n| repository    | String       | Read Only        | The full name of the GitHub repository, including the account name                  |\r\n| updated_at    | Date Time    | Read Only        | Time stamp when the add-on was last updated                                         |\r\n| user          | Integer      | Read Only        | The ID for the [user](#users) this add-on belongs to                                |\r\n\r\nYour active add-ons are shown to you in the web interface.\r\n\r\n### Endpoints\r\n\r\n- `GET /api/addons/` - List add-ons\r\n- `GET /api/addons/<id>/` - Get an add-on\r\n- `PUT /api/addons/<id>/` - Update an 
add-on\r\n- `PATCH /api/addons/<id>/` - Partial update an add-on\r\n\r\n### Filters\r\n\r\n- `active` &mdash; Filter by only your active or inactive add-ons\r\n- `query` &mdash; Searches for add-ons which contain the query in their name or description\r\n\r\n### Add-On Runs\r\n\r\nAn add-on run represents an invocation of an add-on.  You create one to run the\r\nadd-on.  The add-on itself can then update the add-on run as a means of\r\nsupplying feedback to the caller.\r\n\r\n#### Fields\r\n\r\n| Field         | Type         | Options          | Description                                                                                                                                                                            |\r\n| ------------- | ------------ | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\r\n| UUID          | UUID         | Read Only        | The ID for the add-on run                                                                                                                                                              |\r\n| addon         | Integer      | Required         | The ID of the add-on that is being run                                                                                                                                                 |\r\n| created_at    | Date Time    | Read Only        | Time stamp when this add-on run was created                                                                                                                                            |\r\n| dismissed     | Bool         | Default: `false` | Add-on runs are shown to the user until they are dismissed                                                                                                                             |\r\n| file_name     | String       | Write Only       | The 
add-on must set this to the name of the file supplied to `presigned_url` after uploading the file to make it accessible to the user                                                |\r\n| file_url      | URL          | Read Only        | The URL of a file uploaded via `presigned_url`                                                                                                                                         |\r\n| message       | String       | Not Required     | Add-ons may set informational messages to the user while running                                                                                                                       |\r\n| parameters    | JSON         | Write Only       | The add-on specific data                                                                                                                                                               |\r\n| presigned_url | URL          | Read Only        | Only included if you set the `upload_file` query parameter to the name of the file to upload.  
This is a URL the add-on can directly `PUT` a file to in order to return it to the user |\r\n| progress      | Integer      | Not Required     | Long-running add-ons may set this as a percentage of their progress                                                                                                                    |\r\n| status        | String       | Read Only        | The status of the run - `queued`, `in_progress`, `success`, or `failure`                                                                                                               |\r\n| updated_at    | Date Time    | Read Only        | Time stamp when the add-on run was last updated                                                                                                                                        |\r\n| user          | Integer      | Read Only        | The ID for the [user](#users) who ran the add-on                                                                                                                                       |\r\n\r\n#### Endpoints\r\n\r\n- `POST /api/addon_runs/` - Create a new add-on run - this will start the run using GitHub Actions\r\n- `GET /api/addon_runs/` - List add-on runs\r\n- `GET /api/addon_runs/<uuid>/` - Get an add-on run\r\n- `PUT /api/addon_runs/<uuid>/` - Update an add-on run\r\n- `PATCH /api/addon_runs/<uuid>/` - Partial update an add-on run\r\n\r\n#### Filters\r\n\r\n- `dismissed` &mdash; Filter by dismissed or not dismissed add-on runs\r\n\r\n\r\n## oEmbed\r\n\r\nGenerate an embed code for a document, page, or annotation using our [oEmbed][4] service.\r\n\r\n### Fields\r\n\r\n| Field     | Type    | Options  | Description                                       |\r\n| --------- | ------- | -------- | ------------------------------------------------- |\r\n| url       | URL     | Required | The URL for the document, page or annotation to get an embed code for |\r\n| maxwidth  | Integer |          | The maximum width of the 
embedded resource        |\r\n| maxheight | Integer |          | The maximum height of the embedded resource       |\r\n\r\n### Endpoints\r\n\r\n- `GET /api/oembed/` - Get an embed code for a given URL\r\n\r\n### Examples\r\n\r\nNote: The hash symbol (`#`) in the URL must be percent-encoded as `%23`.\r\n\r\nGenerate an embed code for a page:\r\n\r\n- `GET /api/oembed/?url=https://www.documentcloud.org/documents/23745991-gpo-j6-video-exh-vc9mp4%23document/p1`\r\n\r\nGenerate an embed code for an annotation:\r\n\r\n- `GET /api/oembed/?url=https://www.documentcloud.org/documents/23745991-gpo-j6-video-exh-vc9mp4%23document/p1/a2242636`\r\n\r\n## Appendix\r\n\r\n### Access Levels\r\n\r\nThe access level allows you to control who has access to your document by\r\ndefault. You may also explicitly share a document with additional users by\r\ncollaborating with them on a [project](#projects).\r\n\r\n- `public` &ndash; Anyone on the internet can search for and view the document\r\n- `private` &ndash; Only people with explicit permission (via collaboration) have access\r\n- `organization` &ndash; Only the people in your organization have access\r\n\r\nFor notes, the `organization` access level will extend access to all users with\r\nedit access to the document &mdash; this includes [project](#projects)\r\ncollaborators.\r\n\r\n### Statuses\r\n\r\nThe status indicates the current processing state of your document.\r\n\r\n- `success` &ndash; The document has been successfully processed\r\n- `readable` &ndash; The document is currently processing, but is readable during the operation\r\n- `pending` &ndash; The document is processing and not currently readable\r\n- `error` &ndash; There was an [error](#errors) during processing\r\n- `nofile` &ndash; The document was created, but no file was uploaded yet\r\n\r\n### Supported File Types\r\n\r\n| Format                                     | Extension                                       | Type                     | Notes        
                                                               |\r\n| ------------------------------------------ | ----------------------------------------------- | ------------------------ | --------------------------------------------------------------------------- |\r\n| AbiWord                                    | ABW, ZABW                                       | Document                 |                                                                             |\r\n| Adobe PageMaker                            | PMD, PM3, PM4, PM5, PM6, P65                    | Document, DTP            |                                                                             |\r\n| AppleWorks word processing                 | CWK                                             | Document                 | Formerly called ClarisWorks                                                 |\r\n| Adobe FreeHand                             | AGD, FHD                                        | Graphics / Vector        |                                                                             |\r\n| Apple Keynote                              | KTH, KEY                                        | Presentation             |                                                                             |\r\n| Apple Numbers                              | Numbers                                         | Spreadsheet              |                                                                             |\r\n| Apple Pages                                | Pages                                           | Document                 |                                                                             |\r\n| BMP file format                            | BMP                                             | Graphics / Raster        |                                                                             |\r\n| Comma-separated values                     | CSV, TXT                                
        | Text                     |                                                                             |\r\n| CorelDRAW 6-X7                             | CDR, CMX                                        | Graphics / Vector        |                                                                             |\r\n| Computer Graphics Metafile                 | CGM                                             | Graphics                 | Binary-encoded only; not those using clear-text or character-based encoding |\r\n| Data Interchange Format                    | DIF                                             | Spreadsheet              |                                                                             |\r\n| DBase, Clipper, VP-Info, FoxPro            | DBF                                             | Database                 |                                                                             |\r\n| DocBook                                    | XML                                             | XML                      |                                                                             |\r\n| Encapsulated PostScript                    | EPS                                             | Graphics                 |                                                                             |\r\n| Enhanced Metafile                          | EMF                                             | Graphics / Vector / Text |                                                                             |\r\n| FictionBook                                | FB2                                             | eBook                    |                                                                             |\r\n| Gnumeric                                   | GNM, GNUMERIC                                   | Spreadsheet              |                                                                             |\r\n| Graphics Interchange Format        
        | GIF                                             | Graphics / Raster        |                                                                             |\r\n| Hangul WP 97                               | HWP                                             | Document                 | Newer \"5.x\" documents are not supported                                     |\r\n| HPGL plotting file                         | PLT                                             | Graphics                 |                                                                             |\r\n| HTML                                       | HTML, HTM                                       | Document, text           |                                                                             |\r\n| Ichitaro 8/9/10/11                         | JTD, JTT                                        | Document                 |                                                                             |\r\n| JPEG                                       | JPG, JPEG                                       | Graphics                 |                                                                             |\r\n| Lotus 1-2-3                                | WK1, WKS, 123, wk3, wk4                         | Spreadsheet              |                                                                             |\r\n| Macintosh Picture File                     | PCT                                             | Graphics                 |                                                                             |\r\n| MathML                                     | MML                                             | Math                     |                                                                             |\r\n| Microsoft Excel 2003 XML                   | XML                                             | Spreadsheet              |                                                                   
          |\r\n| Microsoft Excel 4/5/95                     | XLS, XLW, XLT                                   | Spreadsheet              |                                                                             |\r\n| Microsoft Excel 97–2003                    | XLS, XLW, XLT                                   | Spreadsheet              |                                                                             |\r\n| Microsoft Excel 2007-2016                  | XLSX                                            | Spreadsheet              |                                                                             |\r\n| Microsoft Office 2007-2016 Office Open XML | DOCX, XLSX, PPTX                                | Multiple formats         |                                                                             |\r\n| Microsoft PowerPoint 97–2003               | PPT, PPS, POT                                   | Presentation             |                                                                             |\r\n| Microsoft PowerPoint 2007-2016             | PPTX                                            | Presentation             |                                                                             |\r\n| Microsoft Publisher                        | PUB                                             | Document, DTP            |                                                                             |\r\n| Microsoft RTF                              | RTF                                             | Document                 |                                                                             |\r\n| Microsoft Word 2003 XML (WordprocessingML) | XML                                             | Document                 |                                                                             |\r\n| Microsoft Word                             | DOC, DOT, DOCX                                  | Document                 |                 
                                                            |\r\n| Microsoft Works                            | WPS, WKS, WDB                                   | Multiple                 | Microsoft Works for Mac formats since 4.1                                   |\r\n| Microsoft Write                            | WRI                                             | Document                 |                                                                             |\r\n| Microsoft Visio                            | VSD                                             | Graphics / Vector        |                                                                             |\r\n| Netpbm format                              | PGM, PBM, PPM                                   | Graphics / Raster        |                                                                             |\r\n| OpenDocument                               | ODT, FODT, ODS, FODS, ODP, FODP, ODG, FODG, ODF | Multiple formats         |                                                                             |\r\n| Open Office Base                           | ODB                                             | Database forms, data     |                                                                             |\r\n| OpenOffice.org XML                         | SXW, STW, SXC, STC, SXI, STI, SXD, STD, SXM     | Multiple formats         |                                                                             |\r\n| PCX                                        | PCX                                             | Graphics                 |                                                                             |\r\n| Photo CD                                   | PCD                                             | Presentation             |                                                                             |\r\n| PhotoShop                                  | PSD                                        
     | Graphics                 |                                                                             |\r\n| Plain text                                 | TXT                                             | Text                     | Various encodings supported                                                 |\r\n| Portable Document Format                   | PDF                                             | Document                 | Including hybrid PDF                                                        |\r\n\r\n### Languages\r\n\r\n- ara &ndash; Arabic\r\n- zho &ndash; Chinese (Simplified)\r\n- tra &ndash; Chinese (Traditional)\r\n- hrv &ndash; Croatian\r\n- dan &ndash; Danish\r\n- nld &ndash; Dutch\r\n- eng &ndash; English\r\n- fra &ndash; French\r\n- deu &ndash; German\r\n- heb &ndash; Hebrew\r\n- hun &ndash; Hungarian\r\n- ind &ndash; Indonesian\r\n- ita &ndash; Italian\r\n- jpn &ndash; Japanese\r\n- kor &ndash; Korean\r\n- nor &ndash; Norwegian\r\n- por &ndash; Portuguese\r\n- ron &ndash; Romanian\r\n- rus &ndash; Russian\r\n- spa &ndash; Spanish\r\n- swe &ndash; Swedish\r\n- ukr &ndash; Ukrainian\r\n\r\n### Page Spec\r\n\r\nThe page spec is a compressed string that lists dimensions in pixels for every\r\npage in a document. Refer to [ListCrunch][2] for the compression format. For\r\nexample, `612.0x792.0:0-447`\r\n\r\n### Static Assets\r\n\r\nThe static assets for a document are loaded from different URLs depending on\r\nits [access level](#access-levels). 
Append the following to the `asset_url`\r\nreturned to load the static asset:\r\n\r\n| Asset          | Path                                                           | Description                                                     |\r\n| ----------     | -------------------------------------------------------------  | --------------------------------------------------------------- |\r\n| Document       | documents/\\<id\\>/\\<slug\\>.pdf                                  | The original document                                           |\r\n| Full Text      | documents/\\<id\\>/\\<slug\\>.txt                                  | The full text of the document, obtained from the PDF or via OCR |\r\n| JSON Text      | documents/\\<id\\>/\\<slug\\>.txt.json                             | The text of the document, in a custom JSON format (see below)   |\r\n| Page Text      | documents/\\<id\\>/pages/\\<slug\\>-p\\<page number\\>.txt           | The text for each page in the document                          |\r\n| Page Positions | documents/\\<id\\>/pages/\\<slug\\>-p\\<page number\\>.position.json | The position of text on each page, in a custom JSON format      |\r\n| Page Image     | documents/\\<id\\>/pages/\\<slug\\>-p\\<page number\\>-\\<size\\>.gif  | An image of each page in the document, in various sizes         |\r\n\r\n\\<size\\> may be one of `large`, `normal`, `small`, or `thumbnail`\r\n\r\n#### TXT JSON Format\r\n\r\nThe TXT JSON file is a single file containing all of the text, but broken out\r\nper page. This is useful if you need the text per page for every page, as you\r\ncan download just a single file. There is a top level object with an `updated`\r\nkey, which is a Unix time stamp of when the file was last updated. There may\r\nbe an `is_import` key, which will be set to `true` if this document was\r\nimported from legacy DocumentCloud. The last key is `pages` which contains the\r\nper page info. It is a list of objects, one per page. 
Each page object will\r\nhave a `page` key, which is a 0-indexed page number. There is a `contents` key\r\nwhich contains the text for the page. There is an `ocr` key, which is the\r\nversion of OCR software used to obtain the text. Finally, there is an `updated`\r\nkey, which is a Unix time stamp of when this page was last updated.\r\n\r\n#### Position JSON Format\r\n\r\nThe position JSON file contains position information for each word of text on\r\nthe page.  It is an optional file, which may be generated depending on the type\r\nof OCR run on the document.  If it exists, it will be a JSON array, which\r\ncontains a JSON object for each word of text.  The object for each word will\r\nhave the following fields:\r\n\r\n* `text` - The text for the current word\r\n* `x1`, `x2`, `y1`, `y2` - The coordinates of the bounding box for this word on\r\n  the page.  Each value will be between 0 and 1 and represents a percentage of\r\n  the width or height of the page.\r\n\r\n\r\n#### Set Page Text\r\n\r\nThe format to set the page text is similar to the text formats described above.\r\nThe `pages` field may be set to a JSON array of page objects, with the\r\nfollowing fields:\r\n\r\n| Field                | Type                  | Options      | Description                                                                  |\r\n| -------------------- | --------------------- | ------------ | ---------------------------------------------------------------------------- |\r\n| page_number          | Integer               | Required     | The page number you would like to set the page text for, zero indexed        |\r\n| text                 | String                | Required     | The updated text for the given page                                          |\r\n| ocr                  | String                | Not Required | An optional identifier for the OCR engine used to generate this text         |\r\n| positions            | Array of JSON Objects | Not Required | Optionally 
set the position of each word of text, see next table for details |\r\n\r\nThe `positions` field in each `pages` object is a JSON array of position\r\nobjects, with the following fields:\r\n\r\n| Field    | Type   | Options      | Description                                                      |\r\n| -------- | ------ | ------------ | ---------------------------------------------------------------- |\r\n| text     | String | Required     | A single word on the page                                        |\r\n| x1       | Float  | Required     | Left most coordinate of the word, as a percentage of page size   |\r\n| x2       | Float  | Required     | Right most coordinate of the word, as a percentage of page size  |\r\n| y1       | Float  | Required     | Top most coordinate of the word, as a percentage of page size    |\r\n| y2       | Float  | Required     | Bottom most coordinate of the word, as a percentage of page size |\r\n| metadata | JSON   | Not Required | Any extra metadata that you would like to store with this word   |\r\n\r\nExample JSON setting just the page text:\r\n\r\n```\r\n[\r\n    {\"page_number\": 0, \"text\": \"Page 1 text\"},\r\n    {\"page_number\": 1, \"text\": \"Page 2 text\"}\r\n]\r\n```\r\n\r\nExample JSON setting the page text and word positions:\r\n\r\n```\r\n[\r\n    {\r\n        \"page_number\": 0,\r\n        \"text\": \"Page 1 text\",\r\n        \"ocr\": \"my-ocr-engine\",\r\n        \"positions\": [\r\n            {\r\n                \"text\": \"Page\",\r\n                \"x1\": 0.1,\r\n                \"x2\": 0.2,\r\n                \"y1\": 0.1,\r\n                \"y2\": 0.2,\r\n                \"metadata\": {\"type\": \"word\"}\r\n            },\r\n            {\r\n                \"text\": \"1\",\r\n                \"x1\": 0.3,\r\n                \"x2\": 0.4,\r\n                \"y1\": 0.1,\r\n                \"y2\": 0.2,\r\n                \"metadata\": {\"type\": \"word\"}\r\n            },\r\n            {\r\n               
 \"text\": \"text\",\r\n                \"x1\": 0.5,\r\n                \"x2\": 0.6,\r\n                \"y1\": 0.1,\r\n                \"y2\": 0.2,\r\n                \"metadata\": {\"type\": \"word\"}\r\n            }\r\n        ]\r\n    }\r\n]\r\n```\r\n\r\n\r\n### Expandable Fields\r\n\r\nThe API uses expandable fields in a few places, which are implemented by the\r\n[Django REST - FlexFields][5] package. It allows related fields, which would\r\nnormally be returned by ID, to be expanded into the fully nested representation.\r\nThis saves you additional requests to the server when you need the related\r\ninformation, while sparing the server from serving this information when it\r\nis not needed.\r\n\r\nTo expand one of the expandable fields, which are documented in the fields\r\nsection for each resource, add the `expand` query parameter to your request:\r\n\r\n`?expand=user`\r\n\r\nTo expand multiple fields, separate them with a comma:\r\n\r\n`?expand=user,organization`\r\n\r\nYou may also expand nested fields if the expanded field has its own expandable\r\nfields:\r\n\r\n`?expand=user.organization`\r\n\r\nTo expand all fields:\r\n\r\n`?expand=~all`\r\n\r\n[1]: https://jwt.io/\r\n[2]: https://pypi.org/project/listcrunch/\r\n[3]: https://accounts.muckrock.com/\r\n[4]: https://oembed.com\r\n[5]: https://github.com/rsinger86/drf-flex-fields\r\n[6]: https://www.documentcloud.org/help/search/\r\n[7]: https://www.documentcloud.org/help/add-ons/\r\n[8]: https://github.com/apps/documentcloud-add-on"
        },
        {
            "url": "/about/",
            "title": "About",
            "content": "DocumentCloud is a platform founded on the belief that if journalists were more open about their sourcing, the public would be more inclined to trust their reporting. The platform is a tool to help journalists share, analyze, annotate and, ultimately, publish source documents to the open web.\r\nDocumentCloud is a part of the [MuckRock Foundation](http://muckrock.com), 501(c)3 nonprofit organization committed to trust and transparency. [Contact us](mailto:info@documentcloud.org)  with questions about accounts or using DocumentCloud.\r\n\r\nDocumentCloud was founded in 2009 with a grant from the  [Knight News Challenge](http://www.newschallenge.org/) . After two years as an independent nonprofit organization, DocumentCloud  [became a project of Investigative Reporters and Editors](https://blog.documentcloud.org/blog/2011/06/new-home-at-ire/)  in June 2011. In August 2017, DocumentCloud was  [spun off from IRE](https://technical.ly/philly/2017/07/27/knight-foundation-grant-documentcloud-temple-university-temple-university/)  and again became an an independent nonprofit organization, before [merging with the MuckRock Foundation the next year](https://www.muckrock.com/news/archives/2018/jun/11/muckrock-documentcloud-merge-announcement/)."
        },
        {
            "url": "/home/",
            "title": "Home",
            "content": "DocumentCloud is an all-in-one platform used by newsrooms around the world to manage primary source documents. You can use the platform to organize, analyze, annotate, search and embed these resources as needed.\r\n\r\n### Who can use DocumentCloud?\r\n\r\nAnyone is welcome to [search public documents](https://www.documentcloud.org/app) and organize interesting documents into projects.\r\n\r\nNewsrooms and independent journalists can upload documents for annotation and publishing. If you are seeking to upload and publish documents, you need to create an account at [https://accounts.muckrock.com](https://accounts.muckrock.com/)  and request verification.\r\n\r\nArchives and academic projects may also be able to upload documents for annotation and publishing. To get started,  please create an account at [https://accounts.muckrock.com](https://accounts.muckrock.com/)  and request verification.\r\n\r\nDocumentCloud has a [full API](https://www.documentcloud.org/help/api) that you can use to manage large-volume projects and an extensive suite of Add-Ons that facilitate bulk operations and powerful analysis.\r\n\r\nLearn more about how to use our API in our  [full API documentation](https://www.documentcloud.org/help/api) | [Check out our Add-Ons](https://www.documentcloud.org/help/add-ons/)\r\n\r\nIf you want to stay abreast of updates to our services, definitely subscribe to our newsletter. DocumentCloud is a project of the [MuckRock Foundation](http://muckrock.com/), a 501c3 organization that is committed to trust, transparency, and civic engagement."
        },
        {
            "url": "/help/",
            "title": "Help",
            "content": "# DocumentCloud Help\r\n\r\nBelow are resources to help you get the most out of DocumentCloud. Still not finding what you're looking for? [Email us](mailto:info@documentcloud.org) or join the [MuckRock Slack](https://www.muckrock.com/slack/).\r\n\r\n## Help Resources\r\n\r\n* [DocumentCloud FAQ](/help/faq/)\r\n* [DocumentCloud Search](/help/search/)\r\n* [DocumentCloud Add-Ons](/help/add-ons/)\r\n* [DocumentCloud API](/help/api/)\r\n* [DocumentCloud Premium](/help/premium/)"
        },
        {
            "url": "/help/faq/",
            "title": "FAQ",
            "content": "[Back to Help Menu](/help)\r\n\r\n# Frequently Asked Questions\r\n\r\n### General\r\n\r\n * **What is DocumentCloud?** DocumentCloud is a web-based software platform for organizing, researching, annotating, analyzing, and publishing primary source documents. We offer a set of tools that help you find and tell stories in your documents.\r\n\r\n * **How much does it cost?** Thanks to generous funding from the Knight Foundation, DocumentCloud since its launch in 2010 has been offered for free exclusively to verified journalism organizations. We are currently developing a paid model to ensure the DocumentCloud platform’s sustainability.\r\n\r\n * **Who can have accounts?** Anyone may have a DocumentCloud/MuckRock account to view documents, but if you are seeking to upload documents, you need to [request account verification](/help/faq#verification). \r\n\r\n * **Where can I get help?** You can find answers to most questions you have either in this FAQ, one of our other [help pages](/help/) or on our [YouTube channel](https://www.youtube.com/@MuckRockNews/playlists).  If you can’t find the answer or if you’re having trouble using DocumentCloud, please email us at info@documentcloud.org\r\n\r\n * **Where can I read your Terms of Service?** The current Terms of Service for DocumentCloud and MuckRock are viewable on [MuckRock's TOS page](https://www.muckrock.com/tos/).\r\n\r\n### Privacy\r\n\r\n * **What is DocumentCloud’s privacy policy?** Please read our complete [Privacy Policy](https://www.muckrock.com/privacy-policy/) for details.\r\n\r\n * **Who can see documents in my account?** By default, any document you upload is set to “private” access and is viewable only by you. If you set the access level to “private to [your organization name],” then other DocumentCloud users in your organization can view the document, but not the public. 
If you set it to \"public\", then all users, whether registered with a DocumentCloud account or not, may view your documents. \r\n\r\n* **Is metadata about uploaded files preserved?** If you are uploading a PDF, then yes, by default the document's metadata, such as authorship and creation date, is preserved. However, if you choose to perform OCR on the document by selecting \"Force OCR\" or you redact the document, the document's underlying metadata will be wiped. If this is a concern to you, you may want to keep an original copy of the document(s). You can do so using the PDF Exporter Add-On or downloading documents individually. Files that are not PDFs are converted into PDFs using LibreOffice and therefore do not preserve metadata by default. \r\n\r\n### Accounts\r\n\r\n * **Do I have to use my real name for my account?** Yes, the DocumentCloud [Terms of Service](https://www.muckrock.com/tos/) require that accounts use real names and valid email addresses. However, we allow organizations to create one shared account for posting documents and another shared account for use with automation technology or our API. These accounts should have an appropriate name, such as “[Organization name] Documents.” \r\n\r\n * **How do I get an account?** Sign up for an account at the [plan selection page](https://accounts.muckrock.com/selectplan/). To begin uploading documents to DocumentCloud, however, you must [request account verification](/help/faq#verification).\r\n\r\n * **Can I have more than one account with the same email?** No, you cannot have more than one account with the same email. You can have one account with multiple emails, and you can have one account that belongs to multiple organizations; however, one organization will be the default. We recommend that each individual have one account with all of their emails and organizations associated with that single account. 
\r\n\r\n * **How do I log in and log out?** To log in, enter your account email address and password in the login box on the DocumentCloud home page. To log out, click the “Log Out” link at the top right of the workspace. *Note: Always log out when you’re done working if you’re logged in on a shared computer.*\r\n\r\n * **How do I reset my password?** If you’ve forgotten your password, go to our [password reset page](https://accounts.muckrock.com/accounts/password/reset/). If you are not receiving your password reset link, please check your Spam folder. If you are still having trouble resetting your password, email us at info@documentcloud.org and we will send you the password reset link. \r\n\r\n * **How do I change the email address associated with my account?** To change your email address, visit our [e-mail update page](https://accounts.muckrock.com/accounts/email/).\r\n\r\n* **I lost access to the email tied to my DocumentCloud account and no longer have the password to log in. What can I do?** Email us at info@documentcloud.org and let us know the email associated with the DocumentCloud account. We can change the email tied to your DocumentCloud account for you, and you may then reset the password. \r\n\r\n * **Can I delete my account?** Accounts cannot be deleted, but they can be disabled by another user within your organization who has administrator-level privileges. We (and the public) value the documents you uploaded and made public and are glad to continue to host them.\r\n\r\n### Organizations\r\n\r\n * **How does DocumentCloud organize accounts?** Currently, we create accounts under the umbrella of an organization. That is, each user account is tied to at least one organization. 
This allows, for example, users within that organization to collaborate privately on documents.\r\n\r\n* **How do I remove users from my organization?** If you're an administrator for the organization, you can manage members by going to the [main account management page](http://accounts.muckrock.com/) and clicking on the name of the organization you want to change. Then, click \"Manage Members\". \r\n\r\n* **How do I add users to my organization?** If you're an administrator for the organization, the easiest way to add additional users is to send the users the following instructions: \r\n    * Register for a free MuckRock account. \r\n    * Click “Request to Join” from the organization page, which should look something like this: https://accounts.muckrock.com/organizations/daily-bugle/ \r\n    * *Note: Some organizations run into a temporary limit of five users — to raise this limit, click “Upgrade,” leave the plan on free, put in a larger number of users, and then click “Update.” You can still have an unlimited number of users with a free account, and we’re working on improving the flow of adding users in a future update.*\r\n\r\n * **How can I get accounts for others in my organization?** Anyone in your organization who has a DocumentCloud account with administrator privileges can send invite links to add users to a DocumentCloud/MuckRock organization. Ask around your organization; if you are not sure who has administrator privileges, or the admin has since left the organization, please email us at info@documentcloud.org for additional support. \r\n\r\n * **How many user accounts can an organization have under its account?** Currently, there is no limit.\r\n\r\n * **What happens if my organization closes?** Please notify DocumentCloud by email at info@documentcloud.org if your organization is closing, changing its name or experiencing another significant change. 
We value the documents uploaded and made public by our contributors and ask that you do not delete them. We will be glad to work with you on a transition plan. If the current holder of the documents is not a part of the new organization, you may have to do two transfers where the documents are transferred from User A from Organization A -> User B who is part of Organization A & B -> User C who is a member of Organization B. \r\n\r\n* **What happens to my documents if my organization closes?** Documents that are owned by an organization can be downloaded locally using the PDF Exporter Add-On that is enabled by default. Organizational administrators may opt to have the ownership of documents be transferred to users within the organization by selecting the documents, clicking \"Edit\" and then \"Change Owner\" or using the \"Move Account\" Add-On to transfer documents to another user or organization. **Note:** The Move Account Add-On requires that the account transferring the documents over to a new organization must be a member of that organization. \r\n\r\n* **What happens to my DocumentCloud account if my organization closes?** You retain your DocumentCloud account and can [change the email address](https://accounts.muckrock.com/accounts/email/) associated with the account to that of another organization or your personal email address. \r\n\r\n * **What happens to my documents and account if I leave my current organization?** You are able at any time to download the original documents you uploaded to our service. You may use the PDF Exporter Add-On, which is enabled by default, to download documents locally. Generally, we defer to each organization regarding disposition of the documents you uploaded to DocumentCloud while in their employment. Every organization has its own rules governing ownership of material generated while in their employment or service.\r\n\r\n* **Our organization is changing its name. 
How do we change our name on MuckRock/DocumentCloud?** Please email us at info@documentcloud.org and we would be more than happy to change your organization's name in our system. \r\n\r\n* **Our organization is being purchased by a larger organization or merged with another team. How do we transfer the ownership of our documents over to the new team?**  You may transfer documents in batches of 25 at a time by selecting the documents, clicking on \"Edit\" -> \"Change Owner\" and selecting the new organization's name as well as a user in that organization to transfer the documents to. You may also use the \"Move Account\" Add-On to transfer large sets of documents over to the other organization. **Note:** The Move Account Add-On requires that the account transferring the documents over to a new organization must be a member of that organization. If the current holder of the documents is not a part of the new organization, you may have to do two transfers where the documents are transferred from User A from Organization A -> User B who is part of Organization A & B -> User C who is a member of Organization B. \r\n\r\n\r\n### Verification \r\n* **How do I verify my account?** If you are part of an organization, you should first [check](https://accounts.muckrock.com/organizations/) that your organization does not already exist as an entity on MuckRock & DocumentCloud. <br><br> If you do find your organization, you can click \"Request to join\" from the left sidebar of the organization or contact an administrator to add you directly.  Members of organizations that have already been verified do not need to independently be verified. <br><br>If your organization does not exist on MuckRock & DocumentCloud services already, you can [create an organization](https://accounts.muckrock.com/organizations/~create). <br><br>\r\nIf you are an established freelancer, you can skip searching for your organization or creating an organization on MuckRock and DocumentCloud entirely. 
<br><br> If you are a freelancer or part of a new organization that needs verification, you can find the \"Request Verification to Upload\" button to the right of the Upload button when you log into DocumentCloud. If you have joined a verified organization, you won't see this button as you do not need to be verified individually as a member of a verified organization. \r\n\r\n### DocumentCloud Premium\r\n\r\n* **What is DocumentCloud Premium?** DocumentCloud premium features are available to both [paid professional and organizational](https://accounts.muckrock.com/selectplan/) accounts on MuckRock. DocumentCloud premium features include access to AI credits to perform advanced analysis on documents, access to Amazon's Textract OCR engine, and a growing feature list. For a full feature list, read more at the [DocumentCloud Premium page](/help/premium/). Paid professional and organizational accounts on MuckRock also gain access to monthly request credits on MuckRock, the ability to embargo requests, and bulk purchasing rates. Upgrading your account is done by visiting the [plan selection page](https://accounts.muckrock.com/selectplan/). \r\n\r\n### Search\r\n\r\n * **What can I find in the [public catalog](/app?q=)?** We feature more than a million documents provided by contributors ranging from The New York Times to The Guardian and hundreds of large and small news organizations, freelance journalists and others who report using public documents.\r\n\r\n * **Do I need an account to search the [public catalog](/app?q=)?** No, you do not. We are proud to provide a valuable public resource at no cost.\r\n\r\n * **Will getting a DocumentCloud account allow me to see more documents?** No, whether you have an account or not, the only documents available for viewing are those explicitly shared by the users who upload them. 
The only exception is if you get added to a DocumentCloud/MuckRock organization that has documents set with permissions \"private to your organization\" - then you will get access to those documents as well. \r\n\r\n * **How can I search documents?** Using the search bar in the workspace, type the text you’d like to find in documents or search by attributes including Title, User, Project, Organization, Access, and more. Learn more in our documentation on [searching](/help/search).\r\n\r\n * **Can I contribute documents to your public catalog?** To start contributing, [register](https://accounts.muckrock.com/selectplan/) for an account. You will need to go through [account verification](/help/faq#verification) to begin uploading documents. \r\n\r\n### Uploading\r\n\r\n* **What kinds of file types can I upload?** The most common file type our users upload is PDF, but DocumentCloud can also convert over [70 file types](https://www.muckrock.com/news/archives/2020/nov/10/release-notes-beta-document-types/) into PDFs. This includes Word documents, Excel spreadsheets, PowerPoint presentations, HTML and image files. We cannot process video, audio or closed-format files such as Outlook PST files. \r\n\r\n* **Is there a limit on the size of a file I can upload?** Yes, 500 MB is the largest file you can upload. If your file is larger than 500 MB, you can try to use the PDF Compression DocumentCloud Add-On to compress the document before upload. If it is still too large, you will need to split the document up into smaller files. \r\n\r\n* **Are there restrictions on the content of documents I can upload?** DocumentCloud is intended to be a repository of public documents. Our [Terms of Service](https://www.muckrock.com/tos/) prohibit uploading copyrighted material that is not yours.\r\n\r\n* **How long does it take to process a document?** Processing times will vary depending on the size of the file and whether or not it needs to be OCR'd. 
Small documents are usually processed in a minute or less; larger documents or large sets of documents might take slightly longer.\r\n\r\n* **What does DocumentCloud do with the documents I upload?** When you upload a document, we save the original file. We extract images of each page in several sizes for our workspace and embeds. If there is a text layer embedded in the document, we retrieve that and store it in a database for searching. If there isn’t, we OCR the document to capture text.\r\n\r\n* **When I upload a document, can anyone else see it?** By default, documents are set to private access upon upload. That means only you can see it. You have the option of setting access to “private to your organization,” meaning anyone else with a DocumentCloud account in your organization can see it, or “public,” meaning it’s viewable by everyone.\r\n\r\n* **Can I OCR my documents in languages other than English?** Yes, we offer OCR in over 90 languages via the Tesseract OCR engine’s language packs. Select the document you want to OCR in the search viewer, click \"Edit\" -> \"Force Reprocess\" -> \"Force OCR\" and select the appropriate OCR language. \r\n\r\n* **I don’t see the language I need for OCR. Can you add it?** We are glad to add languages supported by Tesseract, the OCR engine we use. Please see the [list of languages supported by Tesseract](https://en.wikipedia.org/wiki/Tesseract_%28software%29) and contact us by email at info@documentcloud.org to discuss your needs.\r\n\r\n* **What do I do when my documents are stuck in processing?**\r\nDocuments should almost never take more than a minute to process, but if they do there are a few steps you can take. First, try refreshing the page. Sometimes a document has processed, but it didn’t let your browser know for some reason. If you see this regularly, let us know, and include your browser and operating system. Second, try uploading a new copy of the document. 
This is a short term fix but sometimes just re-uploading will fix the issue. Finally, if the above doesn't work and a document has been processing for more than five minutes, [please get in touch](mailto:info@documentcloud.org). If possible, include a sample of the file you were uploading, anything special that might have been related (such as uploading a large number of documents, changing the status of documents, very large documents), and the browser and operating system you were using. \r\n\r\n* **What do I do if I have a lot of documents that have failed to upload?**\r\nIf you filter your uploads by typing status: in the search bar and select error or nofile and notice a lot of documents that did not upload correctly, you can delete them by running the [Clear Failed Uploads Add-On](/app?q=#add-ons/MuckRock/clear-failed-uploads). This will allow you to clear all failed uploads without having to delete them 25 documents at a time. If you are uploading large sets of documents, it is encouraged to use the [DocumentCloud batch upload script](https://www.muckrock.com/news/archives/2023/jan/27/introducing-documentclouds-bulk-upload-script/) to avoid receiving a lot of errors during processing. The Batch upload script includes a flag ```--reupload_errors``` that you can use to go through the documents that have failed to upload the first time and re-attempt. The script also keeps track of your uploads in a database file which you can use [DB Browser](https://sqlitebrowser.org/) to view the tables in the database, filter by failed uploads or run queries against the database. \r\n\r\n\r\n\r\n### Working with Documents\r\n\r\n* **What information can I add to my documents?** You can add several pieces of information either before or after uploading. These include a source, description, published URL and related article URL. 
To access these fields after you’ve uploaded a document, select the document and choose “Edit Document Information.”\r\n\r\n* **What is the difference between Related Article URL and Published URL?** Use the Related Article URL to tell readers the location of the article that uses this document as source material. Adding a URL in this field creates a Related Article link in the sidebar of the full viewer. The Published URL is the page where the document is embedded. If a document might be accessed at more than one URL, however, you can specify the URL we should send users to if they find the document through a search of DocumentCloud.\r\n\r\n* **How can I add custom data (tags) to organize and search my documents?** DocumentCloud allows you to define and search your own set of custom data (key/value pairs) associated with specific documents. To edit data for individual documents in the workspace, select the documents you wish to update, and choose “Edit Document Data” from the “Edit” menu. See “Filter Fields” in our [search documentation](/help/search#filter-fields), specifically data_ and tag: search fields to learn more.\r\n\r\n* **How do I change the order of pages in a document I uploaded?** Click on the document to open it in the document workspace. In the sidebar, click “Modify Pages” You’ll see thumbnails of all the pages in your document. Select the pages you would like to move, then select \"Move\". 
\r\n\r\n * **How do I insert or replace pages in a document I uploaded?** To insert one or more pages:\r\n\t* Click on the document to open it in the document workspace.\r\n\t* In the sidebar, click “Modify Pages” You’ll see thumbnails of all the pages in your document.\r\n\t* To insert new pages at a specific position within the document, click between the pages you'd like the pages to be inserted and select \"Insert from other document\" \r\n\t* When you’re ready, click the “Apply Modifications” button.\r\n\r\n * **How do I remove pages from a document I uploaded?** We recommend you retain a backup of your document before removing pages. To remove one or more pages:\r\n\t* Click on the document to open it in the document workspace. In the sidebar, click “Remove Pages.” You’ll see thumbnails of all the pages in your document.\r\n\t* Select the pages you’d like to delete from your document, and then click “Modify Pages”, select the pages you'd like removed, and then click \"Remove\" and then when you are certain you'd like those pages to be removed select \"Apply modifications\"\r\n\t* Note that once you remove pages they are permanently deleted and your original document is replaced.\r\n\r\n * **How do I redact portions of my document?** We recommend you retain a backup of your document before making redactions. To redact a portion of a document:\r\n\r\n\t* Click on the document to open it in the document workspace. In the sidebar, click “Redact Document.”\r\n\t* Click and drag to draw a black rectangle over each portion of the document you’d like to redact. (You can redact more than one section at a time.)\r\n\t* When finished, click “Save Redactions.”\r\n\r\n * **Does redacting a document also remove the text extracted from it?** When you redact a portion of a document, we erase all data related to the redacted information, create a new redacted document, and delete the original document. 
Any text that was part of the redacted portion is deleted.\r\n\r\n * **Can I change the orientation of a page in a document?** Yes, click on the document, click \"Modify Pages\", select the page(s) you'd like to rotate, select \"Rotate\" until you have achieved the desired orientation, and then hit \"Apply Modifications\" \r\n\r\n * **How do I delete documents?** To delete an entire document, select the document by clicking the check mark next to the document from the search menu, click on the \"Edit\" button and then \"Delete\". \r\n\r\n * **Once I delete a document, can I get it back?** No, once you delete a document it’s permanently deleted from our platform.\r\n\r\n### Analyzing Data in Documents\r\n\r\n * **How can I see entities extracted from my documents?** Select a document in the workspace. Under the “Edit” menu, select “Entities” and select \"Extract entities\" \r\n\r\n### Working With OCR and Document Text\r\n\r\n * **What kind of OCR software does DocumentCloud use?** We use Tesseract, an open-source OCR engine. Google currently sponsors development. [DocumentCloud Premium](/help/premium) users also have access to Amazon's Textract OCR which performs much better on scanned documents, handwritten text, and table extraction. \r\n\r\n * **Do you OCR every document I upload?** No. If your document contains embedded text and you have not selected the Force OCR option, we save the underlying text in our database. We use OCR  when there’s no text layer. You may force a document to be OCR'd by selecting the document with the check mark in the search view, clicking \"Edit\" -> \"Force Reprocess\" -> \"Force OCR\" and selecting the appropriate language for the document. Note, that if you Force OCR this way, it will run OCR on the entire document, not only the portions of the document that do not have a text layer. 
 \r\n\r\n* **If I run OCR on a document when I upload it to DocumentCloud, can I recover the original underlying text layer?** Forcing OCR means the original text layer is lost. Force reprocessing the document will not recover the lost text layer. You will need to re-upload the original document. If you do not select Force OCR, the text layer remains intact. \r\n\r\n * **Can I OCR a document even if it has text embedded in it?** Yes. Double-click the document to open it in the document workspace. In the sidebar, click “Reprocess Text.” In the dialog, click “Force OCR.”\r\n\r\n * **How do I download all the text from a document?** Open the document and, in the bottom right-hand corner, switch the drop-down menu to \"Plain Text\". From there you can copy and paste the plain text of the document. \r\n\r\n### Annotating\r\n\r\n * **What are notes?** In DocumentCloud, notes are a way to highlight important sections of documents with a short headline and explanatory text. Notes can be private (viewable only by you), collaborator (viewable by anyone who has been added as a collaborator on the document), or public (viewable by anyone who has view access to the document).\r\n\r\n * **How do I add a note?** To add a note:\r\n\t* Click on the document to open it in the document workspace. In the sidebar, click \"Add Note.\"\r\n\t* Drag your cursor to draw a box over the area of the document you want to highlight.\r\n\t* When you release your cursor, you’ll see a dialog box that lets you add a short headline and some explanatory text. It will also show you the access control restrictions (public, collaborator, private) when you create the note. \r\n\t* When done, click “Save.”\r\n\r\n * **How do I edit an existing note?** Find the note on your document and select it. 
Click the pencil icon to the right of the headline to edit the note.\r\n\r\n * **What is the difference between public and private notes?** Public notes are visible to anyone who has access to the document. Private notes are viewable only by the person who uploaded the document.\r\n\r\n * **Can I make a private note public or vice-versa?** Yes, click on the note you'd like to change the access level on, click the pencil icon, and change from private to public or collaborator as needed. \r\n\r\n * **What is a page note, and how do I add one?** Instead of highlighting a portion of a document, you can create a note that appears at the top of a page. To do this, follow the directions for creating a note and, rather than drawing a box on a page, click in between any two pages (or above the first page).\r\n\r\n * **How can I format the text of notes to make words bold, italic, etc.?** You can format text in notes by using some basic HTML codes. For example, to bold a phrase, precede it with a b tag and end it with a closing b tag.\r\n\r\n * **How do I publish a note?** Any public notes you create are visible (i.e., published) as soon as you set the document’s access level to “public.”\r\n\r\n * **How do I link directly to a note from a website?** Each public note has a specific URL that you can share. To find it, select the note. Then click the chain-link icon to the right of the small headline. In your browser, the URL will change to the note link. Copy that and use it on your website. When a reader clicks the link, they’ll be directed to the document with the note open. If you are seeking to embed a note on your website, open the document that contains the note, click \"Share\" from the right-hand menu,  select \"Share specific note\" and finally select the note you'd like to embed. \r\n\r\n### Projects\r\n\r\n * **What are projects?** Projects are labels you can apply to groups of documents to organize them by topic or project. 
A document can live in more than one project.\r\n\r\n * **How do I create a project?** In the workspace at left, click the “New Project” button. Give your project a name and click “Save.”\r\n\r\n * **Can I create sub-projects inside a project?** Not at this time. However, if you are looking for a way to easily organize, filter or search your documents, we recommend you add custom data.\r\n\r\n * **How do I add documents to a project?** You can drag and drop the file icon on the project title at the left of the workspace. Or highlight the file in the workspace, click the “Projects” icon, and choose the name of a project.\r\n\r\n * **How do I remove documents from a project?** Select a document in the workspace. Click the “Projects” menu, which displays all your project titles. You’ll see a check mark next to each project the document belongs to. Find the project that you want to remove the document from, and click the title to remove the check mark.\r\n\r\n * **How do I share a project with others?** In the workspace, hover over the name of your project in the project list, then click the pencil icon to show the project editing dialog. Click “Add a collaborator to this project.” You can add email addresses of people who have DocumentCloud accounts — whether they belong to your organization or another.\r\n\r\n * **Can I make all documents in a project public at once?** At this time, no. You can edit the access level of up to 25 documents at a time in the workspace by selecting the blank box next to the \"Edit\" drop-down, selecting \"Edit\", \"Change Access\" and selecting the appropriate access level.  If you have a large number of documents, please contact us by email at info@documentcloud.org to discuss other options.\r\n\r\n### Collaboration\r\n\r\n * **How can I see documents others in my organization have uploaded?** In the search bar, clear the search queries and type organization: and type the name of your organization. 
You can combine the query with access: to see which documents from your organization have public, private or organization access set on them.\r\n\r\n* **How do I share documents with a collaborator?** If you’d like to share specific documents with others, first add the documents to a project. Then click the pencil icon next to the project name in the sidebar, and then “Manage Collaborators.”  Collaborators must have an existing DocumentCloud account that’s linked to their email address and they must have logged in to DocumentCloud at least once in order to be added to a project. \r\n\r\n### Embedding and Sharing documents\r\n\r\n * **What options do I have for embedding documents?** We currently offer four embed types:\r\n\t* **Document**: A viewer that shows the complete document, including attribution, all notes, and an available sidebar with navigation and attribution.\r\n\t* **Page**: A lightweight, responsive single page that includes attribution and click-through to the full document.\r\n\t* **Note**: A single annotation that includes attribution and click-through to the full document.\r\n\t* **Projects**: A collection of documents organized by project. \r\n\r\n * **How do I embed a document, page, notes or collection of documents on my website?** \r\n**For documents, pages, and notes** \r\nClick on the document you want to embed to open it in the viewer, click \"Share\" in the right sidebar, and choose the share option you want. **For projects**, click on the pencil icon to the right of the project name, and click \"Share/Embed Project\". \r\n\r\n * **How do I make a document visible to the public?** In the workspace, select one or more documents. Under the “Edit” menu, choose “Change Access.” Pick one of the three options. “Public Access” means anyone on the internet can search for and view the document. “Private Access” means only you and people with explicit permission (via collaboration) have access. 
“Private to your organization” means only the people in your organization have access. (No freelancers.)\r\n\r\n * **How do I set a time for a document to become public?** If your document is private, you can set a publication date for it:\r\n\t* Select one or more documents in the workspace. Click \"Edit\" -> \"Change Access\" -> \"Schedule publication\". \r\n\t* Choose the date and time for publication and press “Change.” The date and time set will show in the workspace. If you change your mind, re-open the change access menu, de-select \"Schedule publication\" and hit \"Change\". \r\n\r\n * **How do I get the URL of a document I want to share?** Make sure you have set the document’s access to “public.” Double-click a document to open it in the document workspace. The URL for sharing the document is now in your browser’s URL bar. The format is `/documents/[ID Number]-[document-title-words].html`\r\n\r\n * **Will DocumentCloud work with my CMS?** DocumentCloud is used by hundreds of organizations worldwide with many different content management systems. If your CMS offers the ability to embed snippets of JavaScript and HTML, you should be fine. We’re available to talk with your CMS’s developers to iron out any questions. Please contact us by email at info@documentcloud.org.\r\n\r\n * **Do embeds work on phones?** Yes, all of our embed types can be viewed on phone or tablet screens. However, our page and note embeds are responsive and are the best choice for display on those devices.\r\n\r\n * **How do I embed documents using WordPress?** The best way is to install our custom WordPress plugin, which lets you embed by entering shortcodes into your text. See the [documentation](https://github.com/documentcloud/wordpress-documentcloud) for details. In most installations, you can do this right from your site’s plugin section. Usually, you can just drop a DocumentCloud link right in line and it should embed the document. 
Depending on your configuration, you might need to use a shortcode formatted like this: [documentcloud url=\"https://www.documentcloud.org/documents/282753-lefler-thesis.html\"]\r\n\r\n * **How do I change the width or height of an embed on my site?** Select the \"Customize Appearance\" drop-down menu when in the embed page. \r\n\r\n * **Can readers make changes to documents I embed?** No, only you or DocumentCloud users in your organization can make changes to your documents.\r\n\r\n * **Can I prevent people from downloading my original document?** If you embed a document with our full viewer, you can disable the link to the original PDF that appears in the sidebar by selecting \"Customize Appearance\" and changing \"PDF Link\" to \"Hidden\" (it is visible by default). Nevertheless, once you set your document to public access, it will appear in Internet search results, and people will be able to download it.\r\n\r\n * **How do I make the sidebar show or hide in the document viewer?** Click \"Customize Appearance\" and change \"Sidebar behavior\" to \"hidden\". When embedding the viewer at narrow widths, hiding the sidebar is usually a good idea.\r\n\r\n### Keyboard Shortcuts\r\n\r\nFrom the *Document* view: \r\n\r\n+ **A:** Start annotating a document.\r\n+ **R:** Start redacting a document.\r\n+ **S:** Add or edit page sections.\r\n+ **Ctrl/CMD+F:** Start searching through page.\r\n+ **Esc:** Cancel the current action.\r\n\r\n### DocumentCloud API\r\n\r\n * **What is the DocumentCloud API?** DocumentCloud’s API provides resources to search, upload, edit, and organize documents as well as to work with projects. In addition, an oEmbed service provides easy integration of embedding documents, pages and notes. Full [documentation](/help/api) is available.\r\n\r\n * **Do you need a DocumentCloud account to use the API?** As with DocumentCloud’s workspace, you need an account to use the API to upload, update or delete documents, or create and modify projects. 
Other API functions, such as search, do not require an account. Consult the [documentation](/help/api) for details.\r\n\r\n * **What libraries are available for working with the API?** We have provided an open-source Python wrapper for the DocumentCloud API, which is well [documented](https://documentcloud.readthedocs.io/en/latest/). \r\n\r\n * **Are there limits on API use?** Yes, please see the [rate limit documentation](/help/api#rate-limits)."
        },
        {
            "url": "/help/search/",
            "title": "Search",
            "content": "[Back to Help Menu](/help)\r\n\r\n# DocumentCloud Search\r\n\r\n## Contents\r\n\r\n- [Syntax](#syntax)\r\n- [API](#api)\r\n\r\n## Introduction\r\n\r\nDocumentCloud's search is powered by [Solr][1], an open source search engine by the Apache Software Foundation. Most of the search syntax is passed through directly to Solr — you can read [Solr's documentation][2] directly for information on how its syntax works. This document will reiterate the parts of that syntax that are applicable to DocumentCloud, as well as parts of the search that are specific to DocumentCloud.\r\n\r\n## Syntax\r\n\r\n### Specifying Terms\r\n\r\nYou may specify either single words to search for, such as `document` or `report`, or a phrase of multiple words to be matched as a whole, by surrounding it in double quotes, such as `\"the mueller report\"`.\r\n\r\n### Wildcard Searches\r\n\r\nTerms can use `?` to match any single character. For example `?oat` will match both goat and boat. You may use `*` to match zero or more characters, so `J*` will match J, John, Jane or any other word beginning with a J. You may use these in any position of a term — beginning, middle or end.\r\n\r\n*Note:* This feature is only available to authenticated users.  You may register for a free account at <https://accounts.muckrock.com/> to use this feature.\r\n\r\n### Fuzzy Searches\r\n\r\nBy appending `~` to a term you can perform a fuzzy search which will match close variants of the term based on edit distance. [Edit distance][3] is the number of letter insertions, deletions, substitutions, or transpositions needed to get from one word to another. This can be useful for finding documents with misspelled words or with poor OCR. By default `~` will allow an edit distance of 2, but you can specify an edit distance of 1 by using `~1`. For example, `book~` will match book, books, and looks.\r\n\r\n*Note:* This feature is only available to authenticated users.  
You may register for a free account at <https://accounts.muckrock.com/> to use this feature.\r\n\r\n### Proximity Searches\r\n\r\nProximity searches allow you to search for multiple words within a certain distance of each other. They are specified by using a `~` with a number after a phrase. For example, `\"mueller report\"~10` will search for documents which contain the words mueller and report within 10 words of each other.\r\n\r\n### Ranges\r\n\r\nRange searches allow you to search for fields that fall within a certain range. For example, `pages:[2 TO 20]` will search for all documents with 2 to 20 pages, inclusive. You can use `{` and `}` for exclusive ranges, as well as mix and match them. Although this is most useful on numeric and date [fields](#fields), it will also work on text fields: `[a TO c]` will match all text alphabetically between a and c.\r\n\r\nYou can also use `*` for either end of the range to make it open ended. For example, `pages:[100 TO *]` will find all documents with at least 100 pages, while `pages:[* TO 20]` will find all documents with at most 20 pages.\r\n\r\n### Boosting\r\n\r\nBoosting allows you to alter how the documents are scored. You can make one of your search terms more important in terms of ranking. Use the `^` operator with a number. By default, terms have a boost of 1. For example, `mueller^4 report` will search for documents containing mueller or report but give more weight to the term mueller.\r\n\r\n### Fields\r\n\r\nBy default, text is searched through title and source boosted to 10, description boosted to 5, and text boosted to 1. You can search any field specifically by using `field:term` syntax. For example, to just search for documents with report in the title, you can use `title:report`. The fielded search only affects a single term, so `title:mueller report` will search for mueller in the title, and report in the default fields. 
You can use `title:\"mueller report\"` to search for the exact phrase \"mueller report\" in the title, or use [grouping](#grouping-terms), `title:(mueller report)` to search for mueller or report in the title.\r\n\r\n### Boolean Operators\r\n\r\nYou can require or omit certain terms, or apply more complex boolean logic to queries. You can require a term by prepending it with `+` and can omit a term by prepending it with `-`. You can also omit a term by preceding it with `NOT`. You can require multiple terms by combining them with `AND`, and require either (or both) terms by combining them with `OR`. For example, `mueller AND report` requires both mueller and report be present. `+mueller -report` would require mueller be present and require report to not be present. By default, multiple terms are combined with `OR` — but see [filter fields](#filter-fields) for how they are handled specially. These boolean operators must be uppercase, or else they will be treated as search terms.\r\n\r\n### Grouping Terms\r\n\r\nYou can use parentheses to group terms, allowing for complex queries, such as `(mueller OR watergate) AND report` to require either mueller or watergate, and report to appear.\r\n\r\n### Specifying Dates and Times\r\n\r\nDate times must be fully specified in the form `\"YYYY-MM-DDThh:mm:ssZ\"` where YYYY is the year, MM is the month, DD is the day, hh is the hour, mm is the minutes, and ss is the seconds. T is the literal T character and Z is the literal Z character. These are always expressed in UTC time. You may optionally include fractional seconds (`\"YYYY-MM-DDThh:mm:ss.fZ\"`).  You must quote these for them to work in search queries.\r\n\r\nYou may also use `NOW` to stand in for the current time. This is most useful when combined with date time math, which allows you to add or subtract time in the following units:\r\n`YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND`. For example `NOW+1DAY` would be one day from now. 
`NOW-2MONTHS` would be 2 months in the past.\r\n\r\nYou may also use `/` to round down to the start of a time unit. For example, `NOW/HOUR` is the beginning of the current hour. These can be combined: `NOW-1YEAR+2MONTHS/MONTH` would be the beginning of the month, 2 months past one year ago. These are useful with [ranged](#ranges) searches: `[NOW-1MONTH TO *]` would be all dates in the past month.\r\n\r\n### Sorting\r\n\r\nYou may sort using the syntax `sort:<sort type>`. Possible sortings include:\r\n\r\n- `score` (highest score first; default)\r\n- `created_at` (newest first)\r\n- `page_count` (largest first)\r\n- `title` (alphabetical)\r\n- `source` (alphabetical)\r\n\r\nThese may be reversed by prepending a `-` (`sort:-page_count`). You may use `order` as an alias to `sort`.\r\n\r\n### Escaping Special Characters\r\n\r\nSpecial characters may be escaped by preceding them with a `\\`. For example, `\\(1\\+1\\)` will search for a literal \"(1+1)\" in the text instead of using the characters’ special meanings. If your query contains a syntax error, the parser will automatically escape your query to make a best effort at returning relevant results. The [API response](#api) will contain a field `escaped` informing you if this auto-escape mechanism was triggered.\r\n\r\n### Filter Fields\r\n\r\nThe following fields may be searched on, which will filter the resulting documents based on their properties. By default, all fields included in the query are treated as required (e.g. `user:1 report` will show only documents from user 1 scored by the text query “report”). If you include the same field more than once, the query is equivalent to applying `OR` between each occurrence of that field (e.g. `user:1 user:2 report` will show documents by user 1 or 2). If you include distinct fields, the query is equivalent to applying `AND` between each set of distinct fields (e.g. `user:1 user:2 tag:email` will find documents by user 1 or 2 and which are tagged as email). 
If you use any explicit boolean operators (`AND` or `OR`), they take precedence (e.g. `(user:1 AND tag:email) OR (user:2 AND tag:contract)` would return documents by user 1 tagged as email as well as documents by user 2 tagged as contract). This allows you to make complex boolean queries using any available field.\r\n\r\nAvailable fields:\r\n\r\n- **user**\r\n  <br> Specify using the user ID. Also accepts the slug preceding the ID for readability (e.g. `user:mitchell-kotler-1`). `account` is an alias for user.\r\n- **organization**\r\n  <br> Specify using the organization ID. Also accepts the slug preceding the ID for readability (e.g. `organization:muckrock-1`). `group` is an alias for organization.\r\n- **access**\r\n  <br> Specify the access level. Valid choices are `public`, `organization`, and `private`.\r\n- **status**\r\n  <br> Specify the status of the document. Valid choices are `success`, `readable`, `pending`, `error`, and `nofile`.\r\n- **project**\r\n  <br> Specify using the project ID. Also accepts the slug preceding the ID for readability (e.g. `project:panama-papers-1`). `projects` is an alias for project.\r\n- **document**\r\n  <br> Specify using the document ID. Also accepts the slug preceding the ID for readability (e.g. `document:mueller-report-1`). `id` is an alias for document.\r\n- **language**\r\n  <br> Specify the language the document is in. 
Valid choices include:\r\n    - ara - Arabic\r\n    - zho - Chinese (Simplified)\r\n    - tra - Chinese (Traditional)\r\n    - hrv - Croatian\r\n    - dan - Danish\r\n    - nld - Dutch\r\n    - eng - English\r\n    - fra - French\r\n    - deu - German\r\n    - heb - Hebrew\r\n    - hun - Hungarian\r\n    - ind - Indonesian\r\n    - ita - Italian\r\n    - jpn - Japanese\r\n    - kor - Korean\r\n    - nor - Norwegian\r\n    - por - Portuguese\r\n    - ron - Romanian\r\n    - rus - Russian\r\n    - spa - Spanish\r\n    - swe - Swedish\r\n    - ukr - Ukrainian\r\n- **slug**\r\n  <br> Specify the slug of the document.\r\n- **created_at**\r\n  <br> Specify the [date time](#specifying-dates-and-times) the document was created.\r\n- **updated_at**\r\n  <br> Specify the [date time](#specifying-dates-and-times) the document was last updated.\r\n- **page_count**\r\n  <br> Specify the number of pages the document has. `pages` is an alias for page_count.\r\n- **data\\_\\***\r\n  <br> Specify arbitrary key-value data pairs on the document (e.g. the search query `data_color: blue` returns documents with data `color`: `blue`). Note that color is the key and blue is the value. Keys and values are case-sensitive and must match exactly. To find any document with a color key, use `data_color:*`. To find documents that do not have a key/value pair for color, use `-data_color:*`. \r\n- **tag**\r\n  <br> This is an alias to `data__tag` which is used by the site as a simple tagging system. Tags are case-sensitive and must match exactly. To find any documents that are tagged, use `tag:*`. Prepend `-` to exclude documents with a given tag. For example, `-tag:significant` would remove all documents from the search that are tagged as significant. \r\n\r\n### Text Fields\r\n\r\nText fields can be used to search for text in a particular field of the document. 
They are used to score the searches and are always treated as optional unless you use `+` or `AND` to require them.\r\n\r\n- **title**\r\n  <br> The title of the document.\r\n- **source**\r\n  <br> The source of the document.\r\n- **description**\r\n  <br> The description of the document.\r\n- **text**\r\n  <br> The full text of the document, as obtained from text embedded in the PDF or from OCR. `doctext` is an alias for text.\r\n- **page_no\\_\\***\r\n  <br> You may search the text on the given page of a document. To find all documents which contain the word report on page 2, you could use `page_no_2:report`.\r\n\r\n## Example Queries\r\n### Date ranges\r\n\r\nFind all documents uploaded by user 102112 in the last month. <br>\r\n```+user:102112 created_at:[NOW-1MONTH TO *] ``` <br>\r\n<br>\r\nFind all documents uploaded by user 102112 in the last 11 months. <br>\r\n```+user:102112 created_at:[NOW-11MONTH TO *]``` <br>\r\n<br>\r\nFind all documents uploaded by user 102112 between 11 months ago and 3 months ago. <br>\r\n```+user:102112 created_at:[NOW-11MONTH TO NOW-3MONTH] ```<br>\r\n<br> \r\nFind all documents uploaded by user 102112 in the last month with a page count of 41. 
<br>\r\n```+user:102112 created_at:[NOW-1MONTH TO *] AND page_count:41```  <br>\r\n\r\n\r\n### Key/value pair existence\r\nFind all documents uploaded by user 102112 that have a _mr_status key (that is, the key exists). <br>\r\n```+user:102112 AND data__mr_status:* ``` <br>\r\nFind all documents uploaded by user 102112 that do not have a _mr_status key (the key does not exist). <br>\r\n```+user:102112 AND -data__mr_status:*``` <br>\r\n\r\n### Key/value pair searches\r\nFind all the documents that have an entry for the key \"Folder\" on DocumentCloud. <br>\r\n```data_Folder:*``` <br>\r\nFind all documents that have a value of \"From ARMY site - Environmental documents\" for the key Folder. <br>\r\n```+data_Folder:\"From ARMY site - Environmental documents\"``` <br>\r\n<br>\r\nFind all documents that have a value of 38 for the key Subfolder and \"From ARMY site - Environmental documents\" for the key Folder. <br>\r\n```+data_Folder:\"From ARMY site - Environmental documents\" AND +data_Subfolder:38```\r\n\r\n### Searching Tags\r\nFind all documents that have been labelled with the tag \"significant\" on DocumentCloud. <br>\r\n```tag:significant ```<br>\r\n\r\n### Project filter\r\nFind all documents uploaded by user 102112 that are also in the project 214246. <br>\r\n```+user:102112 AND project:214246 ``` <br>\r\n\r\n\r\n### Access level filter\r\nFind all documents uploaded by user 102112 that are also private. <br>\r\n```+user:102112 AND access:private``` <br>\r\n\r\n\r\n### Sorting\r\nFind all documents uploaded by user 102112 in the last month that are in project 214246, sorted by page_count so that the documents with the most pages appear first. 
<br>\r\n```+user:102112 created_at:[NOW-1MONTH TO *] AND project:214246 sort:page_count ```<br>\r\n\r\n### Wildcard Searches\r\nFind all documents uploaded by user 102112 in the last month whose title starts with fy2017. <br>\r\n```+user:102112 created_at:[NOW-1MONTH TO *] AND +title:fy2017*``` <br>\r\n\r\n### Text Field Searches \r\nFind all documents uploaded to DocumentCloud that have Mueller somewhere in the title. <br>\r\n```title:Mueller*``` <br>\r\n<br>\r\nFind all documents uploaded to DocumentCloud that have Edwin Mueller somewhere in the title. <br>\r\n```title:\"Edwin Mueller*\"``` <br>\r\n<br>\r\nFind all documents uploaded to DocumentCloud that have Mueller somewhere in the description. <br>\r\n```description:Mueller* ```<br>\r\n<br>\r\nFind all documents uploaded to DocumentCloud that have Mueller somewhere in the description and Barr somewhere in the title. <br>\r\n```description:Mueller* AND title:Barr* ```<br>\r\n<br>\r\nFind all documents uploaded to DocumentCloud that contain the word \"Russian\" in the document text and contain \"Mueller\" in the description and contain \"Barr\" in the title. <br>\r\n```+description:Mueller* AND +title:Barr* AND text:Russian```   <br>\r\n<br>\r\nFind all documents uploaded to DocumentCloud that contain \"Mueller\" in the description, \"Barr\" in the title, and \"Russian\" on page 4 of the document. <br>\r\n```+description:Mueller* AND +title:Barr* AND page_no_4:Russian``` <br>\r\n<br>\r\n\r\n\r\n\r\n## API\r\n\r\nYou may search via the API:\r\n\r\n`GET /api/documents/search/`\r\n\r\nYou may pass the query as described above in the `q` parameter (e.g. `/api/documents/search/?q=some+text+user:1` to search for some text in documents by user 1). For all fielded searches, you may pass them in as standalone query parameters instead of in `q` if you prefer (e.g. `/api/documents/search/?q=some+text&user=1` is the same query as the previous example). You may also negate fields by preceding them with a `-` in this way (e.g. 
`/api/documents/search/?q=some+text&-user=1` to search for some text in documents not by user 1). You may specify the sort order using either `sort` or `order` as a parameter (e.g. `/api/documents/search/?q=some+text+order:title` and `/api/documents/search/?q=some+text&order=title` both search for some text in documents sorted by their title).\r\n\r\nYou can also specify `per_page`, `page`, and `expand` as you would for `/api/documents/`. `expand` may be `user` or `organization` (or both `user,organization`). The response will be in a JSON object like a list response:\r\n\r\n```\r\n{\r\n    \"count\": <number of results on the current page>,\r\n    \"next\": <next page url if applicable>,\r\n    \"previous\": <previous page url if applicable>,\r\n    \"results\": <list of results>,\r\n    \"escaped\": <bool>\r\n}\r\n```\r\n\r\nwith the addition of the `escaped` property to specify if the query had a syntax error and needed to be autoescaped.\r\n\r\nYou may also enable highlighting by setting the `hl` parameter to `true`.  
Each document will then contain a `highlights` property, which will contain relevant snippets from the document containing the given search term.\r\n\r\n```\r\nhttps://api.www.documentcloud.org/api/documents/search/?q=report&hl=true\r\n\r\n{\r\n    \"count\": 413,\r\n    \"next\": \"https://api.www.documentcloud.org/api/documents/search/?q=report&page=2&hl=true\",\r\n    \"previous\": null,\r\n    \"results\": [\r\n        {\r\n            \"id\": \"20059100\",\r\n            \"user\": 100000,\r\n            \"organization\": 10001,\r\n            \"access\": \"public\",\r\n            \"status\": \"success\",\r\n            \"title\": \"the-mueller-report\",\r\n            \"slug\": \"the-mueller-report\",\r\n            \"source\": \"gema_georgia_gov\",\r\n            \"language\": \"eng\",\r\n            \"created_at\": \"2020-04-05T13:36:08.507Z\",\r\n            \"updated_at\": \"2020-04-24T18:47:52.985Z\",\r\n            \"page_count\": 448,\r\n            \"highlights\": {\r\n                \"title\": [\r\n                    \"the-mueller-<em>report</em>\"\r\n                ],\r\n                \"page_no_9\": [\r\n                    \"-CrinP6te\\nINTRODUCTION TO VOLUME T |\\n\\nThis <em>report</em> is submitted to the Attorey General pursuant to 28 C-F.R\"\r\n                ]\r\n            },\r\n            \"data\": {},\r\n            \"asset_url\": \"https://assets.documentcloud.org/\"\r\n        }\r\n    ]\r\n}\r\n```\r\n\r\nYou may search within a document using the following endpoint:\r\n\r\n`GET /api/documents/<doc_id>/search/`\r\n\r\nThis will return up to 25 highlights per page for your query. 
You may use the same search syntax as above, although most of the fielded queries will not be meaningful when searching within a single document.\r\n\r\nExample response:\r\n\r\n```\r\n{\r\n    \"title\": [\r\n        \"the-mueller-<em>report</em>\"\r\n    ],\r\n    \"page_no_9\": [\r\n        \"-CrinP6te\\nINTRODUCTION TO VOLUME T |\\n\\nThis <em>report</em> is submitted to the Attorey General pursuant to 28 C-F.R\",\r\n        \" the Attorney\\nGeneral a confidential <em>report</em> explaining the prosecution or declination decisions [the\",\r\n        \" in detail in this <em>report</em>, the Special Counsel's investigation established that\\nRussia interfered in\"\r\n    ],\r\n    \"page_no_10\": [\r\n        \"\\n‘overview of the two volumes of our <em>report</em>.\\n\\nThe <em>report</em> describes actions and events that the Special\",\r\n        \", the <em>report</em> points out\\nthe absence of evidence or conflicts in the evidence about a particular fact or\",\r\n        \" with\\nconfidence, the <em>report</em> states that the investigation established that certain actions or events\",\r\n        \"\\n‘coordination in that sense when stating in the <em>report</em> thatthe investigation did not establish that the\\n‘Trump\",\r\n        \" Campaign coordinated with the Russian government in its election interference activities.\\n\\nThe <em>report</em> on\"\r\n    ]\r\n}\r\n```\r\n\r\n[1]: https://lucene.apache.org/solr/\r\n[2]: https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html\r\n[3]: https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance"
        },
        {
            "url": "/help/add-ons/",
            "title": "Add-Ons",
            "content": "[Back to Help Menu](https://www.documentcloud.org/help)\r\n\r\n# DocumentCloud Add-Ons\r\n\r\nAdd-Ons make it easy for anyone to add and share additional features within the DocumentCloud platform, ranging from automating repetitive tasks to integrating machine learning and data visualization techniques.\r\n\r\n## Overview\r\n\r\nFor end users, using an Add-On is as simple as selecting documents or executing a search, picking which feature they’d like to apply to the results, and then submitting.\r\n\r\nOn the backend, Add-Ons execute Python scripts organized in a standard way, hosted and processed right on GitHub.\r\n\r\nAdd-Ons can take advantage of the full DocumentCloud API as well as the ability to call other third-party services and a few Add-On specific functions such as the ability to store arbitrary files, send a user emails, and track the progress of an Add-Ons' execution and display messages to the user.\r\n\r\nIn addition to being able to execute Add-Ons via the DocumentCloud user interface, these extensions are also designed to run smoothly on your local computer — simply clone the repository to your local device, [install the DocumentCloud Python wrapper](https://documentcloud.readthedocs.io/en/latest/), and then invoke the main.py file of the Add-On. Invocation requires your DocumentCloud username and password if\r\nthe add-on requires authentication, which is used to fetch a refresh and access\r\ntoken.  
They can be passed in as command line arguments (`--username` and\r\n`--password`), or as environment variables (`DC_USERNAME` and `DC_PASSWORD`).\r\n\r\nYou can also pass in a list of document IDs (`--documents`), a search query\r\n(`--query`), and JSON parameters for your Add-On (`--data`); be sure to\r\nproperly quote your JSON at the command line.\r\n\r\nExample invocation:\r\n```\r\npython main.py --documents 123 --data '{\"name\": \"World\"}'\r\n```\r\n\r\nWe have an [Add-On template hosted on GitHub](https://github.com/MuckRock/documentcloud-hello-world-addon) that demonstrates basic features, as well as a variety of other example Add-Ons that might serve as a useful base for your own work. \r\n\r\n## DocumentCloud Premium and AI Credits\r\n\r\nSome Add-Ons require AI credits to run because they rely on paid services for operations such as OCR, document translation, or other AI tools. <br> [DocumentCloud Premium](https://www.documentcloud.org/help/premium) comes with AI credits for both professional and organizational accounts on MuckRock. You can upgrade your account or organization on the [MuckRock Select Plan page](https://accounts.muckrock.com/selectplan/). You can also upgrade your plan by clicking on the drop-down menu named \"Premium\" once you log in. It will link you to the same upgrade plan page. <br> <br> To check your AI credit balance, you can: <br>\r\n1. Click on the name of your organization in the top navigation bar and your monthly allowance will appear there. If you are a freelancer, click on the second drop-down menu (next to your account name) and your balance should appear there as well. \r\n2. Click on a premium Add-On from the Add-On run menu. Your allowance will appear in the run menu.\r\n3. When uploading a document, your AI credit balance also appears as text in the upload menu where you can select which OCR engine you want to use. \r\n\r\nAI Credit usage is logged. 
If you want to know how your AI credits were used, [contact us](mailto:info@documentcloud.org).\r\n\r\nAt this time, there are four premium Add-Ons: <br>\r\n* [**Azure Document Intelligence OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-azure-document-intelligence-ocr-addon) uses Azure's Document Intelligence system to OCR documents. This Add-On requires AI credits. \r\n* [**Google Cloud Vision OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-cloud-vision-ocr) uses the Google Cloud Vision OCR engine to OCR documents. This Add-On requires AI credits. \r\n* **[GPT-3 PlayGround](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-gpt3-playpen-addon)**\r\nUse GPT-3 to help analyze your documents, right within DocumentCloud. Give this Add-On a prompt as well as an optional key for a key/value pair to add information as a tag. This Add-On requires AI credits. \r\n* **[Translate Documents](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/google-translate-addon)** Uses the Google Translate API to translate documents; the translated documents are automatically uploaded to DocumentCloud. This Add-On requires AI credits. \r\n\r\n## Types of Add-Ons\r\nThere are several different types of Add-Ons: some use AI, some perform bulk operations, some specialize in data extraction, some calculate DocumentCloud statistics, some export documents or the data they contain, some monitor websites for changes or for newly uploaded documents, and others transform file types that DocumentCloud doesn't natively support into more readily analyzable documents. 
\r\n\r\n### AI-Based Add-Ons\r\n* [**SideKick:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-sidekick-addon) Makes it easy to train a machine learning model to classify documents by an arbitrary type, such as identifying if a document is likely to be an email, a resident complaint, or other categories of records.\r\n* **[GPT-3 PlayGround](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-gpt3-playpen-addon)**\r\nUse GPT-3 to help analyze your documents, right within DocumentCloud. Give this Add-On a prompt as well as an optional key for a key/value pair to add information as a tag. This Add-On requires AI credits. \r\n\r\n\r\n### Bulk Operations Add-Ons\r\n* **[Bulk Add To Project](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/bulk-add-to-project-add-on)** Add large sets of documents to a project. \r\n* **[Bulk Delete Annotations](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/bulk-delete-annotations)** Delete all annotations on a set of documents in bulk. \r\n* **[Bulk Delete Documents](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/bulk-delete-documents)** Delete more than 25 documents at a time. \r\n* [**Bulk Edit:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-bulk-edit-addon) Update metadata on many documents at once.\r\n* **[Bulk Reprocess](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/bulk-reprocress-addon)** Reprocess more than 25 documents at a time, with an optional force OCR option and language selection. \r\n* **[Bulk Tag](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/Bulk-Tag-AddOn)** Add tags or key/value pairs to large sets of documents.\r\n* **[Bulk Delete Tags](https://www.documentcloud.org/app?q=%2B#add-ons/duckduckgrayduck/Bulk-Delete-Tags)** Remove tags or key/value pairs in large sets of documents. 
\r\n* **[Change Note Visibility](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/change-note-visibility)** Changes the access level of all annotations that appear in a document set. \r\n* **[Change Visibility](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/change-visibility)** Changes the access level of all documents in a document set. \r\n* **[Clear Failed Uploads](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/clear-failed-uploads)** Deletes all documents in your current view whose status is error or nofile. \r\n* **[Move Account](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-move-account-addon)** Changes the ownership of large sets of documents. \r\n\r\n### Data Extraction & Analysis Add-Ons\r\n* **[Bad Redactions:](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-bad-redactions-addon)** Building off the excellent [X-Ray library](https://free.law/projects/x-ray) from [Free Law Project](https://free.law), Bad Redactions looks for instances where failed redactions leave the underlying data intact. This is useful both for investigating whether there's more information than meets the eye and for making sure you properly and fully delete information from your own uploads. Note that DocumentCloud automatically flattens pages and deletes underlying data when you use our redaction tools or force OCR. We recommend trying it on [the infamous Manafort filing](https://www.documentcloud.org/documents/21855619-manafort-20190108-dc), in which the Add-On flagged and highlighted 25 redaction errors during our test. 
You can have the Add-On leave a private annotation around the mis-redacted information or have it go ahead and properly redact it for you.\r\n* [**Regex Extractor:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-regex-addon) Lets you define a regex string to pull out specified text matches into a spreadsheet across a selection of documents.\r\n* **[Multiple Regex Extractor](https://www.documentcloud.org/app?q=%2B#add-ons/JamesKunstle/documentcloud-multiple-regex-pattern-addon)** Lets you define multiple regex patterns to search across the document selection and returns a CSV file of all the strings matching the given patterns. \r\n* **[PII Detector](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/PII-Detector)**\r\nDetects PII in a document, annotates where it appears, and can optionally email you when sensitive PII is detected. It supports detecting addresses, ZIP codes, SSNs, emails, phone numbers, and credit card numbers. \r\n* **[Tabula Spreadsheet Extraction](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-tabula-addon)** Runs the open source Tabula library against a selected PDF and tries to identify and extract any tables. You can provide a Google Drive or Dropbox URL to a Tabula template you have generated already to run against the documents. If no template is provided, Tabula will try to guess the boundaries of the tables within the document. \r\n* [**Azure Document Intelligence OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-azure-document-intelligence-ocr-addon) uses Azure's Document Intelligence system to OCR documents. This Add-On requires AI credits. \r\n* [**Google Cloud Vision OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-cloud-vision-ocr) uses the Google Cloud Vision OCR engine to OCR documents. This Add-On requires AI credits. 
\r\n\r\n### Export Add-Ons\r\n* [**PDF Export:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-pdf-export-addon) Helps you get your PDFs out of DocumentCloud, adding the selected documents into a Zip file that’s then displayed to you.\r\n* [**Note Export:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-note-export-addon) Extracts all the notes on selected documents and saves them as text files you can download.  \r\n* **[Metadata Grabber](https://www.documentcloud.org/app?q=%2B#add-ons/cam-garrison/documentcloud-metadata-grabber)** and **[Custom Metadata Scraper](https://www.documentcloud.org/app?q=%2B#add-ons/DaveG77/documentcloud-custom-metadata-scraper-addon)** are two Add-Ons that export metadata from selected documents, making it easier for you to take your key-value tags, page count and much more into your favorite spreadsheet program for further analysis.\r\n* **[Push to IPFS/Filecoin:](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-filecoin-addon)** Push the selected documents to the decentralized web, making them accessible via [IPFS](https://docs.ipfs.io/concepts/what-is-ipfs/) and [Filecoin](https://filecoin.io) via [Estuary](https://docs.estuary.tech/what-is-estuary).\r\n* **[Internet Archive Export Tool](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/Internet-Archive-Export-Add-On)** Allows you to back up documents to [DocumentCloud's Internet Archive Collection](https://archive.org/details/@documentcloudupload) for long-term preservation. \r\n\r\n### File Transformation Add-Ons\r\n* [**Transcribe Audio:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-whisper-addon) This Add-On transcribes audio/video files using [OpenAI's Whisper](https://github.com/openai/whisper). You may upload audio files from any publicly accessible URL. You may also use share links from Google Drive, Dropbox, MediaFire, WeTransfer and YouTube. 
If you use a share link for a folder, it will process all files in that folder.\r\n* **[Translate Documents](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/google-translate-addon)** Uses Google Translate API to translate documents which will automatically be uploaded to DocumentCloud. This Add-On requires AI credits. \r\n* **[Email Conversion Add-On](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/convert-email-add-on)** Converts EML & MSG files to PDFs and uploads them to DocumentCloud. Also has optional attachment extraction which will be presented for download to the user. \r\n* **[PDF Compression Add-On](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/compress-pdf-add-on)** Uses ghostscript to compress large PDFs to upload to DocumentCloud. \r\n* **[PDF Re-Flow Add-On](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/Reflow-Add-On)** Resizes a document using k2pdfopt to optimize the document for reading on smaller screens, such as e-readers and phones. \r\n* **[Document Splitter](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/pdf-splitter-add-on)** Split a document into two along a specified page using this Add-On which uses pdftk in the background. \r\n\r\n### Site Monitoring Add-Ons\r\n* [**Scraper:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-scraper-addon) This Add-On will monitor a given site for documents and upload them to your DocumentCloud account, alerting you to any documents that meet given keyword criteria.\r\n* [**Klaxon Site Monitor:**](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/Klaxon) This Add-On will monitor a given site for changes based on a specified CSS selector and email you when there are changes on the site. It will additionally archive the newly seen page using The Wayback Machine provided by the Internet Archive. 
\r\n* **[Site Snapshot](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/Site-Snapshot)**\r\nSite Snapshot allows you to provide a set of URLs and it will use pdfkit to create a PDF snapshot of the site as it was seen and upload the snapshot to your DocumentCloud uploads. \r\n\r\n### Statistical Add-Ons\r\n* **[N-Gram Graphs](https://www.documentcloud.org/app?q=%2B#add-ons/cam-garrison/doccloud-n-gram-addon):** Feel like you're seeing a term pop up more and more often? Now it's easier to validate your hunch: this Add-On maps the occurrence of the words you input over time and then compares them to each other across a given search.\r\n* **[Page Stats](https://www.documentcloud.org/app?q=%2B#add-ons/sooryu22/documentcloud-page-stats-addon):** Gives you basic statistics about the total length of a selection of documents, the longest document, shortest document and average pages per document.\r\n* **[User upload frequency graph](https://www.documentcloud.org/app?q=%2B#add-ons/cam-garrison/doccloud-uploads-graph):** Curious whether you're more productive during some months than others? Want to see the progress of your sharing with the public? Use this Add-On to graph your uploads over time. Tip: Put your username in as it appears in the search field (i.e., `michael-morisy-658`)\r\n\r\nNew Add-Ons are being added all the time. Under the Add-Ons dialog, click \"Browse All Add-Ons\" to explore and activate or deactivate Add-Ons. [Register for the DocumentCloud newsletter](http://eepurl.com/dMiXw2) to get updates on additional features and other announcements.\r\n\r\n## Run Your Add-On in DocumentCloud\r\nIf you write your own Add-On, you can run it from within DocumentCloud's user interface through a few simple steps.\r\n\r\nFirst, [install the GitHub DocumentCloud App](https://github.com/apps/documentcloud-add-on). Note that for this to work properly, you must have your primary GitHub and MuckRock accounts set to use the same email address. 
[You can set your primary MuckRock account email here](https://accounts.muckrock.com/accounts/email/).\r\n\r\nAs you add the GitHub DocumentCloud App, give it access to only those repositories in your GitHub account that are Add-Ons you want to run. [You can modify this from this page](https://github.com/settings/installations/24953287) once you have the app installed in GitHub.\r\n\r\n<img src=https://cdn.muckrock.com/news_photos/2022/05/03/Screen_Shot_2022-05-03_at_10.23.49_PM.png alt=\"A screenshot of the above linked webpage, showing a single repository linked to the DocumentCloud GitHub app.\" width=100%>\r\n\r\nThen your Add-On will appear for you under \"Browse All Add-Ons\" and you can activate it there.\r\n\r\n## Permissions and Security\r\n\r\nCurrently, the DocumentCloud team reviews and vets each Add-On that's integrated directly within the site (i.e., the ones you see in the Add-On dropdown). Add-Ons that a user downloads and runs locally, however, are not necessarily vetted or reviewed by the DocumentCloud team and you should only run Add-Ons that are published by individuals you trust.\r\n\r\nCurrently, Add-Ons are essentially given *full access to your user account,* and can do anything you can while logged in, including reading all of your documents, deleting or modifying them, sharing documents with other users, and much more.\r\n\r\nAdd-Ons run through the site do not see your account credentials, just a unique token granted to that Add-On. 
For Add-Ons run through a GitHub Action or run locally, there is the potential for a maliciously written Add-On to obtain your credentials so it is particularly important to understand and trust the source of the Add-On before you run it.\r\n\r\nAs we open up Add-Ons to additional third-party contributions, we'll begin to offer more limited access tokens that constrain permissions to just the documents and actions explicitly granted to them, as well as defining certain time scopes for that access.\r\n\r\n## Document Selection\r\n\r\nWhen you run an Add-On via the DocumentCloud web interface, it will take one of four options for what documents to act on:\r\n\r\n* **Selected:** When you run the Add-On, it will try to act on the documents that are currently selected with a check mark.\r\n* **Query:** When you run the Add-On, it will try to act on all of the documents that are currently listed in the search results, including documents that are not in the current view. Note that large numbers of search results or search results that include documents you don't have permissions to will often be more likely to have errors.\r\n* **Both:** Some Add-Ons will let you select between the two options above. If you don't currently have any documents selected, it will default to acting on the documents in the search results while letting you know that you may select documents instead. To do so, cancel the Add-On, select the documents, and pick the Add-On again.\r\n* **Neither:** Some Add-Ons don't actually take any documents as input, such as an Add-On that imports documents from a link or scrapes a webpage for document links.\r\n\r\nNote that currently, these options determine what specific document IDs are sent to the Add-On, but the Add-On still has permissions to your entire document collection. In the future, as we better understand Add-On use cases, we plan to restrict access permissions to only the subset of documents that an Add-On requires to successfully run. 
\r\n\r\n## Deep Linking Add-Ons\r\n\r\nDocumentCloud Add-Ons have deep linking enabled, meaning you can easily share a link to a useful Add-On with others. When you click on an Add-On, it pulls up the configuration menu and changes the URL as well. For example, clicking on the PII Detector Add-On lets you link to the Add-On directly like so: ```https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/PII-Detector```\r\n\r\nAdd-Ons can also be shared with parameters pre-filled by modifying the URL. For example, to share a URL to the PII Detector Add-On with the Detect SSNs field pre-selected, use a link like this:\r\n```https://www.documentcloud.org/app?q=%2B&ssn=true#add-ons/MuckRock/PII-Detector```\r\n\r\nProperties defined in the Add-On's config.yaml can be chained one after another in the deep link, like this one, which specifies both the site to monitor and the * (all) CSS selector for Klaxon. \r\n```https://www.documentcloud.org/app?q=%2B&site=https://muckrock.com&selector=*#add-ons/MuckRock/Klaxon```\r\n\r\nHourly, daily, or weekly event options for scheduled Add-Ons (like Klaxon and Scraper) can be passed as parameters as well. \r\n```https://www.documentcloud.org/app?q=%2B&site=https://muckrock.com&selector=*&event=hourly#add-ons/MuckRock/Klaxon```\r\n\r\n## Submit an Add-On Suggestion\r\n\r\nYou can submit your Add-On for review to share with all DocumentCloud users by filling out this [form.](https://baserow.io/form/ZlA1x9Oo6QvVk4LRqMNB8zqkT4GwBZrObdBe-G5HzR0)\r\n\r\nIf you have other questions, suggestions, or feedback, please email info@documentcloud.org. We’re excited to see what you do with Add-Ons!"
        },
        {
            "url": "/help/premium/",
            "title": "Premium",
            "content": "[Back to Help Menu](https://www.documentcloud.org/help)\r\n\r\n# DocumentCloud Premium Features\r\n\r\nDocumentCloud premium features are available to both paid professional and organizational accounts on [MuckRock](https://accounts.muckrock.com/selectplan/). \r\n\r\nDocumentCloud Premium gives you the ability to search annotations across DocumentCloud. Wondering what document you left an important note on to review later? With annotation search, it takes no time to find important notes. \r\n\r\nPremium accounts also come with AI credits that you can put towards more powerful OCR or other tools. Professional accounts also include 2,000 AI credits per month and organizational accounts come with 5,000 credits per month for the first 5 users, and 500 additional credits for each additional user. If you’re on a premium account and have used all of your available credits, you can [contact us](mailto:info@documentcloud.org) to purchase additional  credits. AI Credits can be used to run:\r\n\r\n * **[Amazon Textract](https://aws.amazon.com/textract/)**, a powerful OCR (optical character recognition) engine. OCR is the process of converting an image of text into a machine-readable text that you can copy, paste, search and edit. By default, DocumentCloud will OCR your documents with [Tesseract](https://github.com/tesseract-ocr/tesseract), which does a decent job, but it isn't nearly as powerful as Textract, especially on noisy documents. Handwriting, embedded images, fuzzy scans, and other non-machine generated text are all hard to convert into text. Amazon Textract OCR performs [significantly better](https://www.researchgate.net/publication/356446235_OCR_with_Tesseract_Amazon_Textract_and_Google_Document_AI_a_benchmarking_experiment) than Tesseract on noisy documents. 
Amazon Textract uses 1 AI credit per page.\r\n* **[GPT-3 PlayGround](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/documentcloud-gpt3-playpen-addon)**\r\nUse GPT-3 to help analyze your documents, right within DocumentCloud. Give this Add-On a prompt and an optional key for a key/value pair to add information as a tag. \r\n* [**Azure Document Intelligence OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-azure-document-intelligence-ocr-addon) uses Azure's Document Intelligence system to OCR documents. Azure uses 1 AI credit per page. \r\n* [**Google Cloud Vision OCR**](https://www.documentcloud.org/app?q=%2B%20#add-ons/MuckRock/documentcloud-cloud-vision-ocr) uses the Google Cloud Vision OCR engine to OCR documents. GCV uses 1 AI credit per page. \r\n* **[Translate Documents](https://www.documentcloud.org/app?q=%2B#add-ons/MuckRock/google-translate-addon)** allows you to use the Google Translate API to translate documents page by page. To see if a language is supported, visit the [Google Translate language code guide](https://cloud.google.com/translate/docs/languages). 1 AI credit translates 75 characters of text. \r\n\r\n\r\nThat's not all! We're still building. Our [DocumentCloud Add-Ons 101](https://www.youtube.com/watch?v=yxQkZgGmjXQ) workshop is a great way to learn more about our Add-On ecosystem.\r\n\r\n## Checking your AI Credit Balance & Usage\r\nTo check your AI credit balance, you can: <br>\r\n1. Click on the name of your organization in the top navigation bar and your monthly allowance will appear there. If you are a freelancer with a professional account, click on the second drop-down menu (next to your account name), and your balance should appear there as well. \r\n2. Click on a premium Add-On from the Add-On run menu. Your allowance will appear in the run menu.\r\n3. 
When uploading a document, your AI credit balance also appears as text in the upload menu where you can select which OCR engine you want to use. \r\n\r\nAI Credit usage is logged. If you want to know how your AI credits were used, [contact us.](mailto:info@documentcloud.org)"
        }
    ]
}