Python Quickstart
A rate-limited API developer key can be provisioned from the Sapling API dashboard. Developer keys allow for processing of 50,000 characters every 24 hours. Subscribe for production access and usage-based pricing.
Installation
Install the sapling-py package with pip:
python -m pip install sapling-py
Getting Edits
Here's a sample script using the Sapling package:
from sapling import SaplingClient
api_key = '<api-key>'
client = SaplingClient(api_key=api_key)
edits = client.edits('Lets get started!', session_id='test_session')
The result of running the script should be an array of edits of this form:
[{
    "id": "aa5ee291-a073-5146-8ebc-c9c899d01278",
    "sentence": "Lets get started!",
    "sentence_start": 0,
    "start": 0,
    "end": 4,
    "replacement": "Let's",
    "error_type": "R:OTHER",
    "general_error_type": "Other"
}]
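To make the fields concrete, here is a minimal sketch that summarizes each edit by slicing the original span out of its sentence. It works on the sample edit shown above rather than calling the API, and describe_edit is a hypothetical helper name, not part of sapling-py.

```python
# Sample edit copied from the response shown above (no API call is made here).
sample_edits = [{
    "id": "aa5ee291-a073-5146-8ebc-c9c899d01278",
    "sentence": "Lets get started!",
    "sentence_start": 0,
    "start": 0,
    "end": 4,
    "replacement": "Let's",
    "error_type": "R:OTHER",
    "general_error_type": "Other",
}]


def describe_edit(edit):
    """Hypothetical helper: summarize an edit as 'original -> replacement (type)'."""
    # start and end are offsets relative to the sentence, with end exclusive.
    original = edit["sentence"][edit["start"]:edit["end"]]
    return f'{original} -> {edit["replacement"]} ({edit["general_error_type"]})'


for edit in sample_edits:
    print(describe_edit(edit))  # prints: Lets -> Let's (Other)
```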
Applying Edits
After you have sent text to the Sapling API and received JSON edits in the response, how do you apply those edits to get the updated text?
The simplest way is to use the auto_apply argument. When it is set to true, the response includes an extra field, applied_text, that contains the text with the edits applied.
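As a rough sketch of reading that field, the snippet below works on a hand-written response dict rather than a live API call. The response shape (a dict with edits and applied_text keys) is an assumption based on the description above, and text_after_edits is a hypothetical helper name; check the API reference for the exact structure.

```python
def text_after_edits(response, original_text):
    """Hypothetical helper: return applied_text if present, else the original text."""
    # Assumption: with auto_apply enabled, the response dict carries an
    # 'applied_text' field alongside the edits.
    return response.get("applied_text", original_text)


# Illustrative response, written by hand from the sample edit in this guide.
sample_response = {
    "edits": [{
        "sentence": "Lets get started!",
        "sentence_start": 0,
        "start": 0,
        "end": 4,
        "replacement": "Let's",
    }],
    "applied_text": "Let's get started!",
}

print(text_after_edits(sample_response, "Lets get started!"))
```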
However, you can also easily apply edits programmatically.
Programmatically Applying Edits
Recall the Edit data structure:
{
    "id": <str, UUID>,              // Opaque edit id, used to give feedback
    "sentence": <str>,              // Unedited sentence
    "sentence_start": <int>,        // Offset of sentence from start of text
    "start": <int>,                 // Offset of edit start relative to sentence
    "end": <int>,                   // Offset of edit end relative to sentence
    "replacement": <str>,           // Suggested replacement
    "error_type": <str>,            // Error type, see "Error Categories"
    "general_error_type": <str>,    // See "Error Categories"
}
When programmatically applying edits, go in reverse order of start offset (sentence_start + start) so that changes don't affect the offsets of the remaining edits.
For example, consider the sentence "Lets go to the housee.", for which Sapling returns the following list of edits:
[
    {
        'sentence_start': 0,
        'start': 0,
        'end': 4,
        'replacement': "Let's",
        ...
    },
    {
        'sentence_start': 0,
        'start': 15,
        'end': 21,
        'replacement': 'house',
        ...
    }
]
The simplest way to apply the edits to your text is in reverse order:
- Replace characters 15-21 with house.
- Replace characters 0-4 with Let's.
If the characters for Lets were replaced before those for housee, the text would grow by one character and the offsets of the remaining edits would need to be updated.
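The effect of ordering can be checked directly on this example. The sketch below compares three approaches: applying edits front to back with unadjusted offsets (which corrupts the text), front to back while tracking the accumulated length change, and back to front. All function names here are hypothetical and exist only for illustration.

```python
text = "Lets go to the housee."
edits = [
    {"sentence_start": 0, "start": 0, "end": 4, "replacement": "Let's"},
    {"sentence_start": 0, "start": 15, "end": 21, "replacement": "house"},
]


def position(edit):
    """Absolute start offset of an edit within the full text."""
    return edit["sentence_start"] + edit["start"]


def splice(text, edit, shift=0):
    """Replace the edit's span, optionally shifted by earlier length changes."""
    start = edit["sentence_start"] + edit["start"] + shift
    end = edit["sentence_start"] + edit["end"] + shift
    return text[:start] + edit["replacement"] + text[end:]


def apply_forward_naive(text, edits):
    # Front to back with stale offsets: later edits land in the wrong place.
    for edit in sorted(edits, key=position):
        text = splice(text, edit)
    return text


def apply_forward_shifted(text, edits):
    # Front to back, correcting each offset by the length change so far.
    shift = 0
    for edit in sorted(edits, key=position):
        text = splice(text, edit, shift)
        shift += len(edit["replacement"]) - (edit["end"] - edit["start"])
    return text


def apply_reverse(text, edits):
    # Back to front: earlier offsets are never disturbed, so no bookkeeping.
    for edit in sorted(edits, key=position, reverse=True):
        text = splice(text, edit)
    return text


print(apply_forward_naive(text, edits))    # Let's go to thehousee.
print(apply_forward_shifted(text, edits))  # Let's go to the house.
print(apply_reverse(text, edits))          # Let's go to the house.
```

Both corrected variants give the same result; the reverse-order version is shorter because it needs no running offset.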
Sample Code
We provide sample code below for applying edits.
A few things to keep in mind:
- The edits array is ordered by starting position, though we include logic below to ensure this is the case.
- For some languages where assignment is by reference, you will want to create a copy of the original string before modifying it.
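def apply_edits(text, edits):
    """Return text with the given edits applied."""
    text = str(text)
    # Sort by absolute start offset, last edit first, so earlier offsets stay valid.
    edits = sorted(edits, key=lambda e: (e['sentence_start'] + e['start']), reverse=True)
    for edit in edits:
        # start and end are relative to the sentence; convert to offsets in the full text.
        start = edit['sentence_start'] + edit['start']
        end = edit['sentence_start'] + edit['end']
        if start > len(text) or end > len(text):
            print(f'Edit start:{start}/end:{end} outside of bounds of text:{text}')
            continue
        text = text[:start] + edit['replacement'] + text[end:]
    return text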
Processing Files
The Sapling API can be used to process files as well. Sapling's Edit API currently has a 50,000 character limit, so larger documents will need to be chunked.
Sapling provides a pre-processing API endpoint that helps with chunking. This endpoint breaks long documents into pieces, prioritising splitting on things like page and paragraph breaks in order to preserve overall text context.
from sapling import SaplingClient

file_name = '<FILE_TO_PROCESS>'
api_key = '<api-key>'

# Read the whole file into memory.
text = ''
with open(file_name) as f:
    text = f.read().strip()

client = SaplingClient(api_key=api_key)
# Split the document into chunks, then request edits for each chunk.
chunks = client.chunk_text(text, max_length=20000)
for chunk in chunks:
    edits = client.edits(chunk, session_id=file_name)
More information about the Chunking/Preprocessing endpoint can be found here.
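To illustrate the idea of preferring paragraph breaks when splitting, here is a minimal local sketch. It is not Sapling's chunking algorithm and does not call the API; chunk_by_paragraph is a hypothetical helper, and treating a blank line as the paragraph separator is an assumption.

```python
def chunk_by_paragraph(text, max_length):
    """Hypothetical helper: pack paragraphs into chunks of at most max_length characters."""
    chunks = []
    current = ''
    for paragraph in text.split('\n\n'):
        if not paragraph.strip():
            continue  # Skip empty paragraphs produced by repeated blank lines.
        # A paragraph longer than max_length is hard-split as a last resort.
        pieces = [paragraph[i:i + max_length]
                  for i in range(0, len(paragraph), max_length)]
        for piece in pieces:
            candidate = piece if not current else current + '\n\n' + piece
            if len(candidate) <= max_length:
                current = candidate  # Still fits: keep growing the current chunk.
            else:
                chunks.append(current)  # Close the chunk at a paragraph boundary.
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = 'aaaa\n\nbbbb\n\n' + 'c' * 12
print(chunk_by_paragraph(sample, max_length=10))
# ['aaaa\n\nbbbb', 'cccccccccc', 'cc']
```

Offsets in the edits returned for a chunk are relative to that chunk, so it is simplest to apply edits chunk by chunk before reassembling the document.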
More Details
More detail on the API request options and response structure can be found here.
Documentation on SaplingClient is available on Read the Docs.