paperweight.document
Object-oriented access to LaTeX documents.
The paperweight.document module provides object-oriented interfaces for manipulating or mining LaTeX documents. Much of the functionality of the paperweight.texutils, paperweight.gitio, and paperweight.nlputils modules can be accessed through this interface.
Depending on how the LaTeX document is stored, use one of two document classes. paperweight.document.FilesystemTexDocument should be used for regular documents in the filesystem. If you wish to operate on documents stored within a certain commit of a checked-out Git repository, then use paperweight.document.GitTexDocument. The interfaces for both classes are consistent since both inherit from paperweight.document.TexDocument.
-
class paperweight.document.FilesystemTexDocument(path, recursive=True)
Bases: paperweight.document.TexDocument
A TeX document derived from a file in the filesystem.
Parameters:
filepath : unicode
Path to the '.tex' file on the filesystem.
recursive : bool
If True (default), then TeX documents input by this root document will be opened.
Attributes
bib_keys: List of all bib keys in the document (and input documents).
bib_name: Name of the BibTeX bibliography file (e.g., 'mybibliography.bib').
bib_path: Absolute file path to the .bib bibliography document.
bibitems: List of bibitem strings appearing in the document.
sections: List of tuples of section names and positions.
Methods
extract_citation_context([n_words]): Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
find_input_documents(): Find all TeX documents input by this root document.
inline_bbl(): Inline a compiled bibliography (.bbl) in place of a bibliography environment.
inline_inputs(): Inline all input LaTeX files referenced by this document.
remove_comments([recursive]): Remove LaTeX comments from the document (modifies document in place).
write(path): Write the document's text to a path on the filesystem.
-
bib_keys
List of all bib keys in the document (and input documents).
-
bib_name
Name of the BibTeX bibliography file (e.g., 'mybibliography.bib').
-
bib_path
Absolute file path to the .bib bibliography document.
-
bibitems
List of bibitem strings appearing in the document.
-
extract_citation_context(n_words=20)
Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
For example, suppose 'Sick:2014' is cited twice within a document. Then the dictionary returned by this method will have a length-2 list under the 'Sick:2014' key. Each item in this list will be a dictionary providing metadata about the context of that citation. Fields of this dictionary are:
position : (int) the cumulative word count at which the citation occurs.
wordsbefore : (unicode) text occurring before the citation.
wordsafter : (unicode) text occurring after the citation.
section : (unicode) name of the section in which the citation occurs.
Parameters:
n_words : int
Number of words before and after the citation to extract for context.
Returns:
bib_keys : dict
Dictionary, keyed by BibTeX cite key, where entries are lists of instances of citations. See above for the format of the instance metadata.
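As a rough illustration of the returned structure, the sketch below builds a hand-written dictionary in the documented shape and summarizes it. The sample values and the helper name summarize_citations are hypothetical and not part of paperweight; in practice the dictionary would come from extract_citation_context().

```python
# Hypothetical sample in the documented shape; the values are invented for
# illustration and were not produced by paperweight.
contexts = {
    "Sick:2014": [
        {
            "position": 120,
            "wordsbefore": "as shown previously by",
            "wordsafter": "the gradient flattens",
            "section": "Introduction",
        },
        {
            "position": 845,
            "wordsbefore": "consistent with",
            "wordsafter": "within the uncertainties",
            "section": "Results",
        },
    ],
}


def summarize_citations(contexts):
    """Return {cite_key: [(section, position), ...]} sorted by position."""
    summary = {}
    for key, instances in contexts.items():
        ordered = sorted(instances, key=lambda inst: inst["position"])
        summary[key] = [(inst["section"], inst["position"]) for inst in ordered]
    return summary


summary = summarize_citations(contexts)
```

Each cite key maps to one list entry per citation, so the length of the list is the number of times that key is cited.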
-
find_input_documents()
Find all TeX documents input by this root document.
Returns:
paths : list
List of filepaths for input documents. Paths are relative to the document (i.e., as written in the LaTeX document).
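Because the returned paths are relative to the root document, a caller may want to resolve them before opening the files. The following is a minimal sketch using only the standard library; the helper name resolve_input_paths and the example paths are assumptions for illustration, not part of paperweight.

```python
import os


def resolve_input_paths(root_tex_path, relative_paths):
    """Join each input path onto the directory containing the root document."""
    root_dir = os.path.dirname(os.path.abspath(root_tex_path))
    return [os.path.normpath(os.path.join(root_dir, p)) for p in relative_paths]


# Hypothetical values: a root document and the kind of list that
# find_input_documents() is documented to return.
resolved = resolve_input_paths(
    "/home/me/paper/main.tex",
    ["sections/intro.tex", "sections/results.tex"],
)
```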
-
inline_bbl()
Inline a compiled bibliography (.bbl) in place of a bibliography environment. The document is modified in place.
-
inline_inputs()
Inline all input LaTeX files referenced by this document. The inlining is accomplished recursively. The document is modified in place.
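To make the idea of recursive inlining concrete, here is a small standalone sketch. It is not paperweight's implementation: it works on an in-memory dictionary of hypothetical file contents, only handles the \input{...} form, and the function defined here is an illustrative stand-in.

```python
import re

# Matches \input{name}; a simplification that ignores \include and
# commented-out commands.
INPUT_PATTERN = re.compile(r"\\input\{([^}]+)\}")


def inline_inputs(text, files):
    """Recursively replace each \\input{name} with the text of files[name]."""
    def _replace(match):
        name = match.group(1)
        if name not in files:
            # Leave unresolved inputs untouched rather than guess.
            return match.group(0)
        # Recurse so that inputs nested inside the inlined text are expanded.
        return inline_inputs(files[name], files)
    return INPUT_PATTERN.sub(_replace, text)


# Hypothetical in-memory "files" standing in for documents on disk.
files = {
    "intro": "Intro text. \\input{background}",
    "background": "Background text.",
}
root = "\\section{Introduction}\n\\input{intro}\n\\input{missing}"
flattened = inline_inputs(root, files)
```

This sketch does not guard against circular inputs, which would recurse without end.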
-
remove_comments(recursive=True)
Remove LaTeX comments from the document (modifies document in place).
Parameters:
recursive : bool
Remove comments from all input LaTeX documents (default True).
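For readers unfamiliar with LaTeX comment syntax, the sketch below shows one simple way to strip comments with a regular expression. It is an assumption-laden illustration rather than the approach paperweight takes, and the function name strip_comments is hypothetical.

```python
import re

# A '%' starts a comment unless it is escaped as '\%'. This simplified pattern
# does not handle a literal '\\' followed by '%', or verbatim environments.
COMMENT_PATTERN = re.compile(r"(?<!\\)%.*")


def strip_comments(text):
    """Remove everything from an unescaped '%' to the end of each line."""
    return "\n".join(COMMENT_PATTERN.sub("", line) for line in text.split("\n"))


sample = (
    "Accuracy is 95\\% overall. % TODO: check this\n"
    "% whole-line comment\n"
    "Next line."
)
cleaned = strip_comments(sample)
```

Note that this sketch leaves an empty line where a whole-line comment was, and keeps any whitespace that preceded a trailing comment.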
-
sections
List of tuples of section names and positions. Positions of section names are measured by cumulative word count.
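Since both section positions and citation positions are cumulative word counts, the two can be cross-referenced. The sketch below assumes each tuple is ordered as (section name, position) and uses invented data; the helper section_at is hypothetical and not part of paperweight.

```python
from bisect import bisect_right

# Hypothetical data in the assumed shape: (section name, word position).
sections = [("Introduction", 0), ("Methods", 350), ("Results", 900)]


def section_at(sections, word_position):
    """Return the name of the last section starting at or before word_position."""
    ordered = sorted(sections, key=lambda item: item[1])
    starts = [position for _, position in ordered]
    index = bisect_right(starts, word_position) - 1
    if index < 0:
        return None  # position falls before the first section
    return ordered[index][0]
```

For example, section_at(sections, 400) looks up the section containing word 400 of the document.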
-
write(path)
Write the document's text to a path on the filesystem.
-
-
class paperweight.document.GitTexDocument(git_path, git_hash, repo_dir='.', recursive=True)
Bases: paperweight.document.TexDocument
A TeX document derived from a file in a Git repository.
Parameters:
git_path : str
Path to the document in the Git repository, relative to the root of the repository.
git_hash : str
Any SHA or Git tag that can resolve into a commit in the Git repository.
repo_dir : str
Path from the current working directory to the root of the Git repository.
Attributes
bib_keys: List of all bib keys in the document (and input documents).
bib_name: Name of the BibTeX bibliography file (e.g., 'mybibliography.bib').
bib_path: Absolute file path to the .bib bibliography document.
bibitems: List of bibitem strings appearing in the document.
sections: List of tuples of section names and positions.
Methods
extract_citation_context([n_words]): Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
find_input_documents(): Find all TeX documents input by this root document.
remove_comments([recursive]): Remove LaTeX comments from the document (modifies document in place).
write(path): Write the document's text to a path on the filesystem.
-
bib_keys
List of all bib keys in the document (and input documents).
-
bib_name
Name of the BibTeX bibliography file (e.g., 'mybibliography.bib').
-
bib_path
Absolute file path to the .bib bibliography document.
-
bibitems
List of bibitem strings appearing in the document.
-
extract_citation_context(n_words=20)
Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
For example, suppose 'Sick:2014' is cited twice within a document. Then the dictionary returned by this method will have a length-2 list under the 'Sick:2014' key. Each item in this list will be a dictionary providing metadata about the context of that citation. Fields of this dictionary are:
position : (int) the cumulative word count at which the citation occurs.
wordsbefore : (unicode) text occurring before the citation.
wordsafter : (unicode) text occurring after the citation.
section : (unicode) name of the section in which the citation occurs.
Parameters:
n_words : int
Number of words before and after the citation to extract for context.
Returns:
bib_keys : dict
Dictionary, keyed by BibTeX cite key, where entries are lists of instances of citations. See above for the format of the instance metadata.
-
find_input_documents()
Find all TeX documents input by this root document.
Returns:
paths : list
List of filepaths for input documents. Paths are relative to the document (i.e., as written in the LaTeX document).
-
remove_comments(recursive=True)
Remove LaTeX comments from the document (modifies document in place).
Parameters:
recursive : bool
Remove comments from all input LaTeX documents (default True).
-
sections
List of tuples of section names and positions. Positions of section names are measured by cumulative word count.
-
write(path)
Write the document's text to a path on the filesystem.
-
-
class paperweight.document.TexDocument(text)
Bases: object
Base class for a TeX document.
Parameters:
text : unicode
Unicode-encoded text of the LaTeX document.
Attributes
text: (unicode) Text of the document as a unicode string.
Methods
extract_citation_context([n_words]): Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
find_input_documents(): Find all TeX documents input by this root document.
remove_comments([recursive]): Remove LaTeX comments from the document (modifies document in place).
write(path): Write the document's text to a path on the filesystem.
-
bib_keys
List of all bib keys in the document (and input documents).
-
bib_name
Name of the BibTeX bibliography file (e.g., 'mybibliography.bib').
-
bib_path
Absolute file path to the .bib bibliography document.
-
bibitems
List of bibitem strings appearing in the document.
-
extract_citation_context(n_words=20)
Generate a dictionary of all bib keys in the document (and input documents), with rich metadata about the context of each citation in the document.
For example, suppose 'Sick:2014' is cited twice within a document. Then the dictionary returned by this method will have a length-2 list under the 'Sick:2014' key. Each item in this list will be a dictionary providing metadata about the context of that citation. Fields of this dictionary are:
position : (int) the cumulative word count at which the citation occurs.
wordsbefore : (unicode) text occurring before the citation.
wordsafter : (unicode) text occurring after the citation.
section : (unicode) name of the section in which the citation occurs.
Parameters:
n_words : int
Number of words before and after the citation to extract for context.
Returns:
bib_keys : dict
Dictionary, keyed by BibTeX cite key, where entries are lists of instances of citations. See above for the format of the instance metadata.
-
find_input_documents()
Find all TeX documents input by this root document.
Returns:
paths : list
List of filepaths for input documents. Paths are relative to the document (i.e., as written in the LaTeX document).
-
remove_comments(recursive=True)
Remove LaTeX comments from the document (modifies document in place).
Parameters:
recursive : bool
Remove comments from all input LaTeX documents (default True).
-
sections
List of tuples of section names and positions. Positions of section names are measured by cumulative word count.
-
write(path)
Write the document's text to a path on the filesystem.
-