Provided by: libmsoffice-word-template-perl_2.05-3_all 

NAME
MsOffice::Word::Template - generate Microsoft Word documents from Word templates
SYNOPSIS
my $template = MsOffice::Word::Template->new($filename);
my $new_doc = $template->process(\%data);
$new_doc->save_as($path_for_new_doc);
DESCRIPTION
Purpose
This module treats a Microsoft Word document as a template for generating other documents. The idea is
similar to the "mail merge" functionality in Word, but with much richer possibilities. The whole power of
a Perl templating engine can be exploited, for example for
• dealing with complex, nested data structures
• using control directives for loops, conditionals, subroutines, etc.
• defining custom data processing functions or macros
Template authors just use basic highlighting in MsWord to mark the templating directives:
• fragments highlighted in yellow are interpreted as data directives, i.e. the template result will be
inserted at that point in the document, keeping the current formatting properties (bold, italic,
font, etc.).
• fragments highlighted in green are interpreted as control directives that do not directly generate
content, like loops, conditionals, etc. Paragraphs or table rows around such directives are
dismissed, in order to avoid empty paragraphs or empty rows in the resulting document.
The syntax of data and control directives depends on the backend templating engine. The default engine
is the Perl Template Toolkit; other engines can be specified as subclasses -- see the "TEMPLATE ENGINE"
section below.
Status
This distribution is a major refactoring of the first version, together with a refactoring of
MsOffice::Word::Surgeon. New features include support for headers and footers, for metadata and for image
insertion. The internal object-oriented structure has been redesigned.
This module has been used successfully for a pilot project in my organization, generating quite complex
documents from deeply nested data structures. However, it has not yet been used at large scale in
production, so some teething problems may still be discovered. If you use this
module, please keep me informed of your difficulties, tricks, suggestions, etc.
METHODS
new
my $template = MsOffice::Word::Template->new($docx);
# or : my $template = MsOffice::Word::Template->new($surgeon); # an instance of MsOffice::Word::Surgeon
# or : my $template = MsOffice::Word::Template->new(docx => $docx, %options);
In its simplest form, the constructor takes a single argument which is either a string (path to a docx
document), or an instance of MsOffice::Word::Surgeon. Otherwise the constructor takes a list of named
parameters, which can be
docx
path to a MsWord document in docx format. This will automatically create an instance of
MsOffice::Word::Surgeon and pass it to the constructor through the "surgeon" keyword.
surgeon
an instance of MsOffice::Word::Surgeon. This is a mandatory parameter, either directly through the
"surgeon" keyword, or indirectly through the "docx" keyword.
data_color
the Word highlight color for marking data directives (default : yellow)
control_color
the Word highlight color for marking control directives (default : green). Such directives should
produce no content. They are treated outside of the regular text flow.
part_names
an arrayref to the list of package parts to be processed as templates within the ".docx" ZIP archive.
The default list is the main document ("document.xml"), together with all headers and footers found
in the ZIP archive.
property_files
an arrayref to the list of property files (i.e. metadata) to be processed as templates within the
".docx" ZIP archive. For historical reasons, MsWord has three different XML files for storing
document properties : "core.xml", "app.xml" and "custom.xml" : the default list contains those three
files. Supply an empty list if you don't want any document property to be processed.
In addition to the attributes above, other attributes can be passed to the constructor for specifying a
templating engine different from the default Perl Template Toolkit. These are described in section
"TEMPLATE ENGINE" below.
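As a rough illustration of the named-parameter form, the sketch below passes the documented defaults explicitly and disables the processing of document properties. The file name is hypothetical, and the color values are assumed to be plain strings matching the defaults listed above. The module is loaded with "require" only when the template file exists, so the snippet can run as-is without a template at hand.

```perl
use strict;
use warnings;

# Named arguments for the constructor; the file name is hypothetical.
my %args = (
    docx           => 'report_template.docx',
    data_color     => 'yellow',   # documented default, shown explicitly
    control_color  => 'green',    # documented default, shown explicitly
    property_files => [],         # empty list: no document property is processed
);

# Build the template object only if the (hypothetical) file is present.
if (-e $args{docx}) {
    require MsOffice::Word::Template;
    my $template = MsOffice::Word::Template->new(%args);
}
```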
process
my $new_doc = $template->process(\%data);
$new_doc->save_as($path_for_new_doc);
Processes the template on a given data tree, and returns a new document (actually, a new instance of
MsOffice::Word::Surgeon). That document can then be saved using "save_as" in MsOffice::Word::Surgeon.
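A minimal sketch of the whole cycle, under stated assumptions: the data keys, the file names and the helper name "generate_document" are invented for illustration; only "new", "process" and "save_as" come from the description above. The module is loaded inside the helper, so the data-building part runs on its own.

```perl
use strict;
use warnings;

# Hypothetical data tree; the keys must match the variable names
# used in the highlighted zones of the template.
my %data = (
    customer => { name => 'Jane Doe', city => 'Geneva' },
    items    => [
        { label => 'Widget', price => 12.5 },
        { label => 'Gadget', price => 30   },
    ],
);

# Hypothetical helper wrapping the three documented calls.
sub generate_document {
    my ($template_path, $data, $output_path) = @_;

    # Loaded at call time, so the rest of this sketch does not need the module.
    require MsOffice::Word::Template;

    my $template = MsOffice::Word::Template->new($template_path);
    my $new_doc  = $template->process($data);   # a MsOffice::Word::Surgeon instance
    $new_doc->save_as($output_path);
    return $output_path;
}

# Only attempt generation if the (hypothetical) template file is present.
my $template_path = 'invoice_template.docx';
generate_document($template_path, \%data, 'invoice_0001.docx')
    if -e $template_path;
```

Since "process" returns a new document each time, the same template object can presumably be reused for several data trees, one call to "process" per output document.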
AUTHORING TEMPLATES
Textual content
A template is just a regular Word document, in which the highlighted fragments represent templating
directives.
The data directives, i.e. the "holes" to be filled, must be highlighted in yellow. Such zones must contain
the names of variables to fill the holes. If the template engine supports it, names of variables can be
paths into a complex data structure, with dots separating the levels, like "foo.3.bar.-1" -- see "GET" in
Template::Manual::Directive and Template::Manual::Variables if you are using the Template Toolkit.
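To make the dotted notation concrete, here is the plain-Perl equivalent of the path "foo.3.bar.-1" on a made-up data tree (keys and values are invented for illustration). Each name step looks up a hash key, and each numeric step indexes into an array, with negative numbers counting from the end.

```perl
use strict;
use warnings;

# Hypothetical data tree, as it could be passed to process().
my %data = (
    foo => [
        'zero', 'one', 'two',
        { bar => [ 'first', 'middle', 'last' ] },   # element at index 3
    ],
);

# The template path foo.3.bar.-1 corresponds to this Perl expression:
my $value = $data{foo}[3]{bar}[-1];
print "$value\n";   # prints "last"
```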
Control directives like "IF", "FOREACH", etc. must be highlighted in green. When seeing a green zone, the
system will remove XML markup for the surrounding text and run nodes. If the directive is the only
content of the paragraph, then the paragraph node is also removed. If this occurs within the first cell
of a table row, the markup for that row is also removed. This mechanism ensures that the final result
will not contain empty paragraphs or empty rows at places corresponding to control directives.
As a consequence of this distinction between yellow and green highlights, templating zones cannot mix data
directives with control directives: a data directive within a green zone would generate output outside
of the regular XML flow (paragraph nodes, run nodes and text nodes), and therefore MsWord would report
an error when trying to open such content. There is a workaround, however: data directives within a
green zone will work if they also generate the appropriate markup for paragraph nodes, run nodes and text
nodes.
To highlight using LibreOffice, set the Character Highlighting to Export As "Highlighting" instead of the
default "Shading". See https://help.libreoffice.org/7.5/en-US/text/shared/optionen/01130200.html.
See also MsOffice::Word::Template::Engine::TT2 for additional advice on authoring templates based on the
Template Toolkit.
Images
Insertion of generated images such as barcodes is done in two steps:
• the template must contain a placeholder image : this is an arbitrary image, positioned within the
document through usual MsWord commands, including alignment instructions, border, etc. That image
must be given an alternative text (see
https://support.microsoft.com/en-us/office/add-alternative-text-to-a-shape-picture-chart-smartart-graphic-or-other-object-44989b2a-903c-4d9a-b742-6a75b451c669).
That text will be used as a unique identifier for the image.
• somewhere in the document (it doesn't matter where), a directive must replace the placeholder image
with a generated image. For example, for a barcode, the TT2 directive looks like this:
[[ PROCESS barcode type="QRCode" img="my_image_name" content="some value for the QR code" ]]
See "barcodes" in MsOffice::Word::Template::Engine::TT2 for details. The source code can be used as
an example of how to implement other image generating blocks.
Metadata (also known as "document properties" in MsWord parlance)
MsWord documents store metadata, also called "document properties". Each property has a name and a value.
A number of property names are builtin, like 'author' or 'description'; other custom properties can be
defined. Properties are edited from the MsWord "Backstage view" (the screen displayed after a click on
the File tab).
For feeding values into document properties, just use the regular syntax of the templating engine. For
example with the default Template Toolkit engine, directives are enclosed in '[% ' and ' %]'; so you can
write
[% path.to.subject.data %]
within the 'subject' property of the MsWord template, and the resulting document will have its subject
filled with the given data path.
Obviously, the reason for this different mechanism is that MsWord has no support for highlighting
contents in property values.
Unfortunately, this mechanism only works for document properties of type 'string'. MsWord does not
allow templating syntax within fields of type boolean, number or date.
TEMPLATE ENGINE
This module invokes a backend templating engine for interpreting the template directives. The default
engine is MsOffice::Word::Template::Engine::TT2, built on top of Template Toolkit. Another engine
supplied in this distribution is MsOffice::Word::Template::Engine::Mustache, mostly as an example. To
implement another engine, just subclass MsOffice::Word::Template::Engine.
To use an engine different from the default, the following arguments must be supplied to the "new" method:
engine_class
The name of the engine class. If the class sits within the MsOffice::Word::Template::Engine
namespace, just the suffix is sufficient; otherwise, specify the fully qualified class name.
engine_args
An optional list of parameters that may be used for initializing the engine
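For instance, selecting the Mustache engine shipped with this distribution could look like the sketch below. The file name is hypothetical; "engine_args" is left out because it is optional. The highlighted zones of such a template would then have to use Mustache syntax instead of Template Toolkit syntax.

```perl
use strict;
use warnings;

# Constructor arguments selecting another engine; the file name is hypothetical.
my %args = (
    docx         => 'letter_template.docx',
    engine_class => 'Mustache',   # suffix of MsOffice::Word::Template::Engine::Mustache
);

# Build the template object only if the (hypothetical) file is present.
if (-e $args{docx}) {
    require MsOffice::Word::Template;
    my $template = MsOffice::Word::Template->new(%args);
}
```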
After initialization the engine will receive a "compile_template" method call for each part in the
".docx" package. The default parts to be handled are the main document body ("document.xml"), and all
headers and footers. A different list of package parts can be supplied through the "part_names" argument
to the constructor.
In addition to the package parts, templates are also compiled for the property files that contain
metadata such as author name, subject, description, etc. The list of files can be controlled through the
"property_files" argument to the constructor.
When processing templates, the engine must make sure that ampersand characters and angle brackets are
automatically replaced by the corresponding HTML entities (otherwise the resulting XML would be incorrect
and could not be opened by Microsoft Word). The Mustache engine does this automatically. The Template
Toolkit engine would normally require an explicit "html" filter on each directive:
[% foo.bar | html %]
but thanks to the Template::AutoFilter module, this is performed automatically.
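To show what this escaping amounts to, here is a small stand-alone sketch. The helper name "escape_xml_text" is invented for illustration and is not part of this module or of the Template Toolkit; a custom engine would need something with an equivalent effect.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that would break the XML.
sub escape_xml_text {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

    # A single pass avoids escaping the '&' of an entity produced earlier.
    $text =~ s/([&<>])/$entity{$1}/g;
    return $text;
}

print escape_xml_text('R&D <draft>'), "\n";   # prints "R&amp;D &lt;draft&gt;"
```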
TROUBLESHOOTING
If a document generated by this module cannot be opened in Word, it is probably because the XML generated by
your template is not balanced and therefore not valid. For example, a template like this:
This paragraph [[ IF condition ]]
may have problems
[[END]]
is likely to generate incorrect XML, because the IF statement starts in the middle of a paragraph and
closes at a different paragraph -- therefore when the condition evaluates to false, the XML tag for
closing the initial paragraph will be missing.
Compound directives like IF .. END, FOREACH .. END, TRY .. CATCH .. END should therefore be
balanced: either all within the same paragraph, or each directive in a separate paragraph. Examples
like these should be successful:
This paragraph [[ IF condition ]]has an optional part[[ ELSE ]]or an alternative[[ END ]].
[[ SWITCH result ]]
[[ CASE 123 ]]
Not a big deal.
[[ CASE 789 ]]
You won the lottery.
[[ END ]]
AUTHOR
Laurent Dami, <dami AT cpan DOT org>
COPYRIGHT AND LICENSE
Copyright 2020-2024 by Laurent Dami.
This program is free software, you can redistribute it and/or modify it under the terms of the Artistic
License version 2.0.
perl v5.40.0 2024-10-31 MsOffice::Word::Template(3pm)