Empty lines are lost when copypasting text from MS Office into Zammad


#1

Infos:

  • Used Zammad version: 2.9.x
  • Used Zammad installation source: docker-compose
  • Operating system: Debian 9
  • Browser + version: Chrome 72.0.3626.121

Expected behavior:

  • When copypasting the contents of a Microsoft Word document, empty lines should be preserved.

    E.g. this text:

    This is a paragraph with some text.
    And some more text.
    
    This is another paragraph with some text.
    Maybe some more text.
    And some more text.
    

    should look like this after it has been copied from Microsoft Word to Zammad:

    This is a paragraph with some text.
    And some more text.
    
    This is another paragraph with some text.
    Maybe some more text.
    And some more text.
    

Actual behavior:

  • All empty lines between paragraphs are removed.

    E.g. the example text from above now looks like this:

    This is a paragraph with some text.
    And some more text.
    This is another paragraph with some text.
    Maybe some more text.
    And some more text.
    

    Several other web sites/applications I’ve tried are not affected by this:

    • local editor (Notepad on Windows, Pluma on Ubuntu Mate)
    • our own IServ webmailer, both configured for plain text and for formatted text
    • GMail (both configured for plain text and formatted text)
    • community.zammad.org Discourse (only plaintext, no formatted text support AFAIK)
    • Kayako (only plain text because I disabled formatted text in Kayako long ago)

    This leads me to believe that Zammad somehow doesn’t understand the way that Office formats its empty lines.

Steps to reproduce the behavior (with Microsoft Office):

  • Create a new text document in Word, and type e.g. the following:

    Foo
    
    Bar
    
  • Copypaste the whole document into a new Zammad ticket.

  • Note that the empty line between Foo and Bar is gone.

The problem is not reproducible with LibreOffice.

Steps to reproduce the behavior (without Microsoft Office):

As the problem occurred on a colleague’s computer, and I myself am using Ubuntu + LibreOffice (where it is not reproducible), I had a hard time figuring out what exactly was going on here. I installed MS Office into a VM, copypasted an example text, and then used http://www.freeclipboardviewer.com/ to figure out what exactly Office puts into the clipboard.

Apparently the Windows clipboard is able to store the same information in many different formats, amongst them RTF, HTML, plain, Unicode plain, native, and a whole lot of other Windowsy stuff I don’t understand. I immediately suspected the HTML representation as the cause, so I concentrated on that.

Office generates a whopping 38 KB HTML file for a few lines of text; I’ve uploaded it here: https://pastebin.com/VcpQtbwv

The interesting part is these few lines at the end:

<body lang=DE style='tab-interval:35.4pt'>
<!--StartFragment-->

<p class=MsoNormal><a name="OLE_LINK1">Zeile 1<o:p></o:p></a></p>

<p class=MsoNormal><span style='mso-bookmark:OLE_LINK1'>Zeile 2<o:p></o:p></span></p>

<p class=MsoNormal><span style='mso-bookmark:OLE_LINK1'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='mso-bookmark:OLE_LINK1'>Absatz</span><o:p></o:p></p>

<!--EndFragment-->
</body>
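
For anyone who wants to cut a similar clipboard dump down to the relevant part, here is a minimal Python sketch. It assumes the HTML has been saved as text and that the copied selection is wrapped in the StartFragment/EndFragment comments seen above. The function name `extract_fragment` and the shortened sample are made up for illustration.

```python
START = "<!--StartFragment-->"
END = "<!--EndFragment-->"


def extract_fragment(html: str) -> str:
    """Return the markup between the StartFragment/EndFragment comments."""
    start = html.find(START)
    end = html.find(END)
    if start == -1 or end == -1 or end < start:
        # Markers missing or out of order: fall back to the whole input.
        return html
    return html[start + len(START):end].strip()


# Shortened version of the Office output quoted above.
sample = """<body lang=DE style='tab-interval:35.4pt'>
<!--StartFragment-->
<p class=MsoNormal>Zeile 2<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<!--EndFragment-->
</body>"""

print(extract_fragment(sample))
```

Only the two `<p>` lines remain, which makes it much easier to see what is special about the empty paragraph.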

I’ve reduced the HTML to the absolute minimum that still triggers the issue:

<p>Zeile 1</p>
<p>Zeile 2</p>
<p><o:p>&nbsp;</o:p></p>
<p>Absatz</p>

When you save that as an HTML file, open it in the browser and then copypaste the displayed text into Zammad, you should be able to reproduce the issue.

The cause is obviously the <o:p> tag: when you remove it and leave the &nbsp; intact, the problem is no longer reproducible. According to this Stack Overflow posting, Office inserts its o: tags into the HTML to ensure that no information is lost when you paste the resulting HTML back into Office.

My suspicion is that Zammad filters the incoming HTML for security reasons, and removes the <o:p> tag along with its contents because it’s not standard. If this is the case, would it be possible to adapt the filtering so that it doesn’t break copypasted Office HTML?
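
To illustrate the suspicion above, here is a minimal Python sketch of two filtering strategies for non-whitelisted tags. This is not Zammad's actual filter, which is not shown in this thread. The whitelist, the class name `Sanitizer` and the function `sanitize` are assumptions made up for the example. One strategy drops an unknown element together with its contents, the other removes only the tag and keeps what is inside.

```python
from html.parser import HTMLParser

ALLOWED = {"p", "br", "b", "i"}          # hypothetical whitelist, not Zammad's
VOID = {"img", "hr", "meta", "link", "input"}  # no end tag, never opens a skip


class Sanitizer(HTMLParser):
    def __init__(self, drop_unknown_content: bool):
        super().__init__(convert_charrefs=False)
        self.drop_unknown_content = drop_unknown_content
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped unknown element

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            if not self.skip_depth:
                self.out.append(f"<{tag}>")  # attributes are discarded
        elif tag not in VOID and self.drop_unknown_content:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            if not self.skip_depth:
                self.out.append(f"</{tag}>")
        elif tag not in VOID and self.drop_unknown_content and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def sanitize(html: str, drop_unknown_content: bool) -> str:
    parser = Sanitizer(drop_unknown_content)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


WORD_HTML = "<p>Zeile 2</p>\n<p><o:p>&nbsp;</o:p></p>\n<p>Absatz</p>"

print(sanitize(WORD_HTML, drop_unknown_content=True))
print(sanitize(WORD_HTML, drop_unknown_content=False))
```

With the first strategy the middle paragraph ends up completely empty, and an empty `<p>` typically takes up no visible height, which would match the missing blank line. With the second strategy the `&nbsp;` survives inside the paragraph, so the blank line would be kept.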


#2

Hey,

thanks for your contribution (again 🙂).
This seems to be a bug; we’re currently not sure whether it might be a regression.

Could you please create an issue over at GitHub and mention Martin (martini)?
He might have information regarding the root cause of this.

We totally agree that it should be possible to simply use “two new lines” to get paragraph-like behaviour.

Bests


#3

Done!