An interview with Signe Oksefjell Ebeling
(XLX: Xiuling Xu; SOE: Signe Oksefjell Ebeling)
XLX: Could you briefly talk about the connection of the University of Oslo with LOB and ICAME in the 1970s?
SOE: This is well before my time. The connection of the University of Oslo with LOB and ICAME is very much down to Stig Johansson. Some of this is discussed in Leech and Johansson's article “The coming of ICAME” (ICAME Journal No. 33, 2009). As I've understood it, Geoffrey Leech and his team at Lancaster in the UK were building the Lancaster Corpus, as they called it, and they had received some funding, but in the end they encountered tremendous problems not only with funding but also with copyright. So by the mid- to late 1970s, around 1977, I think, Leech had more or less given up. Around this time Stig spent a year at Lancaster and developed a keen interest in corpus linguistics. In 1977, Stig attended a computing for the humanities course in Bergen, where he met and befriended Knut Hofland. I think Stig wrote to Leech and suggested that maybe he could help to complete the Lancaster Corpus here in Oslo with support from Knut in Bergen, hence the name the Lancaster-Oslo/Bergen Corpus. This is also connected to the whole “coming of ICAME”: since they had run into severe problems with copyright, they thought maybe they could try to impress the publishers with an international organization with this computer archive, and the story goes that ICAME was founded at Stig's kitchen table in 1977.
XLX: The next question is about corpus research here in later periods. The University of Oslo has a long-standing tradition of corpus-based contrastive studies dating back to the early 1990s. How did Prof. Stig Johansson come up with the idea of building a parallel corpus (i.e. the English-Norwegian Parallel Corpus) at that time? Before that time, it seems that he did research on the English language only, in other words monolingual research.
SOE: It's a very good question and I'm not sure whether I know why exactly at this time. But Stig, ever since the mid-1970s at least, had been interested in contrastive issues. He'd been interested in error analysis, language learning and teaching. He had a publication entitled Papers in Contrastive Linguistics and Language Testing in 1975. Ideas about the benefits of contrastive analysis, both in an applied and a more theoretical perspective, had been with him for a long time. In addition, I think the compilation of a parallel corpus had much to do with technical developments and knowing the right people. I don't know to what extent Stig discussed this with Gale and Church, who had an article in 1991 about alignment. It is possible that they met at conferences and discussed these matters, so the necessary technology was sort of starting to be available. Again, Stig approached his friend and colleague Knut Hofland, who had the know-how to deal with alignment and other technical matters. Other people who had an impact were Bengt Altenberg and Karin Aijmer in particular. So, in that sense, I think, for Stig, the time felt right to do this kind of thing. Yet another thing that might come into it is that he ran a project here at the University that was called English in Norway, with a focus on Anglicisms and English influence on the Norwegian language, so a corpus like this could also be useful for research in this field.
XLX: One more question I want to ask concerns the relationship between the ENPC and Mona Baker's Translational English Corpus. In his article “Contrastive linguistics in a new key”, Ebeling (2016: 9) mentioned that in July 1993, Mona Baker wrote to Stig Johansson saying that John Sinclair had shown her a copy of Stig's proposal for an English-Norwegian corpus, and that she hoped to be setting up her own corpus of translated texts soon at The University of Manchester Institute of Science and Technology (UMIST). Do you think Baker's TEC was to some extent inspired by the ENPC?
SOE: I'm not sure actually. I don't know how far in her thinking she had got before she got in touch with Stig. I think perhaps that those two initiatives were two parallel things going on at the same time. It's very often the case. So to say that the TEC was definitely inspired by the ENPC, I'm not sure.
XLX: Yeah, actually I also came across this problem when I built the English-Chinese parallel corpus, because few journal articles are translated from English into Chinese nowadays. Many Chinese can read English articles and they can even publish in English, so they don't need the translation.
XLX: I think both of them are quite pioneering. One is a bidirectional parallel corpus and the other a monolingual translational corpus.
SOE: Yeah, and they are based on more or less similar thoughts coming up at the same time in a way.
XLX: You were a core member of the ENPC team. Could you tell me how you got involved in the project? Was that your first contact with corpus linguistics?
SOE: Let me answer the second question first. This was not my first contact with corpus linguistics. I came as a student to Oslo in the early 1990s, and already in my undergraduate days, we were introduced to corpora, and we knew there was an allocated room with one computer in the middle of the room where you could search the LOB and Brown corpora. So I used LOB, Brown and the Kolhapur Corpus for my master's thesis in 1994. And Stig Johansson was my MA supervisor.
And how did I get involved in the ENPC project? Well, I was done with my master's studies, and the research assistant job on the project became vacant. I applied for it, and got it.
XLX: Parallel corpora were rather new in the early 1990s. Were there any challenges that the team encountered in the compilation of the ENPC?
SOE: You said parallel corpora were rather new at that time. In fact, the ENPC is what we count as the first parallel corpus of its kind. We encountered a lot of challenges in compiling the ENPC. One sad thing is that some of the challenges are still with us today. In particular, the challenge that we heard about for the LOB Corpus, namely copyright, is not getting better, particularly for English texts. It's fair to say that it's easier in Norway these days to get copyright clearance, but for a parallel corpus you need copyright clearance for more than one language, often in countries with different copyright laws. From the very start we had the challenge of (sentence) alignment, for instance. That has now been reasonably well solved. Technical matters were solved along the way, and with technological developments all of that has become easier, with Unicode for instance. We also had the challenge of getting funding. The ENPC wasn't all that well funded; in fact, I don't think we got much funding beyond the Department and Faculty. More substantial funding only came a bit later with the Nordic Project.
XLX: How about the challenge of the selection of materials? There were more texts translated from English into Norwegian than the other way around.
SOE: In the selection of material to be included you obviously need publications that have been translated between the two languages. We also had to leave out a couple of publications, because they didn't have any kind of punctuation, for instance, and it would be difficult to tackle them with sentence alignment. The corpus comprises both a fiction part and a non-fiction part, but the non-fiction part is very fragmented, due to the fact that few non-fiction texts are translated from Norwegian into English. Because of that, the selection of non-fiction texts is not very robust. I remember Bengt Altenberg and I sat down with the non-fiction texts to try to categorize them according to the Dewey Decimal Classification system and we only got one text per slot, really. There was nothing much we could do about that; this has been one of the criticisms of the ENPC, the fact that the non-fiction part is very heterogeneous.
XLX: It is still a big problem for parallel corpora.
SOE: Yes, and maybe more and more so. I know that there is an ongoing project in Sweden. Magnus Levin and his colleagues at Linnaeus University gave a talk at the ICAME39 Conference on a translation corpus of only non-fiction texts, English-Swedish-German. They have encountered similar problems, but they keep adding texts, so hopefully this will become a really good resource for translation studies and contrastive analysis.
XLX: When was that?
XLX: What about the proportion of fiction and non-fiction texts? How did you make the decision?
SOE: I think the idea was 50-50, but we ended up with 60% fiction and 40% non-fiction. There are 30 original fiction texts and 20 non-fiction texts in each language. The fiction part was easier to compile and I think, had it not been for copyright, it would be easier to do the same today with fiction, because Norwegian fiction has gained ground over the last few years and it's translated more and more.
XLX: What are the strengths and limitations of the ENPC model?
British Academic Written English (BAWE)
XLX: So that was the idea from the beginning.
XLX: The design and compilation of the ENPC were carried out in close cooperation with sister projects in Sweden and Finland. Could you tell me more about those collaborations?
SOE: I think, first of all, that Stig saw the opportunity not only for funding but also for a larger project that would gain interest across a number of languages. I don't know if he had typological studies in mind, but that may have come into it as well. He had good friends in academia, particularly in Sweden, and also Kari Sajavaara in Finland. They were really interested in this project, so collaboration was easy in the sense that these people knew each other and had similar interests. The project resulted in the English-Swedish Parallel Corpus and the ENPC; both have been used in a number of contrastive studies. The Finnish corpus is unidirectional (En-Fi) and to my knowledge it has not been widely used for research.
XLX: Yeah, and these corpora used many identical English original texts.
SOE: Yes, this was the idea: to have a pool of English texts that all three corpora could use.
SOE: It's bidirectional, which means it's aligned to and from both languages. So that's a strength. This means that we believe that we have a strong tertium comparationis; we take translations to indicate that meaning and function are reproduced in the translation, and we can check for that thanks to the bidirectional model. And it makes it possible to discover sets of cross-linguistic correspondences in material like this, compared to comparable corpora, where you can't really be sure that what you identify in the two languages really matches. So it goes back to the tertium comparationis again. With the translation paradigms going in both directions, this is what we get. And again, the fact that the corpus is bidirectional makes it possible to check for what has been called translationese or translation effects. For example, if you suspect that something is unidiomatic English in the translated text, you can check that against original English texts, and at least be aware of the fact that this may have to do with characteristics of translation rather than the nature of English in general. So these are the strengths. The limitations we've touched on, such as those that have to do with the selection of texts that you actually have, because not all kinds of texts are translated. And also the balance of the text types between the two languages is hard to maintain in a corpus like this, and probably even worse, for example, between Chinese and English or between Chinese and other Germanic languages. So those are clearly limitations. But I still think the strengths outweigh the limitations.
SOE: Yeah, once the Nordic Project got started. But I think the ENPC was a bit ahead of the others, so they followed suit; even so, there is not a full overlap of English texts. The ESPC, for example, contains some English original texts that we don't have, and the other way around.
XLX: How was the ENPC extended to the Oslo Multilingual Corpus?
SOE: This was towards the end of the 1990s. Colleagues from the German Department in particular (we used to be separate departments back then) got interested in what we'd done with the English-Norwegian Parallel Corpus. In particular Cathrine Fabricius-Hansen, professor of German, teamed up with Stig to include German. First of all, they wanted to do a trilingual multidirectional corpus with English, Norwegian and German based on the ENPC, but in the meantime we had also, for smaller projects, collected translational texts of English and Dutch and English and Portuguese. So some languages had already been added, but the OMC only materialized when they got funding for the project Languages in Contrast (Språk i kontrast, SPRIK). So, you could say that the ENPC generated interest from professors of other languages, mainly German, but also French (with Hans Petter Helland and Marianne Hobæk Haff, both professors of French at Oslo); one of the sub-corpora in the OMC is the French-Norwegian Parallel Corpus (FNPC). There was also some collaboration with the French department at the University of Bergen, and they contributed some of the French-Norwegian texts to the FNPC.
XLX: In addition to parallel corpora, the corpus research team at the University of Oslo has also got involved in several learner corpus projects, for example the International Corpus of Learner English (ICLE), the Varieties of English for Specific Purposes dAtabase (VESPA) and the Idiomaticity Project. Could you say a few words about them?
SOE: I think in a way that goes back to what I just said. Yes, corpora are very good. They give you empirical, objective data, but remember not to lose track of what we're really using them for, which is linguistic research. First of all, know your corpus. And know your corpus tool and what it can do for you or what you can use it for in order to find out something about language. So that's probably my main piece of advice. And obviously go beyond number-crunching. Also, in recent years, the focus is more and more on statistics in corpus research. Young scholars should probably take this into account more than I have done so far, and maybe from an early stage find a collaborator who knows statistics. Especially for corpus-based contrastive linguistics, I always advise my students to try to use already-existing resources, because, as you know, it takes a lot of time to build your own corpus, and also to use standard, existing tools. This goes not only for contrastive studies, of course. But the thing about contrastive linguistics is that we need to take into account more than one language, which is twice the amount of work sometimes. And very often one language is better described than the other, typically English, so how can you give insights into both languages and not only one? So it is important to keep that in mind when you carry out contrastive research. And I think the method in contrastive linguistics is essential, because some people who are doing contrastive research are perhaps doing research that is closer to translation studies than contrastive linguistics. It's a fine line, but to be aware of these slight differences is important.
SOE: Exactly, which means that it will be a challenge to compile parallel corpora also in the future.
SOE: In 2008 or 2009. With the increased awareness of and interest in genre/register differences in language, we wanted to set up a corpus of L2 English student writing in different university disciplines to match the British Academic Written English (BAWE) Corpus, which I had previously worked on in the UK. So my experience from working on that corpus project came in handy when we were setting up the VESPA project, although BAWE is not really a resource for learner language research. It's about university disciplines and university disciplinary genres, that is, novice writers in the UK, mainly L1 English speakers, but not only. The criterion for a paper to be included in the BAWE was that it had received a good mark.
XLX: So it's a different research focus. That was student writing, whether L1 or L2.
SOE: Exactly. And the overarching project had to do with university genres or disciplinary genres. So, the VESPA Corpus, with its L2 writing in the disciplines, can easily be compared with data from the BAWE. And the Idiomaticity Project that you mentioned really links up with all of this, both contrastive and Learner Corpus Research. The Integrated Contrastive Model lies at the core of that. In the Idiomaticity Project, then, we typically use the corpora that we have built ourselves (ENPC, ICLE, VESPA) and also draw on some others for comparison.
XLX: In terms of the recent developments in corpus-based contrastive linguistics, could you talk about the new International Comparable Corpus, i.e. the newly launched international collaborative project you are involved in?
SOE: The International Comparable Corpus, or ICC [ɪk] or ICC [ˌaɪ siː ˈsiː]. We are still debating how we are going to pronounce that. This project was initiated by Anna Čermáková and John Kirk last year (2017), and they invited colleagues and people they know in the corpus linguistics world to take part in this project, where each national team is supposed to collect their national component for the comparable corpus. The idea is that we should reuse as much material as we can from other corpora to facilitate the whole compilation process. So Jarle Ebeling and I are in charge of the Norwegian part of the ICC, trying to collect or put together a Norwegian component of this particular corpus. The other collaborators so far are people based in the Czech Republic, Slovakia, Poland, Finland, Sweden, Great Britain and Germany. I think nine languages altogether so far, including French. And the idea also is that the whole design of the ICC is supposed to follow the ICE (i.e. the International Corpus of English). In terms of design it should contain 60% spoken language and 40% written language representing different text types. We had a kick-off meeting in Prague last year where we discussed how to go about the whole thing, and it turns out that it's hard to get hold of all these “old” texts and incorporate them into a new corpus. It has to do with copyright again, and it also has to do with suitability and comparability of what has already been collected. For Norwegian, for instance, we have very little spoken material that has already been collected that is suitable for this corpus. The plan is to have collected the written part by mid-2019, as this part turns out to be easier to compile. We'll have a poster presentation at a conference in Louvain in September this year (2018) and also have a workshop there to discuss matters to do with the compilation. I think it's a great initiative, but there are challenges. And also the corpus will only contain 1 million words, which may turn out to be very small for contrastive comparisons in some cases. So we'll see what can come out of it, but I think it's worth a try.
XLX: Yeah, it'll facilitate contrastive studies between different languages.
SOE: And particularly the fact that you'll get a comparable corpus of spoken language. This will be the main contribution of the ICC, I think.
XLX: How do you view your own role in corpus research at the University of Oslo and your contribution to corpus linguistics beyond UiO and Norway?
SOE: Well, as for my own role at this university, I'd say I teach, facilitate and do research. I'm involved in introducing students to corpora from a very early stage, so even in their first year, the students can be introduced to corpus techniques. The courses I'll be teaching this term to second-year students and master's students are about corpus linguistics, to give them the know-how to carry out corpus research, but I also remind them that at the core of everything is language. We're interested in language, we're not necessarily interested in the corpus as such, because without ourselves and without linguistic knowledge, corpora are “nothing”; they are just texts. In terms of my contribution beyond the University and Norway, I'm co-editor, with Hilde Hasselgård, of the international journal for contrastive linguistics Languages in Contrast. I'm also a member of the ICAME board, and I collaborate with quite a few people, not only on the ICC. There's a great network of corpus linguists out there!
XLX: The last question is: do you have some advice for young scholars who wish to do corpus research, corpus-based contrastive linguistics in particular?
SOE: Stig got involved in Learner Corpus Research in the late 1990s. He thought the idea put forward by Sylviane Granger with the Integrated Contrastive Model really was a good idea, and he wanted to take part in the ICLE initiative to build learner corpora containing texts produced by learners of English with different L1 backgrounds. So he and Lynell Chvala, a student at that time, collected the Norwegian ICLE (i.e. NICLE). And later on, Hilde Hasselgård and I joined the team in Louvain to build the Norwegian VESPA.
XLX: Very good suggestions. Do you have some advice on tertium comparationis?
SOE: I think people should actually pay more attention to the tertium comparationis than is typically done. I've done a few contrastive studies based on comparable data, and it is harder to argue that you're comparing like with like than when you use a bidirectional translation corpus. So I think this is essential and that's also part of the method. But I know there are quite a few people who criticise the use of translation for contrastive studies, as translation is seen as the “third code”; that is, translated English, for example, is seen as being fundamentally different from the language originally produced in English. So very often if you use a corpus like the ENPC you have to argue that this is a good thing. I think precisely the presence of a stronger tertium comparationis is a good argument, although people aren't always convinced. So it's a matter of showing that translation can be used in this way, in a systematic and sound way.
XLX: Thank you very much for your time.
Corpora
https://www.coventry.ac.uk/research/research-directories/current-projects/2015/britishacademic-written-english-corpus-bawe/
English-Norwegian Parallel Corpus (ENPC)
https://www.hf.uio.no/ilos/english/services/knowledge-resources/omc/enpc/
English-Swedish Parallel Corpus (ESPC)
https://www.sol.lu.se/engelska/corpus/corpus/espc.html
International Corpus of Learner English (ICLE)
https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html
Lancaster-Oslo/Bergen Corpus (LOB)
http://clu.uni.no/icame/manuals/LOB/INDEX.HTM
Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus
https://uclouvain.be/en/research-institutes/ilc/cecl/vespa.html
The Norwegian component of VESPA
https://www.hf.uio.no/ilos/english/services/knowledge-resources/vespa/
References
Ebeling, J. 2016. Contrastive linguistics in a new key [J]. Nordic Journal of English Studies 15(3): 7-14.
Gale, W. & K. Church. 1991. A program for aligning sentences in bilingual corpora [A]. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (ACL), Berkeley [C]. 177-184.
Johansson, S. 1975. Papers in Contrastive Linguistics and Language Testing [M]. Lund: CWK Gleerup.
Leech, G. & S. Johansson. 2009. The coming of ICAME [J]. ICAME Journal 33: 5-20.
Levin, M., J. Herold & J. Tyrkkö. From the BBC to the PFC and CAPTCHA. Acronym typology from a cross-linguistic perspective [A]. In Proceedings of the ICAME 39 Conference [C]. Tampere: University of Tampere. 108-109.
University of Oslo,Norway
10:30-11:30, 10 August 2018