Michael Peter Christen
9dfc9c95d8
updated slf4j and log4j
12 years ago
Michael Peter Christen
95712fdc8b
update to pdf parser
12 years ago
Michael Peter Christen
a1a4d9aa94
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java
12 years ago
Michael Peter Christen
e2c4c3c7d3
migration to solr 4.0.0
12 years ago
Michael Peter Christen
69aa39d664
update to libraries required by solr 4.0.0
12 years ago
sixcooler
9d062873d2
bump to httpclient-4.2.2
12 years ago
sof
5cb244b79b
Merge remote branch 'origin/master'
12 years ago
apfelmaennchen
88b062210c
Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default as it
needs to store temporary files for non-file:// protocols, which some
users might not want. For a local MP3 collection it nicely loads Artist,
Title, Album etc. from the audio files' metadata.
12 years ago
sixcooler
9aa21506be
bump to httpcore-4.2.2 (maintenance release)
12 years ago
Michael Peter Christen
d0015df61c
added lucene memory library which is now necessary as solr has to
process more complex queries
12 years ago
Michael Peter Christen
e65cecc419
- updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
12 years ago
Michael Peter Christen
ff3eaa21b0
added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is done to several outside peers is done to
the local solr index
- the search process now also works without any 'old' RWI data, using
solr
12 years ago
Michael Peter Christen
d39463a85c
added deleteByQuery to solr connectors
12 years ago
Michael Peter Christen
2ccf1dba71
upgrade to solr 3.6.1
12 years ago
Michael Peter Christen
ea49a8aa8c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
d988ba50cf
added a very rudimentary, incomplete, non-verified GSA response writer
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10
12 years ago
cominch
e74d66e28c
augmented browsing: remove htmlparser library
12 years ago
cominch
e2119f4e76
augmented browsing: replace htmlparser by jsoup, which is more stable
and reliable
12 years ago
Michael Peter Christen
bf4968d748
source change in classpath
13 years ago
sixcooler
a99ef68422
bump to httpclient-4.2.1
13 years ago
Michael Peter Christen
7b53be141f
upgraded to pdfbox 1.7.0
changes are listed in http://www.apache.org/dist/pdfbox/1.7.0/RELEASE-NOTES.txt
and include many bugfixes, some of them performance-related
13 years ago
Michael Peter Christen
fad3b14813
added jetty libraries, needed for future use as web server and as
application server for the solr search interface
13 years ago
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
13 years ago
Michael Peter Christen
1be0025a9c
- added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (all) files needed for an embedded solr
13 years ago
Michael Peter Christen
90b82ce994
using guava for host resolution (non-blocking for ips) and time-out
13 years ago
Michael Peter Christen
3f55dc7c1e
- added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
13 years ago
Michael Peter Christen
5fc6524ca8
- moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
13 years ago
cominch
5d20cd324a
Add Triplestore and RDF query interface
Conflicts:
build.xml
defaults/yacy.init
source/net/yacy/interaction/AugmentHtmlStream.java
13 years ago
cominch
b21048892b
augmentedParser: add features and integrate external html parser to
modify existing web pages
Conflicts:
addon/YaCy.app/Contents/Info.plist
build.xml
13 years ago
sixcooler
56087c1f23
bump to httpclient-, httpcore-, httpmime-4.2
13 years ago
Michael Peter Christen
4d3cc02168
replaced old bzip2 library with the better documented commons-compress
package from http://commons.apache.org/compress/
13 years ago
Michael Peter Christen
1795a7325b
made HandleSet serializable
13 years ago
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
13 years ago
Michael Peter Christen
248299d10f
updated solrj lib
13 years ago
Michael Peter Christen
f838997126
updated commons io from 2.0.1 to 2.1
13 years ago
Michael Peter Christen
eeb57ae824
updated http client libraries
13 years ago
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
parser. This allows every type of document to be a crawl start point,
not only text documents or html documents. Tested this with a pdf
document.
13 years ago
Michael Peter Christen
a30b028cc0
updated libraries
13 years ago
Michael Christen
e69afae87e
class path for servlets in eclipse
13 years ago
Al Sutton
8993cac4d8
Initial performance improvements
13 years ago
orbiter
5a7cec59f3
moved ynetSearch to get all files out of htroot/api/util/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8042 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
65ab067491
migration to solrj 3.4.0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7952 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
52b477cf6f
bump to httpclient-4.1.2, httpcore-4.1.3 - bugfix release
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7876 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
48560a44a9
bump to httpcore-4.1.2: a bugfix release
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7853 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
c0d9474b31
update to eclipse class path environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7834 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
528b59e078
replaced the xerces.jar library that was originally added in 2005 with SVN 126 to the libx directory and moved to lib in SVN 5781
the new replacement is taken from http://xerces.apache.org, has version 2.11.0, was inside the file Xerces-J-bin.2.11.0.tar.gz
and consists of two files named xercesImpl.jar and xml-apis.jar
The original purpose of that library was to support:
- content parsers
- optional seed uploader
- SOAP API (which will be committed later)
Since the SOAP API no longer exists, the purpose is now to support content parsers and an optional seed uploader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7819 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
77fe69395d
added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
efcd21e0ed
new httpclient, httpcore (bugfix release)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7769 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
761b1c71dc
added latest pdfbox
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7761 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
0abd99621c
correct slip of click in classpath from last commit - I wonder there are 7658'is around
apfelmaennchen, please don't take this amiss
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7659 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago